The first thing I realized was that Excel was simply not powerful enough to handle the size and complexity of this data analysis task, so I imported the information into an Access database. Access can handle far more records than Excel and is far more powerful for this kind of work.
The very first thing I did was examine how complete the records were. I wanted to look at annual averages, so I ran a query to order all the records by station and by year, then looked at how many readings each station had for each year. What I found were thousands of incomplete years. Some had only one month. Obviously, you cannot compute an annual average for a year that does not have all 12 months recorded. Considering how much temperatures vary over the course of a year, a missing month can skew the average by quite a bit. Such a year is an inaccurate record which cannot be used.
I also found dozens of records with more than 12 readings in one year. In fact, I found as many as 99 monthly averages recorded for one station in a single year. These are obviously duplicate records.
After I extracted from the original dataset only those years with complete records and eliminated all the duplicate records, the number of stations dropped from over 4600 to 3127. In other words, some 32% of the stations were eliminated because they consisted of incomplete or duplicate data. That is a really high casualty rate.
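For readers who want to try this kind of filtering outside Access, the steps above can be sketched in Python. This is only an illustration, not the actual Access queries: the tuple layout and the function name `complete_station_years` are assumptions, and it assumes that a repeated station/year/month entry is a duplicate and that keeping the first reading is acceptable.

```python
from collections import defaultdict


def complete_station_years(readings):
    """Keep only station-years that have all 12 months recorded.

    readings: iterable of (station, year, month, temp_c) tuples
              (hypothetical layout, chosen for this sketch).
    Returns {(station, year): [12 monthly temps in month order]}.
    """
    by_year = defaultdict(dict)
    for station, year, month, temp_c in readings:
        # Assumption: a repeated (station, year, month) is a duplicate;
        # setdefault keeps the first reading and ignores later ones.
        by_year[(station, year)].setdefault(month, temp_c)

    all_months = set(range(1, 13))
    return {
        key: [months[m] for m in range(1, 13)]
        for key, months in by_year.items()
        if set(months) == all_months  # drop incomplete years
    }
```

A station-year with 99 rows collapses to at most 12 distinct months here, and a station-year with only one month is dropped entirely.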
The final step was to create the program to generate the information I wanted to extract from the set of complete annual data. Based upon a start date and an end date, the program extracts every station with a complete record for each year in the date range, and then reports the annual average temperature of all stations for each year. Secondarily, the program also provides a count of stations.
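The logic of that program can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the Access program itself: the function name `continuous_report` and the input layout (a dictionary mapping a station and year to that station's annual average) are invented for the example.

```python
def continuous_report(annual_means, start_year, end_year):
    """Average the annual means of stations reporting every year in a range.

    annual_means: {(station, year): annual_average_temp}, containing only
                  complete years (hypothetical layout for this sketch).
    Returns (report, station_count), where report maps each year in the
    range to the mean of the annual averages of the qualifying stations.
    """
    years = range(start_year, end_year + 1)
    stations = {station for station, _ in annual_means}

    # A station qualifies only if it has a complete record for every year.
    continuous = sorted(
        s for s in stations if all((s, y) in annual_means for y in years)
    )

    report = {}
    if continuous:
        for y in years:
            total = sum(annual_means[(s, y)] for s in continuous)
            report[y] = total / len(continuous)
    return report, len(continuous)
```

Calling `continuous_report(annual_means, 1975, 1980)` on such a dictionary would give one average per year plus the number of stations that reported in all six years.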
Below is a screen grab of the program output for 1975 through 1980.
So, I now have a database tool which can almost instantly generate a record of temperatures from stations reporting continuously over any span of years between 1880 and 2004. This is where my next graph comes in.
Do you see the problem here? Out of 3127 stations in the record only 2 contain a complete record from 1880 to 2004. Only 5 were continuously reporting from 1950 to 2004. That includes the original 2 by the way. There were only 44 stations reporting from 1980 to 2004. There were 380 stations reporting from 2000 to 2004. Yet, in 2004 there were 805 stations reporting.
So it appears the only usable data in the entire 120 MB of original data is that from just two stations. The rest of it is too fragmentary, incomplete, or just does not cover enough time to be useful. Just two stations, one in Russia and one in Switzerland.
This is all they have. Unbelievable.
Just two more graphs and we will call it a day. I think these are pretty self-explanatory.
Below is a graph showing the high and low annual averages for each year from 1900 to 2004. You will notice that only the lowest readings vary, and they vary hugely. You are looking at temperatures in the -55° C (-67° F) range. That would be Antarctica. You are seeing the effects of 12 stations running from 1953 to 1994 for periods ranging from 42 years to 1 year. Do you think having 2, 3 or 12 annual averages at such an extreme might have some noticeable effect on the "global average"? This is an extreme example of how ridiculous this entire business truly is.
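To put a rough number on that question, here is a small worked calculation in Python. The station count of 800 and the 10° C baseline in the usage note are made-up illustrative values, not figures from the dataset, and the function name `mean_shift` is hypothetical. It assumes a simple unweighted average of station annual means.

```python
def mean_shift(base_count, base_mean, extra_count, extra_value):
    """Change in an unweighted mean when extra_count stations, each with
    annual average extra_value, are added to base_count stations whose
    mean is base_mean."""
    new_total = base_count * base_mean + extra_count * extra_value
    new_mean = new_total / (base_count + extra_count)
    return new_mean - base_mean
```

With those made-up numbers, `mean_shift(800, 10.0, 12, -55.0)` works out to 12 × (-65) / 812, a drop of a little under one degree, and the same drop is reversed when those stations disappear from the record.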
Here is the record of stations reporting by year from 1900 through 2004. Enough said, don't you think?