Thursday, March 29, 2018

GHCN Part 3: Creating a Temperature Model


This is part three of a series on Global Warming using records from the Global Historical Climatology Network. In the first post of this series I discussed my data source, how I selected which records to use, and the time frame of the study. I used only station records that were complete for that time frame; I determined not to impute or estimate any data.

The previous post in this series described the results from that data in terms of the number of days above 85°F, 90°F, 95°F, and 100°F, as well as the number of days where the daily high did not exceed 32°F, 20°F, and 10°F. As we saw, for the locations involved, the number of warmer days has consistently fluctuated up and down at apparently regular intervals but has generally decreased since the 1930s. The number of colder days has consistently fluctuated up and down in a repeating pattern. For the number of days at or below freezing, the data indicate the years 1900 to 1930 and 1980 to 2003 are nearly identical, and there is evidence to suggest that pattern began to repeat again in 2005 to 2010.
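Purely to illustrate what that kind of tally looks like in code (a minimal sketch, not the code behind the original counts; the function name and data layout are my own assumptions), counting those threshold days for a single station could be done like this:

# Minimal sketch: count threshold days from one station's daily highs (in °F).
# The list of daily highs is assumed to cover the period of interest.

def count_threshold_days(daily_highs_f):
    warm_thresholds = [85, 90, 95, 100]   # days with highs above these values
    cold_thresholds = [32, 20, 10]        # days where the high did not exceed these values
    warm = {t: sum(1 for h in daily_highs_f if h > t) for t in warm_thresholds}
    cold = {t: sum(1 for h in daily_highs_f if h <= t) for t in cold_thresholds}
    return warm, cold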

As I explained in the first post of the series, there are certain limitations to this study which bear repeating here. The long-term data from the GHCN daily max/min tables is very limited. Most of the coverage is in the lower 48 states of the US. Canada, Europe, Central Asia, and Australia are all represented, but to significantly lesser degrees. Africa, Central America, and South America are not covered.

With those limitations in mind, let me describe in general terms the process I used to develop the data into useful information.

Defining the Goal

When performing an analysis of this type of data, the goal is to develop a model which accurately describes the data and then to determine a means of applying that model to other, similar situations which fit the definitions of the model. This is a process of creating a model, testing the model against known data, evaluating the results, and adjusting the model accordingly. An accurate model will make accurate predictions for known data within acceptable levels of data variation. A model which cannot make accurate predictions against known data is flawed and therefore would not be useful.

Creating the Model

My first-pass approximation for such a model was the simplest model available: a raw average of all the data. I chose to test this model by comparison to selected samples of individual station data. Without going into details, let me just say this initial model failed. For example, the model failed to describe the general time series trend of individual stations, meaning where things started off and where they ended up. That failure informed my method for refining the model.
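As an illustration of that first pass (a simplified sketch under assumed data structures, not the actual code or tolerance I used), the raw-average model and the trend comparison might look like this:

# Sketch of the first-pass model: one raw average series across all stations,
# compared against each station's own start-to-end trend.
# The data layout, names, and tolerance are assumptions for illustration only.

def raw_average_series(stations):
    # stations: dict of station_id -> list of annual mean temperatures (equal length)
    n_years = len(next(iter(stations.values())))
    return [sum(series[i] for series in stations.values()) / len(stations)
            for i in range(n_years)]

def start_to_end_trend(series):
    return series[-1] - series[0]

def stations_poorly_described(stations, tolerance=0.5):
    # stations whose own trend differs from the averaged model's trend
    avg_trend = start_to_end_trend(raw_average_series(stations))
    return [sid for sid, series in stations.items()
            if abs(start_to_end_trend(series) - avg_trend) > tolerance]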

I refined my data model to address that failure by creating a new data set consisting of beginning and ending temperatures for each station. I analyzed that data by calculating a temperature change, a delta, for each station and performing statistical analysis on the resulting set of deltas.
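A minimal sketch of that step (the averaging window and the summary statistics shown are my own assumptions, not the exact procedure used):

# Sketch: one start-to-end temperature delta per station, plus simple summary statistics.
from statistics import mean, stdev

def station_delta(series, window=5):
    # difference between the mean of the last and first few years of the record
    return mean(series[-window:]) - mean(series[:window])

def delta_summary(stations):
    deltas = {sid: station_delta(series) for sid, series in stations.items()}
    values = list(deltas.values())
    return deltas, mean(values), stdev(values)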


I found that quantifying stations by the overall start-to-finish temperature change, the temperature delta, produced a near-normal data set. Using this as a starting point, I refined my model into three separate models. One model covers the -1° to 1° range, which contains 75% of all stations. The second model covers the 1° to 4° range, which contains 17% of the stations. The last model covers the -1° to -2.5° range, which contains the remaining 8% of the stations.
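Assigning a station to one of the three models then comes down to which delta range it falls in. A sketch (the range boundaries are from the text above; the model names are placeholders):

def model_for_delta(delta):
    if -1.0 <= delta <= 1.0:
        return "little_change"     # roughly 75% of stations
    if 1.0 < delta <= 4.0:
        return "warming"           # roughly 17% of stations
    if -2.5 <= delta < -1.0:
        return "cooling"           # roughly 8% of stations
    return "outside_modeled_ranges"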

As before, I tested the data models against actual station data, with model selection based upon the temperature delta parameter. Again, the models failed to accurately reflect all the data. Quantifying that failure was easy, as all of the failures occurred in Australia. Separating Australia into its own data set and performing the same analysis as above, I created two additional models. That brings the total to five models: three for the Northern Hemisphere and two for Australia.
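Model selection therefore now rests on two parameters: region (Australia or not) and the station's delta. A sketch of that selection (the Australian delta split and all of the model names are illustrative placeholders, since the actual cutoffs are not given here):

def select_model(is_australia, delta):
    if is_australia:
        # hypothetical split between the two Australian models
        return "australia_a" if delta >= 0.0 else "australia_b"
    if -1.0 <= delta <= 1.0:
        return "nh_little_change"
    if 1.0 < delta <= 4.0:
        return "nh_warming"
    return "nh_cooling"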

These five models, based upon those two selection parameters, are accurate to within ±1° for over 75% of the individual stations and to within ±1.5° for the remainder. In my opinion that is an acceptable degree of error.
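Checking that accuracy is a matter of comparing each model's predicted delta against the observed delta for every station and bucketing the errors. Roughly (a sketch with assumed data structures, not the actual evaluation code):

def accuracy_buckets(predicted, observed):
    # predicted, observed: dicts of station_id -> delta in degrees
    errors = {sid: abs(predicted[sid] - observed[sid]) for sid in observed}
    within_1_0 = sum(1 for e in errors.values() if e <= 1.0) / len(errors)
    within_1_5 = sum(1 for e in errors.values() if e <= 1.5) / len(errors)
    return within_1_0, within_1_5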

Refining and Utilizing the Model

One of the primary reasons for creating a model, beyond describing what has gone before, is to act as a predictive tool. Having defined models which describe what is known to have happened, it now becomes necessary to try to define and quantify those factors which affected what happened. For that purpose it is helpful to have five different models and a wide range of results. These five models can be further reduced to three essential models based upon the overall result: temperatures rose, temperatures fell, or temperatures did not change appreciably. Those are distinctly different outcomes.

Each station in this data set has been affected by certain factors. I will define those factors as local, regional, or global. Local factors contribute to unique site results. Regional factors contribute to site results over a certain area. The scope of regional factors may vary quite a bit. Global factors would contribute to outcomes all over the world.

The process going forward is defining and quantifying local factors, then regional factors, and then global factors. In the process of doing so, the various models become refined to include those parameters. Ideally, they will be combined into one or two models. The process of model redefinition would, as always, include testing the models for descriptive and predictive accuracy.

The Example of Australia

Australia is an interesting case study for this method. There are only 10 stations with usable data; however, that data extends back to 1895. Australia not only has a distinctive regional difference, it has distinctive local differences. It is also a mostly sparsely populated place. One factor which became apparent immediately was population density. Given the time frame involved, it is reasonable to assume the initial population density was essentially zero. Predicting which model applies to a site from its current population density proved 100% accurate. There are sites which are geographically close but far apart in population. The magnitude of the temperature increase over the period 1950 to 2005 was as much as 3.5° higher for the more heavily populated of these proximate sites.
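A minimal sketch of that population-density check (the density threshold and the station fields are made-up placeholders, not figures from the study):

def predicted_model(current_density_per_km2, threshold=10.0):
    # threshold is a placeholder; the actual criterion used is not reproduced here
    return "australia_warming" if current_density_per_km2 >= threshold else "australia_stable"

def prediction_accuracy(stations):
    # stations: list of dicts with "density" and "actual_model" keys
    hits = sum(1 for s in stations
               if predicted_model(s["density"]) == s["actual_model"])
    return hits / len(stations)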

When you consider that the generally accepted figure for temperature rise over the past century due to CO2 is 1.5°, a 3.5° temperature increase differential over 55 years due to a population increase differential would be significant. Accepting those numbers as reasonably accurate, assuming the 1.5° is the result of global factors, and assuming linearity, the inference is that the localized factor of population growth has had a greater influence on those local temperatures than global factors, by roughly a factor of four on a per-year basis.
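The arithmetic behind that comparison, taking the quoted figures at face value:

# Comparing the implied per-year rates from the figures quoted above.
global_rate = 1.5 / 100    # ~0.015 degrees per year attributed to global factors
local_rate = 3.5 / 55      # ~0.064 degrees per year between the proximate sites
print(local_rate / global_rate)   # ~4.2: the local effect is roughly four times the global rate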

Moving Forward

Understand, this process is far from complete. I am presenting what I am working on essentially in real time as the process progresses. I am letting the data lead me and not the other way around. I may end up somewhere totally unexpected. Even so, I think it is worthwhile to make these posts. I would certainly welcome constructive input.

 

Next post: The five models.
