Updated Method of Predicting Temperature Anomalies
Posted by The Diatribe Guy on April 24, 2008
I have continued to tweak my predictive model that is based on the trend lines and how those trend lines change over time. I’ve decided to start my explanation from scratch so that everyone can understand what it is I’m doing. I suppose I could keep it a secret and potentially make millions, but in all likelihood that would never happen, so just consider this my contribution to mankind.
The data set that I am now using is the NASA GISS Global Temperature Data that combines Land and Ocean temperatures. I’ve noted that some people have issues with the GISS data, and that there are other major temperature measurement sources available. At some point I may well extend my analysis to these other areas, but I wanted to have the model refined before moving on. Since I just do this in my spare time, I don’t have the time or energy to move at a faster pace.
The first step is to take all the month-by-month anomalies and put them in chronological order. In my spreadsheet, then, the data runs from January 1880 through the current month, all in a single column. The entire column needs to be updated each month because GISS incorporates smoothing techniques to estimate information, and these can change the data even back to the earlier time periods. This really makes little sense, if you ask me, but it’s the reality of the data.
Although it is not directly used in my analysis, I also look at the trends in this data in a few simple ways. I determine the slope of the trend line from all the different starting points along the way, as a demonstration of how the answer differs based on starting point. I have touched on this in the past. I then translate the slope into the indicated temperature change per century to put it in context. But this is just informational.
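For anyone who wants to reproduce this informational step outside a spreadsheet, here is a minimal sketch in Python. It assumes the anomalies sit in a plain list in chronological order, one value per month. The function names and the `min_points` cutoff are illustrative assumptions, not part of the spreadsheet.

```python
def ols_slope(values):
    """Least-squares slope of values against 0, 1, 2, ... (one step per month)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def per_century(monthly_slope):
    """Convert a slope per month into the implied change per century."""
    return monthly_slope * 12 * 100


def slopes_by_start(anomalies, min_points=24):
    """Per-century slope of the trend line from each starting month to the end.

    min_points is a hypothetical cutoff so that the shortest trend lines
    still contain a reasonable number of months.
    """
    last_start = len(anomalies) - min_points
    return [per_century(ols_slope(anomalies[s:])) for s in range(last_start + 1)]
```

The result is in the same units as the anomalies themselves, so a monthly slope of 0.001 degrees translates to 1.2 degrees per century.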
The predictive analysis separately projects future anomalies using 60, 120, 180, 240, 300, and 360 month trend lines. The process is the same for each, so I will use the 60-month example to describe the approach; the others follow the same steps.
Calculate 60-data-point rolling slopes from the anomaly data. The first data point is produced by calculating the slope from January 1880 through December 1884. The second data point is from February 1880 through January 1885. And so on… A rolling slope calculation is performed through the most recent 60-month period. It is clear that the slopes from period to period seldom stay the same. So, whatever you want to say about long-term slope trends, the trend line is not necessarily a great predictive measure of the next anomaly or anomalies. Typically, the slope calculations themselves are trending in one direction or another. I have shown this “trend of the trend” analysis in previous posts.
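The rolling slope calculation can be sketched as follows. This is a minimal Python illustration, assuming the anomalies are in a chronological list with one value per month. The names `ols_slope` and `rolling_slopes` are made up for the example.

```python
def ols_slope(values):
    """Least-squares slope of values against 0, 1, 2, ... (one step per month)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rolling_slopes(anomalies, window=60):
    """One slope per complete window; entry i covers months i .. i + window - 1."""
    if len(anomalies) < window:
        return []
    return [
        ols_slope(anomalies[i:i + window])
        for i in range(len(anomalies) - window + 1)
    ]
```

With 72 months of data and a 60-month window this produces 13 slopes, the first covering months 1 through 60 and the last covering months 13 through 72.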
There are a number of ways I could have used the trend of slopes to develop a predictive model. And, in fact, I tried a number of things, even going so far as trending the trend of the trend, and then trending that, and so on until I had oodles of trend line slopes. The theory was that I could work backwards into a good prediction of the next slope change, and thus determine the anomaly that would produce it. But alas, the method had many complications that were difficult to overcome and produced unsatisfactory results. One difficulty was knowing at what point to start the most recent trend line. It was an arbitrary selection, meant to represent the current trend, but this arbitrary selection had the flaw of projecting a current trend out further than natural cycles should allow. After much experimentation with this approach, I abandoned it altogether.
After a few other model generations, I settled on tracking the latest 12-month trend of slopes on a rolling basis. The 12-month period is arbitrary. One could do the same with a longer period, though a shorter period is probably not recommended. Even though the length of the period is arbitrary, I felt that maintaining a fixed trend period removes the subjective element of deciding when the latest peak/trough began, a choice that perhaps gives undue weight to the most recent trend. I wanted a long enough period to use a number of data points, but I also wanted it to be short enough to be responsive to recent changes. Conceptually, I do not think it matters all that much what this time period actually is, though I have not run it through sensitivity testing because it would require a lot of work. That may be a future task.
So, the 12-month trend of the slopes will change from period to period. I calculate the first difference from one period to the next, which indicates whether the slope is increasing or decreasing. I then take the second difference to determine whether the rate of change is increasing or decreasing from point to point. That is simple enough, but it provides some interesting insight into the way the slope changes over time.
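Here is a rough Python sketch of that chain of calculations. It rests on one assumption about the description above: that the "12-month trend of slopes" is the slope of a line fitted through the latest 12 rolling slopes. All function names are illustrative.

```python
def ols_slope(values):
    """Least-squares slope of values against 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rolling(values, window):
    """Slope of each complete window of the given length."""
    return [ols_slope(values[i:i + window]) for i in range(len(values) - window + 1)]


def diff(series):
    """Period-to-period change: series[t] - series[t - 1]."""
    return [b - a for a, b in zip(series, series[1:])]


def second_differences(anomalies, window=60, period=12):
    """Return (trend of slopes, first differences, second differences)."""
    slopes = rolling(anomalies, window)   # e.g. 60-month slopes
    trend = rolling(slopes, period)       # assumed: 12-month trend of those slopes
    first = diff(trend)                   # is the trend rising or falling?
    second = diff(first)                  # is that change speeding up or slowing?
    return trend, first, second
```

Each stage shortens the series: the rolling steps drop `window - 1` and `period - 1` points respectively, and each difference drops one more.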
A positive second difference means either that a trend of warming is accelerating or that a trend of cooling is decelerating, and vice-versa. Consecutive positives/negatives indicate a continuing acceleration or deceleration in that direction. I was curious to see how this acceleration/deceleration plays out over time. Remarkably, the negatives and positives are very evenly split in all the different trend line categories, and the overall average length of the run of consecutive positives/negatives is also very similar.
For example, looking at the results of the 60-month trend line, the longest historical consecutive run of negative second differences is 11, and the longest run of positive second differences is also 11. The average run of negatives is 2.83, and the average run of positives is 2.73. The overall distribution is also similar. It is not unusual at all to have a run of one month, and it is in fact the largest category. I won’t provide the details on those stats for every series of trend lines, but every average was between 2.5 and 3, the distributions are all similar, and the split is at or near 50% each way.
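The run statistics quoted above (longest run, average run, how common a run of one is) can be tallied with a few lines of Python. This is a sketch only. It assumes the second differences are already in a list, and it treats an exact zero as breaking a run without counting toward either side, which is a choice made for this example.

```python
from itertools import groupby


def sign_runs(second_diffs):
    """Lengths of consecutive positive runs and consecutive negative runs."""
    positive, negative = [], []
    # Group neighbours that share a sign: +1, -1, or 0.
    for sign, group in groupby(second_diffs, key=lambda v: (v > 0) - (v < 0)):
        length = sum(1 for _ in group)
        if sign > 0:
            positive.append(length)
        elif sign < 0:
            negative.append(length)
    return positive, negative


def summarize(runs):
    """Count, longest run, and average run length for one sign."""
    return {
        "count": len(runs),
        "longest": max(runs, default=0),
        "average": sum(runs) / len(runs) if runs else 0.0,
    }
```

Calling `summarize` on each list gives the kind of figures reported for the 60-month series, and counting how often each length appears in the lists gives the distribution.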
Now, one reason I went through that exercise is to assist in looking at predicted anomalies from a reasonableness standpoint. A set of anomalies based on a predicted run of positive or negative second differences that exceeds the maximum historical run may not be impossible, but it is probably not the most realistic scenario. So, that will be used as a reasonableness check on the values.
The other metric I reviewed was the maximum and minimum value of the second differences for each category. This value can be quite different from one trend line to another, because a longer trend period will not change as rapidly as a shorter one. So each trend period has its own maximum/minimum value. This is also used as a reasonableness test. So, if projected second differences do not exceed the longest historical run of consecutive negatives/positives, and are within the bounds of the maximum and minimum values, they are considered a reasonable projection.
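The two reasonableness tests can be combined into one check, sketched below in Python. One assumption is made here: runs are measured on the history with the projection appended, so a projected run that continues an existing run is counted in full. The function names are illustrative.

```python
def longest_run(values, positive):
    """Longest streak of strictly positive (or strictly negative) values."""
    best = current = 0
    for v in values:
        if (v > 0) if positive else (v < 0):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def is_reasonable(history, projected):
    """True if the projected second differences stay inside historical bounds.

    Test 1: every projected value lies between the historical min and max.
    Test 2: appending the projection does not create a longer positive or
            negative run than the history has ever shown.
    """
    low, high = min(history), max(history)
    if any(v < low or v > high for v in projected):
        return False
    combined = list(history) + list(projected)
    for positive in (True, False):
        if longest_run(combined, positive) > longest_run(history, positive):
            return False
    return True
```

A projection that fails is not necessarily wrong, in keeping with the text above. It is only flagged as outside what the history has shown so far.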
Now, here is the geeky part of the methodology. As actuaries, we study a number of different methods for projecting future values. A paper written by Howard Mahler has become famous in the Casualty Actuarial Society as an analysis of data with changing parameters (i.e., the mean and/or variance changes over time or, as here, the slopes change over time). One reason this paper is famous is that, in an otherwise mundane and boring sea of actuarial papers that we are required to study, this paper used baseball won/loss records to demonstrate the procedure. That alone is worthy of a gold star.
Anyway, the basic premise of the paper is to indicate how credibility weighting past history can help indicate future outcomes in a data set where the parameters are continually changing. I’ll forego explanation of the tests to determine whether or not the parameters are changing, but it is sufficient to say that temperature data passes the test with flying colors. However, unlike the baseball data, where it was evident that only the most recent years are relevant, temperature data has some very high positive and negative correlations in an uneven manner. For example, the second difference 5 months prior to the current month may have little to no correlation, but the 12th previous month’s second difference may be very highly correlated. In fact, the 24th previous month shows a fairly high negative correlation.
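To see which previous months carry information, one can compute the correlation between the second differences and the same series shifted back by each lag. The sketch below uses a plain Pearson correlation as an illustration of the idea. It is not a reproduction of the tests in the Mahler paper, and the function names are made up.

```python
from math import sqrt


def lag_correlation(series, lag):
    """Pearson correlation between series[t] and series[t - lag]."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must leave at least two overlapping points")
    earlier = series[:-lag]
    later = series[lag:]
    n = len(earlier)
    mean_e = sum(earlier) / n
    mean_l = sum(later) / n
    cov = sum((e - mean_e) * (l - mean_l) for e, l in zip(earlier, later))
    var_e = sum((e - mean_e) ** 2 for e in earlier)
    var_l = sum((l - mean_l) ** 2 for l in later)
    if var_e == 0 or var_l == 0:
        # Correlation is undefined for a constant series; 0.0 is a placeholder.
        return 0.0
    return cov / sqrt(var_e * var_l)


def correlogram(series, max_lag=28):
    """Correlation at every lag from 1 to max_lag, keyed by lag."""
    return {lag: lag_correlation(series, lag) for lag in range(1, max_lag + 1)}
```

Scanning the resulting dictionary for the largest positive and negative values is one way to spot lags like the 12th and 24th months mentioned above.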
Basically, the simplest way to explain it is that I assigned percentage weights to previous second differences such that the weighted result is the least-squares fit (minimum sum of squared errors) against the actual second difference. The general pattern is similar across different trend lengths, but it is not universal in either magnitude or the months where the highest positive/negative correlations exist. The first previous month always has a significant positive correlation, and the 12th previous month has the highest positive correlation. The 11th, 13th, and 24th previous months have a high negative correlation. This is fairly consistent. The other months (I currently have weights back to 28 months previous) vary from insignificant correlation to somewhat significant correlation, and which months those are is not always consistent. I actually want to continue to go back further, all the way to 11 years previous, to see if there are any cycles that need to be considered in the weighting. I’ll get there eventually, but it’s a time-consuming process.
I did not limit the weights to either positive or negative numbers, and while I did not restrict any values to the range of -1 to +1, all values did fall in that range. I have not implemented a constraint that all factors add to one. This implies that any remaining weight is weighted against a second difference value of 0. Given the earlier results I provided, such an assumption seems reasonable.
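One way to picture the weighting is as an ordinary least-squares regression of each second difference on the ones before it, with no intercept, no sign restriction, and no sum-to-one constraint. The Python sketch below solves the normal equations directly. It is an illustration under that assumption rather than the spreadsheet procedure, and the function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; try fewer lags or more data")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_lag_weights(series, n_lags=28):
    """Weights minimizing squared error; weights[k - 1] applies to k months back."""
    rows = [[series[t - k] for k in range(1, n_lags + 1)]
            for t in range(n_lags, len(series))]
    targets = series[n_lags:]
    # Normal equations: (X^T X) w = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_lags)]
           for i in range(n_lags)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n_lags)]
    return solve_linear(xtx, xty)


def predict_next(series, weights):
    """Apply the weights to the most recent values to get the next one."""
    return sum(w * series[-k] for k, w in enumerate(weights, start=1))
```

Because the weights are not forced to add to one, a set of small weights pulls the prediction toward zero, which matches the remark above about the remaining weight going to a second difference of 0.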
Once the optimum weights are determined, they can be applied to the most recent 28 second differences to produce the anticipated second difference in the subsequent period. Through recursive means, then, the anomaly can be determined that produces a 60-month slope that produces the 12-month trend of slopes, that produces the first difference in the 12-month trend of slopes, that produces the projected second difference. It is that anomaly value that is the predicted anomaly for the next period.
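Working backward from a target second difference to the anomaly that produces it can be done in several ways. The sketch below uses one shortcut that is an assumption of this example, not necessarily the method used in the spreadsheet: every step in the chain (slope, trend of slopes, differences) is linear in the new anomaly, so two trial values are enough to solve for it exactly. It also assumes the trend of slopes is a fitted slope over the latest 12 rolling slopes. Names are illustrative.

```python
def ols_slope(values):
    """Least-squares slope of values against 0, 1, 2, ..."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def rolling(values, window):
    """Slope of each complete window of the given length."""
    return [ols_slope(values[i:i + window]) for i in range(len(values) - window + 1)]


def latest_second_diff(anomalies, window=60, period=12):
    """Second difference of the trend of slopes at the final month."""
    needed = window + period + 1
    if len(anomalies) < needed:
        raise ValueError(f"need at least {needed} anomalies")
    tail = anomalies[-needed:]
    slopes = rolling(tail, window)     # period + 2 slopes
    trend = rolling(slopes, period)    # exactly 3 trend values
    return trend[2] - 2 * trend[1] + trend[0]


def solve_next_anomaly(anomalies, target_second_diff, window=60, period=12):
    """Anomaly for the next month that yields the target second difference."""
    base = list(anomalies)
    at_zero = latest_second_diff(base + [0.0], window, period)
    at_one = latest_second_diff(base + [1.0], window, period)
    # The relationship is a straight line, so interpolate or extrapolate.
    return (target_second_diff - at_zero) / (at_one - at_zero)
```

A goal-seek or trial-and-error search in a spreadsheet should land on the same value, since only one anomaly satisfies the target for a given history.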
I have further projected the subsequent n anomalies under the assumption that the next predicted value comes to fruition. The one drawback in this is that the least-squares fit was driven by the first predicted second difference, and not the totality of the predictions in future months. Optimizing over all future months would have been a daunting task, and it seems like a reasonable shortcut to suggest that if the data is predicting an April anomaly of X, then X is the best value to use in the weighting that determines the subsequent value, and so on.
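The step-by-step projection can be sketched as a simple loop in which each predicted value is appended to the history and treated as if it had actually occurred. The Python below works directly on second differences. That is equivalent to feeding predicted anomalies back in, because if a predicted anomaly comes true, the realized second difference equals the predicted one. The function name and the toy weights in the usage note are hypothetical.

```python
def project_second_diffs(history, weights, steps):
    """Project several periods ahead, treating each prediction as realized.

    weights[k - 1] applies to the value k periods back, and the weights are
    held fixed at the values fitted for the one-step-ahead prediction.
    """
    if len(history) < len(weights):
        raise ValueError("history must be at least as long as the weights")
    series = list(history)
    projected = []
    for _ in range(steps):
        nxt = sum(w * series[-k] for k, w in enumerate(weights, start=1))
        projected.append(nxt)
        series.append(nxt)  # assume the prediction comes to fruition
    return projected
```

With toy weights of 1 and -1 and a history ending in 1, 2, the loop returns 1, -1, -2, -1. Errors compound the further out the loop runs, since later values rest entirely on earlier predictions.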
I was hoping to delve into the results of the analysis, but time won’t allow for that at the moment. But I felt it was important to outline the methodology since it is a departure from my previous method. I will provide the actual predicted anomalies based on the various trend line data in my next post.