Help with regime shift detection software

Sequential Regime Shift Detection Software

The software is designed to automatically detect statistically significant shifts in the mean level and in the magnitude of fluctuations in time series. It is written in Visual Basic for Applications (VBA) for the Excel 2002 environment. It has not been tested with other versions of Excel, but it will probably work with them too.

Installation
Data Preparation
Shifts in the Mean
Handling Outliers
Red Noise Estimation
Shifts in the Variance
Version History
Help on Help
Feedback
References

Installation

The downloaded file, "Shift detection.xla", is an Excel add-in. It can be placed in any folder of your choice and run from there. For convenience, you can add it to the list of existing add-ins. In Excel, go to Tools --> Add-Ins... to open the list of existing add-ins. Click Browse, navigate to the folder where you saved "Shift detection.xla", select the file and click OK. Once "Shift detection" appears in the list, you can check (uncheck) the associated box to enable (disable) it.

To remove the add-in permanently, uncheck the box and delete the file "Shift detection.xla" from its folder. The next time you go to Tools --> Add-Ins and click Browse, you will receive the message "Cannot find add-in [path]/Shift detection.xla. Delete from list?" Click Yes to finish.

After installation, the Shifts menu should appear at the end of the main worksheet menu bar, as in Fig. 1 below.

Fig. 1. The Shifts menu after the installation.

Data Preparation

Open a new workbook or the one that contains your data. The time series are arranged in columns. The first column must always contain the times of the observations (e.g., years), and the first row must always contain labels (the names of the time series). These labels are used to name the new output worksheets. The time series in the data matrix can start and end in different years, but no missing values are allowed within an individual time series.
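
The layout rules above can be checked before the data reach Excel. The following is a minimal Python sketch, not part of the add-in: it assumes the worksheet has been read into a header list plus rows, with None marking an empty cell, and the function name series_spans is hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical in-memory copy of the worksheet: header[0] labels the time
# column, each row starts with the time value, and None marks an empty cell.
Row = Sequence[Optional[float]]


def series_spans(header: List[str], rows: List[Row]) -> Dict[str, Tuple[float, float]]:
    """Return (first, last) observation time per series; reject internal gaps."""
    spans: Dict[str, Tuple[float, float]] = {}
    for col, label in enumerate(header[1:], start=1):
        present = [i for i, row in enumerate(rows) if row[col] is not None]
        if not present:
            raise ValueError(f"series {label!r} has no data")
        first, last = present[0], present[-1]
        # Blank cells before the start or after the end are allowed
        # (series may start and end in different years); a blank cell
        # between them is a missing value and is rejected.
        if len(present) != last - first + 1:
            raise ValueError(f"series {label!r} has missing values")
        spans[label] = (rows[first][0], rows[last][0])
    return spans
```

For example, with header ["Year", "A", "B"] and rows [[1900, 1.2, None], [1901, 0.8, 0.3], [1902, None, 0.5]], series A spans 1900-1901 and series B spans 1901-1902; a blank cell in 1901 between two values of the same series would raise an error.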

If you want to experiment with normally distributed random numbers, click Shifts --> Random Data to populate the spreadsheet. The maximum number of time series is 250.

Shifts in the Mean

Click Shifts --> Shift Detection to open the entry form shown in Fig. 2. The entire data range is selected automatically. To select your own data range, click the button with an underscore.

Fig. 2. Entry form for calculating regime shifts.

There are two parameters that control the magnitude and the scale of the regimes to be detected. The first, the significance level, is the level at which the null hypothesis that the mean values of two regimes are equal is rejected by the two-tailed Student's t-test. The lower the significance level, the larger a shift must be in order to be detected. Note that if a regime shift is detected, the difference between the mean values of the old and new regimes is statistically significant at least at the given level. The program also calculates the actual significance level of the difference between the two regimes.
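
As an illustration of the test described above, the sketch below compares the mean values of two regimes with a two-tailed Student's t-test and returns the actual significance level (p-value). It is a minimal Python sketch, not the add-in's VBA code: it assumes the classical pooled-variance form of the test, the function names are hypothetical, and the p-value is obtained by numerically integrating the t density so that only the standard library is needed. The program itself may estimate the variance differently.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def t_two_tailed_p(t: float, df: float, n: int = 2000) -> float:
    """Two-tailed p-value of Student's t, by Simpson integration of the pdf on [0, |t|]."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / n  # n is even, as Simpson's rule requires
    s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def regime_t_test(old: Sequence[float], new: Sequence[float]) -> Tuple[float, int, float]:
    """Pooled-variance two-sample t-test of equal regime means: returns (t, df, p)."""
    n1, n2 = len(old), len(new)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(old) + (n2 - 1) * variance(new)) / df
    t = (mean(new) - mean(old)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df, t_two_tailed_p(t, df)
```

For example, regime_t_test([0.1, -0.3, 0.2, 0.0, -0.1], [1.2, 0.9, 1.4, 1.1, 0.8]) gives t of about 8.0 with 8 degrees of freedom and a p-value far below 0.01, so this shift would be detected at the 0.01 level.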

The second parameter, the cut-off length, is similar to the 100% cut-off point in filtering. Regimes longer than the cut-off length will all be detected. For regimes shorter than the cut-off length, the probability of detecting them decreases in proportion to their length. Some of them, however, may still be detected if the shift is large enough. Generally speaking, the shorter the cut-off length, the shorter the regimes that will be detected (and vice versa), but this is not always true, because the cut-off length also affects the critical magnitude of the shift between regimes. For example, suppose a difference of two units between the mean values of two regimes is statistically significant at the 0.01 level when the cut-off length is 10 years. If the cut-off length is reduced to 5 years, the critical magnitude of the shift increases (for the same significance level), and the regimes may no longer be detected. It is recommended to experiment first with different significance levels and cut-off lengths to better understand their combined effect on regime detection.
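
To see why shortening the cut-off length can raise the critical magnitude of a shift, assume (consistent with the 2l - 2 degrees of freedom quoted in the Red Noise Estimation section) that the smallest detectable difference between regime means is diff = t * sqrt(2 * sigma^2 / l), where t is the two-tailed Student's t value at the chosen significance level with 2l - 2 degrees of freedom, l is the cut-off length, and sigma is the standard deviation averaged over sections of cut-off length. The formula and the function names below are assumptions made for illustration; under them, a shorter l both increases t (fewer degrees of freedom) and increases sqrt(2 * sigma^2 / l).

```python
import math


def t_two_tailed_p(t: float, df: float, n: int = 2000) -> float:
    """Two-tailed p-value of Student's t (Simpson integration of the pdf)."""
    t = abs(t)
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / n
    s = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    return max(0.0, 1.0 - 2.0 * s * h / 3.0)


def t_critical(p: float, df: float) -> float:
    """Critical |t| for a two-tailed test at level p, found by bisection."""
    lo, hi = 0.0, 50.0  # 50 is above the critical t for df >= 2 and p >= 0.001
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_tailed_p(mid, df) > p:
            lo = mid  # not yet significant: the critical value is larger
        else:
            hi = mid
    return (lo + hi) / 2


def critical_shift(p: float, cutoff: int, sigma: float) -> float:
    """Smallest detectable difference of regime means (assumed form, df = 2l - 2)."""
    df = 2 * cutoff - 2
    return t_critical(p, df) * math.sqrt(2 * sigma ** 2 / cutoff)
```

With sigma = 1 and p = 0.01, critical_shift(0.01, 10, 1.0) is about 1.29 while critical_shift(0.01, 5, 1.0) is about 2.12, so under these assumptions a two-unit shift passes at a 10-year cut-off length but not at a 5-year one, matching the behavior described above.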

The program also requires Huber's weight parameter, which controls the weights assigned to outliers (see Handling Outliers below for more information). This parameter therefore also affects the mean values of the regimes.

For each time series, the program calculates the regime shift index (RSI), the mean values of the regimes with equal and unequal weights, the regime lengths, the final confidence levels for the shifts, and the weights of the outliers. This information is placed in a separate worksheet for each variable, along with the corresponding graphs. The program also calculates the combined RSI ("Summary" worksheet) and the residuals after the stepwise regime function is removed ("Residuals" worksheet). You can apply the method again to the Residuals worksheet if you wish, but if the output is placed in the same workbook, the worksheet must be renamed first. The residuals can also be used to check for regime shifts in the variance (there is no need to rename the worksheet in this case).
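
The content of the "Residuals" worksheet can be pictured as follows: the stepwise regime function holds each regime's mean for every year of that regime, and the residuals are the data minus that function. The Python sketch below is a hypothetical illustration that uses plain (equal-weight) regime means, whereas the add-in can also use Huber-weighted means; representing the shifts as the indices of the first value of each new regime is an assumption.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def regime_residuals(x: Sequence[float], shifts: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (stepwise regime function, residuals) for the given shift indices."""
    bounds = [0, *sorted(shifts), len(x)]
    if any(a >= b for a, b in zip(bounds, bounds[1:])):
        raise ValueError("shift indices must be distinct and lie strictly inside the series")
    step: List[float] = []
    for start, end in zip(bounds, bounds[1:]):
        level = mean(x[start:end])  # equal-weight mean of this regime
        step.extend([level] * (end - start))
    residuals = [value - s for value, s in zip(x, step)]
    return step, residuals
```

For x = [1, 2, 3, 10, 11, 12] with a single shift at index 3, the stepwise function is [2, 2, 2, 11, 11, 11] and the residuals are [-1, 0, 1, -1, 0, 1], which no longer contain the shift in the mean.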

Handling Outliers

Because of outliers, the simple average may not be representative of the mean value of a regime, and this can significantly affect the results of regime shift detection. Ideally, each data value should receive a weight that is small if the value is considered an outlier. To handle outliers, the program uses Huber's weight function (Huber, 2005), which is calculated here as

weight = min(1, parameter / |anomaly|),

where anomaly is the deviation from the expected mean value of the new regime, normalized by the standard deviation averaged over all consecutive sections of cut-off length in the series. If the absolute value of an anomaly is less than or equal to the parameter, its weight equals one. Otherwise, the weight is inversely proportional to the distance from the expected mean value of the new regime.
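
The weight formula translates directly into code. In this minimal Python sketch, sigma stands for the averaged standard deviation described above and is passed in rather than computed; the function name huber_weights is hypothetical.

```python
from typing import List, Sequence


def huber_weights(x: Sequence[float], expected_mean: float, sigma: float,
                  parameter: float) -> List[float]:
    """weight = min(1, parameter / |anomaly|), with anomaly = (x - mean) / sigma."""
    weights = []
    for value in x:
        anomaly = (value - expected_mean) / sigma
        # Anomalies within +/- parameter keep full weight; this branch also
        # avoids dividing by zero when the value equals the expected mean.
        if abs(anomaly) <= parameter:
            weights.append(1.0)
        else:
            weights.append(parameter / abs(anomaly))
    return weights
```

With expected mean 0, sigma = 1 and parameter = 2, the values [0.5, -1.0, 4.0] receive weights [1.0, 1.0, 0.5]: only the value four standard deviations away is down-weighted, by the factor 2/4.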

After the timing of the regime shifts is determined, the mean values of the regimes are calculated using the following iterative procedure. First, a simple unweighted arithmetic mean is calculated as the initial estimate of the regime mean. Then a weighted mean is calculated, with the weights determined by the distance from that first estimate. The procedure is repeated once more with the new estimate of the regime mean.
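
The iterative procedure above can be sketched in Python as follows: one unweighted mean followed by two Huber-weighted passes. This is an illustration under stated assumptions, not the add-in's code: sigma (the normalizing standard deviation) is supplied by the caller, and the function names are hypothetical.

```python
from typing import Sequence


def huber_weight(anomaly: float, parameter: float) -> float:
    """min(1, parameter / |anomaly|), written to avoid division by zero."""
    return 1.0 if abs(anomaly) <= parameter else parameter / abs(anomaly)


def robust_regime_mean(x: Sequence[float], sigma: float, parameter: float) -> float:
    """Unweighted mean, then a weighted mean, repeated once more."""
    estimate = sum(x) / len(x)  # initial, unweighted estimate
    for _ in range(2):  # first weighted pass plus its single repetition
        w = [huber_weight((v - estimate) / sigma, parameter) for v in x]
        estimate = sum(wi * v for wi, v in zip(w, x)) / sum(w)
    return estimate
```

For the values [0, 0, 0, 0, 10] with sigma = 1 and parameter = 1, the estimate moves from the plain mean of 2.0 to about 0.26, while a very large parameter (e.g. 100) gives every value full weight and leaves the estimate at 2.0.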

Figure 3 below illustrates the effect of outliers on the timing of regime shifts in the mean winter (DJF) temperature in central England for the period 1900-1933. The top graph shows that if Huber's weight parameter is set to 6 (i.e., all temperature values that deviate from the expected regime mean by less than six standard deviations receive equal weights), a regime shift is detected in 1920. The temperature value for 1917, however, appears to be an outlier. Reducing Huber's weight parameter to 1 (bottom graph) changes the regime shift year to 1911.

Fig. 3. The results of the regime shift detection for the winter (DJF) surface air temperature in central England using two different Huber's weight parameters: 6 (top panel) and 1 (bottom panel). Note changes in the onset and termination of the second regime.

Red Noise Estimation

This is a new set of procedures, implemented in version 3, for handling time series in which red noise (or serial correlation) is present. Red noise is modeled by the first-order autoregressive model (AR1). If serial correlation is present in a time series, it is necessary either to recalculate the significance level of the shifts taking it into account, or to apply a so-called "prewhitening" procedure before using a regime shift detection method. Either way, an estimate of the AR1 coefficient is required, and this can be difficult to obtain for a time series that contains both red noise and regime shifts. Two methods of estimating AR1 have been implemented here. The first method, MPK, is based on the formula for the bias in the ordinary least squares (OLS) estimate of AR1 suggested by Marriott and Pope (1954) and Kendall (1954). The second method, IP4 (Inverse Proportionality with 4 corrections), is based on the assumption that the bias is approximately inversely proportional to the sample size. Both methods are described by Rodionov (2006) in a paper recently submitted to Geophysical Research Letters.
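
The MPK correction can be written down explicitly. The Marriott-Pope and Kendall result is usually quoted as E[r] = rho - (1 + 3*rho)/n (approximately) for a sample of length n whose mean is estimated, which inverts to rho_hat = (n*r + 1)/(n - 3). The Python sketch below applies that inversion to the lag-1 sample autocorrelation, used here as the OLS-type estimate; the function names are hypothetical, the program's exact OLS formula may differ slightly, and the IP4 method is not shown.

```python
from typing import Sequence


def ar1_ols(x: Sequence[float]) -> float:
    """Lag-1 autocorrelation about the sample mean (an OLS-type AR1 estimate)."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    num = sum(a * b for a, b in zip(d[1:], d[:-1]))
    den = sum(v * v for v in d)
    if den == 0:
        raise ValueError("constant series: AR1 is undefined")
    return num / den


def ar1_mpk(x: Sequence[float]) -> float:
    """Bias-corrected AR1 from inverting E[r] ~ rho - (1 + 3*rho)/n."""
    n = len(x)
    if n <= 3:
        raise ValueError("the MPK correction needs more than 3 values")
    return (n * ar1_ols(x) + 1) / (n - 3)
```

The correction adds (1 + 3r)/(n - 3) to r, so for r > -1/3 the corrected value is larger than the raw estimate, compensating for the negative small-sample bias; the effect grows quickly as n approaches 3.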

Figure 4 schematically shows the options available in this section. If "None" is chosen, no AR1 estimation is performed, and the results are the same as in the previous version of the program. All other options require an AR1 estimate. Note that the OLS estimate is calculated using the entire time series. The MPK and IP4 methods break the time series into subsamples, estimate the bias-corrected AR1 for each subsample, and then use the median of all the estimates. The suggested subsample size m is calculated as m = (l + 1)/3, where l is the cut-off length. It is recommended to experiment with different subsample sizes to see how they affect the AR1 estimate.
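
A minimal Python sketch of the subsampling step, using the MPK estimate. Two details are assumptions because the text does not specify them: the subsamples are taken as consecutive, non-overlapping blocks, and m = (l + 1)/3 is rounded to the nearest integer. The function names are hypothetical. Note that the MPK denominator n - 3 requires m > 3, so very short cut-off lengths need a manually chosen m.

```python
from statistics import median
from typing import List, Optional, Sequence


def ar1_mpk(x: Sequence[float]) -> float:
    """MPK bias-corrected lag-1 AR1 estimate, (n*r + 1)/(n - 3)."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    r = sum(a * b for a, b in zip(d[1:], d[:-1])) / sum(v * v for v in d)
    return (n * r + 1) / (n - 3)


def subsample_ar1(x: Sequence[float], cutoff: int, m: Optional[int] = None) -> float:
    """Median of MPK estimates over consecutive, non-overlapping subsamples."""
    if m is None:
        m = round((cutoff + 1) / 3)  # suggested m = (l + 1)/3; rounding is assumed
    if m <= 3:
        raise ValueError("subsample size must exceed 3 for the MPK formula")
    estimates: List[float] = []
    for start in range(0, len(x) - m + 1, m):
        chunk = x[start:start + m]
        if len(set(chunk)) > 1:  # skip constant chunks, where AR1 is undefined
            estimates.append(ar1_mpk(chunk))
    if not estimates:
        raise ValueError("no usable subsamples")
    return median(estimates)
```

Taking the median rather than the mean keeps a few subsamples that straddle a regime shift (and therefore show spuriously high autocorrelation) from dominating the estimate.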

If the Prewhitening box is unchecked, the AR1 estimate is used to calculate the adjusted degrees of freedom (DF_adj) for the RSI: DF_adj = 2*l_eq - 2, where l_eq is the equivalent cut-off length, calculated using the formula for the equivalent sample size in Von Storch and Zwiers (1999, p. 115). The same formula is also used to calculate the final significance level of the shifts, adjusted for serial correlation.
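
A common form of the equivalent-sample-size formula is n_eq = n / [1 + 2 * sum_{k=1}^{n-1} (1 - k/n) * rho(k)], with rho(k) = rho^k for an AR1 process. The Python sketch below assumes this form and applies it with n = l to obtain l_eq and DF_adj; the function names are hypothetical, and whether the program rounds l_eq is not stated here.

```python
def equivalent_length(cutoff: int, rho: float) -> float:
    """l_eq = l / (1 + 2 * sum_{k=1}^{l-1} (1 - k/l) * rho**k) for AR1 noise."""
    s = sum((1 - k / cutoff) * rho ** k for k in range(1, cutoff))
    return cutoff / (1 + 2 * s)


def adjusted_df(cutoff: int, rho: float) -> float:
    """DF_adj = 2 * l_eq - 2 (generally not an integer)."""
    return 2 * equivalent_length(cutoff, rho) - 2
```

For l = 10, rho = 0 leaves l_eq = 10 (DF_adj = 18), while rho = 0.5 reduces l_eq to roughly 3.85, so far fewer effective degrees of freedom remain for the t-test.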

If the Prewhitening box is checked, regime shifts are detected in the filtered time series, and no adjustment for serial correlation is made when calculating the degrees of freedom. The user can choose either to output the filtered time series (Filtered box checked) or to return to the original time series (Filtered box unchecked). In the latter case, the final significance level is adjusted using the equivalent sample size formula.
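
Prewhitening with an AR1 coefficient is commonly done with the filter y[t] = x[t] - rho * x[t-1]. The minimal Python sketch below assumes this standard filter; the add-in's handling of the first value is not documented here, so the sketch simply drops it, and the function name is hypothetical.

```python
from typing import List, Sequence


def prewhiten(x: Sequence[float], rho: float) -> List[float]:
    """AR1 prewhitening filter y[t] = x[t] - rho * x[t-1]; the first value is dropped."""
    return [curr - rho * prev for prev, curr in zip(x[:-1], x[1:])]
```

For x = [0, 0, 0, 1, 1, 1] and rho = 0.5 the result is [0.0, 0.0, 1.0, 0.5, 0.5]: a step of size D appears in the filtered series as a one-value spike of size D at the shift, followed by a step of only (1 - rho) * D.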

Fig. 4. Schematic for the red noise estimation options in the program.

Shifts in the Variance

It is assumed that all regime shifts in the mean have been removed, that is, the values of the time series are deviations from a zero mean. The same two parameters, the significance level and the cut-off length, control the magnitude of the shifts and the length of the regimes to be detected. Huber's weight function is not used for the variance, and the red noise options are disabled.

Version History

12/03/2004: Ver. 1 released.

06/07/2005: Ver. 2 released. Huber's weight function added to handle outliers.

01/30/2006: Ver. 3 released. Red noise estimation added.

Help on Help

Clicking the help button opens this file, which is located on the Bering Climate website.

Feedback

Send your comments/suggestions/bug reports to Sergei Rodionov.

References

Huber, P. J. (2005), Robust estimation of a location parameter, Annals of Mathematical Statistics, 35, 73-101.

Kendall, M. G. (1954), Note on bias in the estimation of autocorrelation, Biometrika, 41, 403-404.

Marriott, F. H. C., and J. A. Pope (1954), Bias in the estimation of autocorrelations, Biometrika, 41, 390-402.

Rodionov, S. (2006), The problem of red noise in climate regime shift detection, Geophys. Res. Lett. (submitted).

Von Storch, H., and F. W. Zwiers (1999), Statistical analysis in climate research, Cambridge University Press.