Gas Price regression

This is based on data file GasolineMarket.mpj. This is certainly a time series. We can see very strong patterns in the correlation matrix. Minitab prints it in this form:

Correlations: Expenditure, Population, GasPrice, Income, P_NewCars,...

               Expenditure  Population  GasPrice  Income
Population     .9
GasPrice       .97          .927
Income         .96          .993        .934
P_NewCars      .942         .9          .936      .96
P_UsedCars     .936         .946        .923      .93
P_PublicTrans  .966         .96         .927      .964
P_Durables     .92          .94         .939      .949
P_Nondurables  .979         .97         .963      .97
P_Services     .977         .96         .939      .97

               P_NewCars  P_UsedCars  P_PublicTrans  P_Durables
P_UsedCars     .994
P_PublicTrans  .9         .92
P_Durables     .993       .9          .9
P_Nondurables  .99        .92         .99            .977
P_Services     .97        .977        .99            .96

               P_Nondurables
P_Services     .994

Cell Contents: Pearson correlation

It looks nicer if we re-organize the layout:

               Expend-           Gas               P_New    P_Used   P_Public P_        P_Non-
               iture    Popn     Price    Income   Cars     Cars     Trans    Durables  durables
Population     .9
GasPrice       .97      .927
Income         .96      .993     .934
P_NewCars      .942     .9       .936     .96
P_UsedCars     .936     .946     .923     .93      .994
P_PublicTrans  .966     .96      .927     .964     .9       .92
P_Durables     .92      .94      .939     .949     .993     .9       .9
P_Nondurables  .979     .97      .963     .97      .99      .92      .99      .977
P_Services     .977     .96      .939     .97      .97      .977     .99      .96       .994

The problem is that everything is moving forward in time together. So what explains GasPrice?
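The remark that everything is moving forward in time together can be illustrated with a minimal sketch on made-up data (not the GasolineMarket series). Three simulated series that have nothing in common except an upward trend come out strongly correlated in levels, while their period-to-period changes do not. The series, slopes, noise levels, and n = 52 are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 52                      # hypothetical number of annual observations
t = np.arange(n)

# Three made-up series that share nothing except an upward trend.
a = 10.0 + 2.0 * t + rng.normal(0.0, 3.0, n)
b = 50.0 + 0.5 * t + rng.normal(0.0, 1.0, n)
c = 5.0 + 1.2 * t + rng.normal(0.0, 2.0, n)

# Pearson correlations of the levels, then of the first differences.
levels = np.corrcoef([a, b, c])
changes = np.corrcoef([np.diff(a), np.diff(b), np.diff(c)])

print("Correlations of levels:")
print(np.round(levels, 3))
print("Correlations of changes:")
print(np.round(changes, 3))
```

High correlations among trending series therefore say little by themselves about which variable explains which.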
Let's try the run on just P_NewCars, P_UsedCars, Population.

Regression Analysis: GasPrice versus P_NewCars, P_UsedCars, Population

The regression equation is
GasPrice = -.2 +.4 P_NewCars -.42 P_UsedCars +.326 Population

Predictor  Coef  SE Coef  T  P  VIF
Constant  -.9 2. -3.4.
P_NewCars  .364.36 2.7.6 7.64
P_UsedCars  -.423.24 -.66.4 2.269
Population  .3263.23 2.7.9.23

S =.223   R-Sq = 9.7%   R-Sq(adj) = 9.%

Analysis of Variance
Source  DF  SS  MS  F  P
Regression  3 4346 447 3.9.
Error  4 6 4
Total  4467

Source  DF  Seq SS
P_NewCars  42466
P_UsedCars  226
Population  76

Unusual Observations
Obs  P_NewCars  GasPrice  Fit  SE Fit  St Resid
29 94 4.2 9.9 2.76 24.43 2.4R
3 97 79.77 9.3.62 2.64 2.R
32 3 76. 6. 3.9 9.9 2.R
46 4 7.7 92.3 2.49-2.44-2.6R
2 34 23.9 9.36 4.24 2.4 2.7R
R denotes an observation with a large standardized residual.

Durbin-Watson statistic =.4399

With no critical thought, this looks great! But... here are facts about the residuals:
[Figure: Plots for GasPrice (residuals). Panels: Normal Probability Plot; Versus Fits (horizontal axis: Fitted Value); Histogram; Versus Order (horizontal axis: Observation Order).]

The plot in sequence order is a clear indication that the residuals have some type of time-based dependence. Moreover, the Durbin-Watson statistic is very small.

As a side note, we'll record the residuals and then get the autocorrelation function plot. Here's what that plot looks like:

[Figure: Autocorrelation Function for RESI (with 5% significance limits for the autocorrelations). Vertical axis: Autocorrelation; horizontal axis: Lag.]
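As a rough sketch of how the two residual diagnostics used here are usually computed: the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by their sum of squares, and the lag-k autocorrelation is the lag-k cross-product of the centered residuals divided by the same sum of squares. The function names below are hypothetical, and the residuals are simulated from an AR(1) process with an assumed coefficient of 0.75; they are not the residuals from the regression above.

```python
import numpy as np

def durbin_watson(resid):
    """Sum of squared successive differences over the sum of squares."""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

def autocorrelations(resid, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    denom = np.sum(e ** 2)
    return np.array([np.sum(e[k:] * e[:-k]) / denom
                     for k in range(1, max_lag + 1)])

# Simulated stand-in for regression residuals: AR(1) with an assumed rho.
rng = np.random.default_rng(1)
n, rho = 52, 0.75
e = np.zeros(n)
for i in range(1, n):
    e[i] = rho * e[i - 1] + rng.normal()
e -= e.mean()   # residuals from a fit with an intercept average to zero

dw = durbin_watson(e)
r = autocorrelations(e, 10)
print(f"Durbin-Watson = {dw:.3f}, lag-1 autocorrelation = {r[0]:.3f}")
```

Values of the Durbin-Watson statistic near 2 are consistent with no first-order autocorrelation; values well below 2 go with a large positive lag-1 autocorrelation, since the statistic is roughly 2(1 - r1).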
And here are the autocorrelations:

Lag  ACF      T      LBQ
 1   .727     .7     27.2
 2   .399     2.     3.7
 3   .2994.39        4.67
 4   .6749.74        42.2
 5   -.79     -.32   42.49
 6   -.79     -.4    44.6
 7   -.7926   -.     46.6
 8   -.96     -.7    49.6
 9   -.24262  -.6    2.9
10   -.2694   -.2    7.49

This is a common situation. We note that the first autocorrelation is large (.7) and statistically significant.

Correction Attempt 1: Use time itself as a predictor.

Regression Analysis: GasPrice versus P_NewCars, P_UsedCars,...

The regression equation is
GasPrice = - 26 +.937 P_NewCars -.3 P_UsedCars -.7 Population +. Year

Predictor  Coef  SE Coef  T  P  VIF
Constant  -26 3434 -.63.3
P_NewCars  .936.39 2.3.23.7
P_UsedCars  -.33.264 -.44. 7.74
Population  -.699.663 -..97 3.27
Year  .2.4.6.47 364.94

S =.2   R-Sq = 9.%   R-Sq(adj) =.9%

Analysis of Variance
Source  DF  SS  MS  F  P
Regression  4 43 7 2.9.
Error  47 4967 6
Total  4467

Source  DF  Seq SS
P_NewCars  42466
P_UsedCars  226
Population  76
Year  39

Unusual Observations
Obs  P_NewCars  GasPrice  Fit  SE Fit  St Resid
29 94 4.2 9.6 2. 24.6 2.44R
32 3 76. 7.67 4.69.33 2.R
2 34 23.9 96.9 4.9 27.2 2.99R
R denotes an observation with a large standardized residual.

Durbin-Watson statistic =.466

This has failed. The Durbin-Watson statistic is very small. Plots involving the residuals are bad also, but they are not shown here.
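One way to see why adding time as a predictor need not help is a small simulation, sketched here under stated assumptions: the errors follow an AR(1) process with an assumed coefficient of 0.8, and the predictor `x`, the coefficients, and the helper names are all made up. A trend term can absorb a common drift, but it does not remove serial dependence in the errors, so the Durbin-Watson statistic stays small with or without `year`.

```python
import numpy as np

def ols_residuals(X, y):
    """Least-squares residuals; X must already contain a constant column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def durbin_watson(e):
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(2)
n, rho = 52, 0.8                      # assumed sample size and AR(1) coefficient
year = np.arange(n, dtype=float)
x = 100.0 + 1.5 * year + rng.normal(0.0, 4.0, n)   # made-up trending predictor

# AR(1) errors: each error carries over a fraction rho of the previous one.
u = np.zeros(n)
for i in range(1, n):
    u[i] = rho * u[i - 1] + rng.normal(0.0, 5.0)
y = 20.0 + 0.6 * x + u

ones = np.ones(n)
dw_without = durbin_watson(ols_residuals(np.column_stack([ones, x]), y))
dw_with = durbin_watson(ols_residuals(np.column_stack([ones, x, year]), y))
print(f"DW without year: {dw_without:.3f}   DW with year: {dw_with:.3f}")
```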
Correction Attempt 2: Use the differenced data.

The dependent variable and all the independent variables should be differenced. In Minitab, use Stat > Time Series > Differences. This will reduce the sample size by 1. The plots look much better.

[Figure: Plots for GasPriceDiff (residuals). Panels: Normal Probability Plot; Versus Fits (horizontal axis: Fitted Value); Histogram; Versus Order (horizontal axis: Observation Order).]

The Durbin-Watson statistic is.46699, which is at the low end of borderline values.
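The differencing step can be sketched as follows, on simulated data rather than the GasolineMarket file. The assumption built into this example is that the error in the levels regression is a random walk, which is the case where differencing works cleanly; the variable names and coefficients are hypothetical.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def durbin_watson(e):
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(3)
n = 52
x = np.cumsum(rng.normal(1.0, 1.0, n))   # made-up trending predictor
u = np.cumsum(rng.normal(0.0, 1.0, n))   # random-walk error (assumption)
y = 5.0 + 2.0 * x + u

# Regression in levels: residuals inherit the random-walk dependence.
_, e_levels = ols(np.column_stack([np.ones(n), x]), y)
dw_levels = durbin_watson(e_levels)

# Difference the response and every predictor; one observation is lost.
dy, dx = np.diff(y), np.diff(x)
beta_diff, e_diff = ols(np.column_stack([np.ones(n - 1), dx]), dy)
dw_diff = durbin_watson(e_diff)

print(f"n = {n}, differenced n = {len(dy)}")
print(f"DW in levels = {dw_levels:.3f}, DW after differencing = {dw_diff:.3f}")
```

With real data the errors are rarely an exact random walk, so the improvement may be only partial, as it is here with GasPriceDiff.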
Correction Attempt 3: Use the lagged version of the dependent variable.

In Minitab, use Stat > Time Series > Lag. Again, this will drop the sample size by 1.

Regression Analysis: GasPrice versus P_NewCars, P_UsedCars,...

The regression equation is
GasPrice = - 49.3 +.497 P_NewCars -.43 P_UsedCars +.27 Population +.9 GasPriceLag

cases used, cases contain missing values

Predictor  Coef  SE Coef  T  P  VIF
Constant  -49.2 4. -3.2.
P_NewCars  .4966.224 2.2.32 92.46
P_UsedCars  -.4297.3-2.79. 2.2
Population  .2673.7 2.6..292
GasPriceLag  .997.964 9.26..66

S = 6.237   R-Sq = 96.3%   R-Sq(adj) = 96.%

Analysis of Variance
Source  DF  SS  MS  F  P
Regression  4 43 37 32.96.
Error  46 72 3
Total  4724

Source  DF  Seq SS
P_NewCars  422
P_UsedCars  2
Population  2
GasPriceLag  329

Unusual Observations
Obs  P_NewCars  GasPrice  Fit  SE Fit  St Resid
2 7.9 63.34 2.723 2.64 2.22R
34 6.7 76.3.72-6.663-2.R
46 4 7.74 6.463.62-4.9-2.47R
4 4.. 2.3.92 3.29R
R denotes an observation with a large standardized residual.

Durbin-Watson statistic =.6793

This is not perfect either, but the Durbin-Watson statistic has crossed, just barely, into the zone at which we can accept ρ = 0.
Here are the relevant plots:

[Figure: Plots for GasPrice (residuals). Panels: Normal Probability Plot; Versus Fits (horizontal axis: Fitted Value); Histogram; Versus Order (horizontal axis: Observation Order).]

This is tough to live with, but we could do it. Note that the coefficient on GasPriceLag is .9, close to 1. The fitted equation was

GasPrice = - 49.3 +.497 P_NewCars -.43 P_UsedCars +.27 Population +.9 GasPriceLag

This can be rearranged as

GasPrice - .9 GasPriceLag = - 49.3 +.497 P_NewCars -.43 P_UsedCars +.27 Population

The left side is almost the same as GasPriceDiff.

An objection to using the lagged variable on the right side of the equation is that we are mixing up dependent and independent variables.
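The lagged-variable setup in Correction Attempt 3 can be sketched like this, again on simulated data with hypothetical names. The response is generated with an assumed lag coefficient of 0.7; the lag column is the response shifted by one period, and the first observation is dropped because it has no lagged value.

```python
import numpy as np

rng = np.random.default_rng(4)
n, phi = 52, 0.7                 # assumed sample size and true lag coefficient
x = rng.normal(10.0, 2.0, n)     # made-up predictor

# Generate a response that depends on its own previous value.
y = np.zeros(n)
y[0] = 20.0
for t in range(1, n):
    y[t] = 3.0 + 0.5 * x[t] + phi * y[t - 1] + rng.normal()

# Lag the response by one period; row t of X pairs y[t] with y[t-1].
y_lag = y[:-1]
X = np.column_stack([np.ones(n - 1), x[1:], y_lag])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
phi_hat = beta[2]

print(f"cases used: {X.shape[0]} of {n}")
print(f"estimated coefficient on the lagged response: {phi_hat:.3f}")
```

When the estimated coefficient on the lag is close to 1, moving that term to the left side gives nearly the differenced response, which is the connection to GasPriceDiff noted above.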
Correction Attempt 4: Use the Cochrane-Orcutt method.

There are several variations on this method. The essence of the concept is estimating the autocorrelation coefficient $\rho$ and then computing $Y_i^* = Y_i - \hat\rho\, Y_{i-1}$ and doing the same thing for each independent variable.

There are several ways to get $\hat\rho$. Start with the initial regression of GasPrice on P_NewCars, P_UsedCars, and Population. In the printout from the autocorrelation, we found that the first autocorrelation was computed as .727, and we can call this $\hat\rho$. Some like to use $\hat\rho = 1 - \mathrm{DW}/2$, where DW is the Durbin-Watson statistic; here this is .7732. These are not all that far apart. Let's use $\hat\rho$ = .7.

In Minitab, we will need to create the lagged variable, and then use Calc > Calculator to perform (original) - .7 (lagged). Here are the results:

Regression Analysis: GasPriceAdj versus NewCarAdj, UsedCarAdj, PopnAdj

The regression equation is
GasPriceAdj = - 66.3 +.42 NewCarAdj -.76 UsedCarAdj +.2 PopnAdj

cases used, cases contain missing values

Predictor  Coef  SE Coef  T  P  VIF
Constant  -66.3 3.6-4.6.
NewCarAdj  .42.32 2.63..22
UsedCarAdj  -.76.43 -.9.24 6.977
PopnAdj  .79.23.6. 6.36

S = 6.346   R-Sq = 7.7%   R-Sq(adj) = 69.9%

Analysis of Variance
Source  DF  SS  MS  F  P
Regression  3 49.6 66. 39.7.
Error  47 97.9 4.4
Total  677.

Source  DF  Seq SS
NewCarAdj  934.6
UsedCarAdj  22.2
PopnAdj  32.

Unusual Observations
Obs  NewCarAdj  GasPriceAdj  Fit  SE Fit  St Resid
2 46.3 37.42 23.4 2.67 3.7 2.36R
46 34.9 4.69 29.24.3 -.79-2.46R
4 33.2 4.2 29.4.9.797 2.9R
2 33.9.293 3.9 3.3 4.33 2.9RX
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large leverage.

Durbin-Watson statistic =.2766
This has actually made the Durbin-Watson statistic a little worse (lower). Here are the plots:

[Figure: Plots for GasPriceAdj (residuals). Panels: Normal Probability Plot; Versus Fits (horizontal axis: Fitted Value); Histogram; Versus Order (horizontal axis: Observation Order).]

This particular data set may have incurable issues. The series are all very smooth for the first (about) twenty years and then become irregular. The statistical word for this problem is non-stationarity.
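For reference, the one-step Cochrane-Orcutt calculation from Correction Attempt 4 can be sketched as below. This is a minimal illustration on simulated data whose errors really are AR(1) with an assumed coefficient of 0.75, which is the situation the method is designed for; the variable and function names are hypothetical, and it is not a reproduction of the Minitab run.

```python
import numpy as np

def ols(X, y):
    """Return least-squares coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

def durbin_watson(e):
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(5)
n, rho = 52, 0.75                       # assumed sample size and true rho
x = 50.0 + np.arange(n) + rng.normal(0.0, 5.0, n)
u = np.zeros(n)
for i in range(1, n):
    u[i] = rho * u[i - 1] + rng.normal(0.0, 3.0)
y = 10.0 + 0.8 * x + u

# Step 1: ordinary regression and its residuals.
_, e = ols(np.column_stack([np.ones(n), x]), y)
dw_before = durbin_watson(e)

# Step 2: two common estimates of rho from those residuals.
rho_acf = float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))  # lag-1 autocorrelation
rho_dw = 1.0 - dw_before / 2.0                            # 1 - DW/2
rho_hat = rho_acf

# Step 3: (original) - rho_hat * (lagged), for the response and each predictor.
y_adj = y[1:] - rho_hat * y[:-1]
x_adj = x[1:] - rho_hat * x[:-1]

# Step 4: regress the adjusted response on the adjusted predictor.
# The intercept here estimates (1 - rho_hat) times the original intercept.
beta_adj, e_adj = ols(np.column_stack([np.ones(n - 1), x_adj]), y_adj)
dw_after = durbin_watson(e_adj)

print(f"rho from ACF = {rho_acf:.3f}, rho from DW = {rho_dw:.3f}")
print(f"DW before = {dw_before:.3f}, DW after = {dw_after:.3f}")
```

On data like the gasoline series, where the behavior changes partway through the sample, a single value of rho may not describe the whole series, and the adjusted regression need not show the improvement seen in this simulation.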