问题描述
我想要一个相关的系数和 Newey-West 标准误差.
I want to have a coefficient and Newey-West standard error associated with it.
我正在寻找可以执行以下 R 代码正在执行的操作的 Python 库(理想情况下,但任何可行的解决方案都可以):
I am looking for Python library (ideally, but any working solutions is fine) that can do what the following R code is doing:
library(sandwich) library(lmtest) a <- matrix(c(1,3,5,7,4,5,6,4,7,8,9)) b <- matrix(c(3,5,6,2,4,6,7,8,7,8,9)) temp.lm = lm(a ~ b) temp.summ <- summary(temp.lm) temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest)) print (temp.summ$coefficients)
结果:
Estimate Std. Error t value Pr(>|t|) (Intercept) 2.0576208 2.5230532 0.8155281 0.4358205 b 0.5594796 0.4071834 1.3740235 0.2026817
我得到系数并与它们相关联的标准误差.
I get the coefficients and associated with them standard errors.
我看到 statsmodels.stats.sandwich_covariance.cov_hac 模块,但我不知道如何使它与 OLS 一起使用.
I see statsmodels.stats.sandwich_covariance.cov_hac module, but I don't see how to make it work with OLS.
推荐答案
已编辑(2015 年 10 月 31 日)以反映 2015 年秋季 statsmodels 的首选编码风格.
Edited (10/31/2015) to reflect preferred coding style for statsmodels as fall 2015.
在 statsmodels 0.6.1 版中,您可以执行以下操作:
In statsmodels version 0.6.1 you can do the following:
import pandas as pd import numpy as np import statsmodels.formula.api as smf df = pd.DataFrame({'a':[1,3,5,7,4,5,6,4,7,8,9], 'b':[3,5,6,2,4,6,7,8,7,8,9]}) reg = smf.ols('a ~ 1 + b',data=df).fit(cov_type='HAC',cov_kwds={'maxlags':1}) print reg.summary() OLS Regression Results ============================================================================== Dep. Variable: a R-squared: 0.281 Model: OLS Adj. R-squared: 0.201 Method: Least Squares F-statistic: 1.949 Date: Sat, 31 Oct 2015 Prob (F-statistic): 0.196 Time: 03:15:46 Log-Likelihood: -22.603 No. Observations: 11 AIC: 49.21 Df Residuals: 9 BIC: 50.00 Df Model: 1 Covariance Type: HAC ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept 2.0576 2.661 0.773 0.439 -3.157 7.272 b 0.5595 0.401 1.396 0.163 -0.226 1.345 ============================================================================== Omnibus: 0.361 Durbin-Watson: 1.468 Prob(Omnibus): 0.835 Jarque-Bera (**): 0.331 Skew: 0.321 Prob(**): 0.847 Kurtosis: 2.442 Cond. No. 19.1 ============================================================================== Warnings: [1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction
也可以在拟合模型后使用get_robustcov_results方法:
Or you can use the get_robustcov_results method after fitting the model:
reg = smf.ols('a ~ 1 + b',data=df).fit() new = reg.get_robustcov_results(cov_type='HAC',maxlags=1) print new.summary() OLS Regression Results ============================================================================== Dep. Variable: a R-squared: 0.281 Model: OLS Adj. R-squared: 0.201 Method: Least Squares F-statistic: 1.949 Date: Sat, 31 Oct 2015 Prob (F-statistic): 0.196 Time: 03:15:46 Log-Likelihood: -22.603 No. Observations: 11 AIC: 49.21 Df Residuals: 9 BIC: 50.00 Df Model: 1 Covariance Type: HAC ============================================================================== coef std err z P>|z| [95.0% Conf. Int.] ------------------------------------------------------------------------------ Intercept 2.0576 2.661 0.773 0.439 -3.157 7.272 b 0.5595 0.401 1.396 0.163 -0.226 1.345 ============================================================================== Omnibus: 0.361 Durbin-Watson: 1.468 Prob(Omnibus): 0.835 Jarque-Bera (**): 0.331 Skew: 0.321 Prob(**): 0.847 Kurtosis: 2.442 Cond. No. 19.1 ============================================================================== Warnings: [1] Standard Errors are heteroscedasticity and autocorrelation robust (HAC) using 1 lags and without small sample correction
statsmodels 的默认值与 R 中等效方法的默认值略有不同.通过将 vcov, 调用更改为以下内容,可以使 R 方法等效于 statsmodels 默认值(我在上面所做的):
The defaults for statsmodels are slightly different than the defaults for the equivalent method in R. The R method can be made equivalent to the statsmodels default (what I did above) by changing the vcov, call to the following:
temp.summ$coefficients <- unclass(coeftest(temp.lm, vcov. = NeweyWest(temp.lm,lag=1,prewhite=FALSE))) print (temp.summ$coefficients) Estimate Std. Error t value Pr(>|t|) (Intercept) 2.0576208 2.6605060 0.7733945 0.4591196 b 0.5594796 0.4007965 1.3959193 0.1962142
您仍然可以在 pandas (0.17) 中执行 Newey-West,尽管我认为该计划是在 pandas 中弃用 OLS:
You can also still do Newey-West in pandas (0.17), although I believe the plan is to deprecate OLS in pandas:
print pd.stats.ols.OLS(df.a,df.b,nw_lags=1) -------------------------Summary of Regression Analysis------------------------- Formula: Y ~ <x> + <intercept> Number of Observations: 11 Number of Degrees of Freedom: 2 R-squared: 0.2807 Adj R-squared: 0.2007 Rmse: 2.0880 F-stat (1, 9): 1.5943, p-value: 0.2384 Degrees of Freedom: model 1, resid 9 -----------------------Summary of Estimated Coefficients------------------------ Variable Coef Std Err t-stat p-value CI 2.5% CI 97.5% -------------------------------------------------------------------------------- x 0.5595 0.4431 1.26 0.2384 -0.3090 1.4280 intercept 2.0576 2.9413 0.70 0.5019 -3.7073 7.8226 *** The calculations are Newey-West adjusted with lags 1 ---------------------------------End of Summary---------------------------------