Problem description
I created a DatetimeIndex from a "date" column:
sales.index = pd.DatetimeIndex(sales["date"])
Now the index looks as follows:
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-04', '2003-01-06', '2003-01-07', '2003-01-08', '2003-01-09', '2003-01-10', '2003-01-11', '2003-01-13', ... '2016-07-22', '2016-07-23', '2016-07-24', '2016-07-25', '2016-07-26', '2016-07-27', '2016-07-28', '2016-07-29', '2016-07-30', '2016-07-31'], dtype='datetime64[ns]', name='date', length=4393, freq=None)
As you can see, the freq attribute is None. I suspect that errors down the road are caused by the missing freq. However, if I try to set the frequency explicitly:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-148-30857144de81> in <module>()
      1 #### DEBUG
----> 2 sales_train = disentangle(df_train)
      3 sales_holdout = disentangle(df_holdout)
      4 result = sarima_fit_predict(sales_train.loc[5002, 9990]["amount_sold"], sales_holdout.loc[5002, 9990]["amount_sold"])

<ipython-input-147-08b4c4ecdea3> in disentangle(df_train)
      2     # transform sales table to disentangle sales time series
      3     sales = df_train[["date", "store_id", "article_id", "amount_sold"]]
----> 4     sales.index = pd.DatetimeIndex(sales["date"], freq="d")
      5     sales = sales.pivot_table(index=["store_id", "article_id", "date"])
      6     return sales

/usr/local/lib/python3.6/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
     89                 else:
     90                     kwargs[new_arg_name] = new_arg_value
---> 91             return func(*args, **kwargs)
     92         return wrapper
     93     return _deprecate_kwarg

/usr/local/lib/python3.6/site-packages/pandas/core/indexes/datetimes.py in __new__(cls, data, freq, start, end, periods, copy, name, tz, verify_integrity, normalize, closed, ambiguous, dtype, **kwargs)
    399                                      'dates does not conform to passed '
    400                                      'frequency {1}'
--> 401                                      .format(inferred, freq.freqstr))
    402
    403         if freq_infer:

ValueError: Inferred frequency None from passed dates does not conform to passed frequency D
So apparently a frequency has been inferred, but it is stored in neither the freq nor the inferred_freq attribute of the DatetimeIndex; both are None. Can someone clear up the confusion?
Recommended answer
You have a couple of options here:
- pd.infer_freq
- pd.tseries.frequencies.to_offset
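As a rough illustration of what these two helpers return, here is a minimal sketch. The sample dates and variable names are made up for the example and are not taken from the question's data.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# Three consecutive calendar days: a regular daily pattern.
daily = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-04"])
# Thursday, Friday, Monday: a business-day pattern.
weekdays = pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"])
# A Sunday is skipped but the Saturday is present: no regular pattern.
irregular = pd.to_datetime(
    ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06", "2003-01-07"]
)

# infer_freq returns a frequency string, or None if no pattern fits.
daily_freq = pd.infer_freq(daily)          # 'D'
weekday_freq = pd.infer_freq(weekdays)     # 'B'
irregular_freq = pd.infer_freq(irregular)  # None

# to_offset turns a frequency string into a DateOffset object.
offset = to_offset("D")

print(daily_freq, weekday_freq, irregular_freq, offset.freqstr)
```

The irregular case resembles the index in the question, where some calendar days are absent, which would explain why both the error message and inferred_freq report None.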
"I suspect that errors down the road are caused by the missing freq."
You are absolutely right. Here's what I often use:
import pandas as pd

def add_freq(idx, freq=None):
    """Add a frequency attribute to idx, through inference or directly.

    Returns a copy.  If `freq` is None, it is inferred.
    """
    idx = idx.copy()
    if freq is None:
        if idx.freq is None:
            # No frequency set yet: try to infer one from the dates.
            freq = pd.infer_freq(idx)
        else:
            # A frequency is already set: nothing to do.
            return idx
    idx.freq = pd.tseries.frequencies.to_offset(freq)
    if idx.freq is None:
        raise AttributeError('no discernible frequency found to `idx`.  Specify'
                             ' a frequency string with `freq`.')
    return idx
An example:
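idx = pd.to_datetime(['2003-01-02', '2003-01-03', '2003-01-06'])  # freq=None

print(add_freq(idx))  # inferred
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='B')

# explicit; this output is from the pandas version used when the answer was written.
# Newer releases validate an assigned freq and may raise ValueError here,
# because these dates skip a weekend and so do not conform to 'D'.
print(add_freq(idx, freq='D'))
DatetimeIndex(['2003-01-02', '2003-01-03', '2003-01-06'], dtype='datetime64[ns]', freq='D')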
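When inference finds nothing, as with the index in the question, it can help to see which dates break the pattern. The following is only a sketch under the assumption that the intended frequency is daily; the sample dates are illustrative, not the asker's data.

```python
import pandas as pd

# Illustrative index with one calendar day (2003-01-05) absent.
idx = pd.to_datetime(
    ["2003-01-02", "2003-01-03", "2003-01-04", "2003-01-06", "2003-01-07"]
)

# Build the complete daily range spanning the same period ...
expected = pd.date_range(idx.min(), idx.max(), freq="D")

# ... and list the dates that the range contains but the index lacks.
missing = expected.difference(idx)

print(missing)
```

If `missing` is non-empty, the index does not conform to a daily frequency, so either a different frequency applies or the gaps have to be filled before a freq can be attached.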
Using asfreq will actually reindex (fill in) the missing dates, so be careful if that is not what you are looking for.
The primary function for changing frequencies is asfreq. For a DatetimeIndex, it is basically a thin but convenient wrapper around reindex: it generates a date_range and calls reindex.
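To make the reindexing behaviour concrete, here is a small sketch comparing asfreq with an explicit date_range plus reindex. The series values are invented for the example.

```python
import pandas as pd

# Illustrative series whose dates skip a weekend.
s = pd.Series(
    [10.0, 12.0, 7.0],
    index=pd.to_datetime(["2003-01-02", "2003-01-03", "2003-01-06"]),
)

# asfreq inserts the missing calendar days; their values become NaN.
daily = s.asfreq("D")

# Roughly the same thing done by hand: build the full range, then reindex.
full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
reindexed = s.reindex(full_range)

print(daily)
print(daily.index.freq)
```

The result has five rows instead of three, and its index carries a daily freq. Whether the inserted NaN rows are acceptable depends on the downstream model, so this is a trade-off rather than a drop-in fix.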