Problem description
I'm using Pandas to store stock price data in DataFrames. There are 2940 rows in the dataset. A snapshot of the dataset is shown below:
The time series data does not contain values for Saturdays and Sundays, so the missing values have to be filled.
Here is the code I've written, but it does not solve the problem:
    import pandas as pd
    import numpy as np
    import os

    os.chdir('C:/Users/*****/Analytics/stock-prices')
    data = pd.read_csv('stock-data.csv')

    # PriceDate Column - Does not contain Saturday and Sunday stock entries
    data['PriceDate'] = pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
    data = data.sort_index(by=['PriceDate'], ascending=[True])

    # Starting date is Aug 25 2004
    idx = pd.date_range('08-25-2004',periods=2940,freq='D')
    data = data.set_index(idx)
    data['newdate']=data.index
    newdate=data['newdate'].values   # Create a time series column

    data = pd.merge(newdate, data, on='PriceDate', how='outer')
How do I fill in the missing values for Saturday and Sunday?
Recommended answer
I think you can use resample with ffill or bfill, but first call set_index on the PriceDate column:
    print (data)
       ID  PriceDate  OpenPrice  HighPrice
    0   1  6/24/2016          1          2
    1   2  6/23/2016          3          4
    2   2  6/22/2016          5          6
    3   2  6/21/2016          7          8
    4   2  6/20/2016          9         10
    5   2  6/17/2016         11         12
    6   2  6/16/2016         13         14
    data['PriceDate'] = pd.to_datetime(data['PriceDate'], format='%m/%d/%Y')
    data = data.sort_values(by=['PriceDate'], ascending=[True])
    data.set_index('PriceDate', inplace=True)
    print (data)
                ID  OpenPrice  HighPrice
    PriceDate
    2016-06-16   2         13         14
    2016-06-17   2         11         12
    2016-06-20   2          9         10
    2016-06-21   2          7          8
    2016-06-22   2          5          6
    2016-06-23   2          3          4
    2016-06-24   1          1          2

    data = data.resample('D').ffill().reset_index()
    print (data)
       PriceDate  ID  OpenPrice  HighPrice
    0 2016-06-16   2         13         14
    1 2016-06-17   2         11         12
    2 2016-06-18   2         11         12
    3 2016-06-19   2         11         12
    4 2016-06-20   2          9         10
    5 2016-06-21   2          7          8
    6 2016-06-22   2          5          6
    7 2016-06-23   2          3          4
    8 2016-06-24   1          1          2
    # Alternative to ffill: back-fill instead, starting from the same
    # PriceDate-indexed data (before reset_index was called)
    data = data.resample('D').bfill().reset_index()
    print (data)
       PriceDate  ID  OpenPrice  HighPrice
    0 2016-06-16   2         13         14
    1 2016-06-17   2         11         12
    2 2016-06-18   2          9         10
    3 2016-06-19   2          9         10
    4 2016-06-20   2          9         10
    5 2016-06-21   2          7          8
    6 2016-06-22   2          5          6
    7 2016-06-23   2          3          4
    8 2016-06-24   1          1          2
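For convenience, the steps above can be collected into one self-contained sketch. This is only an illustration under a few assumptions: the helper name fill_calendar_days and the inline CSV text are hypothetical (the CSV stands in for stock-data.csv and reuses the sample rows from this answer), and the dates are assumed to be unique and in %m/%d/%Y format.

```python
import io

import pandas as pd

# Hypothetical stand-in for stock-data.csv, using the sample rows shown above.
csv_text = """ID,PriceDate,OpenPrice,HighPrice
1,6/24/2016,1,2
2,6/23/2016,3,4
2,6/22/2016,5,6
2,6/21/2016,7,8
2,6/20/2016,9,10
2,6/17/2016,11,12
2,6/16/2016,13,14
"""


def fill_calendar_days(df, date_col="PriceDate", method="ffill"):
    """Return a copy of df with one row per calendar day.

    Hypothetical helper: parses the date column, sorts by it, uses it as
    the index, then upsamples to daily frequency and fills the new rows.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col], format="%m/%d/%Y")
    out = out.sort_values(date_col).set_index(date_col)
    resampled = out.resample("D")
    # ffill copies the previous trading day; bfill copies the next one.
    filled = resampled.ffill() if method == "ffill" else resampled.bfill()
    return filled.reset_index()


data = pd.read_csv(io.StringIO(csv_text))
filled_forward = fill_calendar_days(data, method="ffill")
filled_backward = fill_calendar_days(data, method="bfill")

print(filled_forward)
print(filled_backward)
```

Which fill to use probably depends on the analysis: ffill carries Friday's prices over the weekend, whereas bfill pulls Monday's prices backwards, which puts values on dates before they were actually known.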