I have a DataFrame with hundreds of millions of rows, and I want to convert its datetime column to Unix timestamps efficiently. How can I do it?
My sample df:
```python
import datetime as dt
import pandas as pd

# 25 hourly datetimes from 2016-01-01 00:00:01 to 2016-01-02 00:00:01
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2016,1,1,0,0,1), end=dt.datetime(2016,1,2,0,0,1), freq='h'))\
       .reset_index().rename(columns={'index':'datetime'})
df.head()
```

```
             datetime
0 2016-01-01 00:00:01
1 2016-01-01 01:00:01
2 2016-01-01 02:00:01
3 2016-01-01 03:00:01
4 2016-01-01 04:00:01
```
Currently I convert each datetime to a timestamp one value at a time with .apply(), but with hundreds of millions of rows this takes a very long time (several hours):
```python
# Row-by-row conversion: correct but slow, since the lambda runs once per row
df['ts'] = df[['datetime']].apply(lambda x: x.iloc[0].timestamp(), axis=1).astype(int)
df.head()
```

```
             datetime          ts
0 2016-01-01 00:00:01  1451602801
1 2016-01-01 01:00:01  1451606401
2 2016-01-01 02:00:01  1451610001
3 2016-01-01 03:00:01  1451613601
4 2016-01-01 04:00:01  1451617201
```
The above result is what I want.
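Note that these values are one hour (3600 s) smaller than plain UTC epoch seconds: 1451602801 is 2015-12-31 23:00:01 UTC. That is consistent with the naive datetimes having been interpreted as local time in a zone at UTC+1, most likely because the pandas of that time let Timestamp.timestamp() fall back to the standard library's behaviour of treating naive values as local time. Below is a minimal vectorized sketch that reproduces these exact numbers. It assumes the datetimes are local times at a fixed +01:00 offset with no DST transitions in the data; for a real zone with DST you would instead pass a named zone such as 'Europe/Budapest' (a hypothetical choice here) and possibly tz_localize's ambiguous/nonexistent arguments:

```python
import datetime as dt
import pandas as pd

# Sample data: 25 hourly naive datetimes, as in the question
df = pd.DataFrame({'datetime': pd.date_range('2016-01-01 00:00:01', periods=25, freq='h')})

# ASSUMPTION: the naive values are local wall-clock times at a fixed UTC+1 offset
local_tz = dt.timezone(dt.timedelta(hours=1))

# Attach the offset, convert to UTC and drop the zone again (tz_convert(None))
utc_naive = df['datetime'].dt.tz_localize(local_tz).dt.tz_convert(None)

# Whole seconds since 1970-01-01 00:00:00 UTC, computed for the whole column at once
df['ts'] = (utc_naive - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(df.head())
```

Dropping the fixed offset (i.e. skipping the tz_localize/tz_convert step) gives the plain UTC interpretation instead.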
If I try to use the .dt accessor of pandas.Series, I get an error:
```python
df['ts'] = df['datetime'].dt.timestamp
```

```
AttributeError: 'DatetimeProperties' object has no attribute 'timestamp'
```
By contrast, extracting e.g. the date part of the datetimes with the .dt accessor is much faster than using .apply():
```python
df['date'] = df['datetime'].dt.date
df.head()
```

```
             datetime          ts        date
0 2016-01-01 00:00:01  1451602801  2016-01-01
1 2016-01-01 01:00:01  1451606401  2016-01-01
2 2016-01-01 02:00:01  1451610001  2016-01-01
3 2016-01-01 03:00:01  1451613601  2016-01-01
4 2016-01-01 04:00:01  1451617201  2016-01-01
```
I want something similar for timestamps.
But I don't really understand the official documentation: it talks about "Converting to Timestamps", but I don't see any timestamps there; it only covers converting to datetime with pd.to_datetime(), not to timestamps.
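For what it's worth, the "timestamps" in that part of the docs are pandas Timestamp objects, i.e. the elements of a datetime64 column, not Unix epoch numbers; pd.to_datetime() converts other representations into them. A small sketch of the opposite direction to the one asked about (epoch seconds to datetimes), using pd.to_datetime's unit argument on a few illustrative values:

```python
import pandas as pd

# Unix epoch seconds for 2016-01-01 00:00:01, 01:00:01 and 02:00:01 UTC
epoch_seconds = pd.Series([1451606401, 1451610001, 1451613601])

# unit='s' tells pandas the integers count seconds since 1970-01-01 (UTC);
# the result is a datetime64 column whose elements are pd.Timestamp objects
as_datetimes = pd.to_datetime(epoch_seconds, unit='s')

print(as_datetimes.iloc[0])  # 2016-01-01 00:00:01
print(type(as_datetimes.iloc[0]))
```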
The pandas.Timestamp constructor doesn't work either; it raises the following error:
```python
df['ts2'] = pd.Timestamp(df['datetime'])
```

```
TypeError: Cannot convert input to Timestamp
```
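pd.Timestamp is a scalar constructor: it builds one Timestamp from one value (a string, a datetime, a number, ...), which is why passing a whole Series fails. A small sketch of what it does accept; note that in recent pandas versions Timestamp.timestamp() treats a naive value as UTC, so the result can differ from what older versions returned:

```python
import datetime as dt
import pandas as pd

# One datetime in, one Timestamp out
single = pd.Timestamp(dt.datetime(2016, 1, 1, 0, 0, 1))

# .timestamp() exists on the scalar and returns float seconds since the epoch;
# recent pandas interprets the naive value as UTC here
print(single.timestamp())  # 1451606401.0
```

Calling this per element is exactly what the .apply() version does, so it does not help with speed on its own.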
pandas.Series.to_timestamp also does something completely different from what I want:
```python
df['ts3'] = df['datetime'].to_timestamp
df.head()
```

```
             datetime          ts                                                ts3
0 2016-01-01 00:00:01  1451602801  <bound method Series.to_timestamp of 0   2016...
1 2016-01-01 01:00:01  1451606401  <bound method Series.to_timestamp of 0   2016...
2 2016-01-01 02:00:01  1451610001  <bound method Series.to_timestamp of 0   2016...
3 2016-01-01 03:00:01  1451613601  <bound method Series.to_timestamp of 0   2016...
4 2016-01-01 04:00:01  1451617201  <bound method Series.to_timestamp of 0   2016...
```
Thank you!!
I think you need to convert to a NumPy array first with values and cast it to int64. The result is in nanoseconds, so you need to divide by 10 ** 9:
```python
import numpy as np

# datetime64[ns] -> int64 nanoseconds since the epoch -> whole seconds
df['ts'] = df.datetime.values.astype(np.int64) // 10 ** 9
print(df)
```

```
              datetime          ts
0  2016-01-01 00:00:01  1451606401
1  2016-01-01 01:00:01  1451610001
2  2016-01-01 02:00:01  1451613601
3  2016-01-01 03:00:01  1451617201
4  2016-01-01 04:00:01  1451620801
5  2016-01-01 05:00:01  1451624401
6  2016-01-01 06:00:01  1451628001
7  2016-01-01 07:00:01  1451631601
8  2016-01-01 08:00:01  1451635201
9  2016-01-01 09:00:01  1451638801
10 2016-01-01 10:00:01  1451642401
11 2016-01-01 11:00:01  1451646001
12 2016-01-01 12:00:01  1451649601
13 2016-01-01 13:00:01  1451653201
14 2016-01-01 14:00:01  1451656801
15 2016-01-01 15:00:01  1451660401
16 2016-01-01 16:00:01  1451664001
17 2016-01-01 17:00:01  1451667601
18 2016-01-01 18:00:01  1451671201
19 2016-01-01 19:00:01  1451674801
20 2016-01-01 20:00:01  1451678401
21 2016-01-01 21:00:01  1451682001
22 2016-01-01 22:00:01  1451685601
23 2016-01-01 23:00:01  1451689201
24 2016-01-02 00:00:01  1451692801
```
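A caveat on the `// 10 ** 9` step: it assumes the column is stored as datetime64[ns], which was the only resolution pandas supported when this was written. Since pandas 2.0 a datetime column can also have another resolution (e.g. datetime64[us] or datetime64[s]), and then the int64 values are in that unit instead of nanoseconds. A minimal sketch of a resolution-independent variant, based on the "from timestamps to epoch" idiom in the pandas time series user guide: subtract the epoch and floor-divide by one second. Like the approach above, it is vectorized (no .apply()) and treats the naive datetimes as UTC:

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({'datetime': pd.date_range(start=dt.datetime(2016, 1, 1, 0, 0, 1),
                                             end=dt.datetime(2016, 1, 2, 0, 0, 1),
                                             freq='h')})

# datetime - epoch gives a timedelta column; // one second gives whole seconds.
# This does not depend on whether the column is datetime64[ns], [us], [ms] or [s].
df['ts'] = (df['datetime'] - pd.Timestamp('1970-01-01')) // pd.Timedelta('1s')
print(df.head())
```

If the column can contain NaT, the floor division yields float64 with NaN for those rows rather than int64.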
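to_timestamp is meant for converting periods to a datetime index: Series.to_timestamp turns a PeriodIndex into a DatetimeIndex, so it will not give you epoch seconds. (In the question it is also written without parentheses, so the ts3 column just holds the bound method object rather than any converted values.)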
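To illustrate what to_timestamp is actually for: Series.to_timestamp acts on the index, turning a PeriodIndex into a DatetimeIndex (by default at the start of each period), while Series.dt.to_timestamp does the same for a column of Period values. Neither produces Unix epoch numbers. A minimal sketch with made-up monthly data:

```python
import pandas as pd

# A Series indexed by monthly periods (a PeriodIndex); the values are illustrative
monthly = pd.Series([10, 20, 30], index=pd.period_range('2016-01', periods=3, freq='M'))

# Series.to_timestamp converts the *index* from periods to a DatetimeIndex;
# the values themselves are left untouched
at_start = monthly.to_timestamp()
print(at_start.index)

# For Period *values* (a period-dtype column), use the .dt accessor instead
periods_col = pd.Series(pd.period_range('2016-01', periods=3, freq='M'))
print(periods_col.dt.to_timestamp())
```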