Problem description
I have the following DataFrame df:
    id       lat      lon  year  month  day
0  381  53.30660 -0.54649  2004      1    2
1  381  53.30660 -0.54649  2004      1    3
2  381  53.30660 -0.54649  2004      1    4
I want to create a new column df['Date'] in which the year, month, and day columns are combined in the format yyyy-m-d.
Following this post, I did:
`df['Date']=pd.to_datetime(df['year']*10000000000 +df['month']*100000000 +df['day']*1000000, format='%Y-%m-%d%')`
The result is not what I expected: it starts from 1970 instead of 2004, and it also contains a time component that I did not specify:
    id       lat      lon  year  month  day                    Date
0  381  53.30660 -0.54649  2004      1    2 1970-01-01 05:34:00.102
1  381  53.30660 -0.54649  2004      1    3 1970-01-01 05:34:00.103
2  381  53.30660 -0.54649  2004      1    4 1970-01-01 05:34:00.104
The dates should be in the 2004-1-2 format. What am I doing wrong?
Recommended answer
There is a simpler way:
In [250]: df['Date']=pd.to_datetime(df[['year','month','day']])

In [251]: df
Out[251]:
    id      lat      lon  year  month  day       Date
0  381  53.3066 -0.54649  2004      1    2 2004-01-02
1  381  53.3066 -0.54649  2004      1    3 2004-01-03
2  381  53.3066 -0.54649  2004      1    4 2004-01-04
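Note that the resulting column holds real datetime values, which display zero-padded (2004-01-02). If the unpadded text form 2004-1-2 from the question is really needed, one option is to keep the datetime column and build a separate string column from the integer parts. The following is a minimal sketch; the column name DateText is made up for illustration and is not from the original post.

```python
import pandas as pd

# Small frame with the same year/month/day columns as in the question
df = pd.DataFrame({
    "year": [2004, 2004, 2004],
    "month": [1, 1, 1],
    "day": [2, 3, 4],
})

# Assemble a datetime64 column from the three component columns
df["Date"] = pd.to_datetime(df[["year", "month", "day"]])

# Hypothetical extra column: unpadded yyyy-m-d text built from the integers.
# This avoids platform-specific strftime flags for dropping leading zeros.
df["DateText"] = (
    df["year"].astype(str) + "-"
    + df["month"].astype(str) + "-"
    + df["day"].astype(str)
)
```

Keeping Date as a datetime column preserves sorting and date arithmetic, while DateText is only for display.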
From the docs:
Assembling a datetime from multiple columns of a DataFrame. The keys can be common abbreviations like [year, month, day, minute, second, ms, us, ns]) or plurals of the same
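As for what went wrong in the question: the arithmetic produces plain integers such as 20040102000000, and the 1970 timestamps in the output are consistent with those integers being read as nanoseconds since the Unix epoch (20040102000000 ns is about 5 h 34 min). The sketch below reproduces that reading explicitly with unit="ns", which is an assumption about what happened rather than something stated in the post, and then shows one way to make the integer approach work by building yyyymmdd values and parsing them as strings.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [2004, 2004, 2004],
    "month": [1, 1, 1],
    "day": [2, 3, 4],
})

# The integers built in the question, e.g. 20040102000000
as_int = df["year"] * 10000000000 + df["month"] * 100000000 + df["day"] * 1000000

# Assumption: the integers were treated as nanoseconds since 1970-01-01
as_ns = pd.to_datetime(as_int, unit="ns")

# A working integer-based variant: build yyyymmdd, convert to text,
# and parse with a format that matches the digits exactly
as_yyyymmdd = df["year"] * 10000 + df["month"] * 100 + df["day"]
fixed = pd.to_datetime(as_yyyymmdd.astype(str), format="%Y%m%d")
```

Passing the three columns directly, as in the answer above, is shorter and avoids the intermediate arithmetic altogether.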