Problem description
I'm attempting to convert a DataFrame into a Series using code which, simplified, looks like this:
    import pandas as pd

    dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
    values = [i for i in range(20)]
    data = {'Date': dates, 'Value': values}
    df = pd.DataFrame(data)
    df['Date'] = pd.to_datetime(df['Date'])
    ts = pd.Series(df['Value'], index=df['Date'])
    print(ts)
However, the printed output looks like this:
    Date
    2016-01-01   NaN
    2016-01-02   NaN
    2016-01-03   NaN
    2016-01-04   NaN
    2016-01-05   NaN
    2016-01-06   NaN
    2016-01-07   NaN
    2016-01-08   NaN
    2016-01-09   NaN
    2016-01-10   NaN
    2016-01-11   NaN
    2016-01-12   NaN
    2016-01-13   NaN
    2016-01-14   NaN
    2016-01-15   NaN
    2016-01-16   NaN
    2016-01-17   NaN
    2016-01-18   NaN
    2016-01-19   NaN
    2016-01-20   NaN
    Name: Value, dtype: float64
Where do the NaN values come from? Is a view on a DataFrame object not a valid input for the Series class?
I have found the to_series function for pd.Index objects; is there something similar for DataFrames?
Recommended answer
I think you can use values, which converts the Value column to a NumPy array:

    ts = pd.Series(df['Value'].values, index=df['Date'])
    import pandas as pd

    dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
    values = [i for i in range(20)]
    data = {'Date': dates, 'Value': values}
    df = pd.DataFrame(data)
    df['Date'] = pd.to_datetime(df['Date'])

    # .values is a plain array, so it carries no index labels
    print(df['Value'].values)
    # [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]

    ts = pd.Series(df['Value'].values, index=df['Date'])
    print(ts)

    Date
    2016-01-01     0
    2016-01-02     1
    2016-01-03     2
    2016-01-04     3
    2016-01-05     4
    2016-01-06     5
    2016-01-07     6
    2016-01-08     7
    2016-01-09     8
    2016-01-10     9
    2016-01-11    10
    2016-01-12    11
    2016-01-13    12
    2016-01-14    13
    2016-01-15    14
    2016-01-16    15
    2016-01-17    16
    2016-01-18    17
    2016-01-19    18
    2016-01-20    19
    dtype: int64
Or you can build the Series directly from the original lists:
    ts1 = pd.Series(data=values, index=pd.to_datetime(dates))
    print(ts1)

    2016-01-01     0
    2016-01-02     1
    2016-01-03     2
    2016-01-04     3
    2016-01-05     4
    2016-01-06     5
    2016-01-07     6
    2016-01-08     7
    2016-01-09     8
    2016-01-10     9
    2016-01-11    10
    2016-01-12    11
    2016-01-13    12
    2016-01-14    13
    2016-01-15    14
    2016-01-16    15
    2016-01-17    16
    2016-01-18    17
    2016-01-19    18
    2016-01-20    19
    dtype: int64
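A further option, not part of the original answer, is to move the Date column into the index with set_index and then select the Value column. This is only a sketch; the name ts2 is made up for illustration, and the setup simply repeats the question's data:

```python
import pandas as pd

# Same data as in the question
dates = ['2016-1-{}'.format(i) for i in range(1, 21)]
values = list(range(20))
df = pd.DataFrame({'Date': dates, 'Value': values})
df['Date'] = pd.to_datetime(df['Date'])

# Make Date the index of the DataFrame, then pick the Value column.
# The values stay attached to their rows, so nothing is realigned.
ts2 = df.set_index('Date')['Value']  # ts2 is an illustrative name
print(ts2.head())
```

Unlike the values approach, the resulting Series should keep the column name (Value), which may or may not be what you want.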
Thank you @ajcr for a better explanation of why you get NaN:
When you give a Series or DataFrame column to pd.Series, it will reindex it using the index you specify. Since your DataFrame column has an integer index (not a date index), none of the requested date labels exist in it, so you get lots of missing values.
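To see that behaviour on a smaller scale, here is a rough sketch using a three-row frame (the data and the names via_constructor and via_reindex are made up for illustration). It compares the constructor call from the question with an explicit reindex call:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.to_datetime(['2016-01-01', '2016-01-02', '2016-01-03']),
    'Value': [0, 1, 2],
})

# df['Value'] is labelled 0, 1, 2, so none of the date labels are found in it.
via_constructor = pd.Series(df['Value'], index=df['Date'])

# Asking for the same labels explicitly should behave the same way.
via_reindex = df['Value'].reindex(df['Date'])

print(via_constructor.isna().all())         # are all values missing?
print(via_constructor.equals(via_reindex))  # do both approaches agree?
```

Passing a plain array (df['Value'].values) avoids this because an array has no labels to align on; the values are simply paired with the new index by position.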