Problem description
I created a pandas DataFrame from a list of lists:
import numpy as np
import pandas as pd

df_list = [["a", "1", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
print(df)
#    A  B    C
# 0  a  1    2
# 1  b  3  NaN
Is there a way to convert every column that can be converted to float, i.e. B and C? If you know which columns to convert, the following works:
df[["B", "C"]] = df[["B", "C"]].astype("float")
But what do you do if you don't know in advance which columns contain numbers? When I tried
df = df.astype("float", errors = "ignore")
all columns were still strings/objects. Similarly,
df[["B", "C"]] = df[["B", "C"]].apply(pd.to_numeric)
converts both columns (though "B" becomes int and "C" becomes float, because of the NaN value), but
df = df.apply(pd.to_numeric)
obviously throws an error, and I don't see a way to suppress it.
Is it possible to perform this string-to-float conversion without looping through each column and trying .astype("float", errors = "ignore") on each one?
Recommended answer
I think you need the parameter errors='ignore' in to_numeric:
df = df.apply(pd.to_numeric, errors='ignore')
print(df.dtypes)
# A     object
# B      int64
# C    float64
# dtype: object
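As far as I know, errors='ignore' in pd.to_numeric has been deprecated since pandas 2.2, so on newer versions the call above may warn or fail. A minimal sketch of the same idea without that option is to catch the exception per column. The helper name to_numeric_where_possible is hypothetical (it is not part of pandas), and the frame is the one from the question:

```python
import numpy as np
import pandas as pd


def to_numeric_where_possible(column: pd.Series) -> pd.Series:
    """Return the column as a numeric dtype, or unchanged if parsing fails.

    Hypothetical helper for illustration; not a pandas function.
    """
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        # At least one value could not be parsed, so keep the column as it is.
        return column


df = pd.DataFrame([["a", "1", "2"], ["b", "3", np.nan]], columns=list("ABC"))
df = df.apply(to_numeric_where_possible)
print(df.dtypes)
```

Here A should stay non-numeric, B should come out as an integer dtype, and C as a float dtype because of the NaN. apply still visits each column internally, but you do not have to write the loop yourself.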
It works well as long as a column does not mix numeric values with non-numeric strings; a mixed column is left unchanged as object:
df_list = [["a", "t", "2"], ["b", "3", np.nan]]
df = pd.DataFrame(df_list, columns=list("ABC"))
df = df.apply(pd.to_numeric, errors='ignore')
print(df)
#    A  B    C
# 0  a  t  2.0   <= added t to column B for mixed values
# 1  b  3  NaN
print(df.dtypes)
# A     object
# B     object
# C    float64
# dtype: object
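If you would rather have the numbers in a mixed column converted and the stray strings turned into NaN, errors='coerce' is an option. This is only a sketch and it assumes you are willing to lose the non-numeric entries; it is restricted to B and C because coercing a pure text column such as A would turn every value into NaN. The variable name coerced is just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([["a", "t", "2"], ["b", "3", np.nan]], columns=list("ABC"))

# Coerce only the columns expected to hold numbers;
# entries that cannot be parsed (here "t") become NaN.
coerced = df[["B", "C"]].apply(pd.to_numeric, errors="coerce")
print(coerced)
print(coerced.dtypes)
```

Both columns should end up as floats, since a column that contains NaN cannot keep a plain integer dtype.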
You can also downcast, so that the int column becomes float as well:
df = df.apply(pd.to_numeric, errors='ignore', downcast='float')
print(df.dtypes)
# A     object
# B    float32
# C    float32
# dtype: object
The same, written with a lambda:
df = df.apply(lambda x: pd.to_numeric(x, errors='ignore', downcast='float'))
print(df.dtypes)
# A     object
# B    float32
# C    float32
# dtype: object