pandas: Filling missing values
Examples
The examples below assume that pandas is imported as pd and NumPy as np.

In [11]: df = pd.DataFrame([[1, 2, None, 3], [4, None, 5, 6],
    ...:                    [7, 8, 9, 10], [None, None, None, None]])
    ...: df
Out[11]:
     0    1    2     3
0  1.0  2.0  NaN   3.0
1  4.0  NaN  5.0   6.0
2  7.0  8.0  9.0  10.0
3  NaN  NaN  NaN   NaN
Fill missing values with a single value:
In [12]: df.fillna(0)
Out[12]:
     0    1    2     3
0  1.0  2.0  0.0   3.0
1  4.0  0.0  5.0   6.0
2  7.0  8.0  9.0  10.0
3  0.0  0.0  0.0   0.0
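This returns a new DataFrame and leaves df unchanged. To change the original DataFrame, either use the inplace parameter (df.fillna(0, inplace=True)) or assign the result back to the original name (df = df.fillna(0)).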
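The difference between the returned copy and the original can be checked by counting missing cells before and after assignment. This is a minimal sketch that rebuilds the same df as above; the names filled, still_missing and remaining are only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, None, 3], [4, None, 5, 6],
                   [7, 8, 9, 10], [None, None, None, None]])

# fillna returns a new DataFrame; df itself still holds its NaN cells.
filled = df.fillna(0)
still_missing = int(df.isna().sum().sum())

# Assigning the result back replaces the original.
df = df.fillna(0)
remaining = int(df.isna().sum().sum())

print(still_missing, remaining)
```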
Fill missing values with the previous value in each column:
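In [13]: df.fillna(method='pad')  # equivalent to method='ffill' and to df.ffill()
    ...: # note: the method keyword of fillna is deprecated as of pandas 2.1;
    ...: # newer code should call df.ffill() directly
Out[13]:
     0    1     2     3
0  1.0  2.0   NaN   3.0
1  4.0  2.0   5.0   6.0
2  7.0  8.0   9.0  10.0
3  7.0  8.0   9.0  10.0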
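The same forward fill can be written with the dedicated ffill method, which does not rely on the method keyword. The sketch below rebuilds the same df and also shows why the cell in row 0, column 2 stays missing: there is no earlier value in that column to carry forward. The name forward is only illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, None, 3], [4, None, 5, 6],
                   [7, 8, 9, 10], [None, None, None, None]])

# Propagate the last valid value downwards within each column.
forward = df.ffill()

# A NaN in the first row has nothing above it, so it is left as NaN.
first_row_gap = pd.isna(forward.iloc[0, 2])

print(forward)
print(first_row_gap)
```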
Fill missing values with the next value in each column:
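In [14]: df.fillna(method='bfill')  # equivalent to df.bfill()
    ...: # note: the method keyword of fillna is deprecated as of pandas 2.1;
    ...: # newer code should call df.bfill() directly
Out[14]:
     0    1    2     3
0  1.0  2.0  5.0   3.0
1  4.0  8.0  5.0   6.0
2  7.0  8.0  9.0  10.0
3  NaN  NaN  NaN   NaN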
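A backward fill leaves the last row missing because nothing follows it. One possible way to close such trailing gaps, shown here only as a sketch, is to chain a forward fill after the backward fill. The df is rebuilt as above and the names backward and both are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, None, 3], [4, None, 5, 6],
                   [7, 8, 9, 10], [None, None, None, None]])

# Pull the next valid value upwards within each column.
backward = df.bfill()

# Row 3 has no later row to draw from, so a forward fill is chained
# afterwards to fill it from row 2.
both = df.bfill().ffill()

print(backward)
print(both)
```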
Fill using another DataFrame:
In [15]: df2 = pd.DataFrame(np.arange(100, 116).reshape(4, 4))
    ...: df2
Out[15]:
     0    1    2    3
0  100  101  102  103
1  104  105  106  107
2  108  109  110  111
3  112  113  114  115

In [16]: df.fillna(df2)  # takes the corresponding cells in df2 to fill df
Out[16]:
       0      1      2      3
0    1.0    2.0  102.0    3.0
1    4.0  105.0    5.0    6.0
2    7.0    8.0    9.0   10.0
3  112.0  113.0  114.0  115.0
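"Corresponding cells" here means cells with the same row and column labels, not the same position. The sketch below uses two small hypothetical frames (left and other, with made-up labels) to illustrate that a missing cell is only filled when the other DataFrame has a value under the same labels.

```python
import pandas as pd

# Hypothetical frames for illustration; labels are chosen so that
# only column 'a' exists in both.
left = pd.DataFrame({'a': [1.0, None], 'b': [None, 4.0]}, index=['x', 'y'])
other = pd.DataFrame({'a': [10.0, 20.0], 'c': [30.0, 40.0]}, index=['x', 'y'])

result = left.fillna(other)

# ('y', 'a') is filled from other; ('x', 'b') stays NaN because
# other has no column 'b'. Column 'c' is not added to the result.
print(result)
```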