Suppose I have a DataFrame (or Series) like this:
Value
0 0.5
1 0.8
2 -0.2
3 None
4 None
5 None
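I want to create a new Result column.
Each result is determined by applying an arbitrary function f to the previous Value.
If the previous Value is not available (None or NaN), I want to use the previous Result instead (and apply f to it, of course).
Using the previous Value is easy: I just need shift. Accessing the previous Result, however, doesn't seem so simple.
For example, the following code computes the result, but it cannot fall back to the previous Result when needed.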
df['Result'] = df['Value'].shift(1).apply(f)
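To make the limitation concrete, here is a small self-contained sketch. It assumes the sample data above and uses f = lambda x: x + 1 purely as an illustrative choice (the real f is arbitrary). Once the shifted Value is missing, apply only ever sees NaN, so the earlier Result is never reused.

```python
import pandas as pd

# Sample data from the question; None becomes NaN in a float column.
df = pd.DataFrame({"Value": [0.5, 0.8, -0.2, None, None, None]})

# Illustrative choice of f (an assumption; the real f is arbitrary).
f = lambda x: x + 1

# Each row only sees the previous Value, never the previous Result.
df["Result"] = df["Value"].shift(1).apply(f)

print(df)
```

The last two rows come out as NaN, because f is applied to a missing Value rather than to the Result of the row before.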
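Please assume f is arbitrary, so solutions based on things like cumsum are not possible.
Obviously this can be done by iterating, but I'd like to know whether a more pandas-style solution exists.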
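import pandas as pd

# Assumes the default integer index (0, 1, 2, ...), so labels match positions.
df['Result'] = float('nan')
for i in range(1, len(df)):
    value = df.loc[i - 1, 'Value']
    # pd.isna handles both None and NaN.
    if pd.isna(value):
        value = df.loc[i - 1, 'Result']
    df.loc[i, 'Result'] = f(value)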
Example output, given f = lambda x: x+1:
Bad:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN NaN
5 NaN NaN
Good:
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8 <-- previous Value not available, used f(previous result)
5 NaN 2.8 <-- same
It looks to me like it has to be a loop. I hate loops... so when I do loop, I use numba.
Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.
https://numba.pydata.org/
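import numpy as np
from numba import njit

@njit
def f(x):
    return x + 1

@njit
def g(a):
    # The first row has no previous value, so its result is NaN.
    r = [np.nan]
    for v in a[:-1]:
        if np.isnan(v):
            # Previous value missing: apply f to the previous result.
            r.append(f(r[-1]))
        else:
            r.append(f(v))
    return r

df.assign(Result=g(df.Value.values))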
Value Result
0 0.5 NaN
1 0.8 1.5
2 -0.2 1.8
3 NaN 0.8
4 NaN 1.8
5 NaN 2.8
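If numba is not installed, or f cannot be compiled in nopython mode, the same loop can be written in plain Python. This is only a sketch under that assumption: propagate is a hypothetical helper name, not a pandas function, and it will likely be slower than the jitted version on large inputs.

```python
import numpy as np
import pandas as pd

def propagate(values, f):
    """Apply f to the previous value, or to the previous result if that value is missing."""
    result = [np.nan]  # the first row has no predecessor
    for v in values[:-1]:
        # Fall back to the previous result when the previous value is None/NaN.
        prev = result[-1] if pd.isna(v) else v
        result.append(f(prev))
    return result

# Sample data from the question, with the example f = lambda x: x + 1.
df = pd.DataFrame({"Value": [0.5, 0.8, -0.2, None, None, None]})
df["Result"] = propagate(df["Value"].to_numpy(), lambda x: x + 1)
print(df)
```

Because f is passed in as an ordinary Python callable, it does not need to be numba-compatible.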