I don't know how pandas implements this, but I did test it empirically. I ran the following code (in a Jupyter notebook) to measure the speed of the operation:
import pandas as pd

def get_dummy_df(n):
    # Build a DataFrame with 2*n rows and 3 integer columns.
    return pd.DataFrame({'a': [1,2]*n, 'b': [4,5]*n, 'c': [7,8]*n})

# Time the column reversal at row counts from 200 up to 20,000,000.
df = get_dummy_df(100)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
df = get_dummy_df(1000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
df = get_dummy_df(10000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
df = get_dummy_df(100000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
df = get_dummy_df(1000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
df = get_dummy_df(10000000)
print(df.shape)
%timeit df_r = df[df.columns[::-1]]
The output is:
(200, 3)
1000 loops, best of 3: 419 μs per loop
(2000, 3)
1000 loops, best of 3: 425 μs per loop
(20000, 3)
1000 loops, best of 3: 498 μs per loop
(200000, 3)
100 loops, best of 3: 2.66 ms per loop
(2000000, 3)
10 loops, best of 3: 25.2 ms per loop
(20000000, 3)
1 loop, best of 3: 207 ms per loop
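As you can see, in the first three cases fixed overhead accounts for most of the running time (400-500 μs), but from the fourth case onward the time becomes proportional to the amount of data, growing by roughly an order of magnitude with each tenfold increase in rows.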
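As a rough check on the "proportional" reading, the slope between measurements on a log-log scale can be computed directly from the timings reported above; a slope near 1 would be consistent with linear growth in the number of rows. This is only a sketch: the helper name `loglog_slope` is made up for illustration, and three data points are far too few for a firm conclusion.

```python
import math

# (rows, seconds per loop) copied from the three largest runs in the output above.
timings = [(200_000, 2.66e-3), (2_000_000, 25.2e-3), (20_000_000, 207e-3)]

def loglog_slope(p, q):
    """Slope of log(time) against log(rows) between two measurements.

    Hypothetical helper: a value near 1 suggests time grows linearly with rows.
    """
    (n0, t0), (n1, t1) = p, q
    return math.log(t1 / t0) / math.log(n1 / n0)

# Slope between each pair of consecutive measurements, and across the whole range.
step_slopes = [loglog_slope(a, b) for a, b in zip(timings, timings[1:])]
overall_slope = loglog_slope(timings[0], timings[-1])

print("step slopes:", [round(s, 3) for s in step_slopes])
print("overall slope:", round(overall_slope, 3))
```

Both step slopes come out slightly below 1, which fits time growing about linearly in the row count once the fixed overhead stops dominating.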
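So, assuming the cost must also be proportional to n (the benchmark above varied only the number of rows), it seems we are dealing with O(m * n).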
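The benchmark above never varies the column count, so that half of the O(m * n) estimate remains an assumption. One way to probe it, sketched below, is to time the same reversal while changing rows and columns independently. The helpers `make_df` and `time_reverse` are hypothetical names introduced here, the sizes are arbitrary, and the standard `timeit` module stands in for the `%timeit` magic so the script also runs outside Jupyter.

```python
import timeit

import numpy as np
import pandas as pd

def make_df(n_rows, n_cols):
    """Hypothetical helper: integer frame with n_rows rows and columns c0..c{n_cols-1}."""
    return pd.DataFrame({f"c{i}": np.arange(n_rows) for i in range(n_cols)})

def time_reverse(df, repeat=3, number=10):
    """Best observed time in seconds for one df[df.columns[::-1]] call."""
    timer = timeit.Timer(lambda: df[df.columns[::-1]])
    return min(timer.repeat(repeat=repeat, number=number)) / number

if __name__ == "__main__":
    # Arbitrary sizes: vary the rows with columns fixed, then the columns with rows fixed.
    for n_rows, n_cols in [(10_000, 3), (100_000, 3), (100_000, 30)]:
        seconds = time_reverse(make_df(n_rows, n_cols))
        print(f"rows={n_rows:>7} cols={n_cols:>3}: {seconds * 1e3:.3f} ms per call")
```

If the time also grows roughly in step with the column count at a fixed row count, that would support the O(m * n) estimate; results will depend on the pandas version and on how the columns are stored internally.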