Problem description
I have the following numpy array:
import numpy as np
arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])
I understand that np.sum(arr, axis=0) gives the result:
array([ 40, 28, 36, 34, 16012])
What I would like to do (without a for loop) is sum the columns, grouped by the value in the last column, so that the result is:
array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
I realize that it may be a stretch to do this without a loop, but I am hoping for the best...
If a for loop must be used, then how would that work?
I tried np.sum(arr[:, 4]==2000, axis=0) (where I would substitute 2000 with the variable from the for loop); however, it gave a result of 2.
Recommended answer
You can do this in pure numpy using a clever application of np.diff and np.add.reduceat, provided that rows sharing the same last-column value are adjacent, as they are here. np.diff of the rightmost column is nonzero wherever that column changes:
d = np.diff(arr[:, -1])
np.where will convert the nonzero entries of d into the integer indices that np.add.reduceat expects:
d = np.where(d)[0]
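To make the two steps above concrete, here is a small sketch that prints the intermediate values for the array from the question. The names last and change are introduced only for this illustration and are not part of the original answer.

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

last = arr[:, -1]        # the grouping column (illustrative name)
d = np.diff(last)        # d[i] = last[i + 1] - last[i]; nonzero where the value changes
change = np.where(d)[0]  # positions of the nonzero entries of d (illustrative name)

print(d)       # [0 1 0 1 0 1 0]
print(change)  # [1 3 5]
```

Note that d has one element fewer than the number of rows, and that a value of 1 at position i means the change happens between row i and row i + 1.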
reduceat also expects to see a zero index, and every index needs to be shifted by one, because entry i of np.diff compares rows i and i + 1:
indices = np.r_[0, d + 1]
Using np.r_ here is a bit more convenient than np.concatenate because it accepts scalars directly. The sum then becomes:
result = np.add.reduceat(arr, indices, axis=0)
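If the behavior of np.add.reduceat is unfamiliar, a one-dimensional example may help. This is only an illustrative sketch on made-up data (the array x and the name starts are not from the original answer): each index marks the start of a segment, and each segment runs up to the next index, or to the end of the array for the last one.

```python
import numpy as np

x = np.arange(8)              # [0 1 2 3 4 5 6 7], made-up data for illustration
starts = np.array([0, 2, 4, 6])  # segment start positions (illustrative name)

# Sums x[0:2], x[2:4], x[4:6] and x[6:] respectively.
segment_sums = np.add.reduceat(x, starts)

print(segment_sums)  # [ 1  5  9 13]
```

With axis=0 on a two-dimensional array, the same segmentation is applied to rows, so each output row is the sum of one block of consecutive input rows.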
This can be combined into a one-liner, of course:
>>> result = np.add.reduceat(arr, np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1], axis=0)
>>> result
array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
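On the for-loop part of the question: np.sum(arr[:, 4]==2000, axis=0) returns 2 because it sums the boolean mask itself, which counts the two matching rows, instead of using the mask to select rows. Below is a hedged sketch of a loop version, together with a variant of the reduceat approach that first sorts by the last column, in case equal values are not adjacent. The function names sum_by_key_loop and sum_by_key_sorted are hypothetical helpers written for this illustration, not part of numpy, and both assume a non-empty two-dimensional array.

```python
import numpy as np

def sum_by_key_loop(a):
    """Loop version: select the rows for each distinct key, then sum them."""
    a = np.asarray(a)
    keys = np.unique(a[:, -1])  # sorted distinct values of the last column
    # a[:, -1] == k is a boolean row mask; index with it, then sum the selected rows.
    return np.array([a[a[:, -1] == k].sum(axis=0) for k in keys])

def sum_by_key_sorted(a):
    """reduceat version that does not require equal keys to be adjacent."""
    a = np.asarray(a)
    order = np.argsort(a[:, -1], kind="stable")  # bring equal keys together
    a = a[order]
    starts = np.r_[0, np.where(np.diff(a[:, -1]))[0] + 1]
    return np.add.reduceat(a, starts, axis=0)

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

loop_result = sum_by_key_loop(arr)
sorted_result = sum_by_key_sorted(arr[::-1])  # reversed rows, to show that row order does not matter
```

Both results should match the array shown above, with rows ordered by increasing key. The loop scans the whole array once per distinct key, so the reduceat version is likely the better choice when there are many groups.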