Avoiding the default start argument behavior of Python's sum()

Question

I am working with a Python object that implements __add__ but does not subclass int. MyObj1 + MyObj2 works fine, but sum([MyObj1, MyObj2]) raises a TypeError, because sum() first attempts 0 + MyObj1. To use sum(), my object needs an __radd__ that handles 0 + MyObj, or I need to provide an empty object as the start parameter. The object in question is not designed to be empty.
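The failure can be reproduced with a small stand-in class. This is only a sketch: Loc and sum_like are hypothetical names, not part of the module being discussed, and sum_like is a rough pure-Python model of what the built-in does, not its actual implementation.

```python
class Loc:
    """Hypothetical stand-in for the question's object: supports Loc + Loc only."""

    def __init__(self, *spans):
        self.spans = list(spans)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented  # lets Python raise TypeError for Loc + int
        return Loc(*self.spans, *other.spans)


def sum_like(iterable, start=0):
    """Rough model of the built-in sum(): fold everything onto `start`."""
    total = start
    for item in iterable:
        total = total + item  # the first iteration evaluates start + item
    return total


a, b = Loc((3, 6)), Loc((10, 13))
combined = a + b  # fine: Loc.__add__ handles it

try:
    sum([a, b])  # evaluates 0 + a first, which neither int nor Loc supports
except TypeError as exc:
    error = exc
```

Because the accumulator starts at the integer 0, the very first addition is int + Loc, and that is the operation that fails.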

Before anyone asks, the object is not list-like or string-like, so using join() or itertools would not help.

Edit for details: the module has a SimpleLocation and a CompoundLocation. I'll abbreviate Location to Loc. A SimpleLoc contains one right-open interval, i.e. [start, end). Adding two SimpleLocs yields a CompoundLoc, which contains a list of the intervals, e.g. [[3, 6), [10, 13)]. End uses include iterating through the union, e.g. [3, 4, 5, 10, 11, 12], checking length, and checking membership.

The numbers can be relatively large (say, smaller than 2^32 but commonly around 2^20). The intervals probably won't be extremely long (100-2000, but could be longer). Currently, only the endpoints are stored. I am now tentatively thinking of subclassing set so that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits.

Questions I've looked at:

python's sum() and non-integer values
why there's a start argument in python's built-in sum function
TypeError after overriding the __add__ method

I'm considering two solutions. One is to avoid sum() and use the loop offered in this comment. I don't understand why sum() begins by adding the 0th item of the iterable to 0 rather than adding the 0th and 1st items (like the loop in the linked comment); I hope there's an arcane integer optimization reason.

My other solution is as follows; while I don't like the hard-coded zero check, it's the only way I've been able to make sum() work.

# ...
def __radd__(self, other):
    # This allows sum() to work (the default start value is zero)
    if other == 0:
        return self
    return self.__add__(other)
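To see the zero check in context, here is a minimal runnable sketch. The Loc class is a hypothetical stand-in for the real object, and this variant returns NotImplemented for any other left operand instead of delegating to __add__, which is a small deviation from the snippet above.

```python
class Loc:
    """Hypothetical stand-in: a bag of right-open (start, end) spans."""

    def __init__(self, *spans):
        self.spans = list(spans)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented
        return Loc(*self.spans, *other.spans)

    def __radd__(self, other):
        # sum() starts from the int 0, so the first step, 0 + loc, lands here.
        if isinstance(other, int) and other == 0:
            return self
        return NotImplemented


total = sum([Loc((3, 6)), Loc((10, 13)), Loc((20, 25))])

# Caveat: with no items there is nothing to call __radd__ on,
# so the result is the plain int 0 rather than a Loc.
empty_total = sum([])
```

The check makes sum() work for non-empty inputs, but an empty input still yields an integer, which callers may need to handle.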

In summary, is there another way to use sum() on objects that can neither be added to integers nor be empty?

Recommended answer

Instead of sum, use:

import operator
from functools import reduce
reduce(operator.add, seq)

In Python 2, reduce was a built-in, so this looks like:

import operator
reduce(operator.add, seq)

reduce is generally more flexible than sum - you can provide any binary function, not only add, and you can optionally provide an initial element, while sum always uses one.
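A short sketch of those differences, using a hypothetical Loc class as a stand-in for the asker's object (the class and variable names are illustrative only):

```python
import operator
from functools import reduce


class Loc:
    """Hypothetical stand-in: supports Loc + Loc and nothing else."""

    def __init__(self, *spans):
        self.spans = list(spans)

    def __add__(self, other):
        if not isinstance(other, Loc):
            return NotImplemented
        return Loc(*self.spans, *other.spans)


a, b, c = Loc((3, 6)), Loc((10, 13)), Loc((20, 25))

# No initial element: the fold starts with a + b, so no 0 is ever involved.
combined = reduce(operator.add, [a, b, c])

# Any binary function works, not only addition.
product = reduce(operator.mul, [1, 2, 3, 4])

# A one-item sequence is returned unchanged.
single = reduce(operator.add, [a])

# The price of omitting the initial element: an empty sequence is an error.
try:
    reduce(operator.add, [])
except TypeError as exc:
    empty_error = exc
```

If empty inputs are possible, reduce needs either an explicit initial element or a guard around the call.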

Furthermore: (warning: maths rant ahead)

Providing support for add on objects that have no neutral element is a bit awkward from the algebraic point of view.

Note that all of:

Naturals
Reals
Complex numbers
N-d vectors
NxM matrices
Strings

together with addition form a monoid - i.e. the operation is associative and has some kind of neutral element.

If your operation isn't associative or doesn't have a neutral element, then it doesn't "resemble" addition. Hence, don't expect it to work well with sum.

In such a case, you might be better off using a function or a method instead of an operator. This may be less confusing, since the users of your class, seeing that it supports +, are likely to expect it to behave in a monoidal way (as addition normally does).

Thanks for expanding; I'll refer to your particular module now:

There are two concepts here:

Simple locations,
Compound locations.

It indeed makes sense that simple locations can be added, but they don't form a monoid, because their addition doesn't satisfy the basic property of closure - the sum of two SimpleLocs isn't a SimpleLoc. It is, in general, a CompoundLoc.

OTOH, CompoundLocs with addition look like a monoid to me (a commutative monoid, while we're at it): a sum of CompoundLocs is a CompoundLoc too, their addition is associative and commutative, and the neutral element is an empty CompoundLoc that contains zero SimpleLocs.

If you agree with me (and the above matches your implementation), then you'll be able to use sum as follows:

sum([SimpleLoc1, SimpleLoc2, SimpleLoc3], CompoundLoc())

Indeed, this seems to work.
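As a rough illustration of why an empty compound location works as the start value, here is a sketch with hypothetical SimpleLoc and CompoundLoc classes. The real module's constructors and attributes are not shown in the thread, so every name below is an assumption, and the sketch simply concatenates parts without sorting or merging them.

```python
class SimpleLoc:
    """Hypothetical: one right-open interval [start, end)."""

    def __init__(self, start, end):
        self.start, self.end = start, end

    def __add__(self, other):
        # Promote to a CompoundLoc and let it handle the addition.
        return CompoundLoc([self]) + other


class CompoundLoc:
    """Hypothetical: zero or more SimpleLocs; the empty one is the neutral element."""

    def __init__(self, parts=()):
        self.parts = list(parts)

    def __add__(self, other):
        if isinstance(other, SimpleLoc):
            return CompoundLoc(self.parts + [other])
        if isinstance(other, CompoundLoc):
            return CompoundLoc(self.parts + other.parts)
        return NotImplemented

    def intervals(self):
        return [(part.start, part.end) for part in self.parts]


locs = [SimpleLoc(3, 6), SimpleLoc(10, 13), SimpleLoc(20, 25)]

# The start value is passed positionally, which works on every Python version.
total = sum(locs, CompoundLoc())

# With an empty input the result is still a CompoundLoc, not the int 0.
nothing = sum([], CompoundLoc())
```

Unlike the zero check in __radd__, this keeps the result type consistent even when the input is empty.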

"I am now tentatively thinking of attempting to subclass set such that the location is constructed as set(xrange(start, end)). However, adding sets will give Python (and mathematicians) fits."

Well, locations are sets of numbers, so it makes sense to put a set-like interface on top of them (so __contains__, __iter__, __len__, perhaps __or__ as an alias of +, __and__ as the product, etc.).
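One way such a set-like interface might look is sketched below. This is not the module's actual code: the CompoundLoc class, its intervals attribute, and the choice to merge overlapping intervals in the constructor are all assumptions made for illustration, and __and__ is left out for brevity.

```python
class CompoundLoc:
    """Hypothetical set-like container of right-open [start, end) intervals."""

    def __init__(self, intervals=()):
        # Sort and merge overlapping or touching intervals so the stored
        # pairs are disjoint; empty intervals are dropped.
        merged = []
        for start, end in sorted(intervals):
            if start >= end:
                continue
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        self.intervals = merged

    def __contains__(self, value):
        return any(start <= value < end for start, end in self.intervals)

    def __iter__(self):
        for start, end in self.intervals:
            yield from range(start, end)

    def __len__(self):
        # Valid because the stored intervals are disjoint.
        return sum(end - start for start, end in self.intervals)

    def __or__(self, other):
        if not isinstance(other, CompoundLoc):
            return NotImplemented
        return CompoundLoc(self.intervals + other.intervals)

    __add__ = __or__  # "+" and "|" both mean union in this sketch


loc = CompoundLoc([(3, 6)]) | CompoundLoc([(10, 13)])
members = list(loc)
```

Only the endpoints are stored, so membership and length checks stay cheap even when the intervals are long.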

As for construction from xrange, do you really need it? If you know that you're storing sets of intervals, then you're likely to save space by sticking to your representation of [start, end) pairs. If you feel it's going to help, you could add a utility method that takes an arbitrary sequence of integers and translates it to an optimal SimpleLoc or CompoundLoc.
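A utility of that kind might look like the following sketch. The function name intervals_from_ints is hypothetical, and it returns plain (start, end) tuples rather than SimpleLoc or CompoundLoc objects, since the real constructors are not shown in the thread.

```python
def intervals_from_ints(values):
    """Collapse integers into sorted, disjoint right-open (start, end) pairs."""
    intervals = []
    for value in sorted(set(values)):
        if intervals and intervals[-1][1] == value:
            # value extends the current run by one.
            intervals[-1] = (intervals[-1][0], value + 1)
        else:
            # value starts a new run.
            intervals.append((value, value + 1))
    return intervals


pairs = intervals_from_ints([3, 4, 5, 10, 11, 12])
```

A single resulting pair would correspond to a SimpleLoc, and several pairs to a CompoundLoc.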