Python multiprocessing lock issue

Problem description

I want to merge a list of dicts into a single dict, summing the values of shared keys, using Python's multiprocessing module.

Here is a simplified version of my code:

#!/usr/bin/python2.7
# -*- coding: utf-8 -*-

import multiprocessing
import functools
import time

def merge(lock, d1, d2):
    time.sleep(5) # some time consuming stuffs
    with lock:
        for key in d2.keys():
            if d1.has_key(key):
                d1[key] += d2[key]
            else:
                d1[key] = d2[key]

l = [{ x % 10 : x } for x in range(10000)]
lock = multiprocessing.Lock()
d = multiprocessing.Manager().dict()

partial_merge = functools.partial(merge, d1 = d, lock = lock)

pool_size = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes = pool_size)
pool.map(partial_merge, l)
pool.close()
pool.join()

print d

Running this script raises the error below. How can I resolve it?

RuntimeError: Lock objects should only be shared between processes through inheritance
Is the lock in the merge function needed in this case, or will Python take care of it?
I think map is meant to map the items of one list to another list, not to dump everything from one list into a single object. Is there a more elegant way to do this?

Solution

The following should run cross-platform (including on Windows) in both Python 2 and 3. It uses a process pool initializer to set the manager dict as a global in each child process.

FYI:

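Each individual operation on a manager dict is carried out in the manager process, so a single read or write needs no lock. A compound update such as d1[key] += d2[key] is a separate read and write, though, so concurrent updates to the same key can still interleave; guard those with a lock created by the manager (Manager().Lock()) if exact totals matter.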
The number of processes in a Pool defaults to the CPU count.
If you're not interested in the result, you can use apply_async instead of map.
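
As an aside on the first note, before the script itself: the RuntimeError in the question comes from pickling a multiprocessing.Lock() into the task arguments. Below is a minimal sketch of one alternative, assuming a lock created by the manager (a proxy object, which can be pickled) and a merge signature reordered so that pool.map can supply d2 positionally. The main() wrapper and the item list are illustrative and not taken from the question.

```python
import functools
import multiprocessing


def merge(d2, d1, lock):
    """Add the values of d2 into the shared dict d1, key by key."""
    # Reading d1[key] and writing it back are two separate calls on the
    # proxy, so the lock keeps other workers from updating in between.
    with lock:
        for key, value in d2.items():
            d1[key] = d1.get(key, 0) + value


def main():
    manager = multiprocessing.Manager()
    d = manager.dict()
    # Assumption: a manager lock replaces multiprocessing.Lock(); it is a
    # proxy and can therefore be sent to pool workers as an argument.
    lock = manager.Lock()

    items = [{x % 5: x} for x in range(100)]
    pool = multiprocessing.Pool()
    # d2 is the first parameter, so map fills it with each list item.
    pool.map(functools.partial(merge, d1=d, lock=lock), items)
    pool.close()
    pool.join()
    return dict(d)


if __name__ == '__main__':
    print(main())
```

Holding the lock for the whole merge serializes the updates, so any slow work is best kept outside the with block.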

import multiprocessing
import time

def merge(d2):
    time.sleep(1) # some time consuming stuffs
    for key in d2.keys():
        if key in d1:
            d1[key] += d2[key]
        else:
            d1[key] = d2[key]

def init(d):
    global d1
    d1 = d

if __name__ == '__main__':

    d1 = multiprocessing.Manager().dict()
    pool = multiprocessing.Pool(initializer=init, initargs=(d1, ))

    l = [{ x % 5 : x } for x in range(10)]

    for item in l:
        pool.apply_async(merge, (item,))

    pool.close()
    pool.join()

    print(l)
    print(d1)
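
On the question of a more elegant approach: one option, sketched here as an assumption rather than as part of the script above, is to let each worker return its result and do the summing in the parent process, so no shared state or lock is needed. process() and combine() are hypothetical names, and process() simply returns its input as a stand-in for the real time-consuming work.

```python
import multiprocessing


def process(d):
    """Stand-in for the time-consuming per-item work; returns a dict."""
    return d


def combine(dicts):
    """Sum the values of an iterable of dicts, key by key."""
    total = {}
    for d in dicts:
        for key, value in d.items():
            total[key] = total.get(key, 0) + value
    return total


def main():
    items = [{x % 5: x} for x in range(10)]
    pool = multiprocessing.Pool()
    try:
        # imap_unordered yields each result as soon as a worker finishes it;
        # the order does not matter because addition is commutative.
        result = combine(pool.imap_unordered(process, items))
    finally:
        pool.close()
        pool.join()
    return result


if __name__ == '__main__':
    print(main())
```

This keeps map in its usual role of turning one list into another, and the reduction to a single dict happens in one process only.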
