Problem description
I want to add up a list of dicts, key by key, using Python's multiprocessing module.
Here is a simplified version of my code:
```python
#!/usr/bin/python2.7
# -*- coding: utf-8 -*-

import multiprocessing
import functools
import time

def merge(lock, d1, d2):
    time.sleep(5)  # some time consuming stuffs
    with lock:
        for key in d2.keys():
            if d1.has_key(key):
                d1[key] += d2[key]
            else:
                d1[key] = d2[key]

l = [{ x % 10 : x } for x in range(10000)]
lock = multiprocessing.Lock()
d = multiprocessing.Manager().dict()

partial_merge = functools.partial(merge, d1 = d, lock = lock)

pool_size = multiprocessing.cpu_count()
pool = multiprocessing.Pool(processes = pool_size)
pool.map(partial_merge, l)
pool.close()
pool.join()

print d
```
I get this error when running the script. How can I resolve it?
RuntimeError: Lock objects should only be shared between processes through inheritance
Is the lock in the merge function needed in this case, or will Python take care of it?
I think map is supposed to map items from one list to another list, not dump everything from one list into a single object. So is there a more elegant way to do this?
The following should run cross-platform (i.e. on Windows, too) in both Python 2 and 3. It uses a process pool initializer to set the manager dict as a global in each child process.
FYI:
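- Individual operations on a manager dict are synchronized by the manager process, so no lock is needed to keep the dict itself consistent. Note that d1[key] += d2[key] is a separate read and write, so concurrent updates to the same key can still be lost unless that step is guarded by a lock.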
- The number of processes in a Pool defaults to the CPU count.
- If you're not interested in the result, you can use apply_async instead of map.
```python
import multiprocessing
import time

def merge(d2):
    time.sleep(1)  # some time consuming stuffs
    for key in d2.keys():
        if key in d1:
            d1[key] += d2[key]
        else:
            d1[key] = d2[key]

def init(d):
    global d1
    d1 = d

if __name__ == '__main__':
    d1 = multiprocessing.Manager().dict()
    pool = multiprocessing.Pool(initializer=init, initargs=(d1, ))

    l = [{ x % 5 : x } for x in range(10)]

    for item in l:
        pool.apply_async(merge, (item,))

    pool.close()
    pool.join()

    print(l)
    print(d1)
```
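If the compound update d1[key] += d2[key] must never lose a concurrent write, a lock can still be used. The RuntimeError from the question comes from passing the lock as a task argument. A minimal sketch, assuming the same initializer approach, hands the lock to the workers through initargs instead. The names merge_all, shared_dict and shared_lock are illustrative and not part of the original code.

```python
import multiprocessing


def init(shared_dict, shared_lock):
    # Runs once in every worker: keep the shared objects as module globals.
    global d1, lock
    d1 = shared_dict
    lock = shared_lock


def merge(d2):
    # Hold the lock across the whole read-modify-write so no update is lost.
    with lock:
        for key, value in d2.items():
            d1[key] = d1.get(key, 0) + value


def merge_all(dicts):
    # Hypothetical helper: sums the dicts key by key and returns a plain dict.
    manager = multiprocessing.Manager()
    try:
        shared = manager.dict()
        shared_lock = multiprocessing.Lock()
        # The lock reaches the workers at process creation, not as a task argument.
        pool = multiprocessing.Pool(initializer=init,
                                    initargs=(shared, shared_lock))
        pool.map(merge, dicts)
        pool.close()
        pool.join()
        return shared.copy()  # copy() on the proxy returns an ordinary dict
    finally:
        manager.shutdown()


if __name__ == '__main__':
    print(merge_all([{x % 5: x} for x in range(10)]))
```

Because the lock is held for the whole update, only the time-consuming part outside the with block would actually run in parallel.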
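On the question of whether map is the right tool: one alternative is to let map do what it is meant for, returning one result per input, and to add the results up in the parent process. The sketch below assumes the time-consuming step can return its partial result as a dict. The names work and merge_results are hypothetical. With this shape neither a shared dict nor a lock is involved.

```python
import multiprocessing
from collections import Counter


def work(d2):
    # Placeholder for the time-consuming step; here it returns its input as is.
    return d2


def merge_results(dicts):
    # Hypothetical helper: map each item in a worker, then reduce in the parent.
    pool = multiprocessing.Pool()
    try:
        partials = pool.map(work, dicts)
    finally:
        pool.close()
        pool.join()
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values instead of replacing
    return dict(total)


if __name__ == '__main__':
    print(merge_results([{x % 5: x} for x in range(10)]))
```

The trade-off is that every partial result is pickled and sent back to the parent, which is usually cheap for small dicts but worth measuring for large ones.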