问题描述
我的这段代码看起来像这样:
The piece of code that I have looks some what like this:
glbl_array = # a 3 Gb array def my_func( args, def_param = glbl_array): #do stuff on args and def_param if __name__ == '__main__': pool = Pool(processes=4) pool.map(my_func, range(1000))
有没有办法确保(或鼓励)不同的进程不会获得 glbl_array 的副本而是共享它.如果无法停止复制,我将使用 memmapped 数组,但我的访问模式不是很规律,所以我希望 memmapped 数组更慢.以上似乎是首先要尝试的.这是在 Linux 上.我只是想从 Stackoverflow 获得一些建议,不想惹恼系统管理员.如果第二个参数是真正的不可变对象,如 glbl_array.tostring(),您认为会有帮助吗?
Is there a way to make sure (or encourage) that the different processes does not get a copy of glbl_array but shares it. If there is no way to stop the copy I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stackoverflow and do not want to annoy the sysadmin. Do you think it will help if the the second parameter is a genuine immutable object like glbl_array.tostring().
推荐答案
您可以很容易地将 multiprocessing 中的共享内存内容与 Numpy 一起使用:
You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:
import multiprocessing import ctypes import numpy as np shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10) shared_array = np.ctypeslib.as_array(shared_array_base.get_obj()) shared_array = shared_array.reshape(10, 10) #-- edited 2015-05-01: the assert check below checks the wrong thing # with recent versions of Numpy/multiprocessing. That no copy is made # is indicated by the fact that the program prints the output shown below. ## No copy was made ##assert shared_array.base.base is shared_array_base.get_obj() # Parallel processing def my_func(i, def_param=shared_array): shared_array[i,:] = i if __name__ == '__main__': pool = multiprocessing.Pool(processes=4) pool.map(my_func, range(10)) print shared_array
打印
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.] [ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.] [ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.] [ 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.] [ 4. 4. 4. 4. 4. 4. 4. 4. 4. 4.] [ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.] [ 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.] [ 7. 7. 7. 7. 7. 7. 7. 7. 7. 7.] [ 8. 8. 8. 8. 8. 8. 8. 8. 8. 8.] [ 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]]
但是,Linux 在 fork() 上有写时复制语义,所以即使不使用 multiprocessing.Array,除非写入数据,否则数据不会被复制到.
However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.