Problem description
Because of limited RAM, I followed these instructions and built a generator that draws small batches and passes them to Keras's fit_generator. But Keras cannot prepare the queue with multiprocessing, even though my class inherits from Sequence.
Here is my generator for multiprocessing:
    class My_Generator(Sequence):
        def __init__(self, image_filenames, labels, batch_size):
            self.image_filenames, self.labels = image_filenames, labels
            self.batch_size = batch_size

        def __len__(self):
            return np.ceil(len(self.image_filenames) / float(self.batch_size))

        def __getitem__(self, idx):
            batch_x = self.image_filenames[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]

            return np.array([
                resize(imread(file_name), (200, 200))
                for file_name in batch_x]), np.array(batch_y)
Main function:
    batch_size = 100
    num_epochs = 10

    # Names match the lists that the directory walk appends to.
    training_filenames = []
    mask_training = []
    validation_filenames = []
    mask_validation = []
I would like the generator to read batches from the folders separately, in different threads, by ID (IDs look like {number}.csv for raw images and {number}_label.csv for mask images). I initially built another, more elegant class that stored all the data in a single .h5 file instead of a directory, but it hit the same problem. So if you have code that does this, I would welcome it too.
    for dirpath, _, fnames in os.walk('./train/'):
        for fname in fnames:
            if 'label' not in fname:
                training_filenames.append(os.path.abspath(os.path.join(dirpath, fname)))
            else:
                mask_training.append(os.path.abspath(os.path.join(dirpath, fname)))

    for dirpath, _, fnames in os.walk('./validation/'):
        for fname in fnames:
            if 'label' not in fname:
                validation_filenames.append(os.path.abspath(os.path.join(dirpath, fname)))
            else:
                mask_validation.append(os.path.abspath(os.path.join(dirpath, fname)))

    my_training_batch_generator = My_Generator(training_filenames, mask_training, batch_size)
    my_validation_batch_generator = My_Generator(validation_filenames, mask_validation, batch_size)

    num_training_samples = len(training_filenames)
    num_validation_samples = len(validation_filenames)
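The walk above appends images and masks in whatever order the file system returns them, so the two lists are not guaranteed to line up. A minimal sketch of pairing them by ID instead, assuming the {number}.csv / {number}_label.csv naming described above; `pair_by_id` is a hypothetical helper, not part of Keras:

```python
import os


def pair_by_id(root):
    """Return two aligned lists: image paths and their mask paths.

    Assumes the naming scheme from the question: '{number}.csv' for a raw
    image and '{number}_label.csv' for its mask. Files without a partner
    are skipped.
    """
    images, masks = {}, {}
    for dirpath, _, fnames in os.walk(root):
        for fname in fnames:
            stem, ext = os.path.splitext(fname)
            if ext != '.csv':
                continue
            path = os.path.abspath(os.path.join(dirpath, fname))
            if stem.endswith('_label'):
                masks[stem[:-len('_label')]] = path
            else:
                images[stem] = path
    # Keep only IDs that have both files, in a deterministic (string-sorted) order.
    ids = sorted(images.keys() & masks.keys())
    return [images[i] for i in ids], [masks[i] for i in ids]
```

The two returned lists could then be passed to the generator in place of the lists built by the loops above.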
The model itself is out of scope here. I don't believe the model is the problem, so I won't paste it.
    # compile() returns None, so fit_generator is called on the model itself.
    model.compile(...)
    model.fit_generator(generator=my_training_batch_generator,
                        steps_per_epoch=(num_training_samples // batch_size),
                        epochs=num_epochs,
                        verbose=1,
                        validation_data=None,  # my_validation_batch_generator,
                        # validation_steps=(num_validation_samples // batch_size),
                        use_multiprocessing=True,
                        workers=4,
                        max_queue_size=2)
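Note that steps_per_epoch here uses floor division while __len__ in the generator rounds up, so the two can differ by one batch when the sample count is not a multiple of batch_size. A small illustration with made-up sample counts (`step_counts` is a hypothetical helper):

```python
import math


def step_counts(num_samples, batch_size):
    """Return (floor_steps, ceil_steps) for a dataset of num_samples items."""
    floor_steps = num_samples // batch_size           # drops a partial last batch
    ceil_steps = math.ceil(num_samples / batch_size)  # counts the partial batch
    return floor_steps, ceil_steps


for n in (200, 250):  # hypothetical dataset sizes
    print(n, step_counts(n, 100))
```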
The error says that the class I created is not an iterator:
    Traceback (most recent call last):
      File "test.py", line 141, in <module>
        max_queue_size=2)
      File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 2177, in fit_generator
        initial_epoch=initial_epoch)
      File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_generator.py", line 147, in fit_generator
        generator_output = next(output_generator)
      File "/anaconda3/lib/python3.6/site-packages/tensorflow/python/keras/utils/data_utils.py", line 831, in get
        six.reraise(value.__class__, value, value.__traceback__)
      File "/anaconda3/lib/python3.6/site-packages/six.py", line 693, in reraise
        raise value
    TypeError: 'My_Generator' object is not an iterator
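The last line of the traceback is the message Python raises when next() is called on an object that has no __next__ method. A minimal reproduction without Keras, using a hypothetical stand-in class that defines only __len__ and __getitem__, as My_Generator does:

```python
class IndexOnly:
    """Hypothetical stand-in: indexable like a Sequence, but not an iterator."""

    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def __getitem__(self, idx):
        return self.items[idx]


batches = IndexOnly(['batch0', 'batch1'])

try:
    next(batches)  # next() needs __next__, which this class lacks
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# iter() still works, because Python falls back to __getitem__ for iteration.
first = next(iter(batches))
```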
Recommended answer
I had the same problem and managed to solve it by defining a __next__ method:
    class My_Generator(Sequence):
        def __init__(self, image_filenames, labels, batch_size):
            self.image_filenames, self.labels = image_filenames, labels
            self.batch_size = batch_size
            self.n = 0                   # index of the next batch to return
            self.max = self.__len__()    # number of batches per pass

        def __len__(self):
            # int() because len() rejects the float that np.ceil returns
            return int(np.ceil(len(self.image_filenames) / float(self.batch_size)))

        def __getitem__(self, idx):
            batch_x = self.image_filenames[idx * self.batch_size:(idx + 1) * self.batch_size]
            batch_y = self.labels[idx * self.batch_size:(idx + 1) * self.batch_size]

            return np.array([
                resize(imread(file_name), (200, 200))
                for file_name in batch_x]), np.array(batch_y)

        def __next__(self):
            # Wrap around to the first batch after the last one
            if self.n >= self.max:
                self.n = 0
            result = self.__getitem__(self.n)
            self.n += 1
            return result
Note that I declared two new variables, self.n and self.max, in the __init__ function.
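To see what the added __next__ does in isolation, here is a minimal sketch without Keras or image loading. `CyclingBatches` is a hypothetical stand-in that slices a plain list the same way and wraps around after the last batch:

```python
import math


class CyclingBatches:
    """Hypothetical stand-in for the generator above, batching a plain list."""

    def __init__(self, items, batch_size):
        self.items = items
        self.batch_size = batch_size
        self.n = 0            # index of the next batch to return
        self.max = len(self)  # number of batches per pass

    def __len__(self):
        return math.ceil(len(self.items) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        return self.items[start:start + self.batch_size]

    def __next__(self):
        if self.n >= self.max:
            self.n = 0        # start over after the last batch
        result = self[self.n]
        self.n += 1
        return result


batches = CyclingBatches(list(range(5)), batch_size=2)
seen = [next(batches) for _ in range(4)]
print(seen)
```

The fourth call returns the first batch again, which is the wrap-around behaviour the two new variables provide.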