Daniel Gafni
03/14/2023, 4:54 PM
AttributeError: Can't pickle local object 'SeqDataset.make_collate_fn.<locals>.collate_fn'
This is happening inside the torch.DataLoader. I'm using multiprocessing and num_workers > 0, but this happens even with num_workers=0.
This did not happen outside of Dagster, so I assume it has something to do with Dagster's Multiprocess Executor. Sadly, my understanding of multiprocessing is not the best, so I'm stuck here. I was unable to google anything relevant. Would appreciate any help...
The last lines of the error:
  File "/usr/lib/python3.10/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.10/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.10/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'SeqDataset.make_collate_fn.<locals>.collate_fn'
yuhan
03/17/2023, 12:47 AM
execution:
  config:
    in_process: null
Daniel Gafni
03/17/2023, 7:25 AM
sandy
03/17/2023, 4:47 PM
"in production K8s pods it won’t matter, right?"
Right.
Btw, I suspect what's going on here is that the asset function is accessing something defined at a scope outside the asset function. When the Python multiprocessing library starts a new process, it needs to bring that object into the new process, so it tries to pickle it, but that thing isn't pickle-able.
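For anyone reading along, the pickling failure can be reproduced without Dagster or torch. The snippet below is only a sketch: make_collate_fn here is a stand-in with the same shape as the name in the error message, and pad_value and can_pickle are hypothetical names made up for illustration. It shows that a function defined inside another function cannot be pickled, because pickle stores functions by qualified name and a name containing "<locals>" cannot be looked up again.

```python
import pickle


def make_collate_fn(pad_value):
    # Stand-in for the method named in the traceback (hypothetical body).
    # collate_fn is a closure: its qualified name is
    # "make_collate_fn.<locals>.collate_fn", which pickle cannot resolve.
    def collate_fn(batch):
        return [item + [pad_value] for item in batch]

    return collate_fn


def can_pickle(obj):
    # Hypothetical helper: report whether pickle.dumps succeeds.
    # The exception type for local objects differs between Python versions,
    # so both are caught here.
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


local_fn = make_collate_fn(0)
print("closure is picklable:", can_pickle(local_fn))
```

The closure still works fine when called in the same process; it only breaks when something tries to send it to another process by pickling it.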
Daniel Gafni
03/17/2023, 5:53 PM
I have a make_collate_fn method that has a def collate_fn inside. It's a closure over some variables.
I wonder why this isn't an issue for torch's multiprocessing?
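For reference, one common way to make a closure like this picklable is to move the captured variables into a callable class defined at module level. This is a sketch under assumptions, not the actual code from this thread: PadCollate, pad_value, and the padding logic are hypothetical. An instance of such a class could then be passed as collate_fn= to the DataLoader.

```python
import pickle


class PadCollate:
    """Hypothetical picklable replacement for a closure-based collate_fn."""

    def __init__(self, pad_value=0):
        # State that the closure used to capture now lives on the instance.
        self.pad_value = pad_value

    def __call__(self, batch):
        # Pad every sequence in the batch to the length of the longest one.
        max_len = max(len(seq) for seq in batch)
        return [
            list(seq) + [self.pad_value] * (max_len - len(seq))
            for seq in batch
        ]


collate = PadCollate(pad_value=-1)

# The instance survives a pickle round trip because its class is
# importable by name from the module where it is defined.
restored = pickle.loads(pickle.dumps(collate))
padded = restored([[1, 2, 3], [4]])
print(padded)
```

functools.partial over a module-level function is another option with the same effect, as long as the bound arguments are themselves picklable.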
I refactored the function and it's working now.
sandy
03/17/2023, 6:11 PM