Mike Davison

11/24/2021, 8:35 PM
I've got a question regarding module resolution within an op that uses `multiprocessing.Pool`.

Given this scenario:
• dagster or dagit is running in WorkingDirectoryA with VirtualEnvA
• `my_job` is configured in workspace.yaml to run in WorkingDirectoryB with VirtualEnvB
• `my_job` runs an op which executes `some_module.some_method` in parallel via `multiprocessing.Pool`
• `some_module` is in WorkingDirectoryB/some_module.py (no package)

Actual behaviour:
• dagit or dagster looks for `some_module` in WorkingDirectoryA and VirtualEnvA
  ◦ when WorkingDirectoryA == WorkingDirectoryB, everything is great
  ◦ when WorkingDirectoryA != WorkingDirectoryB, we fail to find `some_module` with `ModuleNotFoundError`

Desired behaviour:
• dagit or dagster looks for `some_module` in WorkingDirectoryB and VirtualEnvB
• dagit or dagster does not look for `some_module` in WorkingDirectoryA and VirtualEnvA

Is there a way to get the desired behaviour above to happen? I'll reply to this with some code to illustrate.
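
For readers following along, here is a minimal sketch of the general Python mechanism behind this kind of `ModuleNotFoundError`. It assumes the failure comes down to WorkingDirectoryB not being on `sys.path` in the process that performs the import. It does not reproduce how dagster or dagit launch processes, and the temporary directory and the module name `demo_some_module` are hypothetical stand-ins for WorkingDirectoryB and `some_module`.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def try_import(name):
    """Return the imported module, or None if Python cannot find it."""
    importlib.invalidate_caches()
    try:
        return importlib.import_module(name)
    except ModuleNotFoundError:
        return None


# Stand-in for WorkingDirectoryB: a directory holding a loose module file.
working_dir_b = Path(tempfile.mkdtemp())

# Hypothetical module name, chosen to avoid clashing with anything installed.
(working_dir_b / "demo_some_module.py").write_text(
    "def some_method(x):\n    return x * 2\n"
)

# The directory is not on sys.path yet, so the import fails.
before = try_import("demo_some_module")

# Once the directory is on sys.path, the same import succeeds.
sys.path.insert(0, str(working_dir_b))
after = try_import("demo_some_module")

print("importable before sys.path change:", before is not None)
print("importable after sys.path change:", after is not None)
```

A loose module file that is not installed as a package is only importable when its directory is on `sys.path`, which is consistent with everything working when WorkingDirectoryA == WorkingDirectoryB.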

Stefan Adelbert

11/25/2021, 6:51 AM
@Mike Davison I ran into a somewhat similar issue when attempting to get a run worker to instantiate a custom logging handler. I declared the logging handler in `dagster.yaml`, something like this:
python_logs:
  dagster_handler_config:
    handlers:
      myHandler:
        (): my_logging.Handler
        level: DEBUG
        formatter: myFormatter
    formatters:
      myFormatter:
        (): my_logging.Formatter
When the run launched on the run worker, it tried to import `my_logging` but couldn't (`ModuleNotFoundError`), even though the `my_logging` module was actually there. I resolved this by installing `my_logging` as a Python package in the Docker image used for the run worker. I know this doesn't actually answer your question. I'd be keen to hear what you find out about this.

Mike Davison

11/25/2021, 8:25 PM
Thanks, @Stefan Adelbert. I am also able to get around my issue by installing the module as a package. I find that packaging is an inconvenience when the module is under active development 😢 ...certainly not the end of the world, but I'm still hoping to hear that there's another approach that would work.

Stefan Adelbert

11/28/2021, 11:24 PM
@Mike Davison I have a similar issue with several packages under active development, i.e. dagster user code (ops) which uses common functionality (like a logger). The way I'm solving that (for now) is to have the common functionality pulled into the user code codebase as a git submodule. The user code and the corresponding common functionality then get packaged together into a Docker image. This approach allows user code and common functionality to be hacked on at the same time in development, while still allowing strict packaging. Let me know if you want to know more or if you have any tips.