# ask-community
o
Hi, I am working on a Ray executor for Dagster. Everything seems to run fine when running Ray and Dagster on the same host, but if I put them both in containers with `--network host`, I get an error from Ray on the first run. On every run after that, no updates appear in the dagit UI, although log events do show up in dagit, which indicates that the steps are completing. If I restart dagit, the error returns for a single run (while using a temporary working directory). The error, which looks to be coming from Ray, is:
```
TypeError: __init__() takes 2 positional arguments but 8 were given
  File "/opt/conda/lib/python3.8/site-packages/dagster/core/execution/api.py", line 822, in pipeline_execution_iterator
    for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
  File "/app/ragster/executor_3.py", line 213, in execute
    event_or_none = next(step_iter)
  File "/app/ragster/executor_3.py", line 324, in execute_step_out_of_process
    is_finished = wait_fetch(finished)
  File "/app/ragster/executor_3.py", line 305, in wait_fetch
    return ray.get(obj)
  File "/opt/conda/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
    return getattr(ray, func.__name__)(*args, **kwargs)
  File "/opt/conda/lib/python3.8/site-packages/ray/util/client/api.py", line 43, in get
    return self.worker.get(vals, timeout=timeout)
  File "/opt/conda/lib/python3.8/site-packages/ray/util/client/worker.py", line 433, in get
    res = self._get(to_get, op_timeout)
  File "/opt/conda/lib/python3.8/site-packages/ray/util/client/worker.py", line 457, in _get
    err = cloudpickle.loads(chunk.error)
```
I have essentially just copy-pasted the multiprocess executor and adapted it to use Ray actors, and I don't have a very good understanding of the Dagster internals, so maybe this is something obvious? I'm guessing the UI not updating is because the default SQLite database isn't accessible from the Ray host, and these entries are getting written directly from the step worker actors. Any ideas here are appreciated 😄
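The thread never pins down the root cause of the `TypeError`. One common way this exact message arises, assuming the remote side raised an exception that is then unpickled on the client (the traceback does end in `cloudpickle.loads(chunk.error)`), is an exception class whose `__init__` signature doesn't match its `args`: Python rebuilds an exception on unpickling as `cls(*self.args)`, so the real error gets replaced by a `TypeError`. A minimal stdlib sketch; `StepFailure` and `FixedStepFailure` are hypothetical names, not Dagster or Ray classes:

```python
import pickle


class StepFailure(Exception):
    """Hypothetical exception: __init__ takes one argument, but self.args ends up with seven."""

    def __init__(self, details):
        # Spreads the 7-item tuple into Exception.args, so len(self.args) == 7.
        super().__init__(*details)


class FixedStepFailure(Exception):
    """Same shape, but tells pickle how to rebuild it from the original single argument."""

    def __init__(self, details):
        super().__init__(*details)
        self.details = details

    def __reduce__(self):
        # Rebuild as FixedStepFailure(details) instead of FixedStepFailure(*self.args).
        return (self.__class__, (self.details,))


details = ("fn", "tb", "cause", "title", "pid", "ip", "extra")

payload = pickle.dumps(StepFailure(details))  # serializing succeeds
try:
    pickle.loads(payload)  # deserializing calls StepFailure(*args) with 7 args
    message = ""
except TypeError as exc:
    message = str(exc)

print(message)  # ...takes 2 positional arguments but 8 were given

restored = pickle.loads(pickle.dumps(FixedStepFailure(details)))
print(restored.args == details)  # True
```

If something like this is happening, the `TypeError` is hiding whatever exception the actor actually raised (possibly the missing-database problem discussed below), so logging the exception inside the actor before it is sent back may reveal the real error.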
Oh yeah, sharing the `DAGSTER_HOME` between the containers seems to fix the UI issue, so a hosted database would fix that too, I guess.
c
Hey Oliver! Very cool that you're making a Ray executor. Does the above error happen when you load dagit, or when you launch a run from dagit?
o
Thanks! Mostly just hacking apart the multiprocess executor, really 😅 It seems to be working, though. The error happens on the very first run. I think it is an exception related to not finding the SQLite database from where the executor is running. I guess Dagster creates the database if it doesn't exist, at which point the events can be seen coming back, but the database those events are written to isn't hooked up to dagit.
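A toy model of that storage split, assuming (as suspected in the thread) that each container resolves its own local SQLite storage from its own `DAGSTER_HOME`. This uses plain `sqlite3`, not Dagster's actual storage classes or schema; `open_storage`, `write_event`, `count_events`, and `events.db` are hypothetical:

```python
import os
import sqlite3
import tempfile


def open_storage(home):
    """Open (and, if missing, silently create) a toy event database under `home`."""
    conn = sqlite3.connect(os.path.join(home, "events.db"))
    conn.execute("CREATE TABLE IF NOT EXISTS events (message TEXT)")
    return conn


def write_event(home, message):
    # Stand-in for a step worker (e.g. a Ray actor) logging an event.
    conn = open_storage(home)
    conn.execute("INSERT INTO events (message) VALUES (?)", (message,))
    conn.commit()
    conn.close()


def count_events(home):
    # Stand-in for dagit reading events from its own storage.
    conn = open_storage(home)
    (count,) = conn.execute("SELECT COUNT(*) FROM events").fetchone()
    conn.close()
    return count


with tempfile.TemporaryDirectory() as dagit_home, tempfile.TemporaryDirectory() as ray_home:
    # Different homes: the worker's write lands in a database dagit never reads.
    write_event(ray_home, "step succeeded")
    separate = count_events(dagit_home)

    # Shared home (like mounting the same DAGSTER_HOME into both containers).
    write_event(dagit_home, "step succeeded")
    shared = count_events(dagit_home)

print(separate, shared)  # 0 1
```

Because opening a missing SQLite file creates it instead of failing, the worker-side writes succeed quietly, which matches the symptom of steps completing while the UI never updates. Per the thread, a hosted database shared by both containers would serve the same purpose as the shared directory.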