Haydar Ali Ismail

11/02/2020, 4:26 PM
Hi! I’m trying to archive dagster and its dependencies as a zip so I can pass it as a spark-submit parameter and run in cluster mode, but now I’m getting this error when it tries to import dagster:
from dagster import (
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/__init__.py", line 110, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/core/launcher/__init__.py", line 2, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/core/launcher/cli_api_run_launcher.py", line 6, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/api/execute_run.py", line 2, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/cli/__init__.py", line 8, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/cli/api.py", line 13, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/cli/workspace/__init__.py", line 1, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/cli/workspace/cli_target.py", line 13, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/core/host_representation/__init__.py", line 47, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/core/host_representation/repository_location.py", line 8, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/api/snapshot_execution_plan.py", line 9, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/grpc/__init__.py", line 11, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/dagster/grpc/client.py", line 7, in <module>
  File "/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/grpc/__init__.py", line 23, in <module>
ImportError: cannot import name 'cygrpc' from 'grpc._cython' (/appdata/hdfs/v8/yarn/nm/usercache/hismail/appcache/application_1602600045749_421754/container_e168_1602600045749_421754_01_000009/dependencies.zip/grpc/_cython/__init__.py)
Any idea? I have also tried including grpcio separately from dagster, but the issue still persists.

sandy

11/03/2020, 1:16 AM
I haven't tried to distribute dagster via spark-submit before. Unfortunately, when libraries depend on native code (like dagster with grpc), I think it's generally difficult to distribute them through a python zip. Is it possible for you to install dagster on the cluster? Does dagster itself need to be on the cluster? Are you using a step launcher?
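To illustrate the native-code point: Python's zip import mechanism cannot load compiled extension modules (.so / .pyd files), which is consistent with the cygrpc ImportError in the traceback above. The following is a minimal sketch, not part of dagster, for checking whether an archive contains such files before passing it to spark-submit. The function name find_native_modules is hypothetical, and the suffix list is an assumption that covers only the common cases.

```python
import zipfile

# Suffixes of compiled extension modules. Python's zipimport cannot load
# these from inside a zip archive. (Assumed list; covers the common cases.)
NATIVE_SUFFIXES = (".so", ".pyd")


def find_native_modules(zip_path):
    """Return the archive members that look like compiled extension modules.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(zip_path) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(NATIVE_SUFFIXES)
        )
```

If calling find_native_modules on dependencies.zip returns a non-empty list (for example a file under grpc/_cython/), those packages most likely need to be installed on the cluster nodes, or shipped some other way, rather than imported from the zip.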

Haydar Ali Ismail

11/03/2020, 7:42 AM
We are trying to avoid installing dagster itself on the cluster, to keep the solution as portable as we can. I think I do need dagster there, because when I try to run the pipeline it complains that it cannot import dagster. I am actually trying to write my own step launcher for a self-hosted Spark cluster, but I'm having some trouble mimicking the step and event handling in the Amazon EMR pyspark step launcher. What is the bare minimum that has to be implemented for a step launcher?

sandy

11/06/2020, 4:44 PM
Apologies for the delay here, Haydar. First off, I filed https://github.com/dagster-io/dagster/issues/3199 to track the packaging issue you reported. I also filed https://github.com/dagster-io/dagster/issues/3200 to track potentially adding a step launcher that can launch to an on-prem YARN cluster. Lastly, I wrote up some instructions on how to run a step launcher: https://github.com/dagster-io/dagster/discussions/3201. It's not easy 😞. But deploying code remotely is never very easy. Another option would be to invoke spark-submit from within the body of your solid.
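The last option, invoking spark-submit from the solid body, could look roughly like the sketch below. It is plain Python with no dagster imports, and it assumes spark-submit is on the PATH of the machine running the solid. The function names build_spark_submit_command and run_spark_job, and the default master and deploy mode, are hypothetical choices for illustration; only the --master, --deploy-mode, and --py-files flags come from spark-submit itself.

```python
import subprocess


def build_spark_submit_command(app_path, py_files=None, master="yarn", deploy_mode="cluster"):
    """Build the argument list for a spark-submit call.

    Hypothetical helper; the defaults are assumptions, not dagster settings.
    """
    command = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if py_files:
        # --py-files takes a comma-separated list of .zip, .egg, or .py files.
        command += ["--py-files", ",".join(py_files)]
    command.append(app_path)
    return command


def run_spark_job(app_path, py_files=None):
    """Run spark-submit and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_spark_submit_command(app_path, py_files), check=True)
```

Calling run_spark_job("job.py") from inside the solid body would keep dagster on the orchestrating machine only, so the Spark job itself would not need to import dagster on the cluster.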