# announcements
Hi All, We are running into some problems when deploying our containerized dagster pipeline to airflow. We are trying to follow this example: https://docs.dagster.io/deploying/airflow . When triggering the DAG in (a local development environment with) airflow, the docker container starts up and fails immediately with this error:
Usage: dagster api execute_step [OPTIONS] INPUT_JSON
Try 'dagster api execute_step --help' for help.

Error: Got unexpected extra arguments ("ExecuteStepArgs", "instance_ref": {"__class__": "InstanceRef", "compute_logs_data": {"__class__": "ConfigurableClassData", "class_name": "LocalComputeLogManager", "config_yaml": "base_dir: /tmp/storage\n", "module_name": "dagster.core.storage.local_compute_log_manager"}, "custom_instance_class_data": null, "event_storage_data": {"__class__": "ConfigurableClassData", "class_name": "SqliteEventLogStorage", "config_yaml": "base_dir: /tmp/history/runs/\n", "module_name": "dagster.core.storage.event_log"}, "local_artifact_storage_data": {"__class__": "ConfigurableClassData", "class_name": "LocalArtifactStorage", "config_yaml": "base_dir: /tmp\n", "module_name": "dagster.core.storage.root"}, "run_coordinator_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunCoordinator", "config_yaml": "{}\n", "module_name": "dagster.core.run_coordinator"}, "run_launcher_data": {"__class__": "ConfigurableClassData", "class_name": "DefaultRunLauncher", "config_yaml": "{}\n", "module_name": "dagster"}, "run_storage_data": {"__class__": "ConfigurableClassData", "class_name": "SqliteRunStorage", "config_yaml": "base_dir: /tmp/history/\n", "module_name": "dagster.core.storage.runs"}, "schedule_storage_data": {"__class__": "ConfigurableClassData", "class_name": "SqliteScheduleStorage", "config_yaml": "base_dir: /tmp/schedules\n", "module_name": "dagster.core.storage.schedules"}, "scheduler_data": {"__class__": "ConfigurableClassData", "class_name": "DagsterDaemonScheduler", "config_yaml": "{}\n", "module_name": "dagster.core.scheduler"}, "settings": {"backfill": null, "sensor_settings": null, "telemetry": null}}, "pipeline_origin": {"__class__": "PipelinePythonOrigin", "pipeline_name": "hello_cereal_pipeline", "repository_origin": {"__class__": "RepositoryPythonOrigin", "code_pointer": {"__class__": "ModuleCodePointer", "fn_name": "hello_cereal_pipeline", "module": "airflow_test.airflow"}, "container_image": null, "executable_path": 
"Library/Caches/pypoetry/virtualenvs/churn-metrics-ENa28q3B-py3.8/bin/python"}}, "pipeline_run_id": "manual__2021-03-11T15:04:26.729658+00:00", "retries_dict": {}, "should_verify_step": false, "step_keys_to_execute": ["hello_cereal"]})
Any advice on how we can solve this issue? The DAG we are using is:
Hi Melle - we'll do some investigating, but while we do, one hunch - it's possible that this will work if you just remove the ENTRYPOINT from your Dockerfile altogether. Those docs are a bit out of date, I think; dagster-graphql shouldn't be required here. Will report back after we've investigated a bit more, but you could try that in the meantime if it's quick to try.
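For reference, a minimal Dockerfile along those lines might look like the following. This is a sketch, not from the thread: the base image, package layout, and install commands are assumptions. The key point is the absence of an ENTRYPOINT, so the airflow operator can pass `dagster api execute_step ...` directly as the container command instead of having it appended as extra arguments to a fixed entrypoint.

```dockerfile
# Sketch: containerized dagster pipeline image with NO ENTRYPOINT,
# so the command supplied by the airflow operator runs as-is.
FROM python:3.8-slim

WORKDIR /app

# Assumption: your pipeline package and its dependencies live here
COPY . /app
RUN pip install dagster dagster-airflow .

# Deliberately no ENTRYPOINT (and no dagster-graphql install)
```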
Hi Melle, apologies for the thrash. Here’s the tracking issue: https://github.com/dagster-io/dagster/issues/3831 - will keep you posted
Thanks! I really appreciate that you are looking into this.
Hi @Melle Minderhoud - I was able to get a containerized pipeline working after taking out the ENTRYPOINT (and removing any references to dagster-graphql in the dockerfile since it's no longer needed). The docs definitely need a refresh though. You probably realized this already but the signature of make_airflow_dag_containerized has changed as well (and now takes in the name of the python module and the name of the pipeline). The other thing that gave me a bit of trouble is that I needed to make sure that the container was able to load the DagsterInstance storage (e.g. by making sure it can access your postgres DB or your DAGSTER_HOME folder if you're using sqlite). Let us know if you run into any more issues after updating the entrypoint and we'll let you know when the docs are updated.
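For anyone following along, the updated factory call described above would be used roughly like this. This is a hedged sketch: the exact import path can vary by dagster-airflow version, and the image tag is an assumption; the module and pipeline names are taken from the error log earlier in the thread.

```python
from dagster_airflow import make_airflow_dag_containerized

# Per the thread: the factory now takes the python module name and the
# pipeline name (rather than importing the pipeline object directly),
# plus the docker image to run the steps in.
dag, tasks = make_airflow_dag_containerized(
    module_name="airflow_test.airflow",     # module containing the pipeline
    pipeline_name="hello_cereal_pipeline",
    image="my-dagster-image:latest",        # assumption: your built image tag
)
```

Remember the point from the message above: the container also needs access to the DagsterInstance storage (your postgres DB, or the DAGSTER_HOME folder if using sqlite), or step execution will fail even if the DAG builds.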
Hi @daniel and @yuhan, I did not manage to get this to work and I am currently waiting for the docs to be updated. Do you think they will be updated in the short term, or is this currently a low-priority issue?
Hi Melle - are you still running into the same issue after removing the ENTRYPOINT? If there’s a new error you’re running into I’d be happy to take a look while we’re waiting for the docs to be sorted out
Hi Daniel, thanks for your help. I think I messed up something during debugging, because with a clean install, removing the graphql/entrypoint references and setting the python path in the Dockerfile, I managed to get a simple example working. I still have a question, though, about the intended use of containerized dagster pipelines on airflow. My main goal was to avoid having to install all dependencies in the global airflow environment and to move them into the docker image instead. However, the dagster-airflow integration imports the module when creating the DAG and tasks, so all the dependencies still need to be available in the (global) airflow environment. Is this correct, or am I missing something here?
Ah, glad it's working! Unfortunately your understanding is correct - using dagster without airflow doesn't have this issue (dagit and other dagster system processes never load the pipeline code directly in the system environment), but the dagster airflow integration does import the module and load the pipeline code in the airflow environment to construct the DAG.
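A simplified sketch of why that happens, using only the standard library (the function name is illustrative, not dagster's actual internals): constructing the DAG means importing your pipeline module in the airflow scheduler's own process, so an ImportError fires there if the dependencies only exist inside the docker image.

```python
import importlib

def load_pipeline(module_name: str, fn_name: str):
    """Roughly what DAG construction entails: import the pipeline module
    in the *current* (airflow) process. Its dependencies must therefore be
    installed in the airflow environment, not just inside the image."""
    module = importlib.import_module(module_name)  # raises ImportError if deps are missing here
    return getattr(module, fn_name)

# e.g. load_pipeline("airflow_test.airflow", "hello_cereal_pipeline") would
# fail at import time unless airflow_test and its deps are installed globally.
```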