# ask-community
Ashwin Jiwane:
Hello, I have dagit and the daemon running, along with a dagster gRPC server. Until today, each new job has been launching as a process inside the same gRPC container. I'm now trying to move towards launching each job as a new container from the gRPC server. I changed my dagster.yaml file to have:
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOSTNAME
      db_name:
        env: DAGSTER_PG_DB_NAME
      port: 5432
    container_kwargs:
      volumes: # Make docker client accessible to any launched containers as well
        - /var/run/docker.sock:/var/run/docker.sock
        - /tmp/io_manager_storage:/tmp/io_manager_storage
I'm able to launch a container with the same image as the gRPC server, but the job always fails with the error below inside the container. What am I missing or doing wrong? Error:
0it [00:00, ?it/s]
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/venv/lib/python3.8/site-packages/dagster/cli/api.py", line 355, in _execute_step_command_body
    check.inst(
  File "/venv/lib/python3.8/site-packages/dagster/_check/__init__.py", line 675, in inst
    raise _type_mismatch_error(obj, ttype, additional_message)
dagster._check.CheckError: Object None is not a PipelineRun. Got None with type <class 'NoneType'>. Pipeline run with id '5a4d8e00-1bb9-4dd3-a833-f8001222a923' not found for step execution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/venv/bin/dagster", line 8, in <module>
    sys.exit(main())
  File "/venv/lib/python3.8/site-packages/dagster/cli/__init__.py", line 50, in main
    cli(auto_envvar_prefix=ENV_PREFIX)  # pylint:disable=E1123
  File "/venv/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/venv/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/venv/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/venv/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/venv/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/venv/lib/python3.8/site-packages/dagster/cli/api.py", line 335, in execute_step_command
    for event in _execute_step_command_body(
  File "/venv/lib/python3.8/site-packages/dagster/cli/api.py", line 421, in _execute_step_command_body
    yield instance.report_engine_event(
  File "/venv/lib/python3.8/site-packages/dagster/core/instance/__init__.py", line 1513, in report_engine_event
    check.invariant(
  File "/venv/lib/python3.8/site-packages/dagster/_check/__init__.py", line 1455, in invariant
    raise CheckError(f"Invariant failed. Description: {desc}")
dagster._check.CheckError: Invariant failed. Description: Must include either pipeline_run or pipeline_name and run_id
owen:
hi @Ashwin Jiwane! That error typically indicates that the process the run is executing in (i.e. the docker container) does not have access to the same DagsterInstance that you're launching runs from. My initial guess is that your Postgres database is not running on the same network as your docker container, or that the dagster.yaml file is not being copied into the image you're using for your code. https://github.com/dagster-io/dagster/tree/1.1.6/examples/deploy_docker has a docker-compose file that will spin up the postgres database on the same network, as well as an example Dockerfile_dagster.
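(For reference, the deploy_docker example puts all services on one user-defined docker network so launched containers can resolve the postgres hostname. A minimal sketch of that docker-compose layout follows; service and network names here are illustrative, not the exact names from the example.)

```yaml
# Sketch of a docker-compose layout in the spirit of the deploy_docker example.
# All names below are placeholders.
version: "3.7"

services:
  postgresql:
    image: postgres:11
    environment:
      POSTGRES_USER: postgres_user
      POSTGRES_PASSWORD: postgres_password
      POSTGRES_DB: postgres_db
    networks:
      - dagster_network

  dagit:
    build: .
    entrypoint: ["dagit", "-h", "0.0.0.0", "-p", "3000"]
    ports:
      - "3000:3000"
    depends_on:
      - postgresql
    networks:
      - dagster_network

networks:
  dagster_network:
    driver: bridge
```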
Ashwin Jiwane:
Thanks @owen for your response. The error message isn't very helpful for diagnosing that, though I was alluding to it too. I have been following that example setup. For now I have been testing only on my local machine, so there is really no network involved yet. In general, this is the setup I have:
Existing setup:
• dagit running in a container
• gRPC server running in a container (there are many other gRPC servers, each in a separate container of its own)
• every job run on a gRPC server launches as a process inside the gRPC container
New setup I'm trying to get:
• dagit running in a container
• gRPC server running in a container (as above)
• every job run on a gRPC server gets launched as a new container, with all env variables from the gRPC container available in it
Where I am with the new setup: every job run does get launched as a new container; however, the new container fails immediately after launch with the error mentioned in the main post. Specific env variables from the gRPC container are missing in the new job container. I believe I'll have to add them when the container runs, but I'm not really sure where they get supplied; I tried to set them up in the run_launcher but had no success. I'm running all this on a local dev machine, so there is really no network involved. I'm also on version 0.15.5 for all dagster libraries; maybe a newer version has solved much of this 🤔
• I made dagster.yaml available in the grpc-image at /opt/dagster/dagster_homes/default/dagster.yaml
• I upgraded to dagster==1.0.17 and dagster-docker==0.16.17
• On my local machine dagster uses local storage. I still get the same error
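(For anyone following along: the deploy_docker example bakes dagster.yaml into the image and points DAGSTER_HOME at it. A hedged sketch of such a Dockerfile; the base image, package list, and paths are assumptions, not Ashwin's actual setup.)

```dockerfile
# Illustrative Dockerfile, loosely following the deploy_docker example.
FROM python:3.8-slim

RUN pip install dagster dagster-graphql dagit dagster-postgres dagster-docker

# DAGSTER_HOME must point at the directory containing dagster.yaml
ENV DAGSTER_HOME=/opt/dagster/dagster_home
RUN mkdir -p $DAGSTER_HOME
COPY dagster.yaml $DAGSTER_HOME/

WORKDIR $DAGSTER_HOME
```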
daniel:
Hi Ashwin - are you able to get the deploy_docker example working without any changes? Or do you see the same error when you run that example out of the box?
Does your dagster.yaml also point the storages at postgres like in the example? https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/dagster.yaml#L23-L63
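(For reference, pointing the storages at postgres in dagster.yaml looks roughly like this; the env var names mirror the earlier run_launcher block, and the elided sections repeat the same postgres_db config.)

```yaml
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db:
      username:
        env: DAGSTER_PG_USERNAME
      password:
        env: DAGSTER_PG_PASSWORD
      hostname:
        env: DAGSTER_PG_HOSTNAME
      db_name:
        env: DAGSTER_PG_DB_NAME
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db: {}  # same postgres_db block as above

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db: {}  # same postgres_db block as above
```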
Ashwin Jiwane:
Hi @daniel, I haven't tried running the deploy_docker example yet; I can try that. dagster.yaml does point to postgres storage in the dev and production environments; however, on my local machine it uses local file storage. Let me try to set up a postgres container locally.
daniel:
I don't think your Docker containers have access to your local file storage, which would explain the error that you're seeing
(or at least, to the local file storage where your runs are being stored)
Ashwin Jiwane:
Hello Daniel, sorry for the late reply; I was on vacation for some time and am now trying to address this again. I was able to upgrade everything to the latest versions, and I'm now able to use postgres as the dagster DB instead of local storage files. The user-code container can connect to the postgres DB, and all job info/metadata/runs are stored there. But I'm seeing a new error which I can't seem to get around. I have a separate API server which launches a dagster job by making an API call to dagit. It is failing with
dagster_graphql.client.utils.DagsterGraphQLClientError: ('PythonError', "docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))\n")
The Docker service is running and all containers are running. If I remove the run_launcher config from dagster.yaml and relaunch everything, the API server is able to launch a dagster job via an API call to dagit. This is the run_launcher config I have:
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_PG_USERNAME
      - DAGSTER_PG_PASSWORD
      - DAGSTER_PG_DB_NAME
      - DAGSTER_PG_HOSTNAME
      - IO_MANAGER_S3_BUCKET
    container_kwargs:
      volumes: # Make docker client accessible to any launched containers as well
        - /var/run/docker.sock:/var/run/docker.sock
        - /tmp/io_manager_storage:/tmp/io_manager_storage
Ah, I was able to resolve this by mounting /var/run/docker.sock:/var/run/docker.sock into all images.
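(In docker-compose terms, that fix amounts to mounting the host's Docker socket into every service that needs to launch containers. A hedged sketch; the service names are placeholders.)

```yaml
# Mount the host Docker socket into each service that launches containers.
services:
  dagit:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  daemon:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
  user_code:
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
```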
Now I'm getting a new error: when a new container is launched for a job run, it throws
(psycopg2.OperationalError) could not translate host name
I can connect to postgres from the user-code container, though 🤔
Ah, it seems the launched container must have the network defined. I did that, and now everything is working fine locally.
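(If it helps anyone later: the DockerRunLauncher config accepts a network key, so launched containers join the same docker network as postgres. A sketch, assuming the compose network is named dagster_network.)

```yaml
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    network: dagster_network  # must match the docker-compose network name
    env_vars:
      - DAGSTER_PG_USERNAME
      - DAGSTER_PG_PASSWORD
      - DAGSTER_PG_DB_NAME
      - DAGSTER_PG_HOSTNAME
    container_kwargs:
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock
```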
I'm going to start testing this setup in the dev AWS env and then finally in prod. I wonder if the networking setup will even work with how our Nomad orchestration is set up. In addition, there are env variables which are set up during docker-compose and the CI setup for the user-code image. How can these env variables be made available to the job container that gets spun up from the user-code container? I'm not able to figure that out. The env variables available in the dagit container are easily available via run_launcher:
run_launcher:
  config:
    env_vars:
     - S3_DATABASE_URL
     - ...
daniel:
Right now, to include those env vars they need to be set in the run launcher yaml and be available in the container that launches the run (if you don't set their values there). If you use Kubernetes or ECS, there are better ways to share secrets between the user code containers and the launched runs.
Ashwin Jiwane:
Right now, to include those env vars they need to be set in the run launcher yaml and be available in the container that launches the run.
That's not scalable, as we have multiple env variables per gRPC server and there are multiple gRPC servers; it would run into thousands of env variables.
If you use Kubernetes or ECS, there are better ways to share secrets between the user code containers and the launched runs.
We are using Nomad. Yes, shared secrets get used via those configs and env variables are set like that, but when the job container is launched from the gRPC container those env variables are not available in it, since, as you mentioned, they need to be in the run launcher yaml. Is there a better way? 🤔
daniel:
Some other possibilities:
a) Use the DefaultRunLauncher instead of the DockerRunLauncher, which will launch each run on the gRPC server (but has some other downsides: each run will no longer be in a separate container).
b) Bake the env var values into the image (which isn't a very good security practice).
c) Kind of verbose, but you can set the DAGSTER_CONTAINER_CONTEXT environment variable on the user code container to a JSON string that will tell it to pass environment variable values through to the launched runs:
'{"docker": {"env_vars": ["FOO_ENV_VAR=bar_value", "OTHER_ENV_VAR=other_value"]}}'
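(Since hand-writing that JSON for many variables is tedious, one could generate it at deploy time from variables already present in the environment. A hypothetical sketch; the build_container_context helper and the variable names are illustrations, not part of dagster's API.)

```python
import json
import os

def build_container_context(var_names):
    """Build a DAGSTER_CONTAINER_CONTEXT JSON string that passes the
    current values of the named environment variables to launched runs.
    Variables missing from os.environ are skipped."""
    env_vars = [
        f"{name}={os.environ[name]}"
        for name in var_names
        if name in os.environ
    ]
    return json.dumps({"docker": {"env_vars": env_vars}})

# Example: compute the value before starting the gRPC server, then export
# it as DAGSTER_CONTAINER_CONTEXT in the container's environment.
os.environ["FOO_ENV_VAR"] = "bar_value"
context = build_container_context(["FOO_ENV_VAR", "MISSING_VAR"])
print(context)  # {"docker": {"env_vars": ["FOO_ENV_VAR=bar_value"]}}
```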
Ashwin Jiwane:
a) Use the DefaultRunLauncher instead of the DockerRunLauncher, which will launch each run on the gRPC server (but has some other downsides: each run will no longer be in a separate container).
Yeah, we have this behaviour now, and we want to avoid launching jobs in the same container, preferring to launch them in separate containers. Thanks! I will look into options b) and c).
b) Bake the env var values into the image.
Ah, this option will not work! It breaks all the security principles we have 🙂
The verbose option looks quite cumbersome too. It might be a good feature request for dagster to work on: the option to launch jobs in separate containers is required for scaling, and any production system will have many different env variables to pass on.
daniel:
How are you imagining that it would pass the environment variables around?
Ashwin Jiwane:
Ideally, the job container launched from the user-code container would inherit all of its env variables as-is; that's the whole purpose of launching the job in a separate container.
daniel:
You would want the daemon to copy over the full contents of os.environ? Even the ones that aren't related to the dagster job?
Ashwin Jiwane:
Yeah, mostly; all env variables set in that container are there for the purpose of running dagster jobs.
daniel:
OK, that's good to know. For now I think one of the options I mentioned is your best bet (or using a deployment option like Kubernetes, which has built-in support for k8s secrets / AWS Secrets Manager).
If you wouldn't mind filing a feature-request issue for what you're hoping for with the docker setup, that would be helpful for us in prioritizing future improvements.