# dagster-support

Will Gunadi

05/03/2022, 10:32 PM
How do I make multiple repo files available from the Dagster gRPC server?
```
dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
```
Do I just pass multiple -f switches, or what?

prha

05/04/2022, 12:06 AM
I believe you have to stand up a separate grpc server for each repo

daniel

05/04/2022, 12:26 AM
You can't pass in multiple files, but you can pass in a module that imports the repos from both of the files

Will Gunadi

05/04/2022, 1:42 PM
Still unsure why it uses gRPC instead of just specifying the repo as a local file.
@daniel what would that import module look like? Wouldn't all the ops and jobs in those multiple repos just be piled up in Dagit? It would be nice to be able to reload just one of the repos.

daniel

05/04/2022, 2:01 PM
Here's an example project that loads code from a module: https://github.com/dagster-io/dagster/tree/master/examples/hacker_news. If you'd like to be able to reload them independently, then putting them in separate entries in the workspace.yaml makes sense (that would need to be two containers/gRPC servers if you're following the example).
Running the code as gRPC servers in their own containers is optional, but there are some benefits to containerization: it allows you to run each job in its own Docker container, which prevents jobs from interfering with each other and lets them run in different Python environments.
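A minimal sketch of the import module mentioned above (file and repository names are placeholders; it assumes repo1.py and repo2.py each define a @repository object):
```python
# all_repos.py - hypothetical module that re-exports the repositories
# from both files so a single gRPC server can serve them together
from repo1 import repo1  # @repository object defined in repo1.py
from repo2 import repo2  # @repository object defined in repo2.py
```
The server would then load the module with -m instead of a file with -f:
```
dagster api grpc -h 0.0.0.0 -p 4000 -m all_repos
```
And a hypothetical workspace.yaml for the independent-reload setup, with one entry per code server (hosts, ports, and location names are placeholders matching the compose sketch later in the thread):
```yaml
# workspace.yaml - one grpc_server entry per separately reloadable code location
load_from:
  - grpc_server:
      host: docker_example_user_code
      port: 4000
      location_name: "repo1"
  - grpc_server:
      host: docker_other_example_user_code
      port: 4001
      location_name: "repo2"
```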

Will Gunadi

05/04/2022, 2:05 PM
@daniel so I would have something like this?
```dockerfile
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "bb_repository.py"]
```
and then
```dockerfile
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4001", "-f", "repo_2.py"]
```

daniel

05/04/2022, 2:05 PM
that looks right to me, yeah

Will Gunadi

05/04/2022, 2:05 PM
inside the Dockerfile_user_code (following the example)

daniel

05/04/2022, 2:06 PM
If you want, you could use the same image/Dockerfile that contains both files, then include the command in the docker-compose file instead of including CMD in the Dockerfile

Will Gunadi

05/04/2022, 2:09 PM
@daniel I'm still trying to understand how the containers interact with each other. Could you give me pseudocode for what the command would look like in the docker-compose file for multiple gRPC servers/repos?

daniel

05/04/2022, 2:13 PM
sure. Do you want to use a single image for both servers? Or a separate image/Dockerfile for each one?

Will Gunadi

05/04/2022, 2:14 PM
Let's start with a single image. I'm pretty sure the moment I say this, a case for two images will come up, LOL!

daniel

05/04/2022, 2:17 PM
something like this then (after taking the EXPOSE and CMD out of the Dockerfile)
```yaml
services:
  docker_example_user_code:
    build:
      context: .
      dockerfile: ./Dockerfile_user_code
    container_name: docker_example_user_code
    image: docker_example_user_code_image
    restart: always
    environment:
      DAGSTER_POSTGRES_USER: "postgres_user"
      DAGSTER_POSTGRES_PASSWORD: "postgres_password"
      DAGSTER_POSTGRES_DB: "postgres_db"
      DAGSTER_CURRENT_IMAGE: "docker_example_user_code_image"
    command: dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
    ports:
      - "4000:4000"
    networks:
      - docker_example_network

  docker_other_example_user_code:
    build:
      context: .
      dockerfile: ./Dockerfile_user_code
    container_name: docker_other_example_user_code
    image: docker_example_user_code_image
    restart: always
    environment:
      DAGSTER_POSTGRES_USER: "postgres_user"
      DAGSTER_POSTGRES_PASSWORD: "postgres_password"
      DAGSTER_POSTGRES_DB: "postgres_db"
      DAGSTER_CURRENT_IMAGE: "docker_example_user_code_image"
    command: dagster api grpc -h 0.0.0.0 -p 4001 -f repo2.py  # server port matches the 4001 mapping below
    ports:
      - "4001:4001"
    networks:
      - docker_example_network
```
there's probably a smarter way to move the 'build:' part out so it doesn't have to be repeated (or you could build the image separately first)
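One way to avoid that repetition (a sketch using standard docker-compose YAML anchors and merge keys, nothing Dagster-specific) is to define the shared settings once and merge them into each service:
```yaml
# sketch: shared service settings defined once via a YAML anchor
x-user-code: &user_code
  build:
    context: .
    dockerfile: ./Dockerfile_user_code
  image: docker_example_user_code_image
  restart: always
  networks:
    - docker_example_network
  # the environment block from the example above could live here too

services:
  docker_example_user_code:
    <<: *user_code
    command: dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
    ports:
      - "4000:4000"

  docker_other_example_user_code:
    <<: *user_code
    command: dagster api grpc -h 0.0.0.0 -p 4001 -f repo2.py
    ports:
      - "4001:4001"
```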

Will Gunadi

05/04/2022, 2:18 PM
I see, so as long as the actual repo1.py and repo2.py are in that one image, you can spawn multiple gRPC servers pointing to any of them

daniel

05/04/2022, 2:19 PM
yeah

Will Gunadi

05/04/2022, 2:19 PM
so in theory, the grpc server could be remote?

daniel

05/04/2022, 2:20 PM
in theory, yeah (more common in e.g. kubernetes or ECS, where it's probably on some other server in the same cluster)

Will Gunadi

05/04/2022, 2:20 PM
remote as in living on a separate EC2 instance but accessible via some port (4003, for instance)

daniel

05/04/2022, 2:20 PM
yeah, definitely possible
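In that case, Dagit's workspace.yaml would just point at the remote host; a sketch with placeholder host, port, and location name:
```yaml
# hypothetical entry pointing Dagit at a code server on another EC2 instance
load_from:
  - grpc_server:
      host: 10.0.0.12   # placeholder: private IP or DNS name of that EC2 instance
      port: 4003
      location_name: "remote_etl_repo"
```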

Will Gunadi

05/04/2022, 2:20 PM
Beautiful!

daniel

05/04/2022, 2:21 PM
That's another benefit of running the server separately that I should have mentioned: you can redeploy code without needing to restart or rebuild the whole service

Will Gunadi

05/04/2022, 2:21 PM
But I still have to manually "Reload" on Dagit to get the latest, right?

daniel

05/04/2022, 2:22 PM
It'll actually automatically update for you

Will Gunadi

05/04/2022, 2:22 PM
How would it know when to refresh?

daniel

05/04/2022, 2:22 PM
It checks in with the server periodically

Will Gunadi

05/04/2022, 2:22 PM
heartbeat, isn't it?
ah, I see 🙂
So this is the key to having Dagit monitor multiple remote repos. I've been looking for an answer to this. You need to document this in a place that is easy to find. This is a great benefit!

daniel

05/04/2022, 2:24 PM
I'll definitely surface that - what exactly is the 'this' that unlocked it for you?

Will Gunadi

05/04/2022, 2:25 PM
We are a data shop that orchestrates many ETLs for different clients. Of course those ETLs are spread across many servers (EC2 instances)

daniel

05/04/2022, 2:25 PM
ah I see, the ability to run code servers on remote instances. got it

Will Gunadi

05/04/2022, 2:26 PM
So I've been looking for a way to have one nice UI (like Dagit) that allows me to see those separate ETLs.
If I set up a gRPC server on each of those separate EC2s, then I can establish a cluster network, so they'd all be visible to one Dagit and one dagster-daemon (for schedules).
@daniel I keep getting a dbt error with this Docker example:
```
dbt --log-format json run --project-dir /opt/dagster/app/dbt_dev/bb_dbt --profiles-dir /opt/dagster/app/dbt_profiles --models incr.emails --vars {"import_ts":"2022-05-04 151557"}
```

daniel

05/04/2022, 7:34 PM
would you mind making a new post for this? Not a dbt expert

Will Gunadi

05/04/2022, 7:34 PM
oh sorry. absolutely
@daniel to continue the saga 🙂 So now I have Dagster running on Docker. The only wrinkle is that when I look at Dagit's "Raw compute logs" view, which usually shows me stdout and stderr, I see nothing, yet at the bottom of the window I see the usual location of the .out files, such as:
```
/opt/dagster/dagster_home/storage/456cc875-97ec-425a-b795-321247888b00/compute_logs/bb_contact_contacts_list_seb_op.out
```
What do you think is going on here? Is this because Dagit is in a different container than the dagster-daemon?

daniel

05/10/2022, 2:27 PM
Yeah, the default compute logs write to the filesystem, which dagit doesn't have access to
you could use a different compute log manager, or mount the compute log directory as a volume so that it's shared across all the containers
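For the first option, a sketch of a dagster.yaml entry pointing at a centralized compute log manager (this assumes the dagster-aws package is installed; the bucket name is a placeholder):
```yaml
# dagster.yaml - write compute logs to S3 so dagit, the daemon, and
# run containers all read from the same store instead of local disk
compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "my-dagster-compute-logs"
    prefix: "compute-logs"
```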

Will Gunadi

05/11/2022, 11:13 PM
@daniel I have setup the dagster.yaml as such:
```yaml
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /opt/dagster/dagster_home/storage
```
And map a volume to the host's filesystem in the docker-compose.yaml:
```yaml
volumes: # Make docker client accessible so we can launch containers using host docker
  - /var/run/docker.sock:/var/run/docker.sock
  - ./storage:/opt/dagster/dagster_home/storage
```
I map the above in both the dagit and dagster_daemon containers. But when I run a job, the raw log view is still empty; the bottom says:
```
/opt/dagster/dagster_home/storage/08dfbcc7-b8e5-4cb1-890c-9b83b7df5ebd/compute_logs/some_op.out
```
but I can't find that .out file in the volume (which is why the view is empty). The whole directory is there (up to /compute_logs), just the file is missing. Any thoughts?

daniel

05/12/2022, 12:40 AM
the problem is that those mounted volumes aren't included in the containers that are spun up by the run launcher
there's an example here of specifying volumes in the run launcher: https://docs.dagster.io/deployment/guides/docker#mounting-volumes (note that the path has to be absolute)

Will Gunadi

05/13/2022, 1:35 PM
@daniel this one seems to work:
```yaml
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
    network: dagster_network
    container_kwargs:
      volumes:
        - /home/user1/dagster_prj/storage:/opt/dagster/dagster_home/storage
```