How do I make multiple repo files available from the...
# ask-community
w
How do I make multiple repo files available from the dagster grpc server?
dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
Do I just pass multiple -f switches, or what?
p
I believe you have to stand up a separate grpc server for each repo
d
You can't pass in multiple files, but you can pass in a module that imports the repos from both of the files
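For example, a wrapper module along these lines (the file and repository names here are placeholders) re-exports both repositories so a single server can load them with -m instead of -f:

```python
# repos.py - exposes the repositories from both files so one gRPC
# server can serve them together:
#   dagster api grpc -h 0.0.0.0 -p 4000 -m repos
# (repo1/repo2 and the repository names are placeholders for your own)
from repo1 import repo_one
from repo2 import repo_two
```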
w
Still unsure why it uses the grpc instead of just specifying the repo as a local file
@daniel what would that import module look like? Wouldn't all the ops and jobs in those multiple repos be just piled up in Dagit? It would be nice to be able to reload just one of the repos.
d
Here's an example project that loads code from a module: https://github.com/dagster-io/dagster/tree/master/examples/hacker_news If you'd like to be able to reload them independently, then putting them in separate entries in the workspace.yaml makes sense (that would need to be two containers/grpc servers if you're following the example)
Running the code as grpc servers in their own containers is optional, but there are some benefits to containerization - it allows you to run each job in its own docker container, which prevents jobs from interfering with each other and lets them run in different python environments
w
@daniel so I would have something like this?
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4000", "-f", "bb_repository.py"]
and then
CMD ["dagster", "api", "grpc", "-h", "0.0.0.0", "-p", "4001", "-f", "repo_2.py"]
d
that looks right to me, yeah
w
inside the Dockerfile_user_code (following the example)
d
If you want, you could use the same image/Dockerfile that contains both files, then include the command in the docker-compose file instead of including CMD in the dockerfile
w
@daniel I'm still trying to understand how the containers interact with each other. Could you give me a pseudocode on what would the CMD look like on the docker-compose file for multiple grpc server/repos?
d
sure. Do you want to use a single image for both servers? Or a separate image/Dockerfile for each one?
w
Let's start with single image. I'm pretty sure the moment I said this, there will be the case for two images LOL!
d
something like this then (after taking the EXPOSE and CMD out of the Dockerfile)
docker_example_user_code:
    build:
      context: .
      dockerfile: ./Dockerfile_user_code
    container_name: docker_example_user_code
    image: docker_example_user_code_image
    restart: always
    environment:
      DAGSTER_POSTGRES_USER: "postgres_user"
      DAGSTER_POSTGRES_PASSWORD: "postgres_password"
      DAGSTER_POSTGRES_DB: "postgres_db"
      DAGSTER_CURRENT_IMAGE: "docker_example_user_code_image"
    command: dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
    ports:
      - "4000:4000"
    networks:
      - docker_example_network

  docker_other_example_user_code:
    build:
      context: .
      dockerfile: ./Dockerfile_user_code
    container_name: docker_other_example_user_code
    image: docker_example_user_code_image
    restart: always
    environment:
      DAGSTER_POSTGRES_USER: "postgres_user"
      DAGSTER_POSTGRES_PASSWORD: "postgres_password"
      DAGSTER_POSTGRES_DB: "postgres_db"
      DAGSTER_CURRENT_IMAGE: "docker_example_user_code_image"
    command: dagster api grpc -h 0.0.0.0 -p 4001 -f repo2.py
    ports:
      - "4001:4001"
    networks:
      - docker_example_network
there's probably a smarter way to move the 'build:' part out so it doesn't have to be repeated (or you could build the image separately first)
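One way to avoid repeating the build block is a YAML anchor (or a compose x- extension field); a sketch, reusing the service/image names from the example above:

```yaml
x-user-code: &user_code
  build:
    context: .
    dockerfile: ./Dockerfile_user_code
  image: docker_example_user_code_image
  restart: always
  networks:
    - docker_example_network

services:
  docker_example_user_code:
    <<: *user_code
    command: dagster api grpc -h 0.0.0.0 -p 4000 -f repo1.py
  docker_other_example_user_code:
    <<: *user_code
    command: dagster api grpc -h 0.0.0.0 -p 4001 -f repo2.py
```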
w
I see, so as long as the actual repo1.py and repo2.py are in that one image, you can spawn multiple grpc servers pointing to either of them
d
yeah
w
so in theory, the grpc server could be remote?
d
in theory, yeah (more common in e.g. kubernetes or ECS, where it's probably on some other server in the same cluster)
w
remote as in living in a separate EC2 but accessible via some port (4003) for instance
d
yeah, definitely possible
w
Beautiful!
d
that's another benefit of running the server separately that I should have mentioned: you can redeploy code without needing to restart or rebuild the whole service
w
But I still have to manually "Reload" on Dagit to get the latest, right?
d
It'll actually automatically update for you
w
How would it know when to refresh?
d
It checks in with the server periodically
w
heartbeat, isn't it?
ah, I see 🙂
So this is the key to having Dagit monitor multiple remote repos. I've been looking for an answer to this. You need to document this in a place that is easy to find. This is a great benefit!
d
I'll definitely surface that - what exactly is the 'this' that unlocked it for you?
w
We are a data shop that orchestrates many ETLs for different clients. Of course those ETLs are spread across many servers (EC2s)
d
ah I see, the ability to run code servers on remote instances. got it
w
So I've been looking for a way to have one nice UI (like Dagit) which will allow me to see those separate ETLs
If I setup a grpc server on those separate EC2s, then I can establish a cluster network, so it'd be visible to one Dagit and one dagster-daemon (for schedules)
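A workspace.yaml along these lines would point one Dagit at several remote code servers (the hostnames and location names are placeholders):

```yaml
load_from:
  - grpc_server:
      host: ec2-host-1.internal
      port: 4000
      location_name: "client_a_etls"
  - grpc_server:
      host: ec2-host-2.internal
      port: 4001
      location_name: "client_b_etls"
```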
@daniel I keep getting a dbt error with this docker example. The command is:
dbt --log-format json run --project-dir /opt/dagster/app/dbt_dev/bb_dbt --profiles-dir /opt/dagster/app/dbt_profiles --models incr.emails --vars {"import_ts":"2022-05-04 151557"}
d
would you mind making a new post for this? Not a dbt expert
w
oh sorry. absolutely
@daniel to continue the saga 🙂 So now I have dagster running on docker. The only wrinkle is when I look at Dagit "Raw compute logs" which usually show me the stdout and stderr, I see nothing, yet at the bottom of the window I see the usual location of the .out files such as:
/opt/dagster/dagster_home/storage/456cc875-97ec-425a-b795-321247888b00/compute_logs/bb_contact_contacts_list_seb_op.out
What do you think is going on here? Is this because Dagit is in a different container than the dagster-daemon?
d
Yeah, the default compute logs write to the filesystem, which dagit doesn't have access to
you could use a different compute log manager, or mount the compute log directory as a volume so that it's shared across all the containers
w
@daniel I have setup the dagster.yaml as such:
compute_logs:
  module: dagster.core.storage.local_compute_log_manager
  class: LocalComputeLogManager
  config:
    base_dir: /opt/dagster/dagster_home/storage
And map a volume to the host's filesystem in the docker-compose.yaml:
volumes: # Make docker client accessible so we can launch containers using host docker
      - /var/run/docker.sock:/var/run/docker.sock
      - ./storage:/opt/dagster/dagster_home/storage
I map the above in both the dagit and dagster_daemon containers. But when I run a job, the raw log view is still empty; the bottom says:
/opt/dagster/dagster_home/storage/08dfbcc7-b8e5-4cb1-890c-9b83b7df5ebd/compute_logs/some_op.out
but I can't find that .out file in the volume (which is why the view is empty). The whole directory is there (up to /compute_logs), just the file is missing. Any thoughts?
d
the problem is that those mounted volumes aren't included in the containers that are spun up by the run launcher
there's an example here of specifying volumes in the run launcher: https://docs.dagster.io/deployment/guides/docker#mounting-volumes (note that the path has to be absolute)
w
@daniel this one seems to work:
run_launcher:
  module: dagster_docker
  class: DockerRunLauncher
  config:
    env_vars:
      - DAGSTER_POSTGRES_USER
      - DAGSTER_POSTGRES_PASSWORD
      - DAGSTER_POSTGRES_DB
    network: dagster_network
    container_kwargs:
      volumes:
        - /home/user1/dagster_prj/storage:/opt/dagster/dagster_home/storage