
Clayton Casey

05/08/2023, 10:11 PM
Are there any examples of a production project that has multiple code locations in it? I'm trying to follow best practices in setting that up. Example directory of what I'm thinking:
-> code-locations
-> common-resources
-> common-ops
-> dagster project 1
-> dagster project 2
-> dagster project 3
-> setup.py
-> workspace.yaml
Is this the right or wrong way to look at this? I am just going to deploy this to an EC2 instance for now. Any best practices or gotchas to watch out for?
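
For readers following along, here is a rough sketch of how a `workspace.yaml` for a layout like this might be generated, with one `python_file` entry per code location. The folder names (`dagster_project_1`, etc.) and the `definitions.py` entry file are assumptions made for illustration; they are not names from this thread.

```python
from typing import Iterable


def workspace_yaml(project_dirs: Iterable[str], entry_file: str = "definitions.py") -> str:
    """Render a workspace.yaml with one python_file code location per project folder.

    `entry_file` is a hypothetical name for the module in each folder that
    holds that project's Definitions object.
    """
    lines = ["load_from:"]
    for project in project_dirs:
        lines += [
            "  - python_file:",
            f"      relative_path: {project}/{entry_file}",
            f"      location_name: {project}",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical folder names standing in for "dagster project 1..3".
text = workspace_yaml(["dagster_project_1", "dagster_project_2", "dagster_project_3"])
print(text)
```

Under this sketch the shared `common-resources` and `common-ops` folders get no entry of their own, since they are imported by the projects rather than loaded as code locations.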

sean

05/09/2023, 1:11 PM
Hi Clayton, is the idea that common-resources and common-ops are their own code locations, or are just the 3 “dagster projects” the code locations?

Clayton Casey

05/09/2023, 1:36 PM
No, just the 3 dagster projects as code locations. The 'common' folders are where I will keep any config files that I will use to create @resources and @ops in my code locations. My goal is to have a common resource.py and common ops.py template that I put into every code location so that my pipelines can be config driven. E.g. 'dagster project 1' has 2 config files, dagster-project-1-resource.yaml and dagster-project-1-ops.yaml. My templates will read from these files to create the necessary resources and ops for my 'dagster project 1' pipeline. 'Dagster project 2' would use the same templates in the code location but will have its own set of config files. What do you think about this? Does this setup seem appropriate in a production environment?

sean

05/09/2023, 1:53 PM
Two questions:
• Are you intending to put only a single “pipeline” per code location? If “pipeline” here means dagster job, that’s not necessary. One code location can hold many jobs.
• I’m a little unclear on what you mean by “template” in this context. Can you elaborate?

Clayton Casey

05/09/2023, 3:13 PM
1. In this use case, the code location itself may have multiple pipelines or only 1. I want each 'code location' to correspond to a company data project. So 'dagster project 1' may have 1 or multiple pipelines, same for 'dagster project 2' etc. It just depends on the specific project. When I say pipeline I just mean 'project'.
2. I should be clearer on this. A template is just a single script that will handle a project's yaml file for me. It will read the contents of a yaml file and create these resources for me. E.g. 'dagster project 1' will have the resource.py template/script that will read in the config file for this specific code location and create any resources. Could be a database connection, s3 bucket, etc. 'dagster project 2' will have the same resource.py template/script that will read in project 2's specific yaml file and create a completely different set of resources. Does this make sense?
Also if there is a more efficient way of doing this (making my dagster projects config driven) then I am happy to learn!
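
To make the "template" idea concrete, here is a minimal sketch of a script that turns a parsed config mapping into named resources. Everything in it is an assumption made for illustration: the `PostgresConnection` and `S3Bucket` classes are plain stand-ins rather than Dagster resources, the `type` key and the config shape are hypothetical, and the dict literal stands in for whatever a YAML parser would return for the project's file.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class PostgresConnection:
    """Stand-in for a database resource (hypothetical)."""
    host: str
    database: str


@dataclass(frozen=True)
class S3Bucket:
    """Stand-in for an S3 bucket resource (hypothetical)."""
    bucket: str


# Maps the `type` field of a config entry to the class that builds it.
RESOURCE_TYPES: Dict[str, Callable[..., Any]] = {
    "postgres": PostgresConnection,
    "s3": S3Bucket,
}


def build_resources(config: Mapping[str, Mapping[str, Any]]) -> Dict[str, Any]:
    """Build one resource per config entry, keyed by the entry's name."""
    resources: Dict[str, Any] = {}
    for name, spec in config.items():
        params = dict(spec)              # copy so the caller's config is untouched
        type_name = params.pop("type")   # remaining keys are constructor arguments
        if type_name not in RESOURCE_TYPES:
            raise ValueError(f"Unknown resource type {type_name!r} for {name!r}")
        resources[name] = RESOURCE_TYPES[type_name](**params)
    return resources


# Assumed shape of the parsed contents of dagster-project-1-resource.yaml.
PROJECT_1_CONFIG = {
    "warehouse": {"type": "postgres", "host": "db.internal", "database": "sales"},
    "raw_data": {"type": "s3", "bucket": "project-1-raw"},
}

project_1_resources = build_resources(PROJECT_1_CONFIG)
```

In a real code location the resulting dict would presumably be what gets passed as `resources` to the project's `Definitions` object, with a second project calling the same function on its own config.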

sean

05/09/2023, 3:17 PM
So it sounds like you want to use a factory pattern to create your dagster definitions. That’s viable, but isn’t recommended unless you are trying to e.g. map some predefined DAG structure into Dagster-world. Why do you want to go through these yaml files instead of just creating/importing the requisite definitions directly for each code location?

Clayton Casey

05/09/2023, 4:04 PM
My thought process was to use the yaml files to make the resource creation 'easier': I could just add a new dagster project as a code location and include a config.yaml file that tells my resource script what resources I want made, and the resource script would build all my resources when initialized. I would still expose all resources to the project using the Definitions object.
Stepping away from this, how would you organize your dagster production project? This may help me understand the best practice and draw some insights. Let's say you were hosting a dagster instance on an EC2 instance with the schedule and sensor daemon running. How would you organize a complex company-wide instance that houses multiple code locations?

sean

05/09/2023, 4:21 PM
I think your basic structure of code locations in sub-folders makes sense. You might find further insight in our project file reference and AWS deployment guides:
• https://docs.dagster.io/getting-started/project-file-reference
• https://docs.dagster.io/deployment/guides/aws
I’m still unclear how using YAML files makes resource creation easier than something like below (but I don’t have much context on your resources):
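### project 1

from dagster import Definitions

# shared resource defined once in the common module and imported per project
from root.common_resources import foo_resource

...

defs = Definitions(
    # ... other defs ...
    resources={"foo_resource": foo_resource},
)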