# announcements
f
I'm currently evaluating Dagster as a replacement for Airflow in our team. Currently we have multiple Git repositories that contain code for different aspects of our ETL/ML needs. Each Git-repo can contain multiple pipelines. These pipelines are run regularly in production, either on a schedule or by some external trigger. I've been reading the docs, and I know that Dagster has the concept of "repositories" to group multiple pipelines. In our case we'd probably define one Dagster-repo for each Git-repo? However, it's my understanding that when I launch dagit I can only point it at a single repo definition file at a time? Is there a way of pointing dagit at multiple repo-definition files? Or will we need a separate dagit instance for each Dagster/Git-repo? How would you recommend we structure our Dagster repos/pipelines?
m
hi @fred -- we are starting to think through what would be required for dagit to target multiple repositories. in the meantime, assuming that you are using git repos to divide work up among teams / tasks, but you want to run everything using a single dagster repo / dagit instance, there is no need for a dagster repo to map to a single git repo.
for instance, you could install several packages into the python environment within which dagit runs (from github, or wherever), and then import pipelines from those packages into a single `RepositoryDefinition`.
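A minimal, stdlib-only sketch of that idea, assuming each team's Git repo is pip-installed as a package exposing a module-level list called `PIPELINES` (a hypothetical convention, not a Dagster feature). The package names `etl_pipelines` and `ml_pipelines` are made up too, and `register_fake_package` only exists so the demo runs without real installs. The collected values are what you would hand to `RepositoryDefinition`; check the docs for your Dagster version for the exact constructor arguments.

```python
import importlib
import sys
import types


def collect_pipelines(package_names, attr="PIPELINES"):
    """Import each installed package and merge its pipelines into one dict keyed by name.

    Hypothetical convention: every package exposes a module-level list called `attr`
    whose items carry a `.name` attribute.
    """
    collected = {}
    for package_name in package_names:
        module = importlib.import_module(package_name)
        for pipeline in getattr(module, attr, []):
            # Two pipelines with the same name would collide in one repository, so fail loudly.
            if pipeline.name in collected:
                raise ValueError(f"duplicate pipeline name {pipeline.name!r} in {package_name}")
            collected[pipeline.name] = pipeline
    return collected


def register_fake_package(package_name, pipeline_names):
    """Demo helper (hypothetical): stand in for a pip-installed package without touching disk."""
    module = types.ModuleType(package_name)
    module.PIPELINES = [types.SimpleNamespace(name=n) for n in pipeline_names]
    sys.modules[package_name] = module


if __name__ == "__main__":
    # Hypothetical packages built from two separate Git repos.
    register_fake_package("etl_pipelines", ["daily_load"])
    register_fake_package("ml_pipelines", ["train_model", "score_model"])
    pipelines = collect_pipelines(["etl_pipelines", "ml_pipelines"])
    print(sorted(pipelines))  # ['daily_load', 'score_model', 'train_model']
    # pipelines.values() is what would be passed on to the repository definition.
```

Keeping the merge step separate from the repository definition means each Git repo only has to follow the naming convention; the single dagit-facing repo file stays a short list of package names.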
f
Oh cool, didn't think of that approach. I'll try it out with our code base. The loading could probably be made quite dynamic using pkg_resources and smart "pipeline-package" naming. Thanks!
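A sketch of the "smart naming" variant, under stated assumptions: it swaps in the standard library's `pkgutil.iter_modules` for `pkg_resources`, and both the `dagster_pipelines_` prefix and the `PIPELINES` attribute are hypothetical conventions. Declaring entry points in each package and reading them with `pkg_resources.iter_entry_points` (or `importlib.metadata.entry_points`) would be a more explicit alternative to relying on names.

```python
import importlib
import pathlib
import pkgutil
import sys
import tempfile

PREFIX = "dagster_pipelines_"  # hypothetical naming convention for pipeline packages


def discover_pipeline_packages(prefix=PREFIX):
    """Names of top-level modules/packages on sys.path whose name starts with `prefix`."""
    return sorted(info.name for info in pkgutil.iter_modules() if info.name.startswith(prefix))


def load_pipelines(prefix=PREFIX, attr="PIPELINES"):
    """Import every matching package and collect its `attr` list, keyed by pipeline name."""
    pipelines = {}
    for package_name in discover_pipeline_packages(prefix):
        module = importlib.import_module(package_name)
        for pipeline in getattr(module, attr, []):
            pipelines[pipeline.name] = pipeline
    return pipelines


if __name__ == "__main__":
    # Demo: write two throwaway modules that follow the convention and put them on sys.path.
    with tempfile.TemporaryDirectory() as tmp:
        for suffix, name in [("etl", "daily_load"), ("ml", "train_model")]:
            source = f"from types import SimpleNamespace\nPIPELINES = [SimpleNamespace(name={name!r})]\n"
            pathlib.Path(tmp, f"{PREFIX}{suffix}.py").write_text(source)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        print(discover_pipeline_packages())
        print(sorted(load_pipelines()))
```

With this layout, adding a new Git repo to the dagit instance is just a matter of installing a package whose name follows the prefix.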