# announcements
f
hello, i have seen this issue: https://github.com/dagster-io/dagster/issues/3018 .. we would have more or less the same requirement: we would like to have one dagster instance (created from helm), leave it running, and deploy new repos without touching the dagster instance. the approach mentioned in the issue (subcharts) would not allow for this, because that way the user code deployments need to be known at installation time (look at the current code in the helm chart: it uses for-loops in the template). i think there should be a way (maybe even using a custom resource like DagsterRepository) to deploy new repos to a running dagster instance
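to make the idea concrete, here is a purely hypothetical sketch of what such a custom resource could look like (no such kind exists in dagster today; the apiVersion, kind and all names below are made up):

```yaml
# hypothetical CRD instance -- dagster does not ship a DagsterRepository kind
apiVersion: dagster.example.io/v1alpha1
kind: DagsterRepository
metadata:
  name: team-a-pipelines
spec:
  # where the user code gRPC server for this repo is reachable
  grpcServer:
    host: team-a-user-code
    port: 3030
  locationName: team-a
```

a small controller could watch these objects and update the running dagster instance's workspace accordingly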
c
thanks for sharing your use case! we would love to better understand how you use repos in your org, either in this slack thread or on a call? cc @rex who is working on 3018
f
we did a small POC with airflow here https://github.com/airflow-helm/charts/issues/79 but i think dagster with user code deployments would be vastly superior
the idea would be this: the companies we work for have a data/BI/ops team that works for multiple internal customers. so over time, lots of big and small pipelines get developed for various teams. at multiple places we have seen teams struggle to find the right equilibrium between big centralized control/monitoring/lineage and the flexibility of multiple independent projects
so i'm a data engineer, and i work on a dagster pipeline. i have a local k3s or microk8s cluster with dagster running on it and i use a tool like skaffold to live-edit my pipeline. of course, on my local k3s/microk8s i don't have all the pipelines of my whole organisation running. but my deliverable would be a small project that can be deployed to the big company-wide test/prod cluster (as a helm chart). when i'm finished developing locally i can just deploy my helm chart to the cluster. i don't touch the dagster installation or the helm charts created by my fellow developers
so my little project would live in its own git repo (basically python code, dockerfiles, a skaffold yaml and a helm chart). this could be deployed to any kubernetes cluster that runs the official dagster helm chart.
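as an illustration, the project's chart could register its user code deployment using the official chart's user deployment values, roughly like this (the image, file path and names are examples, and key names may differ between chart versions):

```yaml
# values for the user deployments part of the official dagster helm chart
dagster-user-deployments:
  enabled: true
  deployments:
    - name: team-a-pipelines
      image:
        repository: registry.example.com/team-a/pipelines
        tag: "1.0.0"
        pullPolicy: Always
      # start the gRPC server that serves this repository definition
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/app/repo.py"
      port: 3030
```

depending on the chart version, the user deployments piece can also be installed as its own helm release, which is close to the per-project deliverable described above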
from the documentation:
> A User Code Deployment runs a gRPC server and responds to Dagit's requests for information (such as: "List all of the pipelines in each repository" or "What is the dependency structure of pipeline X?"). The user-provided image for the User Code Deployment must contain a repository definition and all of the packages needed to execute pipelines / schedules / sensors / etc within the repository.
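for reference, dagit discovers such a gRPC server through a workspace.yaml entry along these lines (host, port and location name are examples):

```yaml
# workspace.yaml -- point dagit at a running user code gRPC server
load_from:
  - grpc_server:
      host: team-a-user-code   # kubernetes service of the user code deployment
      port: 3030
      location_name: "team-a"
```

today this file is static, which is exactly the limitation being discussed here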
so i guess little would be needed to make this possible
c
to check my understanding, does this mean you’d like to have a new repository per new project? (historically, we’ve noticed that users tend to use one repository per team and that adding new repositories is a very rare workflow)
we have considered designing for a more dynamically modifiable list of repositories in a workspace, but haven't prioritized the work yet because we haven't had a user request it
f
Yes indeed.
but the user code deployments are already there to support the "multi repo" scenario, no?
c
yup, the user code deployments were created to support the multi-repo scenario. i think your use case requires optimizing for user code deployment creation more than we currently do and would require architectural changes (probably storing the workspace's repositories in a db instead of a version-controlled file). would you like to file a gh issue to track this? we can look into it in a future release
v
Frank's usage pattern would also be common in our org considering how we plan to deploy dagster. We were planning on deploying the UCDs independently and modifying the `workspace.yaml` in a configmap, but it's not ideal as we would have to restart dagit. In a chat with @daniel he mentioned that hot reloading of repos was feasible given the internal dagit architecture, so it was just a matter of nailing the API.
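For illustration, that workaround looks roughly like this (all names are placeholders): the workspace file lives in a ConfigMap that is mounted into the dagit pod:

```yaml
# ConfigMap holding the workspace definition (names are placeholders)
apiVersion: v1
kind: ConfigMap
metadata:
  name: dagster-workspace
data:
  workspace.yaml: |
    load_from:
      - grpc_server:
          host: team-a-user-code
          port: 3030
          location_name: "team-a"
      - grpc_server:
          host: team-b-user-code
          port: 3030
          location_name: "team-b"
```

Adding a repo means patching this ConfigMap and restarting dagit, which is exactly the non-ideal part.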