The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability.

dagster

One question about branch deployments: <https://docs.dagster.io/guides/dagster/branch_deployments> In dbt it is possible to materialize only the changes into the branch's schema (so the data is not duplicated and no time is wasted on complex materializations of a full, TB-sized DWH). Dagster seems to prefer always constructing a full copy. Perhaps with Snowflake's cloning this is even feasible. In general, however, with other storage engines it will be impossible to copy all of the source data. I think you will somehow have to support writing/reading only the changes in the branch's schema. How should an IO manager handle reading from master but writing to the specific branch name in an intelligent way?

Hi geoHeil. This is a good question. One way I can think of to structure this code is to have two different IO managers, one that reads from master (and doesn't write back to master) and one that writes to the branch's schema.

You could specify the default job IO manager to be the IO manager that writes to the branch's schema, and add a per-input IO manager for the inputs that need to be loaded from master. <https://docs.dagster.io/concepts/io-management/io-managers#per-input-io-manager|(Example)>
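A minimal sketch of the two-manager idea, written in plain Python rather than against the Dagster API. The class names, the in-memory `warehouse` dict, and the `branch_feature_x` schema name are all hypothetical; only the `load_input` / `handle_output` method names are borrowed from Dagster's IO manager interface, and the signatures here are simplified.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory stand-in for a warehouse: (schema, table) -> data.
Warehouse = Dict[Tuple[str, str], Any]


class MasterReadOnlyManager:
    """Loads tables from the master schema and refuses to write to it."""

    def __init__(self, warehouse: Warehouse, schema: str = "master") -> None:
        self.warehouse = warehouse
        self.schema = schema

    def load_input(self, table: str) -> Any:
        return self.warehouse[(self.schema, table)]

    def handle_output(self, table: str, value: Any) -> None:
        raise PermissionError("branch runs must not write to master")


class BranchManager:
    """Reads and writes tables in the branch's own schema."""

    def __init__(self, warehouse: Warehouse, branch_schema: str) -> None:
        self.warehouse = warehouse
        self.schema = branch_schema

    def load_input(self, table: str) -> Any:
        return self.warehouse[(self.schema, table)]

    def handle_output(self, table: str, value: Any) -> None:
        self.warehouse[(self.schema, table)] = value


def run_transformation(master_io: MasterReadOnlyManager, branch_io: BranchManager) -> None:
    # Per-input manager: this one input is loaded from master.
    rows: List[int] = master_io.load_input("source")
    doubled = [r * 2 for r in rows]
    # Default manager: the output goes to the branch schema.
    branch_io.handle_output("transformation", doubled)


warehouse: Warehouse = {("master", "source"): [1, 2, 3]}
run_transformation(
    MasterReadOnlyManager(warehouse),
    BranchManager(warehouse, "branch_feature_x"),
)
```

In Dagster itself, the branch manager would presumably be registered as the job's default IO manager and the master reader attached only to the inputs that need it, as described in the linked per-input IO manager docs.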

This sounds like a potential way forward. I think many people will face this problem. What do you think about updating or adding such an example to the branch deployment code snippets in the documentation?

But I think there is even more to it, also conceptually. Think about three assets: source, transformation, and downstream.

Let's assume the source does not change and should be read from master. The transformation changes and writes to the branch.

What about downstream? Its code has not changed, but the asset is now stale. What handles such dependencies so that it is written to the branch as well?

Hmm, not sure if I'm completely following. I think the following should happen:
• Source loads from master and outputs to branch
• Transformation and downstream load from branch and output to branch
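The two bullets above can be sketched as a tiny plain-Python flow. This is only an illustration under assumptions: the `run_branch` function, the dict-based `warehouse`, the schema names, and the toy computations are hypothetical and are not part of Dagster.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a warehouse: (schema, table) -> rows.
Warehouse = Dict[Tuple[str, str], List[int]]


def run_branch(warehouse: Warehouse, branch: str, master: str = "master") -> None:
    # Source: load from master, output to branch.
    warehouse[(branch, "source")] = list(warehouse[(master, "source")])

    # Transformation: load from branch, output to branch.
    source_rows = warehouse[(branch, "source")]
    warehouse[(branch, "transformation")] = [x + 1 for x in source_rows]

    # Downstream: load from branch, output to branch.
    transformed = warehouse[(branch, "transformation")]
    warehouse[(branch, "downstream")] = [sum(transformed)]


warehouse: Warehouse = {("master", "source"): [10, 20, 30]}
run_branch(warehouse, "branch_feature_x")
```

Under this reading, master is never written to and every asset ends up in the branch schema. The source data is copied into the branch in this sketch.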

I agree that this would be a helpful snippet to add to either this guide or a future guide. Let me file an issue.

<https://github.com/dagster-io/dagster/issues/9462>