# announcements
Furthermore, I want to better understand how to make dagit scale (beyond a single main data engineering team). How can I connect data dependencies (I guess this is called a workspace)? I know that some smaller organizations seem to prefer to write everything into a single big DAG, which certainly has some great benefits with regards to observability. How could I reconstruct such an E2E full graph from projects originating from different workspaces/teams?
Great question. This is an area we are actively working on. We believe a really great way to model cross-team dependencies is with asset sensors. Meaning that the two teams in question agree on a data asset that will be an interface between the teams (team A produces table X in the data lake, and team B consumes table X). Note that this is a more loosely coupled dependency than an edge in a single graph.
To implement this pattern we suggest asset-based sensors. See here in our docs: https://dagster.vercel.app/concepts/partitions-schedules-sensors/sensors#asset-sensors
^-- this is our new docs site being pushed out today
We currently do not have visualizations for cross-dag deps encoded in asset sensors, but this is something we are thinking about a lot
Hi geoHeill, I think you need to be clear about the distinction between dagster and dagit. They are two different applications. Dagit should be used for debugging.
sorry for mixing dagster/dagit.
I think some kind of visual connection - even for loosely coupled (sensor-based) data dependencies - would be great.
How would backfilling work in this context? Would the sensor realize that a partition was deleted, updated, and filled again? Would it re-trigger downstream (loosely coupled) tasks, given that only a sensor and not an edge connects them?
Another excellent question. We don't have any out-of-the-box solutions for you there. However, sensors are just arbitrary code, so what you describe is totally possible.
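Since sensors are arbitrary code, the backfill case above could be handled by a sensor body that tracks how many times each partition has been materialized and re-triggers downstream work when a count increases. This is only an illustrative sketch, not Dagster API; `detect_refilled_partitions`, the event-log shape, and the `seen` cursor are all hypothetical stand-ins for what a real sensor would read from the instance's event log.

```python
def detect_refilled_partitions(event_log, seen):
    """Return partitions whose materialization count grew since the last tick.

    event_log: list of (partition_key, timestamp) materialization events,
               oldest first (a stand-in for Dagster's event log).
    seen:      dict partition_key -> count observed at the previous tick,
               mutated in place (a stand-in for the sensor's cursor).
    """
    counts = {}
    for partition, _ts in event_log:
        counts[partition] = counts.get(partition, 0) + 1
    # A partition with more events than last time was backfilled/overwritten;
    # the sensor would yield one run request per such partition.
    refilled = [p for p, c in counts.items() if c > seen.get(p, 0)]
    seen.update(counts)
    return refilled
```

On the first tick every partition looks new; after a backfill rewrites a partition, only that partition is reported again, so only the affected downstream runs are re-triggered.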
good to know.
Would it be possible to have a single big DAG, but instead of putting all the pipelines into a mono-repository, have unique repositories per team or use case, yet somehow be able to centrally combine / register them into a single DAG using pip install for each team/use case?
This currently isn’t possible. But would the visualized cross-dag deps with sensors cover this request in your mind?
Well, this depends. Currently the sensors would require some (a lot? I do not fully know for sure so far) custom code. The important thing for me is to link these cross-team dependencies not only during regular execution (like Oozie triggers / your sensors), but also in the case of backfilling.
But I would guess they are a first and useful step towards this goal.
Yup makes sense