Furthermore I want to better understand how to make dagit sc dagster #announcements

geoHeil

03/18/2021, 11:31 AM

Furthermore, I want to better understand how to make dagit scale (beyond a single main data engineering team). How can I connect data dependencies (I guess this is called a workspace)? I know that some smaller organizations seem to prefer to write everything into a single big DAG, which certainly has some great benefits with regard to observability. How could I reconstruct such an E2E full graph from projects originating from different workspaces/teams?

schrockn

03/18/2021, 12:06 PM

Great question. This is an area we are actively working on. We believe a really great way to model cross-team dependencies is using asset sensors. Meaning that the two teams in question agree on a data asset that will be an interface between the teams (team A produces table X in the data lake, and team B consumes table X). Note that this is a more loosely coupled dependency than an edge in a single graph.

schrockn

03/18/2021, 12:06 PM

To implement this pattern we suggest asset-based sensors. See here in our docs: https://dagster.vercel.app/concepts/partitions-schedules-sensors/sensors#asset-sensors
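
[Editor's sketch] The producer/consumer pattern described above can be illustrated in plain Python. This is not Dagster's API: `Materialization`, `poll_asset`, and the run-key format are hypothetical names made up for illustration, and the event log is just a list. It only shows the idea of a consumer polling for new materializations of an agreed asset and remembering a cursor; the linked docs describe the real asset sensor.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Materialization:
    """Hypothetical record that team A writes when it produces an asset."""
    event_id: int      # monotonically increasing id in the shared event log
    asset_key: str     # e.g. "table_x", the agreed interface between teams
    partition: str     # e.g. a date string


def poll_asset(
    event_log: List[Materialization], asset_key: str, cursor: int
) -> Tuple[List[str], int]:
    """Return run keys for materializations of `asset_key` newer than `cursor`,
    together with the advanced cursor. Team B would launch one run per key."""
    new_events = [
        e for e in event_log
        if e.asset_key == asset_key and e.event_id > cursor
    ]
    run_keys = [f"{e.asset_key}:{e.partition}:{e.event_id}" for e in new_events]
    new_cursor = max((e.event_id for e in new_events), default=cursor)
    return run_keys, new_cursor
```

Because the consumer only looks at the event log, the two teams stay loosely coupled: team B needs the asset key, not team A's pipeline code.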

schrockn

03/18/2021, 12:07 PM

^-- this is our new docs site being pushed out today

🎉 4

schrockn

03/18/2021, 12:07 PM

We currently do not have visualizations for cross-dag deps encoded in asset sensors, but this is something we are thinking about a lot

Gerhard Van Deventer

03/18/2021, 12:10 PM

Hi geoHeil, I think you need to be clear about the distinction between dagster and dagit. They are two different applications. Dagit should be used for debugging.

✅ 1

geoHeil

03/18/2021, 1:26 PM

Sorry for mixing up dagster/dagit.

geoHeil

03/18/2021, 1:27 PM

I think some kind of visual connection - even for loosely coupled (sensor-based) data dependencies would be great.

👍🏻 1

geoHeil

03/18/2021, 1:28 PM

How would backfilling work in this context? Would the sensor realize that a partition was deleted, updated, and filled again? Would it re-trigger downstream (loosely coupled) tasks, given that only a sensor and not an edge connects them?

schrockn

03/18/2021, 1:47 PM

Another excellent question. We don’t have any out-of-the-box solutions for you there. However, sensors are just arbitrary code, so what you describe is totally possible.
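
[Editor's sketch] Since sensors are arbitrary code, one way such backfill awareness might look is a per-partition comparison between the latest upstream materialization and the one the downstream last processed. Again this is plain Python, not Dagster's API: `partitions_to_rerun` and both dictionaries are hypothetical, and detecting a deletion (as opposed to a re-materialization) would need extra events that this sketch does not model.

```python
from typing import Dict, List


def partitions_to_rerun(
    latest_upstream: Dict[str, int], last_processed: Dict[str, int]
) -> List[str]:
    """Return partitions whose newest upstream materialization id is higher
    than the id the downstream job last consumed (or that were never consumed).

    A backfill that rewrites an old partition produces a newer event id for
    it, so that partition shows up here and can be re-triggered downstream."""
    return sorted(
        partition
        for partition, event_id in latest_upstream.items()
        if event_id > last_processed.get(partition, -1)
    )
```

The downstream side would update its `last_processed` entry after each successful run, so a partition is re-run only when upstream has written it again.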

geoHeil

03/18/2021, 4:36 PM

Good to know.

geoHeil

03/20/2021, 9:10 AM

Would it be possible to have a single big DAG, but instead of putting all the pipelines into a mono repository, have unique repositories per team or use case and somehow centrally combine/register them in a single DAG, using pip install for each team/use case?

schrockn

03/20/2021, 12:44 PM

This currently isn’t possible. But would the visualized cross-dag deps with sensors cover this request in your mind?

geoHeil

03/20/2021, 12:52 PM

Well, this depends. Currently the sensors would require some custom code (a lot? I do not know for sure yet). The important thing for me is to link these cross-team dependencies not only during regular execution (like Oozie triggers / your sensors), but also in the case of backfilling.

geoHeil

03/20/2021, 12:52 PM

But I would guess they are a first and useful step towards this goal.

schrockn

03/20/2021, 1:09 PM

Yup, makes sense.
