Son Giang
07/14/2022, 10:14 AM
Step 1: Materialize downstream1 for partition 2022-01-01. Dagster should automatically recognize the upstream dependency graph of downstream1 and materialize all of its upstream dependencies (upstream1, upstream2) for partition 2022-01-01 before materializing downstream1.
Step 2: Materialize downstream2 for partition 2022-01-01. Dagster should automatically recognize that partition 2022-01-01 of upstream2 was already materialized in Step 1, so it should materialize only downstream2.
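To make the desired behaviour concrete, here is a minimal plain-Python sketch of the selection logic. It does not use any Dagster API: the DEPS graph, the assets_to_materialize helper and the in-memory materialized set are all hypothetical, and the graph shape is an assumption inferred from the two steps above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical dependency graph (assumed from the two steps above):
# each asset maps to the assets it directly depends on.
DEPS: Dict[str, List[str]] = {
    "upstream1": [],
    "upstream2": [],
    "downstream1": ["upstream1", "upstream2"],
    "downstream2": ["upstream2"],
}


def assets_to_materialize(
    target: str,
    partition: str,
    deps: Dict[str, List[str]],
    materialized: Set[Tuple[str, str]],
) -> List[str]:
    """Return the target and its upstream assets that still lack `partition`,
    ordered so that dependencies come before the assets that need them."""
    ordered: List[str] = []
    visited: Set[str] = set()

    def visit(asset: str) -> None:
        if asset in visited:
            return
        visited.add(asset)
        for parent in deps[asset]:
            visit(parent)
        # Skip anything already materialized for this partition.
        if (asset, partition) not in materialized:
            ordered.append(asset)

    visit(target)
    return ordered


# Stand-in for the record of (asset, partition) pairs already materialized.
materialized: Set[Tuple[str, str]] = set()

# Step 1: nothing exists yet, so the whole upstream graph is selected.
step1 = assets_to_materialize("downstream1", "2022-01-01", DEPS, materialized)
materialized.update((asset, "2022-01-01") for asset in step1)

# Step 2: upstream2 is already materialized for this partition and is skipped.
step2 = assets_to_materialize("downstream2", "2022-01-01", DEPS, materialized)
```

In this sketch, step1 selects upstream1, upstream2 and downstream1, while step2 selects only downstream2, which is the behaviour I am hoping Dagster can provide per partition.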
For now, to materialize everything upstream of an asset, the only approach I can come up with is this:
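from dagster import AssetKey, AssetSelection, define_asset_job
# Each job selects one downstream asset plus everything upstream of it.
job_1 = define_asset_job(name="job_1", selection=AssetSelection.keys(AssetKey(["downstream1"])).upstream())
job_2 = define_asset_job(name="job_2", selection=AssetSelection.keys(AssetKey(["downstream2"])).upstream())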
But this runs into the problem of duplicated materialization: when I run job_1 and then job_2, upstream2 is materialized twice, which wastes compute and produces duplicated data.
Is there any way to do this? If not, is this something you plan to support in the near future?
yuhan
07/14/2022, 8:37 PM
sandy
07/17/2022, 11:09 PM
Son Giang
07/18/2022, 3:33 AM
sandy
07/18/2022, 3:27 PM