Is there a way to define in code a job containing multiple assets with different partitions? (dagster #ask-community)

Caleb Parnell Lampen

05/02/2023, 10:20 PM

Is there a way to define in code a job containing multiple assets with different partitions? Or, similarly, to materialize a set of related assets that have different (but mapped) partitions? In my case, I'm using static partitions. Simplified, the setup is: Asset A uses PartitionDef 1; Asset B uses PartitionDef 2; Asset C uses PartitionDef 2. Asset C is downstream of both Asset A and Asset B. There is a one-to-many mapping defined between PartitionDef 1 and PartitionDef 2. I think the answer is "no", but I wanted to check. The error when I try to select all of these to materialize in Dagit is "they must share a partition at the root node". Does that mean I could make some kind of placeholder asset that is upstream of Asset A and Asset B that lets me materialize the group with a correctly selected PartitionsDefinition and mapping?

sandy

05/03/2023, 12:09 AM

Does that mean I could make some kind of placeholder asset that is upstream of Asset A and Asset B that lets me materialize the group with a correctly selected PartitionsDefinition and mapping?

Yes. When you launched it, it would launch a backfill (essentially a collection of runs, one per partition) instead of a single run.

sandy

05/03/2023, 12:10 AM

Also, if you'd be up for filing a GitHub issue describing the functionality that you're looking for (details about the kinds of PartitionsDefinitions would be helpful), we might be able to add this at some point.

Caleb Parnell Lampen

05/03/2023, 3:17 PM

Thanks, sandy. I'll try this out and think about the details of the issue I'd want to file. Intuitively, I'd want to be able to select any arbitrary set of linked assets with multiple partition definitions and launch an ad hoc materialization of them. The software should be able to trace any mappings between those partitions and execute the ops in the correct order. The same logic would apply to defining a job in code with an arbitrary asset selection spanning different partitions. For example, with the above asset definitions, an asset selection of *AssetC is invalid, I think, since AssetA has a different partition definition. This means users cannot easily materialize all these related assets together. They have to materialize AssetA in one run, wait for it to finish, and then materialize AssetB and AssetC in another run.
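The behavior requested above (trace the partition mapping, then run everything in dependency order) can be modeled in a few lines. This is a plain-Python sketch, not Dagster code or Dagster's scheduling logic. The partition keys (`us`, `us_east`, and so on) and the helper name `plan_steps` are hypothetical, since the thread does not give the real ones.

```python
from graphlib import TopologicalSorter

# Upstream dependencies per asset, as described in the thread.
UPSTREAM_OF = {
    "asset_a": set(),
    "asset_b": set(),
    "asset_c": {"asset_a", "asset_b"},
}

# Which partition definition each asset uses.
PARTITION_DEF_OF = {"asset_a": "def1", "asset_b": "def2", "asset_c": "def2"}

# One-to-many mapping from PartitionDef 1 keys to PartitionDef 2 keys.
# The keys are hypothetical placeholders.
DEF1_TO_DEF2 = {
    "us": ["us_east", "us_west"],
    "eu": ["eu_north", "eu_south"],
}


def plan_steps(selected_def1_keys):
    """Return (asset, partition_key) steps in a valid execution order."""
    # Expand the selected PartitionDef 1 keys through the mapping.
    def2_keys = [k for key in selected_def1_keys for k in DEF1_TO_DEF2[key]]
    keys_for = {"def1": list(selected_def1_keys), "def2": def2_keys}
    # static_order() yields every asset after all of its upstream assets.
    order = TopologicalSorter(UPSTREAM_OF).static_order()
    return [
        (asset, key)
        for asset in order
        for key in keys_for[PARTITION_DEF_OF[asset]]
    ]


if __name__ == "__main__":
    for step in plan_steps(["us"]):
        print(step)
```

Selecting the single PartitionDef 1 key `us` yields one step for `asset_a` and two steps each for `asset_b` and `asset_c`, with every `asset_c` step placed after its upstream steps.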

Caleb Parnell Lampen

05/09/2023, 5:47 PM

Okay, I just got around to trying this workaround. Some notes: I made a new DummyRootAsset that does nothing and uses non-argument dependencies to put it upstream of AssetA and AssetB. This now lets me create a software-defined job with all the assets in the "tree", and makes "Materialize All" usable. An important caveat is that "Materialize All" seemingly only lets you materialize based on the PartitionsDefinition of DummyRootAsset. So, I have to decide a priori whether users will find materializing based on PartitionDef 1 or PartitionDef 2 more useful.

Caleb Parnell Lampen

05/09/2023, 5:57 PM

So, in general, this is a successful workaround. It is a bit weird to need a "dummy" asset that does nothing but give me a root node on the asset graph. And there is the slight interface issue mentioned above: I can only base my materializations on the root node's partitions.
