Two questions regarding partitioned assets | dagster #ask-community

Clément Masson

01/17/2023, 9:08 AM

Hi all, I have two questions regarding partitioned assets.
• I have a daily partitioned asset that depends on upstream data. For each day, this asset depends on data ranging from the previous day at 10pm to the next day at 2am. How should I model this? Can the upstream data be modeled as an asset? If so, what partitioning do I use?
• I have dependencies between daily partitioned assets. From what I understand, the default mechanism is that a partition of a given asset for a given day depends on the upstream asset's partitions for the same day (or intersecting that day). Is there any way to define a dependency such that a partition for a given day depends on the upstream partition for the previous day, or even the partition from 7 days before?

chris

01/17/2023, 10:30 PM

For your first question, I would recommend modeling the upstream data as an asset; you can use time-lagged partitions to represent this (https://twitter.com/s_ryz/status/1603781913134608384). For the second question, you can define a custom PartitionMapping to specify which partitions of an upstream asset correspond to a given partition of the downstream asset.

Clément Masson

01/18/2023, 10:05 AM

• For the first question, I'm not sure how to model this. If the upstream data is partitioned by day, the downstream asset then depends on the partitions for days N-1, N and N+1; so far so good. But if I receive new data for the middle of day N-1, it would render partition N of the downstream asset stale, which is not what I want: data for the middle of day N-1 should not invalidate day N of the downstream asset. My next idea was to make the upstream asset hourly partitioned, to better reflect those dependencies, but then I can't use `start_offset` and `end_offset` on the downstream asset, because the upstream and downstream don't have the same partitions definition. Maybe a custom partition mapping would work then, but I run into the same problem as for the second question (see below).
• I see in the docstring for `PartitionMapping` that "Overriding PartitionMapping outside of Dagster is not supported". Does this mean that I can technically do it but I'm on my own, or that it will error somewhere?

chris

01/18/2023, 6:30 PM

For (1), I think the key is that you can define how the partition key maps to the actual data if you use your own io manager. For your example, you could say that the upstream partition for day N maps to data from 10pm on day N to 2am on day N+1, and then encode that mapping in how the io manager retrieves the data when loading it. Does that make sense? For (2), sorry I wasn't super specific: you can use `TimeWindowPartitionMapping` with a start offset of N-7 and an end offset of N-6, which would map to 7 days ago, for example.

Clément Masson

01/19/2023, 7:13 AM

• yeah I think I understand, I'll give it a shot.
• okay got it, thanks!
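As a starting point for the io manager idea in (1), the mapping from a daily partition key to the time range it reads could be sketched in plain Python. The function name `data_window` is hypothetical, the window follows the original question (previous day at 10pm to next day at 2am), and time zones are ignored for simplicity. A custom io manager could call something like this when loading an input, using the partition key available on its input context.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def data_window(partition_key: str) -> Tuple[datetime, datetime]:
    """Map a daily partition key (YYYY-MM-DD) to the range of data it reads.

    Assumed window: 10pm on the previous day to 2am on the next day.
    """
    day = date.fromisoformat(partition_key)
    start = datetime.combine(day - timedelta(days=1), time(22, 0))
    end = datetime.combine(day + timedelta(days=1), time(2, 0))
    return start, end


# Example: the window read for the 2023-01-17 partition.
window_start, window_end = data_window("2023-01-17")
```

Keeping this arithmetic in one small function makes it easy to adjust the window without touching the partition definitions.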
