Binoy Shah
03/01/2023, 2:01 PM
sandy
03/03/2023, 12:45 AM
claire
03/03/2023, 5:24 PM
TimeWindowPartitionMapping to map different time dimensions to each other, which does not exist right now. Would you mind filing an issue for this?
sri raghavan
03/10/2023, 11:01 PM
("account_id", "day"), our downstream asset is ("account_id", "month"), and we'd like the downstream "month" asset to depend on the last upstream day of that month's asset. I'm currently working on a custom mapping class for this (seems like it should be straightforward?), but a natively-supported multi-mapping probably avoids the trial-and-error I'm about to go through learning this API 🙂
claire
03/10/2023, 11:02 PM
sri raghavan
03/10/2023, 11:02 PM
Binoy Shah
03/10/2023, 11:06 PM
claire
03/10/2023, 11:06 PM
sri raghavan
03/10/2023, 11:07 PM
Binoy Shah
03/10/2023, 11:07 PM
claire
03/10/2023, 11:10 PM
Binoy Shah
03/10/2023, 11:14 PM
sri raghavan
03/17/2023, 10:07 AM
(account_id, month)
• downstream is also partitioned by (account_id, month), but this time depends on all preceding months [the difference between "all" and "all preceding" isn't important here -- we don't expect to recompute old stuff very often]
thoughts / other ideas for workarounds while y'all are solidifying the new functionality here?
claire
03/17/2023, 4:55 PM
PrecedingPartitionMapping and allow you to specify that in your multipartition mapping (if this PR makes it through)
sri raghavan
03/17/2023, 11:52 PM
claire
03/20/2023, 9:39 PM
def get_upstream_partitions_for_partitions(
    self,
    downstream_partitions_subset: Optional[PartitionsSubset],
    upstream_partitions_def: PartitionsDefinition,
    dynamic_partitions_store: Optional[DynamicPartitionsStore] = None,
) -> PartitionsSubset:
    # Assume the downstream subset contains exactly one partition key.
    partition_key = downstream_partitions_subset.get_partition_keys()[0]
    # Select every upstream partition from the first one through that key.
    return upstream_partitions_def.empty_subset().with_partition_key_range(
        PartitionKeyRange(upstream_partitions_def.get_first_partition_key(), partition_key)
    )
If we created a custom PrecedingPartitionMapping that contained a function like the one above, we'd be able to define that the downstream asset's partition X depends on all upstream partitions between the first partition and X.
sri raghavan
03/20/2023, 11:31 PM
PrecedingPartitionsMapping would help here! I don't think I understand, however, how using IdentityPartitionsMapping is a workaround, since it doesn't specify the dependency between a downstream partition and preceding upstream partitions. (Maybe you're saying "it should work if you never compute downstream assets before their corresponding upstreams", which I agree with?)
claire
03/20/2023, 11:35 PM
AllPartitionsMapping is a workaround if you don't recompute old partitions very frequently: if you materialize each new partition once it exists, it should only incorporate the preceding months up to the month of that partition.
sri raghavan
03/20/2023, 11:53 PM
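The two dependency rules discussed in this thread can be sketched in plain Python, independent of the Dagster API. This is only an illustration under assumptions: the helper names (last_day_of_month_key, upstream_for_month, preceding_keys) are hypothetical, and partition keys are assumed to be ISO dates, with each month identified by its first day (for example "2023-02-01"). A real custom mapping would still need to wrap logic like this in a PartitionMapping subclass.

```python
import calendar
from datetime import date
from typing import Dict, List


def last_day_of_month_key(month_key: str) -> str:
    """Return the daily key for the last day of the month that month_key starts.

    Assumes month_key is an ISO date for the first day of the month.
    """
    first = date.fromisoformat(month_key)
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(first.year, first.month)[1]
    return first.replace(day=days_in_month).isoformat()


def upstream_for_month(downstream: Dict[str, str]) -> Dict[str, str]:
    """Map an (account_id, month) key to the (account_id, day) key it depends on."""
    return {
        "account_id": downstream["account_id"],  # same account on both sides
        "day": last_day_of_month_key(downstream["month"]),
    }


def preceding_keys(ordered_keys: List[str], key: str) -> List[str]:
    """Return every key from the first partition up to and including key.

    Assumes ordered_keys is sorted oldest to newest and contains key.
    """
    return ordered_keys[: ordered_keys.index(key) + 1]


if __name__ == "__main__":
    print(upstream_for_month({"account_id": "acct-1", "month": "2023-02-01"}))
    print(preceding_keys(["2023-01-01", "2023-02-01", "2023-03-01"], "2023-02-01"))
```

The first helper covers the ("account_id", "day") to ("account_id", "month") case: the account dimension maps to itself and the month maps to its final day. The second mirrors the range from the first partition through X used in the method shared above.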