Daniel Gafni
07/06/2022, 5:12 PM
Are `partition_mappings` available anywhere?
Basically I want to have a (partitioned) asset (or maybe a job) that's dependent on multiple upstream partitions.
Let's say I have a daily partitioned dataset and I want to train a model on the last 30 days of it.
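The "last 30 days" dependency means that each downstream day needs the following set of upstream keys. This is a minimal plain-Python sketch. It assumes daily keys formatted as `YYYY-MM-DD` (like `2022-07-06`), and `last_n_daily_partition_keys` is a hypothetical helper, not part of Dagster:

```python
from datetime import date, timedelta
from typing import List


def last_n_daily_partition_keys(downstream_key: str, n: int = 30) -> List[str]:
    """Return the n daily partition keys ending at downstream_key, oldest first.

    Hypothetical helper (not a Dagster API). Assumes keys look like "2022-07-06".
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    end = date.fromisoformat(downstream_key)
    # Offsets run from n - 1 down to 0, so the list goes from the oldest day up to end itself.
    return [(end - timedelta(days=offset)).isoformat() for offset in range(n - 1, -1, -1)]


keys = last_n_daily_partition_keys("2022-07-06")
print(keys[0], keys[-1], len(keys))  # 2022-06-07 2022-07-06 30
```

Whatever loads the upstream data (a partition mapping or an IO manager) would need to produce exactly this list for each downstream partition.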
Do I need `partition_mappings` for this? If not, what's the correct way to achieve this?
sandy
07/06/2022, 7:37 PM
Daniel Gafni
07/06/2022, 10:39 PM
sandy
07/06/2022, 10:40 PM
Daniel Gafni
07/06/2022, 10:42 PM
sandy
07/06/2022, 10:52 PM
`context.asset_partition_key_range` or `context.asset_partition_key_time_window`. There are also equivalent methods on `OpExecutionContext` if you want to get this information from within the body of your op.
Daniel Gafni
07/06/2022, 10:52 PM
`context.asset_partition_key_range` gives me a range (for example, `PartitionKeyRange(start='2022-06-26', end='2022-07-06')`), but it only contains the start and end of the partitions.
As I understand it, this is because `PartitionMapping` works this way. This is a little inconvenient, because different assets may have different partition schemes (daily, weekly, etc.), and there is no clear way to get the exact partitions from the `PartitionKeyRange` alone.
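A range carries only its two endpoints and no step size. One scheme-agnostic way to expand it is to slice the upstream asset's ordered key list between those endpoints. This minimal sketch makes three assumptions: you can obtain that ordered list (for example from the upstream partitions definition's `get_partition_keys()`; check the signature for your Dagster version), both endpoints are included (as the example range above suggests), and `keys_in_range` is a hypothetical helper, not a Dagster API:

```python
from typing import List, Sequence


def keys_in_range(all_keys: Sequence[str], start: str, end: str) -> List[str]:
    """Expand an inclusive (start, end) partition key range into every key in between.

    Hypothetical helper (not a Dagster API). all_keys must be the upstream
    asset's complete list of partition keys, in partition order.
    """
    missing = [key for key in (start, end) if key not in all_keys]
    if missing:
        raise KeyError(f"range endpoints not found in partition keys: {missing}")
    lo, hi = all_keys.index(start), all_keys.index(end)
    if lo > hi:
        raise ValueError(f"start {start!r} comes after end {end!r}")
    # Slice by position, so no date arithmetic or step size is baked in.
    return list(all_keys[lo : hi + 1])


# Illustrative weekly keys; a daily or static scheme works the same way.
weekly_keys = ["2022-06-05", "2022-06-12", "2022-06-19", "2022-06-26", "2022-07-03"]
print(keys_in_range(weekly_keys, "2022-06-12", "2022-06-26"))
# ['2022-06-12', '2022-06-19', '2022-06-26']
```

The helper compares list positions rather than parsing dates, so it handles daily, weekly, or custom static partitions the same way, provided the list is in partition order.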
Would you recommend a way of reconstructing the exact list of partitions? I've managed to implement the IO Manager with a hardcoded `partition_key_range` -> partitions list mapping, but it's not a good general solution.
`context.asset_partition_key` or `context.get_asset_identifier()` inside an IO Manager:
context.log.info(f"context.asset_partition_key_range: {context.asset_partition_key_range}")
context.log.info(f"context.has_partition_key: {context.has_partition_key}")
context.log.info(f"context.partition_key: {context.partition_key}")
context.log.info(f"context.asset_partition_key: {context.asset_partition_key}")
is this the desired behavior?