#dagster-support

Daniel Gafni

07/06/2022, 5:12 PM
Hey guys, is there an example using partition_mappings available anywhere? Basically I want to have a (partitioned) asset (or maybe a job) that's dependent on multiple upstream partitions. Let's say I have a daily partitioned dataset and I want to train a model on the last 30 days from the dataset. Do I need partition_mappings for this? If not, what's the correct way to achieve this?
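To make the question concrete: a partition mapping answers "which upstream partitions feed a given downstream partition?". Here is a minimal plain-Python sketch of that idea for the 30-day case. It is not Dagster API; the function name, the key format, and the window size are assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Tuple

DATE_FORMAT = "%Y-%m-%d"  # assumed key format for daily partitions


def upstream_key_range(downstream_key: str, window_days: int = 30) -> Tuple[str, str]:
    """Return the (start, end) upstream daily keys that one downstream partition reads.

    Hypothetical helper: the window ends on the downstream partition's own day
    and reaches back window_days - 1 days, so it covers window_days keys in total.
    """
    if window_days < 1:
        raise ValueError("window_days must be at least 1")
    end = datetime.strptime(downstream_key, DATE_FORMAT).date()
    start = end - timedelta(days=window_days - 1)
    return start.strftime(DATE_FORMAT), end.strftime(DATE_FORMAT)


# The model partition for 2022-07-06 would train on the 30 days ending that day.
start_key, end_key = upstream_key_range("2022-07-06")
```

In Dagster itself this logic would live in a partition mapping attached to the downstream asset, as in the unit tests linked in the reply below.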

sandy

07/06/2022, 7:37 PM
Hey Daniel - I think the best examples we have right now are in the unit tests: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster_tests/core_tests/asset_defs_tests/test_partitioned_assets.py#L182. I'd be happy to answer questions if that's not illustrative enough on its own.

Daniel Gafni

07/06/2022, 10:39 PM
That's great, thank you! Turns out I've been implementing the wrong abstract method lol. What about AssetsDefinition.from_graph? Is there a reason for not having the partition_mappings argument? I could add it with a PR if not.

sandy

07/06/2022, 10:40 PM
I posted a PR for that this morning after I saw your earlier message about it: https://github.com/dagster-io/dagster/pull/8768

Daniel Gafni

07/06/2022, 10:42 PM
You guys are absolutely amazing! Thank you!
By the way, what would be the best way to access and load the upstream partitions? Would it have to be a dynamic graph?

sandy

07/06/2022, 10:52 PM
Generally, you code this logic into your IO manager, using context.asset_partition_key_range or context.asset_partition_key_time_window. There are also equivalent methods on OpExecutionContext if you want to get this information from within the body of your op.

Daniel Gafni

07/06/2022, 10:52 PM
Great, thanks again!
What is the best way to access the actual partitions list in this case? I can access context.asset_partition_key_range (for example, PartitionKeyRange(start='2022-06-26', end='2022-07-06')), but it only has the start and end of the partitions. As I understand it, this is because PartitionMapping works this way. This is a little inconvenient, because different assets may have different partition schemes (daily, weekly, etc.) and there is no clear way to get the exact partitions just from the PartitionKeyRange. Would you recommend a way of reconstructing the exact partitions list? I've managed to implement the IO Manager with a hardcoded partition_key_range -> partitions_list mapping, but it's not a good general solution.
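For illustration, here is a minimal plain-Python sketch of rebuilding the key list from only the start and end of a range. It is not Dagster API: the helper name is hypothetical, and it assumes date-formatted keys, a fixed cadence in days, and that the start key is itself a valid partition.

```python
from datetime import datetime, timedelta
from typing import List


def expand_key_range(
    start: str, end: str, step_days: int = 1, fmt: str = "%Y-%m-%d"
) -> List[str]:
    """List every partition key from start to end, inclusive.

    Hypothetical helper: step_days is the assumed cadence
    (1 for daily partitions, 7 for weekly partitions).
    """
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    current = datetime.strptime(start, fmt).date()
    last = datetime.strptime(end, fmt).date()
    keys = []
    while current <= last:
        keys.append(current.strftime(fmt))
        current += timedelta(days=step_days)
    return keys


# The range from the message above, read as daily and then as weekly partitions.
daily_keys = expand_key_range("2022-06-26", "2022-07-06")
weekly_keys = expand_key_range("2022-06-26", "2022-07-06", step_days=7)
```

The cadence still has to come from somewhere, which is the inconvenience described above: the range alone does not say whether the asset is daily or weekly, so the caller has to know the upstream asset's partitions definition.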
Also, I get this error when trying to access context.asset_partition_key or context.get_asset_identifier() inside an IO Manager:
context.log.info(f"context.asset_partition_key_range: {context.asset_partition_key_range}")
context.log.info(f"context.has_partition_key: {context.has_partition_key}")
context.log.info(f"context.partition_key: {context.partition_key}")
context.log.info(f"context.asset_partition_key: {context.asset_partition_key}")
Is this the desired behavior?
Here is my code... I would be happy to get some tips for improvement.