# ask-community
c
Hi support, for partitioned jobs launched in a single run (see screenshot): do you need to loop on
context.asset_partition_key_range
and then return an array with one value per key? E.g., if the range is
['2023-06-20', '2023-06-21', '2023-06-22']
the asset would return
'hello', 'dagster', 'world'
If that’s correct, what do you recommend doing in an IO Manager to prevent end-user errors?
• Check that the length of the partition range is the same as the length of the output?
• Ask the end user to return a key-value dictionary with the shape:
range_key: value
like
{
  "2023-06-20": "hello",
  "2023-06-21": "dagster",
  "2023-06-22": "world"
}
• Else?
Thanks in advance, and sorry I couldn’t find docs regarding this.
b
I actually posted something related to this: https://dagster.slack.com/archives/C01U5LFUZJS/p1687941747304719
s
Hey Chris, this is an area of active development. Currently, the value you return depends on the IO Manager you are using: the IO Manager receives the value and is responsible for breaking it up and writing it to disk, so the value can be anything the IO Manager can digest. However, we will soon be introducing a container object (probably called
PartitionedOutput
) that you will return: https://github.com/dagster-io/dagster/pull/14621
what do you recommend doing in an IO Manager to prevent end-user errors?
Just validate whatever structure you’re expecting, as with the dictionary example you provided.
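As a rough illustration of that kind of validation, here is a minimal sketch in plain Python, assuming the IO Manager expects the dictionary shape from the question. `validate_partitioned_output` is a hypothetical helper, not part of Dagster; an IO Manager could call it from `handle_output` with the partition keys of the current run.

```python
from typing import Any, Dict, Sequence


def validate_partitioned_output(
    expected_keys: Sequence[str], obj: Any
) -> Dict[str, Any]:
    """Check that obj is a dict with exactly one entry per expected partition key.

    Hypothetical helper for illustration only; not a Dagster API.
    """
    if not isinstance(obj, dict):
        raise TypeError(
            f"Expected a dict keyed by partition key, got {type(obj).__name__}"
        )
    expected = set(expected_keys)
    missing = [key for key in expected_keys if key not in obj]
    unexpected = [key for key in obj if key not in expected]
    if missing or unexpected:
        raise ValueError(
            f"Partition keys do not match: missing={missing}, unexpected={unexpected}"
        )
    # Return the values ordered by the run's partition keys,
    # regardless of the order the user built the dict in.
    return {key: obj[key] for key in expected_keys}
```

Checking the keys themselves, rather than only comparing lengths, also catches the case where the output has the right number of entries but for the wrong partitions.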
c
Alright perfect, thanks for the super quick answer
@sean I have kind of a follow-up question. Let’s say I have an asset with 5 partitions that is the upstream of a non-partitioned asset. Do I understand correctly that I need my IO Manager to collect all 5 partitions and then wrap it in an array that will become the input of the downstream? Is there a way to call the IO Manager X amount of times and return each partition as a separate input argument?
s
Do I understand correctly that I need my IO Manager to collect all 5 partitions and then wrap it in an array that will become the input of the downstream?
Yes, although it doesn’t need to be an array; you can return any value. I believe our default
UPathIOManager
returns a dict keyed by partition key.
Is there a way to call the IO Manager X amount of times and return each partition as a separate input argument?
There is presently no way to do this.
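Since the partitions arrive as a single input, the downstream asset can unpack them itself. A minimal sketch in plain Python, assuming the IO Manager hands over a dict keyed by partition key as described above; `combine_partitions` is a hypothetical name standing in for the body of the non-partitioned asset.

```python
from typing import Dict


def combine_partitions(upstream: Dict[str, str]) -> str:
    """Join the values of all upstream partitions into one string.

    Assumes `upstream` maps partition key -> loaded value. Hypothetical
    function for illustration; not a Dagster API.
    """
    # Sort by partition key so the result does not depend on load order.
    # ISO dates such as "2023-06-20" sort chronologically as plain strings.
    return " ".join(upstream[key] for key in sorted(upstream))
```

With the example from the thread, passing the three partitions in any order yields the values joined in date order.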
c
Gotcha, thanks, really helpful.