#dagster-plus

Maria Zentsova

08/29/2023, 10:40 AM
Hi, could you please advise how best to configure an IO manager for a Dagster Cloud hybrid deployment on ECS to take advantage of partitioned assets? ConfigurablePickledObjectS3IOManager can't handle runs with partitioned assets.

jamie

08/29/2023, 2:38 PM
hey @Maria Zentsova - the ConfigurablePickledObjectS3IOManager should be able to handle partitioned assets. Are you seeing any error messages?

Maria Zentsova

08/29/2023, 3:03 PM
Hi Jamie, I'm moving from non-partitioned asset to partitioned assets and I get the following error: "tried to access partition key for output 'result' of step 'step1', but the step output has a partition range: 'partition1' to 'partition2'". Could you please point me in the right direction on how to solve it?

jamie

08/29/2023, 3:49 PM
ok - by “moving from non-partitioned asset to partitioned assets” do you mean that you have an upstream non-partitioned asset with a downstream partitioned asset? or that you had a non-partitioned asset and now you are updating that same asset to make it partitioned?

Maria Zentsova

08/30/2023, 8:59 AM
Hi Jamie, I have an upstream non-partitioned asset with a downstream partitioned asset.

Daniel Gafni

08/30/2023, 10:22 AM
The problem is with the partition range (I think). The IOManager doesn't support it. It seems like you are executing a backfill as a single run with a partition range? Are you doing this on purpose? If you execute the backfill as multiple individual runs, one per partition, everything should be fine.

Maria Zentsova

08/31/2023, 9:00 AM
Hi Daniel, thank you for your reply! Yes, I split a big data source into multiple partitions (data for different companies) to be able to track the calculation. Can I run the partitions as multiple individual runs as part of a scheduled workflow?

Daniel Gafni

08/31/2023, 11:06 AM
Yes, and I think you should be doing that. Runs over partition ranges are a very specific feature that your IOManager has to support. Basically, you would load all upstream partitions for all downstream partitions in a single run. This is usually undesirable, unless you are working with something like Snowflake or BigQuery. You probably don't need this if you are loading objects from a filesystem and running Python code on them.
👍 1

Maria Zentsova

08/31/2023, 4:25 PM
Thank you so much, Daniel!