#ask-community

Kevin Otte

06/02/2023, 6:01 PM
Right now we have a number of ops that perform different functions. Our first op batch-processes items, e.g. it dynamically calculates and yields a list of items. Now we are considering adding another parameter to these items; let's call it `region` for simplicity. Does it make sense to keep dynamically yielding 5x more ops, or is there a better approach? This is the current op config:
@op(
    config_schema={
        "batch_size": Field(int, default_value=1000),
        "limit": Field(int, default_value=10000),
        "region": Field(str, default_value="USA"),
    },
    out={"batch_process_mintable_params": DynamicOut(BatchProcessMintableParams)},
)

claire

06/07/2023, 5:29 PM
Hi Kevin. Not entirely sure I'm understanding your use case here. What's happening downstream that requires these items to be yielded as dynamic outputs? Some thoughts:
• Can you yield each batch as a single dynamic output, rather than yielding each item separately?
• If you know what dimensions you want to group your data by, would it make sense to use partitioning? This way, each run can process a given batch, without having to yield many dynamic outputs.

Kevin Otte

06/08/2023, 6:28 PM
@claire They're yielded dynamically so they can be processed async.
There are in fact other ops downstream of the one I linked above too, so I'm wondering if that changes things. For example, it would be great to partition by region and run the entire job, as is, for each region in parallel.
@claire I posted what I was attempting to do here: https://dagster.slack.com/archives/C01U954MEER/p1686278119350379