# ask-community
👋 Hello, team! I'm wondering how I can loop over the output of an OP in a job. The OP structure is as follows:

1. Fetch data from API --> List
2. Loop through that List to execute API calls

I get this error when I try to do that in a job:
```
Attempted to iterate over an InvokedSolidOutputHandle. This object represents the output "result" from the solid "extract_game_ids_to_list". Consider defining multiple Outs if you seek to pass different parts of this output to different solids.
```
https://docs.dagster.io/concepts/ops-jobs-graphs/dynamic-graphs

the `@job` function is evaluated at init time to determine the dependency structure of the graph, it isn’t used at “run” time
Thanks @alex, so I would want to create a third OP to loop through the output?
i think you want to convert `extract_game_ids_to_list` to use a “dynamic output” instead of a list, if the goal is to run downstream work separately for each id
`extract_game_id_to_list` outputs a game_id that is used to call an API and return a JSON object, basically
I would then load that JSON to S3 (using another OP, I guess?)
Maybe I haven't fully grasped the idea of a job just yet
basically, all actual code execution has to happen in `op`s. So you could either
• do the iteration inside `extract_game_data_to_json`, making it `List[id]` -> `List[json]`
• use dynamic outputs, which will effectively clone the `extract_game_data_to_json` op that goes `id` -> `json` for each `id` that is determined at runtime
Is it common to handle the entire extraction/load to s3 in one op?
Assuming yes from the docs:
I'm basically trying to do bullet # 4
ya, one way to look at it is checkpointing: if something fails, how much do you want to re-do?

You could just do everything all in one big op, but you have to start all the way over if anything fails. Similarly, operating on whole lists, you have to re-do the whole list if just one item fails.

Something like splitting extract and load comes down to how expensive the extract is: if the load fails, do you care if you have to re-extract?
the idea then is to encapsulate ops within jobs within graphs, right?
a dependency graph of ops is a `graph`
a `job` is an executable `graph`
I use DynamicOutput for a very similar operation: Retrieve list of IDs from API --> Fan out to individual API requests for each ID in the list --> Collect all the results and pass them on for downstream processing