# ask-community
d
is it possible to gather metadata from multiple fanned out ops and collect it somewhere? We have this fanned out graph setup:
```python
result = (
    key_to_score(recruiter_teams_to_score)
    .map(
        lambda key: score_data_set(
            data_to_score_for_key(recruiter_in_batch=key), trained_model
        )
    )
    .collect()
)
```
score_data_set and data_to_score_for_key each collect metadata about their own step (processing time and num_rows). I'd like to gather that metadata, average it at the end, and display the result on the graph-backed asset that this graph becomes.
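For reference, once the per-step metadata is in hand, the averaging itself is simple. A minimal pure-Python sketch (the dict shape and all names here are hypothetical, not part of any Dagster API):

```python
# Hypothetical sketch: average per-step metadata after the fan-out.
# Assumes each step produced a dict like
# {"processing_time": float, "num_rows": int}.
from statistics import mean

def average_metadata(per_step_metadata):
    """Average each numeric metadata key across all steps that report it."""
    keys = set().union(*(m.keys() for m in per_step_metadata))
    return {
        key: mean(m[key] for m in per_step_metadata if key in m)
        for key in keys
    }

steps = [
    {"processing_time": 1.5, "num_rows": 100},
    {"processing_time": 2.5, "num_rows": 300},
]
averages = average_metadata(steps)  # processing_time -> 2.0, num_rows -> 200
```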
s
We don't currently have super-easy APIs for accessing runtime metadata in downstream ops. Here's the ticket where we're tracking this: https://github.com/dagster-io/dagster/issues/8521. In the meantime, it should be possible to access the metadata if you muck around with the methods on DagsterInstance:
```python
from dagster import DagsterEventType

instance.all_logs(run_id=context.run_id, of_type=DagsterEventType.STEP_OUTPUT)
```
would be a good place to start. You can get the metadata off of the output events.
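A rough sketch of what pulling metadata out of those records might look like. The attribute names on the record and event objects here (dagster_event, metadata) are assumptions and should be checked against the Dagster version in use:

```python
# Hypothetical sketch: walk the event-log records returned by
# instance.all_logs(...) and collect any metadata attached to step
# outputs. "dagster_event" / "metadata" are assumed attribute names,
# not a confirmed Dagster API.
def extract_output_metadata(records):
    collected = []
    for record in records:
        event = getattr(record, "dagster_event", None)
        if event is None:
            continue
        metadata = getattr(event, "metadata", None)  # assumed attribute
        if metadata:
            collected.append(dict(metadata))
    return collected
```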
d
Hm, when I try that code above I get an error saying the run_id doesn't exist on the InputContext for the IO manager. Is there any way to get that data inside the IOManager?
s
ah, try:
```python
context.step_context.run_id
```
d
Excellent, that worked. Are there additional filters available in all_logs?
s
that's all it currently supports
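Since all_logs only filters by event type, any further narrowing (for example, by step key) can be done client-side on the returned records. A hedged sketch; the step_key attribute on the record is an assumption about the record shape:

```python
# Hypothetical sketch: filter event-log records in plain Python after
# all_logs has applied the event-type filter. "step_key" is an assumed
# attribute on the record, not a confirmed Dagster API.
def records_for_steps(records, step_keys):
    wanted = set(step_keys)
    return [r for r in records if getattr(r, "step_key", None) in wanted]
```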