Alec Koumjian
04/27/2023, 2:07 PMpath
my IO Manager use based on runtime information. It seems like I should be able to pass a parameter in an Output
object and have the IO Manager pick it up.
So for example, Output(result, metadata={"custom_path": "<gs://bucket-name/path/to/custom/destination>"})
However, this is difficult as the private _get_path
method only receives the OutputContext
and it's not clear at all how to get from that to the metadata in the actual Output
.
Do I have to traverse something like OutputContext
-> step_context
-> execution_plan
-> get_step_output
to find it?def _get_path(self, context: Union[InputContext, OutputContext]) -> UPath:
"""
Look for a custom destination path in the metadata, otherwise use the default
"""
# Get the actual output object from the output context
event_data = None
if isinstance(context, OutputContext):
events = context.step_context.instance.all_logs(
run_id=context.run_id, of_type=DagsterEventType.STEP_OUTPUT
)
for e in events:
event_specific_data = e.dagster_event.event_specific_data
if (
event_specific_data.step_output_handle.step_key == context.step_key
and event_specific_data.step_output_handle.output_name
== context.name
):
event_data = event_specific_data
break
if event_data is not None and "path" in event_data.metadata:
return UPath(event_data.metadata["path"].value)
return super()._get_path(context)
Saw reference to it in a year old issue here: https://github.com/dagster-io/dagster/issues/8521
It works but it feels inelegant.chris
04/27/2023, 6:04 PMfs_io_manager
, or you can always just write your own that allows for path overrides via metadataAlec Koumjian
04/27/2023, 6:08 PM_get_path
so that it's generated consistently. The issue is the vast amount of digging I had to do to get from the OutputContext
to the metadata on the actual Output
event.chris
04/27/2023, 6:09 PMOutputContext.metadata
?Alec Koumjian
04/27/2023, 6:09 PMOutputContext.metadata
does not include Output(metadata=...)
.chris
04/27/2023, 6:10 PMAlec Koumjian
04/27/2023, 6:12 PMOutputContext
would include an easy reference to everything in the Output
(including the output value itself) as well as additional runtime information.Eric Loreaux
05/02/2023, 5:24 PMAndré Augusto
05/04/2023, 3:17 PMEric Loreaux
05/04/2023, 10:31 PMAlec Koumjian
05/05/2023, 3:24 PMall_logs
), you can see my implementation of _get_path
up above in this thread. It's simply not a very elegant solution. I would expect that a method designed to generate a path for an Output
would have direct access to the Output
and its metadata. Instead it requires a rather verbose traversal and I believe a database query.Eric Loreaux
05/05/2023, 4:14 PMAlec Koumjian
05/05/2023, 4:14 PMAndré Augusto
05/05/2023, 5:04 PM