
Alec Koumjian

04/27/2023, 2:07 PM
I would like a general way to override the output path my IO manager uses, based on runtime information. It seems like I should be able to pass a parameter in an `Output` object and have the IO manager pick it up. For example:
`Output(result, metadata={"custom_path": "gs://bucket-name/path/to/custom/destination"})`
However, this is difficult because the private `_get_path` method only receives the `OutputContext`, and it's not at all clear how to get from that to the metadata on the actual `Output`. Do I have to traverse something like `OutputContext` -> `step_context` -> `execution_plan` -> `get_step_output` to find it?
This is what I have at the moment:
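```python
from typing import Union

from dagster import DagsterEventType, InputContext, OutputContext, UPathIOManager
from upath import UPath


# Class name is hypothetical; UPathIOManager is the assumed base class.
class CustomPathIOManager(UPathIOManager):
    def _get_path(self, context: Union[InputContext, OutputContext]) -> UPath:
        """
        Look for a custom destination path in the metadata, otherwise use the default
        """
        # Find the STEP_OUTPUT event already logged for this output, because
        # OutputContext.metadata does not include Output(metadata=...)
        event_data = None
        if isinstance(context, OutputContext):
            events = context.step_context.instance.all_logs(
                run_id=context.run_id, of_type=DagsterEventType.STEP_OUTPUT
            )
            for e in events:
                event_specific_data = e.dagster_event.event_specific_data
                if (
                    event_specific_data.step_output_handle.step_key == context.step_key
                    and event_specific_data.step_output_handle.output_name
                    == context.name
                ):
                    event_data = event_specific_data
                    break

        # The key checked here ("path") must match the key passed in Output(metadata=...).
        # InputContext always falls through to the default path.
        if event_data is not None and "path" in event_data.metadata:
            return UPath(event_data.metadata["path"].value)
        return super()._get_path(context)
```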
I saw a reference to this approach in a year-old issue (https://github.com/dagster-io/dagster/issues/8521). It works, but it feels inelegant.
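The workaround above boils down to a linear scan over every `STEP_OUTPUT` event logged for the run, matching on step key and output name. Below is a minimal pure-Python model of that lookup with no Dagster dependency. `StepOutputRecord`, `find_output_metadata`, and `resolve_path` are hypothetical stand-ins, not Dagster APIs, and metadata values are plain strings here rather than Dagster's metadata value objects.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass
class StepOutputRecord:
    """Hypothetical stand-in for the step-specific data of a STEP_OUTPUT event."""
    step_key: str
    output_name: str
    metadata: Dict[str, str] = field(default_factory=dict)


def find_output_metadata(
    events: Iterable[StepOutputRecord], step_key: str, output_name: str
) -> Optional[Dict[str, str]]:
    # Scan every logged output event until one matches this step/output pair;
    # the cost grows with the number of outputs logged in the run.
    for event in events:
        if event.step_key == step_key and event.output_name == output_name:
            return event.metadata
    return None


def resolve_path(
    events: Iterable[StepOutputRecord], step_key: str, output_name: str, default: str
) -> str:
    # Prefer an explicit "path" override, otherwise return the default location.
    metadata = find_output_metadata(events, step_key, output_name)
    if metadata is not None and "path" in metadata:
        return metadata["path"]
    return default


events = [
    StepOutputRecord("load", "result"),
    StepOutputRecord("transform", "result", {"path": "gs://bucket-name/custom/out"}),
]
print(resolve_path(events, "transform", "result", "gs://bucket-name/default"))  # → gs://bucket-name/custom/out
print(resolve_path(events, "load", "result", "gs://bucket-name/default"))  # → gs://bucket-name/default
```

The model makes the cost visible: finding one output's metadata means iterating over the run's output events, which in the real workaround also involves querying the instance's event log.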

chris

04/27/2023, 6:04 PM
Which IO manager are you using? You can set a custom path prefix via `fs_io_manager`, or you can always just write your own that allows for path overrides via metadata.

Alec Koumjian

04/27/2023, 6:08 PM
I am using my own IO manager. The question is where in the IO manager's interface the override should be captured. The most correct place seems to be `_get_path`, so that the path is generated consistently. The problem is the vast amount of digging I had to do to get from the `OutputContext` to the metadata on the actual `Output` event.

chris

04/27/2023, 6:09 PM
Ah I see, are you saying that the metadata was not on `OutputContext.metadata`?

Alec Koumjian

04/27/2023, 6:09 PM
Correct, `OutputContext.metadata` does not include `Output(metadata=...)`.

chris

04/27/2023, 6:10 PM
Gotcha, I think that's just a bug.
The workaround would be what you describe, but ideally we would just make that metadata easily available. That should solve the amount of digging for your case, correct? You could then simply gate on the existence of metadata on the context.
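If `OutputContext.metadata` carried the values passed to `Output(metadata=...)`, the override would reduce to a single dictionary check with no event-log query. The following is a minimal sketch of that gating under that assumption. `StubOutputContext` and `get_output_path` are hypothetical, the `custom_path` key follows the example at the top of the thread, and the fallback `base_dir/step_key/output_name` layout is illustrative rather than Dagster's actual storage layout.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class StubOutputContext:
    """Hypothetical stand-in for an output context; not a Dagster class."""
    step_key: str
    name: str
    metadata: Optional[Dict[str, Any]] = field(default_factory=dict)


def get_output_path(context: StubOutputContext, base_dir: str, key: str = "custom_path") -> str:
    # Treat missing metadata as empty so the gate below never fails.
    metadata = context.metadata or {}
    # Gate on the override key: if present, use the supplied location verbatim.
    if key in metadata:
        return str(metadata[key])
    # Otherwise fall back to an illustrative default layout: base_dir/step_key/output_name.
    # Plain string joining keeps URL schemes such as gs:// intact.
    return "/".join([base_dir.rstrip("/"), context.step_key, context.name])


ctx = StubOutputContext(
    "transform", "result", {"custom_path": "gs://bucket-name/path/to/custom/destination"}
)
print(get_output_path(ctx, "/tmp/storage"))  # → gs://bucket-name/path/to/custom/destination
print(get_output_path(StubOutputContext("transform", "result"), "/tmp/storage"))  # → /tmp/storage/transform/result
```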

Alec Koumjian

04/27/2023, 6:12 PM
Yes, I would think the `OutputContext` would include an easy reference to everything in the `Output` (including the output value itself) as well as additional runtime information.

Eric Loreaux

05/02/2023, 5:24 PM
Is there a bug we can track for this one? I would also love to see this made easier.

chris

05/02/2023, 5:26 PM
here and here

André Augusto

05/04/2023, 3:17 PM
Also a related discussion, and my comment on it: https://github.com/dagster-io/dagster/discussions/6913#discussioncomment-4173268
I'm hoping this gets sorted out soon, because I (again) need this feature to give special paths to dynamic outputs.
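For dynamic outputs, each mapped copy of an output needs its own location, so a path override generally has to incorporate the mapping key. The following is a minimal pure-Python sketch, assuming the IO manager can see both the metadata override and the output's mapping key (for example via `OutputContext.mapping_key`). `dynamic_output_path` and the `base/output_name/mapping_key` layout are hypothetical.

```python
from typing import Mapping, Optional


def dynamic_output_path(
    metadata: Mapping[str, str],
    default_base: str,
    output_name: str,
    mapping_key: Optional[str],
) -> str:
    # Use the per-output override as the base location if one was supplied.
    base = metadata.get("custom_path", default_base).rstrip("/")
    # Non-dynamic outputs get a single location per output name.
    if mapping_key is None:
        return f"{base}/{output_name}"
    # Dynamic outputs: give each mapped copy its own sub-path keyed by its
    # mapping key, so copies produced by the same step do not overwrite each other.
    return f"{base}/{output_name}/{mapping_key}"


print(dynamic_output_path({"custom_path": "gs://bucket-name/special"}, "gs://bucket-name/default", "chunk", "part_1"))
# → gs://bucket-name/special/chunk/part_1
print(dynamic_output_path({}, "gs://bucket-name/default", "chunk", None))
# → gs://bucket-name/default/chunk
```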

Eric Loreaux

05/04/2023, 10:31 PM
What is preventing you from using the workaround specified in https://github.com/dagster-io/dagster/issues/8521?

Alec Koumjian

05/05/2023, 3:24 PM
Nothing is preventing me from using the workaround (using `all_logs`); you can see my implementation of `_get_path` earlier in this thread. It's simply not a very elegant solution. I would expect a method designed to generate a path for an `Output` to have direct access to the `Output` and its metadata. Instead, it requires a rather verbose traversal and, I believe, a database query.

Eric Loreaux

05/05/2023, 4:14 PM
Sorry, I was referring to the comment made by @André Augusto, since they mentioned it as a potential blocker.

Alec Koumjian

05/05/2023, 4:14 PM
My bad, sorry!

André Augusto

05/05/2023, 5:04 PM
Yeah, I took a closer look at the code Alec provided earlier and adapted it to my case to make it work. However, as Alec said, it is not elegant, and it's quite surprising that we can't access this kind of data in an ergonomic way given how powerful the Dagster APIs are.