Jeff Hulbert

03/29/2021, 5:24 PM
If a step is re-run (due to a run failure), should the InputContext upstream_output.config for the re-run step change? I have an IO manager whose handle_input depends on the configured output path of the previous step, but when the step is re-run the output path is recalculated as if the previous step had run again. Because the output path contains a timestamp, handle_input can no longer find the file.

sandy

03/29/2021, 5:37 PM
Hi @Jeff Hulbert - yes, it will change. I.e. the upstream_output.config is based on whatever config is supplied for that output during the current run, even if the current run is a re-execution of a prior run with different config

Jeff Hulbert

03/29/2021, 6:13 PM
I guess the best bet for naming the files is to use context.get_run_scoped_output_identifier(), like the existing IO managers do, since that will be consistent on a re-run? Is there any way to get a timestamp, maybe the start of the original run, as part of that? I'm using the IO manager to persist the file long term and would like a timestamp in the filename to help with tracking.
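
A minimal sketch of the naming idea above, in plain Python with no Dagster imports. It assumes the identifier is a list of strings such as a run id, step key, and output name, which is what context.get_run_scoped_output_identifier() is described as providing in this thread. The helper run_scoped_path, the base directory, and the example identifier are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import Sequence


def run_scoped_path(
    base_dir: str, identifier: Sequence[str], filename: str = "file.pkl"
) -> Path:
    """Build a storage path from identifier parts only (no wall-clock time)."""
    if not identifier:
        raise ValueError("identifier must contain at least one part")
    return Path(base_dir).joinpath(*identifier, filename)


# Hypothetical stand-in for the list an output context might supply.
example_identifier = ["abc123", "step1", "output1"]

# The writer (handle_output) and the reader (handle_input) compute the same
# path, because nothing in it depends on when the code happens to run.
written_to = run_scoped_path("/tmp/storage", example_identifier)
read_from = run_scoped_path("/tmp/storage", example_identifier)
```

Because the path is a pure function of the identifier, recomputing it during a re-run gives the same location as long as the identifier itself is the same.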

sandy

03/29/2021, 10:25 PM
Ah, that's right. If you want to pull metadata about the run, you can potentially do it via the run storage, which is accessible via something like context.step_context.instance.run_storage (see https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/core/storage/runs/base.py). This might feel hacky, but another thing you could do is create an accompanying empty file with the timestamp. So if your run id is "abc123", you could create:
abc123/step1/output1/file.pkl
abc123/step1/output1/2020-03-29
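
The marker-file layout above could be sketched roughly as follows, using only the Python standard library. The function names write_with_date_marker and read_date_marker are hypothetical, as are the pickled payload and the date; in a real IO manager the directory would presumably come from the run-scoped identifier rather than a temporary directory.

```python
import pickle
import tempfile
from datetime import date
from pathlib import Path
from typing import Any, Optional


def write_with_date_marker(output_dir: Path, obj: Any, run_date: date) -> Path:
    """Pickle obj to file.pkl and drop an empty file named after run_date."""
    output_dir.mkdir(parents=True, exist_ok=True)
    data_path = output_dir / "file.pkl"
    with data_path.open("wb") as f:
        pickle.dump(obj, f)
    # Empty sibling file; only its name (e.g. 2021-03-29) carries information.
    (output_dir / run_date.isoformat()).touch()
    return data_path


def read_date_marker(output_dir: Path) -> Optional[date]:
    """Return the date encoded in the first marker file found, if any."""
    for entry in sorted(output_dir.iterdir()):
        try:
            return date.fromisoformat(entry.name)
        except ValueError:
            continue  # not a marker, e.g. file.pkl
    return None


with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "abc123" / "step1" / "output1"
    data_path = write_with_date_marker(out_dir, {"rows": 3}, date(2021, 3, 29))
    marker = read_date_marker(out_dir)
```

The data file keeps a fixed name, so the input side never needs to know the timestamp; the date is recovered separately by listing the directory.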

Jeff Hulbert

03/29/2021, 11:24 PM
thanks!