if a step is re-run (due to a run failure) should ...
# announcements
j
If a step is re-run (due to a run failure), should the InputContext upstream_output.config for the re-run step change? I have an IO manager whose handle_input depends on the configured output path of the previous step, but when the step is re-run the output path is recalculated as if the previous step had run again. Because the output path contains a timestamp, handle_input can no longer find the file.
s
Hi @Jeff Hulbert - yes, it will change. I.e. the upstream_output.config is based on whatever config is supplied for that output during the current run, even if the current run is a re-execution of a prior run with different config
j
I guess the best bet for naming the files is to use context.get_run_scoped_output_identifier(), like the existing IO managers do, since that will be consistent on a re-run? Is there any way to get a timestamp, maybe the start of the original run, as part of that? I'm using the IO manager to persist the file long term and would like a timestamp in the filename to help with tracking.
s
ah, that's right. if you want to pull metadata about the run, you can potentially do it via the run storage, which is accessible via something like context.step_context.instance.run_storage (see https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/core/storage/runs/base.py). this might feel hacky, but another thing you could do is create an accompanying empty file with the timestamp. so if your run id is "abc123", you could create:
abc123/step1/output1/file.pkl
abc123/step1/output1/2020-03-29
j
thanks!
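For reference, the "empty timestamp file" layout suggested above can be sketched in plain Python. This is only an illustration under stated assumptions, not Dagster's API: it assumes the run-scoped identifier is a list of path parts such as ["abc123", "step1", "output1"], and the helper names (output_dir, write_output, read_input, marker_date) are hypothetical. In a real IO manager, the write and read helpers would be called from handle_output and handle_input respectively.

```python
import pickle
from datetime import date
from pathlib import Path
from typing import Any, Optional, Sequence


def output_dir(base_dir: Path, identifier: Sequence[str]) -> Path:
    # identifier is assumed to look like ["abc123", "step1", "output1"].
    return base_dir.joinpath(*identifier)


def write_output(base_dir: Path, identifier: Sequence[str], obj: Any, run_date: date) -> Path:
    # Store the data under a path that does NOT contain the timestamp,
    # so a re-run can recompute the same path.
    target = output_dir(base_dir, identifier)
    target.mkdir(parents=True, exist_ok=True)
    data_path = target / "file.pkl"
    with data_path.open("wb") as f:
        pickle.dump(obj, f)
    # Record the date as an empty sibling file, e.g. ".../output1/2020-03-29".
    (target / run_date.isoformat()).touch()
    return data_path


def read_input(base_dir: Path, identifier: Sequence[str]) -> Any:
    # The read path depends only on the identifier, never on the timestamp.
    with (output_dir(base_dir, identifier) / "file.pkl").open("rb") as f:
        return pickle.load(f)


def marker_date(base_dir: Path, identifier: Sequence[str]) -> Optional[date]:
    # Recover the date from the marker file name, if one exists.
    for entry in output_dir(base_dir, identifier).iterdir():
        try:
            return date.fromisoformat(entry.name)
        except ValueError:
            continue  # not a date-named marker (e.g. file.pkl)
    return None
```

Because the timestamp lives in a separate marker file instead of the data file's name, the input side can locate the file from the identifier alone, while the date stays available for tracking.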