Currently investigating Dagster as a replacement for Airflow. I rolled off my last attempt a few months ago, and as an Ops user I still can’t really understand what happens when a run fails and I then make changes or rerun the job. How do I know that a full run has completed? I’m mentally trained for Airflow’s ‘green dot’ when all child processes have been retried successfully. If I have a failed job and rerun ‘from failure’ until I get a success, I see no indication that the original run has been ‘fixed’. For example, if I run a single step again I get a ‘green dot’ for that step. If I rerun ‘from failure’ it’s a separate run, and visually there is no indication that the failed run has been ‘handled’.
In the screenshot, the first failure was due to an API response handling issue. A few ‘retry from failure’ runs later, after some tweaks, the job was successful. The last success was a single step from the failed job. I’m not confident that the team will be able to identify when things are in a good state.
08/24/2022, 10:15 PM
Hey Jon - this is a good callout and I've raised an issue for it. If you haven't checked it out, you might want to see the instance status page (accessible via the status button at the top right of the screen), which gives a more holistic view of the run history, but it still doesn't 100% give you what you want. You can get yourself into a state where the last run of a step was a failure, but the status is showing green for a particular job.
09/02/2022, 9:54 PM
+1, understanding the pipeline status is not simple because of this issue.
Perhaps a job could track the status of its underlying steps even if they were computed in a different run? The status could then be shown next to the job name. Partitioned jobs could display the latest partition status. Thoughts?
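To make the idea concrete, here is a rough sketch in plain Python of what "track step status across runs" could mean. It does not use Dagster's API: `StepResult`, `root_run_id`, `latest_step_status` and `is_handled` are all hypothetical names, and it assumes every re-execution can be linked back to the run that originally failed.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class StepResult:
    """One step execution. Hypothetical record, not a Dagster type."""
    run_id: str
    root_run_id: str   # assumed link from a re-execution back to the original run
    step_key: str
    succeeded: bool
    finished_at: float


def latest_step_status(results: Iterable[StepResult], root_run_id: str) -> Dict[str, bool]:
    """Return the most recent outcome of each step across the original run and its retries."""
    latest: Dict[str, bool] = {}
    for r in sorted(results, key=lambda r: r.finished_at):
        if r.root_run_id == root_run_id:
            latest[r.step_key] = r.succeeded  # later executions overwrite earlier ones
    return latest


def is_handled(results: Iterable[StepResult], root_run_id: str, expected_steps: List[str]) -> bool:
    """True only if every expected step's latest execution succeeded."""
    latest = latest_step_status(list(results), root_run_id)
    return all(latest.get(step, False) for step in expected_steps)


# Original run "a" fails at parse; retry "b" fixes parse but fails at load;
# retry "c" reruns only the load step.
results = [
    StepResult("a", "a", "fetch", True, 1.0),
    StepResult("a", "a", "parse", False, 2.0),
    StepResult("b", "a", "parse", True, 3.0),
    StepResult("b", "a", "load", False, 4.0),
    StepResult("c", "a", "load", True, 5.0),
]
steps = ["fetch", "parse", "load"]
print(is_handled(results[:4], "a", steps))  # before the last retry
print(is_handled(results, "a", steps))      # after the last retry
```

A job-level 'green dot' could then be driven by `is_handled` rather than by the status of any single run, which would also cover the case where the last success was a single-step rerun.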