dagster #ask-community

Dan Corbiani

06/02/2021, 12:51 PM

@chris We've been continuing our memoization work. All is mostly good except for the DagsterNoStepsToExecute error. Is it intentional that an exception is raised if there is nothing to do? Am I calling the pipeline incorrectly? I end up having to wrap a bunch of our execution processes in try blocks like below: It makes me wonder how to access the output from a previously run memoized pipeline.
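The try-block pattern described in this message can be sketched roughly as follows. This is a minimal illustration that does not import Dagster: `NoStepsToExecute`, `run_or_skip`, `fully_memoized_run`, and `fresh_run` are hypothetical stand-ins, not Dagster names, and the real exception class and its import path should be checked against the installed Dagster version.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class NoStepsToExecute(Exception):
    """Hypothetical stand-in for Dagster's 'no steps to execute' error."""


def run_or_skip(run: Callable[[], T]) -> Optional[T]:
    """Call run(); return None when every step was already memoized."""
    try:
        return run()
    except NoStepsToExecute:
        # Nothing left to compute: treat this as a no-op, not a failure.
        return None


def fully_memoized_run() -> str:
    # Simulates a pipeline whose outputs are all cached already.
    raise NoStepsToExecute("no steps to execute")


def fresh_run() -> str:
    # Simulates a pipeline that still has work to do.
    return "result"


skipped = run_or_skip(fully_memoized_run)
executed = run_or_skip(fresh_run)
```

Returning `None` makes the "nothing to do" case explicit to the caller, but it still leaves the question raised above open: the previously memoized output is not available from the wrapper itself.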

chris

06/02/2021, 2:12 PM

Yea, this is the (currently) correct behavior. If there are no steps to execute, then this error will be thrown. Essentially dagster isn't designed to run with no steps, and so to prevent more esoteric failures later on, we just fail fast here.

chris

06/02/2021, 2:14 PM

We don't currently have a super convenient API to access run results to my knowledge, and this is a known feature gap. In combination with the no step keys to execute error though, I'm seeing that it makes for a bit of an incongruency.

chris

06/02/2021, 2:19 PM

one potential workaround is directly instantiating your io manager, and using build_input_context/build_output_context to build the contexts to pass to load_input and handle_output respectively
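A rough sketch of the idea behind this workaround: call the IO manager's methods directly, outside of a pipeline run. To stay self-contained it does not import Dagster. `OutputCtx`, `InputCtx`, and `InMemoryVersionedIOManager` are hypothetical stand-ins. In real code the manager would subclass Dagster's `IOManager` (whose methods are `handle_output` and `load_input`) and the contexts would come from `build_output_context` and `build_input_context`. Keying storage on a version string is an assumption about how a memoizing IO manager might locate its data.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class OutputCtx:
    """Hypothetical stand-in for the context build_output_context returns."""
    step_key: str
    name: str
    version: Optional[str] = None


@dataclass
class InputCtx:
    """Hypothetical stand-in for the context build_input_context returns."""
    upstream_output: OutputCtx


class InMemoryVersionedIOManager:
    """Toy IO manager that keys values by (step_key, output name, version)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str, Optional[str]], Any] = {}

    @staticmethod
    def _key(ctx: OutputCtx) -> Tuple[str, str, Optional[str]]:
        return (ctx.step_key, ctx.name, ctx.version)

    def handle_output(self, context: OutputCtx, obj: Any) -> None:
        # Store the value under the identity of the output that produced it.
        self._store[self._key(context)] = obj

    def load_input(self, context: InputCtx) -> Any:
        # An input is located through the upstream output it depends on.
        return self._store[self._key(context.upstream_output)]


manager = InMemoryVersionedIOManager()
out_ctx = OutputCtx(step_key="make_frame", name="result", version="abc123")
manager.handle_output(out_ctx, [1, 2, 3])
loaded = manager.load_input(InputCtx(upstream_output=out_ctx))
```

With a scheme like this, the input context has to describe the same step key, output name, and version that were used when the value was written. Otherwise the lookup misses.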

Dan Corbiani

06/02/2021, 5:49 PM

hmm. That's interesting. I'm glad to hear I interpreted it correctly. Do you anticipate this making its way onto the roadmap at some point? It also seems like I could put a dummy step into the pipeline that would never get saved.

chris

06/02/2021, 5:50 PM

that seems like a good workaround for now. A better solution is definitely on the roadmap, whether that is a better output-retrieval API or being able to execute without any steps, or both

Dan Corbiani

06/03/2021, 1:14 AM

I realized this is a bit more frustrating than I originally thought. I've been playing around with it more and it seems like if something is loaded, I can't access it as output at all. Is that also correct? The use case is that if I rerun a pipeline, I cannot access the previous output for a step. Is there something I can do to help make that possible?

Dan Corbiani

06/03/2021, 1:19 AM

I also looked through the instantiation docs but I'm not sure how to get that to work. It seems like I'm also missing the version information when I create that context.

Dan Corbiani

06/03/2021, 1:25 AM

I can get around the limitation by creating a dummy solid that calls that output. As long as it isn't saved, I can access the output.

chris

06/03/2021, 2:20 PM

Hmm. Are you saying that from the result object returned by execute_pipeline, you can't access outputs memoized from previous runs?

chris

06/03/2021, 2:24 PM

I'll investigate solutions around this space. I think both of these problems should be fixable: not failing if we have no steps to execute, and retrieving outputs from previously memoized runs.

Dan Corbiani

06/03/2021, 6:04 PM

yes. that's the conclusion I've come to. I've ended up creating a dummy step that won't be memoized. That allows us to basically point back to the memoized dataframe. Happy to share whatever I can to help.

Dan Corbiani

06/03/2021, 6:04 PM

we fully realize this is experimental. It's a fantastic start.
