Question about conditionally executing visualizations when u dagster #announcements

sean

12/09/2020, 3:55 AM

Question about conditionally executing visualizations when using Jupyter notebooks as Dagster solids: I have a pipeline that uses Jupyter notebooks (via `dagstermill`) as solids. Several of my notebook solids include plotting code that allows visualization of intermediate results. However, rendering these plots can be computationally expensive. I want to be able to execute the pipeline in two different ways: (1) with execution of plotting code, when debugging/inspecting; (2) without execution of plotting code, when I just want the final results.

The most obvious way to do this is to put the plotting code under a conditional that reads some parameter. To me, the most natural kind of parameter to use is what Dagster calls a [`mode`](https://docs.dagster.io/overview/modes-resources-presets/modes-resources). I prefer this to a solid configuration parameter because (a) my configuration parameters concern the values I'm computing, whereas whether I pre-compute visualizations or not is a kind of meta-parameter; (b) the visualization parameter should be shared across all solids. Really it is a kind of logging configuration, but I don't think that I can use Dagster's built-in logging API because I am trying to control execution of notebook cells.

So I'd like to use separate modes to toggle this visualization behavior. My problem is that, from what I can tell from the `mode` documentation, modes simply configure resource/logger/storage/executor keys; I can't figure out how to read the name of the mode itself from the context during execution. Is this possible? More generally, is there some more appropriate Dagster abstraction I should be using to control this behavior?

Finally, it occurred to me while thinking through this that it would be nice if dagstermill itself were capable of varying its notebook cell execution depending on cell metadata. That way, the visualizations I described above could be sequestered in specially tagged cells, and the notebook execution engine could conditionally execute them depending on the mode. I'm pretty sure that Papermill supports this functionality.

max

12/09/2020, 4:14 AM

i think global config like this is appropriate to encapsulate in a resource

max

12/09/2020, 4:16 AM

you should be able to get the mode name also... i believe it's just `context.pipeline_run.mode`
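For illustration, a minimal sketch of how a notebook cell might gate plotting on the mode name. It assumes, per the suggestion above, that the mode name is readable at `context.pipeline_run.mode`; this is not verified here. The `should_plot` helper and the mode names `debug` and `default` are hypothetical, and the `SimpleNamespace` objects only stand in for the real context so the snippet runs without Dagster installed.

```python
from types import SimpleNamespace

# Hypothetical mode names; the thread does not say what the modes are called.
VISUALIZING_MODES = {"debug"}


def should_plot(context):
    """Return True when the run's mode is one that should render plots."""
    return context.pipeline_run.mode in VISUALIZING_MODES


# Stand-ins for the context object available inside a notebook solid.
debug_context = SimpleNamespace(pipeline_run=SimpleNamespace(mode="debug"))
default_context = SimpleNamespace(pipeline_run=SimpleNamespace(mode="default"))

rendered = []
for ctx in (debug_context, default_context):
    if should_plot(ctx):
        # The expensive plotting code would go here.
        rendered.append(ctx.pipeline_run.mode)
```

The same check works if the flag lives in a resource instead, as suggested earlier in the thread: only the body of `should_plot` would change.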

max

12/09/2020, 4:17 AM

finer grained control over notebook execution is an interesting idea so long as we can avoid overbuilding/building too specifically

sean

12/09/2020, 4:33 AM

Thanks for the pointer! Regarding finer grained control, I suspect it can be implemented without too much hassle. Papermill lets you use an alternative execution engine. It is trivial to extend the default for some purposes, like post-processing-- not sure about conditional cell execution though. Dagster could provide an engine to Papermill that runs a check on each cell before executing. The check by default just returns True, but can be configured via a user-provided callback that takes the cell (or just metadata) and execution context, and returns a boolean specifying whether the cell should be executed.
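A rough sketch of the per-cell check described above, written in plain Python rather than against Papermill's engine API. Cells are modeled as dicts with a `metadata.tags` list, as in the notebook JSON format. The names `always_execute`, `skip_unless_mode`, and `cells_to_execute`, the `visualization` tag, and the dict-shaped context are all hypothetical. A real engine would run this check inside its execution loop instead of filtering a list.

```python
def always_execute(cell, context):
    """Default check: every cell runs."""
    return True


def skip_unless_mode(tag, active_mode):
    """Build a check that runs cells carrying `tag` only in `active_mode`."""
    def check(cell, context):
        tags = cell.get("metadata", {}).get("tags", [])
        return tag not in tags or context["mode"] == active_mode
    return check


def cells_to_execute(cells, context, should_execute=always_execute):
    """Return the cells that the user-provided callback allows to run."""
    return [cell for cell in cells if should_execute(cell, context)]


cells = [
    {"source": "result = compute()", "metadata": {}},
    {"source": "plot(result)", "metadata": {"tags": ["visualization"]}},
]

check = skip_unless_mode("visualization", active_mode="debug")
debug_sources = [c["source"] for c in cells_to_execute(cells, {"mode": "debug"}, check)]
default_sources = [c["source"] for c in cells_to_execute(cells, {"mode": "default"}, check)]
```

Passing the whole cell and the context to the callback keeps the default behavior unchanged, while letting users decide based on tags, other metadata, or anything else the context exposes.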

max

12/09/2020, 6:42 AM

yep, and we already use a custom engine for other reasons. again an issue would be super helpful for me to track this — I’ll be able to take a serious look at the end of the week

sean

12/09/2020, 4:51 PM

Cool, here's the issue: https://github.com/dagster-io/dagster/issues/3378
