# announcements
d
is there an example of a pipeline making use of intermediates?
n
yeah! is your question regarding how intermediates are stored, or how they are passed between solids?
if you add to your config:
storage:
  filesystem:
your pipeline will store intermediates on disk. Without a directory specified under filesystem:, this defaults to /tmp/dagster/runs/<run id>
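For reference, a fuller version of that config might look like the sketch below. The nested config block and the base_dir key are assumptions about the config schema (they may differ by dagster version), and the path is hypothetical:

```yaml
# Persist intermediates to a directory of your choosing instead of
# the default /tmp/dagster/runs/<run id>
storage:
  filesystem:
    config:
      base_dir: /var/dagster/intermediates  # hypothetical path
```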
d
oh super interesting. is there more documentation on this?
certainly something we need to document better
n
yeah, good reminder that we should add this! The intermediates storage is how we support re-execution. right now we support in-memory, filesystem, and S3 storage for intermediates. we’re actively working on improving this part of the system - the goal is to ultimately support persisting intermediate results on a variety of object stores, and eventually to permit user configuration, e.g. so if you’ve already got data in some s3://your_bucket/2019/01/01/*.parquet, you won’t need to migrate it to work with dagster
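By analogy with the filesystem config above, the S3 case would presumably point storage at a bucket. A hedged sketch only; the s3 key and s3_bucket field are assumptions, not confirmed in this thread:

```yaml
# Sketch of S3-backed intermediates storage; key names are assumptions
storage:
  s3:
    config:
      s3_bucket: your-intermediates-bucket  # hypothetical bucket name
```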
d
this is really awesome. so just to be clear, if storage is not specified in config.yml, no intermediates will be stored, yeah?
n
yup exactly, without storage the intermediates will be in memory only, nothing on disk/elsewhere
d
very very cool. is there a way to specify materialization format in the config? would love to take a peek at an example of a pipeline config that uses intermediates if y’all know of one
a
from the airline-demo example you can see how we set up custom types that register StoragePlugins to control how they are materialized: https://github.com/dagster-io/dagster/blob/master/examples/dagster_examples/airline_demo/types.py
s
@dwall would love for you to try out our step re-execution stuff
run with filesystem config
Now, when a persistent storage mode is in place, you can mouse over a step and get the replay button. If you press that, it initiates a new run of just that step, using the intermediates from the previous run
In the new run, only the single step is executed. Then you can just rerun that step while you iterate on the business logic. I used it to refactor this very pipeline and it was magical.
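To make the mechanics concrete, here is a toy sketch of the idea being described. This is not Dagster's actual API or implementation; the helper names, directory layout, and pickle format are all invented for illustration:

```python
# Conceptual sketch of single-step re-execution via persisted intermediates.
# Each step's output is pickled under a per-run directory, so a later run
# can re-execute one step against a previous run's stored intermediates.
import os
import pickle
import tempfile

STORAGE_ROOT = tempfile.mkdtemp()  # stands in for /tmp/dagster/runs

def store_intermediate(run_id, step, value):
    run_dir = os.path.join(STORAGE_ROOT, run_id)
    os.makedirs(run_dir, exist_ok=True)
    with open(os.path.join(run_dir, step + ".pkl"), "wb") as f:
        pickle.dump(value, f)

def load_intermediate(run_id, step):
    with open(os.path.join(STORAGE_ROOT, run_id, step + ".pkl"), "rb") as f:
        return pickle.load(f)

# First run: execute both steps, persisting each intermediate.
store_intermediate("run-1", "extract", [1, 2, 3])
store_intermediate(
    "run-1", "transform",
    [x * 10 for x in load_intermediate("run-1", "extract")],
)

# "Replay": re-execute only the transform step with new business logic,
# reading the extract step's intermediate from the previous run.
def transform_v2(rows):
    return [x * 100 for x in rows]

replayed = transform_v2(load_intermediate("run-1", "extract"))
store_intermediate("run-2", "transform", replayed)
print(replayed)  # [100, 200, 300]
```

The point of the sketch: the upstream step runs once, and iterating on the downstream step only re-reads its stored input rather than recomputing the whole pipeline.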
t
I used this intermediate / single solid re-execution to debug a pipeline and it literally saved me hours.