# announcements
p
hi all, liking dagster a lot from the tutorial, planning to use it with dbt to manage a workflow consisting of transforming a dataset from BigQuery, then running ML on it, and putting the results back into BigQuery. I know this was asked before, but one question: when re-running a pipeline containing heavy computations, are there mechanisms for avoiding re-computation of parts whose inputs haven’t changed? Let’s say the solids’ inputs [outputs] are results read from [dumped to] files. Essentially some type of caching/invalidation mechanism.
j
cc @chris
c
Hey @Prasad Chalasani! We actually have an experimental feature for versioning/memoization of pipeline runs. An example intended to get people up and running with these features is coming out later today with our mini-release.
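Roughly, the idea is that each solid carries a version string, and steps whose code and inputs haven’t changed can be skipped on re-execution. A minimal sketch of your workflow, assuming the experimental `version` argument on `@solid` (exact names may shift while the feature is experimental, and the paths are just placeholders):

```python
from dagster import pipeline, solid


@solid(version="1")  # bump this string whenever the transform logic changes
def transform_dataset(context):
    # read the dataset from BigQuery, transform it, dump results to a file
    return "gs://my-bucket/transformed.parquet"  # hypothetical path


@solid(version="1")
def run_ml(context, path):
    # train/score on the transformed data, write predictions to a file
    return "gs://my-bucket/predictions.parquet"  # hypothetical path


@pipeline
def ml_pipeline():
    run_ml(transform_dataset())
```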
p
great, thanks for the fast response!
c
Once the docs are updated, I'd be happy to link that to you, and would love to get your feedback on the user experience
p
happy to give feedback
👍 1
c
you can also re-execute a solid subselection, using the upstream outputs from a previous run. let me find the docs...
here’s a video that shows loading a finished run, selecting one of the solids (you can also select a subset), and then launching a new run with the previous run’s outputs: https://www.loom.com/share/b02dea352c034c15b671307ecd71f0b9
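in code, the same flow looks roughly like this — a sketch assuming the `reexecute_pipeline` API with a `step_selection` argument and a persisted `DagsterInstance`; the parameter names may differ across versions:

```python
from dagster import DagsterInstance, reexecute_pipeline

# a persisted instance is needed so the new run can find the old outputs
instance = DagsterInstance.get()

result = reexecute_pipeline(
    ml_pipeline,
    parent_run_id="<run id of the finished run>",  # placeholder
    step_selection=["run_ml"],  # only this solid re-executes; its upstream
                                # inputs are loaded from the parent run
    instance=instance,
)
```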
p
will take a look, thank you!
c
Oh also, this only works if the intermediates are written to persistent storage (i.e. not in memory), so S3, for example, would work
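for instance, here’s a sketch of turning on filesystem intermediate storage via the run config (assuming the `storage` run-config key from this version of Dagster; the S3 variant additionally needs the S3 system storage from `dagster_aws`, which isn’t shown here):

```python
from dagster import execute_pipeline

result = execute_pipeline(
    ml_pipeline,
    # persist intermediates to disk instead of keeping them in memory,
    # so a later re-execution can load them
    run_config={"storage": {"filesystem": {}}},
)
```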