Hi all! New user here. I'm really liking the philosophy and documentation of the framework and am trying it out on a client project. I have some n00b questions:
Looking at the last couple of weeks of chat history, it sounds like lots of folks are building feature engineering / ML pipelines (as am I) and trying to get intermediate memoization to fit their use case. I find the current UI for this somewhat inconvenient since my source data doesn't update frequently and I'm not running jobs interactively or through dagit; re-using intermediates across jobs and configs seems to fit my use case better. If I understand correctly this is possible, but it requires pointing at a particular job run's intermediate output. Someone suggested splitting the pipeline into parts, but this is also inconvenient since experimentation on the feature engineering parts of the pipeline generates lots of different possible sub-pipelines, so this ends up being close to making a pipeline per solid.
The best options I could think of are either:
a. Build caching logic into solids; this seems unfortunate since most of the work for what I want is already built into the intermediates framework
b. Hack IntermediateStore paths to: (i) not incorporate job_id, (ii) have versioned solids (to invalidate caches when logic changes), (iii) pass along upstream solid dependencies (and their versions) to incorporate into the intermediate path key (so upstream version changes will bump the downstream path)
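To make (b) concrete, here's a rough, framework-agnostic sketch of the path-key scheme I have in mind (all names here are hypothetical; this isn't actual Dagster/IntermediateStore API, just the hashing idea):

```python
import hashlib

def intermediate_key(solid_name, solid_version, upstream_versions):
    """Build a run-independent cache key for a solid's output.

    upstream_versions: mapping of upstream solid name -> version string.
    Because upstream versions are folded into the hash, bumping any
    upstream version also changes (invalidates) this downstream key.
    """
    payload = "|".join(
        [solid_name, solid_version]
        + [f"{name}={ver}" for name, ver in sorted(upstream_versions.items())]
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

# Same inputs -> same path, independent of job/run id:
k1 = intermediate_key("featurize", "v2", {"load_raw": "v1"})
# Bumping the upstream version bumps the downstream key:
k2 = intermediate_key("featurize", "v2", {"load_raw": "v2"})
assert k1 != k2
```

The key would then replace the job_id component of the intermediate path, so any run with the same solid versions hits the same cached output.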
Before going further down either path, since this seems like a pretty common use case, I wanted to check if there are other better options (or if either of these is a bad idea).