victor
04/06/2021, 9:57 AM
victor
04/06/2021, 10:01 AM
@resource and yielding the generated run, so I can use it in solids. This works as I need for the in_process executor, but any other executor will create a new parent experiment run for each executor process that is started.
victor
04/06/2021, 10:05 AM
victor
04/06/2021, 10:06 AM
victor
04/06/2021, 11:10 AM
schrockn
04/06/2021, 2:40 PM
schrockn
04/06/2021, 2:41 PM
victor
04/06/2021, 4:09 PM
> I would recommend having a special purpose solid at the beginning of the pipeline whose sole responsibility is to create the dagster run id <--> mlflow id mapping
I thought about doing this, but it would be annoying to have to pass the mlflow run ID around as an output/input, as opposed to having it magically available in all the contexts.
> We make no guarantees about the parallelism of resource spinups
Good point, I had not thought about that...
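A minimal plain-Python sketch of the trade-off discussed here, with no Dagster or mlflow imports; `start_parent_run`, `run_with_resource` and `run_with_mapping_solid` are hypothetical names. Each step is treated as if it ran in its own executor process: a resource that creates the parent run when it is initialized mints one run per process, while a dedicated first step creates the run once but then has to hand its ID to every downstream step explicitly.

```python
import uuid
from typing import Dict, List


def start_parent_run(dagster_run_id: str) -> str:
    # Hypothetical stand-in for creating an mlflow parent run:
    # every call returns a brand-new run ID.
    return f"{dagster_run_id}-mlflow-{uuid.uuid4().hex[:8]}"


def run_with_resource(dagster_run_id: str, steps: List[str]) -> Dict[str, str]:
    # Multiprocess-style execution: the resource is initialized again in
    # each step's process, so each step ends up with its own parent run.
    return {step: start_parent_run(dagster_run_id) for step in steps}


def run_with_mapping_solid(dagster_run_id: str, steps: List[str]) -> Dict[str, str]:
    # A dedicated first step creates the parent run exactly once...
    mlflow_run_id = start_parent_run(dagster_run_id)
    # ...and every downstream step must receive it as an explicit input.
    return {step: mlflow_run_id for step in steps}


if __name__ == "__main__":
    steps = ["train", "evaluate", "log_model"]
    print(len(set(run_with_resource("run-1", steps).values())))
    print(len(set(run_with_mapping_solid("run-1", steps).values())))
```

With the in_process executor there is only one process, so the resource version would also yield a single parent run, which matches the behaviour described earlier in the thread.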
victor
04/06/2021, 4:12 PM
> a resource that spins up an EMR/Dataproc cluster once per run
That's definitely a good use case. I was actually thinking of the same pattern, but with a Dask cluster (possibly on Kubernetes, using the same image as the UCD in our use case), and we'd need it there too.
> What would you expect the API/capability to be? My first thought would be to provide a key-value store where you can associate attributes and serializable values with run ids, and allow resources to stash state there. (In your case there would be an mlflow_experiment_id attribute.) This facility could easily be abused, which would be my primary reservation about adding it.
That sounds like a good API, and abuse could be minimised (while keeping the functionality needed by the use cases we have mentioned) by making the pairs immutable for a given run.
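The immutable per-run key-value store proposed here can be sketched with Python's standard sqlite3 module. This is only an illustration under stated assumptions, not an existing Dagster facility: `ImmutableRunAttributeStore`, its methods and the `run_attributes` table are all hypothetical. The composite primary key makes each (run id, attribute) pair write-once, and `get_or_set` relies on SQLite's `INSERT OR IGNORE` so that when several resources spin up in parallel, the first writer wins and every process reads back the same value.

```python
import sqlite3
from contextlib import closing
from typing import List, Optional, Sequence, Tuple


class ImmutableRunAttributeStore:
    """Hypothetical per-run key-value store where each (run_id, key) pair is write-once."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._execute(
            "CREATE TABLE IF NOT EXISTS run_attributes ("
            " run_id TEXT NOT NULL,"
            " key TEXT NOT NULL,"
            " value TEXT NOT NULL,"
            " PRIMARY KEY (run_id, key))"
        )

    def _execute(self, sql: str, params: Sequence[str] = ()) -> List[Tuple]:
        # A short-lived connection per call, so separate executor processes can
        # share the same database file; the timeout waits out competing writers.
        with closing(sqlite3.connect(self._db_path, timeout=30)) as conn:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()

    def set_once(self, run_id: str, key: str, value: str) -> None:
        """Store a value; raise KeyError if the attribute is already set for this run."""
        try:
            self._execute(
                "INSERT INTO run_attributes (run_id, key, value) VALUES (?, ?, ?)",
                (run_id, key, value),
            )
        except sqlite3.IntegrityError:
            raise KeyError(f"{key!r} is already set for run {run_id!r}") from None

    def get(self, run_id: str, key: str) -> Optional[str]:
        rows = self._execute(
            "SELECT value FROM run_attributes WHERE run_id = ? AND key = ?",
            (run_id, key),
        )
        return rows[0][0] if rows else None

    def get_or_set(self, run_id: str, key: str, value: str) -> str:
        """Return the stored value, writing `value` only if nothing is stored yet."""
        # INSERT OR IGNORE never overwrites an existing row, so concurrent callers
        # cannot clobber each other: the first insert wins and everyone reads it back.
        self._execute(
            "INSERT OR IGNORE INTO run_attributes (run_id, key, value) VALUES (?, ?, ?)",
            (run_id, key, value),
        )
        stored = self.get(run_id, key)
        if stored is None:
            raise RuntimeError("attribute missing immediately after insert")
        return stored
```

One design caveat: the processes that lose the race have still computed their candidate value, so anything expensive behind it (actually creating an mlflow run, or starting a Dask cluster) would either need to be cleaned up by the losers or be replaced by a cheap, deterministic value.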
Tobias Macey
04/06/2021, 5:00 PM
schrockn
04/06/2021, 5:15 PM
victor
04/07/2021, 8:52 AM