victor 04/06/2021, 9:57 AM
and yielding the generated run, so I can use it in solids. This works as I need for the executor, but any other executor will create a new parent experiment run for each executor process that is started.
schrockn 04/06/2021, 2:40 PM
victor 04/06/2021, 4:09 PM
> I would recommend having a special-purpose solid at the beginning of the pipeline whose sole responsibility is to create the dagster run id <--> mlflow id mapping
I thought about doing this, but it would be annoying to have to pass the mlflow run ID around as an output/input instead of having it magically available in all the contexts.
> We make no guarantees about the parallelism of resource spinups
Good point, I had not thought about that...
> a resource that spins up an EMR/Dataproc cluster once per run
That's definitely a good use case. I was actually thinking of the same pattern with a Dask cluster (possibly on Kubernetes, with the same image as the UCD in our use case), which would need the same capability.
> What would you expect the API/capability to be? My first thought would be to provide a key-value store where you can associate attributes and serializable values with run ids, and allow resources to stash state there. (In your case there would be an mlflow_experiment_id attribute.) This facility could easily be abused, which would be my primary reservation about adding it.
That sounds like a good API, and abuse could be minimised (while keeping the functionality needed by the use cases we have mentioned) by making the pairs immutable for a given run.
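A minimal sketch of the write-once, per-run key-value store discussed above, assuming string values and an in-memory backend. `ImmutableRunKVStore`, `get_or_create`, and `create_experiment` are hypothetical names for illustration only, not Dagster or MLflow APIs. The `get_or_create` helper shows how immutability could also cover unordered resource spinups: whichever caller arrives first creates the value, and every other caller reads the same one instead of creating its own parent run. Threads stand in for executor processes here; a real cross-process version would need the store to live in shared storage, which is not shown.

```python
import threading
from typing import Callable, Dict, List, Tuple


class ImmutableRunKVStore:
    """Hypothetical per-run store: each (run_id, key) pair is written at most once."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], str] = {}
        self._lock = threading.Lock()

    def set(self, run_id: str, key: str, value: str) -> None:
        with self._lock:
            if (run_id, key) in self._data:
                # Immutability: refuse to overwrite an existing pair for this run.
                raise KeyError(f"{key!r} is already set for run {run_id!r}")
            self._data[(run_id, key)] = value

    def get(self, run_id: str, key: str) -> str:
        with self._lock:
            return self._data[(run_id, key)]

    def get_or_create(self, run_id: str, key: str, factory: Callable[[], str]) -> str:
        # The factory runs while the lock is held, so only the first caller
        # creates a value; concurrent and later callers get the stored one.
        with self._lock:
            if (run_id, key) not in self._data:
                self._data[(run_id, key)] = factory()
            return self._data[(run_id, key)]


store = ImmutableRunKVStore()
created: List[str] = []
results: List[str] = []


def create_experiment() -> str:
    # Stand-in for an MLflow call; a real resource would create the
    # experiment or parent run here and return its id.
    new_id = f"exp-{len(created) + 1}"
    created.append(new_id)
    return new_id


def spin_up_resource() -> None:
    # Each simulated executor asks for the run's experiment id.
    results.append(
        store.get_or_create("dagster-run-1", "mlflow_experiment_id", create_experiment)
    )


threads = [threading.Thread(target=spin_up_resource) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(created)       # ['exp-1']
print(set(results))  # {'exp-1'}
```

All eight simulated spinups end up sharing a single id, and a later `store.set(...)` for the same run and key raises `KeyError`, which is the immutability constraint suggested in the reply.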
Tobias Macey 04/06/2021, 5:00 PM
schrockn 04/06/2021, 5:15 PM
victor 04/07/2021, 8:52 AM