Separate question, was wondering how GPU jobs were orchestrated? We have some large GPU tasks that we'd like to write in, and were wondering if there was a way to have an init function that instantiates a model so we don't have to instantiate it for each call of a task, and that we could maybe have "warm tasks" running that could pickup these jobs faster without having to initialize the model again?
dagster bot responded by community 1
12/05/2022, 3:01 AM
I don't think there's a built-in way to do this, but you could have an asset that creates a serving endpoint and then use that in your inference job
12/05/2022, 4:26 PM
Yeah, what Oliver said seems like it would work. You could also model it as a resource that your ops depend on, since resources get initialized before each op. You'd just need to make the resource idempotent, as it'll get initialized for every op / resource that depends on it.
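A minimal sketch of the idempotency idea, in plain Python rather than Dagster's actual resource API: memoize the expensive initialization so that no matter how many ops request the model, the heavy work runs once per process. `get_model` and the dict it returns are hypothetical placeholders for a real model load.

```python
import functools

# Hypothetical stand-in for an expensive model load; in a real Dagster
# resource, this body would live inside the resource's init function.
@functools.lru_cache(maxsize=1)
def get_model():
    # Expensive initialization runs only once per process, no matter
    # how many callers ask for the model afterwards.
    return {"weights": [0.1, 0.2]}  # placeholder for a real model object

model_a = get_model()
model_b = get_model()
assert model_a is model_b  # repeated requests reuse the same instance
```

The same effect can be achieved by caching the model on an attribute or module-level variable; the point is that re-initialization must be a cheap no-op.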
12/05/2022, 5:46 PM
I see, then my follow-up is: how does "spinning up workers" work in that case? If I had 500 requests, would a new resource be initialized every time?
12/05/2022, 5:57 PM
Yes, it would. If serializing / deserializing the model is possible and doesn't take too long, you could make a resource that takes a path to the model on disk: if the file exists, load the model from it; if not, create the model, save it to disk, and return it to the op as an input.
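The load-or-create pattern described above could look roughly like this, written as a plain Python sketch (not Dagster's resource API). `train_model`, `load_or_create_model`, and `MODEL_PATH` are all hypothetical names; pickle is used here just as an example serialization format, which only works if your model object is picklable.

```python
import os
import pickle
import tempfile

# Hypothetical cache location; a real resource would take this as config.
MODEL_PATH = os.path.join(tempfile.gettempdir(), "cached_model.pkl")

def train_model():
    # Stand-in for an expensive model build / load.
    return {"weights": [1, 2, 3]}

def load_or_create_model(path=MODEL_PATH):
    if os.path.exists(path):
        # Fast path: later runs skip the expensive build entirely.
        with open(path, "rb") as f:
            return pickle.load(f)
    model = train_model()
    # Persist so the next resource initialization finds it on disk.
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model
```

The first call pays the full build cost and writes the file; every subsequent initialization (including in fresh worker processes on the same machine) only pays the deserialization cost.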