Bryan Wood
03/21/2022, 10:00 PM
jamie
03/21/2022, 10:31 PM
@job
def my_job():
    model = get_ml_model()
    out_1 = op_1(model)
    out_2 = op_2(model)
    ...
2. Depending on what trade-offs you're willing to make, a resource could work as well. By default, Dagster runs jobs in multi-process mode, and each process gets its own instantiation of each resource, so you'd still end up loading the ML model multiple times. If you configure your jobs to run in a single process, you could have a resource that loads the model into memory once and then access the model in each op.
Bryan Wood
03/21/2022, 10:43 PM
prha
03/21/2022, 11:20 PM
from dagster import in_process_executor, job

@job(executor_def=in_process_executor)
def my_job():
    ...
This will only initialize each resource once, at the start of the job run.
If you are using the default multiprocess executor, you’ll still have to select which ops incur the loading cost by using required_resource_keys:
from dagster import job, op

@op(required_resource_keys={'model'})
def my_op_that_needs_the_model(context):
    # does something with context.resources.model; incurs the resource initialization cost
    ...

@op
def my_other_op():
    # does not get an initialized resource
    ...

@job(resource_defs={'model': my_model_loading_resource})
def my_multiprocess_job():
    my_op_that_needs_the_model()
    my_other_op()
Bryan Wood
04/16/2022, 12:20 AM