# ask-community
m
Dagster question: I have a machine learning pipeline, and I want to run multiple versions of the model/features. In my mind I expect to have to write a custom asset for each version of the features, but thereafter the code is identical for all versions - is there a good way of doing this in Dagster?
will probably try this:
import catboost
import pandas as pd

from dagster import asset


@asset
def features_v1(data: pd.DataFrame) -> catboost.Pool:
    ...


@asset
def features_v2(data: pd.DataFrame) -> catboost.Pool:
    ...


@asset
def features_v3(data: pd.DataFrame) -> catboost.Pool:
    ...


def build_ml_pipeline(version: str):
    # Generate the downstream assets for one feature version. Each generated
    # asset depends on the matching features_{version} asset by parameter name,
    # so only the version string is needed here.
    x = f'''
@asset
def model_{version}(features_{version}):
    ...

@asset
def predictions_{version}(model_{version}):
    ...

@asset
def metrics_{version}(model_{version}):
    ...
'''
    # Execute into module globals so Dagster can discover the generated assets.
    exec(x, globals())


for version in ['v1', 'v2', 'v3']:
    build_ml_pipeline(version)
r
I have a similar use case. I think there are (at least) two ways you can deal with this:
1. Defining a single asset my_model and statically partitioning that asset, where each partition corresponds to a different version of the same model.
2. Using an Asset Factory approach: you have some function that returns an AssetsDefinition (for a particular model) and you call this function multiple times programmatically to generate your assets (sketch below).
Re: features, you could similarly partition the features, and define a StaticPartitionMapping between the feature and model partitions.
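For example, a minimal sketch of the asset-factory idea, assuming the feature assets are named features_v1/features_v2/features_v3 as in the snippet above (the factory name and asset names here are placeholders, not a fixed Dagster pattern):

from dagster import AssetIn, AssetsDefinition, Definitions, asset


def make_model_assets(version: str) -> list[AssetsDefinition]:
    # One factory call produces the model + predictions assets for a version,
    # each wired to the corresponding features_{version} asset via AssetIn.
    @asset(name=f"model_{version}", ins={"features": AssetIn(f"features_{version}")})
    def model(features):
        ...

    @asset(name=f"predictions_{version}", ins={"model": AssetIn(f"model_{version}")})
    def predictions(model):
        ...

    return [model, predictions]


versioned_assets = [a for v in ("v1", "v2", "v3") for a in make_model_assets(v)]

# Feature assets come from the snippet above.
defs = Definitions(assets=[features_v1, features_v2, features_v3, *versioned_assets])

Each call to make_model_assets builds independent AssetsDefinition objects, so adding a fourth version only means adding "v4" to the list.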
y
cc @sandy, this could potentially be made easier with runtime asset partitions
s
@Mycchaka Kleinbort that looks like a reasonable approach to me. Runtime asset partitions would work in the case where the features are computed in the same way and you just have different config parameters or something, but if you actually have different functions for different versions, your approach makes the most sense IMO.
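For the "same code, different config parameters" case, a rough sketch using static partitions (the per-version parameter table and asset names are hypothetical; runtime/dynamic partitions would follow the same shape):

from dagster import StaticPartitionsDefinition, asset

versions = StaticPartitionsDefinition(["v1", "v2", "v3"])

# Hypothetical per-version parameters; the feature-building code itself is shared.
FEATURE_PARAMS = {
    "v1": {"max_bin": 128},
    "v2": {"max_bin": 254},
    "v3": {"max_bin": 512},
}


@asset(partitions_def=versions)
def features(context, data):
    # data is the upstream raw-data asset from the original question.
    params = FEATURE_PARAMS[context.partition_key]
    ...


@asset(partitions_def=versions)
def my_model(context, features):
    # Same partitions_def on both assets, so model partition "v1" reads
    # features partition "v1" by default.
    ...

Because both assets share the same partitions_def, each model partition reads the matching features partition automatically; a StaticPartitionMapping on the model's AssetIn is only needed if the two assets used different partition key sets.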