# ask-community
m
Dagster question: I have a machine learning pipeline, and I want to run multiple versions of the model/features. In my mind I expect to have to write a custom asset for each version of the features, but thereafter the code is identical for all versions - is there a good way of doing this in Dagster?
will probably try this:
import catboost
import pandas as pd

from dagster import asset


@asset
def features_v1(data: pd.DataFrame) -> catboost.Pool:
    ...


@asset
def features_v2(data: pd.DataFrame) -> catboost.Pool:
    ...


@asset
def features_v3(data: pd.DataFrame) -> catboost.Pool:
    ...


def build_ml_pipeline(version: str):
    # Generate the downstream assets for one feature version. Each generated
    # asset depends on the matching features_{version} asset by parameter name,
    # so only the version string is needed here.
    x = f'''
@asset
def model_{version}(features_{version}):
    ...

@asset
def predictions_{version}(model_{version}):
    ...

@asset
def metrics_{version}(model_{version}):
    ...
'''
    # Execute into module globals so Dagster can discover the generated assets.
    exec(x, globals())


for version in ['v1', 'v2', 'v3']:
    build_ml_pipeline(version)
r
I have a similar use case. I think there are (at least) two ways you can deal with this:
1. Defining a single asset my_model and statically partitioning that asset, where each partition corresponds to a different version of the same model.
2. Using an Asset Factory approach: you have some function that returns an AssetsDefinition (for a particular model) and you call this function multiple times programmatically to generate your assets (sketch below).
Re: features, you could similarly partition the features, and define a StaticPartitionMapping between the feature and model partitions.
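For example, a minimal sketch of the asset-factory idea, assuming the feature assets are named features_v1/features_v2/features_v3 as in the snippet above (the factory name and asset names here are placeholders, not a fixed Dagster pattern):

from dagster import AssetIn, AssetsDefinition, Definitions, asset


def make_model_assets(version: str) -> list[AssetsDefinition]:
    # One factory call produces the model + predictions assets for a version,
    # each wired to the corresponding features_{version} asset via AssetIn.
    @asset(name=f"model_{version}", ins={"features": AssetIn(f"features_{version}")})
    def model(features):
        ...

    @asset(name=f"predictions_{version}", ins={"model": AssetIn(f"model_{version}")})
    def predictions(model):
        ...

    return [model, predictions]


versioned_assets = [a for v in ("v1", "v2", "v3") for a in make_model_assets(v)]

# Feature assets come from the snippet above.
defs = Definitions(assets=[features_v1, features_v2, features_v3, *versioned_assets])

Each call to make_model_assets builds independent AssetsDefinition objects, so adding a fourth version only means adding "v4" to the list.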
y
cc @sandy, this could potentially be made easier with runtime asset partitions
s
@Mycchaka Kleinbort that looks like a reasonable approach to me. Runtime asset partitions would work in the case where the features are computed in the same way and you just have different config parameters or something, but if you actually have different functions for different versions, your approach makes the most sense IMO.
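For the "same code, different config parameters" case, a rough sketch using static partitions (the per-version parameter table and asset names are hypothetical; runtime/dynamic partitions would follow the same shape):

from dagster import StaticPartitionsDefinition, asset

versions = StaticPartitionsDefinition(["v1", "v2", "v3"])

# Hypothetical per-version parameters; the feature-building code itself is shared.
FEATURE_PARAMS = {
    "v1": {"max_bin": 128},
    "v2": {"max_bin": 254},
    "v3": {"max_bin": 512},
}


@asset(partitions_def=versions)
def features(context, data):
    # data is the upstream raw-data asset from the original question.
    params = FEATURE_PARAMS[context.partition_key]
    ...


@asset(partitions_def=versions)
def my_model(context, features):
    # Same partitions_def on both assets, so model partition "v1" reads
    # features partition "v1" by default.
    ...

Because both assets share the same partitions_def, each model partition reads the matching features partition automatically; a StaticPartitionMapping on the model's AssetIn is only needed if the two assets used different partition key sets.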