José Zavala González
11/15/2022, 3:38 PM
jamie
11/15/2022, 3:42 PM
José Zavala González
11/15/2022, 3:53 PM
jamie
11/15/2022, 3:56 PM
define_dagstermill_asset uses a lot of the same code as define_dagstermill_op, and define_dagstermill_op supports returning arbitrary outputs from the notebook (but those outputs won’t be assets). So define_dagstermill_asset may already be able to return arbitrary outputs, but I haven’t explored the implications of that deeply enough to confidently recommend it.
José Zavala González
11/15/2022, 3:59 PM
jamie
11/15/2022, 4:00 PM
nickvazz
11/15/2022, 4:25 PM
all defined as SDAs. I am currently trying to get it situated to all be partitioned assets since each notebook is essentially a separate experiment
->notebook_0
->data.parq
notebook_1
non_argument_deps to chain the notebooks together:
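import dagstermill as dm

# The dependency is declared for lineage only; it is not loaded as a notebook input.
notebook_asset = dm.define_dagstermill_asset(
    "some_notebook_asset",
    "template_notebook.ipynb",
    non_argument_deps={"some_other_notebook_asset"},
)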
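A minimal sketch of how the same non_argument_deps pattern could be generated for a longer chain of notebooks. The helper chain_notebook_kwargs and the notebook_asset_{i} naming scheme are hypothetical (not from the thread); the helper only builds keyword arguments and does not call dagstermill itself.

```python
from typing import Any, Dict, List, Optional


def chain_notebook_kwargs(notebook_paths: List[str]) -> List[Dict[str, Any]]:
    """Build one kwargs dict per notebook, each depending on the previous one."""
    specs: List[Dict[str, Any]] = []
    previous_name: Optional[str] = None
    for index, path in enumerate(notebook_paths):
        name = f"notebook_asset_{index}"  # hypothetical naming scheme
        spec: Dict[str, Any] = {"name": name, "notebook_path": path}
        if previous_name is not None:
            # Lineage only: the upstream asset is declared, not passed as an input.
            spec["non_argument_deps"] = {previous_name}
        specs.append(spec)
        previous_name = name
    return specs


specs = chain_notebook_kwargs(["notebook_0.ipynb", "notebook_1.ipynb"])
```

Each dict could then be unpacked into a call such as dm.define_dagstermill_asset(**spec), assuming the keyword names match the installed dagstermill version.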
José Zavala González
11/15/2022, 5:00 PM
non_argument_deps definitely help alleviate the lineage. What partitioning variable do you want to use for your notebooks?
nickvazz
11/15/2022, 5:08 PM
{EXP_NAME}_{VERSION} since the same experiment could have multiple iterations
@job (via define_asset_job) and that job is using an @config_mapping to specify the shared parameters within each notebook in the pipeline:
import dagster
import dagstermill as dm
from dagster import AssetSelection, config_mapping, define_asset_job

notebook_asset_0 = dm.define_dagstermill_asset(
    'notebook_asset_0',
    'notebook_0.ipynb',
    group_name='group_0',
    config_schema={'required_shared_var': dagster.Field(str)},
)
notebook_asset_1 = dm.define_dagstermill_asset(
    'notebook_asset_1',
    'notebook_1.ipynb',
    group_name='group_0',
    config_schema={'required_shared_var': dagster.Field(str)},
    non_argument_deps={'notebook_asset_0'},
)

# Map one top-level value onto the config of every notebook asset.
@config_mapping(config_schema={'shared_var': dagster.Field(str)})
def simplified_config(values):
    return {
        'ops': {
            'notebook_asset_0': {'config': {'required_shared_var': values['shared_var']}},
            'notebook_asset_1': {'config': {'required_shared_var': values['shared_var']}},
        }
    }

notebook_pipeline_job = define_asset_job(
    name='full_notebook_pipeline_job',
    config=simplified_config,
    selection=AssetSelection.groups('group_0'),
)
That way I only have to specify required_shared_var once for the entire job and not for every @op / @asset, but I can't figure out how to combine partitioning with the config_mapping.
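A rough sketch of the two pieces being discussed, in plain Python: building the per-asset 'ops' config from one shared value, and composing {EXP_NAME}_{VERSION} partition keys. The helper names shared_run_config and partition_keys and the sample experiment names are hypothetical. This does not by itself answer how Dagster should combine partitioning with a config mapping; it only shows the data each side would need.

```python
from typing import Dict, Iterable, List


def shared_run_config(asset_names: Iterable[str], shared_var: str) -> Dict[str, dict]:
    """Repeat one shared value under every asset's op config."""
    return {
        "ops": {
            name: {"config": {"required_shared_var": shared_var}}
            for name in asset_names
        }
    }


def partition_keys(experiments: Dict[str, List[str]]) -> List[str]:
    """Compose one '{EXP_NAME}_{VERSION}' key per experiment iteration."""
    return [
        f"{exp_name}_{version}"
        for exp_name, versions in experiments.items()
        for version in versions
    ]


run_config = shared_run_config(["notebook_asset_0", "notebook_asset_1"], "abc")
keys = partition_keys({"exp_a": ["v1", "v2"], "exp_b": ["v1"]})
```

The resulting key list is the kind of input a static partitions definition would take, assuming the set of experiments and versions is known up front.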
https://dagster.slack.com/archives/C01U954MEER/p1668200854863459