#ask-community

Thorsten Schäfer

07/06/2022, 7:06 AM
Hi all, does anyone have an example of using partitioned SDAs with partition_mapping? I'm trying to create a forecasting pipeline that backtests several models and creates a forecast based on the best model in the backtest. So, given a partition for this month, the required backtest-asset partition is the one from 12 months ago. I assume partition_mappings are applicable for such cases, but I cannot find an example.

sandy

07/06/2022, 7:21 PM
Hey Thorsten - I think the best example we have right now is in the unit test: https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster_tests/core_tests/asset_defs_tests/test_partitioned_assets.py#L182. I'd be happy to answer questions if that's not illustrative enough on its own

Thorsten Schäfer

07/08/2022, 3:35 PM
Hey @sandy, based on the example, I realized that partition_mapping does not work the way I had in mind: the test case shows that both assets/ops are only called with partition_key 2, and the extended mapping is only used in the IOManager. I would have thought that a job with the two provided assets should lead to three asset materialization events (the upstream asset with partitions 1 and 2, plus the downstream asset); otherwise, the IOManager tries to load an unmaterialized asset partition. Is this the intended behavior, and if so, how would one create a job with both assets that ensures all required partitions are materialized?
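To make the gap concrete, here is a minimal sketch in plain Python, not using the Dagster API. It assumes, as the description above suggests, that downstream partition n reads upstream partitions n-1 and n; the function names `upstream_keys_for` and `missing_upstream` are hypothetical and only serve the illustration.

```python
from typing import List, Set


def upstream_keys_for(downstream_key: int) -> List[int]:
    """Assumed mapping: downstream partition n depends on upstream n-1 and n."""
    return [downstream_key - 1, downstream_key]


def missing_upstream(downstream_key: int, materialized: Set[int]) -> List[int]:
    """Return the upstream partitions that would be loaded but were never materialized."""
    return [k for k in upstream_keys_for(downstream_key) if k not in materialized]


# A run for partition 2 materializes only partition 2 of the upstream asset,
# yet loading the downstream input also requires upstream partition 1.
materialized = {2}
missing = missing_upstream(2, materialized)
print(missing)
```

With only partition 2 materialized, the list is non-empty, which is the situation where the load of the upstream input would fail.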

sandy

07/09/2022, 1:01 AM
Ah - got it. That's a very reasonable request, but alas not something we support yet. I filed an issue to track it: https://github.com/dagster-io/dagster/issues/8811. A question for you: when materializing the upstream asset, would you want to have a step for each partition? Or a single step that materializes both partitions (i.e. that accepts a range of partitions to process)?

Thorsten Schäfer

07/11/2022, 9:11 AM
I have no hard preference, but I'd assume it's most flexible if both options are available, chosen based on the dependency graph and its partition mappings:
Case 1: In my use case, I need the base forecasts from 12 months ago for the backtest and the current base forecasts for creating an ensemble forecast. Hence, the same asset would be referenced in two different parts of the graph with two different partition mappings (a default one, and one that maps t to t-12m).
Case 2: If I created a window metric as an asset, e.g. a rolling 7-day average, I'd have a single reference with a partition mapping that maps t -> t-6, t-5, ..., t, so a single step would be a direct representation of the underlying dependency.
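The two mappings from Case 1 and Case 2 can be sketched in plain Python. This is only an illustration of the partition arithmetic, not Dagster's partition_mapping API; the function names are hypothetical, and it assumes monthly partitions are keyed by the first day of the month and daily partitions by calendar date.

```python
from datetime import date, timedelta
from typing import List


def month_minus_12(partition: date) -> date:
    """Case 1: map monthly partition t to t-12m.

    Assumes the partition key is the first day of a month, so shifting
    the year by one never produces an invalid date.
    """
    return partition.replace(year=partition.year - 1)


def rolling_window(partition: date, days: int = 7) -> List[date]:
    """Case 2: map daily partition t to [t-(days-1), ..., t], oldest first."""
    return [partition - timedelta(days=offset) for offset in range(days - 1, -1, -1)]


backtest_partition = month_minus_12(date(2022, 7, 1))
window = rolling_window(date(2022, 7, 6))
print(backtest_partition, window[0], window[-1], len(window))
```

Case 1 yields exactly one upstream partition per downstream partition, while Case 2 yields a contiguous range, which is why a single step over a range would mirror that dependency directly.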