Is there a way to arbitrarily parametrize assets that plays well with caching/staleness detection? (dagster #ask-community)


Anthony Carapetis

05/10/2023, 4:08 AM

Is there a way to arbitrarily parametrize assets that plays well with caching/staleness detection? I have a pipeline built as a graph of `@asset`s, and would like to parametrize it to operate on an arbitrary input dataset, along with having some adjustable configuration values (which would affect the outputs!). Each op/asset in the graph would depend only on a subset of the configuration values, and it would be great to be able to re-use an existing materialization when the relevant config values haven't changed. Using dynamic partitions works for parametrizing on the input dataset, but I'm at a loss as to how to implement the configuration: dagster's `Config` isn't taken into account in an asset's `data_version`, so in the context of assets I assume it's intended to be used for things that don't affect the output?
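The requirement in the question (re-use a result when only the config values a step actually depends on are unchanged) boils down to deriving a version key from just that subset of the config. A minimal plain-Python sketch of that idea, not a Dagster API; `CONFIG`, `STEP_KEYS`, the step names and `relevant_config_hash` are all hypothetical:

```python
import hashlib
import json
from typing import Any, Iterable, Mapping


def relevant_config_hash(config: Mapping[str, Any], keys: Iterable[str]) -> str:
    """Return a short, stable hash of only the config entries named in `keys`.

    Changing a config value outside `keys` leaves the hash unchanged, so a
    cached result keyed on this hash could be reused for that step.
    """
    subset = {k: config[k] for k in sorted(keys)}
    # sort_keys makes the serialization independent of insertion order.
    payload = json.dumps(subset, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


# Hypothetical pipeline config and the subset each step depends on.
CONFIG = {"window": 5, "method": "mean", "threshold": 0.2}
STEP_KEYS = {
    "smoothed": ["window", "method"],
    "flagged": ["threshold"],
}
```

Here changing `threshold` leaves the key for `smoothed` untouched but changes the key for `flagged`. The sketch assumes every config value is JSON-serializable.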

sean

05/10/2023, 1:25 PM

Hi Anthony, at present config is not taken into account in automatically generated data versions, but this is on our roadmap for the near future. You are correct that dynamic partitions is the way to go for arbitrary parametrization. Similarly, staleness and partitions don’t play nicely together at present, but this is a very active area of development and should see major improvements in the next week or two. One possible temporary solution for the config issue is to generate your own data versions that take the config into account: you can return a `DataVersion` in an `Output`, and it will be used in place of dagster’s auto-generated versions.

Anthony Carapetis

05/17/2023, 4:23 AM

Hi @sean, thanks for your reply! Good to hear the team is thinking about these things, I'll keep an eye on development 🙂

Anthony Carapetis

05/17/2023, 5:27 AM

I'll also have a think about whether config + custom data versions will work, thanks for the idea.
