
Rasmus Bonnevie

04/07/2023, 7:09 AM
Are asset factories an anti-pattern or a best practice? I frequently need to build several assets with the same core logic but different parameters. E.g. let's say I have 10 configuration objects complex_config_x and a function config_to_asset(cfg): how do I turn this into 10 assets in the most dagsteronic way? Factories? Configured assets? Graph assets with configured ops? Partitions?

Nicolas Parot Alvarez

04/07/2023, 10:02 AM
We're using a lot of factories for all the Dagster objects (assets, ops, sensors). It's also mentioned in the docs, and I think it's a valid pattern. It does add a layer of complexity to the code, but remaining DRY is more important.
👍 2
👍🏻 1

David Merritt

04/07/2023, 4:06 PM
Same here - using factories to generate N assets from a static config list

Rasmus Bonnevie

04/07/2023, 4:09 PM
yes, I've been using them extensively already, was just wondering with all the work going into configuration whether there was a way to make it more explicit that the assets were using the same underlying definition. Then again, I also find that I often need to dynamically build dependencies and the like, so might be no way around the factory

sandy

04/07/2023, 6:54 PM
I think factories are generally considered a good practice. One potential drawback I could think of is that they don't explicitly allow the UI to represent the fact that two different assets share the same underlying template - is that important to you?
:dagster: 1

Leo Qin

04/07/2023, 8:57 PM
I also have a wealth of assets that are configured as yml entries and generated by factories. It works really well to hide the configuration interface from users, just giving them a button to push when they need a materialization. I've kind of hijacked/repurposed the compute_kind label to denote what kind of template it's following. It's a nice visible label and it doesn't force users to stick to a certain key prefix or asset group.

Rasmus Bonnevie

04/10/2023, 9:00 AM
I think it's useful to be able to identify similar assets in the UI, similar to what Leo is suggesting with the badges (edge annotations like "clean", "copy", etc. could also be nice at some point). My main reason for raising the topic was that the factory pattern passes configuration in a somewhat oblique fashion. At the moment, I need to build a sequence of assets that are produced serially and need to be aware of prior history, so I have a factory like the following:
    from dagster import AssetIn, MetadataValue, asset

    config = PydanticModel()

    def chain_factory(config, last_chain_asset):
        @asset(
            ins={"last_chain_link": AssetIn(key=last_chain_asset.key)},
            metadata={"config": MetadataValue.json(config.dict())},
        )
        def chain_node_asset(last_chain_link):
            # config is captured by the closure, not passed as Dagster config
            return asset_from_config(config, last_chain_link)

        # return the asset definition so the caller can chain from it
        return chain_node_asset

    next_asset = chain_factory(config, first_asset)
but that means that we completely bypass all notions of dagster configuration and only keep an informal record in the metadata. Could this be done otherwise?