Hi, I'm trying to use multiple code locations. The...
# ask-community
t
Hi, I'm trying to use multiple code locations. The way I understood the documentation is that each code location is a self-contained namespace ("Definitions within a code location have a common namespace and must have unique names."). In fact, I even have to redefine assets as SourceAssets if I want to use them across different locations (GitHub comment by sryza). Therefore, I assumed I could use the same asset name in two different code locations without them interfering. However, this leads to an error. Dagit explicitly tells me to rename the assets to avoid collisions. What did I misunderstand? Is there any way to use the same asset names in different code locations? I created an MWE with two almost identical files. If I run dagster dev -f locationA.py -f locationB.py, I see the Global Asset Lineage shown in the attached image.
# locationA.py
import dagster as dg

@dg.asset
def loc_A_asset_1():
    return "A1"

@dg.asset
def asset_2(loc_A_asset_1):
    return "A2"

defs = dg.Definitions(
    assets=[loc_A_asset_1, asset_2]
)
# locationB.py
import dagster as dg

@dg.asset
def loc_B_asset_1():
    return "B1"

@dg.asset
def asset_2(loc_B_asset_1):
    return "B2"

defs = dg.Definitions(
    assets=[loc_B_asset_1, asset_2]
)
c
Hi Thomas. The Dagster notion of an asset is that it is a unique object persisted in storage, identified by its name/key. Because of this, there can only be one function that runs when the asset is computed, to prevent different functions from overwriting the object persisted in storage. This is why an error is raised when multiple different definitions are found for the same asset key. Source assets do not have functions. They are basically a way to load the asset's value from a different code location, but they cannot write a persisted value back.
Wondering what your use case is for wanting to have multiple assets with the asset_2 key? If these assets are persisting values to different locations, here are some ideas for how you can avoid key collisions:
• You could add a key_prefix, which basically prepends the key with a certain value, i.e. team_1/asset, team_2/asset, etc.
• You could assign a different name to one of the assets
t
Hi Claire, thanks for the clarification. I specify different storage locations by using a configurable IO manager. Hence, overwriting is not an issue for me, but I can see that it could be in general, e.g., in the MWE I wrote. Our use case is that we have custom data sources for each customer. The aim is to transform all of them into one harmonized model. Thus, we use the same asset names to make it easier to understand. Key prefixes seem to be a possible solution. It would be great if I could define them only once for all assets in the code location. Unfortunately, as you mentioned in another discussion, this seems to be impossible as of now.
c
The load_assets_from_package_module methods accept a key_prefix arg that you can use to auto-assign a key prefix to all the assets, if that helps.
t
Ah, I didn't know. That's great. Thanks a lot!