#dagster-feedback

Martin Carlsson

01/18/2023, 10:10 AM
I’m playing around with multi_asset. I want to generate assets from a combined list of COMPANIES and ENDPOINTS. It is working fine. However, it looks like I have to put all the generated asset names as parameters in the transform/multi_asset function. This means that when I add a new company or endpoint I have to hardcode it in multiple places in my code. Is there any way to get around that?
@multi_asset(
    outs={
        transformed_asset_name(company, endpoint): AssetOut(
            dagster_type=pd.DataFrame,
            io_manager_key="default_io_manager",
            description=f"Transform {company} {endpoint} to dataframe",
            is_required=False,
        )
        for company, endpoint in product(COMPANIES, ENDPOINTS)
    },
    internal_asset_deps={
        transformed_asset_name(company, endpoint): {AssetKey(extract_asset_name(company, endpoint))}
        for company, endpoint in product(COMPANIES, ENDPOINTS)
    },
    can_subset=True,
)
def transform(
    context,
    extract_dk_glentry_get,
    extract_dk_dimensionsetentry_get,
    extract_usa_glentry_get,
    extract_usa_dimensionsetentry_get,
):
    ...

Daniel Gafni

01/18/2023, 12:07 PM
Maybe you could instead use an asset factory to generate multiple assets in a for loop

Martin Carlsson

01/18/2023, 2:00 PM
@Daniel Gafni Thanks! I wasn’t able to find asset factory in the documentation. Could you point me in the right direction?

Daniel Gafni

01/18/2023, 2:04 PM
1. Have a function generate the asset for you:
def get_asset_for_company(company: str) -> AssetsDefinition:
    # `company` is captured per call, so each asset keeps its own value
    @asset(name=company)
    def company_asset():
        return get_data(company)
    return company_asset
2. Use it multiple times (in a for loop)
assets = []
for company in COMPANIES:
    assets.append(get_asset_for_company(company))
It's crucial to write def get_asset_for_company OUTSIDE of the for loop. Otherwise the asset functions only look up the loop variable when they run, so Python will bake the last iterated value into every one of them (they will all be the same). Typed this in Slack so it may have some errors, but you should get the idea.
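The loop-variable pitfall described above can be reproduced in plain Python, without Dagster. This is only a minimal sketch: get_data and make_loader are hypothetical stand-ins, and the company list is assumed from the parameter names earlier in the thread.

```python
COMPANIES = ["dk", "usa"]  # assumed values, for illustration only


def get_data(company: str) -> str:
    # Hypothetical stand-in for the real data loader.
    return f"data for {company}"


# Pitfall: functions defined directly in the loop read `company`
# when they are called, not when they are defined.
broken = []
for company in COMPANIES:
    def load():
        return get_data(company)
    broken.append(load)


# Factory: each call gets its own `company`, so the value is fixed per function.
def make_loader(company: str):
    def load():
        return get_data(company)
    return load


fixed = [make_loader(name) for name in COMPANIES]

broken_results = [f() for f in broken]  # every function sees the last company
fixed_results = [f() for f in fixed]    # each function keeps its own company
```

After the loop finishes, company is still bound to the last item, which is why every function in broken returns the same data.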

Casper Weiss Bang

01/18/2023, 2:31 PM
We had a bunch of similar issues, and we also used an asset factory. Haven't used a list of assets though - is dagster able to automatically find them like that too? Huge Today-I-Learned moment. We have way too many lines like this:

Martin Carlsson

01/18/2023, 2:35 PM
Thanks @Daniel Gafni and @Casper Weiss Bang 🙏 This is something I will start using!
What I ended up doing was … see below … but I don’t like that code, it is impossible for the next person to reason about what I’ve done 🫣
@multi_asset(
    ins={
        extract_asset_name(company, endpoint): AssetIn(key=extract_asset_name(company, endpoint))
        for company, endpoint in product(COMPANIES, ENDPOINTS)
    },
    outs={
        transformed_asset_name(company, endpoint): AssetOut(
            dagster_type=pd.DataFrame,
            io_manager_key="default_io_manager",
            description=f"Transform {company} {endpoint} to dataframe",
            is_required=False,
        )
        for company, endpoint in product(COMPANIES, ENDPOINTS)
    },
    internal_asset_deps={
        transformed_asset_name(company, endpoint): {AssetKey(extract_asset_name(company, endpoint))}
        for company, endpoint in product(COMPANIES, ENDPOINTS)
    },
    can_subset=True,
)
def transform(context, **extracts):
    transformed = {}

    for company, endpoint in product(COMPANIES, ENDPOINTS):

        if transformed_asset_name(company, endpoint) in context.selected_output_names:

            transformed[transformed_asset_name(company, endpoint)] = transform_to_dataframe(
                extracts.get(extract_asset_name(company, endpoint))
            )

            yield Output(
                value=transformed[transformed_asset_name(company, endpoint)],
                output_name=transformed_asset_name(company, endpoint),
                metadata={"Number of rows": len(transformed[transformed_asset_name(company, endpoint)].index)},
            )
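One possible way to make the snippet above easier to reason about is to build the name mapping once, instead of repeating product(COMPANIES, ENDPOINTS) in ins, outs, internal_asset_deps and the function body. The sketch below is plain Python; the list values and the two naming helpers are assumptions inferred from the parameter names earlier in the thread, not the actual implementation.

```python
from itertools import product

# Assumed values, inferred from names like extract_dk_glentry_get.
COMPANIES = ["dk", "usa"]
ENDPOINTS = ["glentry_get", "dimensionsetentry_get"]


def extract_asset_name(company: str, endpoint: str) -> str:
    # Hypothetical naming scheme for the upstream (extract) assets.
    return f"extract_{company}_{endpoint}"


def transformed_asset_name(company: str, endpoint: str) -> str:
    # Hypothetical naming scheme for the downstream (transformed) assets.
    return f"transformed_{company}_{endpoint}"


# Single source of truth: transformed asset name -> extract asset name.
TRANSFORM_TO_EXTRACT = {
    transformed_asset_name(company, endpoint): extract_asset_name(company, endpoint)
    for company, endpoint in product(COMPANIES, ENDPOINTS)
}
```

The ins, outs and internal_asset_deps dictionaries could then each be derived from TRANSFORM_TO_EXTRACT.items(), so adding a company or endpoint only touches the two lists.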

Casper Weiss Bang

01/18/2023, 2:38 PM
haha what i initially had was pretty similar. actually a bit worse. but similar

Daniel Gafni

01/18/2023, 2:38 PM
@Casper Weiss Bang you can pass the list of assets to the @repository or Definitions or even load them from the module with the corresponding dagster function
@Casper Weiss Bang a fellow Nord Theme user! nice

Casper Weiss Bang

01/18/2023, 2:40 PM
I use nord everywhere i can. even my website https://cwb.dk/

Daniel Gafni

01/18/2023, 2:41 PM
Same. I have Nord everywhere (dotfiles although the screenshot is really outdated).