Hi team Does anyone know how to avoid the typing error when dagster #ask-community

Harpal

07/14/2022, 3:18 PM

Hi team! Does anyone know how to avoid the typing error when summing two different inputs for `AssetGroup`'s `assets` parameter? Specifically, the problem occurs when adding (with the `+` operator) the output of `dbt_assets = load_assets_from_dbt_project()`, which is a `Sequence[AssetsDefinition]`, and the output of a function, `csv_assets = csv_assets_for_dbt_assets(dbt_assets)`, whose return type is `list[AssetsDefinition]`. The result is the Pylance error shown below.

Operator "+" not supported for types "Sequence[AssetsDefinition]" and "list[AssetsDefinition]"
  Operator "+" not supported for types "Sequence[AssetsDefinition]" and "list[AssetsDefinition]"
Pylance(reportGeneralTypeIssues)

See the sample code in the comments for more details.

Harpal

07/14/2022, 3:23 PM

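import subprocess

from dagster import AssetGroup, AssetKey, Out, Output, multi_asset
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

dbt_assets = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select=f"tag:{DATASET_TYPE}")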


def csv_assets_for_dbt_assets(dbt_assets):
    outs = {}
    deps = {}
    for asset_key in dbt_assets[0].asset_keys:
        table_name = asset_key.path[-1]
        if "rig" in table_name:
            outs[table_name] = Out(asset_key=AssetKey(["gcs", table_name]))
            deps[table_name] = {AssetKey(table_name)}
        else:
            continue

    @multi_asset(outs=outs, non_argument_deps=set(dbt_assets[0].asset_keys), compute_kind="gcs")
    def _assets(context):
        for table_name in outs.keys():
            # --query is so ugly because we need to use a workaround to get .csv files with headers.
            # https://stackoverflow.com/questions/51271931/exporting-from-cloud-sql-to-csv-with-column-headers
            gcl_export_table_comm = f"gcloud sql export csv moonfire-01 gs://moonfire-training-data/sector_cls/train_test_eval/{DIR_ULID}_train_{TRAIN_SPLIT}_test_{TEST_SPLIT}/{DATASET_TYPE}/{table_name}.csv --database=moonfire --query=\"SELECT 'text' AS text, 'labels' AS labels UNION ALL SELECT text, labels FROM {table_name}\""  # noqa: E501
            context.log.info(f"The export command was run! {gcl_export_table_comm}")

            subprocess.call(gcl_export_table_comm, shell=True)
            yield Output(table_name, table_name)

    return [_assets]


# This is the list of assets that will be created by this job.

csv_assets = csv_assets_for_dbt_assets(dbt_assets)
all_assets = dbt_assets + csv_assets
sector_cls_all_assets = AssetGroup(
    assets=all_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(
            {
                "project_dir": DBT_PROJECT_DIR,
                "profiles_dir": DBT_PROFILE_DIR,
            }
        )
    },
).build_job("sector_cls_all_assets")

The problem is that `dbt_assets` is of type `Sequence[AssetsDefinition]` but `csv_assets` is of type `list[AssetsDefinition]`. Pylance raises a typing error when trying to add these, as shown below, but the code runs and performs the addition regardless.

all_assets = dbt_assets + csv_assets

Is there a way to override/incorporate the addition operation in the documentation, allowing the two to be summed? If not, how can I ensure that `csv_assets + dbt_assets` no longer returns this error?
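A minimal sketch of why the checker complains while the code still runs. It does not import dagster: `AssetsDefinition` and `load_assets` here are hypothetical stand-ins, and the sketch assumes the real loader hands back a plain list at runtime even though it is annotated as a `Sequence`.

```python
from collections import abc
from typing import List, Sequence


class AssetsDefinition:
    """Hypothetical stand-in for dagster's AssetsDefinition (keeps the sketch runnable without dagster)."""


def load_assets() -> Sequence[AssetsDefinition]:
    # Annotated as a Sequence, but (by assumption) a plain list at runtime.
    return [AssetsDefinition()]


dbt_assets = load_assets()
csv_assets: List[AssetsDefinition] = [AssetsDefinition()]

# The abstract Sequence interface does not declare __add__,
# and the declared type is all a static checker looks at.
sequence_declares_add = hasattr(abc.Sequence, "__add__")

# At runtime the object is a list, so list.__add__ handles the concatenation.
# A static checker flags this line; the interpreter runs it without complaint.
all_assets = dbt_assets + csv_assets
```

So the error is about the declared return type, not about the value: the annotation promises only the `Sequence` interface, which has no `+`.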

owen

07/14/2022, 4:32 PM

hi @Harpal! I'll look into the underlying issue (Sequence not playing nicely with the list type), but two quick options to fix your type error:

all_assets = list(dbt_assets) + csv_assets

or, for a slightly more performant solution (cast doesn't do anything at runtime)

from typing import cast

# ...

all_assets = cast(list, dbt_assets) + csv_assets
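The two suggestions can be compared side by side in a small sketch. As before, `AssetsDefinition` and `load_assets` are hypothetical stand-ins rather than dagster's own objects, and the third option (iterable unpacking) is not from the thread; it is included only as another way that should satisfy a type checker.

```python
from typing import List, Sequence, cast


class AssetsDefinition:
    """Hypothetical stand-in for dagster's AssetsDefinition."""


def load_assets() -> Sequence[AssetsDefinition]:
    # Assumed to return a plain list at runtime, annotated as a Sequence.
    return [AssetsDefinition()]


dbt_assets = load_assets()
csv_assets: List[AssetsDefinition] = [AssetsDefinition()]

# Option 1: list() builds a new list (a shallow copy), so "+" is list + list.
copied = list(dbt_assets)
via_list = copied + csv_assets

# Option 2: cast() returns its argument unchanged; only the checker's view changes.
casted = cast(List[AssetsDefinition], dbt_assets)
via_cast = casted + csv_assets

# Option 3 (not from the thread): unpacking accepts any iterable, no cast needed.
via_unpack = [*dbt_assets, *csv_assets]
```

One trade-off worth keeping in mind: `cast` is only a promise to the checker. If the loader ever returned a tuple instead of a list, the `cast` version would raise a `TypeError` at runtime, while `list(...)` and unpacking would keep working.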

Harpal

07/14/2022, 4:56 PM

Thanks @owen! That did the trick.
