# ask-community
h
Hi team! Does anyone know how to avoid the typing error when summing two different inputs for `AssetGroup`'s `assets` parameter? Specifically, the problem occurs when adding (with the `+` operator) the output of `dbt_assets = load_assets_from_dbt_project()`, which is a `Sequence[AssetsDefinition]`, and the output of a function, `csv_assets = csv_assets_for_dbt_assets(dbt_assets)`, whose output type is `list[AssetsDefinition]`. The result is the Pylance error shown below.
Operator "+" not supported for types "Sequence[AssetsDefinition]" and "list[AssetsDefinition]"
  Operator "+" not supported for types "Sequence[AssetsDefinition]" and "list[AssetsDefinition]" Pylance(reportGeneralTypeIssues)
See the sample code in the comments for more details.
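import subprocess

from dagster import AssetGroup, AssetKey, Out, Output, multi_asset
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

dbt_assets = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select=f"tag:{DATASET_TYPE}")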


def csv_assets_for_dbt_assets(dbt_assets):
    outs = {}
    deps = {}
    for asset_key in dbt_assets[0].asset_keys:
        table_name = asset_key.path[-1]
        if "rig" in table_name:
            outs[table_name] = Out(asset_key=AssetKey(["gcs", table_name]))
            deps[table_name] = {AssetKey(table_name)}
        else:
            continue

    @multi_asset(outs=outs, non_argument_deps=set(dbt_assets[0].asset_keys), compute_kind="gcs")
    def _assets(context):
        for table_name in outs.keys():
            # --query is so ugly because we need to use a workaround to get .csv files with headers.
            # https://stackoverflow.com/questions/51271931/exporting-from-cloud-sql-to-csv-with-column-headers
            gcl_export_table_comm = f"gcloud sql export csv moonfire-01 gs://moonfire-training-data/sector_cls/train_test_eval/{DIR_ULID}_train_{TRAIN_SPLIT}_test_{TEST_SPLIT}/{DATASET_TYPE}/{table_name}.csv --database=moonfire --query=\"SELECT 'text' AS text, 'labels' AS labels UNION ALL SELECT text, labels FROM {table_name}\""  # noqa: E501
            context.log.info(f"The export command was run! {gcl_export_table_comm}")

            subprocess.call(gcl_export_table_comm, shell=True)
            yield Output(table_name, table_name)

    return [_assets]


# This is the list of assets that will be created by this job.

csv_assets = csv_assets_for_dbt_assets(dbt_assets)
all_assets = dbt_assets + csv_assets
sector_cls_all_assets = AssetGroup(
    assets=all_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(
            {
                "project_dir": DBT_PROJECT_DIR,
                "profiles_dir": DBT_PROFILE_DIR,
            }
        )
    },
).build_job("sector_cls_all_assets")
The problem is that `dbt_assets` is of type `Sequence[AssetsDefinition]` but `csv_assets` is of type `list[AssetsDefinition]`. Pylance raises a typing error when trying to add these, as shown below, but the code runs and performs the addition regardless.
all_assets = dbt_assets + csv_assets
Is there a way to override/incorporate the addition operation in the documentation, allowing the two to be summed? If not, how can I ensure that `csv_assets + dbt_assets` no longer returns this error?
o
hi @Harpal! I'll look into the underlying issue (Sequence not playing nicely with the list type), but two quick options to fix your type error:
all_assets = list(dbt_assets) + csv_assets
or, for a slightly more performant solution (`cast` doesn't do anything at runtime, so no copy is made):
from typing import cast

# ...

all_assets = cast(list, dbt_assets) + csv_assets
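For anyone who wants to try this outside a Dagster project, here is a rough standalone sketch of why the checker complains and what the options do. It assumes a hypothetical stand-in `AssetsDefinition` class and two made-up helpers (`load_dbt_assets`, `build_csv_assets`) rather than the real Dagster APIs; the unpacking variant is an extra option not mentioned above.

```python
from collections import abc
from typing import List, Sequence, cast


class AssetsDefinition:
    """Hypothetical stand-in for Dagster's AssetsDefinition (illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


def load_dbt_assets() -> Sequence[AssetsDefinition]:
    # Hypothetical helper: annotated as Sequence, but returns a list at runtime.
    return [AssetsDefinition("dbt_model")]


def build_csv_assets() -> List[AssetsDefinition]:
    # Hypothetical helper: annotated as a concrete list.
    return [AssetsDefinition("csv_export")]


dbt_assets = load_dbt_assets()
csv_assets = build_csv_assets()

# The abstract Sequence type does not declare __add__, while list does.
# That is why a type checker rejects "+" on a Sequence even though it
# works at runtime when the underlying object happens to be a list.
sequence_declares_add = hasattr(abc.Sequence, "__add__")
list_declares_add = hasattr(list, "__add__")

# Option 1: copy into a list first (safe for any Sequence, e.g. a tuple).
via_copy = list(dbt_assets) + csv_assets

# Option 2: cast only informs the type checker; it returns its argument unchanged.
via_cast = cast(List[AssetsDefinition], dbt_assets) + csv_assets
cast_returns_same_object = cast(List[AssetsDefinition], dbt_assets) is dbt_assets

# Option 3 (extra): unpack both into a new list, no cast needed.
via_unpack = [*dbt_assets, *csv_assets]
```

One trade-off to keep in mind: `cast` only silences the checker. If the value were actually a tuple at runtime, `+` with a list would raise a `TypeError`, so `list(...)` or unpacking is the safer choice when the concrete type is not guaranteed.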
h
Thanks @owen! That did the trick.