Harpal
07/14/2022, 3:18 PM
assets default parameter?
Specifically, the problem occurs when adding (with the '+' operator) the output of dbt_assets = load_assets_from_dbt_project(), which is a Sequence[AssetsDefinition], and the output of the function call csv_assets = csv_assets_for_dbt_assets(dbt_assets), which has type list[AssetsDefinition]. The result is the Pylance error described below.
Operator "+" not supported for types "Sequence[AssetsDefinition]" and "list[AssetsDefinition]" Pylance (reportGeneralTypeIssues)
See the sample code in the comments for more details.
Harpal
07/14/2022, 3:23 PM
dbt_assets = load_assets_from_dbt_project(project_dir=DBT_PROJECT_DIR, select=f"tag:{DATASET_TYPE}")

def csv_assets_for_dbt_assets(dbt_assets):
    outs = {}
    deps = {}
    for asset_key in dbt_assets[0].asset_keys:
        table_name = asset_key.path[-1]
        if "rig" in table_name:
            outs[table_name] = Out(asset_key=AssetKey(["gcs", table_name]))
            deps[table_name] = {AssetKey(table_name)}
        else:
            continue

    @multi_asset(outs=outs, non_argument_deps=set(dbt_assets[0].asset_keys), compute_kind="gcs")
    def _assets(context):
        for table_name in outs.keys():
            # --query is so ugly because we need to use a workaround to get .csv files with headers.
            # https://stackoverflow.com/questions/51271931/exporting-from-cloud-sql-to-csv-with-column-headers
            gcl_export_table_comm = f"gcloud sql export csv moonfire-01 gs://moonfire-training-data/sector_cls/train_test_eval/{DIR_ULID}_train_{TRAIN_SPLIT}_test_{TEST_SPLIT}/{DATASET_TYPE}/{table_name}.csv --database=moonfire --query=\"SELECT 'text' AS text, 'labels' AS labels UNION ALL SELECT text, labels FROM {table_name}\""  # noqa: E501
            context.log.info(f"The export command was run! {gcl_export_table_comm}")
            subprocess.call(gcl_export_table_comm, shell=True)
            yield Output(table_name, table_name)

    return [_assets]

# This is the list of assets that will be created by this job.
csv_assets = csv_assets_for_dbt_assets(dbt_assets)
all_assets = dbt_assets + csv_assets
sector_cls_all_assets = AssetGroup(
    assets=all_assets,
    resource_defs={
        "dbt": dbt_cli_resource.configured(
            {
                "project_dir": DBT_PROJECT_DIR,
                "profiles_dir": DBT_PROFILE_DIR,
            }
        )
    },
).build_job("sector_cls_all_assets")
The problem is that dbt_assets is of type Sequence[AssetsDefinition], but csv_assets is of type list[AssetsDefinition].
Pylance raises a typing error when these are added, as shown below, but the code runs and performs the addition regardless.
all_assets = dbt_assets + csv_assets
Is there a way to override or incorporate the addition operation in the documentation so that the two can be summed?
If not, how can I ensure that csv_assets + dbt_assets no longer returns this error?
owen
07/14/2022, 4:32 PM
all_assets = list(dbt_assets) + csv_assets
or, for a slightly more performant solution (cast doesn't do anything at runtime):
from typing import cast
# ...
all_assets = cast(list, dbt_assets) + csv_assets
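For reference, a minimal sketch of why the checker complains and how the two suggestions above behave. The Sequence abstract type does not define "+", so a type checker rejects the expression even when the runtime value happens to be a list. The AssetsDefinition class and load_dbt_assets function here are hypothetical stand-ins so the snippet runs without Dagster installed; the unpacking variant is an extra option not mentioned in the thread.

```python
from typing import List, Sequence, cast


class AssetsDefinition:
    """Hypothetical stand-in for dagster.AssetsDefinition (assumption for this sketch)."""

    def __init__(self, name: str) -> None:
        self.name = name


def load_dbt_assets() -> Sequence[AssetsDefinition]:
    # Stand-in for load_assets_from_dbt_project(). The declared return type is
    # Sequence, which is what makes a type checker reject `+` on the result.
    return [AssetsDefinition("dbt_model")]


dbt_assets = load_dbt_assets()
csv_assets: List[AssetsDefinition] = [AssetsDefinition("csv_export")]

# Option 1: copy into a real list. Small runtime cost, no assumptions needed.
via_list = list(dbt_assets) + csv_assets

# Option 2: cast() returns its argument unchanged at runtime and only changes
# the type the checker sees. This is safe only if the value really is a list.
via_cast = cast(List[AssetsDefinition], dbt_assets) + csv_assets

# Option 3 (not from the thread): unpacking accepts any iterable, no cast needed.
via_unpack = [*dbt_assets, *csv_assets]
```

Note that cast only silences the checker: if the function ever returned a tuple instead of a list, the cast version would fail at runtime with a TypeError, while the list() and unpacking versions would keep working.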
Harpal
07/14/2022, 4:56 PM