Danny Steffy
08/23/2023, 7:49 PMraw_table
that has date and id. I'm trying to create an incremental dbt asset that is partitioned on date, but when I try the insert_overwrite
strategy, it tells me I've reached my quota for insert/append for the table. When I try to use the merge
strategy with partitioning on date, and the unique_id being id, I get an error saying I'm creating a table with over 4000 partitions, even when I try to materialize the asset for only a year's worth of data. When I try removing the partitioning, I get an error from dagster saying there is no partition_expr
metadata value. Is there a better way to handle this? My source data table in BigQuery has data ranging from 1990-2023, although not every day actually has datajamie
08/23/2023, 7:54 PMDanny Steffy
08/23/2023, 7:55 PMDanny Steffy
08/23/2023, 7:55 PM@dbt_assets(
manifest=Path(DBT_PROJECT_DIR, "target", "manifest.json"),
partitions_def=nhl_daily_partition,
io_manager_key="warehouse_io_manager",
dagster_dbt_translator=NHLDagsterDbtTranslator(),
)
def dbt_assets(context: OpExecutionContext, dbt: DbtCliResource):
time_window = context.asset_partitions_time_window_for_output(
list(context.selected_output_names)[0]
)
dbt_vars = {
"min_date": time_window.start.strftime('%Y-%m-%d'),
"max_date": time_window.end.strftime('%Y-%m-%d')
}
dbt_build_args = ["build", "--vars", json.dumps(dbt_vars)]
yield from dbt.cli(dbt_build_args, context=context).stream()
Danny Steffy
08/23/2023, 7:55 PMjamie
08/23/2023, 7:58 PM@dbt_asset
function. Storing the data from the model is handled by dbt and the IO manager is only used to load the data in downstream assets
• We also don’t use BigQuery native partitions in the BigQuery IO manager right now, which leads me to believe that the issue is coming from how dbt is trying to load the data into BigQueryDanny Steffy
08/23/2023, 7:59 PMjamie
08/23/2023, 8:01 PMDanny Steffy
08/23/2023, 8:03 PMDanny Steffy
08/23/2023, 8:03 PMjamie
08/23/2023, 8:03 PM@dbt_assets
use the IO manager now?Danny Steffy
08/23/2023, 8:04 PMDanny Steffy
08/23/2023, 8:05 PMDanny Steffy
08/23/2023, 8:05 PMDanny Steffy
08/23/2023, 8:06 PMDanny Steffy
08/23/2023, 8:07 PMDanny Steffy
08/24/2023, 2:25 PMDanny Steffy
08/24/2023, 2:25 PMjamie
08/24/2023, 2:45 PMowen
08/25/2023, 8:53 PMNothing
, which should circumvent any IOManager behavior, so definitely odd that adding / removing that would have any impact.
Some things I'd look into:
1. see the exact command that dagster is running on your behalf (this will show up in the logs for the run)
2. see if that looks as you expect (dates formatted as expected and all that)
I'm not familiar with bigquery sql syntax in particular, but I know many sql dialects can be quite picky about how the dates are passed in (often, you'll want to have an explicit date parsing function called, rather than just passing in a bare string), so that might be another place to lookEdo
08/29/2023, 5:09 PMDATE('{{ var('max_date') }}')