Sebastian Charrier
02/06/2023, 8:05 PM
@asset(compute_kind="random", io_manager_key="io_manager_raw")
def orders() -> pd.DataFrame:
    data = pd.read_csv("https://docs.dagster.io/assets/orders.csv")
    return data
dbt_assets = load_assets_from_dbt_project(
    DBT_PROJECT_DIR,
    DBT_PROFILES_DIR,
)
raw_data_assets = load_assets_from_package_module(
    raw_data,
    group_name="raw_data",
    # all of these assets live in the duckdb database, under the schema raw_data
    key_prefix=["raw_data"],
)
resources = {
    "io_manager": bigquery_pandas_io_manager.configured(
        {
            "credentials": {"env": "BIGQUERY_SERVICE_ACCOUNT_CREDENTIALS"},
            "project_id": {"env": "BIGQUERY_PROJECT_ID"},
            "dataset_id": "analytics",
        }
    ),
    # this io_manager is responsible for storing/loading our pickled machine learning model
    "model_io_manager": fs_io_manager,
    # this resource is used to execute dbt cli commands
    "dbt": dbt_cli_resource.configured(
        {"project_dir": DBT_PROJECT_DIR, "profiles_dir": DBT_PROFILES_DIR}
    ),
}
defs = Definitions(
    assets=[*dbt_assets, *raw_data_assets],
    resources=resources,
)
I am also trying to write data to BigQuery in two different datasets: raw_data in one dataset and the dbt-generated data in a different dataset called analytics.
Thank you in advance.
Sean Lopp
02/06/2023, 8:17 PM
What does the sources.yaml file look like for the dbt project?
In that example, the BQ IO manager assumes a single dataset and then it names the table with the entire asset key (prefix + asset name).
https://github.com/dagster-io/quickstart-gcp/blob/main/quickstart_gcp/io_managers.py#L34-L38
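To make the naming behavior concrete, here is a minimal sketch of how a table name could be derived from an asset key, either from the full key (prefix + asset name) or from the asset name alone. The function names and the double-underscore separator are assumptions for illustration, not code from the linked IO manager; check the linked source for the separator it actually uses.

```python
from typing import Sequence


def table_name_for_asset(
    key_path: Sequence[str],
    include_prefix: bool = True,
    sep: str = "__",  # assumed separator; verify against the real IO manager
) -> str:
    """Derive a table name from an asset key path such as ["raw_data", "orders"]."""
    if not key_path:
        raise ValueError("asset key path must not be empty")
    # Full key: every path component; name only: just the last component.
    parts = list(key_path) if include_prefix else [key_path[-1]]
    return sep.join(parts)


def qualified_table(
    dataset_id: str, key_path: Sequence[str], include_prefix: bool = True
) -> str:
    """Build the dataset-qualified table name, e.g. "analytics.<table>"."""
    return f"{dataset_id}.{table_name_for_asset(key_path, include_prefix)}"


if __name__ == "__main__":
    key = ["raw_data", "orders"]
    print(qualified_table("analytics", key))                        # full asset key
    print(qualified_table("analytics", key, include_prefix=False))  # asset name only
```

With the full key, the table name carries the prefix, so the dbt source has to point at the prefixed table; with the name-only variant, the dbt source can keep referring to the plain table name.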
So you could potentially do a few things:
• update sources.yaml to tell it to read from raw_data___orders
• remove the key prefix altogether
• tweak the IO manager behavior; for example, if you want to use only the asset name for the table name, you'd do something like this:
https://github.com/slopp/dagster-conditional-etl-gcp-demo/blob/main/dagster_project/resources.py#L80-L81
Sebastian Charrier
02/06/2023, 8:22 PM
version: 2
sources:
  - name: raw_data
    tables:
      - name: orders
      - name: users
  - name: forecasting
    tables:
      - name: predicted_orders
I changed it to
sources:
  - name: raw_data
    tables:
      - name: raw_data__orders
      - name: users
  - name: forecasting
    tables:
      - name: predicted_orders
but this is what happens
Sean Lopp
02/06/2023, 8:42 PM
Sebastian Charrier
02/06/2023, 8:48 PM
jamie
02/06/2023, 8:49 PM
Sebastian Charrier
02/06/2023, 8:49 PM
MikeVL
02/06/2023, 9:22 PM