Heiner Hippke
07/19/2023, 10:09 PM
claire
07/19/2023, 11:55 PM
from dagster import Output, asset

@asset
def my_asset(context):
    # key of the partition being materialized in this run
    partition_key = context.partition_key
    metadata = my_partition_metadata(partition_key)
    return Output(..., metadata=metadata)
This would attach specific metadata to the partitioned materialization whenever you execute the asset (in a backfill or an ad-hoc materialization).
claire
07/19/2023, 11:59 PM
my_repo.get_job("partitioned_asset_job").execute_in_process(
    instance=instance,
    tags={
        ASSET_PARTITION_RANGE_START_TAG: start_partition_key,
        ASSET_PARTITION_RANGE_END_TAG: end_partition_key,
    },
)
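The two tags bound a partition range that a single run then covers, with both ends inclusive. For daily partitions, the keys inside such a range can be listed roughly as in the sketch below. It assumes ISO-formatted daily keys (YYYY-MM-DD), and daily_partition_keys is a hypothetical helper, not part of Dagster.

```python
from datetime import date, timedelta
from typing import List


def daily_partition_keys(start_key: str, end_key: str) -> List[str]:
    """List every daily partition key from start_key to end_key, inclusive.

    Hypothetical helper: assumes keys are ISO dates such as "2023-06-15".
    """
    start = date.fromisoformat(start_key)
    end = date.fromisoformat(end_key)
    if end < start:
        raise ValueError("end_key must not be earlier than start_key")
    n_days = (end - start).days + 1  # +1 because the end key is included
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


keys = daily_partition_keys("2023-06-15", "2023-06-30")
print(len(keys), keys[0], keys[-1])
```

A list like this can be handy for checking that every partition in the range received its own metadata after the run.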
Heiner Hippke
07/20/2023, 7:53 AM
from dagster import AssetObservation, Output, asset

@asset
def my_asset(context):
    p_start = context.asset_partition_key_range.start
    p_end = context.asset_partition_key_range.end
    # create asset for partitions in partition range
    metadata_export = export_data_from_db_to_gcs(p_start, p_end)
    metadata_import = import_data_from_gcs_to_bq()
    # gather partition-wise metadata
    partition_info = read_partition_metadata_from_information_schema(p_start, p_end)
    for partition in partition_info:
        context.log_event(AssetObservation(
            f'{DBT_BQ_PROJECT_NAME}/{trg_table_name}',
            partition=partition.day, metadata={
                'bytes': partition.total_billable_bytes,
                'rows': partition.total_rows,
            }))
    # return an Output with value None, since the data has already been written inside the asset
    # metadata might be empty or not; it makes no difference for the described problem
    metadata = metadata_export.copy()
    metadata.update(metadata_import)
    return Output(None, metadata=metadata)
With this solution, the metadata attached to the Output object becomes the most recent valid metadata for all partitions of that asset. The metadata attached to each partition via AssetObservations is older, since it is logged beforehand. As a result, the metadata plots in Dagit do not display the intended per-partition metadata:
I loaded the partitions from 2023-06-15 to 2023-06-30 in a single backfill with no metadata attached to the Output object, and the partitions from 2023-07-01 to date in a second backfill, attaching rows = total rows imported in that single backfill run.
In both cases the plots do not show the actual metadata of the partitions. In the first case, no values are shown. In the second, all partitions show the total number of rows imported in the second run (2023-07-01 to 2023-07-19).
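The effect can be reproduced with a toy model. This is only a sketch of the behavior described above, not Dagster's actual implementation: it assumes the plot simply shows the metadata of the latest event per partition, and the Event class, the function name and the row counts are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """Hypothetical stand-in for an observation or a materialization."""
    timestamp: int
    partitions: List[str]
    metadata: Dict[str, int]


def latest_metadata_per_partition(events: List[Event]) -> Dict[str, Dict[str, int]]:
    """Assumed rule: the newest event touching a partition wins."""
    latest: Dict[str, Dict[str, int]] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        for partition in event.partitions:
            latest[partition] = event.metadata
    return latest


# Per-partition observations are logged first ...
observations = [
    Event(1, ["2023-07-01"], {"rows": 100}),
    Event(2, ["2023-07-02"], {"rows": 150}),
]
# ... then the Output of the range run is recorded for every partition in it.
with_totals = Event(3, ["2023-07-01", "2023-07-02"], {"rows": 250})
without_metadata = Event(3, ["2023-07-01", "2023-07-02"], {})

second_case = latest_metadata_per_partition(observations + [with_totals])
first_case = latest_metadata_per_partition(observations + [without_metadata])
print(second_case)  # every partition shows the run total
print(first_case)   # every partition shows no values
```

Under that assumption, the per-partition observations are always shadowed by the later run-level event, which matches both cases above.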
I think the best solution would be to introduce a metadata_by_partition_key-like feature.
claire
07/22/2023, 12:25 AM
claire
07/22/2023, 12:25 AM
Heiner Hippke
07/24/2023, 11:01 AM