assaf  06/06/2021, 7:34 AM
To better track progress of the actual archiving, I wanted to use partition sets, where each partition represents a single parquet file. What I did was output the list of parquet files as a CSV, save that as a file next to my pipeline code, and generate a
```
archiving_inventory_parquet_files = databricks_process_inventory()
for pq in archiving_inventory_parquet_files:
    run_archiving_pipeline_dagster(pq)
```
from that. However, I need to add such a CSV to my code whenever I rerun the Databricks code and generate a new partition set. Ideally, I would like to wrap that Databricks job in a Dagster pipeline, and have it add a partition set to the other pipeline when it's done. My question is: is there a way to dynamically update the partition sets (namely, add a new one) for pipeline B, using output from pipeline A? Maintaining that state in Dagster is not a trivial ask, I understand, but has anybody else come across a similar challenge?
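To make the intermediate step concrete: rather than hard-coding the parquet list, the pipeline code could read it from whatever CSV the Databricks job last emitted, so a rerun only replaces the file instead of requiring a code change. This is a minimal sketch of that idea, not Dagster's partition API; the inline CSV content and the function name `load_partition_names` are assumptions for illustration.

```python
import csv
import io

def load_partition_names(csv_text: str) -> list[str]:
    """Return one partition name per parquet file listed in the CSV.

    Assumes a single-column CSV where each row is one parquet file path,
    as produced (hypothetically) by the Databricks inventory job.
    """
    reader = csv.reader(io.StringIO(csv_text))
    return [row[0] for row in reader if row]

# Example inventory as the Databricks job might emit it (illustrative names).
inventory_csv = "part-0001.parquet\npart-0002.parquet\npart-0003.parquet\n"

partitions = load_partition_names(inventory_csv)
# Each entry would back one partition, driving one run of the archiving pipeline.
```

In practice the CSV would be read from a shared location (e.g. object storage) at partition-generation time, which is what makes the set "dynamic" from the pipeline code's point of view.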
alex  06/07/2021, 3:37 PM