Hello hello! I have to create partitions from an API, which basically returns what should be in each partition, and launch an ETL with these partitions every day. Until now, I was building the partitions using DynamicPartitionsDefinition with the partition_fn param. Then I build a config using PartitionedConfig and configure my job with it.
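from dagster import DynamicPartitionsDefinition, PartitionedConfig, job

def get_partitions(current_time):
    # Evaluated each time the partition list is requested, hence the frequent API calls.
    return build_partitions_from_api(call_api())

partition_definition = DynamicPartitionsDefinition(get_partitions)

def get_run_config_for_partition(partition):
    # Arguments elided in the original post.
    return build_config(...)

partitioned_config = PartitionedConfig(partition_definition, get_run_config_for_partition)

@job(config=partitioned_config)
def my_etl():
    ...  # ops elided in the original post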
I see several problems:
- The API is called too often.
- I use a configured job. I would prefer to use partitions_def.

My second solution is: use a DynamicPartitionsDefinition with a name, and create an asset that adds a partition with context.instance.add_dynamic_partitions(). Then I can add a schedule that materializes this asset and runs my job. Is there a better solution?
Yes. But I don't want just a time partition. My API returns more complex data: a list of configs with date ranges and more. I want to query the API every day only to see whether the partitions have changed, while still being able to run a backfill on a specific configuration.
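To make the "see whether the partitions have changed" step concrete, here is a minimal sketch in plain Python. It assumes each API config is a JSON-serializable dict with hypothetical start and end fields; partition_key and diff_partitions are illustrative names, not Dagster APIs.

```python
import hashlib
import json


def partition_key(config: dict) -> str:
    """Derive a stable string key from one API config entry.

    Assumption: the config has "start" and "end" fields (hypothetical names).
    A short hash of the whole config keeps keys distinct when two configs
    share the same date range but differ elsewhere.
    """
    canonical = json.dumps(config, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]
    return f"{config['start']}_{config['end']}_{digest}"


def diff_partitions(api_configs, known_keys):
    """Return (added, removed) partition keys, each as a sorted list."""
    wanted = {partition_key(config) for config in api_configs}
    known = set(known_keys)
    return sorted(wanted - known), sorted(known - wanted)


if __name__ == "__main__":
    configs = [
        {"start": "2023-01-01", "end": "2023-01-31", "source": "a"},
        {"start": "2023-02-01", "end": "2023-02-28", "source": "b"},
    ]
    already_registered = [partition_key(configs[0]), "stale_key"]
    added, removed = diff_partitions(configs, already_registered)
    print("to add:", added)
    print("to remove:", removed)
```

Because the key is derived only from the config content, the daily check stays cheap: only keys in the added list need to be registered, and each key can still be targeted individually for a backfill.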
then dynamic partitions would be better. similar to how i approached my problem in the other thread: i have s3 files that come in randomly, and i'm dynamically partitioning them based on the s3 key
so using
allows me to keep track if the partition has been processed or not
Hi Cedric, +1 to using dynamic partitions. One possible way you could structure your code is to have a sensor or a schedule that adds the dynamic partition, then kicks off a partitioned run accordingly.
