I'm trying to use Dagster to control ETL flow for ...
# ask-community
u
I'm trying to use Dagster to control the ETL flow for data warehouse projects, but after reading through the docs and searching for an answer, it seems the scheduler cannot support incremental refresh. My case:
1. The schedule needs to run once a day.
2. The schedule runs a job containing assets that are partitioned by month.
3. Each run needs to refresh the most recent two months of data.
I've tried build_schedule_from_partitioned_job, but it automatically runs the job once a month because the assets are partitioned by month. I've also tried ScheduleDefinition, but it doesn't work and returns an error. Is there any workaround?
f
You can just define your own schedule and give it a job that executes the assets you want. The assets can be partitioned by whatever you like. In my case it is a dynamic partition, which is not related to time. By the way, why would you need a dynamic partition in combination with a monthly partition?
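from dagster import DefaultScheduleStatus, RunRequest, ScheduleEvaluationContext, schedule

# partitioned_data_job and per_asset_timer (a cron expression) are defined elsewhere.
@schedule(job=partitioned_data_job, cron_schedule=per_asset_timer,
          default_status=DefaultScheduleStatus.RUNNING)
def my_schedule(context: ScheduleEvaluationContext):
    # Request one run per key currently registered in the dynamic partitions definition.
    partitions = context.instance.get_dynamic_partitions(partitions_def_name="some_dynamic_partition")
    for part in partitions:
        yield RunRequest(run_key=part, partition_key=part)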
o
Just chiming in: @Frederik Löw's answer is correct, although it seems like you probably don't need dynamic partitions if you're only using time-partitioned data. Instead, you could just use a MonthlyPartitionsDefinition.
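For the original question (a daily schedule that refreshes the two most recent monthly partitions), the part a custom schedule needs is the list of partition keys to request on each tick. Below is a minimal sketch of that calculation using only the standard library. The helper name recent_month_keys and its parameters are hypothetical, and it assumes the monthly partition keys are the first day of each month formatted as "%Y-%m-%d"; adjust the format if your MonthlyPartitionsDefinition uses a different one.

```python
from datetime import date


def recent_month_keys(today: date, count: int = 2, include_current: bool = True) -> list[str]:
    """Return keys for the `count` most recent months, oldest first.

    Hypothetical helper: assumes each key is the first day of the month
    formatted as "%Y-%m-%d".
    """
    # Index months as year * 12 + (month - 1) so that stepping back across
    # a year boundary is plain integer arithmetic.
    last = today.year * 12 + (today.month - 1)
    if not include_current:
        last -= 1  # stop at the last completed month

    keys = []
    for idx in range(last - count + 1, last + 1):
        year, month0 = divmod(idx, 12)
        keys.append(date(year, month0 + 1, 1).strftime("%Y-%m-%d"))
    return keys


if __name__ == "__main__":
    print(recent_month_keys(date(2024, 1, 15)))
```

Inside a schedule function like the one above, with a daily cron expression, you would yield one RunRequest(partition_key=key) for each returned key. Two caveats, both worth checking against your setup: a run_key that is just the partition key would likely cause later ticks to be skipped as duplicates, so either omit it or include the tick date in it; and requesting the current, still-incomplete month probably requires the partitions definition to be created with end_offset=1, otherwise pass include_current=False.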