
Arun Kumar

10/27/2021, 3:25 AM
Hi team, is there any way to configure a job with multiple partition sets? I'd like to have both monthly and daily partitions on my jobs.

max

10/27/2021, 3:30 AM
would two otherwise identical jobs with different partitions work for you?

Arun Kumar

10/27/2021, 3:36 AM
Hi Max, thanks for the response. That might work, but it would result in duplicate job definitions. I was wondering if there is a nicer way to do this. For context, most of our pipelines write data to Snowflake tables in a daily incremental fashion, and daily partitions work really well here. However, we often want to run backfills covering two to three months, which results in a large number of runs (one per date). Snowflake also has a lot of constraints on concurrent updates, so running backfills is becoming challenging. If there were multiple partition sets, I could make the SQL configurable by date range and run a whole month's backfill in a single SQL statement.
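To make the trade-off concrete, here is a minimal sketch in plain Python of a date-range-parameterized statement and of splitting a backfill span into month-aligned windows. The table names `events` and `raw_events` and the column `event_date` are hypothetical; a daily partition would issue one such statement per day, while a monthly backfill covers the same span with one statement per month.

```python
from __future__ import annotations

from datetime import date, timedelta


def build_backfill_sql(start: date, end: date) -> str:
    """Render one INSERT covering the half-open range [start, end).

    `events`, `raw_events` and `event_date` are hypothetical names; real
    code should use bind parameters instead of string formatting.
    """
    return (
        "INSERT INTO events "
        "SELECT * FROM raw_events "
        f"WHERE event_date >= '{start.isoformat()}' "
        f"AND event_date < '{end.isoformat()}'"
    )


def month_windows(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive windows aligned to calendar months."""
    windows = []
    cursor = start
    while cursor < end:
        # Day 1 plus 32 days always lands in the next month; snap back to its 1st.
        next_month = (cursor.replace(day=1) + timedelta(days=32)).replace(day=1)
        window_end = min(next_month, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


if __name__ == "__main__":
    start, end = date(2021, 8, 1), date(2021, 11, 1)
    daily_runs = (end - start).days  # one run per daily partition
    monthly = month_windows(start, end)
    print(f"daily runs: {daily_runs}, monthly runs: {len(monthly)}")
    for window_start, window_end in monthly:
        print(build_backfill_sql(window_start, window_end))
```

For the August to October span used here, that is 92 daily runs versus 3 monthly statements, which also means far fewer concurrent updates hitting the same table.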

max

10/27/2021, 4:09 AM
cc @sandy

sandy

10/28/2021, 5:15 AM
Hi @Arun Kumar - Dagster currently has the constraint that, if you're using a partitioned job, each job run corresponds to a single partition. This means that Dagster's partitioned jobs can't model "this single job run backfilled 30 different partitions". If you want that flexibility, you could write a job that accepts a time range as config, plus a schedule that runs it daily with config for that day. To run "backfills", people could launch the job from the Launchpad (what used to be called the Playground) and supply the time range they want to backfill as config. However, the job wouldn't be able to take advantage of the Partitions UI. Ultimately, we'd like to address the pattern you brought up with asset partitions: a single job run would be able to fill in multiple partitions for an asset, and you'd be able to go to the asset page to view a matrix that shows the status of each of the asset's partitions.
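A minimal sketch of the time-range-as-config pattern, written in plain Python so it runs without Dagster. It assumes a hypothetical op named `load_events` with `start_date` and `end_date` config keys, and assumes Dagster's job run-config layout (`ops` → op name → `config`). The daily schedule would emit a one-day window; a Launchpad backfill supplies a wider window in exactly the same shape.

```python
from __future__ import annotations

from datetime import date, timedelta


def time_range_run_config(start: date, end: date) -> dict:
    """Run config for a single run covering the half-open range [start, end).

    `load_events` and its config keys are hypothetical names for illustration.
    """
    return {
        "ops": {
            "load_events": {
                "config": {
                    "start_date": start.isoformat(),
                    "end_date": end.isoformat(),
                }
            }
        }
    }


def daily_schedule_config(day: date) -> dict:
    """Config the daily schedule would emit: exactly one day's window."""
    return time_range_run_config(day, day + timedelta(days=1))


if __name__ == "__main__":
    # What the schedule would send for one tick.
    print(daily_schedule_config(date(2021, 10, 27)))
    # What someone would paste into the Launchpad for a three-month backfill.
    print(time_range_run_config(date(2021, 8, 1), date(2021, 11, 1)))
```

The trade-off noted above still applies: because these runs are not tied to partitions, a backfill launched this way does not show up in the Partitions UI.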

Arun Kumar

10/28/2021, 10:25 PM
Thanks for explaining the alternatives and the long-term solution. I do want to use the Partitions UI, so I was thinking of doing what Max proposed. I'm going to try creating a duplicate job definition with a monthly partitioned config that runs the same code as the daily job, and use this monthly job only for backfills.
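A minimal sketch of the duplicate-job approach, in plain Python so it runs without Dagster: both jobs call one shared body and differ only in how a partition maps to a date window. All function names here are hypothetical; in Dagster, each mapping would presumably sit behind its own partitioned config (one daily, one monthly), with the shared body living in the ops both jobs reuse.

```python
from __future__ import annotations

from datetime import date, timedelta


def daily_window(partition_start: date) -> tuple[date, date]:
    """Half-open [start, end) window covered by a daily partition."""
    return partition_start, partition_start + timedelta(days=1)


def monthly_window(partition_start: date) -> tuple[date, date]:
    """Half-open [start, end) window covered by a calendar-month partition."""
    first = partition_start.replace(day=1)
    # Day 1 plus 32 days always lands in the next month; snap back to its 1st.
    next_month = (first + timedelta(days=32)).replace(day=1)
    return first, next_month


def load_window(start: date, end: date) -> str:
    """Shared body both jobs call; a hypothetical stand-in for the real SQL."""
    return f"load events from {start.isoformat()} to {end.isoformat()}"


def run_partition(partition_start: date, granularity: str) -> str:
    """Resolve the partition to a window, then run the shared body once."""
    windows = {"daily": daily_window, "monthly": monthly_window}
    start, end = windows[granularity](partition_start)
    return load_window(start, end)


if __name__ == "__main__":
    print(run_partition(date(2021, 10, 27), "daily"))
    print(run_partition(date(2021, 10, 1), "monthly"))
```

Keeping the window logic separate from the shared body means the monthly backfill job adds no second copy of the SQL, only a second way of choosing its date range.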
@sandy I've seen a lot of updates on asset-based jobs. Just wanted to check again whether this is possible now with asset-based jobs?

sandy

11/19/2021, 4:24 PM
@Arun Kumar it's not yet possible, but we're working on changes that I believe should make this possible soon

Chris Chan

11/24/2021, 6:37 PM
Also interested in this, since we have EMR jobs that are partitioned daily. When running backfills, it's easier to have one cluster handle multiple dates than to spin up a cluster for each date.

sandy

11/24/2021, 7:09 PM
@Chris Chan that makes a lot of sense - will keep you posted