# ask-community
r
Is it possible to run partitions one by one (serial) ?
f
Hey @Rafael Gomes, great question! I asked for something related recently in the CoRise Slack, and @yuhan kindly replied:
it depends on how you kick off the job. if you kick them off using the backfill button in the UI, they should be run sequentially (e.g. if date partitioned, dagster will run the earliest date first)
I have not tested it yet
and I was wondering if there's a way to specify the sorting key
e.g. when you are using dynamic partitions, is it possible to define a sorting key (or sorting date) that is different from the partition key?
@yuhan dagster angel is it possible?
s
hey Rafael - can you tell me a little more about your use-case? is this with assets or op-based jobs? are you using time partitions? do you want them to be serial to avoid too much concurrent load on your systems, or because later partitions depend on the data in earlier partitions?
r
In my case, since the order doesn't matter, I ended up using the following configuration (due to high load on my system):
```yaml
tag_concurrency_limits:
  - key: "dagster/backfill"
    limit: 1
```
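(For reference, this tag concurrency limit is applied on the Dagster instance under the queued run coordinator's config in `dagster.yaml`; a minimal sketch, assuming the default QueuedRunCoordinator:)
```yaml
# dagster.yaml (instance config), assuming the QueuedRunCoordinator is in use
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      - key: "dagster/backfill"
        limit: 1  # at most one queued backfill run executes at a time
```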
s
that's what I was going to recommend - does that address your issue, or are there remaining gaps?
r
It did solve my issue 🙂 thanks anyway
f
hey @sandy, sorry to throw in my question. For my use case, later partitions depend on the data from the previous partition
s
@Félix Tremblay got it - the previous partition of the same asset or an upstream asset? if it's an upstream asset, the asset reconciliation sensor should handle this if you use a custom partition mapping. if it's the same asset, I'm actually currently investigating what it would look like to support this
and are you using ordered partitions that aren't time-based? I'm curious- what kinds of partitions?
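(For the upstream-asset case mentioned above, a minimal sketch of a custom partition mapping, using daily time partitions purely for illustration - the asset names here are hypothetical, and this does not cover the same-asset case:)
```python
from dagster import AssetIn, DailyPartitionsDefinition, TimeWindowPartitionMapping, asset

daily = DailyPartitionsDefinition(start_date="2023-01-01")


@asset(partitions_def=daily)
def upstream_asset():
    # Placeholder: produce this partition's data.
    ...


@asset(
    partitions_def=daily,
    ins={
        "upstream_asset": AssetIn(
            # Map each partition of this asset to the *previous* day's
            # partition of upstream_asset.
            partition_mapping=TimeWindowPartitionMapping(start_offset=-1, end_offset=-1)
        )
    },
)
def downstream_asset(upstream_asset):
    # Placeholder: consume the previous partition of the upstream asset.
    ...
```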
f
it's the previous partition of the same asset
we have a bunch of database backup (.bak) files in S3 and we need to process them sequentially. we can't use time-based partitions because the number of bak files per day is variable
there's generally one backup per day, but for some days we are missing the bak files. and there's a possibility that the backup frequency gets increased to multiple times per day in the future
so I was planning to use the bak_file_name as the partition key
s
got it - there's no great builtin way to do that right now, but we're interested in supporting it in the future. you could always write a custom sensor that submits runs in the order you expect
f
ok great!
I also thought of using a custom sensor, but wasn't sure how it would work
I suppose the sensor could read from the S3 bucket, sort the list of file names, and yield a job run for each file
would I have to implement logic in the sensor to determine whether a file was already processed, or does Dagster already take care of running each job "key" exactly once?
s
if you include values for the `run_key` argument of your `RunRequest`s, Dagster will avoid launching duplicate runs for the same run key
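(A minimal sketch of the kind of custom sensor discussed above, assuming a simple one-op job and a placeholder `list_bak_files` helper standing in for real S3 listing logic:)
```python
from dagster import OpExecutionContext, RunRequest, job, op, sensor


@op(config_schema={"file_name": str})
def process_bak_file(context: OpExecutionContext):
    # Placeholder: restore/process a single .bak file.
    context.log.info(f"processing {context.op_config['file_name']}")


@job
def process_bak_file_job():
    process_bak_file()


def list_bak_files() -> list[str]:
    # Placeholder: list the .bak object keys from your S3 bucket,
    # e.g. with boto3's list_objects_v2 (paginated).
    return ["2023-01-01.bak", "2023-01-02.bak"]


@sensor(job=process_bak_file_job)
def bak_file_sensor():
    # Sort the keys so runs are requested in the order the backups
    # should be processed.
    for file_name in sorted(list_bak_files()):
        yield RunRequest(
            # Dagster will not launch a second run for a run_key it has
            # already seen, so each file is requested at most once.
            run_key=file_name,
            run_config={
                "ops": {"process_bak_file": {"config": {"file_name": file_name}}}
            },
        )
```
(Because each run key is the file name, the sensor can safely re-list the whole bucket on every tick; combined with the backfill/queue concurrency limit above, runs would also execute one at a time.)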