Huib Keemink

02/08/2022, 10:57 PM
Is there a way to combine different partition strategies? For instance, having a static partition together with a date partition? My use case would be to kick off a dump from a database into a data lake. The tables can then be one dimension to partition on. This would allow for a super clean job definition, an easy way to kick off specific runs through the interface, and a nice way of separating runs (if one table fails, keep loading the other tables).
I’ve currently solved this by creating jobs (one for each table) from a graph. This works fine, but kicking off a backfill / enabling the schedule is a bit of work, not to mention the clutter in dagit.


02/09/2022, 12:57 AM
We unfortunately don’t have a great solution here… I think people have defined a dynamic partition config to generate the flattened set of partitions (time_window x static set), but this is also clunky, and we don’t have great backfill / scheduling tools for selecting these structured partitions for execution.
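The flattened set of partitions mentioned above is just the cross product of the static dimension (here, tables) and the date range. A minimal sketch of generating those keys in plain Python, which could then back a dynamic partition config; the table names and the `"table|date"` key format are illustrative assumptions, not Dagster conventions:

```python
from datetime import date, timedelta

def flattened_partition_keys(tables, start, end):
    """Cross product of a static set (tables) and a daily date range.

    Each key pairs one table with one day, e.g. "users|2022-02-01",
    so a single partitioned job can cover every (table, day) combination.
    """
    keys = []
    day = start
    while day <= end:
        for table in tables:
            keys.append(f"{table}|{day.isoformat()}")
        day += timedelta(days=1)
    return keys

# Hypothetical tables for a database -> data lake dump
keys = flattened_partition_keys(
    ["users", "orders"], date(2022, 2, 1), date(2022, 2, 2)
)
```

The downside the reply points out still applies: backfill and scheduling tooling sees one flat list of keys, not the two-dimensional structure behind it.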

Huib Keemink

02/09/2022, 8:00 AM
Got it. Is there a way for Dagster to signal that a failed op should fail the job without cancelling the other ops that are not downstream of it?
Imagine having parallel streams (query from source, write to destination, clean up) that are in the same job but are completely independent. One failed stream can fail the job, but I’d like the other streams to still run. Is this possible in some way?
The reason for asking is that this would essentially fix my use case. I could then have a single job with a single schedule and a single backfill, with insight into where things go wrong, without aborting everything as soon as one ingestion step fails.
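The behavior asked for here can be sketched outside any orchestrator: run each independent stream, record failures instead of aborting, and raise a single error at the end so the run as a whole is still marked failed. A minimal sketch with hypothetical stream names; this is a plain-Python illustration of the pattern, not a Dagster API:

```python
def run_streams(streams):
    """Run independent streams; one failure does not stop the others.

    `streams` maps a name to a zero-argument callable. Every stream is
    attempted; exceptions are collected per stream, and a single error
    is raised at the end so the overall run is reported as failed.
    """
    failures = {}
    for name, fn in streams.items():
        try:
            fn()
        except Exception as exc:  # record and move on to the next stream
            failures[name] = exc
    if failures:
        raise RuntimeError(f"{len(failures)} stream(s) failed: {sorted(failures)}")

# Hypothetical streams: "orders" fails, "users" still runs to completion
done = []

def load_users():
    done.append("users")

def load_orders():
    raise ValueError("ingestion failed")

try:
    run_streams({"users": load_users, "orders": load_orders})
except RuntimeError as err:
    outcome = str(err)
```

Inside a single Dagster job, ops in separate branches have no dependency edges between them, so the analogous question is whether a failure in one branch is allowed to leave the sibling branches running while the run itself ends up failed.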