dagster

Hello, I _think_ I have a use case for partition sets, but I want to make sure I'm grokking your design intent for this feature.

We currently run a bunch of simple jobs in a different tool that perform a daily truncate / reload of data from Salesforce objects into their respective Snowflake tables. After reading your docs, I'm thinking I could adapt partition sets to this problem by writing a single pipeline to handle the truncate / reload functionality, and a partition function that returns the list of Salesforce objects to load.

Does this sound like a valid way to use partition sets, or is there a better way I'm missing?

Hi <@U01N2QTJ3PF> - that does sound like a good use for partition sets
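
A minimal sketch of the idea described above, written in plain Python so it runs without Dagster installed. The object names, the `truncate_and_reload` solid name, and the `object_name` / `target_table` config keys are all hypothetical. In Dagster, these two functions would play the roles of the partition function and the per-partition run config function; how they are wired into a partition set depends on the Dagster version, so that part is left out.

```python
from typing import Dict, List

# Hypothetical list of Salesforce objects; each one becomes a partition.
SALESFORCE_OBJECTS: List[str] = ["Account", "Contact", "Opportunity", "Lead"]


def get_salesforce_partitions() -> List[str]:
    """Partition function: return one partition name per Salesforce object."""
    return list(SALESFORCE_OBJECTS)


def run_config_for_partition(object_name: str) -> Dict:
    """Build the run config for a single partition (assumed config layout)."""
    return {
        "solids": {
            "truncate_and_reload": {  # hypothetical solid name
                "config": {
                    "object_name": object_name,
                    # Assumed naming rule: Snowflake table = upper-cased object name.
                    "target_table": object_name.upper(),
                }
            }
        }
    }


# One run config per partition, all driving the same single pipeline.
configs = {name: run_config_for_partition(name) for name in get_salesforce_partitions()}
```

Adding a new Salesforce object then only means adding a name to the list; the pipeline itself stays unchanged.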

image.png

How would the range argument work in this example? Alphabetically?

it would be based on the order in which the partitions appear in the list that your partition function returns
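
To illustrate that answer, here is a rough plain-Python sketch of a range selected by list position rather than alphabetically. This is not Dagster's implementation; `select_range` is a hypothetical helper, and treating both ends of the range as inclusive is an assumption.

```python
from typing import List


def select_range(partitions: List[str], start: str, end: str) -> List[str]:
    """Return the partitions from start to end by list position (inclusive, by assumption)."""
    i, j = partitions.index(start), partitions.index(end)
    if i > j:
        raise ValueError(f"{start!r} comes after {end!r} in the partition list")
    return partitions[i : j + 1]


# Order as returned by a hypothetical partition function (deliberately not alphabetical).
partitions = ["Opportunity", "Account", "Lead", "Contact"]

# "Contact" sorts between "Account" and "Lead" alphabetically, but it is not
# selected here because it sits after "Lead" in the list.
selected = select_range(partitions, "Account", "Lead")
```

If a predictable range matters, the partition function can return the names in a fixed order, for example sorted.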

<@U011CET83FG> - Do you think the docs should be updated if partitions are more of a general-purpose mechanism for managing any list that needs to be processed? I think they currently speak specifically to time-based partitions and backfills

<@U011CET83FG> with the release of DynamicOutput would that be the preferred mechanism now?

<@U01N81Y6FLM> that is good feedback about the docs - I will think about how to incorporate that

regarding dynamic orchestration, that's a good question. I think it comes down to how tightly coupled you want the runs for the different Salesforce objects to be. if you will typically want to load all the Salesforce objects at the same time, then DynamicOutputs would work well. if you want to be able to easily kick off runs for separate objects individually, then partition sets might make more sense