Howdy Dagster friends I m trying to wire up a `dynamic parti dagster #ask-community

trevenrawr

03/29/2022, 11:34 PM

Howdy, Dagster friends! I’m trying to wire up a `dynamic_partitioned_config` and am curious if there’s a generally accepted way of passing richer information to the partition config generation function than a `partition_key` can contain on its own. The old `Partition` class was a nice wrapper that allowed for a key or display name to be paired with rich information that could be used to configure a partitioned run, but it seems like the new paradigm makes the key and the information one and the same. (Oh, it looks like `Partition` is still in use internally but only the name is returned; any particular reason not to open that back up?)

trevenrawr

03/29/2022, 11:48 PM

Oh, looks like I could build my own `PartitionedConfig` since `DynamicPartitionsDefinition` seems to allow for a `partition_fn` that returns `Partition`s instead of strings… Giving that a go!

prha

03/29/2022, 11:58 PM

Yep! That should work! Curious, because I’m actively looking at the partitions/backfill API. What types of objects are you using for your partitions? Why is it so much better to use custom objects rather than string keys?

prha

03/30/2022, 12:00 AM

(Asking because these affect the APIs that we use between dagster framework code and user code - the custom objects typically cannot pass across a process boundary and so are more constrained)

trevenrawr

03/30/2022, 12:07 AM

My data is partitioned across tenants, not dates, and configuring a job for a given tenant is not as simple as passing the tenant name along; I need at least 4 bits of information to properly configure a run. I thought about shoving them all into the partition key (and parsing it on the other end), but then partition key space seems like it would explode over time and I’d generally like to keep runs for tenants grouped in the same partition (just tenant name) to make it easy to request backfills, etc.

trevenrawr

03/30/2022, 12:07 AM

The objects are basically just dictionaries. (Back to the original question you asked, haha.)
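
For comparison, here is a minimal sketch of the "shove them all into the partition key and parse it on the other end" alternative mentioned above. It is illustrative only: the `TenantConfig` type, its four field names, and the `|` delimiter are assumptions, not anything from Dagster or from the thread. It shows why the key space grows: any change to a non-tenant field produces a new key for the same tenant.

```python
from typing import NamedTuple


class TenantConfig(NamedTuple):
    # Hypothetical fields; the thread only says "at least 4" pieces are needed.
    tenant: str
    env: str
    region: str
    schema: str


DELIM = "|"  # assumed delimiter; must never appear inside a field value


def encode_key(cfg: TenantConfig) -> str:
    """Pack every field into a single partition key string."""
    return DELIM.join(cfg)


def decode_key(key: str) -> TenantConfig:
    """Parse a key back into its fields, failing loudly on malformed input."""
    parts = key.split(DELIM)
    if len(parts) != len(TenantConfig._fields):
        raise ValueError(f"malformed partition key: {key!r}")
    return TenantConfig(*parts)


old = TenantConfig("acme", "prod", "us-east-1", "v1")
new = old._replace(schema="v2")  # same tenant, one field changed

# Two distinct keys now exist for the single tenant "acme".
keys = {encode_key(old), encode_key(new)}
```

Keeping the key as just the tenant name avoids this, at the cost of needing somewhere else to carry the remaining fields.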

trevenrawr

03/30/2022, 1:37 AM

Building my own wasn’t too bad! Here’s the basic gist in case anyone else wants a pattern to follow:

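# Assumed imports: these names were exported from the top-level dagster package
# around the time of this thread; newer releases may differ.
from dagster import DynamicPartitionsDefinition, Partition, PartitionedConfig

# build_job_configs() and make_run_config() are project-specific helpers not shown here.

def partition_fn(_current_time):
    # Pair each job config (the value) with a readable "<tenant>-<env>" name (the key).
    job_configs = build_job_configs()
    return [Partition(job_config, f"{job_config.tenant}-{job_config.env}") for job_config in job_configs]

def run_conf_fn(partition):
    # The full job config travels on the Partition, so nothing is parsed out of the key.
    job_config = partition.value
    return make_run_config(job_config)

# This return belongs to an enclosing function that the snippet does not show.
return PartitionedConfig(
    partitions_def=DynamicPartitionsDefinition(partition_fn),
    run_config_for_partition_fn=run_conf_fn,
    decorated_fn=run_conf_fn,
)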

I know the internal APIs aren’t really meant for external use, but it’s not entirely clear to me why the `PartitionedConfig` needs a `run_config_for_partition_fn` and also a `decorated_fn` when the same function works for both.
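
To see the shape of the pattern in the gist above without a Dagster install, here is a rough stand-in. None of it is Dagster's API: `NamedPartition`, `JobConfig`, its `bucket` field, the sample tenants, and the body of `make_run_config` are all hypothetical. It only illustrates pairing a short name with a richer value and building run config from the value rather than from the name.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class JobConfig:
    tenant: str
    env: str
    bucket: str  # hypothetical extra field a run might need


@dataclass(frozen=True)
class NamedPartition:
    """Stand-in for a (value, name) pair; NOT Dagster's Partition class."""
    value: JobConfig
    name: str


def build_job_configs() -> List[JobConfig]:
    # Hypothetical data; real code might read tenants from a database.
    return [
        JobConfig("acme", "prod", "s3://acme-prod"),
        JobConfig("globex", "dev", "s3://globex-dev"),
    ]


def partition_fn() -> List[NamedPartition]:
    # The name stays short and stable; the value carries everything else.
    return [NamedPartition(c, f"{c.tenant}-{c.env}") for c in build_job_configs()]


def make_run_config(cfg: JobConfig) -> Dict[str, Any]:
    # Hypothetical run config layout, for illustration only.
    return {"ops": {"load": {"config": {"tenant": cfg.tenant, "bucket": cfg.bucket}}}}


def run_config_for_name(name: str) -> Dict[str, Any]:
    # Resolve a partition by its name, then build config from its value.
    by_name = {p.name: p for p in partition_fn()}
    return make_run_config(by_name[name].value)
```

A caller asking for `run_config_for_name("globex-dev")` only ever handles the string name, which is the property that makes requesting backfills by tenant straightforward.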

trevenrawr

03/30/2022, 8:36 PM

Also: the ability to choose a partitioned config from a dropdown in the Launchpad is DOPE. :dagpurr:

prha

03/30/2022, 9:37 PM

Oh I see. Yeah, this might be related to multi-dimensional partitioning (https://github.com/dagster-io/dagster/issues/4591), which is a long-standing issue that we’ve had in the back of our minds for a while now.
