I'm having a lot of trouble with partitioned config it seems dagster #dagster-feedback

Sean Quinlan

06/30/2023, 1:25 AM

I'm having a lot of trouble with partitioned config. It seems like it should be perfect for my use case on paper: I generate the same type of asset for a number of configurations, where the configurations can be enumerated with a string. But I keep getting lost using the feature. So I started with something like this:

CONFIG_KEYS = ['a', 'b', ...] 

@op
def load_config(config_key: str):
    # use a resource to load the config by key
    return {}

@asset(partitions_def=StaticPartitionsDefinition(CONFIG_KEYS))
def do_something(context):
    config = load_config(context.partition_key)
    # run the rest of the graph
    pass

I wasn't entirely happy with this. I would like to leverage the Dagster Config abstraction. PartitionedConfig sounded like just the ticket on paper, but I realize it's actually much more verbose for the same end result:

@static_partitioned_config(partition_keys=CONFIG_KEYS)
def my_config(partition_key: str):
    # use a resource to load the config by key
    config = {}
    return { 
        "ops": { 
            "load_config": { 
                "config": config
            }
        }
    }

@op
def load_config(context, config: my_config):
    return config

@job
def do_something(context):
    config = load_config()
    # run the rest of the graph
    pass

Why do I have to map to ops in my_config? Worse, I can't actually figure out how to provide the schema for my config with this setup. It seems very vaguely covered in the documentation. I must be totally misunderstanding this feature.

sean

06/30/2023, 3:45 PM

Hi Sean, I agree PartitionedConfig is not very well-documented. I think the main point of confusion here is between run config and op/asset config. Run config is the full set of configuration bindings for a run. It includes config for logging, execution, and more. It also includes the config for each op/asset under the top-level “ops” key. PartitionedConfig is for generating run config from a partition key. That is why when you use PartitionedConfig, you need to specify “ops”.

> Worse, I can’t actually figure out how to provide the schema for my config with this setup.

Here’s a minimal example of how to use PartitionedConfig to configure a simple job with a single asset:

from dagster import (
    Config,
    Definitions,
    StaticPartitionsDefinition,
    asset,
    define_asset_job,
    static_partitioned_config,
)

PARTITION_KEYS = ["a", "b"]


@static_partitioned_config(partition_keys=PARTITION_KEYS)
def my_config(partition_key: str):
    return {"ops": {"do_something": {"config": {"some_param": f"hello {partition_key}"}}}}


class DoSomethingConfig(Config):
    some_param: str


@asset(partitions_def=StaticPartitionsDefinition(PARTITION_KEYS))
def do_something(context, config: DoSomethingConfig):
    return config


my_job = define_asset_job("my_job", [do_something], config=my_config)

defs = Definitions(assets=[do_something], jobs=[my_job])
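To make the run config versus op/asset config distinction concrete, here is a plain-Python sketch that deliberately does not import Dagster. It only mirrors the dictionary shape returned by my_config above. The helper names run_config_for_partition and op_config_from_run_config are hypothetical and exist purely for illustration.

```python
from typing import Any, Dict

PARTITION_KEYS = ["a", "b"]


def run_config_for_partition(partition_key: str) -> Dict[str, Any]:
    """Build a full run config dict for one partition key.

    Hypothetical helper: same shape as the dict returned by my_config above.
    """
    return {
        "ops": {
            "do_something": {
                "config": {"some_param": f"hello {partition_key}"}
            }
        }
    }


def op_config_from_run_config(run_config: Dict[str, Any], op_name: str) -> Dict[str, Any]:
    """Pull out the slice of the run config that belongs to a single op/asset."""
    return run_config["ops"][op_name]["config"]


# The run config is the whole document; the op config is one nested entry in it.
for key in PARTITION_KEYS:
    run_config = run_config_for_partition(key)
    print(key, op_config_from_run_config(run_config, "do_something"))
```

The nested entry under "ops" is the part that a Config class such as DoSomethingConfig describes. Everything outside it belongs to the run as a whole.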

Sean Quinlan

06/30/2023, 6:39 PM

Thanks for the response @sean. I feel like this isn't something I'd actually ever use. Seems like such a niche abstraction to centralize a mapping of all op configs to partition key. Personally I feel like the individual map of an op config to an op by partition key is useful and far more intuitive. Really appreciate the support though, cleared up my understanding.

sean

06/30/2023, 6:50 PM

That’s fair-- Dagster’s config system has been getting overhauled recently and could probably benefit from some love here. The “centralized” architecture of PartitionedConfig is probably due to history, as there was a time when all executions were initiated through a job, which required specifying the entire run config at once. The introduction of assets changed that and made more granular executions (like materialization of one asset) more of a first-class use case. cc @sandy, has the idea of an asset/op-scoped PartitionedConfig been considered?
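One way to approximate an op-scoped mapping today is to write one small function per op and compose them into the centralized run config shape. The sketch below is plain Python with no Dagster import. The names OpConfigFn, do_something_config, compose_run_config, and OP_CONFIG_FNS are all hypothetical, and this is not an existing Dagster feature.

```python
from typing import Any, Callable, Dict, Mapping

# Hypothetical alias: a function mapping a partition key to one op's config.
OpConfigFn = Callable[[str], Dict[str, Any]]


def do_something_config(partition_key: str) -> Dict[str, Any]:
    """Per-op mapping: partition key -> config for the do_something op only."""
    return {"some_param": f"hello {partition_key}"}


def compose_run_config(
    partition_key: str, op_config_fns: Mapping[str, OpConfigFn]
) -> Dict[str, Any]:
    """Assemble per-op config functions into the run config shape with a top-level "ops" key."""
    return {
        "ops": {
            op_name: {"config": fn(partition_key)}
            for op_name, fn in op_config_fns.items()
        }
    }


# Registry of op name -> per-op config function (illustrative only).
OP_CONFIG_FNS: Dict[str, OpConfigFn] = {"do_something": do_something_config}

run_config = compose_run_config("a", OP_CONFIG_FNS)
print(run_config)
```

Under these assumptions, the body of a function decorated with static_partitioned_config could simply return compose_run_config(partition_key, OP_CONFIG_FNS), so each op's mapping stays next to the op it configures.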

Sean Quinlan

06/30/2023, 8:24 PM

Now I'm reading up on IOManagers and got to "Providing per-input config to an input manager", and I'm having deja vu. I think maybe the high-level feedback is that it feels like there are 8 ways to do everything, and I'm left feeling overwhelmed with options and not knowing how my particular need matches an option.
