Sean Quinlan
06/30/2023, 1:25 AM
CONFIG_KEYS = ['a', 'b', ...]

@op
def load_config(config_key: str):
    # use a resource to load the config by key
    return {}

@asset(partitions_def=StaticPartitionsDefinition(CONFIG_KEYS))
def do_something(context):
    config = load_config(context.partition_key)
    # run the rest of the graph
    pass
I wasn't entirely happy with this. I would like to leverage the Dagster Config abstraction. PartitionedConfig sounded like just the ticket on paper, but I realize it's actually much more verbose for the same end result:
@static_partitioned_config(partition_keys=CONFIG_KEYS)
def my_config(partition_key: str):
    # use a resource to load the config by key
    config = {}
    return {
        "ops": {
            "load_config": {
                "config": config
            }
        }
    }

@op
def load_config(context, config: my_config):
    return config

@job
def do_something(context):
    config = load_config()
    # run the rest of the graph
    pass
Why do I have to map to ops in my_config? Worse, I can't actually figure out how to provide the schema for my config with this setup. It seems very vaguely covered in the documentation. I must be totally misunderstanding this feature.
sean
06/30/2023, 3:45 PM
PartitionedConfig is not very well-documented.
I think the main point of confusion here is between run config and op/asset config.
Run config is the full set of configuration bindings for a run. It includes config for logging, execution, and more. It also includes the config for each op/asset under the top-level “ops” key. PartitionedConfig is for generating run config from a partition key. That is why when you use PartitionedConfig, you need to specify “ops”.
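To make the distinction concrete, here is a minimal plain-Python sketch of the two shapes. It does not import Dagster; the op name `do_something` and the field `some_param` are illustrative assumptions, and the helper names `op_config_for` and `run_config_for` are hypothetical.

```python
from typing import Any, Dict


def op_config_for(partition_key: str) -> Dict[str, Any]:
    """Op/asset config: only the fields a single op or asset reads."""
    # "some_param" is a hypothetical field name used for illustration.
    return {"some_param": f"hello {partition_key}"}


def run_config_for(partition_key: str) -> Dict[str, Any]:
    """Run config: bindings for the whole run, with op config nested under "ops"."""
    # Other top-level sections (logging, execution, ...) would sit next to "ops".
    return {
        "ops": {
            "do_something": {"config": op_config_for(partition_key)},
        },
    }


print(run_config_for("a"))
# {'ops': {'do_something': {'config': {'some_param': 'hello a'}}}}
```

A function decorated for PartitionedConfig is expected to return the second shape, which is why the "ops" nesting shows up there.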
> Worse, I can’t actually figure out how to provide the schema for my config with this setup.
Here’s a minimal example of how to use PartitionedConfig to configure a simple job with a single asset:
from dagster import (
    Config,
    Definitions,
    StaticPartitionsDefinition,
    asset,
    define_asset_job,
    static_partitioned_config,
)

PARTITION_KEYS = ["a", "b"]

@static_partitioned_config(partition_keys=PARTITION_KEYS)
def my_config(partition_key: str):
    return {"ops": {"do_something": {"config": {"some_param": f"hello {partition_key}"}}}}

class DoSomethingConfig(Config):
    some_param: str

@asset(partitions_def=StaticPartitionsDefinition(PARTITION_KEYS))
def do_something(context, config: DoSomethingConfig):
    return config

my_job = define_asset_job("my_job", [do_something], config=my_config)

defs = Definitions(assets=[do_something], jobs=[my_job])
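If the "ops" nesting in `my_config` above feels repetitive, one possible workaround is to factor it into a small wrapper. The sketch below is plain Python and is not a Dagster API: `scope_to_op` and `do_something_config` are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict

ConfigFn = Callable[[str], Dict[str, Any]]


def scope_to_op(op_name: str, op_config_fn: ConfigFn) -> ConfigFn:
    """Hypothetical helper: lift an op-scoped config function into a run-config function."""

    def run_config_fn(partition_key: str) -> Dict[str, Any]:
        # Nest the op-scoped config under the top-level "ops" key.
        return {"ops": {op_name: {"config": op_config_fn(partition_key)}}}

    return run_config_fn


def do_something_config(partition_key: str) -> Dict[str, Any]:
    # Only the fields the asset's config schema declares.
    return {"some_param": f"hello {partition_key}"}


my_run_config_fn = scope_to_op("do_something", do_something_config)
print(my_run_config_fn("b"))
# {'ops': {'do_something': {'config': {'some_param': 'hello b'}}}}
```

The returned function produces the same dictionary as the body of `my_config`, so it should be usable wherever that function body is; wiring it into `static_partitioned_config` is left untested here.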
Sean Quinlan
06/30/2023, 6:39 PM
sean
06/30/2023, 6:50 PM
PartitionedConfig is probably due to history, as there was a time when all executions were initiated through a job, which required specifying the entire run config at once.
The introduction of assets changed that and made more granular executions (like materialization of one asset) more of a first-class use case.
cc @sandy, has the idea of an asset/op-scoped PartitionedConfig been considered?
Sean Quinlan
06/30/2023, 8:24 PM