# ask-community
c
Hey guys! What's the best way to integrate Dagster's `ConfigMapping` with `dagster.config_from_files`? Essentially what I want is to use `dagster.config_from_files` to read in a simplified config, which then expands into the true config for the pipeline. However, because a `ConfigMapping` wraps a function, I'm not sure of the best way to go about this. In the examples, both end up feeding into the `@job(config=config)` decorator.
o
Hi @Charles Leung! I wonder if you could skip the explicit config mapping step entirely, and instead just write a regular function that takes your simplified config and transforms it into the true config. Then you could have
@job(config=my_config_transforming_function(my_config_from_files))
basically just writing a ConfigMapping without the decorator
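A plain transforming function like the one suggested above might look like this sketch. The simplified keys (`dataset`, `date`) and the op name `load_table` are hypothetical examples, not anything Dagster mandates:

```python
# Sketch of a plain config-transforming function (no ConfigMapping),
# per the suggestion above. Key names are hypothetical.
def expand_config(simple: dict) -> dict:
    """Expand a simplified config into the full run config shape."""
    return {
        "ops": {
            "load_table": {
                "config": {
                    "dataset": simple["dataset"],
                    # fall back to a hypothetical default date if not given
                    "date": simple.get("date", "2023-01-01"),
                }
            }
        }
    }

full = expand_config({"dataset": "dev_sandbox"})
```

In real code you would then pass the result of `dagster.config_from_files([...])` through this function before handing it to `@job(config=...)`.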
c
Ah, that's a good idea! Part of my team likes how it exposes a limited config set in the UI when we want to run ad-hoc runs on the pipeline. That seems only possible via config mapping.
Is there a seamless way to combine the two?
o
Hm, so what is the config that you're getting from the file representing in this case? My understanding was that it was going to be baked into the job (and so not possible to change).
c
We're not sure how best to integrate so many things, since we eventually want our job to become a partitioned job 😅. The way I have it set up in test right now is that the partitioned config function calls `dagster.config_from_files` and overrides a key. Adding `ConfigMapping` on top of that makes things complicated.
So in our DEV environment, we want to expose these variables so users can modify and execute against custom BigQuery datasets to their liking. We also want to provide an example date variable for users to enter and test with. Once we deploy with k8s, we'd like to read from a file and use a partitioned job, but we definitely want to keep the configs in line with how they're read in and shown in the UI in DEV. Am I making sense 😅?
@Phil Armour 😉 looping you in phil
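The pattern described above, a partitioned config function that loads a base config from files and overrides one key per partition, can be sketched roughly like this. The file contents are simulated inline (in real code they'd come from `dagster.config_from_files`), and the key names are hypothetical:

```python
import copy

# Stand-in for the result of dagster.config_from_files([...]);
# in practice this would be loaded from YAML. Keys are hypothetical.
BASE_CONFIG = {
    "ops": {"load_table": {"config": {"dataset": "prod", "date": None}}}
}

def partitioned_run_config(partition_key: str) -> dict:
    """Copy the file-based base config and override the date key
    with this partition's date, as described above."""
    cfg = copy.deepcopy(BASE_CONFIG)
    cfg["ops"]["load_table"]["config"]["date"] = partition_key
    return cfg

cfg = partitioned_run_config("2023-06-01")
```

The deep copy matters: mutating the loaded base config in place would leak one partition's override into the next run.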
o
Ah OK, I think I basically get it -- I think what you can do is create one config mapping function that takes `(simple config) -> (full config)`; let's call it `base_config_mapping`. For your dev job, you can have `@job(config=base_config_mapping)`, which will leave it up to the user to supply values for this simplified schema. For your prod job, you can do `@job(config=base_config_mapping.config_fn(config_from_files))`. Once you transition this prod job to a partitioned job, you can change the job back to `@job(config=base_config_mapping)`, and have your partitioned config mapping function do what you're describing.
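The dev/prod split above can be sketched in plain Python. Dagster imports are omitted so the sketch stays self-contained; in real code `base_config_fn` would be the function wrapped by a `dagster.ConfigMapping` and accessed via its `.config_fn` attribute. Key names are hypothetical:

```python
# Stand-in for the ConfigMapping's config_fn: expands the simplified
# user-facing schema into the full run config. Keys are hypothetical.
def base_config_fn(simple: dict) -> dict:
    return {
        "ops": {"load_table": {"config": {"dataset": simple["dataset"]}}}
    }

# Dev: pass the mapping itself to @job(config=...), so the UI shows
# only the simplified schema and the user fills it in at launch time.

# Prod: bake the file-based config in by applying the mapping's
# function up front, i.e. @job(config=base_config_mapping.config_fn(file_config)).
file_config = {"dataset": "prod_dataset"}  # stand-in for config_from_files([...])
prod_config = base_config_fn(file_config)
```

The point of sharing one `config_fn` is that dev users and the prod deployment both go through the same expansion logic, so the two environments can't drift apart.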
c
@Serj Bilokhatniuk
Ah, I see, so have two separate jobs based on the environment: one with a UI interface for the configs, and a second with a fixed mapping. Thanks Owen! LMK if I understood that correctly 🙂 this stuff is complicated
s
Ah, the `ConfigMapping.config_fn()` was the secret sauce I was missing to connect static config with config mappings
p
I threw together a gist. Personally, I was imagining something like the "partial" mapping in the gist, where you expose a subset of the graph's configs to be passed in at run time, while the rest can be specified via a known, more statically configured set of values and therefore isn't required at run time. What I have here should accomplish my goal, but I was wondering if there's a better, less onerous way to pass in partial sets of values. From the CLI, it looks like it supports this concept:
Example: dagster job execute -f hello_world.py -j pandas_hello_world -c pandas_hello_world/solids.yaml -c pandas_hello_world/env.yaml
but if I try my gist with `provided_configs = {"test1": "provided-1"}`, I get a KeyError that I didn't provide the second key to the `ConfigMapping`, rather than it allowing me to proceed and provide the missing key at invocation.
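One way to sidestep that KeyError is to merge the provided partial values over the statically configured defaults before the mapping function ever sees them, so the mapping always receives a complete set. A sketch, reusing the hypothetical `test1`/`test2` keys from the message above:

```python
# Statically configured defaults for the keys the caller may omit.
# The keys here mirror the hypothetical example above.
STATIC_DEFAULTS = {"test1": "default-1", "test2": "default-2"}

def with_defaults(provided: dict) -> dict:
    """Fill any keys missing from the provided partial config with
    static defaults, so a downstream config mapping sees a full set
    instead of raising a KeyError."""
    return {**STATIC_DEFAULTS, **provided}

merged = with_defaults({"test1": "provided-1"})
```

Provided values win over defaults because later entries in the `{**a, **b}` merge take precedence; whether this belongs inside the mapping function or just before it is a design choice.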