Hey guys! What's the best way to integrate dagster ConfigMappings with dagster.config_from_files? Essentially, what I want is to use dagster.config_from_files to read in a simplified config, which then expands into the true config for the pipeline. However, because a ConfigMapping is a function, I'm not sure of the best way to go about this. In the examples, both end up being fed into the decorator as @job(config=config).
hi @Charles Leung! I wonder if you could just skip the explicit config mapping step entirely, and instead just write a regular function to take your simplified config and transform it into the true config. then you could have
basically just writing a ConfigMapping without the decorator
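A minimal sketch of that suggestion, assuming a hypothetical op named `load_table` and hypothetical config keys `dataset` and `date` (none of these names come from the thread). It only shows the shape of a plain (simple config) -> (full config) function:

```python
from typing import Any, Dict


def expand_config(simple: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a simplified config into full run config.

    The op name `load_table` and the keys below are hypothetical.
    """
    return {
        "ops": {
            "load_table": {
                "config": {
                    "dataset": simple["dataset"],
                    "date": simple["date"],
                }
            }
        }
    }


# In a real job, `simple` would come from dagster.config_from_files([...])
# and the expanded dict would be passed as @job(config=full_config).
simple = {"dataset": "dev_sandbox", "date": "2022-01-01"}
full_config = expand_config(simple)
```

Because the function is ordinary Python, it can be unit tested on its own and reused wherever the full config is needed.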
ah that's a good idea! Part of my team likes how a config mapping exposes a limited set of config in the UI when we want to launch ad-hoc runs of the pipeline. This seems to be possible only via config mapping.
Is there a seamless way to combine the two?
hm so what is the config that you're getting from the file representing in this case? my understanding was that it was going to be baked-in to the job (and so not possible to change)
We're not sure how best to integrate so many things, since we eventually want our job to turn into a partitioned_config job 😅. The way I have it set up in test right now, the partitioned_config function calls dagster.config_from_files and overrides a key. Adding ConfigMappings into that makes things complicated -
So in our DEV environment, we want to expose these variables so users can modify them and execute against custom BigQuery datasets to their liking. We also want to provide an example date variable for the user to enter and test with. Once we deploy with k8s, we'd like to read from a file and use a partitioned job, but we definitely want to keep the configs in line with how the UI presents them and how they're being read in DEV. Am I making sense 😅?
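The test setup described here, where a partitioned config function starts from file-loaded config and overrides one key, might look roughly like the sketch below. The op name `load_table`, the `date` key, and the sample values are hypothetical. In Dagster this logic would typically be the body of a function decorated with something like `@daily_partitioned_config(start_date=...)`, which receives the partition's start and end datetimes; here it is kept as plain Python, with a literal dict standing in for the result of `config_from_files`.

```python
import copy
from datetime import datetime
from typing import Any, Dict


def config_for_partition(base: Dict[str, Any], start: datetime) -> Dict[str, Any]:
    """Return a copy of `base` with the (hypothetical) `date` key overridden."""
    run_config = copy.deepcopy(base)  # leave the file-loaded config untouched
    run_config["ops"]["load_table"]["config"]["date"] = start.strftime("%Y-%m-%d")
    return run_config


# Stand-in for the dict that dagster.config_from_files([...]) would return.
base = {"ops": {"load_table": {"config": {"dataset": "prod", "date": "1970-01-01"}}}}
partition_config = config_for_partition(base, datetime(2022, 3, 1))
```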
@Phil Armour 😉 looping you in phil
ah ok I think I basically get it -- I think what you can do is create one config mapping function that takes (simple config) -> (full config), let's call it
. For your dev job, you can have
, which will leave it up to the user to supply values for this simplified schema. For your prod job, you can do
. Once you transition this prod job to a partitioned job, you can change the job back to
, and have your partitioned config mapping function do what you're describing
@Serj Bilokhatniuk
Ah I see, so have two separate jobs based on the environment: one that exposes the configs in the UI, and a second one with a fixed mapping. Thanks owen! LMK if I understood that correctly 🙂 this stuff is complicated
ah, the
was the secret sauce I was missing to connect static config with config mappings
I threw together a gist. Personally I was imagining something like the “partial” mapping in the gist, where you expose a subset of the graph’s configs to be passed in at run time, while the rest can be specified via a known, more statically configured set of values and therefore don't need to be passed at run time. What I have here should accomplish my goal, but I was wondering if there’s a better/less onerous way to pass in partial sets of values. The CLI looks like it supports this concept:
Example: dagster job execute -f hello_world.py -j pandas_hello_world -c pandas_hello_world/solids.yaml -c pandas_hello_world/env.yaml
but if I try my gist with
provided_configs = {"test1": "provided-1"}
I get a KeyError saying that I didn’t provide the second key to the ConfigMapping, rather than it letting me proceed and provide the missing key at invocation.
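One way to think about the partial case, sketched in plain Python since the gist itself is not shown here: layer the run-time values over the statically known ones and work out which keys are still missing, rather than indexing every key unconditionally inside the mapping function. The name `test2` for the second key and the helper `merge_partial` are hypothetical. Within Dagster, a related option may be to mark such fields as optional in the config schema, for example with `Field(str, is_required=False)`.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_KEYS = ("test1", "test2")  # "test2" is a hypothetical name for the second key


def merge_partial(
    static: Dict[str, Any], provided: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Layer run-time values over static ones; return merged config and missing keys."""
    merged = {**static, **provided}  # on a key clash, the run-time value wins
    missing = [key for key in REQUIRED_KEYS if key not in merged]
    return merged, missing


provided_configs = {"test1": "provided-1"}

# Nothing static yet: the second key is reported as missing instead of raising.
merged, missing = merge_partial({}, provided_configs)

# With a static value for the second key, the config is complete.
merged_full, missing_full = merge_partial({"test2": "static-2"}, provided_configs)
```

The `missing` list makes it explicit which values would still have to be supplied at invocation.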