on a similar note, is there an established pattern...
# announcements
s
on a similar note, is there an established pattern for configuring resources that may have different settings to configure depending on what mode you're using?
s
Most common pattern is to have two different resources. One for each mode but they implement the same interface
s
we're following that pattern at the moment, but the distinct resources require different config settings, e.g. an external api resource requires an api url whereas the "mock" local filesystem version of that resource requires a working directory
hence the issue i described
s
Yeah there should be two different `@resource` declarations with two different config schemas
so for example, here’s how you would vary config based on mode:
from dagster import pipeline, solid, resource, ModeDefinition, ResourceDefinition, execute_pipeline


@solid(required_resource_keys={"a"})
def print_me(context):
    context.log.info(str(context.resources.a.value))


class ContainsSomething:
    def __init__(self, value):
        self.value = value


@resource({"value": int})
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])


@resource({"string_value": str})
def contains_string(context):
    return ContainsSomething(context.resource_config["string_value"])


mock_mode = ModeDefinition(
    name="mock", resource_defs={"a": contains_int, "b": ResourceDefinition.none_resource()}
)

real_mode = ModeDefinition(
    name="real", resource_defs={"a": contains_string, "b": ResourceDefinition.none_resource()}
)


@pipeline(mode_defs=[mock_mode, real_mode])
def i_need_resources_pipeline():
    print_me()


if __name__ == "__main__":
    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"value": 1}}}},
        mode="mock",
    )

    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"string_value": "foo"}}}},
        mode="real",
    )
i included the “none” resource because it is an example of a way to make it so that resource doesn’t exist at all in a particular mode
i just whipped that up so let me know what isn’t clear
s
right, i understand that pattern. i think i might not be explaining my request clearly - we've already set up resources with multiple different sets of config parameters. we are now running into cases where we're invoking a pipeline in a context that could be run in multiple different modes, which means we'd need to provide different attributes to the resource in the run config depending on the mode. does dagster have tooling to support this, or will we have to handle building the distinct resource configurations (as part of a pipeline execution request) ourselves?
s
Yeah I think we might not be understanding each other. Do you have a code sample?
s
here's a quick toy example:
def generate_run_request(self) -> RunRequest:
    return RunRequest(
        run_key='whatever',
        run_config={
            "solids": {
                ...
            },
            "resources": {
                "some_api": {
                    "config": {
                        # the prod version of this resource requires the 'api_url' setting,
                        # but the local/test version only requires the 'working_directory' setting
                        "api_url": '',
                    }
                }
            }
        }
    )
does dagster have a more elegant way than just a set of if/else statements to provide different options to a resource depending on what mode you're trying to use for a pipeline?
(we can build a less brittle solution ourselves if need be, i'm just making sure that dagster doesn't have any tooling for this sort of thing)
s
The way to support this in dagster would be to have two `@resource` declarations
How elegant that ends up being kind of depends on the representation of the underlying object
otherwise
you can just totally opt out of schema validation
e.g.
@resource(dict)
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])
s
yeah, my question is just a matter of elegance. we've already got the two resource declarations set up, i'm just thinking through how to write code that passes the correct parameters to the resource.
but it sounds like this is something we'll have to handle ourselves. thanks for the feedback.
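For the "handle it ourselves" route, one alternative to a chain of if/else statements is a lookup table keyed by mode name. The following is only a minimal sketch: the mode names (`prod`, `local`), the `SOME_API_CONFIG_BY_MODE` table, the `build_run_config` helper, and the placeholder URL and directory are all hypothetical and not part of Dagster. Only the `some_api` resource key and the `api_url` / `working_directory` settings come from the thread.

```python
from typing import Any, Dict

# Hypothetical per-mode settings. Each entry mirrors the config schema of the
# resource bound to "some_api" in that mode. The values are placeholders.
SOME_API_CONFIG_BY_MODE: Dict[str, Dict[str, Any]] = {
    "prod": {"api_url": "https://api.example.com"},
    "local": {"working_directory": "/tmp/mock_api"},
}


def build_run_config(mode: str) -> Dict[str, Any]:
    """Return the run_config fragment for the given mode (hypothetical helper)."""
    try:
        resource_config = SOME_API_CONFIG_BY_MODE[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode!r}") from None
    # Copy the inner dict so callers cannot mutate the shared table.
    return {"resources": {"some_api": {"config": dict(resource_config)}}}
```

The returned dict could then be passed as `run_config` (merged with any solid config) wherever the run request is built, so adding a mode means adding one table entry rather than another branch.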
s
do you have a sense of how you would want it to look in an ideal world?
s
hmm. the majority of this issue is dealing with boilerplate like i mentioned in my earlier post - dagster's schema validation demanding configuration for resources in a mode that aren't referenced in a pipeline is tedious. the other half is that a lot of the resource config my team has done is static at the environment level. expanding default configuration through presets (and similar tooling) would prevent this from being an issue for most of our resources, including making presets available more broadly (e.g. in sensors/run requests)
"default" config blocks are easy enough to set up manually, of course, but having them as actual dagster objects would be valuable for encouraging best practices (and having them manageable in dagit would massively improve our feedback cycle when tweaking configuration)
s
so 1) you can always drop schema validation. it’s purely opt-in. 2) There is also `configured` for currying in configuration which does not vary at runtime. 3) And if you aren’t using a resource in a mode, you can just change the definition of the resource in that mode to require no configuration
s
the schema validation is definitely valuable to us, though, so i'd prefer not to drop it just over some friction like this. i'll definitely look into `configured`. i'm not sure i understand your third point, though. are we misusing modes (i.e. should modes be defined at a per-pipeline level instead of just one mode per environment, with every pipeline using the same set of modes?)
s
ahhhhhh
s
...now that i say that out loud, that pattern makes much more sense
s
there we go
yes modes will generally be per-pipeline
there are cases where reusing them makes sense
some folks have pipelines which have very similar structures (maybe they only have one node that is different) and in that case cross-pipeline mode reuse makes sense
but that doesn’t sound like it is the case here
s
yeah, that makes sense to me. thanks for helping me clear that up. i'm a bit embarrassed.
s
please don’t be
we are still figuring out the right way to message how these concepts inter-relate and it’s not always obvious
@Steve Pletcher this has actually started an internal discussion thread because we’re not satisfied with the answers we provided here
the feedback is invaluable so keep it coming
@Steve Pletcher just to follow up here. If the behavior was that a mode only boots up (and requires config from) the resources required by all the solids in a particular pipeline, that would solve your problem nicely, correct?
s
that's exactly it, yes
s
cool. no promises but stay tuned 🙂