Steve Pletcher

03/11/2021, 2:34 PM
on a similar note, is there an established pattern for configuring resources that may have different settings to configure depending on what mode you're using?

schrockn

03/11/2021, 2:44 PM
Most common pattern is to have two different resources. One for each mode but they implement the same interface

Steve Pletcher

03/11/2021, 2:46 PM
we're following that pattern at the moment, but the distinct resources require different config settings, e.g. an external api resource requires an api url whereas the "mock" local filesystem version of that resource requires a working directory
hence the issue i described

schrockn

03/11/2021, 2:49 PM
Yeah there should be two different @resource declarations with two different config schemas
so for example, here’s how you would vary config based on mode:
from dagster import pipeline, solid, resource, ModeDefinition, ResourceDefinition, execute_pipeline


@solid(required_resource_keys={"a"})
def print_me(context):
    context.log.info(str(context.resources.a.value))


class ContainsSomething:
    def __init__(self, value):
        self.value = value


@resource({"value": int})
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])


@resource({"string_value": str})
def contains_string(context):
    return ContainsSomething(context.resource_config["string_value"])


mock_mode = ModeDefinition(
    name="mock", resource_defs={"a": contains_int, "b": ResourceDefinition.none_resource()}
)

real_mode = ModeDefinition(
    name="real", resource_defs={"a": contains_string, "b": ResourceDefinition.none_resource()}
)


@pipeline(mode_defs=[mock_mode, real_mode])
def i_need_resources_pipeline():
    print_me()


if __name__ == "__main__":
    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"value": 1}}}},
        mode="mock",
    )

    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"string_value": "foo"}}}},
        mode="real",
    )
i included the “none” resource because it is an example of a way to make it so that resource doesn’t exist at all in a particular mode
i just whipped that up so let me know what isn’t clear

Steve Pletcher

03/11/2021, 3:20 PM
right, i understand that pattern. i think i might not be explaining my request clearly - we've already set up resources with multiple different sets of config parameters. we are now running into cases where we're invoking a pipeline in a context that could be run in multiple different modes, which means we'd need to provide different attributes to the resource in the run config depending on the mode. does dagster have tooling to support this, or will we have to handle building the distinct resource configurations (as part of a pipeline execution request) ourselves?

schrockn

03/11/2021, 3:22 PM
Yeah I think we might not be understanding each other. Do you have a code sample?

Steve Pletcher

03/11/2021, 3:37 PM
here's a quick toy example:
def generate_run_request(self) -> RunRequest:
    return RunRequest(
        run_key='whatever',
        run_config={
            "solids": {
                ...
            },
            "resources": {
                "some_api": {
                    "config": {
                        # the prod version of this resource requires the 'api_url' setting,
                        # but the local/test version only requires the 'working_directory' setting
                        "api_url": '',
                    }
                }
            }
        }
    )
does dagster have a more elegant way than just a set of if/else statements to provide different options to a resource depending on what mode you're trying to use for a pipeline?
(we can build a less brittle solution ourselves if need be, i'm just making sure that dagster doesn't have any tooling for this sort of thing)
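A minimal sketch of the "handle it ourselves" option raised here, in plain Python with no Dagster imports: keep one resource config block per mode in a lookup table and pick the right one when building the run config, instead of branching with if/else. The mode names, `RESOURCE_CONFIG_BY_MODE`, `build_run_config`, and the example URL and path are hypothetical; only the `some_api`, `api_url`, and `working_directory` keys come from the thread.

```python
from copy import deepcopy

# Hypothetical per-mode config for the "some_api" resource.
# Each entry only carries the settings that mode's resource declares.
RESOURCE_CONFIG_BY_MODE = {
    "prod": {"some_api": {"config": {"api_url": "https://api.example.com"}}},
    "local": {"some_api": {"config": {"working_directory": "/tmp/some_api"}}},
}


def build_run_config(mode, solids_config=None):
    """Return a run config dict whose resources section matches the given mode."""
    if mode not in RESOURCE_CONFIG_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "solids": deepcopy(solids_config or {}),
        # Copy so callers cannot mutate the shared lookup table.
        "resources": deepcopy(RESOURCE_CONFIG_BY_MODE[mode]),
    }
```

The resulting dict could then be passed as `run_config` alongside the matching mode; adding a mode means adding one table entry rather than another branch.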

schrockn

03/11/2021, 3:42 PM
The way to support this in dagster would be to have two @resource declarations
How elegant that ends up being is kind of dependent on the underlying representation of the underlying object
otherwise you can just totally opt out of schema validation, e.g.
@resource(dict)
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])

Steve Pletcher

03/11/2021, 3:45 PM
yeah, my question is just a matter of elegance. we've already got the two resource declarations set up, i'm just thinking through how to write code that passes the correct parameters to the resource.
but it sounds like this is something we'll have to handle ourselves. thanks for the feedback.

schrockn

03/11/2021, 3:51 PM
do you have a sense of how you would want it to look in an ideal world?

Steve Pletcher

03/11/2021, 3:57 PM
hmm. the majority of this issue is dealing with boilerplate like i mentioned in my earlier post - dagster's schema validation demanding configuration for resources in a mode that aren't referenced in a pipeline is tedious. the other half is that a lot of the resource config my team has done is static at the environment level. expanding default configuration through presets (and similar tooling) would prevent this from being an issue for most of our resources, including making presets available more broadly (e.g. in sensors/run requests)
"default" config blocks are easy enough to set up manually, of course, but having them as actual dagster objects would be valuable for encouraging best practices (and having them manageable in dagit would massively improve our feedback cycle when tweaking configuration)
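A rough sketch of the manual "default config block" approach mentioned here, assuming the defaults are kept as plain nested dicts: environment-level defaults are merged with per-run overrides before the run config is submitted. `merge_config`, `ENV_DEFAULTS`, the solid name `my_solid`, and the example URL are hypothetical names for illustration, not Dagster objects.

```python
from copy import deepcopy


def merge_config(defaults, overrides):
    """Recursively merge overrides into a copy of defaults; overrides win."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge one level deeper.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical environment-level defaults for resources that rarely change.
ENV_DEFAULTS = {
    "resources": {"some_api": {"config": {"api_url": "https://api.example.com"}}}
}

# Per-run settings only need to mention what differs from the defaults.
run_config = merge_config(
    ENV_DEFAULTS, {"solids": {"my_solid": {"config": {"limit": 10}}}}
)
```

Because the merge works on copies, the shared defaults stay untouched between runs.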

schrockn

03/11/2021, 4:00 PM
so 1) you can always drop schema validation. it’s purely opt-in. 2) There is also configured for currying in configuration which does not vary at runtime. 3) And if you aren’t using a resource in the mode, you can just change the definition of the resource in that mode to require no configuration

Steve Pletcher

03/11/2021, 4:02 PM
the schema validation is definitely valuable to us, though, so i'd prefer not to drop it just over some friction like this. i'll definitely look into configured. i'm not sure i understand your third point, though. are we misusing modes (i.e. should modes be defined at a per-pipeline level instead of just one mode per environment, with every pipeline using the same set of modes?)

schrockn

03/11/2021, 4:02 PM
ahhhhhh

Steve Pletcher

03/11/2021, 4:02 PM
...now that i say that out loud, that pattern makes much more sense

schrockn

03/11/2021, 4:02 PM
there we go
yes modes will generally be per-pipeline
there are cases where reusing them makes sense
some folks have pipelines which have very similar structures (maybe they only have one node that is different) and in that case cross-pipeline mode reuse makes sense
but that doesn’t sound like it is the case here

Steve Pletcher

03/11/2021, 4:05 PM
yeah, that makes sense to me. thanks for helping me clear that up. i'm a bit embarrassed.

schrockn

03/11/2021, 4:05 PM
please don’t be
we are still figuring out the right way to message how these concepts inter-relate and it’s not always obvious
@Steve Pletcher this has actually started an internal discussion thread because we’re not satisfied with the answers we provided here
the feedback is invaluable so keep it coming
🙏 1
@Steve Pletcher just to follow up here. If the behavior was that a mode only boots up (and requires config from) the resources required by all the solids in a particular pipeline, that would solve your problem nicely, correct?

Steve Pletcher

03/11/2021, 4:34 PM
that's exactly it, yes
👍 1
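The behavior agreed on above (only asking for config from the resources a pipeline's solids actually require) can be approximated on the caller's side. The plain-Python sketch below is an illustration under assumptions, not how Dagster implements anything: `required_resource_keys`, `prune_resource_config`, and the solid-to-keys mapping are hypothetical helpers and data.

```python
def required_resource_keys(solids):
    """Union of the resource keys requested by every solid in a pipeline."""
    keys = set()
    for solid_keys in solids.values():
        keys |= set(solid_keys)
    return keys


def prune_resource_config(resource_config, solids):
    """Keep only the config entries for resources some solid requires."""
    needed = required_resource_keys(solids)
    return {key: cfg for key, cfg in resource_config.items() if key in needed}


# Hypothetical pipeline description: solid name -> resource keys it requires.
solids = {"print_me": {"a"}}

# Config for every resource in the mode, including one this pipeline never uses.
all_resources = {
    "a": {"config": {"value": 1}},
    "b": {"config": {"api_url": ""}},
}

pruned = prune_resource_config(all_resources, solids)
```

Here `pruned` keeps the entry for `a` and drops `b`, since no solid in the hypothetical pipeline asks for it.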

schrockn

03/11/2021, 4:34 PM
cool. no promises but stay tuned 🙂