on a similar note is there an established pattern for config dagster #announcements

Steve Pletcher

03/11/2021, 2:34 PM

on a similar note, is there an established pattern for configuring resources that may have different settings to configure depending on what mode you're using?

schrockn

03/11/2021, 2:44 PM

Most common pattern is to have two different resources. One for each mode but they implement the same interface

Steve Pletcher

03/11/2021, 2:46 PM

we're following that pattern at the moment, but the distinct resources require different config settings, e.g. an external api resource requires an api url whereas the "mock" local filesystem version of that resource requires a working directory

Steve Pletcher

03/11/2021, 2:46 PM

hence the issue i described

schrockn

03/11/2021, 2:49 PM

Yeah, there should be two different @resource declarations with two different config schemas

schrockn

03/11/2021, 2:57 PM

so for example, here’s how you would vary config based on mode:

from dagster import pipeline, solid, resource, ModeDefinition, ResourceDefinition, execute_pipeline


@solid(required_resource_keys={"a"})
def print_me(context):
    context.log.info(str(context.resources.a.value))


class ContainsSomething:
    def __init__(self, value):
        self.value = value


@resource({"value": int})
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])


@resource({"string_value": str})
def contains_string(context):
    return ContainsSomething(context.resource_config["string_value"])


mock_mode = ModeDefinition(
    name="mock", resource_defs={"a": contains_int, "b": ResourceDefinition.none_resource()}
)

real_mode = ModeDefinition(
    name="real", resource_defs={"a": contains_string, "b": ResourceDefinition.none_resource()}
)


@pipeline(mode_defs=[mock_mode, real_mode])
def i_need_resources_pipeline():
    print_me()


if __name__ == "__main__":
    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"value": 1}}}},
        mode="mock",
    )

    execute_pipeline(
        i_need_resources_pipeline,
        run_config={"resources": {"a": {"config": {"string_value": "foo"}}}},
        mode="real",
    )

schrockn

03/11/2021, 2:58 PM

i included the “none” resource because it is an example of a way to make it so that resource doesn’t exist at all in a particular mode

schrockn

03/11/2021, 3:06 PM

i just whipped that up so let me know what isn’t clear

Steve Pletcher

03/11/2021, 3:20 PM

right, i understand that pattern. i think i might not be explaining my request clearly - we've already set up resources with multiple different sets of config parameters. we are now running into cases where we're invoking a pipeline in a context that could be run in multiple different modes, which means we'd need to provide different attributes to the resource in the run config depending on the mode. does dagster have tooling to support this, or will we have to handle building the distinct resource configurations (as part of a pipeline execution request) ourselves?

schrockn

03/11/2021, 3:22 PM

Yeah I think we might not be understanding each other. Do you have a code sample?

Steve Pletcher

03/11/2021, 3:37 PM

here's a quick toy example:

def generate_run_request(self) -> RunRequest:
    return RunRequest(
        run_key='whatever',
        run_config={
            "solids": {
                ...
            },
            "resources": {
                "some_api": {
                    "config": {
                        # the prod version of this resource requires the 'api_url' setting,
                        # but the local/test version only requires the 'working_directory' setting
                        "api_url": '',
                    }
                }
            }
        }
    )

Steve Pletcher

03/11/2021, 3:38 PM

does dagster have a more elegant way than just a set of if/else statements to provide different options to a resource depending on what mode you're trying to use for a pipeline?

Steve Pletcher

03/11/2021, 3:41 PM

(we can build a less brittle solution ourselves if need be, i'm just making sure that dagster doesn't have any tooling for this sort of thing)
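One way to avoid scattered if/else statements is to keep the per-mode resource config in a single lookup table and build the run config from it. This is a minimal sketch in plain Python, not a Dagster feature; the mode names, the `RESOURCE_CONFIG_BY_MODE` table, the `build_run_config` helper, and the example URL and path are all hypothetical.

```python
from copy import deepcopy

# Hypothetical per-mode config for the "some_api" resource. Each mode's
# resource definition has its own schema, so each entry uses different keys.
RESOURCE_CONFIG_BY_MODE = {
    "prod": {"some_api": {"config": {"api_url": "https://api.example.com"}}},
    "local": {"some_api": {"config": {"working_directory": "/tmp/some_api"}}},
}


def build_run_config(mode, solids_config=None):
    """Return a run_config dict whose resources section matches `mode`."""
    if mode not in RESOURCE_CONFIG_BY_MODE:
        raise ValueError(f"no resource config registered for mode {mode!r}")
    return {
        # Copies keep callers from mutating the shared lookup table.
        "solids": deepcopy(solids_config or {}),
        "resources": deepcopy(RESOURCE_CONFIG_BY_MODE[mode]),
    }
```

The result could then be passed along in the run request, for example `RunRequest(run_key='whatever', run_config=build_run_config(mode))`, so the mode-specific branching lives in one place.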

schrockn

03/11/2021, 3:42 PM

The way to support this in dagster would be to have two @resource declarations

schrockn

03/11/2021, 3:43 PM

How elegant that ends up being is kind of dependent on the underlying representation of the underlying object

schrockn

03/11/2021, 3:43 PM

otherwise

schrockn

03/11/2021, 3:43 PM

you can just totally opt out of schema validation

schrockn

03/11/2021, 3:44 PM

e.g.

@resource(dict)
def contains_int(context):
    return ContainsSomething(context.resource_config["value"])

Steve Pletcher

03/11/2021, 3:45 PM

yeah, my question is just a matter of elegance. we've already got the two resource declarations set up, i'm just thinking through how to write code that passes the correct parameters to the resource.

Steve Pletcher

03/11/2021, 3:46 PM

but it sounds like this is something we'll have to handle ourselves. thanks for the feedback.

schrockn

03/11/2021, 3:51 PM

do you have a sense of how you would want it to look in an ideal world?

Steve Pletcher

03/11/2021, 3:57 PM

hmm. the majority of this issue is dealing with boilerplate like i mentioned in my earlier post - dagster's schema validation demanding configuration for resources in a mode that aren't referenced in a pipeline is tedious. the other half is that a lot of the resource config my team has done is static at the environment level. expanding default configuration through presets (and similar tooling) would prevent this from being an issue for most of our resources, including making presets available more broadly (e.g. in sensors/run requests)

Steve Pletcher

03/11/2021, 4:00 PM

"default" config blocks are easy enough to set up manually, of course, but having them as actual dagster objects would be valuable for encouraging best practices (and having them manageable in dagit would massively improve our feedback cycle when tweaking configuration)
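A manually maintained "default" config block of the kind described here might look like the following. It is only a sketch in plain Python under assumed names: `merge_config`, `ENV_DEFAULTS`, the `my_solid` key, and the example URL are hypothetical and not part of Dagster.

```python
from copy import deepcopy


def merge_config(defaults, overrides):
    """Recursively merge `overrides` into a copy of `defaults`."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the override wins outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical resource config that is static at the environment level.
ENV_DEFAULTS = {
    "resources": {"some_api": {"config": {"api_url": "https://api.example.com"}}}
}

# Per-run settings are layered on top of the environment defaults.
run_config = merge_config(
    ENV_DEFAULTS, {"solids": {"my_solid": {"config": {"limit": 10}}}}
)
```

Because the merge works on copies, the shared defaults stay unchanged between runs.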

schrockn

03/11/2021, 4:00 PM

so 1) you can always drop schema validation. it’s purely opt-in. 2) There is also configured, for currying in configuration which does not vary at runtime. 3) And if you aren’t using a resource in a mode, you can just change the definition of the resource in that mode to require no configuration

Steve Pletcher

03/11/2021, 4:02 PM

the schema validation is definitely valuable to us, though, so i'd prefer not to drop it just over some friction like this. i'll definitely look into configured. i'm not sure i understand your third point, though. are we misusing modes (i.e. should modes be defined at a per-pipeline level instead of just one mode per environment, with every pipeline using the same set of modes)?

schrockn

03/11/2021, 4:02 PM

ahhhhhh

Steve Pletcher

03/11/2021, 4:02 PM

...now that i say that out loud, that pattern makes much more sense

schrockn

03/11/2021, 4:02 PM

there we go

schrockn

03/11/2021, 4:03 PM

yes modes will generally be per-pipeline

schrockn

03/11/2021, 4:03 PM

there are cases where reusing them makes sense

schrockn

03/11/2021, 4:04 PM

some folks have pipelines which have very similar structures (maybe they only have one node that is different) and in that case cross-pipeline mode reuse makes sense

schrockn

03/11/2021, 4:04 PM

but that doesn’t sound like it is the case here

Steve Pletcher

03/11/2021, 4:05 PM

yeah, that makes sense to me. thanks for helping me clear that up. i'm a bit embarrassed.

schrockn

03/11/2021, 4:05 PM

please don’t be

schrockn

03/11/2021, 4:05 PM

we are still figuring out the right way to message how these concepts inter-relate and it’s not always obvious

schrockn

03/11/2021, 4:12 PM

@Steve Pletcher this has actually started an internal discussion thread because we’re not satisfied with the answers we provided here

schrockn

03/11/2021, 4:12 PM

the feedback is invaluable so keep it coming

🙏 1

schrockn

03/11/2021, 4:28 PM

@Steve Pletcher just to follow up here. If the behavior was that a mode only boots up (and requires config from) the resources required by all the solids in a particular pipeline, that would solve your problem nicely, correct?

Steve Pletcher

03/11/2021, 4:34 PM

that's exactly it, yes

👍 1

schrockn

03/11/2021, 4:34 PM

cool. no promises but stay tuned 🙂
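Until something like the behavior floated above exists, a similar effect could be approximated on the caller's side by pruning the resource config down to the keys a pipeline's solids require. This is a hedged sketch in plain Python, not how Dagster resolves resources; `required_resource_keys`, `prune_resource_config`, and the sample data are hypothetical.

```python
def required_resource_keys(solids):
    """Union of the resource keys required by every solid in a pipeline.

    `solids` maps a solid name to the set of resource keys it requires.
    """
    keys = set()
    for required in solids.values():
        keys |= set(required)
    return keys


def prune_resource_config(resource_config, solids):
    """Keep only the config entries for resources the solids actually use."""
    needed = required_resource_keys(solids)
    return {key: cfg for key, cfg in resource_config.items() if key in needed}


# Hypothetical data: the mode defines "a" and "b", but the only solid needs "a".
solids = {"print_me": {"a"}}
all_resources = {
    "a": {"config": {"value": 1}},
    "b": {"config": {"working_directory": "/tmp"}},
}
pruned = prune_resource_config(all_resources, solids)
```

Note that this only trims what the caller sends; whether the mode still demands config for the unused resource depends on how that resource is defined in the mode.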
