
Tobias Macey

01/20/2022, 6:54 PM
Hey folks. I'm working through the migration to jobs/ops/graphs, and one of the last pieces is moving my usage of `PresetDefinition`. The examples show how to handle a single preset passed to `preset_defs` in the `@pipeline` decorator, but there isn't a good example of handling a list of them. It seems the answer is that I have to create multiple instances of the job so I can pass each one into a schedule with the appropriate config?
The use case is that I have one pipeline that gets used for 3 different installations of an application, each of which needs its configuration loaded from a corresponding YAML file.

sandy

01/20/2022, 7:01 PM
Hi Tobias - our recommendation would be to create a job for each of those presets. Does that cause trouble for you?
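For reference, a minimal sketch of that approach (the op, graph name, site names, YAML file names, and cron string here are all hypothetical):
from dagster import ScheduleDefinition, config_from_files, graph, op

@op
def extract(context):
    context.log.info("extracting")

@graph
def elt_graph():
    extract()

# One job per installation, each loading its run_config from that
# installation's YAML file.
jobs = [
    elt_graph.to_job(name=f"elt_{site}", config=config_from_files([f"{site}.yaml"]))
    for site in ("site_a", "site_b", "site_c")
]

# Each job gets its own schedule, with its config already baked in.
schedules = [ScheduleDefinition(job=job, cron_schedule="0 6 * * *") for job in jobs]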

Tobias Macey

01/20/2022, 7:18 PM
No, not a problem, just seems a bit clunky. I'll give it a go and send my feedback once it's functional. Thanks!

alex

01/20/2022, 7:49 PM
The hope is that you can bucket the `job`s into separate repositories and only load the relevant repository per installation
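Sketched out, assuming the hypothetical per-installation jobs and schedules from the earlier sketch, that might look like:
from dagster import repository

# One repository per installation; each deployment's workspace.yaml
# points only at the repository it needs.
@repository
def site_a_repo():
    # Only the job and schedule for this installation live here.
    return [jobs[0], schedules[0]]

@repository
def site_b_repo():
    return [jobs[1], schedules[1]]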

Tobias Macey

01/20/2022, 9:08 PM
Roger that

paul.q

01/20/2022, 11:27 PM
We have a similar requirement, but we found that having different repos and loading the appropriate one for the target environment introduces other problems, e.g. where we have GraphQL requests that need to insert the repository location and name. Our approach was to have a config module that relies on the presence of a special environment variable (e.g. `dagster_env`). At run time we can use this to determine our environment. We use YAML files like dev.yaml, prod.yaml, etc. to provide the run_config, then we use something like this to ingest the appropriate YAML:
import pkg_resources
import yaml

def get_run_config(env):
    # e.g. env == "dev" -> dev.yaml
    yaml_file_name = f"{env}.yaml"
    # Placeholder: the dotted path of the package that contains the YAML files
    package_string = "your_package.config"
    yaml_content = pkg_resources.resource_string(
        package_string, resource_name=yaml_file_name
    ).decode("utf-8")
    return yaml.safe_load(yaml_content)
In this way we can keep a consistently named repo whose name reflects the purpose of the jobs within.
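A hypothetical usage of that helper when defining the job, assuming the `dagster_env` variable and a graph like the one sketched earlier:
import os

# Pick the run_config for whichever environment this deployment is.
env = os.environ.get("dagster_env", "dev")
elt_job = elt_graph.to_job(name="elt", config=get_run_config(env))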

Tobias Macey

01/21/2022, 12:28 PM
Unfortunately, in my case I need to run these different pipeline configurations all in the same deployment environment. The pipeline pulls from multiple installations of an application that we need to keep distinct from one another from a data-collection standpoint.
I was able to work through the syntax changes and get things loading again. https://github.com/mitodl/ol-data-pipelines/pull/189