Dear Dagster Community pray I have a question regarding the dagster #ask-community


Vilette Braun

02/20/2024, 4:32 PM

Dear Dagster Community 🙏 I have a question regarding the asset runtime configuration concept (described here: https://docs.dagster.io/concepts/configuration/config-schema#specifying-runtime-configuration). I don't understand the intended best practice for passing in configuration for different environments (e.g. local while using dagster dev, and "in production" within containers in a pod on K8s). I can successfully configure an asset using the Launchpad in the UI, but I don't understand how to supply such configuration "from code" (or from a config.yaml) in a running deployment of Dagster. Let's say I have a very basic asset:


from dagster import Config, asset

class AssetConfig(Config):
    mlflow_endpoint: str

@asset
def my_model_asset(config: AssetConfig) -> None:
    # Use config.mlflow_endpoint in some way, e.g. to track an ML experiment
    ...

And I'd have a very simple configuration


# config.prod.yaml

ops:
  my_model_asset:  # key must match the asset name
    config:
      mlflow_endpoint: "http://..."

How would I supply this configuration to a production deployment of Dagster, while at the same time having a different configuration (e.g. config.dev.yaml) for a local development environment (dagster dev), and maybe yet another configuration for a CI/CD pipeline? To me, none of the 3 options in https://docs.dagster.io/concepts/configuration/config-schema#specifying-runtime-configuration seem to apply to this situation, although I assume the use case I've described is a common scenario. I might be thinking in completely the wrong direction; thank you for any best practices you could share for how to do this properly. All the best


Zach

02/20/2024, 6:13 PM

One option could be to use an environment variable that gets injected into the container you're running in at build time to switch between configs:


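import os

import yaml  # PyYAML
from dagster import AssetSelection, define_asset_job

deployment = os.getenv("DEPLOYMENT")
if deployment == "dev":
    with open("config.dev.yaml", "r") as conf_in:
        config = yaml.safe_load(conf_in)  # run config must be a dict, not raw text
elif deployment == "prod":
    with open("config.prod.yaml", "r") as conf_in:
        config = yaml.safe_load(conf_in)

# define_asset_job takes the job name first; "my_model_job" is a placeholder name
job = define_asset_job("my_model_job", selection=AssetSelection.assets(my_model_asset), config=config)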

If you're using Dagster Cloud, they inject env vars that indicate the deployment the code is executing on which you could also use for deciding how to load deployment-specific configurations
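
A minimal sketch of that switch, pulled into a helper so an unset or misspelled variable fails loudly instead of leaving config undefined. It only resolves the file name and uses the standard library; the helper name config_path_for_deployment, the "ci" entry, and the "dev" fallback are assumptions for illustration, not something from the thread or from Dagster.

```python
import os
from pathlib import Path

# Hypothetical set of environments; "ci" covers the CI/CD case from the question.
KNOWN_DEPLOYMENTS = ("dev", "ci", "prod")


def config_path_for_deployment(config_dir: str = ".", default: str = "dev") -> Path:
    """Return the path of config.<deployment>.yaml for the current DEPLOYMENT value."""
    # Fall back to the default when DEPLOYMENT is unset (assumed: local dagster dev).
    deployment = os.getenv("DEPLOYMENT", default)
    if deployment not in KNOWN_DEPLOYMENTS:
        raise ValueError(
            f"Unknown DEPLOYMENT {deployment!r}; expected one of {KNOWN_DEPLOYMENTS}"
        )
    return Path(config_dir) / f"config.{deployment}.yaml"
```

The returned path can then be opened and parsed exactly as in the snippet above, so the if/elif chain does not need to grow with every new environment.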

Vilette Braun

02/20/2024, 7:48 PM

Hi @Zach, thank you very much for taking the time to answer and help! Ah, that makes sense! As I didn't find an example of this scenario in the official Dagster documentation, I didn't think of this and wasn't sure that config reading/validation logic is expected to be implemented by the user. I may have expected Dagster to provide "API support" for this scenario out of the box 🙂 You introduced job into the picture; is there a specific necessity for this in this scenario, or would my original example make sense combined with your suggestion as well? I.e. something like this:


class AssetConfig(Config):
    mlflow_endpoint: str
 
deployment = os.getenv("DEPLOYMENT")
if deployment == "dev":
  with open("config.dev.yaml", "r") as conf_in:
    config = conf_in.read()
elif deployment == "prod":
  with open("config.prod.yaml", "r") as conf_in:
    config = conf_in.read()
 
@asset
def my_model_asset(config: AssetConfig = config) -> None:
    # Use config.mlflow_endpoint
    ...

All the best

Zach

02/20/2024, 8:11 PM

I just pretty much always use asset jobs because there's a bunch of stuff that the base @asset definition doesn't seem to provide, like binding static config to the asset. If you don't want to make it into a job then I'm not really sure how to bind static configuration to the asset. You might be able to do it like you suggest, but Dagster might complain that it's missing config as I don't know if it's able to resolve defaults for config parameters. If you don't plan on changing the config in the launchpad you could just do


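import os

import yaml  # PyYAML
from dagster import Config, asset

class AssetConfig(Config):
    mlflow_endpoint: str

deployment = os.getenv("DEPLOYMENT")
if deployment == "dev":
    with open("config.dev.yaml", "r") as conf_in:
        config = yaml.safe_load(conf_in)  # parse to a dict; read() would return a str
elif deployment == "prod":
    with open("config.prod.yaml", "r") as conf_in:
        config = yaml.safe_load(conf_in)

# Assumes the YAML holds the AssetConfig fields at the top level
# (e.g. mlflow_endpoint: ...), not the nested ops: run-config layout.
asset_config = AssetConfig(**config)

@asset
def my_model_asset() -> None:
    mlflow_endpoint = asset_config.mlflow_endpoint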

Alexis Manin

02/21/2024, 7:15 AM

For this kind of use case, I tend to use resources instead. Resources are declared in the Definitions object, and there you can have a resource read its value from an environment variable by default (it is still editable via the launchpad when required). Example:


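import requests
from dagster import ConfigurableResource, Definitions, EnvVar, asset
from pydantic import Field

class MyWebServiceAccess(ConfigurableResource):
    url: str = Field(description="access url to ...")


@asset
def my_asset(my_web_service: MyWebServiceAccess):
    requests.get(my_web_service.url)


definitions = Definitions(
    assets=[my_asset],
    resources={
        # EnvVar takes the variable name as a string; the value is read at run time
        "my_web_service": MyWebServiceAccess(url=EnvVar("MY_WEB_SERVICE_URL"))
    },
)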

Vilette Braun

02/21/2024, 8:44 AM

Hi @Alexis Manin and @Zach Thank you for your help! I haven't yet read into the concept of Resources, thanks for pointing them out, sounds like a good fit for the scenario I'm trying to solve. I'll experiment also with this approach 🙂 Many thanks and all the best


Zach

02/21/2024, 4:37 PM

Yeah resources can be a nice way to handle this. You could do the same thing with configs too

Alexis Manin

02/21/2024, 4:40 PM

About that: I have not watched it yet, but a "Deep dive: Configuration and resources" video was posted yesterday:

https://youtu.be/i6m7k16W-yg?si=OUKOmp_nL0WAqN1b
