How can I make an optional op config schema field?
# ask-community
a
How can I make an optional op config schema field?
🤖 1
z
You can use `is_required=False` or make it `Noneable`:
Copy code
from dagster import Field, Noneable, op, OpExecutionContext

@op(
    config_schema={
        "optional_field": Field(str, is_required=False),
        "noneable_field": Field(Noneable(str), is_required=False, default_value=None),
    }
)
def something(context: OpExecutionContext):
    context.op_config["optional_field"]  # raises KeyError if "optional_field" is not in the run config
    assert context.op_config["noneable_field"] is None
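and fwiw, a quick sketch of what supplying that config at execution time could look like (the job name and value here are just placeholders):
Copy code
from dagster import job

@job
def something_job():
    something()

if __name__ == "__main__":
    # "optional_field" is supplied here; "noneable_field" falls back to its default of None
    result = something_job.execute_in_process(
        run_config={"ops": {"something": {"config": {"optional_field": "hello"}}}}
    )
    assert result.success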
a
Cool thanks! Didn't know I could do fields. Was just using standard typing and ended up settling on `Noneable`.
j
if you’re using the newer Pythonic/Pydantic config system, you would do it like this
Copy code
from pydantic import Field
from dagster import Config
from typing import Optional

class MyConfig(Config):
    optional_field: Optional[str] = Field(default=None)
note that here `Field` is imported from `pydantic`, not `dagster`
D 1
a
Cool. I have never used Pydantic. Is that the preferred way of making standard configs?
j
Pydantic config was fully released in 1.3 and is now our recommended approach, but the `config_schema` dictionaries are still fully supported, so if that works better for your use case feel free to use it!
What Zach wrote will totally work, just wanted to include the Pydantic method since there are two ways you could do this.
fwiw, with Pydantic config you provide it to the asset like this:
Copy code
from pydantic import Field
from dagster import Config, asset
from typing import Optional

class MyConfig(Config):
    optional_field: Optional[str] = Field(default=None)

@asset
def my_asset(config: MyConfig):
    if config.optional_field is not None:
        ...
daggy love 2
❤️ 1
D 1
🌈 1
🤖 1
a
Is that a keyword arg? If we wanted the context too, does order matter?
j
yeah, to pipe the config through correctly, the parameter has to be named `config` and have the right type annotation. You can use the context too!
Copy code
@asset
def my_asset(context, config: MyConfig):
    if config.optional_field is not None:
        ...
would be totally fine. I don’t think the order matters (worth verifying with a small sample asset though)
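fwiw, a quick way to verify is a tiny throwaway asset materialized in-process. A rough sketch (the asset name and config value are just placeholders):
Copy code
from typing import Optional

from pydantic import Field
from dagster import Config, asset, materialize

class SampleConfig(Config):
    optional_field: Optional[str] = Field(default=None)

@asset
def sample_asset(context, config: SampleConfig):
    context.log.info(f"optional_field={config.optional_field}")

if __name__ == "__main__":
    # config for an asset is keyed under the asset's op name in run config
    result = materialize(
        [sample_asset],
        run_config={"ops": {"sample_asset": {"config": {"optional_field": "foo"}}}},
    )
    assert result.success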
a
Thanks! Will take a look.
@jamie how would this look if I had this config for an op and wanted a job to pass the config through to the op?
j
If you want to specify config at execution time, this is the relevant docs section https://docs.dagster.io/concepts/configuration/config-schema#specifying-runtime-configuration. If you want to hard-code some configuration you can do something like this
Copy code
from pydantic import Field
from dagster import Config, job, op, RunConfig
from typing import Optional

class MyConfig(Config):
    optional_field: Optional[str] = Field(default=None)

@op
def my_op(config: MyConfig):
    if config.optional_field is not None:
        ...

@job(
    config=RunConfig(ops={"my_op": MyConfig(optional_field="foo")})
)
def my_job():
    my_op()
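and if you want to sanity-check that hard-coded config, executing the job in-process (building on the snippet above) should show `my_op` picking up the value:
Copy code
if __name__ == "__main__":
    # no run_config passed here, so the job's default RunConfig is used
    result = my_job.execute_in_process()
    assert result.success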
a
If I don't know the run config values and want to use dagit to execute, do I need to specify the config, or just pass through the config in the launchpad?
j
you can just specify the config in the launchpad https://docs.dagster.io/concepts/configuration/config-schema#dagit
a
got it, thanks. I wasn't sure if I also had to specify in the code
I am trying to materialize an asset from an op in a job. That asset has a config associated with it, but when I try reloading my code location I get a TypeError: `asset_1() missing 1 required positional argument: 'config'`. I am using the `materialize()` function in my op.
j
`materialize` should generally just be used for testing assets (i.e. in unit tests or integration tests) and not called from within ops. What is the larger goal of materializing the asset within the op? There is likely another way to accomplish it.
a
We currently use SOLR as a cache that we write to from Dagster. In order to update its schema, I originally created a job with ops to complete the schema updates. This leaves residual objects in our SOLR cache that can be used as a backup. Eventually, after an independent data review, I want to clean up those residual objects. I thought Dynamic Partitions would be good for this, because I can clean up by partition_key with a downstream asset or job while keeping those update objects separated. The way I create the partition_key is to use a job `run_config` that makes some API calls to get the partition key, and then materialize the asset with that partition_key.
j
Ok, I think I see. My recommendation in this case would be to have the job that makes the partition key, and then maybe a run_status_sensor that materializes the assets when that job completes. It’s a bit of indirection, but I think that should accomplish what you want.
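roughly what that could look like, just as a sketch rather than a drop-in solution: the names (`cleanup_partitions`, `schema_update_job`, the `cleanup_partition_key` run tag) are placeholders, and it assumes the upstream run records the new partition key as a run tag
Copy code
from dagster import (
    DagsterRunStatus,
    DynamicPartitionsDefinition,
    RunRequest,
    asset,
    define_asset_job,
    job,
    op,
    run_status_sensor,
)

# hypothetical dynamic partitions definition backing the cleanup asset
cleanup_partitions = DynamicPartitionsDefinition(name="cleanup_partitions")

@asset(partitions_def=cleanup_partitions)
def cleanup_asset(context):
    context.log.info(f"cleaning up partition {context.partition_key}")

cleanup_asset_job = define_asset_job("cleanup_asset_job", selection=[cleanup_asset])

@op
def make_partition_key():
    # stand-in for the op that makes the API calls and produces the new key
    return "key-from-api"

@job
def schema_update_job():
    make_partition_key()

@run_status_sensor(
    run_status=DagsterRunStatus.SUCCESS,
    monitored_jobs=[schema_update_job],
    request_job=cleanup_asset_job,
)
def materialize_cleanup_on_success(context):
    # assumption: the upstream run carries the new partition key as a run tag
    partition_key = context.dagster_run.tags.get("cleanup_partition_key")
    if partition_key:
        # register the dynamic partition, then request a partitioned run of the asset job
        context.instance.add_dynamic_partitions("cleanup_partitions", [partition_key])
        return RunRequest(partition_key=partition_key)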
a
How would the sensor get the partition key from the job output?
Now I am thinking of just having the job add a dynamic partition key, and then the user can materialize that partition key manually. But that is basically what I am doing now, just calling `materialize` instead of someone manually using the UI to materialize a partition. It just creates an ephemeral run; I'm not sure of the implications of that, but I think an additional thread is held because the op that calls `materialize` stays running while the asset is materializing. That is the only real downside. I could change it to async, but maybe for one-off uses this is OK.
The other thing I tried was the GraphQL client, but I didn't see how to add a partition_key to a `submit_job_execution` call.
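roughly what I mean by that alternative, as a sketch with placeholder names: the job only registers the dynamic partition, and someone materializes it from the UI afterwards
Copy code
from dagster import DynamicPartitionsDefinition, OpExecutionContext, job, op

# placeholder dynamic partitions definition; the real one backs the cleanup asset
cleanup_partitions = DynamicPartitionsDefinition(name="cleanup_partitions")

@op
def register_partition(context: OpExecutionContext):
    # placeholder key; in practice this comes from the API calls in the run_config flow
    partition_key = "key-from-api"
    context.instance.add_dynamic_partitions("cleanup_partitions", [partition_key])
    context.log.info(f"added partition {partition_key}; materialize it from the UI when ready")

@job
def register_partition_job():
    register_partition()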