Charles Lariviere

12/16/2021, 2:55 PM
Hey 👋 Am I correct in understanding that the Configured API can be used on Ops only for fields defined in
config_schema
(and not
Ins
)? Since we can configure Op inputs in run configuration, I was expecting to also be able to do that with the Configured API, but I’m getting errors that make me think it isn’t the case.
I’m trying to wrap my head around what belongs as a
config_schema
and what belongs as an
In
. It seems like
In
gives the most flexibility (it can take the output from another Op, while config can’t). We have a database resource with methods such as
create_schema(name, replace, **options)
,
create_table(name, columns, **options)
, etc. for which I want to create matching Ops that can be reused in any job we write. In some jobs, I would want some arguments to be provided by other Ops’ outputs, while in other jobs I would want them to be hard-coded within the job definition and not configurable. I understand this is what the Configured API is for, though again, it doesn’t seem like it can be used for
Ins
🤔

max

12/16/2021, 5:27 PM
@sandy @chris

chris

12/16/2021, 11:41 PM
Configured can actually work for you here if you use it at the graph layer:
@op
def my_op_with_in(x):
    ...

@graph(config=ConfigMapping(config_fn=lambda outer: outer, config_schema=Permissive()))
def my_graph():
    my_op_with_in()

my_graph.configured({"ops": {"my_op_with_in": {"inputs": ...}}})
You are correct that the configured api can only be used for fields defined in the
config_schema
of an
op
. The idea there is that being able to configure inputs in advance becomes tricky with error messaging and control flow.
Ins
are usually used to represent some sort of upstream dagster dependency, while configuration represents some hyperparameter to the computation. Your use case makes a ton of sense though, and exposes some tricky ambiguities between the functionality of both.

Charles Lariviere

12/16/2021, 11:54 PM
Makes a ton of sense, thank you! 🙏

Arslan Aziz

07/21/2022, 5:33 PM
@chris Could you please expand on the example you provided for using configured with inputs for ops? I have a similar use case but when I try your approach I get the following error:
dagster.core.errors.DagsterInvalidDefinitionError: Only graphs utilizing config mapping can be pre-configured. The graph "image_percy_graph" does not have a config mapping, and thus has nothing to be configured.
My graph consists of ops that have
Ins
and I'm trying to preconfigure the
Ins
using the configured API on the graph.

chris

07/21/2022, 5:36 PM
😅 didn't include this in the example, but in order to use configured like this on the graph, you need to set a config mapping on the graph. Will update with that
Alternatively, you could just set the config directly on the graph:
@op
def my_op_with_in(x):
    ...

@graph(config={"ops": {"my_op_with_in": {"inputs": ...}}})
def my_graph():
    my_op_with_in()

Arslan Aziz

07/21/2022, 6:13 PM
Thanks - this worked perfectly!
@chris I'm running into a
DagsterInvalidConfigError
when trying to launch the job from the Dagit UI. The error string is
Error 1: Received unexpected config entries "['ops', 'resources']" at the root. Expected:
where the
Expected
is a list of all the ops which I set the
inputs
for in the graph config.

chris

07/21/2022, 6:44 PM
Mind sharing a code snippet?
Or is this what's happening with the code that I sent?

Arslan Aziz

07/21/2022, 6:45 PM
This is an error I'm getting using the snippet you shared setting the config directly on the graph

chris

07/21/2022, 6:45 PM
gotcha - let me investigate

Arslan Aziz

07/21/2022, 6:47 PM
For more context, under
ops
of the same config, I've also included ops that take a
config_schema
. Example in the config:
{"ops": {"my_op_with_config": {"config": {...}}}}

chris

07/21/2022, 6:48 PM
in that case, sending a code snippet with your use case might help me get to the issue faster

Arslan Aziz

07/21/2022, 6:53 PM
Please see the attached file for my pipeline

chris

07/21/2022, 6:56 PM
you're specifying job config for the graph. The config blob that you pass to the graph contains resources and ops at the top level in addition to inputs; so the config machinery doesn't know how to interpret it
resource config should only be specified at the job level
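The separation described here can be sketched in plain Python. This is only an illustration of moving the `resources` entry out of the blob handed to the graph; `split_config` is a hypothetical helper, not a Dagster function, the sample op, resource, and values are made up, and the thread does not settle exactly which keys the graph-level config should keep.

```python
from typing import Any, Dict, Tuple


def split_config(blob: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Hypothetical helper: separate resource config from everything else.

    Returns (graph_level, job_level), where job_level holds only the
    "resources" entry and graph_level holds the remaining keys.
    """
    job_level = {k: v for k, v in blob.items() if k == "resources"}
    graph_level = {k: v for k, v in blob.items() if k != "resources"}
    return graph_level, job_level


# Assumed example blob mixing op inputs and resource config.
blob = {
    "ops": {"my_op_with_in": {"inputs": {"x": {"value": 1}}}},
    "resources": {"database": {"config": {"url": "placeholder"}}},
}
graph_level, job_level = split_config(blob)
```

The `job_level` part would then be what gets passed at the job level, for example through the `config` argument of `to_job` mentioned later in the thread.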

Arslan Aziz

07/21/2022, 7:02 PM
Thanks for looking into it. I've moved the resource config to the job level config (as the
config
argument of
to_job
; however, I'm still receiving the same error when I try to run the pipeline.

chris

07/21/2022, 7:04 PM
The other piece is that you need to set {"value": some_value} as the input value, not just the input itself
as in:
"inputs": {
                "disk_device_name": {"value": DISK_DEV}},
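To make the shape concrete, here is a minimal plain-Python sketch that wraps each input in a `{"value": ...}` entry, as described above. `op_inputs_config` is a hypothetical helper, not part of Dagster, and the value assigned to `DISK_DEV` is a placeholder.

```python
from typing import Any, Dict


def op_inputs_config(op_name: str, **inputs: Any) -> Dict[str, Any]:
    """Hypothetical helper: build the "ops" config entry for one op,
    wrapping every input as {"value": <input>}."""
    wrapped = {name: {"value": value} for name, value in inputs.items()}
    return {"ops": {op_name: {"inputs": wrapped}}}


DISK_DEV = "/dev/sdb"  # placeholder value, not from the thread
config = op_inputs_config("my_op_with_in", disk_device_name=DISK_DEV)
```

Building the dict through one function keeps the `{"value": ...}` wrapping in a single place, which is the piece that was missing in the earlier attempt.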

Arslan Aziz

07/21/2022, 7:06 PM
Ah ok - let me go back and update that
I'm still receiving the same error. To simplify it a bit I removed the ops that had a config_schema and removed the job-level config for the resource.

chris

07/21/2022, 7:18 PM
can you send the updated code?

Arslan Aziz

07/21/2022, 7:18 PM
It seems to expect all of the ops that I had pre-configured in the graph config to be supplied as top-level config in the Dagit UI when I try to launch a run
Sorry - I edited the wrong file 🤦‍♂️ Let me try again.
I'm receiving a different error this time. Here is the updated code.
Here is the new error I am getting:
For context, I've been running this on Dagster version 0.13.19 but I get the same issue when I try it on latest.