Hi all maybe it is just basic question but I could not get m dagster #ask-community

Tung Dang

05/12/2022, 10:19 AM

Hi all, maybe this is just a basic question, but I could not get my code working. I would like to have two ops that run shell commands in order. I haven't found a way to inject ins={"start": In(Nothing)} directly into create_shell_command_op, so I need to wrap them inside a factory. But in the inner op method, I could not get the command triggered because there is no context for invoking it. It is related to a previously posted question. Could someone help me?

Tung Dang

05/12/2022, 2:25 PM

Ok, I found no way to reuse create_shell_command_op in the inner op. My workaround is to call dagster_shell.utils.execute() directly, so the execution is handled the same way as in the normal shell_op.

yuhan

05/12/2022, 6:30 PM

Sorry that I missed your ping. Yea I believe at this point, making a factory doesn’t help reduce the boilerplate so I think your workaround is what I’d recommend.

Tung Dang

05/16/2022, 2:12 PM

@yuhan Could you please give me a hint? I am trying to grasp the concept of dependencies in dagster. As far as I understand, a dependency exists whenever one op consumes the output of another. But most of the built-in ops (e.g. create_shell_command_op, dataproc_op, etc.) don't have such an input definition, so the user must either extend them or implement the logic from scratch. Is that correct, or have I missed something? Thank you so much.

yuhan

05/17/2022, 5:34 PM

Hi Tung! Underneath, create_shell_command_op constructs an op with a Nothing input definition (source code). This was built under the hypothesis that most shell command ops would not have an upstream data dependency connected to them. You can read about the Nothing type here, and here's an example with a code snippet: https://docs.dagster.io/concepts/ops-jobs-graphs/jobs-graphs#order-based-dependencies-nothing-dependencies

Tung Dang

05/18/2022, 8:48 AM

Hi @yuhan, ok, thanks for pointing out the start input for shell_op, but how about other ops? Do you think it is a good idea to provide a default “after” input for every op that could be used in such situations? I think that could be done pretty easily by changing this line: https://github.com/dagster-io/dagster/blob/2d219f4a405fba56bc3466f1165e5403c760e02b/python_modules/dagster/dagster/core/definitions/op_definition.py#L71 to:

{"after": Nothing, **{input_def.name: In.from_definition(input_def) for input_def in self.input_defs}}

Tung Dang

05/18/2022, 8:54 AM

@sandy, @owen: I would like to hear your opinions about the default “after” input as well. Currently, chaining ops by order of execution in dagster is quite difficult (wrapping in a group, another op def, etc.). With after, the barrier to entry for new dagster users like me would be much lower. Sorry if I am not aware of an earlier discussion about this.

sandy

05/18/2022, 4:06 PM

We've discussed this in the past and I recall there may have been an edge case where this breaks compatibility. @alex - do you recall off the top of your head?

alex

05/18/2022, 4:24 PM

The one thing I remember was breaking existing inputs on name collision. In the proposed change above, after is overridden by any user inputs with the same name, which is one approach to solve for that. Historical discussion around this is on https://github.com/dagster-io/dagster/issues/4274
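
The override-on-collision behaviour alex describes can be sketched with plain Python dicts. This is only an illustration of the merge semantics, not dagster code: the `merged_ins` helper is hypothetical and the string values stand in for dagster's input objects.

```python
def merged_ins(user_inputs):
    """Merge a default "after" entry with user-declared inputs.

    Hypothetical helper: strings stand in for dagster input definitions.
    """
    merged = {"after": "Nothing (default)"}
    # update() mutates the dict in place and returns None, so the merge
    # happens in a separate statement before returning.
    merged.update(user_inputs)
    return merged


# No collision: the default "after" entry sits alongside the user's input.
no_collision = merged_ins({"table": "DataFrame"})

# Collision: a user input that is also named "after" replaces the default.
collision = merged_ins({"after": "String"})
```

Because the user's entry wins, an existing op that already declares an input called after keeps its own definition, at the cost of losing the default ordering input for that op.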

Tung Dang

05/19/2022, 10:22 AM

@sandy, @yuhan Based on the discussion in #4274 and the feature request in https://github.com/dagster-io/dagster/issues/7863 I think it is reasonable to add this ability to dagster ops. IMHO it is the most intuitive way to define op dependencies.
