Davi

10/05/2022, 12:38 PM
Hello guys, with Airflow we can chain different tasks/actions without necessarily having the output of one feed the input of the next. In Dagster, however, all the examples in the docs show pipelines with `op` functions that follow this input/output paradigm. Am I required to define inputs and outputs in my functions to create a graph/job in Dagster? Currently I want to orchestrate functions that don't generate output. Thank you all!

Mat Brady

10/05/2022, 1:24 PM
Hi Davi, I’m new here and to Dagster (I came with a question of my own), so I’ll let them correct me, but I saw your question on login and I’m pretty sure that with ops you can just have a function that doesn’t return anything and use that as `ins` to another `op`, or use the functional dependency chaining of `@job` definitions if it’s dependent, e.g. `op3(op2(op1()))`. Example of what I mean by functional chaining: https://docs.dagster.io/tutorial/ops-jobs/connecting-ops#a-more-complex-dag
I’m using an `@asset` pipeline, and for that type there’s a specific `non_argument_deps` param to `@asset` which tells it to depend on another `@asset` without requiring that asset’s output, if it generates any. From what I’ve read, `@asset`s are built on top of `@op`, so I imagine this is the way `@op`s can be used. Example of `non_argument_deps`: https://docs.dagster.io/tutorial/assets/non-argument-deps#an-unzipped-csv-of-cereal-ratings
Hope this helps and is potentially correct :)

Davi

10/05/2022, 3:30 PM
@Mat Brady Thank you so much! Would you mind sending a code snippet here showing how you use the functional chaining in your project? Thank you so much!

Mat Brady

10/05/2022, 3:41 PM
🤔 So you have something like this in code:
from dagster import asset, OpExecutionContext


@asset
def asset1():
    # do some stuff and don't return anything
    pass

@asset(non_argument_deps={'asset1'}, required_resource_keys={'api'})
def asset2(context: OpExecutionContext):
    # do some stuff and don't return anything
    pass

@asset(non_argument_deps={'asset2'}, required_resource_keys={'api'})
def asset3(context: OpExecutionContext):
    # do some stuff and don't return anything
    pass
But in the above case, you’re seeing that, say, `asset2` will run in parallel with `asset1`?
I’m seeing a cascade (serial execution) when I use something like the above.

sandy

10/05/2022, 3:54 PM
if you're using ops, not assets, here's how to define dependencies without passing data: https://docs.dagster.io/concepts/ops-jobs-graphs/graphs#defining-nothing-dependencies

Mat Brady

10/05/2022, 4:42 PM
Ah! I was so close and yet so far 😄

Davi

10/06/2022, 9:26 AM
@Mat Brady To avoid chaining too many parentheses in the job, you can also do it like this (which I prefer):
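from dagster import In, Nothing, job, op


@op
def op_1():
    # Do something; no return value is needed
    pass


# In(Nothing) declares an ordering-only input: no data is passed,
# so the function takes no matching parameter
@op(ins={"depends": In(Nothing)})
def op_2():
    # Do something
    pass


@op(ins={"depends": In(Nothing)})
def op_3():
    # Do something
    pass


# Named chained_job so the function does not shadow the @job decorator
@job
def chained_job():
    # Reuse one variable instead of nesting calls like op_3(op_2(op_1()))
    dependency_link = op_1()
    dependency_link = op_2(depends=dependency_link)
    op_3(depends=dependency_link)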