Hi all, I have a question that I've been trying to get answered | dagster #ask-community

Michael Hood

07/13/2023, 2:16 PM

Hi, all. I have a question that I've been trying to get answered on my own for the last couple of days. I created this example to better illustrate the situation I'm thinking about. I have a very simple job that produces some assets (`col1`, `col2`, `my_df`), and then the op `log_sums` logs the results of a simple transformation. My question is whether there is a way to avoid restating in the job definition that `my_df` takes `col1` and `col2` as upstream dependencies before I can pass the result of `my_df` into `log_sums`. It seems to me that I have already specified the dependencies between the assets. In this case, it is not that big of a deal since we are only talking about a small number of assets, but this could be rather tedious to do with a much larger DAG of assets. I figure there might be a way to do something succinct like:

@job
def log_sums_job():
    df = do_something_to_materialize_result(my_df)
    log_sums(df)

Anyways, this might just be a conceptual misunderstanding on my part, but I appreciate any suggestions or pointers.

Untitled.py

Zach

07/13/2023, 3:21 PM

Dependency inference does indeed work the way you think it should, but you're not supposed to mix assets into @job definitions. @job definitions are for defining op-based DAGs. To define an asset job you use `define_asset_job`, which will handle inferring the inputs to each asset. I also don't think you can really mix ops and assets - https://docs.dagster.io/concepts/ops-jobs-graphs/jobs#from-software-defined-assets

Michael Hood

07/13/2023, 3:23 PM

> I also don't think you can really mix ops and assets

I was beginning to suspect something like this, but I hadn't seen it explicitly stated.

Michael Hood

07/13/2023, 3:25 PM

Somehow, the distinction between an op-based job and an asset-based one had escaped me.

Zach

07/13/2023, 3:29 PM

Yeah, I agree that this could be better highlighted. It comes up pretty often in this channel: folks try to mix assets and ops, since nothing really prevents you from doing so until you get weird errors, and it's not really documented.

chris

07/13/2023, 4:48 PM

You actually kind of can mix ops and assets. Zach is totally right regarding the way dependencies work, but you can also specify a dependency on `my_df` for your downstream op job: https://github.com/dagster-io/dagster/discussions/10802
