Hi - I have a collection of Assets. These Assets get collected... (dagster #ask-community)


NateV

04/06/2023, 1:36 AM

Hi - I have a collection of Assets. These Assets get collected in a Graph, like


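from dagster import asset, graph, op

@asset
def asset1():
    ...  # some assets ... (each one produces a data frame)

@asset
def asset2():
    ...

@op
def myop(dfs):
    # do something with these input data frames
    ...

@graph
def mygraph():
    # asset1/asset2 are invoked here like ops inside the graph
    myop([asset1(), asset2()])

myjob = mygraph.to_job()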

I've discovered two problems that I think are probably symptoms of the same underlying issue: this pattern isn't doing what I think it's doing. First, in the UI, the View As Asset Graph view is empty, and it tells me that things will only appear there when I add Assets to my Definitions. Second, I've realized that the 'assets' (or what I'm thinking of as assets) are getting stored not in the blob storage's directory, but in separate directories under a 'storage' directory. So I think that, although my assets are defined as Assets (and they work as assets, storing themselves correctly outside this graph/job), they're no longer assets when they're pulled into this graph? But I'm pretty confused about how to fix it. Any help would be very appreciated.

NateV

04/06/2023, 1:58 AM

What I'm trying to do is simply collect a bunch of data frames and send emails about them. Each data frame is an asset, and I have an op that takes a list of data frames. It works as I've set it up, but with this weird storage situation, where the assets are materialized in /storage/something-like-a-uuid/asset/result, under a different uuid for each run of the job.

sandy

04/06/2023, 5:39 PM

Are you trying to materialize those assets inside the job? Or just read from them? The former is not currently possible, but the latter is. Here's an example of how to do the latter: https://docs.dagster.io/concepts/ops-jobs-graphs/graphs#loading-an-asset-as-an-input

NateV

04/06/2023, 7:52 PM

Thank you! That's the example I'm trying to use. When I follow it and bring the assets into a job directly, I get an unsatisfying visual: it shows the ops that are part of the job, but doesn't give any info about the underlying assets.

NateV

04/06/2023, 7:53 PM

In my case I'm also using an op factory to make an op for each of the assets I'm sending emails about, so I end up with a bunch of these unconnected ops and nothing to show which assets they rely on.

NateV

04/06/2023, 7:55 PM

When I use a graph, the UI does show me how the assets relate to the send-emails op. But when the job runs, it seems to be a discrete thing that re-materializes the assets every time and stores them in a separate sub-directory for each run.

NateV

04/06/2023, 7:59 PM

I'm just trying to understand why these things work this way. There's probably a reason I'm mistaken to think that the assets should appear in the first, job-based version of this: some way I'm thinking about this system wrong. Is there something about what Ops and Assets are for that explains why the behavior I'm seeing is the behavior I should want?

sandy

04/06/2023, 8:58 PM

Still trying to understand exactly what your goal is: do you want your job to materialize those assets every time it runs? Or do the assets get materialized separately, and you just want your job to read them and then send emails based on them?

NateV

04/06/2023, 9:00 PM

I can deal with either way of doing it. I'm fine with the constraint that the job emailing about the assets can only read them after they've been materialized.

NateV

04/06/2023, 9:00 PM

I just wish that job showed me, in the UI, what assets it depends on.

NateV

04/06/2023, 9:01 PM

(and the fact that the UI doesn't show me makes me think I'm still doing something in a non-idiomatic (dagster-matic?) way)

NateV

04/06/2023, 9:04 PM

I'm experimenting with setting the ins explicitly, ins={"errs": AssetIn(asset_key[0])})(err), trying to use the OpFactory pattern from the docs.

sandy

04/06/2023, 9:08 PM

I just wish that job showed me, in the UI, what assets it depends on.

Got it - that makes total sense. I agree with you. I filed an issue to track this: https://github.com/dagster-io/dagster/issues/13428.

NateV

04/06/2023, 9:09 PM

Okay, sweet, thanks! And I guess that also means I'm not just doing something boneheaded. Thanks for the help!
