# ask-community
n
Hi - I have a collection of Assets. These Assets get collected in a Graph, like this:
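```python
from dagster import asset, graph, op

@asset
def asset1():
    ...  # some assets ...

@asset
def asset2():
    ...

@op
def myop(dfs):
    # do something with these input data frames
    ...

@graph
def mygraph():
    myop([asset1(), asset2()])

mygraph_job = mygraph.to_job()
```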
I've discovered two problems that I think are probably symptoms of the same underlying problem: this pattern isn't doing what I think it's doing. First, in the UI, the View As Asset Graph view is empty and tells me that things will only appear there when I add Assets to my Definitions. Second, I've realized that the 'assets' (or what I'm thinking of as assets) are getting stored not in the blob storage's directory, but in separate directories under a 'storage' directory. So I think that, although my assets are defined as Assets (and they work as assets, storing themselves correctly outside this graph/job), they're no longer assets when they're pulled into this graph? But I'm pretty confused about how to fix it. Any help would be very appreciated.
What I'm trying to do is simply collect a bunch of data frames and send emails about them. Each data frame is an asset, and I have an op that takes a list of data frames. It works as I've set it up, but with this weird storage situation, where the assets are materialized in `/storage/something-like-a-uuid/asset/result`, under a different UUID for each run of the job.
s
Are you trying to materialize those assets inside the job? Or just read from them? The former is not currently possible, but the latter is. Here's an example of how to do the latter: https://docs.dagster.io/concepts/ops-jobs-graphs/graphs#loading-an-asset-as-an-input
n
Thank you! That's the example I'm trying to use. When I follow the example and bring the assets into a job directly, I get an unsatisfying-looking visual that shows the ops that are part of the job but doesn't give any info about the underlying assets.
In my case I'm also using an op factory to make an op for each of the assets I'm sending emails about, so I end up with a bunch of these unconnected ops and nothing to show what assets they rely on.
When I use a graph, the UI does show me how the assets relate to the send-emails op. But when the job runs, it seems to be a self-contained thing that re-materializes the assets every time and stores them in a separate sub-directory.
I'm just trying to understand why these things work this way. There's probably a reason I'm mistaken to think that the assets should appear in the first, job-based version of this; maybe I'm thinking about this system the wrong way. Is there something about what Ops and Assets are for that explains why the behavior I'm seeing is the behavior I should want?
s
Still trying to understand exactly what your goal is: do you want your job to materialize those assets every time it runs? Or do the assets get materialized separately, and you just want your job to read them and then send emails based on them?
n
I can deal with either way of doing it. I'm fine with the constraint that the job emailing about the assets can only read them after they've been materialized.
I just wish that job showed me, in the UI, what assets it depends on.
(and the fact that the UI doesn't show me makes me think I'm still doing something in a non-idiomatic (Dagster-matic?) way)
I'm experimenting with setting the `ins` explicitly, e.g. `ins={"errs": AssetIn(asset_key[0])})(err)`, trying to use the OpFactory pattern from the docs.
s
> I just wish that job showed me, in the UI, what assets it depends on
Got it - that makes total sense. I agree with you. I filed an issue to track this: https://github.com/dagster-io/dagster/issues/13428.
n
Okay, sweet. Thanks! And I guess that also means I'm not just doing something boneheaded. Thanks for the help!