#dagster-support

Alec Ryan

04/18/2022, 12:30 PM
Hey there 👋, so I've spent some time trying to digest the concept of an asset. I understand it to be an op that materializes an asset (table, dbt model, etc.). In a typical pipeline, I assume that you would use a combination of ops and assets in the same job. For example, an op that deletes a partition out of a table and then an asset that loads data into that table. How can I use them in conjunction? The docs don't show an example of this that I can find, so I'm wondering if this is intentional. Thanks!

johann

04/18/2022, 8:39 PM
cc @owen

owen

04/18/2022, 8:43 PM
just elaborating on my response from another thread, the way I would conceptualize this would be that the code required to update this external data object would be the combination of those two operations (as the delete step must be run every time you want to add data to that table). This means that both of these steps should be part of the definition of this asset. For now, the best solution would be to combine both the deletion and the load into the same op, but this has obvious drawbacks. For cases like these, we're extending the asset interface to allow an asset to be backed by a graph of ops (in this case, delete_op->load_op), rather than a single op.
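A minimal sketch of the "delete and load in the same step" idea, in plain Python rather than Dagster code. It assumes an in-memory SQLite table standing in for the Snowflake table; the function name `replace_partition` and the `events` table are hypothetical. In Dagster, a body like this would sit inside the single op or asset function.

```python
import sqlite3


def replace_partition(conn, partition_date, rows):
    """Delete one date partition, then load fresh rows for it, as a single step."""
    # One transaction covers both statements, so a failed load rolls back the delete.
    with conn:
        conn.execute("DELETE FROM events WHERE event_date = ?", (partition_date,))
        conn.executemany(
            "INSERT INTO events (event_date, value) VALUES (?, ?)",
            [(partition_date, value) for value in rows],
        )


# SQLite stands in for the warehouse here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_date TEXT, value INTEGER)")

# Re-running the step for the same partition replaces its rows instead of duplicating them.
replace_partition(conn, "2022-04-18", [1, 2, 3])
replace_partition(conn, "2022-04-18", [4, 5])
replace_partition(conn, "2022-04-17", [9])

loaded = conn.execute(
    "SELECT event_date, value FROM events ORDER BY event_date, value"
).fetchall()
```

Keeping both statements in one function is what makes the step safe to re-run for a partition, at the cost of not being able to observe or retry the delete and the load separately.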

Alec Ryan

04/18/2022, 8:44 PM
@owen how would I include this in a job?
It seems like assets & graphs are somewhat disjointed?

owen

04/18/2022, 8:50 PM
the way I would view it is that, when you define a group of assets (AssetGroup), you are saying that "this is a group of data objects that I want to exist, and here's how to build each one from its parents". Dagster then is able to create a job that will be able to materialize these data objects (automatically creating/organizing a graph of operations based on the relationships between each asset)
so it's less of "adding assets into a job" and more of "creating a job from a set of assets"
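As a toy model of "creating a job from a set of assets", the sketch below derives a run order purely from declared parent relationships. It uses Python's standard `graphlib` module and is not Dagster's implementation; the asset names and the helper `plan_materialization` are hypothetical.

```python
from graphlib import TopologicalSorter

# Each asset names the upstream assets it is built from (hypothetical names).
asset_deps = {
    "raw_orders": set(),
    "raw_customers": set(),
    "orders_cleaned": {"raw_orders"},
    "customer_orders": {"orders_cleaned", "raw_customers"},
}


def plan_materialization(deps):
    """Return an order in which every asset comes after all of its parents."""
    return list(TopologicalSorter(deps).static_order())


plan = plan_materialization(asset_deps)
```

Nothing in `asset_deps` says when to run each step. The order falls out of the "built from" relationships, which is the sense in which the job is created from the assets rather than the assets being added to a job.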

Alec Ryan

04/18/2022, 8:53 PM
Got it, that does seem clear. Say I have an op or graph that I want to run prior to creating a job from a set of assets. How can I do that? In my case, it is loading data from s3 to snowflake prior to materializing my dbt models
Does the s3 to snowflake bit need to be encapsulated in an asset as well?

owen

04/18/2022, 8:54 PM
I think that would be the most straightforward way to do it (and mesh the best with the UI/concepts built around SDAs)
the dbt asset integration parses the sources in the dbt project, so you'd need to have a source (or sources) set up to represent the stuff you're loading into snowflake

Alec Ryan

04/18/2022, 8:56 PM
So I guess the question is, why would I ever use graphs? It seems like the best way to build any data pipeline would be to use SDAs and create jobs from them
In some sense, SDAs are defining graphs right?

owen

04/18/2022, 8:59 PM
Sometimes you'll have intermediate steps that do not create persistent data objects, so for that use case people will still need to interact with the graph object (even if that's just one subgraph in a larger graph of connected assets), but in general you're right that for people defining data pipelines, we hope that most people see SDAs as the superior option 🙂

Alec Ryan

04/18/2022, 9:01 PM
How can I build dependencies between assets and ops that don't create persistent data objects?

owen

04/18/2022, 9:01 PM
SDAs do get compiled into graphs (and from there, jobs). in some sense, graphs/jobs/ops are the core executable entities for Dagster, and SDAs are a more convenient way of creating these executable things
as for the dependencies thing, this interface does not exist yet (although will soon!):
@graph_asset
def my_cool_asset(some_dep):
    return write_to_table_op(transform_op1(transform_op2(some_dep)))
here, we have a bunch of different ops involved in the creation of the single asset my_cool_asset

Alec Ryan

04/18/2022, 9:04 PM
ooo that would be great
So then that asset can be added to an asset group?

owen

04/18/2022, 9:04 PM
yep exactly

Alec Ryan

04/18/2022, 9:05 PM
can we also build dependencies between that example asset ^ and say a dbt_model_asset?
How would dbt know what its source(s) are?

owen

04/18/2022, 9:06 PM
for upstream dependencies, we parse the sources defined in the dbt project: https://docs.getdbt.com/docs/building-a-dbt-project/using-sources

Alec Ryan

04/18/2022, 9:07 PM
And you can tell what is upstream from that source table?
An s3 bucket for example?
If there is an op that loads data to s3 and that s3 file is manifested as an asset, that is

owen

04/18/2022, 9:10 PM
so for your use case of (s3 -> snowflake) -> dbt, I would specify whatever snowflake table is being updated in that first step as a source to the dbt project (in sources.yml). Then, the SDA created by dagster for that dbt project will know that it depends on some upstream asset (that snowflake table). You could then supply a definition for that asset (which is just the s3->snowflake process), by creating another SDA. Now that Dagster knows "dbt depends on upstream snowflake table" and "this step creates that snowflake table", it can wire those steps in series.
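The wiring described here comes down to matching names: a dbt source is an asset key with no definition until some other asset supplies one. The plain-Python sketch below illustrates that matching only; it does not use the dagster-dbt API, and every key and the helper `wire_assets` are hypothetical.

```python
# Hypothetical asset keys: what the dbt project declares as sources, and
# what each dbt model selects from.
dbt_sources = {"snowflake_raw_events"}
dbt_models = {
    "stg_events": {"snowflake_raw_events"},
    "daily_event_counts": {"stg_events"},
}

# Assets defined outside dbt, e.g. the s3 -> snowflake load step.
other_assets = {"snowflake_raw_events": set()}


def wire_assets(dbt_sources, dbt_models, other_assets):
    """Merge both sets of definitions and report sources that nothing produces."""
    combined = {**other_assets, **dbt_models}
    unresolved = {source for source in dbt_sources if source not in combined}
    return combined, unresolved


combined, unresolved = wire_assets(dbt_sources, dbt_models, other_assets)
```

With the load step defined under the same key as the dbt source, `unresolved` is empty and the load step becomes the parent of `stg_events`. Without it, the source stays an external table that the models only read from.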

Alec Ryan

04/18/2022, 9:19 PM
And these can all be included in the same AssetGroup?

owen

04/18/2022, 9:19 PM
yeah exactly

Alec Ryan

04/18/2022, 9:20 PM
Okay, interesting. Have you seen most users shifting to using assets only? Based on the content of recent YouTube uploads, it seems that these are relatively new

owen

04/18/2022, 9:22 PM
yep they are definitely new (basically didn't advertise them at all until a couple months ago) but we have seen a pretty good amount of adoption. we're still missing a lot of docs / tutorial content, as well as some features (like the @graph_asset thing) that we'd like before we start pointing people to these interfaces as the primary way to interact with Dagster

Alec Ryan

04/18/2022, 9:32 PM
Good to know, thanks for all of the helpful info. Looking forward to digging into this more
Final question... Say I have a dbt asset group, a snowflake asset group, etc. Can multiple asset groups be grouped into one asset group?

owen

04/18/2022, 9:37 PM
Would this mostly be for organizational purposes? right now, you would just create a single asset group containing all of your assets, but we're looking into other ways of organizing assets (potentially doing something like what you're describing of combining multiple asset groups into a single one)
and no problem 🙂

Alec Ryan

04/18/2022, 9:38 PM
Yup, just for organization was my thought

Dimitris Stafylarakis

05/16/2022, 10:13 AM
hi @owen I’m also testing Dagster’s SDAs these days and I’d like to butt in here with a small question 😄 trying to implement your approach, it seems to work nicely 👍 dbt assets are now labeled with “upstream changed” after materializing them, but I don’t understand why, perhaps you have a clue? Maybe worth mentioning that the upstream op is (daily) partitioned and I’ve only run one partition till now. Also, I’m running dagster 0.14.15

owen

05/16/2022, 4:16 PM
hi @Dimitris Stafylarakis, thanks for the report! that definitely sounds like a bug -- we can look into what's going on there 🙂