Chris Histe
07/21/2022, 3:39 PM
'AssetsDefinition' object has no attribute 'configured'
is this in the backlog? Or am I using Dagster incorrectly? Thanks in advance.
jamie
07/21/2022, 4:00 PM
my assumption is that you want to use the configured api on the ops that make up the graph, then turn the graph into an asset. Let me know if that assumption is incorrect!
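(For reference, the pattern jamie describes — configuring the ops and then turning the graph into an asset — could look roughly like the sketch below. The op and graph names are made up; it assumes AssetsDefinition.from_graph from current Dagster.)

from dagster import AssetsDefinition, OpExecutionContext, graph, op

@op(config_schema={"greeting": str})
def make_greeting(context: OpExecutionContext) -> str:
    # config is set per-op via .configured(), not on the asset
    return context.op_config["greeting"]

@op
def shout(text: str) -> str:
    return text + "!!!"

@graph
def greeting_graph():
    # .configured() works here because it is applied to an op inside the graph
    return shout(make_greeting.configured({"greeting": "hello"}, name="make_hello")())

# the whole graph then becomes a single software-defined asset
greeting_asset = AssetsDefinition.from_graph(greeting_graph)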
Chris Histe
07/21/2022, 5:00 PM
I'm trying to use .configured. It works on ops but not for assets.
I'm a bit confused about when to use assets, when to use ops, and when to use graphs.
Shouldn't the asset and op APIs be equivalent?
jamie
07/21/2022, 5:32 PM
an op is like a node and a graph defines the set of edges between the nodes
Chris Histe
07/21/2022, 7:52 PM
@op(
    required_resource_keys={"bigquery"},
    config_schema={"table_name": str},
    ins={"dataframe": In()},
)
def write_to_bigquery(context: OpExecutionContext, dataframe: pd.DataFrame) -> None:
    table_name = context.op_config["table_name"]
    context.resources.bigquery.load_table_from_dataframe(dataframe, table_name).result()


@graph
def my_graph():
    names, ages, heights = get_dataframes()
    write_to_bigquery.configured({"table_name": "names"}, "write_metadata_to_bigquery")(names)
    write_to_bigquery.configured({"table_name": "ages"}, "write_packages_to_bigquery")(ages)
    write_to_bigquery.configured({"table_name": "heights"}, "write_repositories_to_bigquery")(heights)
This is a simplified example of my code. I would like the @op to be an asset since it's actually creating tables.
That link you sent is very useful, thanks a lot.
get_dataframes is a @multi_asset
jamie
07/21/2022, 8:14 PM
• i'd start by making three assets: names, ages, and heights
◦ it's helpful to note that you don't need to write a separate op to produce an asset, the function that the @asset decorator decorates is what should produce the asset
◦ within the body of the asset function you can use the bigquery resource to upload the dataframe
▪︎ eventually you should consider using/writing a bigquery IO manager. basically this means that the dagster machinery would call the uploading functions for you. docs
starting out this way might help you get a feel for assets and how they work. then it should be pretty trivial to combine them into a multi_asset.
here's some pseudocode of what i'm thinking for each phase
@asset
def names(context):
    names_df = ...  # code to create the names dataframe
    return names_df
start with something like this for each of your assets. When you materialize them, they'll be saved to your local computer. So the next thing we'll do is set them up in bigquery
@asset(
    required_resource_keys={"bigquery"},
)
def names(context):
    names_df = ...  # code to create the names dataframe
    context.resources.bigquery.load_table_from_dataframe(names_df, "names").result()
    return names_df
this will upload everything to bigquery, but still save all the dfs to your local machine (this is because dagster uses the file system io manager by default, so everything that is returned from an asset will be saved to your local file system)
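(As a side note, supplying that bigquery resource and materializing the assets from a script could look roughly like this — a sketch that assumes the names/ages/heights assets above and a resource wrapping google-cloud-bigquery's Client:)

from dagster import materialize, resource, with_resources
from google.cloud import bigquery

# a minimal resource that just hands back a BigQuery client
@resource
def bigquery_client(_init_context):
    return bigquery.Client()

# bind the "bigquery" resource key the assets ask for, then materialize in-process
bound_assets = with_resources(
    [names, ages, heights],
    resource_defs={"bigquery": bigquery_client},
)
materialize(bound_assets)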
The next iteration would be to write a bigquery io manager, and then you can make your assets simpler again
@asset(
    io_manager_key="bigquery",
)
def names(context):
    names_df = ...  # code to create the names dataframe
    return names_df  # when this is returned, the bigquery io manager will upload it to bigquery
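(A bigquery io manager along those lines might look roughly like the sketch below. The dataset config and the convention of using the asset key as the table name are assumptions, not part of the thread.)

from dagster import IOManager, io_manager
from google.cloud import bigquery

class BigQueryDataFrameIOManager(IOManager):
    def __init__(self, dataset: str):
        self._client = bigquery.Client()
        self._dataset = dataset

    def handle_output(self, context, obj):
        # called with whatever the asset returns; the asset key doubles as the table name
        table = f"{self._dataset}.{context.asset_key.path[-1]}"
        self._client.load_table_from_dataframe(obj, table).result()

    def load_input(self, context):
        # called when a downstream asset asks for this asset as an input
        table = f"{self._dataset}.{context.asset_key.path[-1]}"
        return self._client.query(f"SELECT * FROM `{table}`").to_dataframe()

@io_manager(config_schema={"dataset": str})
def bigquery_io_manager(init_context):
    return BigQueryDataFrameIOManager(init_context.resource_config["dataset"])

It would then be bound to the "bigquery" io_manager_key the same way a resource is, e.g. via with_resources(..., resource_defs={"bigquery": bigquery_io_manager.configured({"dataset": "my_dataset"})}).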
combining everything into a multi asset would look like this (this multi asset would take the place of the three separate assets in the previous examples)
@multi_asset(
    io_manager_key="bigquery",
)
def all_data():
    names_df = ...    # code to create the names dataframe
    ages_df = ...     # code to create the ages dataframe
    heights_df = ...  # code to create the heights dataframe
    return names_df, ages_df, heights_df
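(One detail worth flagging on that last sketch: @multi_asset doesn't accept io_manager_key directly — each output is declared with an AssetOut, and the io_manager_key goes there. A version closer to what current Dagster expects might look like this:)

from dagster import AssetOut, multi_asset

@multi_asset(
    outs={
        "names": AssetOut(io_manager_key="bigquery"),
        "ages": AssetOut(io_manager_key="bigquery"),
        "heights": AssetOut(io_manager_key="bigquery"),
    }
)
def all_data():
    names_df = ...    # code to create the names dataframe
    ages_df = ...     # code to create the ages dataframe
    heights_df = ...  # code to create the heights dataframe
    # returned in the same order the outs are declared
    return names_df, ages_df, heights_df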
Chris Histe
07/21/2022, 8:24 PM
.configured