# ask-community
c
Hi support 🙂
`'AssetsDefinition' object has no attribute 'configured'`
Is this on the backlog, or am I using Dagster incorrectly? Thanks in advance.
j
Hi @Chris Histe, can you share a code snippet? That would be super helpful for figuring out what's going on. My best guess based on the error message: I'm assuming you are creating an asset from a graph and then attempting to use the `configured` API on the asset. If that's correct, you'll instead want to use the `configured` API on the ops that make up the graph, and then turn the graph into an asset. Let me know if that assumption is incorrect!
c
Yes, I’m trying to use `.configured`. It works on ops but not on assets. I’m a bit confused about when to use assets, when to use ops, and when to use graphs. Shouldn’t the asset and op APIs be equivalent?
I can share a snippet when I’m back on my laptop
j
Cool, once there's a code snippet it'll be easier to give more useful advice. In the meantime, we have this quick guide that goes over when to use ops vs. assets: https://docs.dagster.io/guides/dagster/enriching-with-software-defined-assets#when-should-i-use-software-defined-assets (the whole page might be useful to you, but this section is probably the most relevant). For graphs: graphs are a way to organize ops and define the dependencies between them. If you think about a graph in the data-structure sense, an `op` is like a node and a `graph` defines the set of edges between the nodes.
c
import pandas as pd
from dagster import In, OpExecutionContext, graph, op

@op(
    required_resource_keys={"bigquery"},
    config_schema={"table_name": str},
    ins={"dataframe": In()},
)
def write_to_bigquery(context: OpExecutionContext, dataframe: pd.DataFrame) -> None:
    table_name = context.op_config["table_name"]
    # Load into the configured table and block until the load job finishes
    context.resources.bigquery.load_table_from_dataframe(dataframe, table_name).result()

@graph
def my_graph():
    names, ages, heights = get_dataframes()

    write_to_bigquery.configured({"table_name": "names"}, "write_metadata_to_bigquery")(names)
    write_to_bigquery.configured({"table_name": "ages"}, "write_packages_to_bigquery")(ages)
    write_to_bigquery.configured({"table_name": "heights"}, "write_repositories_to_bigquery")(heights)
This is a simplified example of my code. I would like the @op to be an asset since it’s actually creating tables. That link you sent is very useful, thanks a lot.
`get_dataframes` is a `@multi_asset`.
I’m sensing that maybe I’m doing something incorrectly. My @multi_asset called get_dataframes could simply be an @op with multiple outputs. Its output is not kept permanently; I only use it to produce my tables. Should I just make my tables assets and any computation to produce them ops?
The problem here is that my graph is producing 3 different tables, I would like them to be 3 different assets. Or should I consider those 3 tables as the same asset? They will always be updated at the same time.
j
Based on the code snippet, I think you could start by doing the following:
• create three assets: `names`, `ages`, and `heights`
    ◦ it's helpful to note that you don't need to write a separate op to produce an asset; the function that the `@asset` decorator decorates is what should produce the asset
    ◦ within the body of the asset function you can use the bigquery resource to upload the dataframe
        ▪︎ eventually you should consider using/writing a bigquery IO manager. Basically this means that the Dagster machinery would call the uploading functions for you (docs)
Starting out this way might help you get a feel for assets and how they work. Then it should be pretty trivial to combine them into a multi_asset. Here's a rough sketch of what I'm thinking for each phase:
from dagster import asset
import pandas as pd

@asset
def names(context):
    names_df = pd.DataFrame()  # placeholder: code to create the names dataframe
    return names_df
Start with something like this for each of your assets. When you materialize them, they'll be saved to your local computer. So the next thing we'll do is set them up in BigQuery.
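from dagster import asset
import pandas as pd

@asset(
    required_resource_keys={"bigquery"}
)
def names(context):
    names_df = pd.DataFrame()  # placeholder: code to create the names dataframe
    # The destination may need to be a fully qualified "dataset.table" ID;
    # .result() waits for the load job so failures surface in this step.
    context.resources.bigquery.load_table_from_dataframe(names_df, "names").result()
    return names_df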
This will upload everything to BigQuery, but still save all the dfs to your local machine (this is because Dagster uses the filesystem IO manager by default, so everything that is returned from an asset will be saved to your local file system). The next iteration would be to write a BigQuery IO manager; then you can make your assets simpler again.
from dagster import asset
import pandas as pd

@asset(
    io_manager_key="bigquery"
)
def names(context):
    names_df = pd.DataFrame()  # placeholder: code to create the names dataframe
    return names_df  # when this is returned, the bigquery IO manager will upload it to BigQuery
Combining everything into a multi-asset would look like this (this multi-asset would replace the three separate assets from the previous examples):
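from dagster import AssetOut, multi_asset
import pandas as pd

# multi_asset declares one AssetOut per table; each carries its own io_manager_key
@multi_asset(
    outs={
        "names": AssetOut(io_manager_key="bigquery"),
        "ages": AssetOut(io_manager_key="bigquery"),
        "heights": AssetOut(io_manager_key="bigquery"),
    }
)
def all_data():
    names_df = pd.DataFrame()  # placeholder: code to create names dataframe
    ages_df = pd.DataFrame()  # placeholder: code to create ages df
    heights_df = pd.DataFrame()  # placeholder: code to create heights df
    return names_df, ages_df, heights_df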
c
I love the IO manager idea; I’ll definitely implement that. My code is a bit more complex: my multi_asset is the output of a Kubernetes pod. I really need this to be a multi_asset that returns multiple outputs, and then to write multiple tables with that. But I think this gave me some ideas on how to remove the `.configured`.
j
If you have a BigQuery IO manager, when you return multiple outputs from the multi-asset, it will create a new table for each output! Definitely let me know if there is anything else I can help out with as you work on this.
c
I see. What if I had an intermediary step between my multi-asset and writing to BigQuery? Should I just make this code part of the multi-asset code? It seems like a graph is the right abstraction for me. Do you agree?
j
Yeah, if you have multiple steps to get your assets into the state you want them stored in, you should look into graph-backed assets: https://docs.dagster.io/concepts/assets/software-defined-assets#graph-backed-assets Basically, you write all of the ops you need to make your assets, and then the values you return from the final graph are what will be stored as assets by the IO manager.
c
Perfect thanks a lot