Chris Histe
07/21/2022, 3:39 PM
'AssetsDefinition' object has no attribute 'configured'
is this in the backlog? Or am I using Dagster incorrectly? Thanks in advance.
jamie
07/21/2022, 4:00 PM
my assumption is that you want to use the configured api on the ops that make up the graph, then turn the graph into an asset. Let me know if that assumption is incorrect!
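(For reference, the pattern jamie describes — configuring the ops and then turning the graph into an asset — could look roughly like the sketch below. The op and graph names are made up; it assumes AssetsDefinition.from_graph from current Dagster.)

from dagster import AssetsDefinition, OpExecutionContext, graph, op

@op(config_schema={"greeting": str})
def make_greeting(context: OpExecutionContext) -> str:
    # config is set per-op via .configured(), not on the asset
    return context.op_config["greeting"]

@op
def shout(text: str) -> str:
    return text + "!!!"

@graph
def greeting_graph():
    # .configured() works here because it is applied to an op inside the graph
    return shout(make_greeting.configured({"greeting": "hello"}, name="make_hello")())

# the whole graph then becomes a single software-defined asset
greeting_asset = AssetsDefinition.from_graph(greeting_graph)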
Chris Histe
07/21/2022, 5:00 PM
I'm trying to use .configured. It works on ops but not for assets.
I'm a bit confused about when to use assets, when to use ops, and when to use graphs.
Shouldn't the asset and op APIs be equivalent?
jamie
07/21/2022, 5:32 PM
an op is like a node and a graph defines the set of edges between the nodes
Chris Histe
07/21/2022, 7:52 PM
@op(
    required_resource_keys={"bigquery"},
    config_schema={"table_name": str},
    ins={"dataframe": In()},
)
def write_to_bigquery(context: OpExecutionContext, dataframe: pd.DataFrame) -> None:
    table_name = context.op_config["table_name"]
    context.resources.bigquery.load_table_from_dataframe(dataframe, table_name).result()


@graph
def my_graph():
    names, ages, heights = get_dataframes()
    write_to_bigquery.configured({"table_name": "names"}, "write_metadata_to_bigquery")(names)
    write_to_bigquery.configured({"table_name": "ages"}, "write_packages_to_bigquery")(ages)
    write_to_bigquery.configured({"table_name": "heights"}, "write_repositories_to_bigquery")(heights)
This is a simplified example of my code. I would like the @op to be an asset since it's actually creating tables.
That link you sent is very useful, thanks a lot.
get_dataframes is a @multi_asset
jamie
07/21/2022, 8:14 PM
• i'd start by making three assets: names, ages, and heights
◦ it's helpful to note that you don't need to write a separate op to produce an asset, the function that the @asset decorator decorates is what should produce the asset
◦ within the body of the asset function you can use the bigquery resource to upload the dataframe
▪︎ eventually you should consider using/writing a bigquery IO manager. basically this means that the dagster machinery would call the uploading functions for you. docs
starting out this way might help you get a feel for assets and how they work. then it should be pretty trivial to combine them into a multi_asset.
here's some pseudocode of what i'm thinking for each phase
@asset
def names(context):
    names_df = ...  # code to create the names dataframe
    return names_df
start with something like this for each of your assets. When you materialize them, they'll be saved to your local computer. So the next thing we'll do is set them up in bigquery
@asset(
    required_resource_keys={"bigquery"},
)
def names(context):
    names_df = ...  # code to create the names dataframe
    context.resources.bigquery.load_table_from_dataframe(names_df, "names").result()
    return names_df
this will upload everything to bigquery, but still save all the dfs to your local machine (this is because dagster uses the file system io manager by default, so everything that is returned from an asset will be saved to your local file system)
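(As a side note, supplying that bigquery resource and materializing the assets from a script could look roughly like this — a sketch that assumes the names/ages/heights assets above and a resource wrapping google-cloud-bigquery's Client:)

from dagster import materialize, resource, with_resources
from google.cloud import bigquery

# a minimal resource that just hands back a BigQuery client
@resource
def bigquery_client(_init_context):
    return bigquery.Client()

# bind the "bigquery" resource key the assets ask for, then materialize in-process
bound_assets = with_resources(
    [names, ages, heights],
    resource_defs={"bigquery": bigquery_client},
)
materialize(bound_assets)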
The next iteration would be to write a bigquery io manager, and then you can make your assets simpler again
@asset(
    io_manager_key="bigquery",
)
def names(context):
    names_df = ...  # code to create the names dataframe
    return names_df  # when this is returned, the bigquery io manager will upload it to bigquery
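(A bigquery io manager along those lines might look roughly like the sketch below. The dataset config and the convention of using the asset key as the table name are assumptions, not part of the thread.)

from dagster import IOManager, io_manager
from google.cloud import bigquery

class BigQueryDataFrameIOManager(IOManager):
    def __init__(self, dataset: str):
        self._client = bigquery.Client()
        self._dataset = dataset

    def handle_output(self, context, obj):
        # called with whatever the asset returns; the asset key doubles as the table name
        table = f"{self._dataset}.{context.asset_key.path[-1]}"
        self._client.load_table_from_dataframe(obj, table).result()

    def load_input(self, context):
        # called when a downstream asset asks for this asset as an input
        table = f"{self._dataset}.{context.asset_key.path[-1]}"
        return self._client.query(f"SELECT * FROM `{table}`").to_dataframe()

@io_manager(config_schema={"dataset": str})
def bigquery_io_manager(init_context):
    return BigQueryDataFrameIOManager(init_context.resource_config["dataset"])

It would then be bound to the "bigquery" io_manager_key the same way a resource is, e.g. via with_resources(..., resource_defs={"bigquery": bigquery_io_manager.configured({"dataset": "my_dataset"})}).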
combining everything into a multi asset would look like this (this multi asset would take the place of the three separate assets in the previous examples)
@multi_asset(
    io_manager_key="bigquery",
)
def all_data():
    names_df = ...    # code to create the names dataframe
    ages_df = ...     # code to create the ages dataframe
    heights_df = ...  # code to create the heights dataframe
    return names_df, ages_df, heights_df
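(One detail worth flagging on that last sketch: @multi_asset doesn't accept io_manager_key directly — each output is declared with an AssetOut, and the io_manager_key goes there. A version closer to what current Dagster expects might look like this:)

from dagster import AssetOut, multi_asset

@multi_asset(
    outs={
        "names": AssetOut(io_manager_key="bigquery"),
        "ages": AssetOut(io_manager_key="bigquery"),
        "heights": AssetOut(io_manager_key="bigquery"),
    }
)
def all_data():
    names_df = ...    # code to create the names dataframe
    ages_df = ...     # code to create the ages dataframe
    heights_df = ...  # code to create the heights dataframe
    # returned in the same order the outs are declared
    return names_df, ages_df, heights_df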
Chris Histe
07/21/2022, 8:24 PM
.configured