# ask-community
b
👋 Heya folks, I'm working through Snowflake and Dagster SDAs and I've got a question - it's highly likely that I'm failing to understand something, but in cases where I want to read in a table that already exists in Snowflake, how do I use that asset later on? The example gives the following:
```python
import pandas as pd
from dagster import asset

@asset(
    key_prefix=["my_schema"]  # will be used as the schema in Snowflake
)
def my_table() -> pd.DataFrame:  # the name of the asset will be the table name
    ...
```
j
@Bojan you'll need to specify an IO manager to replace the default filesystem IO manager (which just stores the pickled value that the function returns). dagster-snowflake provides an IO manager that does exactly what you're describing.
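(For reference, a minimal sketch of wiring that up, assuming the dagster-snowflake-pandas package's `SnowflakePandasIOManager`; the account, database, and warehouse values are placeholders:)
```python
from dagster import Definitions, EnvVar
from dagster_snowflake_pandas import SnowflakePandasIOManager

defs = Definitions(
    assets=[my_table],  # my_table as defined above
    resources={
        # replaces the default filesystem IO manager for these assets
        "io_manager": SnowflakePandasIOManager(
            account="abc12345.us-east-1",  # placeholder
            user=EnvVar("SNOWFLAKE_USER"),
            password=EnvVar("SNOWFLAKE_PASSWORD"),
            database="MY_DATABASE",
            warehouse="MY_WAREHOUSE",
        )
    },
)
```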
b
Oh, I did all of that; the issue is that I can't really use the initial table in the example above - I don't know how to reference it.
j
@Bojan if you're using the IO manager from dagster-snowflake, that IO manager provides you the contents of the table as an input to the downstream asset.
If you instead want to operate on the table directly, you might need to define a custom IO manager that provides a table reference instead of the contents.
Almost all of our IO managers are custom because of similar needs
(e.g., we don't want to fetch full tables all of the time).
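(A rough sketch of that reference-passing idea, purely illustrative - the class name, config keys, and naming scheme below are made up for this example:)
```python
from dagster import IOManager, io_manager

class TableRefIOManager(IOManager):
    """Hands downstream assets a table *reference* rather than its contents."""

    def __init__(self, database: str, schema: str):
        self._database = database
        self._schema = schema

    def handle_output(self, context, obj):
        # Writing is out of scope for this sketch; assume the table is
        # populated elsewhere (or by another IO manager).
        pass

    def load_input(self, context):
        # Downstream assets receive a fully-qualified table name as a string
        table = context.upstream_output.asset_key.path[-1]
        return f"{self._database}.{self._schema}.{table}"

@io_manager(config_schema={"database": str, "schema": str})
def table_ref_io_manager(init_context):
    return TableRefIOManager(**init_context.resource_config)
```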
b
Gotcha, I'm still testing this out, but I suppose I'm getting some of these asset concepts wrong - e.g. even when I'm using Dagster's official Snowflake IO manager. Let's say I want to read in the whole table as a DataFrame, slice it around, and then pass it downstream. For that initial table read, should I try to create an asset (and if so, how would I go about it) or should I use an op? E.g.:
```python
@asset(
    key_prefix=["my_schema"]  # will be used as the schema in Snowflake
)
def scrub_my_table() -> pd.DataFrame:
    return my_table.dropna()  # This obviously doesn't work, but how would I read in my_table initially?
```
j
You could do 2 things:
1. Create an asset to represent the table and pass it downstream. The IO manager for your table asset will then be responsible for how the table is presented to the downstream asset.
2. Just configure your downstream asset with a resource to access the table and do the reading in the op (the function of the SDA).
If you need to ensure ordering between the table and any downstream assets (e.g., you want to update the table then consume it) - I would go with option 1.
If the table is static, or you don't need to worry about ordering, option 2 is much simpler.
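(A minimal sketch of option 1, assuming a Snowflake IO manager like the one above is bound as `io_manager` - the asset and schema names are just the ones from this thread:)
```python
import pandas as pd
from dagster import AssetIn, asset

@asset(key_prefix=["my_schema"])
def my_table() -> pd.DataFrame:
    # Whatever produces the table; the IO manager writes the returned
    # DataFrame to my_schema.my_table in Snowflake.
    ...

@asset(
    key_prefix=["my_schema"],
    ins={"my_table": AssetIn(key_prefix=["my_schema"])},
)
def scrub_my_table(my_table: pd.DataFrame) -> pd.DataFrame:
    # The IO manager loads my_schema.my_table back as a DataFrame here
    return my_table.dropna()
```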
b
Sorry for not replying @James Hale, I got busy with other stuff and completely forgot - I'll get back to this tomorrow. Thanks for the input, you're awesome!
I'm essentially trying to achieve 1., but I guess I haven't figured out how to create and return the table from an asset using the Snowflake IO manager.
@James Hale sorry to be bothersome, but if you have an example for "Create an asset to represent the table and pass it downstream," it would be quite helpful!
j
@Bojan do you have logic that adds values to the table? For us, table creation is handled in the IO manager. E.g., we have an IO manager that implements `handle_output()` as:
1. Store the `obj` output yielded by the software-defined asset as a JSON temp file
2. Create a table on Snowflake if it doesn't exist
3. Create a stage on Snowflake if it doesn't exist
4. `PUT` the file to the stage
5. `MERGE` the contents of the file into the table

You can see that IO manager here - it's implemented with SQLAlchemy: https://gist.github.com/jayhale/c5f08dcd1656db1b82e3177425911091
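(Not the gist's actual code, but an illustrative skeleton of those five steps - the table columns and the final statement are simplified, and a real MERGE would match on a key column:)
```python
import json
import tempfile

from dagster import IOManager

class StagedMergeIOManager(IOManager):
    """Illustrative skeleton only. `conn` is assumed to be a DB-API
    connection to Snowflake (e.g. from snowflake-connector-python)."""

    def __init__(self, conn, schema: str):
        self._conn = conn
        self._schema = schema

    def handle_output(self, context, obj):
        table = context.asset_key.path[-1]
        stage = f"{table}_stage"
        # 1. Store the output as a JSON temp file
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(obj, f)
            path = f.name
        cur = self._conn.cursor()
        # 2. Create the table if it doesn't exist (columns simplified)
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS {self._schema}.{table} (v VARIANT)"
        )
        # 3. Create the stage if it doesn't exist
        cur.execute(f"CREATE STAGE IF NOT EXISTS {self._schema}.{stage}")
        # 4. PUT the file to the stage
        cur.execute(f"PUT file://{path} @{self._schema}.{stage}")
        # 5. Load the staged contents into the table (shown as a plain COPY;
        #    a real MERGE would match on a key column)
        cur.execute(
            f"COPY INTO {self._schema}.{table} "
            f"FROM @{self._schema}.{stage} FILE_FORMAT = (TYPE = 'JSON')"
        )

    def load_input(self, context):
        # This sketch hands downstream assets the table name, not its contents
        table = context.upstream_output.asset_key.path[-1]
        return f"{self._schema}.{table}"
```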