# integration-snowflake
b
When using `dagster-snowflake-pandas`, when an `@asset` function returns a DataFrame, does Dagster download the data into a DataFrame and hold it in memory? If yes, what mechanism does it use to download and build the pandas DataFrame? Would it be possible to convert it into a Polars DataFrame instead?
j
Yes, Dagster downloads the data into a DataFrame in memory. We use the `pd.read_sql` method to do this (here’s the line in the source code). If you wanted to download it as a Polars DataFrame, you could write a new type handler for the Snowflake I/O manager. This process isn’t documented right now (it’s a pretty new internal feature, and writing docs for it is on my to-do list), but I’m happy to walk you through the process if that’s something you’re interested in.
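To illustrate the mechanism described here: `pd.read_sql` issues the query over a DB-API or SQLAlchemy connection and materializes the entire result set as an in-memory pandas DataFrame. A minimal, self-contained sketch, using an in-memory SQLite database as a stand-in for Snowflake (the table and data are made up for illustration):

```python
import sqlite3

import pandas as pd

# Throwaway in-memory database standing in for Snowflake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 9.5), (2, 20.0)])

# Same style of call the Snowflake I/O manager makes against its own
# connection: the whole result set is pulled into memory as a DataFrame.
df = pd.read_sql("SELECT * FROM orders", conn)
print(df.shape)  # (2, 2)
```

Because the full table slice ends up in memory, very large tables may need partitioning or a different loading strategy.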
b
Thanks for the information @jamie. Yes, I am interested in creating such a type handler for Polars. I’m guessing the process involves creating a subclass of `DbTypeHandler`, like https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster[…]andas/dagster_snowflake_pandas/snowflake_pandas_type_handler.py, and “registering” it someplace to have it injected during load/unload?
j
Yep! That’s exactly it. Here are a couple of other examples of type handlers (for DuckDB): pandas and PySpark. Once you write the type handler, you “register” it with the Snowflake I/O manager using the `build_snowflake_io_manager` function, like this: https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster[…]andas/dagster_snowflake_pandas/snowflake_pandas_type_handler.py
Basically, the flow of control goes like this: an asset/op returns an output, then the Snowflake I/O manager figures out what part of the table needs to be deleted (i.e., if you are re-materializing a partition, we delete the old data so that it gets replaced with the new data). Then the `handle_output` method of the type handler is called, and that method is responsible for actually uploading the output to Snowflake.
b
And I see that Dagster first uses the `PUT` Snowflake command to upload the Parquet file, and then the `COPY` Snowflake command to create/upsert table entries. Is it also doing the reverse, `UNLOAD`/`GET`, to download the data?
j
Can you link to the lines where you’re seeing `PUT` commands? Just want to make sure we’re looking at the same thing.
j
Ah, so that’s a method on the Snowflake resource that you can call from the body of an op/asset if you want. It isn’t used by the I/O manager at all.
This is the I/O manager file. It does create a `SnowflakeConnection` object (that’s what’s in `resources.py`), but that’s only used to open a connection to Snowflake; all of the SQL commands are in the I/O manager file.
It’s admittedly a bit confusing that there’s a separate I/O manager and resource right now. We should probably consolidate them into one resource that can also be an I/O manager.