When using `dagster-snowflake-pandas`, when an `@a...
# integration-snowflake
When using
, when an
function returns a Data Frame, does Dagster download the data into Dataframe and hold it in memory? If yes, what mechanism does it use to download and build a Panda dataframe? Can it be possible to convert into a Polars dataframe ?
yes, dagster downloads the data into a dataframe in memory. We use the
method to do this (here’s the line in the source code). If you wanted to download as a Polars dataframe, you could write a new type handler for the snowflake io manager. This process isn’t documented right now (it’s a pretty new internal feature, and writing docs for it is on my to do list). I’m happy to help walk you through the process though if that’s something you’re interested in
Thanks for the information @jamie, yes I am interested in creating such type handler for Polars I am guessing the process involves creating sub-type of
like https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster[…]andas/dagster_snowflake_pandas/snowflake_pandas_type_handler.py And “registering” it someplace to have it injected during load/unload ?
yep! that’s exactly it - here are a couple other examples of type handlers (for duckdb) pandas and pyspark once you write the type handler, you “register” it with the snowflake io manager using the
command like this https://github.com/dagster-io/dagster/blob/master/python_modules/libraries/dagster[…]andas/dagster_snowflake_pandas/snowflake_pandas_type_handler.py
👍 1
basically the flow of control goes like this: an asset/op returns an output, then the snowflake io manager figures out what part of the table needs to be deleted (ie if you are re-materializing a partition, we delete the old data so that it gets replaced with the new data). Then the
method of the type handler is called, and that method is responsible for actually uploading the output to snowflake
and I realize that dagster first uses
snowflake command to upload the parquet file and then
snowflake command to create/upsert table entries.
is it also doing reverse
to download the data ?
can you link to the lines where you’re seeing PUT commands? just want to make sure we’re looking at the same thing
ah so that’s a method on the snowflake resource that you can call from the body of an op/asset if you want. it’s not used by the io manager at all
this is the io manager file - it does create a SnowflakeConnection object (that’s what’s in resources.py) but it’s only used to create a connection to snowflake. all of the SQL commands are in the io manager file
it’s admittedly a bit confusing that there’s a separate io manager and resource right now. we should probably consolidate into one resource that can also be an io manager
👍 1