# integration-snowflake
j
❄️ New in `dagster-snowflake` 0.19.0 ❄️ Happy 1.3 release everyone!

• The Snowflake tutorial and reference page have been updated to use the pythonic resources and config versions of the Snowflake I/O managers. If you are still using the pre-0.19.0 versions of these I/O managers, you can still see the older versions of the tutorial and reference page by selecting a prior version of the documentation in the dropdown at the top of the page.

• This release includes a breaking change for the `SnowflakePandasIOManager` and any other I/O managers using the `SnowflakePandasTypeHandler`. Due to a longstanding issue with the snowflake-python connector, timestamp data cannot be stored in Snowflake directly (the data gets garbled and stored as nonsensical dates). Prior to this release, we converted all timestamp data to strings when storing dataframes and did the reverse conversion when loading them. However, the issue can also be avoided by ensuring that timestamp data has a timezone attached to it. We think this is a less invasive change, so the `SnowflakePandasIOManager` will now attach the UTC timezone to timestamp data that does not already have one. If you would like to continue storing timestamp data as strings, or you have already materialized timestamp data and do not want to migrate the corresponding Snowflake table to one with a TIMESTAMP column, you can set the configuration value `store_timestamps_as_strings=True`. If you have materialized an asset with timestamp data that was stored as strings and would like to migrate the corresponding table so that you can store timestamp data as timestamps, you can follow the instructions in the migration guide.
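For reference, here is a minimal sketch of what setting that config value can look like with the new pythonic resource. The asset, connection values, and warehouse/database names below are placeholders, not part of the announcement:

```python
import pandas as pd
from dagster import Definitions, EnvVar, asset
from dagster_snowflake_pandas import SnowflakePandasIOManager


@asset
def events() -> pd.DataFrame:
    # timestamp data that will be stored as strings with the setting below
    return pd.DataFrame({"ts": pd.to_datetime(["2023-04-01 12:00", "2023-04-02 12:00"])})


defs = Definitions(
    assets=[events],
    resources={
        "io_manager": SnowflakePandasIOManager(
            account="abc1234.us-east-1",  # placeholder account locator/region
            user=EnvVar("SNOWFLAKE_USER"),
            password=EnvVar("SNOWFLAKE_PASSWORD"),
            database="MY_DATABASE",
            warehouse="MY_WAREHOUSE",
            # keep the pre-0.19.0 behavior of storing timestamps as strings
            store_timestamps_as_strings=True,
        )
    },
)
```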
f
Hello 👋 Snowflake's TIMESTAMP_NTZ type is a datetime "without time zone", whereas TIMESTAMP_TZ is a timezone-aware datetime. Using the snowflake-python connector, datetimes with a timezone get converted to TIMESTAMP_NTZ (as "timezone-naive" UTC), so the original timezone and timezone offset information is lost. I think this loss of information could be problematic for datasets containing non-UTC time zones, in scenarios where the timezone offset information is important.
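To make the loss concrete, here is a small pandas-only illustration of what storing a timezone-aware value as "timezone-naive UTC" amounts to. This is not the connector itself, just the equivalent conversion:

```python
import pandas as pd

# A timezone-aware timestamp in a non-UTC zone
ts = pd.Timestamp("2023-05-01 09:30", tz="Europe/Berlin")

# Roughly what landing in a TIMESTAMP_NTZ column amounts to:
# normalize to UTC, then drop the timezone
ntz = ts.tz_convert("UTC").tz_localize(None)

print(ts)   # 2023-05-01 09:30:00+02:00
print(ntz)  # 2023-05-01 07:30:00 -- the original zone/offset can't be recovered from this value
```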
j
hey Felix! yeah you’re totally correct about this, so thanks for bringing it up! I did some investigation and it looks like there’s actually a bug in our code. When `store_timestamps_as_strings=True` and you have timestamp data with timezones, the pandas function we use to determine whether a value is a timestamp-like type returns False (why? I don’t know yet), so the column doesn’t get converted to strings. This problem existed long before this release, so with `store_timestamps_as_strings=True` you’ll still lose timestamp info. I will look into this and hopefully find a replacement function. Again, thanks for pointing this out!
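For illustration, one pandas pitfall that matches this behavior (not necessarily the exact check used in the I/O manager) is that the plain datetime64 dtype check returns False for timezone-aware columns, while the "any" variant covers both:

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_datetime64_dtype

naive = pd.Series(pd.to_datetime(["2023-05-01 09:30"]))
aware = naive.dt.tz_localize("UTC")

print(is_datetime64_dtype(naive))      # True
print(is_datetime64_dtype(aware))      # False -- tz-aware columns use DatetimeTZDtype
print(is_datetime64_any_dtype(naive))  # True
print(is_datetime64_any_dtype(aware))  # True -- also matches tz-aware columns
```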
f
Hey Jamie, no problem. When testing the snowflake-python connector, I found some unexpected results when loading dataframes into Snowflake and back. It depends on the column type and whether there are Nones among the values. For instance, integers in Python become floats after a round trip (to and from Snowflake). Times become floats as well, if I remember correctly. Lists, sets, and dicts become string representations.
I did not check if (and how) Dagster's Snowflake I/O manager handles these inconsistencies.
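For anyone who wants to reproduce this, here is an untested sketch of a round-trip check against the connector directly. The connection parameters and table name are placeholders, and `auto_create_table` assumes a reasonably recent snowflake-connector-python version:

```python
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# placeholder connection parameters
conn = snowflake.connector.connect(
    account="abc1234.us-east-1",
    user="MY_USER",
    password="...",
    database="MY_DATABASE",
    schema="PUBLIC",
    warehouse="MY_WAREHOUSE",
)

# an integer column containing a None already becomes float64 on the pandas side
df = pd.DataFrame({"int_col": [1, 2, None]})
print(df.dtypes)

# load the frame into a scratch table, read it back, and compare dtypes/values
write_pandas(conn, df, "ROUND_TRIP_TEST", auto_create_table=True)
round_tripped = conn.cursor().execute("SELECT * FROM ROUND_TRIP_TEST").fetch_pandas_all()
print(round_tripped.dtypes)
```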