# ask-community
I'm storing assets in Snowflake and have just tried to add a new column to an asset. However, I'm getting a
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 1 at position 156
invalid identifier 'COLUMN_NAME'
error. It seems like
is trying to insert data into this column without first adding the column to the current asset in snowflake. Any ideas on how I can add columns to snowflake assets without manually running the DDL in snowflake? I tried to find docs about asset schema drift but I'm not seeing anything.
hey @Ryan Meekins i’ll test this out. we use the snowflake pandas connector and the snowflake pyspark connector to handle storing pandas DFs and pyspark DFs (respectively), so the issue might be in how those connectors handle the addition of new columns. Just so i can more accurately test this out - is your asset partitioned? are you using pandas or pyspark?
My asset is partitioned using a
and I'm using pandas
but thanks, let me know! seems like this would be a common use case, but not sure if it'd be up to the underlying snowflake package or something you guys could do on your end
ok. i think partitioning might be the cause. usually before we store the data for an asset in snowflake we drop the whole table, but for partitioned assets we only delete the rows corresponding to the partition. i’ll need to look into how we can handle adding a new column in the partitioned case. thanks for bringing this up! you’re right that it seems like a common enough use case
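For anyone following along, the two write paths described above look roughly like this. This is an illustrative sketch only (using `sqlite3` as a stand-in for Snowflake, with hypothetical table and function names), not the actual IO manager code:

```python
import sqlite3
import pandas as pd

def overwrite_table(conn, table, df):
    # Unpartitioned path: drop the whole table, then recreate it from the
    # DataFrame -- new columns come along for free.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    df.to_sql(table, conn, index=False)

def overwrite_partition(conn, table, df, partition_col, partition_key):
    # Partitioned path: delete only the rows for this partition, then append.
    # The table itself is never dropped, so a brand-new DataFrame column has
    # no matching table column -- which matches the "invalid identifier"
    # error above.
    conn.execute(f"DELETE FROM {table} WHERE {partition_col} = ?", (partition_key,))
    df.to_sql(table, conn, if_exists="append", index=False)

conn = sqlite3.connect(":memory:")
day1 = pd.DataFrame({"date": ["2023-01-01"], "value": [1]})
overwrite_table(conn, "my_asset", day1)

day2 = pd.DataFrame({"date": ["2023-01-02"], "value": [2]})
overwrite_partition(conn, "my_asset", day2, "date", "2023-01-02")
```

If `day2` had an extra column that `my_asset` doesn't, the append in `overwrite_partition` would fail, while `overwrite_table` would happily recreate the table with it.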
okay great! if you add this to GitHub, can you share the link here so I can follow? thanks
so it looks like there isn’t a direct way to tell the pandas
function to create columns that don't exist. So we'd have to either: a) introspect on the dataframe, find the new column, map it to a snowflake type, and manually create the column, or b) copy the full table into memory, append the new partition data with the new column, drop the table, then load the new table into snowflake
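Option (a) could be sketched like this. It's a minimal, hypothetical sketch: the dtype-to-Snowflake mapping here is hand-rolled and incomplete, and `snowflake_type_for` / `alter_statements_for_new_columns` are made-up names, not connector APIs:

```python
import pandas as pd

# Hypothetical dtype -> Snowflake type mapping; a real version would need to
# cover many more pandas/numpy dtypes (and would probably want to reuse the
# connector library's own mapping rather than maintain this by hand).
_DTYPE_TO_SNOWFLAKE = {
    "int64": "NUMBER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "object": "VARCHAR",
    "datetime64[ns]": "TIMESTAMP_NTZ",
}

def snowflake_type_for(dtype) -> str:
    # Fall back to VARCHAR for anything we don't recognize.
    return _DTYPE_TO_SNOWFLAKE.get(str(dtype), "VARCHAR")

def alter_statements_for_new_columns(df: pd.DataFrame, existing_columns, table: str):
    # Introspect the DataFrame, find columns the table doesn't have yet,
    # and emit one ALTER TABLE ... ADD COLUMN per new column.
    existing = {c.upper() for c in existing_columns}
    return [
        f"ALTER TABLE {table} ADD COLUMN {col.upper()} {snowflake_type_for(df[col].dtype)}"
        for col in df.columns
        if col.upper() not in existing
    ]

df = pd.DataFrame({"date": ["2023-01-01"], "value": [1], "new_col": [0.5]})
stmts = alter_statements_for_new_columns(df, ["DATE", "VALUE"], "MY_ASSET")
# e.g. ["ALTER TABLE MY_ASSET ADD COLUMN NEW_COL FLOAT"]
```

The statements would then be run against Snowflake before the partition append, so the insert has a matching column to land in.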
I'd say a) would be much more scalable. My partitioned assets are >1M rows, so b) seems like a lot of IO
introspecting on the type of the data could be rough, but we might be able to dig some mapping out of the snowflake connector library 😭
https://github.com/dagster-io/dagster/issues/13098 feel free to add any additional info!