# ask-community
I'm storing assets in Snowflake and have just tried to add a new column to an asset. However, I'm getting a
snowflake.connector.errors.ProgrammingError: 000904 (42000): SQL compilation error: error line 1 at position 156
invalid identifier 'COLUMN_NAME'
error. It seems like
is trying to insert data into this column without first adding the column to the current asset in snowflake. Any ideas on how I can add columns to snowflake assets without manually running the DDL in snowflake? I tried to find docs about asset schema drift but I'm not seeing anything.
hey @Ryan Meekins i’ll test this out. we use the snowflake pandas connector and the snowflake pyspark connector to handle storing pandas DFs and pyspark DFs (respectively), so the issue might be in how those connectors handle the addition of new columns. Just so i can more accurately test this out - is your asset partitioned? are you using pandas or pyspark?
My asset is partitioned using a
and I'm using pandas
but thanks, let me know! seems like this would be a common use case, but not sure if it'd be up to the underlying snowflake package or something you guys could do on your end
ok. i think partitioning might be the cause. usually before we store the data for an asset in snowflake we drop the whole table, but for partitioned assets we only delete the rows corresponding to the partition. i’ll need to look into how we can handle adding a new column in the partitioned case. thanks for bringing this up! you’re right that it seems like a common enough use case
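For anyone following along, the two write paths described above look roughly like this. This is an illustrative sketch only (using `sqlite3` as a stand-in for Snowflake, with hypothetical table and function names), not the actual IO manager code:

```python
import sqlite3
import pandas as pd

def overwrite_table(conn, table, df):
    # Unpartitioned path: drop the whole table, then recreate it from the
    # DataFrame -- new columns come along for free.
    conn.execute(f"DROP TABLE IF EXISTS {table}")
    df.to_sql(table, conn, index=False)

def overwrite_partition(conn, table, df, partition_col, partition_key):
    # Partitioned path: delete only the rows for this partition, then append.
    # The table itself is never dropped, so a brand-new DataFrame column has
    # no matching table column -- which matches the "invalid identifier"
    # error above.
    conn.execute(f"DELETE FROM {table} WHERE {partition_col} = ?", (partition_key,))
    df.to_sql(table, conn, if_exists="append", index=False)

conn = sqlite3.connect(":memory:")
day1 = pd.DataFrame({"date": ["2023-01-01"], "value": [1]})
overwrite_table(conn, "my_asset", day1)

day2 = pd.DataFrame({"date": ["2023-01-02"], "value": [2]})
overwrite_partition(conn, "my_asset", day2, "date", "2023-01-02")
```

If `day2` had an extra column that `my_asset` doesn't, the append in `overwrite_partition` would fail, while `overwrite_table` would happily recreate the table with it.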
okay great! if you add this to GitHub, can you share the link here so I can follow? thanks
so it looks like there isn’t a direct way to tell the pandas
function to create columns that don't exist. So we'd have to either: a) introspect on the dataframe, find the new column, map it to a snowflake type, and manually create the column, or b) copy the full table into memory, append the new partition data with the new column, drop the table, then load the new table into snowflake
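Option (a) could be sketched like this. It's a minimal, hypothetical sketch: the dtype-to-Snowflake mapping here is hand-rolled and incomplete, and `snowflake_type_for` / `alter_statements_for_new_columns` are made-up names, not connector APIs:

```python
import pandas as pd

# Hypothetical dtype -> Snowflake type mapping; a real version would need to
# cover many more pandas/numpy dtypes (and would probably want to reuse the
# connector library's own mapping rather than maintain this by hand).
_DTYPE_TO_SNOWFLAKE = {
    "int64": "NUMBER",
    "float64": "FLOAT",
    "bool": "BOOLEAN",
    "object": "VARCHAR",
    "datetime64[ns]": "TIMESTAMP_NTZ",
}

def snowflake_type_for(dtype) -> str:
    # Fall back to VARCHAR for anything we don't recognize.
    return _DTYPE_TO_SNOWFLAKE.get(str(dtype), "VARCHAR")

def alter_statements_for_new_columns(df: pd.DataFrame, existing_columns, table: str):
    # Introspect the DataFrame, find columns the table doesn't have yet,
    # and emit one ALTER TABLE ... ADD COLUMN per new column.
    existing = {c.upper() for c in existing_columns}
    return [
        f"ALTER TABLE {table} ADD COLUMN {col.upper()} {snowflake_type_for(df[col].dtype)}"
        for col in df.columns
        if col.upper() not in existing
    ]

df = pd.DataFrame({"date": ["2023-01-01"], "value": [1], "new_col": [0.5]})
stmts = alter_statements_for_new_columns(df, ["DATE", "VALUE"], "MY_ASSET")
# e.g. ["ALTER TABLE MY_ASSET ADD COLUMN NEW_COL FLOAT"]
```

The statements would then be run against Snowflake before the partition append, so the insert has a matching column to land in.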
I'd say a) would be much more scalable. My partitioned assets are >1M rows, so b) seems like a lot of IO
introspecting on the type of the data could be rough, but we might be able to dig some mapping out of the snowflake connector library 😭
https://github.com/dagster-io/dagster/issues/13098 feel free to add any additional info!