# announcements
b
Is there any way to wrap the `snowflake_resource` in a SQLAlchemy connection? Or would you guys be open to a PR to make the `snowflake_resource` use SQLAlchemy? The reason is, some pandas functions such as `.to_sql` require a SQLAlchemy connection, so as far as I can tell (and via my own testing) the pandas SQL utilities won't be available for Snowflake using the Dagster `snowflake_resource` directly as it is now. Any thoughts?
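The wrapping being asked about can be sketched roughly as follows: the snowflake-sqlalchemy package registers a `snowflake://` dialect with SQLAlchemy, whose URLs pandas can use via `create_engine`. The helper name and connection parameters below are illustrative, not part of any Dagster API; the block only builds the URL string, with the actual engine/`.to_sql` usage shown in comments since it needs live credentials.

```python
# Sketch: building a SQLAlchemy URL for Snowflake so pandas .to_sql can work.
# Assumes the snowflake-sqlalchemy package, which registers the
# "snowflake://" dialect with SQLAlchemy. Helper name is hypothetical.

def snowflake_sqlalchemy_url(account, user, password, database, schema, warehouse):
    """Build a snowflake:// URL accepted by sqlalchemy.create_engine()."""
    return (
        f"snowflake://{user}:{password}@{account}/"
        f"{database}/{schema}?warehouse={warehouse}"
    )

url = snowflake_sqlalchemy_url(
    account="my_account", user="me", password="secret",
    database="MY_DB", schema="PUBLIC", warehouse="MY_WH",
)
print(url)

# With snowflake-sqlalchemy installed, the URL would then be used like:
#   engine = sqlalchemy.create_engine(url)
#   df.to_sql("my_table", engine, index=False, if_exists="append")
```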
d
Not sure if you've seen this already, but the Snowflake Python package's `pandas_tools` module also has a built-in `write_pandas()` method for doing this: https://docs.snowflake.com/en/user-guide/python-connector-api.html#label-python-connector-api-write-pandas
we ultimately decided not to use it because IIRC it requires writing intermediate data temporarily to disk, but it is another option
for writing DataFrames to Snowflake, the pattern we ultimately landed on is writing the DataFrame to S3 using a `df_to_s3` solid that writes the DataFrame as a file to S3 and outputs the S3 object location, then passing that S3 location to a downstream `s3_to_snowflake` solid that dynamically builds the `COPY INTO` statement based on the S3 object location:
we landed on this approach because we noticed it was significantly more performant than the `write_pandas` and `.to_sql()` routes
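The `s3_to_snowflake` step described above can be sketched as a small statement builder. The table name, file format, and helper name here are hypothetical; a real solid would also supply credentials (or a storage integration) and execute the statement through the Snowflake resource.

```python
# Sketch: dynamically building a COPY INTO statement from an S3 object
# location, as in the s3_to_snowflake solid described above. Illustrative
# only; real usage needs credentials/storage integration and execution.

def build_copy_into(table, s3_uri, file_format="(TYPE = CSV SKIP_HEADER = 1)"):
    # COPY INTO loads staged files; here we point it straight at an S3 URI.
    return (
        f"COPY INTO {table} FROM '{s3_uri}' "
        f"FILE_FORMAT = {file_format}"
    )

stmt = build_copy_into("analytics.events", "s3://my-bucket/events/2021-01-01.csv")
print(stmt)
```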
b
Yeah, that all makes sense, and I was aware of the `write_pandas` function. The reason I've been wanting to use `.to_sql` from my DataFrame is that it will create the table for me if it doesn't exist already, with the proper schema. I'm migrating a Postgres DB to Snowflake and have 50+ tables to copy over, so I don't want to have to manually create each table, define types, etc. Any other workaround you can think of for this issue?
`write_pandas` won't create the table for me
d
ah I see I see. Yeah, not sure of another route that would create the tables for you. The only thing I could think of is to create a solid that iterates through all Postgres tables, inspects the schema, and dynamically creates and runs an equivalent `CREATE TABLE` Snowflake command before copying the data
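The idea above could look roughly like this: translate each Postgres column type into a Snowflake equivalent and emit a `CREATE TABLE` statement. The type map is partial and illustrative, and the function name is hypothetical; a real migration would read column metadata from Postgres's `information_schema.columns` and cover many more types.

```python
# Sketch: translating a Postgres column schema into an equivalent Snowflake
# CREATE TABLE statement. Type map is partial/illustrative only.

PG_TO_SNOWFLAKE = {
    "integer": "INTEGER",
    "bigint": "BIGINT",
    "text": "VARCHAR",
    "character varying": "VARCHAR",
    "boolean": "BOOLEAN",
    "timestamp without time zone": "TIMESTAMP_NTZ",
    "double precision": "FLOAT",
}

def create_table_sql(table, columns):
    """columns: list of (name, postgres_type) pairs."""
    cols = ", ".join(
        f"{name} {PG_TO_SNOWFLAKE.get(pg_type, 'VARCHAR')}"
        for name, pg_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols})"

sql = create_table_sql("users", [("id", "bigint"), ("email", "text")])
print(sql)
```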
s
@Basil V I've actually faced this issue myself. If you wanted to submit a PR, I'd be happy to take a look
b
@sandy just put up a PR https://github.com/dagster-io/dagster/pull/3214 — I tried to maintain all the default behavior, very open to suggestions for improving it. Lmk what you guys think. Thanks!
Hey @sandy Happy New Year to you guys! I just got around to responding to your PR comments and updating accordingly (for PR #3214 link above). Thanks for reviewing—let me know if you need me to do anything else on it in order to get it merged. Much appreciated!
s
Thanks @Basil V! I'll take another look
b
Hey @sandy I went through your second review comments, updated the formatting etc. It looks like it is still failing 3 tests—but not sure if those are due to my PR or something else. Anyways let me know if it looks good or if you need me to make any other changes. Thanks again!
Hey @sandy saw that you merged my PR thanks again for all the help reviewing! Do you know when the next release is expected / when can I anticipate that these changes will be available? Thanks!
s
Hey Basil - our next release should be on Thursday this week
b
Awesome thanks!