What’s the best way to store and retrieve JSON blo...
# integration-snowflake
d
What’s the best way to store and retrieve JSON blobs in Snowflake, using dataframes and the Snowflake I/O manager?
dagster bot responded by community 1
🤖 1
A few options I can think of: 1. Dump the JSON to a string, add that to the result dataframe, and have it written as a
VARCHAR
in Snowflake. Then load it back into Python in downstream assets. 2. Write the data using a query via the Snowflake resource, and use parse_json to make sure it gets stored as the
VARIANT
type 3. Set the right
dtype
on the dataframe? (I’m new to pandas, so just a guess)
m
It depends on your data, but for relatively simple JSONs you might be able to flatten out the fields to a data frame. If so, a combination of a pydantic model mirroring the json structure, and some logic to load it in from columnar data (I think Pandera has some functionality that achieves this) could be a good approach
f
Hi Dan. If the JSON structure is constant and if it flattens well into a dataframe, then Matt's approach is interesting and would provide data validation when loading data in and out of Snowflake. Otherwise, your option 1 is a quick win. Assuming your JSON are serializable.
Storing the JSON as VARIANT could reduce storage consumption and improve query performance for large datasets, however VARCHAR would ensure that the original data and types are not modified
d
Thanks, @Matt Clarke and @Félix Tremblay 🙂 I think we're going to do the quick win for now, but will definitely look into the pydantic/pandera modeling options as things get more complicated.