# integration-snowflake

Andrea

02/27/2023, 12:45 AM
Hi! My understanding is that all the new cool stuff like partitions goes through the IOManager, which (again per my understanding) converts data back and forth between snowflake and pandas. For large data, you probably want to run the processing directly in snowflake instead. What's the best practice in that case? How to still be able to have partitioning etc.?

Vinnie

02/27/2023, 8:00 AM
If you wanna keep the processing solely within Snowflake, you can write a custom IO Manager that doesn't load anything into memory, or return `None`. The asset could simply be a query that gets executed remotely (as is the case in dbt assets/models). Partitions are passed to the asset function through the `context` variable (check https://github.com/dagster-io/dagster/blob/master/python_modules/dagster/dagster/_core/execution/context/compute.py#L84) so you can refer to them in your queries.
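(Not from the thread, but a minimal sketch of the "return `None`" flavour Vinnie describes, assuming a daily-partitioned asset and a hypothetical `snowflake` resource with an `execute` method that just runs SQL remotely; the asset reads its partition key from `context` and returns nothing, so no data passes through the IO manager.)
```python
from dagster import DailyPartitionsDefinition, asset


@asset(
    partitions_def=DailyPartitionsDefinition(start_date="2023-01-01"),
    required_resource_keys={"snowflake"},  # hypothetical resource that just runs SQL remotely
)
def daily_orders(context):
    # The partition key comes in through the context, so it can be referenced
    # directly in the query that Snowflake executes.
    day = context.partition_key
    context.resources.snowflake.execute(
        f"INSERT INTO analytics.daily_orders SELECT * FROM raw.orders WHERE order_date = '{day}'"
    )
    # Nothing is returned, so there is no data for the IO manager to persist.
```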

Andrea

02/27/2023, 11:03 AM
So a good abstraction could be to have asset functions that return the query itself? And everything else is handled by the IOManager?
```python
@asset(io_manager_key="snowflake_pushdown")
def some_snowflake_table():
    query = '''SELECT * FROM some_table INNER JOIN some_other_table'''
    return query
```
with the IOManager worrying about materialising the query into a table in Snowflake so the user doesn't have to do it. It could potentially also add some default metadata (or leave it configurable by the user).
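(For illustration, not from the thread: a rough sketch of what that `snowflake_pushdown` IO manager could look like, assuming the asset returns a SELECT statement and using `snowflake-connector-python`. The target table name is simply derived from the asset key, and `load_input` hands downstream assets the table name rather than loading data into memory.)
```python
import snowflake.connector
from dagster import IOManager, io_manager


class SnowflakePushdownIOManager(IOManager):
    """Materialises the SELECT returned by an asset as a table in Snowflake."""

    def __init__(self, connection_kwargs):
        self._connection_kwargs = connection_kwargs

    def handle_output(self, context, obj):
        # `obj` is the SELECT statement returned by the asset; wrap it in
        # CREATE OR REPLACE TABLE so the work stays inside Snowflake.
        table = context.asset_key.path[-1]
        conn = snowflake.connector.connect(**self._connection_kwargs)
        try:
            conn.cursor().execute(f"CREATE OR REPLACE TABLE {table} AS {obj}")
        finally:
            conn.close()
        context.add_output_metadata({"table": table, "query": obj})

    def load_input(self, context):
        # Downstream assets receive the table name, not the data, so nothing
        # is loaded into memory.
        return context.upstream_output.asset_key.path[-1]


@io_manager(config_schema={"account": str, "user": str, "password": str})
def snowflake_pushdown(init_context):
    return SnowflakePushdownIOManager(init_context.resource_config)
```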

Vinnie

02/27/2023, 11:04 AM
That would be one way to do it, yes

Andrea

02/27/2023, 11:08 AM
Thanks! We usually do use dbt for these kinds of pushdown ELT jobs, but there are a lot of cases where having full control of the query can be useful!