# integration-bigquery
r
Hi! I'm curious if there are thoughts on how to handle schema changes? The way the IO manager is currently configured for non-partitioned tables, it truncates the table explicitly, which maintains the schema. The BigQuery `load_table_from_dataframe` by default doesn't allow column addition. The two config options that BigQuery seems to expose are:
• setting the job WriteDisposition to `WRITE_TRUNCATE` (this replaces the whole table and is nice because it also allows incompatible changes like changing a column type, but doesn't fit in as well with the DBIOManager abstraction and would only work for non-partitioned tables)
• passing `ALLOW_FIELD_ADDITION` as a schemaUpdateOption
I'm not sure of the best way to expose these options, but one or both would be very helpful (we currently have a handwritten BQ IO manager that uses `WRITE_TRUNCATE`; ours doesn't support partitions though, so excited to migrate!)
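For anyone following along, here is a rough sketch of how the two options above could be passed to `load_table_from_dataframe` through the `google-cloud-bigquery` client. The helper names `load_config_kwargs` and `load_dataframe` are made up for illustration and are not part of Dagster or the BigQuery IO manager. Pairing `ALLOW_FIELD_ADDITION` with `WRITE_APPEND` is an assumption based on my reading of the BigQuery docs, which describe schema update options as applying to appends (or to truncating a single partition).

```python
def load_config_kwargs(replace_table: bool) -> dict:
    """Return keyword arguments for bigquery.LoadJobConfig (hypothetical helper).

    replace_table=True  -> overwrite the table, schema included.
    replace_table=False -> append rows and allow new columns to be added.
    """
    if replace_table:
        # WRITE_TRUNCATE replaces the whole table, so even incompatible
        # changes such as a column type change go through.
        return {"write_disposition": "WRITE_TRUNCATE"}
    # Assumption: ALLOW_FIELD_ADDITION is used together with WRITE_APPEND.
    return {
        "write_disposition": "WRITE_APPEND",
        "schema_update_options": ["ALLOW_FIELD_ADDITION"],
    }


def load_dataframe(df, table_id: str, replace_table: bool = False):
    """Load a pandas DataFrame into BigQuery with the chosen options."""
    # Imported lazily so the helper above works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(**load_config_kwargs(replace_table))
    job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    return job.result()  # block until the load job finishes
```

Note that the append variant does not clear existing rows, so an IO manager that wants "replace contents but keep the table" semantics would still need its own delete or truncate step before loading.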
j
Hey @Rob Martorano! I don’t have a best-practices suggestion for you, but just wanted to let you know that supporting adding columns to the DB IO managers is something I’m planning to work on shortly! Here’s the corresponding GH issue for you to track! https://github.com/dagster-io/dagster/issues/13098
r
gotcha, glad it's on the roadmap! thank you!
d
@Rob Martorano, I've been dropping tables when I do schema migrations (I've also added this to the ticket, @jamie). Details:
• I'm using Snowflake (not sure if that changes anything)
• I have smaller tables than most (the largest one has 2-3 million records), so performance-wise this is fine for me, but as suggested on the bug it might not be fine for others
• I think this does fit nicely with the notion that I can recreate my entire data graph from scratch (again, my graph may be smaller than most, around 20-30 nodes)
• The biggest problem for me has been that I have a few Snowflake roles (some for user-facing apps) that only have access to a few specific tables (so I can't just grant select on all future tables in a schema), and I lose those permission grants when I drop the tables.
◦ My best idea so far is to include a statement granting those permissions as part of any such assets.
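The re-grant idea in the last bullet could look roughly like the sketch below: after an asset drops and recreates its table, it runs one `GRANT SELECT` per role that needs access. The function name `grant_select_statements` and the table and role names in the usage note are hypothetical, and how the statements get executed (for example through a Snowflake cursor inside the asset) is left out.

```python
import re

# Only plain unquoted identifiers are accepted, to keep the SQL safe to build.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def grant_select_statements(table: str, roles: list[str]) -> list[str]:
    """Build one GRANT SELECT statement per role for a recreated table.

    `table` may be qualified, e.g. DATABASE.SCHEMA.TABLE.
    """
    for name in [*table.split("."), *roles]:
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [f"GRANT SELECT ON TABLE {table} TO ROLE {role}" for role in roles]
```

Calling `grant_select_statements("ANALYTICS.PUBLIC.ORDERS", ["APP_READER"])` gives a single statement that can be executed right after the table is rebuilt, so the grants are restored in the same run that dropped them.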