Jan Hajny
04/14/2023, 3:00 PM
…bigquery_pandas_io_manager. It also receives a dataframe from an upstream asset. What I want to do is persist the incoming dataframe so that only new data is added and existing rows are updated, based on a datetime index. Maybe I'm missing something elementary, but I can't figure out how to access the existing data in the database. The easiest thing would, of course, be to rely on the index and let BigQuery figure out which rows should be added, which updated, and which left untouched. But is something like that even possible? Thanks in advance for any help on this.
Tim Castillo
04/14/2023, 3:11 PM
…bigquery_pandas_io_manager, you can use the dagster-gcp resource to run a SQL query that does the upsert for you on BigQuery. You can define the dependencies using the non_argument_deps parameter on the asset definition. This assumes that you don't want to be loading what I assume is a huge table into memory.
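Tim's suggestion could be sketched roughly as follows. This is a hypothetical illustration, not code from the thread: the table names (my_dataset.events, my_dataset.events_staging), the column names, and the build_upsert_sql helper are all made up, and the commented Dagster snippet assumes the BigQueryResource from dagster-gcp and the non_argument_deps parameter as they existed around Dagster 1.2/1.3.

```python
def build_upsert_sql(target: str, staging: str, key: str, columns: list) -> str:
    """Build a BigQuery MERGE statement that upserts rows from `staging`
    into `target`, matching on the datetime column `key`: matched rows are
    updated, unmatched rows are inserted, everything else is left untouched."""
    update_set = ", ".join(f"T.{c} = S.{c}" for c in columns)
    all_cols = ", ".join([key] + columns)
    insert_vals = ", ".join(f"S.{c}" for c in [key] + columns)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{staging}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {update_set}\n"
        f"WHEN NOT MATCHED THEN INSERT ({all_cols}) VALUES ({insert_vals})"
    )

# Inside a Dagster asset, this query would be executed with the dagster-gcp
# BigQuery resource, with the upstream asset declared as a non-argument
# dependency (illustrative only; names are assumptions):
#
#   @asset(non_argument_deps={"events_staging"})
#   def events(bigquery: BigQueryResource):
#       with bigquery.get_client() as client:
#           client.query(
#               build_upsert_sql("my_dataset.events",
#                                "my_dataset.events_staging",
#                                "event_ts", ["value", "source"])
#           ).result()

sql = build_upsert_sql(
    "my_dataset.events", "my_dataset.events_staging", "event_ts", ["value", "source"]
)
print(sql)
```

Because the MERGE runs entirely inside BigQuery, the existing table never has to be loaded into memory, which is the point Tim makes above.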
Jan Hajny
04/14/2023, 3:19 PM
…dagster_type annotation) and then persist the result into my BigQuery table. The problem is that some of the new data entries may already exist in the database. Now that I think about it, I probably want to ignore the ones that are already saved (i.e., not update them). It seems I can't run a query using the dagster-gcp resource in this case, can I?
Tim Castillo
04/14/2023, 3:22 PM

Jan Hajny
04/14/2023, 3:36 PM