Jan Hajny 04/14/2023, 3:00 PM
. It also receives a dataframe from an upstream asset. What I want to do is to persist the incoming dataframe so that only new data are added and existing rows updated based on a
index. Maybe I'm missing something elementary, but I can't find out how to access the existing data in the database. The easiest thing would be, of course, to rely on the index and let BigQuery figure out which rows should be added, which ones should be updated, and which ones should be left untouched. But is something like that even possible? Thanks in advance for any help on this.
Tim Castillo 04/14/2023, 3:11 PM
, you can use the
resource to run a SQL query to do the upsert for you on BQ. You can define the dependencies using the
parameter on an asset definition. This assumes that you don't want to load what I assume is a huge table into memory.
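For illustration, here is a minimal sketch of what such an upsert query could look like, built as a BigQuery `MERGE` statement. The helper `build_merge_sql`, the table names, and the column names are all hypothetical, and the sketch assumes the incoming dataframe has already been loaded into a staging table. Passing `update_existing=False` drops the update branch, so rows that already exist are left untouched.

```python
def build_merge_sql(target, staging, key, columns, update_existing=True):
    """Build a BigQuery MERGE statement that loads `staging` into `target`.

    All names are placeholders; `key` is the column used to match rows.
    """
    # Key column first, then the remaining columns without duplicates.
    all_cols = [key] + [c for c in columns if c != key]
    non_key = all_cols[1:]

    lines = [
        f"MERGE `{target}` AS T",
        f"USING `{staging}` AS S",
        f"ON T.{key} = S.{key}",
    ]
    if update_existing and non_key:
        # Rows already present in the target are overwritten from staging.
        assignments = ", ".join(f"{c} = S.{c}" for c in non_key)
        lines.append(f"WHEN MATCHED THEN UPDATE SET {assignments}")

    # Rows missing from the target are inserted.
    col_list = ", ".join(all_cols)
    src_list = ", ".join(f"S.{c}" for c in all_cols)
    lines.append(
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_list})"
    )
    return "\n".join(lines)


# Hypothetical table and column names, for illustration only.
upsert_sql = build_merge_sql(
    "my_project.my_dataset.prices",
    "my_project.my_dataset.prices_staging",
    key="id",
    columns=["value", "updated_at"],
)
insert_only_sql = build_merge_sql(
    "my_project.my_dataset.prices",
    "my_project.my_dataset.prices_staging",
    key="id",
    columns=["value", "updated_at"],
    update_existing=False,
)
print(upsert_sql)
```

Assuming the resource hands you a `google.cloud.bigquery.Client`, the resulting string could then be run with something like `client.query(upsert_sql).result()`, so the matching happens inside BigQuery rather than in memory.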
Jan Hajny 04/14/2023, 3:19 PM
annotation) and then persist the result into my BigQuery table. The problem is that some of the new data entries may already exist in the database. Now that I think about it, I probably want to ignore the ones that are already saved (i.e. not update). It seems I can't run a query using the
resource in this case, can I?
Tim Castillo 04/14/2023, 3:22 PM
Jan Hajny 04/14/2023, 3:36 PM