Steven Murphy
06/09/2023, 12:28 PM
I have a @job that has numerous @ops within it. The job reads a BigQuery table, does some processing, then writes the results to another table.
At the moment I'm using the following IO manager.
from dagster_gcp_pandas import BigQueryPandasIOManager
It appears that this overwrites the results of the previous job every time. The desired outcome is to have the job append those results to the target table.
I realise I could probably write my own IO manager to get around this, but I'm wondering if I'm approaching this from the wrong mindset. I saw this question on StackOverflow, where someone had the same observation on a file system IO manager. Any thoughts on it?
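For illustration, one way to get append behaviour is to do the write inside the op with the BigQuery client rather than through the IO manager. This is only a minimal sketch: it assumes `client` is a `google.cloud.bigquery.Client` and `df` is a pandas DataFrame, and the helper name `append_dataframe` is hypothetical rather than anything from Dagster.

```python
from typing import Any, Optional


def append_dataframe(
    client: Any,
    df: Any,
    table_id: str,
    job_config: Optional[Any] = None,
) -> None:
    """Append the rows of `df` to `table_id` with a BigQuery load job.

    `client` is assumed to be a google.cloud.bigquery.Client.
    `table_id` is a "project.dataset.table" string.
    """
    if job_config is None:
        # Imported lazily so this module loads even where
        # google-cloud-bigquery is not installed.
        from google.cloud import bigquery

        # WRITE_APPEND adds rows to the existing table;
        # WRITE_TRUNCATE would replace its contents instead.
        job_config = bigquery.LoadJobConfig(
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND
        )

    load_job = client.load_table_from_dataframe(df, table_id, job_config=job_config)
    # Block until the load finishes; this raises if the job failed.
    load_job.result()
```

Inside an op this might be called as `append_dataframe(client, results_df, "my-project.my_dataset.results")`, where the table ID is a placeholder.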
https://stackoverflow.com/questions/76153838/how-not-to-overwrite-materialized-assets-in-dagster
Brendan Jackson
06/09/2023, 1:35 PM
Brendan Jackson
06/09/2023, 1:37 PM
Steven Murphy
06/09/2023, 1:43 PM
Steven Murphy
06/09/2023, 1:45 PM
Brendan Jackson
06/09/2023, 1:47 PM
Brendan Jackson
06/09/2023, 1:48 PM
Steven Murphy
06/09/2023, 1:52 PM
> So the op, every day, is to ping 3rd-party-API with N requests, and persist the callback URLs?
When I invoke the API, I give it a callback URL so that the API knows where to send the actual result once it has finished doing its thing. The only thing I expect back from the API immediately is a job ID and an accepted status.
> Does each request have a label of some sort (even an integer!) for the day?
Each request can be identified by one of the parameters within it (we pass a URL param within the request JSON). When I get the acknowledgement back from that API, it gives a job ID relating to that specific request, which I'd like to log.
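As a rough sketch of that logging step, each acknowledgement could be turned into one row keyed by the request's URL parameter and appended to a log table. The field names `job_id` and `status`, the column names, and both helper names are assumptions for illustration, since the thread does not show the API's response shape. `client` is assumed to be a `google.cloud.bigquery.Client`, whose `insert_rows_json` method streams rows into an existing table and returns a list of per-row errors (empty on success).

```python
from datetime import datetime, timezone
from typing import Any, Dict, List


def build_ack_row(request_url: str, ack: Dict[str, Any]) -> Dict[str, Any]:
    """Build one log row from an API acknowledgement.

    `ack` is assumed to look like {"job_id": "...", "status": "accepted"};
    the real response may use different keys.
    """
    return {
        "request_url": request_url,  # the URL param that identifies the request
        "job_id": ack["job_id"],
        "status": ack["status"],
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }


def log_acknowledgements(
    client: Any, table_id: str, rows: List[Dict[str, Any]]
) -> None:
    """Append rows to `table_id`; earlier rows in the table are left untouched."""
    if not rows:
        return  # nothing to write for this run
    errors = client.insert_rows_json(table_id, rows)
    if errors:
        raise RuntimeError(f"BigQuery rejected some rows: {errors}")
```

Because every run only adds rows, the job IDs from previous days stay in the table.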
Brendan Jackson
06/09/2023, 1:52 PM
Steven Murphy
06/09/2023, 1:53 PM
Brendan Jackson
06/09/2023, 1:54 PM
Steven Murphy
06/09/2023, 1:56 PM
Brendan Jackson
06/09/2023, 1:57 PM
Steven Murphy
06/09/2023, 1:58 PM
Brendan Jackson
06/09/2023, 2:00 PM
sandy
06/09/2023, 3:45 PM
Steven Murphy
06/09/2023, 3:53 PM