I wrote a bunch of pipelines to migrate data from Mongo (App DB) to Postgres (our temporary budget data warehouse until we upgrade to something like Snowflake). I realised too late that what I was really doing was reimplementing Airbyte Sources (https://docs.airbyte.com/connector-development/tutorials/building-a-python-source) and Destinations, which are already well defined and robust.
Our main Software Engineer is touchy about query load on the App DB, but I'd imagine Airbyte will be at least as efficient as anything I've written, if not more so. My Mongo queries retrieve all the data in a collection since a timestamp taken just before the previous pipeline execution, then upsert on a key, so a tiny bit of record overlap doesn't matter.
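For context, the incremental pattern I mean is roughly this (a minimal sketch — the collection/table names, `updated_at` field, and 5-minute overlap window are all assumptions, not our real schema):

```python
from datetime import datetime, timedelta, timezone

# Re-read a small window before the last run so boundary records are never
# missed; the upsert makes the duplicated records harmless.
OVERLAP = timedelta(minutes=5)

def incremental_filter(last_run: datetime) -> dict:
    """Mongo filter: everything updated since just before the previous run."""
    return {"updated_at": {"$gte": last_run - OVERLAP}}

# Upsert on the key column so overlapping records are simply rewritten.
# (Hypothetical target table raw.orders.)
UPSERT_SQL = """
INSERT INTO raw.orders (id, updated_at, payload)
VALUES (%s, %s, %s)
ON CONFLICT (id) DO UPDATE
  SET updated_at = EXCLUDED.updated_at,
      payload    = EXCLUDED.payload
"""

# The filter would be passed to pymongo, e.g.
#   client.app_db.orders.find(incremental_filter(last_run))
last_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(incremental_filter(last_run))
# {'updated_at': {'$gte': datetime.datetime(2024, 1, 1, 11, 55, tzinfo=datetime.timezone.utc)}}
```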
Basically, my pipelines can be a little buggy, so should I just migrate?
@Stephen Bailey
I do use copy_expert for my backfills, which makes them speedy; not sure whether Airbyte would do that. But since Airbyte takes everything and loads it as raw JSON, I guess I wouldn't need to run a backfill at all, because there's no transformation logic to get wrong or change. I'm not posting on the Airbyte Slack because the Airbyte devs will obviously tell me to use Airbyte.
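For anyone unfamiliar, the copy_expert backfill pattern looks roughly like this (a sketch with assumed table/column names — psycopg2's `cursor.copy_expert` streams a file-like object through Postgres COPY, which is far faster than row-by-row INSERTs):

```python
import csv
import io

# Hypothetical target table; columns are assumptions.
COPY_SQL = "COPY raw.orders (id, updated_at, payload) FROM STDIN WITH (FORMAT csv)"

def rows_to_csv(rows) -> io.StringIO:
    """Render rows as an in-memory CSV buffer suitable for COPY ... FROM STDIN."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    buf.seek(0)
    return buf

# With a live connection the backfill would run roughly like:
#   with psycopg2.connect(PG_DSN) as conn, conn.cursor() as cur:
#       cur.copy_expert(COPY_SQL, rows_to_csv(rows))

buf = rows_to_csv([("1", "2024-01-01", "{}")])
print(buf.read())  # 1,2024-01-01,{}
```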