Hello, I have a use case where I need to read from...
# ask-community
n
Hello, I have a use case where I need to read from a BigQuery table generated using dbt and write it to a PostgreSQL database. I wrote a simple IOManager that can write to PostgreSQL but I don't really know the right approach to load the data from BigQuery in Dagster. From what I've seen, one way to achieve this would be : • Set the
io_manager_key
on my dbt assets to a
BigQueryPandasIOManager
to tell Dagster that any Asset created using dbt should be handled using a BQ IO Manager • Create an Asset with a custom IO Manager for PostgreSQL that simply takes the dbt model name in arg and returns it, which will then use the
BigQueryPandasIOManager
to read the input data and the custom PostgreSQL one to write it. Is this the right approach ? If so, I believe that Dagster will load the whole table in-memory, is there any way to mitigate this ? Thank's in advance for your inputs !
o
Hi @Nicolas Guary! You're correct that using
BigQueryPandasIOManager
would result in the entire table being loaded in memory, but I believe working around this is a more general problem rather than dagster-specific. In most cases, for cross-db replication, you'll need some process to read all of your data from the source, and write all of the data to the destination. Depending on the scale of your data, this could be done as you describe, but once the size of the table no longer fits in the memory of a single process, you'll need a more elaborate process. There are special purpose tools for this usecase, such as fivetran, airbyte, or stitch, or you could build your own scheme for handling this. Another option (depending on how your tables are structured) might be to use partitions to limit the amount of data handled at any given time.
n
Thank's a lot Owen for the input ! I'll try to solve the current use case using
BigQueryPandasIOManager
+ a custom IO Manager for PostgreSQL (by the way, is there any plans on releasing an official IOManager for PostgreSQL ?) and I'll see if the problem scales a lot how I can use an external tool for this. I'll look into Partitions too, this could be useful !
o
No problem! There are no immediate plans on the PostgreSQL IOManager front (we haven't seen a lot of demand for this)
🙏 1
n
Yes I can understand this, writing data to an external PostreSQL Database is a common use-case and the ability to do it directly from Dagster using built-in functionalities would be awesome !