# announcements
p
Hi again, I’m still working my way through the documentation and code base. Let’s say I have some data in Postgres (in RDS) and want to export this to S3, to ingest via Redshift. Is this an appropriate use case? Would I use the dagster libraries for this, or should I be coding the export, copy, import steps from scratch?
p
`dagster-postgres` has Postgres-backed implementations of dagster’s internal run storage and event storage, but might also have some utility code for setting up a connection to the Postgres DB if you need that. It does not have any solids that encapsulate reading from the DB, however (the output type would probably be data-dependent).
`dagster-aws` does define an `s3` resource that would be useful for uploading to an S3 bucket… It is a thin wrapper around the `boto` package.
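Since the `s3` resource is essentially a boto3 client, the upload step is just a client call. A minimal sketch, assuming boto3 and illustrative names (`build_export_key`, the `exports` prefix, the bucket, and the table name are all hypothetical, not part of any dagster API):

```python
from datetime import date


def build_export_key(table: str, day: date, prefix: str = "exports") -> str:
    """Build a date-partitioned S3 key for an exported table dump (hypothetical layout)."""
    return f"{prefix}/{table}/dt={day.isoformat()}/{table}.csv"


def upload_export(local_path: str, bucket: str, key: str) -> None:
    """Upload the exported file; this is roughly what the s3 resource wraps."""
    import boto3  # imported lazily so the key-building helper stays dependency-free

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key)


if __name__ == "__main__":
    key = build_export_key("orders", date(2020, 5, 1))
    print(key)  # exports/orders/dt=2020-05-01/orders.csv
```

Inside a solid you would get the client from `context.resources` rather than constructing it yourself, but the call shape is the same.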
You might want to look at the `airline_demo` here: https://docs.dagster.io/latest/learn/demos/airline_demo
It explicitly sets up connections to Postgres and Redshift databases as resources, and also uses S3 as a storage mechanism for some raw data files.
p
gotcha. would dagster still be the appropriate solution for the extract from postgres to s3 then, or should that be offloaded to different tooling? just trying to wrap my head around the ecosystem
i think in the airline example, it presumes the files are already in S3, not sure if that’s the recommended starting point
m
if you have the option of modeling it as a solid within dagster, i would do that
p
cool, i’ll give it a shot! thank you.
👍 2
m
the airline demo starts with downloading from an internet source and uploading copies to s3
t
I'm currently using a bash solid to run a bash script that runs https://github.com/rongfengliang/tap-minio-csv and https://pypi.org/project/target-parquet/
I then load BigQuery with the Parquet file and then save a copy of the Parquet file in GCS.
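For anyone following along, the commands that bash script pipes together might look roughly like the sketch below; the config file paths, dataset/table name, and output path are all illustrative, not taken from the actual script:

```python
def tap_to_parquet_cmd(tap_config: str, target_config: str) -> str:
    """Singer-style pipeline: the tap emits records on stdout, the target writes Parquet."""
    return (
        f"tap-minio-csv --config {tap_config} | "
        f"target-parquet --config {target_config}"
    )


def bq_load_cmd(dataset_table: str, parquet_path: str) -> str:
    """bq CLI invocation to load the resulting Parquet file into BigQuery."""
    return f"bq load --source_format=PARQUET {dataset_table} {parquet_path}"
```

A bash solid would then run `tap_to_parquet_cmd(...)` followed by `bq_load_cmd(...)` and a `gsutil cp` of the Parquet file into GCS.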