
Chris Nogradi

09/29/2022, 9:38 PM
I hate to ask a dumb question, but I must since a search did not return answers: are there any out-of-the-box SQL IO managers available in Dagster? I see internal support for MySQL/Postgres and what appears to have been a dagster-sqlalchemy module at some point, but I can't find an existing IO manager for SQL. Is the reason for this that it is so trivial to write one? I just don't want to reinvent the wheel if there is one rolling somewhere.

Zach P

09/29/2022, 9:44 PM
It's relatively simple to write IO managers with a small scope. However, I'd guess the IO manager feature is relatively new, so there aren't many out-of-the-box ones out there. I'd suggest looking at some of the existing ones to get an idea of best practices. Perhaps some community contributions could improve this as well.
For reference, I looked at the S3 pickle IO manager when I was trying to write my own.

Chris Nogradi

09/29/2022, 10:12 PM
OK, thanks @Zach P. I wrote a MongoDB one a while back, but for some reason I assumed that SQL ones would be pretty common.

Zach P

09/29/2022, 10:41 PM
I think it is more about the different types of connections needed and DB specifics, e.g. Spark DataFrames, pandas DataFrames, dictionaries, NumPy objects, etc. Combined with the differences from DB to DB, it can be a bit hard to make a generalized one. Looking at the Snowflake one, for example, seems like a pretty good idea if you're looking to implement a more SQL-style one.
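To illustrate the DB-to-DB differences, here is a hedged sketch of one small piece, an INSERT that skips rows violating a unique constraint, rendered for three dialects. The helper name and the dialect table are hypothetical; the placeholder styles assume psycopg2 for Postgres (%s), mysqlclient or PyMySQL for MySQL (%s), and Python's sqlite3 (?).

```python
from typing import Sequence

# Hypothetical per-dialect settings: (identifier quote, DB-API placeholder, template).
_DIALECTS = {
    "postgresql": ('"', "%s", "INSERT INTO {table} ({cols}) VALUES ({vals}) ON CONFLICT DO NOTHING"),
    "mysql": ("`", "%s", "INSERT IGNORE INTO {table} ({cols}) VALUES ({vals})"),
    "sqlite": ('"', "?", "INSERT OR IGNORE INTO {table} ({cols}) VALUES ({vals})"),
}


def insert_skipping_duplicates(dialect: str, table: str, columns: Sequence[str]) -> str:
    """Build an INSERT that silently skips rows violating a unique constraint."""
    quote, placeholder, template = _DIALECTS[dialect]

    def q(name: str) -> str:
        # Escape the quote character by doubling it, then wrap the identifier.
        return f"{quote}{name.replace(quote, quote * 2)}{quote}"

    return template.format(
        table=q(table),
        cols=", ".join(q(c) for c in columns),
        vals=", ".join([placeholder] * len(columns)),
    )


for name in _DIALECTS:
    print(insert_skipping_duplicates(name, "events", ["id", "name"]))
# INSERT INTO "events" ("id", "name") VALUES (%s, %s) ON CONFLICT DO NOTHING
# INSERT IGNORE INTO `events` (`id`, `name`) VALUES (%s, %s)
# INSERT OR IGNORE INTO "events" ("id", "name") VALUES (?, ?)
```

Even this one statement differs in identifier quoting (MySQL uses backticks unless ANSI_QUOTES is enabled), parameter style and conflict syntax, and the semantics are not identical: MySQL's INSERT IGNORE also downgrades some other errors to warnings, not just duplicate keys.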

Sean Lindo

09/29/2022, 10:49 PM
I’ve got some code I can throw in as well if anyone wants to try it or take a look. It works for my purposes.
Specifically Postgres
Boy, that didn’t paste right.
There’s nothing defined in handle_output, but it should be straightforward to write to a table.
I think you’ll need to handle appends vs. full replaces, constraints, etc. manually.
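Filling in handle_output along those lines, here is a minimal sketch of the append vs. full-replace choice, assuming the target table and its constraints already exist. sqlite3 again stands in for Postgres, and write_rows with its mode parameter is a hypothetical helper. Replacing deletes the rows rather than dropping the table, so column types and constraints survive; appending leaves constraint enforcement to the database.

```python
import sqlite3


def _quote(name: str) -> str:
    """Quote an SQL identifier (SQLite/Postgres double-quote style)."""
    return '"' + name.replace('"', '""') + '"'


def write_rows(conn, table, columns, rows, mode="replace"):
    """Hypothetical write step for an IO manager's handle_output.

    mode="replace": delete existing rows, then insert (table definition kept).
    mode="append": insert only; a constraint violation raises
    sqlite3.IntegrityError and the whole batch is rolled back.
    """
    if mode not in ("replace", "append"):
        raise ValueError(f"unknown mode: {mode!r}")
    cols = ", ".join(_quote(c) for c in columns)
    vals = ", ".join("?" for _ in columns)
    with conn:  # one transaction: commit on success, roll back on error
        if mode == "replace":
            conn.execute(f"DELETE FROM {_quote(table)}")
        conn.executemany(f"INSERT INTO {_quote(table)} ({cols}) VALUES ({vals})", rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score REAL)")
write_rows(conn, "scores", ["id", "score"], [(1, 0.5), (2, 0.7)])
write_rows(conn, "scores", ["id", "score"], [(3, 0.9)], mode="append")
try:
    # id 1 already exists, so this batch fails and (4, 0.1) is rolled back too
    write_rows(conn, "scores", ["id", "score"], [(4, 0.1), (1, 0.2)], mode="append")
except sqlite3.IntegrityError:
    pass
row_count = conn.execute("SELECT COUNT(*) FROM scores").fetchone()[0]
```

Because each call runs in a single transaction, the failed append leaves no partial rows behind, so row_count ends up as 3 (ids 1, 2 and 3).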

Chris Nogradi

09/29/2022, 11:37 PM
Thanks @Sean Lindo

claire

09/30/2022, 5:20 PM
Hi Chris. Yep, +1 to what Zach mentioned about the many dialect differences among MySQL, Postgres and SQLite databases, so implementing a generalized IO manager could be tricky. Another suspicion I've personally had is that most users aren't reading from and writing to SQL databases (most are using the S3/GCS IO managers). As the others mentioned, it's probably easiest to implement your own.