https://dagster.io/ logo
Title
a

Ashkan Al e Ali

11/22/2022, 10:13 PM
Hello experts, I have a question about the best practice when working with SQL databases. Let's imagine I want to insert some data returned by some public API into SQLite. I can think of two ways of doing this: 1. Define an op to run the sql insert or update queries 2. Define an io manager to do the same. Is there a preferred way?
:dagster-bot-resolve: 1
s

sandy

11/22/2022, 10:22 PM
These are both valid options. The advantage of the I/O manager option: • Makes it easier to test your function without writing to your production database • Less boilerplate if you have many ops that could make use of the same I/O manager The advantage of the inside-your-op option: • Less boilerplate if you only have one or two ops that could make use of the I/O manager
❤️ 1
a

Ashkan Al e Ali

11/22/2022, 10:30 PM
Thanks for your comment! That makes sense. I am leaning toward the IO Manager mainly because I can mock the database for testing. Is it possible to yield RunRequest from inside the IO Manager? My intent here is to run different processes for the newly added rows vs the rows that have updated value -- based on some key column.
s

sandy

11/22/2022, 10:35 PM
That makes sense. I am leaning toward the IO Manager mainly because I can mock the database for testing.
That sounds sensible
Is it possible to yield RunRequest from inside the IO Manager? My intent here is to run different processes for the newly added rows vs the rows that have updated value -- based on some key column.
That's not currently possible. If you wanted to launch a run from the I/O manager, the best way would probably be with the GraphQL client: https://docs.dagster.io/concepts/dagit/graphql-client#submitting-a-job-run
❤️ 1
a

Ashkan Al e Ali

11/22/2022, 10:38 PM
Gotcha! The GraphQL path seems a bit hacky to me. Maybe a better approach is to define a few more ops/assets and break down things further.