The cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability.

dagster

Hello experts,
I have a question about the best practice when working with SQL databases.
Let's imagine I want to insert some data returned by some public API into SQLite.
I can think of two ways of doing this:
1. Define an op to run the sql insert or update queries
2. Define an io manager to do the same.
Is there a preferred way?

These are both valid options.  The advantage of the I/O manager option:
• Makes it easier to test your function without writing to your production database
• Less boilerplate if you have many ops that could make use of the same I/O manager
The advantage of the inside-your-op option:
• Less boilerplate if you only have one or two ops that could make use of the I/O manager

Thanks for your comment!
That makes sense. I am leaning toward the IO Manager mainly because I can mock the database for testing.
Is it possible to yield RunRequest from inside the IO Manager? My intent here is to run different processes for the newly added rows vs the rows that have updated value -- based on some key column.

&gt; That makes sense. I am leaning toward the IO Manager mainly because I can mock the database for testing.
That sounds sensible
&gt; Is it possible to yield RunRequest from inside the IO Manager? My intent here is to run different processes for the newly added rows vs the rows that have updated value -- based on some key column.
That's not currently possible. If you wanted to launch a run from the I/O manager, the best way would probably be with the GraphQL client: <https://docs.dagster.io/concepts/dagit/graphql-client#submitting-a-job-run>

Gotcha!
The GraphQL path seems a bit hacky to me. Maybe a better approach is to define a few more ops/assets and break down things further.