We currently use database cursors for postgres as a resource (dagster #ask-community)


Daniel Mosesson

05/31/2022, 10:46 AM

We currently use database cursors for Postgres as a resource, and it works, but it leads to us having to pass that resource around between functions. There is also a split between the functions that take `context` and can do the logging, etc., and the utility functions that don't take a `context` and can't log the same way. (I suppose I could pass the context object around as well, but that is also not ideal.)

1. Is this a problem that writing an IO manager would help with? Most of what these functions do is help store or retrieve data.
   a. Most cases would be getting/updating a table, but there are some cases where I need to execute a custom query. Would that be possible?
2. How difficult is it to create an IO manager that does this? I looked at the code for `fs_io_manager` (https://github.com/dagster-io/dagster/blob/8e8ee8537146aad35d3dd75e181f998fd989325f/python_modules/dagster/dagster/core/storage/fs_io_manager.py) and it looks straightforward, but is there something that makes my use case different?


owen

05/31/2022, 4:57 PM

hi @Daniel Mosesson! For the logging bit, you can always do `from dagster import get_dagster_logger` and then `get_dagster_logger().info("some msg")`, so you should be able to avoid threading that context parameter through a bunch of places.

As for the IO manager question, it depends on exactly where those functions are being called. In general, the IO manager is used purely to store the return value of an op and then load it as input to a downstream op. If the functions that use the cursor resource run at the end of your op, then an IO manager could be used in their place (it's also possible that some of the functions inside your op could be broken out into their own separate ops).

owen

05/31/2022, 4:59 PM

IOManagers probably aren't the right abstraction for running custom queries. IOManagers can depend on other resources (the `required_resource_keys` parameter to the `@io_manager` decorator), so it's possible that you could write an IOManager that uses the existing resource (to handle all the get/update stuff), and then keep using the existing resource for the custom queries.

owen

05/31/2022, 5:00 PM

and in general, writing an IOManager is fairly straightforward as long as you have a simple method for figuring out where (i.e. which table) a given output should be stored to, or a given input should be read from.

Daniel Mosesson

05/31/2022, 6:12 PM

Thanks for the info about the logger. Does it support logging events, or just messages to the console? (For this, what I really want to save/send are `AssetMaterialization`s.) I'm going to give writing the IOManager a try, at least for the input and output cases that just load/dump tables. For running custom queries, what is the right abstraction?

owen

05/31/2022, 6:19 PM

ah yeah unfortunately the logger only supports normal messages (not events)

owen

05/31/2022, 6:20 PM

for running custom queries that are not associated w/ inputs or outputs of an op, the right abstraction is a resource.

Daniel Mosesson

05/31/2022, 6:33 PM

makes sense, thanks for the help
