#ask-community

MasaN

06/17/2022, 1:31 PM
Hello. I have 4 ops: the first gets XML data from a URL, and the second gets some data from a DB (these two run in parallel as they are independent). The third does something with the data from both of the previous two, and the fourth saves this modified data to the same DB used by the second op. I defined this DB as a Dagster Resource and passed it to `resource_defs` in my `graph.to_job` job definition to make it available to all ops. I thought this way I would be able to reuse the same instance of my resource across all ops; instead, it seems the resource is built for each op separately (and then also torn down). I assume it has something to do with the `multiprocess_executor`. But I don't want to use the `in_process_executor` (which does seem to create the resource only once and reuse it, but then ops can't run in parallel). Is there another way to reuse the same instance of a resource in my case? FYI: my resource initializes a connection to the DB and has methods to execute SQL statements and close the connection (perhaps my definition is not optimal yet). I would like to use the same instance of my resource (that is, the same connection) in both the second and the last op.

Zach

06/17/2022, 6:06 PM
I'm not sure there's a well-supported way to get essentially global / singleton resources at the moment, aside from using the `in_process_executor`. I think it's a limitation of using Python multiprocessing: how would you share the singleton between the processes? More generally, one pattern for creating globalish/singletonish resource-type objects is to construct the object in an upstream 'setup' op and pass it as an input to any downstream ops that depend on it. That being said, it's probably unlikely that your DB connection is serializable...
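To illustrate the serialization point, here is a minimal sketch using only the Python standard library. It assumes an in-memory SQLite connection as a stand-in for the real DB connection (the actual driver in the question is not stated), and `can_pickle` is a hypothetical helper. Passing an object between processes generally requires pickling it; query results pickle fine, but the connection itself does not.

```python
import pickle
import sqlite3


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


# Stand-in for the real database connection (assumption: SQLite in memory).
conn = sqlite3.connect(":memory:")
rows = conn.execute("SELECT 1").fetchall()

connection_ok = can_pickle(conn)  # the live connection object
rows_ok = can_pickle(rows)        # plain data fetched through it

conn.close()
```

This is why passing fetched data between ops works across processes, while passing the connection object itself most likely would not.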

MasaN

06/20/2022, 7:19 AM
Thank you for your explanation. I had the same idea about passing the connection down between ops. Actually, opening and closing the connection per op doesn't really bother me though; I used it as an example for another use case. Perhaps I will "pay the price" and use the `in_process_executor` for now after all.