# ask-community
j
Hello! Is there a way to define a resource that is initialized dynamically once and then stays static until it is reloaded (i.e. until I click the "Reload" code button)? In the code below, the resource is recomputed at every execution. This is limiting because it prevents me from parallelizing: only one connection to the DB is allowed at a time. I use these resources in almost all assets for mapping and for adding metadata, so I need a resource that is easy to use.
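```python
from typing import Dict, List

from dagster import ConfigurableResource, Definitions, asset, in_process_executor


class DatabaseResource(ConfigurableResource):
    # Rows as returned by call_DB(), e.g. [{"name": "A", "app": "app_A"}, ...]
    db: List[Dict[str, str]]

    def get_customers(self) -> List[str]:
        return [row["name"] for row in self.db]


@asset
def get_customers(db_resource: DatabaseResource) -> List[str]:
    return db_resource.get_customers()


def call_DB():
    # Goal: do not call this function at each execution
    return [
        {"name": "A", "app": "app_A"},
        {"name": "B", "app": "app_B"},
    ]


defs = Definitions(
    assets=[get_customers],
    resources={"db_resource": DatabaseResource(db=call_DB())},
    # All steps run in one process, so the resource instance is reused, but serially
    executor=in_process_executor,
)
```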
t
Hi! There currently isn't a way to do this, but we hear this request often enough that we are aware of it and thinking about it. As a partial workaround, you can use the in_process_executor so that your job keeps the same instance of a resource throughout the run, but that currently restricts everything to running serially.
n
Is it possible to use multiprocessing's Manager or Value to share data across executors? That would be helpful for scenarios where only a single DB connection is allowed.
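For reference, here is roughly what the Manager approach looks like in plain Python. This is a minimal sketch, not wired into Dagster: whether Dagster's multiprocess executor could hand a Manager proxy to its step processes is left open here. The parent queries once, a `multiprocessing.Manager` holds the rows, and workers read them through a proxy instead of opening their own DB connections. `load_reference_data`, `worker` and `run` are hypothetical names.

```python
import multiprocessing as mp

# "fork" keeps this sketch short but is POSIX-only. With "spawn" (the Windows and
# macOS default), process creation must sit under an `if __name__ == "__main__":` guard.
CTX = mp.get_context("fork")


def load_reference_data() -> list:
    # Hypothetical stand-in for the single-connection DB query.
    return [{"name": "A", "app": "app_A"}, {"name": "B", "app": "app_B"}]


def worker(shared_rows, results) -> None:
    # Read from the Manager-held list instead of opening a new DB connection.
    rows = shared_rows[:]  # slicing a ListProxy returns a plain local list
    results.put([row["name"] for row in rows])


def run(n_workers: int = 3) -> list:
    with CTX.Manager() as manager:
        # Query the DB once in the parent; the Manager's server process holds the rows.
        shared_rows = manager.list(load_reference_data())
        results = manager.Queue()
        procs = [
            CTX.Process(target=worker, args=(shared_rows, results))
            for _ in range(n_workers)
        ]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        # One list of names per worker.
        return [results.get() for _ in range(n_workers)]


if __name__ == "__main__":
    print(run())
```

Every proxy access is an IPC round trip to the Manager's server process, which is why each worker copies the rows once with a slice rather than indexing the proxy repeatedly.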
j
Thanks for the answer. I know I could create a separate asset corresponding to my resource, which would write the result of my query to a file, on S3 for example. I could then read this file every time I need it (i.e. in almost every asset). However, I don't find this solution very good in terms of either efficiency or readability. Do you see a more appropriate approach?
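The write-once, read-everywhere idea can be sketched as follows. This assumes a local JSON file stands in for the S3 object and that the data fits comfortably in memory. `load_customers` and its `refresh` flag are hypothetical. The DB is queried only when the cache file is missing or a refresh is requested; otherwise the file is read.

```python
import json
from pathlib import Path


def call_DB() -> list:
    # Hypothetical stand-in for the expensive, single-connection query.
    return [{"name": "A", "app": "app_A"}, {"name": "B", "app": "app_B"}]


def load_customers(cache_path: Path, refresh: bool = False) -> list:
    """Return the query result, hitting the DB only if the cache is missing or refresh=True."""
    if refresh or not cache_path.exists():
        rows = call_DB()
        # Persist the result so later callers (other assets/processes) can skip the DB.
        cache_path.write_text(json.dumps(rows))
        return rows
    # Cache hit: read the previously stored result instead of querying.
    return json.loads(cache_path.read_text())
```

In Dagster terms, this usually becomes an upstream asset whose output an I/O manager stores and loads for downstream assets. Note that the sketch does not guard against several processes finding the file missing at the same time and each querying the DB.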