Hello Is there a way to define a resource that would be init dagster #ask-community

Jordan

04/21/2023, 7:24 PM

Hello! Is there a way to define a resource that is initialized dynamically once and then stays static until the code is reloaded (i.e. until I click the reload code button)? In the code below, the resource is recomputed at each execution. This is limiting because only one connection at a time is allowed to access the DB, so I cannot parallelize. I use these resources in almost all assets to do mapping and add metadata, so I need a resource that I can use easily.

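from typing import Dict, List

from dagster import (
    ConfigurableResource,
    Definitions,
    asset,
    in_process_executor,
)


class DatabaseResource(ConfigurableResource):
    # Rows returned by the query, e.g. {"name": "A", "app": "app_A"}
    db: List[Dict[str, str]]

    def get_customers(self):
        # Collect the "name" field of every row
        return [row["name"] for row in self.db]


@asset
def get_customers(db_resource: DatabaseResource):
    customers = db_resource.get_customers()
    return customers


def call_DB():
    # goal: Do not call this function at each execution
    return [
        {"name": "A", "app": "app_A"},
        {"name": "B", "app": "app_B"},
    ]


defs = Definitions(
    assets=[get_customers],
    # call_DB() runs whenever this module is loaded
    resources={"db_resource": DatabaseResource(db=call_DB())},
    executor=in_process_executor,
)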

Tim Castillo

04/21/2023, 11:07 PM

Hi! There currently isn't a way, but we hear this often enough that we are aware of it and thinking about it. At the level of a single run, you can use the in_process_executor so that your job keeps the same instance of a resource throughout the run, but this currently means everything runs serially.
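
To illustrate why staying in one process lets a value be reused, here is a minimal standard-library sketch. It is not Dagster API: `get_lookup`, `customer_names` and the `CALLS` counter are hypothetical names introduced for illustration, and `call_DB` is the stand-in query from the question. The memoized result lives in the memory of one Python process, so a setup that starts a new process per step would repeat the query in each process.

```python
from functools import lru_cache

# Hypothetical counter, only here to show how often the query really runs.
CALLS = {"count": 0}


def call_DB():
    # Stand-in for the real query from the question.
    CALLS["count"] += 1
    return [
        {"name": "A", "app": "app_A"},
        {"name": "B", "app": "app_B"},
    ]


@lru_cache(maxsize=None)
def get_lookup():
    # The first call runs the query; later calls in the same process
    # return the cached tuple without touching the DB again.
    return tuple(call_DB())


def customer_names():
    return [row["name"] for row in get_lookup()]
```

Calling `customer_names()` several times in the same process triggers a single `call_DB()`; the cache is lost as soon as the process ends or the module is loaded again.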

Nikolaj Galak

04/24/2023, 7:47 AM

Is it possible to use multiprocessing's Manager or Value to share data across executors? That would be helpful for scenarios with a single DB connection.

Jordan

04/24/2023, 12:36 PM

Thanks for the answer. I am aware that I could create a separate asset corresponding to my resource, in which I write the result of my query to a file on S3, for example. Then I could read this file every time I need it (i.e. in almost every asset). However, I don't find this solution very good in terms of optimization or clarity. Do you see a more appropriate method?
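
The file-based workaround described above can be sketched with the standard library only. This is an assumption-laden illustration, not a Dagster feature: a local JSON file stands in for the S3 object, and `load_lookup` and its `cache_path` parameter are hypothetical names. The query runs only when the file does not exist yet, and every later caller, including one in another process, reads the file instead.

```python
import json
import os
import tempfile
from pathlib import Path


def call_DB():
    # Stand-in for the real query from the question.
    return [
        {"name": "A", "app": "app_A"},
        {"name": "B", "app": "app_B"},
    ]


def load_lookup(cache_path, fetch=call_DB):
    """Return the lookup rows, querying the DB only if no cache file exists."""
    path = Path(cache_path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    rows = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first, then rename it into place, so that
    # concurrent readers never see a half-written cache file.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(rows, fh)
    os.replace(tmp_name, path)
    return rows
```

Deleting the cache file forces a fresh query on the next call. This sketch does not stop two processes from both querying when they start at the same moment with no cache present; it only keeps the file itself consistent.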
