# ask-community

Arun Kumar

06/19/2023, 9:33 PM
Hi team, I am seeing that the user code repository is being initialized at the beginning of every job run, from within the run pod. Curious why we need this, as I was under the assumption that the run pod would only communicate with the user code gRPC server for any info about the job definitions. Is there any way to avoid this? We do some heavy lifting during the initialization phase to load the definitions from external sources, which involves multiple API calls, and doing it for every job run puts a huge load on our API servers.
For context, we have implemented our own RepositoryData class
```python
from typing import Dict

from dagster import JobDefinition, RepositoryData


class CustomRepositoryData(RepositoryData):
    def __init__(self):
        self._jobs: Dict[str, JobDefinition] = {}
        self._schedules = []
        self._sensors = []

    def get_all_pipelines(self):
        ...  # call APIs and build job definitions
```

johann

06/20/2023, 8:25 PM
The run pod doesn’t reach out to the gRPC server; instead it loads the repository locally (it runs with the same user code image as the gRPC server). One way that other customers who do a heavy init have handled this is to use the `DAGSTER_RUN_JOB_NAME` env var to load only the relevant job. It won’t be set when the code is invoked in the gRPC server, and it will be set for runs.
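To illustrate the behavior described above (a minimal sketch: the env var name `DAGSTER_RUN_JOB_NAME` comes from this thread, but the helper name `job_to_load` is hypothetical):

```python
import os


def job_to_load():
    """Return the job name for the current run, or None when the code is
    being loaded by the gRPC server (where DAGSTER_RUN_JOB_NAME is unset)."""
    return os.environ.get("DAGSTER_RUN_JOB_NAME")
```

User code can branch on the return value: `None` means "serve all definitions", a string means "only this job is about to run".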

Arun Kumar

06/20/2023, 9:06 PM
I see. Thanks @johann! Should I use `DAGSTER_RUN_JOB_NAME` within my `RepositoryData` implementation? I'm not sure how `DAGSTER_RUN_JOB_NAME` would work with a custom repository implementation like the one shown above.

johann

06/20/2023, 9:09 PM
You should be able to check for the env var in `get_all_pipelines`, and if it’s present you only need to build the one job.
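A sketch of how that check could look. This is not the Dagster API itself: the `builders` mapping of job name to build function is a hypothetical stand-in for the expensive API-backed job construction, and in a real `CustomRepositoryData` this logic would live inside `get_all_pipelines`:

```python
import os
from typing import Callable, Dict


def build_requested_jobs(builders: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Build only the job named by DAGSTER_RUN_JOB_NAME, or all jobs when unset."""
    job_name = os.environ.get("DAGSTER_RUN_JOB_NAME")
    if job_name is not None:
        # Run pod: only the job being executed needs to be built.
        return {job_name: builders[job_name]()}
    # gRPC server: serve the full set of definitions.
    return {name: build() for name, build in builders.items()}
```

With this pattern, the run pod makes API calls for a single job instead of rebuilding the entire repository on every run.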