# ask-community

Arun Kumar

06/19/2023, 9:33 PM
Hi team, I am seeing that the user code repository is being initialized at the beginning of every job run, from within the run pod. Curious why we need this, as I was under the assumption that the run pod would only communicate with the user code gRPC server for any info about the job definitions. Is there any way to avoid this? We do some heavy lifting during the initialization phase to load the definitions from external sources, which involves multiple API calls, and doing it for every job run puts a huge load on our API servers.
For context, we have implemented our own RepositoryData class
```python
from typing import Dict

from dagster import JobDefinition, RepositoryData


class CustomRepositoryData(RepositoryData):
    def __init__(self):
        self._jobs: Dict[str, JobDefinition] = {}
        self._schedules = []
        self._sensors = []

    def get_all_pipelines(self):
        ...  # call APIs and build job definitions
```

johann

06/20/2023, 8:25 PM
The run pod doesn’t reach out to the gRPC server; instead it loads the repository locally (it runs with the same user code image as the gRPC server). One way that other customers who do a heavy init have handled this is to use the `DAGSTER_RUN_JOB_NAME` env var to load only the relevant job. It won’t be set when the code is invoked in the gRPC server, and it will be set for runs.
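To illustrate the behavior described above (a minimal sketch: the env var name `DAGSTER_RUN_JOB_NAME` comes from this thread, but the helper name `job_to_load` is hypothetical):

```python
import os


def job_to_load():
    """Return the job name for the current run, or None when the code is
    being loaded by the gRPC server (where DAGSTER_RUN_JOB_NAME is unset)."""
    return os.environ.get("DAGSTER_RUN_JOB_NAME")
```

User code can branch on the return value: `None` means "serve all definitions", a string means "only this job is about to run".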

Arun Kumar

06/20/2023, 9:06 PM
I see. Thanks @johann! Should I use `DAGSTER_RUN_JOB_NAME` within my `RepositoryData` implementation? I'm not sure how `DAGSTER_RUN_JOB_NAME` would work with a custom repository implementation like the one shown above.

johann

06/20/2023, 9:09 PM
You should be able to check for the env var in `get_all_pipelines`, and if it’s present you only need to build the one job.
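A sketch of how that check could look. This is not the Dagster API itself: the `builders` mapping of job name to build function is a hypothetical stand-in for the expensive API-backed job construction, and in a real `CustomRepositoryData` this logic would live inside `get_all_pipelines`:

```python
import os
from typing import Callable, Dict


def build_requested_jobs(builders: Dict[str, Callable[[], object]]) -> Dict[str, object]:
    """Build only the job named by DAGSTER_RUN_JOB_NAME, or all jobs when unset."""
    job_name = os.environ.get("DAGSTER_RUN_JOB_NAME")
    if job_name is not None:
        # Run pod: only the job being executed needs to be built.
        return {job_name: builders[job_name]()}
    # gRPC server: serve the full set of definitions.
    return {name: build() for name, build in builders.items()}
```

With this pattern, the run pod makes API calls for a single job instead of rebuilding the entire repository on every run.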