Saurabh
01/19/2023, 2:42 PM
rex
01/19/2023, 3:02 PM
Yeachan Park
01/19/2023, 3:22 PM
> This is to have isolation in run execution: if the jobs launched by the K8sRunLauncher relied on the gRPC server, and the gRPC server went down, we don’t want all the jobs to stop working while they are still in progress.
Doesn't it just need to sync the code between the repository and the job pod once at the beginning? If I understand correctly, information about op status/progress is written directly to the database anyway, so I'm not sure why the job would fail halfway. Even if it can't sync at the beginning, I'd expect this to be less of an issue, since I assume most people will be running highly available repositories, seeing as no jobs can be scheduled if the repository is down. I'd also expect the repository to be re-deployed frequently, since the repository's image is what the K8sRunLauncher uses to launch a job?
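For readers following along, here is a rough, conceptual sketch of the isolation rex describes, written against the kubernetes Python client. This is not the actual K8sRunLauncher source; the image, namespace, and command are illustrative placeholders. The point is that the run executes from a pinned image in its own Kubernetes Job, so it keeps writing events to the database even if the gRPC server is redeployed or goes down mid-run.

```python
# Conceptual sketch only (not the actual K8sRunLauncher implementation):
# launch a run as a standalone Kubernetes Job built from the code location's
# image. Because the code and its dependencies are baked into that image, the
# run does not depend on the user-code gRPC server staying up.
from kubernetes import client, config


def launch_run_as_k8s_job(run_id: str, job_image: str, namespace: str = "dagster") -> None:
    config.load_incluster_config()  # assumes this code runs inside the cluster

    container = client.V1Container(
        name="dagster-run",
        image=job_image,  # the same image that the gRPC server / code location runs
        # The real launcher hands the pod a serialized description of the run
        # via a "dagster api execute_run"-style entry point; treat this as a
        # placeholder rather than the exact command.
        args=["dagster", "api", "execute_run", run_id],
    )
    pod_spec = client.V1PodSpec(containers=[container], restart_policy="Never")
    job_spec = client.V1JobSpec(
        template=client.V1PodTemplateSpec(spec=pod_spec),
        backoff_limit=0,
    )
    job = client.V1Job(
        metadata=client.V1ObjectMeta(name=f"dagster-run-{run_id[:8]}"),
        spec=job_spec,
    )
    client.BatchV1Api().create_namespaced_job(namespace=namespace, body=job)
```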
rex
01/19/2023, 3:28 PM
Yeachan Park
01/19/2023, 3:42 PM
> all the dependencies that are required to run the code will be needed. This is why the K8sRunLauncher pulls the image and runs the code from there.
Ah OK, so the main reason the job pod isn't connected to the repository is that there would have been too much load on it to sync this? I guess that would be more relevant if users are running actual work within the job pod itself, using for example the in_process_executor? For our use case, we're using something similar to the k8s_job_executor (we're just using pods without the Kubernetes Jobs abstraction). All the other dependencies for the job pod would be relatively static (just enough to be able to start pods, etc.) and could be baked into the job pod image, since the actual dependencies needed to run the business logic are baked into the image that the job pod starts for each op.
So is the reason it's set up this way that the majority of users run their business logic inside the scheduler?
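A minimal sketch of the two executor setups mentioned above (the op and job names are made up): in_process_executor runs every op inside the run worker itself, so all business-logic dependencies must live in that image, while k8s_job_executor from the dagster-k8s package launches each op as its own Kubernetes Job/pod.

```python
# Hedged illustration of choosing an executor per job; names are invented.
from dagster import in_process_executor, job, op
from dagster_k8s import k8s_job_executor


@op
def do_business_logic():
    ...


# Every op runs inside the run worker pod itself, so that pod's image needs
# all business-logic dependencies.
@job(executor_def=in_process_executor)
def everything_in_one_pod():
    do_business_logic()


# Each op is launched as its own Kubernetes Job/pod; the run worker image only
# needs enough to start and monitor those pods.
@job(executor_def=k8s_job_executor)
def one_pod_per_op():
    do_business_logic()
```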
rex
01/19/2023, 3:54 PM
> So is the reason it's set up this way that the majority of users run their business logic inside the scheduler?
Not sure what you mean here - our scheduler is a daemon process. The daemon doesn't execute any work; it submits runs to be launched. In the case of the default run launcher, the run is executed within the gRPC server itself; in the case of the K8sRunLauncher, as we're talking about, the run is executed in a job pod (using the same image that runs on the gRPC server).
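For background on the pieces rex lists (daemon, run launcher, gRPC server), here is a minimal, illustrative repository definition of the kind a user-code gRPC server serves. The daemon evaluates the schedule and submits runs; where each run executes is then up to the configured run launcher. All names below are made up.

```python
# Sketch of a code location / "repository" served by the gRPC server. The
# server only serves these definitions; it does not itself run ops when the
# K8sRunLauncher is in use.
from dagster import ScheduleDefinition, job, op, repository


@op
def my_op():
    ...


@job
def my_job():
    my_op()


# The scheduler daemon evaluates this and submits runs to the run launcher.
hourly_schedule = ScheduleDefinition(job=my_job, cron_schedule="0 * * * *")


@repository
def my_repository():
    return [my_job, hourly_schedule]
```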
Yeachan Park
01/19/2023, 4:07 PM
> not sure what you mean here - our scheduler is a daemon process
Sorry for being unclear, I was just asking whether the majority of users run their business logic within Dagster (not the actual scheduler daemon): so, in the context of the K8sRunLauncher, running their actual business logic inside the job pod that gets spun up, as opposed to us, where we calculate the business logic outside of Dagster in a different container - i.e. Dagster is only responsible for scheduling.
> I’m not even sure if syncing was an option up for contention when we integrated with gRPC (but another core team member could chime in here)
Yeah, I would be interested in understanding if possible