Hi guys, Is it correct that the gRPC server is onl...
# ask-community
o
Hi guys, is it correct that the gRPC server is only responsible for giving the Dagit UI information about the code location? I.e., this code is not serialized and then used by the daemon / K8sRunLauncher? Am I correct in understanding that the daemon always fetches the latest image instead of using what has been serialized from the gRPC server? In our use case we rsync all Dagster assets from a bucket to a Dagster repository, but if the K8sRunLauncher pulls a new image, then that image won't have our Dagster assets 😞 Thanks!
d
Hey Oscar - this is correct, Dagster does not serialize code.
for what you're describing to work, you may need the rsync to happen in the run pod as well
s
@daniel I think it would be good to mention this more explicitly in the docs. The separate gRPC server is mentioned in the docs for more complex/advanced setups. Our assumption based on the docs was that the gRPC server would be used by the UI and by the runs (in our case via K8sRunLauncher), but apparently the gRPC server is only used for the UI/Dagit :(
Not really sure there's much benefit to the separate gRPC service then, at least in our case. Might as well just let Dagit handle it
d
it's not quite true that it's only used by dagit - it serves metadata about the job that's used during the run, but it still needs to be able to load the asset code during the run. but point taken about the docs
s
Ok, so the relationship is a bit more complex
d
yeah - they also run code for schedules and sensors
s
Ok, thanks for the clarification 👍 FYI just did some digging through the docs and couldn't really find an explicit mention of runs requiring access to the Python modules/files.
d
there's also the DefaultRunLauncher that launches the run as a subprocess on the server, but there are some definite tradeoffs to using that vs. the K8sRunLauncher
Would it be possible to also do the rsync as part of each run launch somehow? or would that be too expensive?
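(For reference, a minimal sketch of what "rsync as part of each run launch" could look like: a small wrapper that syncs the code from the bucket and then hands off to whatever command the run pod was started with. This is an illustration under assumptions, not something Dagster ships. The environment variable names `CODE_BUCKET_URL` and `CODE_DEST_DIR`, the default path, and both function names are hypothetical, and it assumes `gsutil` is installed in the image. How the wrapper gets wired into the run pod, e.g. as the image entrypoint, is also an assumption and not covered here.)

```python
import os
import subprocess
from typing import List


def build_sync_command(bucket_url: str, dest_dir: str) -> List[str]:
    """Return the gsutil command that copies bucket_url into dest_dir."""
    # -m runs transfers in parallel, -r recurses into subdirectories.
    return ["gsutil", "-m", "rsync", "-r", bucket_url, dest_dir]


def main(argv: List[str]) -> None:
    """Sync the code, then replace this process with the given command."""
    # Hypothetical environment variables; these are not Dagster settings.
    bucket_url = os.environ["CODE_BUCKET_URL"]
    dest_dir = os.environ.get("CODE_DEST_DIR", "/opt/dagster/app")

    # Fetch the code before anything tries to import it.
    # check=True makes a failed sync abort the run instead of loading stale code.
    subprocess.run(build_sync_command(bucket_url, dest_dir), check=True)

    # Hand off to the command the pod was originally asked to run.
    os.execvp(argv[0], argv)


# When used as a wrapper script, call: main(sys.argv[1:]) after importing sys
```

The sync runs once per run pod, so its cost scales with the number of runs.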
s
I expect that would be too expensive/take too long. Even if it were quick, we probably wouldn't want it to happen given we're running about 1000 runs a day. With Airflow in the past we had a fixed number of workers, and they all synced the DAGs. Since Airflow 2.0, I believe, the DAGs are serialized, so this was no longer necessary, which was a nice simplification. We'll have to look into how to handle this with Dagster, also because we're running on GKE and normal GCE volumes don't support ReadWriteMany for PVCs :(
d
we've talked about a deployment setup before where there is a fleet of servers that take requests to launch runs using a variant of the DefaultRunLauncher - sounds similar
and actually sounds similar to what @Tomas Gatial is experimenting with a few posts up (but isn't currently supported out of the box) https://dagster.slack.com/archives/C01U954MEER/p1673352948142039
s
Thanks, we'll have a look at that. We'll need to reflect a bit on our usage patterns/use-cases in general and consider what is and isn't possible out of the box with Dagster, also considering things like this https://github.com/dagster-io/dagster/discussions/10772#discussioncomment-4644048 In any case, thanks for the help! We'll try to write up our use-case(s) and share them in the relevant issues or discussions