d
@daniel Hi Daniel, thanks for your Github reply regarding gRPC. I just realized that I completely mistook the gRPC server for a pipeline code distribution mechanism. In my all-in-one docker example I didn't even realize that the pipeline runs on the gRPC server instead of the worker 😑
I'm trying to get a setup where pipeline definitions (i.e. the code) can be shared between the servers, with a simple mechanism to update in test environments, but a versioned mechanism in production. We're forced to rely on separate workers because our workloads are heavyweight and require specialized hardware (GPUs and others). This is the main reason we're evaluating the Celery integration: with it, we can publish tasks to queues that only workers with the required hardware listen to.
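To make that concrete, here is roughly what we have in mind (just a sketch; the `gpu` queue name and the solid names are made up for illustration):
```python
from dagster import ModeDefinition, default_executors, pipeline, solid
from dagster_celery import celery_executor


# Route this solid to a queue that only GPU-equipped workers consume.
@solid(tags={"dagster-celery/queue": "gpu"})
def train_on_gpu(context):
    context.log.info("running on a GPU worker")


# Lightweight work falls back to the default celery queue.
@solid
def summarize(context):
    context.log.info("running on any worker")


celery_mode = ModeDefinition(executor_defs=default_executors + [celery_executor])


@pipeline(mode_defs=[celery_mode])
def hardware_aware_pipeline():
    train_on_gpu()
    summarize()
```
The GPU machines would then run celery workers listening only on the `gpu` queue, while everything else stays on the default queue.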
d
Hi! No problem - I was actually just looking to see if this user-contributed executor might be closer to what you want: https://github.com/dagster-io/dagster/blob/fd720475ee72c366f2990bb60c4b43205a8575fe/python_modules/libraries/dagster-celery-docker/dagster_celery_docker/executor.py#L72 That would potentially let celery workers execute individual steps, but the actual execution would still be on the gRPC servers, so probably not what you're looking for.
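If it helps, wiring that executor into a pipeline would look roughly like this (sketch only, assuming it is exported as `celery_docker_executor`; the Docker image the steps run in is supplied via the executor's run config):
```python
from dagster import ModeDefinition, default_executors, pipeline, solid
from dagster_celery_docker import celery_docker_executor


@solid
def do_step(context):
    context.log.info("executed by a celery worker inside a docker container")


# Steps are submitted to celery and each worker runs its step in a container,
# but the run itself is still orchestrated from the gRPC server.
docker_mode = ModeDefinition(executor_defs=default_executors + [celery_docker_executor])


@pipeline(mode_defs=[docker_mode])
def celery_docker_pipeline():
    do_step()
```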
d
No, that won't fit either. Due to the hardware requirements on the workers, we're forced to run specific solids on specific workers. How do others typically share code between multiple workers running on different systems?
d
I see. @alex or @cat are probably better positioned to answer that, I know we have partners using gRPC for code versioning but not sure how they share code across different machines.
a
> How do others typically share code between multiple workers running on different systems?
Typically by deploying the same docker image to the different workers
Is there a reason that deploying the same image to the different workers won't work for your production use case? I think we have a few different solutions to the "simple mechanisms for testing" constraint you have
d
Packing the workflows up into a Docker image is possible, but not suitable for the quick iterations in the test env. So if we can find a solution that is suitable for testing, Docker is fine for production. We'll probably end up with a smaller env for testing, but one that still has different worker machines with different capabilities. The use case I'm talking about will primarily involve pipeline development: A developer makes repeated changes to one or more solids and pipelines and needs to be able to run and verify the result on a test cluster.
a
would the volume mount work in the test env?
trying to think what other options would work given that you want to test in a cluster. The mode & resources abstractions are mostly helpful for being able to do local testing or even unit testing but unclear if they can be leveraged for your situation
d
That would mean shared network mounts. I have my doubts that this will work reliably with many machines/processes accessing the files and a developer trying to update them at the same time... But if you tell me that a lot of people are happy with such a solution, we'll give it a try :-)
I think our main challenge comes from the fact that we need those machines with specialized hardware, even for pipeline development (for most of it, at least)
a
yeah it really depends on how the test cluster / environments are managed - but certainly plenty of ways it could go wrong
just to step back, how do you wish it worked?
for example, would a `git`-centric model work where a certain tag/commit/branch would be synced before beginning work on each execution unit?
d
I thought about git, too, and this is why I was excited about the gRPC server as a (git-backed) code server, although I misunderstood that, as you can read above. Git would be a good solution if there was a way to automatically reference a git commit when running pipelines. Workers would then be able to grab the pipeline sources from that commit, so there would be no race conditions when rapidly pushing to the same branch. Something along the lines of:
• Developer pushes a new commit containing a pipeline change
• Dagit somehow automatically gets notified and reloads the pipeline
• The user triggers a pipeline run
• The pipeline run is started, passing along the git repo and commit
• The workers fetch the commit from the repo and run the pipeline
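On the worker side I imagine something like this, as a purely hypothetical sketch (the repo URL and commit would have to travel with the run somehow, e.g. as tags or environment variables; `PIPELINE_GIT_REPO`/`PIPELINE_GIT_COMMIT` are made-up names):
```python
import os
import subprocess
import tempfile


def checkout_pipeline_code(repo_url, commit_sha):
    """Fetch the exact commit a run was triggered against into a fresh directory.

    Pinning to a commit SHA is what avoids races when developers push
    rapidly to the same branch.
    """
    workdir = tempfile.mkdtemp(prefix="pipeline-code-")
    subprocess.check_call(["git", "clone", "--no-checkout", repo_url, workdir])
    subprocess.check_call(["git", "checkout", commit_sha], cwd=workdir)
    return workdir


if __name__ == "__main__":
    # Hypothetical: the run carries the repo/commit it was launched from.
    code_dir = checkout_pipeline_code(
        os.environ["PIPELINE_GIT_REPO"],
        os.environ["PIPELINE_GIT_COMMIT"],
    )
    # The worker would then put code_dir on sys.path (or similar) so the
    # pipeline definitions from exactly that commit are loaded before executing.
    print(f"pipeline code checked out at {code_dir}")
```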
a
alright thanks. While we don't have this now, it was something we had in mind as we built several of the abstractions, so it's something we are considering adding
d
That sounds good. I'll give the two approaches (shared drive vs git-based) more thought and discuss with the team. Thanks a lot for your input!
a
thank you for sharing details about your constraints 😄