d
@daniel Hi Daniel, thanks for your Github reply regarding gRPC. I just realized that I completely mistook the gRPC server for a pipeline code distribution mechanism. In my all-in-one docker example I didn't even realize that the pipeline runs on the gRPC server instead of the worker 😑
I'm trying to get a setup where pipeline definitions (i.e. the code) can be shared between the servers, with a simple mechanism to update in test environments, but a versioned mechanism in production. We're forced to rely on separate workers because our workloads are heavyweight and require specialized hardware (GPUs and others). This is the main reason we're evaluating the Celery integration: with it, we can publish tasks to queues that only workers with the required hardware listen to.
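To make that concrete, here is roughly what we have in mind (just a sketch; the `gpu` queue name and the solid names are made up for illustration):
```python
from dagster import ModeDefinition, default_executors, pipeline, solid
from dagster_celery import celery_executor


# Route this solid to a queue that only GPU-equipped workers consume.
@solid(tags={"dagster-celery/queue": "gpu"})
def train_on_gpu(context):
    context.log.info("running on a GPU worker")


# Lightweight work falls back to the default celery queue.
@solid
def summarize(context):
    context.log.info("running on any worker")


celery_mode = ModeDefinition(executor_defs=default_executors + [celery_executor])


@pipeline(mode_defs=[celery_mode])
def hardware_aware_pipeline():
    train_on_gpu()
    summarize()
```
The GPU machines would then run celery workers listening only on the `gpu` queue, while everything else stays on the default queue.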
d
Hi! No problem - I was actually just looking to see if this user-contributed executor might be closer to what you want: https://github.com/dagster-io/dagster/blob/fd720475ee72c366f2990bb60c4b43205a8575fe/python_modules/libraries/dagster-celery-docker/dagster_celery_docker/executor.py#L72 That would potentially let celery workers execute individual steps, but the actual execution would still be on the gRPC servers, so probably not what you're looking for.
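If it helps, wiring that executor into a pipeline would look roughly like this (sketch only, assuming it is exported as `celery_docker_executor`; the Docker image the steps run in is supplied via the executor's run config):
```python
from dagster import ModeDefinition, default_executors, pipeline, solid
from dagster_celery_docker import celery_docker_executor


@solid
def do_step(context):
    context.log.info("executed by a celery worker inside a docker container")


# Steps are submitted to celery and each worker runs its step in a container,
# but the run itself is still orchestrated from the gRPC server.
docker_mode = ModeDefinition(executor_defs=default_executors + [celery_docker_executor])


@pipeline(mode_defs=[docker_mode])
def celery_docker_pipeline():
    do_step()
```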
d
No, that won't fit either. Due to the hardware requirements on the workers, we're forced to run specific solids on specific workers. How do others typically share code between multiple workers running on different systems?
d
I see. @alex or @cat are probably better positioned to answer that, I know we have partners using gRPC for code versioning but not sure how they share code across different machines.
a
> How do others typically share code between multiple workers running on different systems?
Typically by deploying the same docker image to the different workers
Is there a reason that deploying the same image to the different workers won't work for your production use case? I think we have a few different solutions to the "simple mechanisms for testing" constraint you have
d
Packing the workflows up into a Docker image is possible, but not suitable for the quick iterations in the test env. So if we can find a solution that is suitable for testing, Docker is fine for production. We'll probably end up with a smaller env for testing, but one that still has different worker machines with different capabilities. The use case I'm talking about will primarily involve pipeline development: A developer makes repeated changes to one or more solids and pipelines and needs to be able to run and verify the result on a test cluster.
a
would the volume mount work in the test env?
trying to think what other options would work given that you want to test in a cluster. The mode & resources abstractions are mostly helpful for being able to do local testing or even unit testing but unclear if they can be leveraged for your situation
d
That would mean shared network mounts. I have my doubts that this will work reliably with many machines/processes accessing the files and a developer trying to update them at the same time... But if you tell me that a lot of people are happy with such a solution, we'll give it a try :-)
I think our main challenge comes from the fact that we need those machines with specialized hardware, even for pipeline development (for most of it, at least)
a
yeah it really depends on how the test cluster / environments are managed - but certainly plenty of ways it could go wrong
just to step back, how do you wish it worked?
for example, would a `git`-centric model work where a certain tag/commit/branch would be synced before beginning work on each execution unit?
d
I thought about git, too, and this is why I was excited about the gRPC server as a (git-backed) code server, although I misunderstood that, as you can read above. Git would be a good solution if there was a way to automatically reference a git commit when running pipelines. Workers would then be able to grab the pipeline sources from that commit, so there would be no race conditions when rapidly pushing to the same branch. Something along the lines of:
• Developer pushes a new commit containing a pipeline change
• Dagit somehow automatically gets notified and reloads the pipeline
• The user triggers a pipeline run
• The pipeline run is started, passing along the git repo and commit
• The workers fetch the commit from the repo and run the pipeline
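On the worker side I imagine something like this, as a purely hypothetical sketch (the repo URL and commit would have to travel with the run somehow, e.g. as tags or environment variables; `PIPELINE_GIT_REPO`/`PIPELINE_GIT_COMMIT` are made-up names):
```python
import os
import subprocess
import tempfile


def checkout_pipeline_code(repo_url, commit_sha):
    """Fetch the exact commit a run was triggered against into a fresh directory.

    Pinning to a commit SHA is what avoids races when developers push
    rapidly to the same branch.
    """
    workdir = tempfile.mkdtemp(prefix="pipeline-code-")
    subprocess.check_call(["git", "clone", "--no-checkout", repo_url, workdir])
    subprocess.check_call(["git", "checkout", commit_sha], cwd=workdir)
    return workdir


if __name__ == "__main__":
    # Hypothetical: the run carries the repo/commit it was launched from.
    code_dir = checkout_pipeline_code(
        os.environ["PIPELINE_GIT_REPO"],
        os.environ["PIPELINE_GIT_COMMIT"],
    )
    # The worker would then put code_dir on sys.path (or similar) so the
    # pipeline definitions from exactly that commit are loaded before executing.
    print(f"pipeline code checked out at {code_dir}")
```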
a
alright thanks. While we don't have this now, it was something we had in mind as we built several of the abstractions, so it's something we are considering adding
d
That sounds good. I'll give the two approaches (shared drive vs git-based) more thought and discuss with the team. Thanks a lot for your input!
a
thank you for sharing details about your constraints 😄