n
Hi, I wanted to know if it’s possible to run a pipeline on multiple CPU cores using just the simple daemon run, i.e. without Kubernetes, Docker, Celery, etc.? If not, what is the best method to deploy Dagster for pipelines that aren’t very resource intensive, i.e. most of the pipelines just fetch, read, do a small bit of parsing, and then write a report?
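For context, Dagster's built-in multiprocess executor can already spread a pipeline's solids across CPU cores on a single machine, with only the daemon and dagit running. Below is a minimal run-config sketch, assuming the legacy pipeline/solid API; the exact keys vary between Dagster versions, and some older releases also need filesystem-based intermediate storage configured so subprocesses can exchange outputs:

```yaml
# Run config selecting the multiprocess executor, which launches each
# solid in its own subprocess so independent solids can use separate cores.
execution:
  multiprocess:
    config:
      max_concurrent: 4   # at most 4 solid subprocesses at a time
```

Runs launched by the daemon through schedules or sensors can carry this run config like any other run, so no Kubernetes, Docker, or Celery is required for single-machine parallelism.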
d
n
Ok, that’s great. And what would be the best method to deploy such a system? Should I go for Kubernetes?
d
There’s no single best answer here; it really depends on your use case. But we do have good support for Kubernetes - https://docs.dagster.io/deployment#hands-on-guides-to-deploying-dagster-on-kubernetes is an overview, and that same page also has guides for deploying on AWS or GCP.
n
Yeah, that I know. Ok, let’s see. Other than that, is it possible to limit run concurrency based on tags in such a way that each value of the tag gets its own limit? Say I've set a limit of 2 on a tag key; then for each value of that tag the limit would remain 2, and 2 pipelines could run per value. Is that possible?
d
That's not currently possible - could you say more about what the goal would be there? Typically these rules are used to cap the number of runs happening at once, so the system doesn't use too many resources. A per-value rule like that wouldn't help with resource limits, since there could still be an unbounded number of runs happening at once, so I'm curious what problem it would be solving instead.
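For context, the tag-based rules that do exist live on the queued run coordinator and take a fixed limit for a specific tag key, optionally scoped to a specific value. A minimal dagster.yaml sketch; the tag key "user" and value "alice" are just illustrative:

```yaml
# dagster.yaml (instance config)
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    tag_concurrency_limits:
      # At most 1 in-progress run carrying the tag user=alice;
      # each value you care about needs its own explicit entry.
      - key: "user"
        value: "alice"
        limit: 1
```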
n
Actually, in our use case the scheduler is queuing multiple pipelines for multiple users. We want to limit it so that, for each user, only a single pipeline from the queue runs at a time. Then there would be another tag on all of those pipelines that limits the overall number of concurrent runs. So at any instant, let's say 5 pipelines would be running concurrently, each belonging to a different user. The reason we want to run a single pipeline per user is a throttling issue with the API, as the burst rate for that API is 1 sec. So if multiple pipelines are running for one user, most of the pipelines' solids end up returning a retry request.
d
When you say "the reason we want to run a single pipeline per user is a throttling issue with the API, as the burst rate for that API is 1 sec" - I see, so the API you're using is throttled per user? Like it won't complain if lots of people are querying at the same time, as long as they are different users?
n
Yes
d
Got it, thanks for the explanation. The best answer I have for you right now is to use a global limit on the total number of runs as an approximation for this, but I see how that's not a perfect solution.
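A sketch of that global approximation in dagster.yaml, again assuming the queued run coordinator; the limit of 5 just echoes the number mentioned above:

```yaml
# dagster.yaml (instance config)
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    # Never more than 5 runs in progress at once, regardless of tags.
    max_concurrent_runs: 5
```

If the set of users is small and known up front, the per-value tag_concurrency_limits entries sketched earlier could be repeated once per user to get closer to the one-run-per-user behaviour, though that still isn't an automatic per-unique-value limit.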
n
Ok sure. Thanks.