# dagster-plus
s
Hi Dagster Team! 👋🏻 I had a few questions regarding Serverless Cloud deployments.
1. Which cloud provider and region is the instance hosted in? Would we be able to choose?
2. Looking at the Limitations mentioned, does the "4500 step-minutes per day" limit account for compute time across the 4 vCPUs? (~19 step-hours per vCPU?)
3. Looking at the 2nd pricing FAQ, what is the unit for "up to 10K", "10K to 100K", "100K+"?
Thanks for your input!
🤖 1
d
Hi Soroush - happy to answer these.
1. These are currently hosted in AWS in us-west-2. We'd like to add multi-region support in the future, but it's not currently available.
2. vCPUs are not currently taken into consideration when computing step-minutes; it's just the total amount of time the step takes.
3. The units there are step-minutes.
s
Thanks for the quick clarification @daniel With the step-minutes, if I understood correctly, there should be a maximum of 1440 (24*60) step-minutes per day, correct? Or is the definition different? I'm wondering how 4500 or 100K step-minutes could occur per day 🤔
d
That’s the number of minutes in a day, but you can have multiple jobs happening simultaneously, or multiple steps happening at the same time within a single run
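To make the arithmetic concrete, here's a small sketch of how concurrent steps push the daily total past 1440. All of the run counts and durations below are made-up illustrative numbers, not figures from Dagster's docs:

```python
# Step-minutes are summed across *all* steps, including ones running
# at the same time, so a day can accumulate far more than 24 * 60 = 1440.
# All numbers below are hypothetical, chosen only for illustration.

runs_per_day = 20           # a schedule firing 20 times a day
parallel_steps_per_run = 5  # each run executes 5 steps concurrently
minutes_per_step = 45       # wall-clock time per step

total_step_minutes = runs_per_day * parallel_steps_per_run * minutes_per_step
print(total_step_minutes)  # 20 * 5 * 45 = 4500
```

With these (hypothetical) numbers, a single daily schedule already exhausts a 4500 step-minute quota despite each step fitting comfortably inside a day.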
s
Ah yes, I overlooked that! So, with the 4500 step-minute limitation, why does the pricing mention step-minute counts greater than that (e.g. 10K step-minutes)?
d
That pricing also includes hybrid deployment, which doesn’t have a limit, and you can also request a quota increase for a serverless deployment.
s
👍🏻 Thanks Daniel for your help!
:condagster: 1
j
Apologies for hijacking this thread, but I wanted to clarify #2: I believe a step could take a varying amount of time depending on how much resource is available.
a) Are we guaranteed at least 1 vCPU, or could a step be sharing a single vCPU with other steps?
b) If our function runs threads in the Rust or C++ layer, would it be able to utilize more than 1 vCPU? And if so, is it limited by an ECS/EKS pod limit or an EC2 limit?
c) What is the max memory the process has?
d) Are there any SLAs on how quickly you can scale up/out steps if I require burst capacity to run 1000 steps (e.g. partitions) simultaneously?
e) Is there a limit to how many steps I can run concurrently?
d
Each run happens in its own isolated ECS Fargate task with these limitations (https://docs.dagster.io/dagster-cloud/deployment/serverless#limitations), and each step happens in a subprocess within that task. Based on that information, I believe the answers to your questions are:
a) Steps can share CPUs with other steps within the same run, since they are subprocesses within the same ECS task.
b) and c) The CPU and memory limits are in the link I shared.
d) In serverless, it all has to fit within that ECS task. In hybrid Kubernetes you have more options, e.g. running each step in its own Kubernetes pod.
e) No limit on the number of steps specifically; the limit is on the overall memory/CPU usage.
There's also a limit of 50 concurrent in-progress runs in serverless deployments
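A rough analogy in plain Python for the model described above: one run maps to one task, and each step is a subprocess inside it, sharing that task's CPUs and memory. This is an illustrative sketch only, not Dagster's actual executor:

```python
# Illustrative sketch: each "step" runs as a real OS subprocess, and the
# parent "run" launches several of them concurrently within one process
# (the way steps share a single ECS task's resources). Not Dagster code.
from concurrent.futures import ThreadPoolExecutor
import subprocess
import sys

def run_step(name: str) -> str:
    # Launch the step as a subprocess of this "run".
    out = subprocess.run(
        [sys.executable, "-c", f"print('{name} done')"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

step_names = [f"step_{i}" for i in range(3)]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() preserves input order in its results
    results = list(pool.map(run_step, step_names))
print(results)  # ['step_0 done', 'step_1 done', 'step_2 done']
```

The key property this mimics is that the subprocesses have no resource boundaries between them: they contend for whatever CPU and memory the parent's environment (here, the single ECS task) provides.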
j
Thanks! To clarify:
a) If I had 100 partitions, only 50 of them would run at a time, correct? But each of them would have a 4 vCPU / 16 GB RAM process to work with?
b) Apologies, I can't find the definition of a step. It doesn't appear in the Jobs or Ops sections within Concepts (and a search for "step" produces too much noise).
d
a) I think this is actually an option you can select in the UI when doing a backfill (see "Single run" vs. "Multiple runs" here: https://docs.dagster.io/concepts/partitions-schedules-sensors/backfills#launching-backfills).
b) A step is synonymous with an op for the purpose of these questions (technically, the op is the thing you write in code; the step is what is executed).
❤️ 1
but by default, each partition is its own run, yeah
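Combining that default with the 50-run concurrency limit mentioned earlier, the backfill from the question works out roughly like this (a back-of-envelope sketch using the numbers from the thread):

```python
# With one run per partition and a cap on concurrent runs, a backfill
# proceeds in "waves". Numbers are the ones discussed in the thread.
import math

partitions = 100           # hypothetical backfill size from the question
concurrent_run_limit = 50  # default serverless concurrent-run quota

waves = math.ceil(partitions / concurrent_run_limit)
print(waves)  # 100 partitions / 50 concurrent runs -> 2 waves
```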
j
Thank you ❤️ Sorry for belabouring the point: so theoretically, in Dagster Serverless, I could spawn 50 runs, each with 4 vCPUs, giving me 200 vCPUs to run my workload simultaneously if I structured my job correctly?
d
That's correct, yeah. And the 50-run quota is potentially liftable; it's just a conversation with support to bump it.
❤️ 1
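As a sanity check on the arithmetic above (the per-run figures come from this thread and the linked limitations page; treat them as a sketch, since quotas can change or be raised):

```python
# Back-of-envelope peak compute for a serverless deployment,
# using the figures discussed in the thread.
concurrent_runs = 50  # default quota (liftable via support)
vcpus_per_run = 4     # per-ECS-task limit discussed above

peak_vcpus = concurrent_runs * vcpus_per_run
print(peak_vcpus)  # 200
```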
j
So, to reword my original question in (d) above: how quickly will my 50 runs each receive an ECS task to run in when they're triggered by a sensor? Presumably there is some delay for the ECS autoscaler to provision 50 pods with my image.
d
ECS can take some time to provision a new task, yeah. I think the average is about 30 seconds to a minute, but I've seen it take a few minutes in the worst case. If run start latency is a concern, then you could also consider a hybrid deployment running in Kubernetes, which generally has lower task startup times.
j
Could you share your ECS auto-scaling settings so I have an idea of roughly what I'm working with? I'd very much prefer to use the Serverless offering if possible, as we're a lean team. Under a minute in the average case and under 5 minutes in the worst case is acceptable.
d
ECS Fargate doesn't actually have auto-scaling settings per se
🫣 1
j
🙃 Oops. Let me tap my contacts at AWS, who can tell me more about Fargate then. Thank you! I take it that means you're just using their "defaults"?
d
I think everybody who is using Fargate is using the defaults; they don't really expose configuration options there. The main thing that I've seen affect Fargate startup time is the size of the Docker image being used.