# dagster-plus
j
Does hybrid deployment on my own infra allow for more than one simultaneous isolated run unlike serverless?
Also, does hybrid on ECS eliminate the 3 minute boot while provisioning resources?
d
Am I correct in thinking that by "one simultaneous isolated run" you meant "one simultaneous non-isolated run"? (Confusing naming)
j
Hmm, maybe I’m confused — isn’t the rub with serverless that you can either have: 1. Fast startup with a non-isolated run, but limited CPU and RAM because you share a VM, or 2. Slow startup with an isolated run, but lots of RAM/CPU because it’s its own machine
-and- that you can only have one isolated run at a time
d
Everything there is correct except for the last bit - there isn't a limit on the number of isolated runs you can have at once
it's non-isolated runs that have that limit (mostly because of the risk of making the task running them run out of memory)
j
I see, so I can set the parameters for the isolated run and theoretically have multiple runs going in parallel?
d
That's right
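To make the trade-off above concrete, here's a minimal Python sketch of choosing per-run tags. The tag names (`dagster/isolation`, `ecs/cpu`, `ecs/memory`) are assumptions based on the Dagster+ serverless docs as I recall them — check the current docs before relying on them:

```python
def run_tags(isolated: bool, cpu: int = 256, memory_mib: int = 1024) -> dict:
    """Build per-run tags selecting isolated vs non-isolated execution.

    Tag names here are assumed from Dagster+ serverless docs, not verified.
    """
    if not isolated:
        # Fast startup on the shared task: limited CPU/RAM, and a cap on
        # how many such runs can execute at once.
        return {"dagster/isolation": "disabled"}
    # Isolated run: slower boot, but a dedicated task sized independently;
    # per the thread above, there is no limit on concurrent isolated runs.
    return {"ecs/cpu": str(cpu), "ecs/memory": str(memory_mib)}
```

So five isolated runs launched at once would each get their own task with whatever CPU/memory the tags request.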
j
Ok, cool, that’s good news. I do have concerns about the 3 minute boot — though I’ve found that most of the time it’s a lot less than that. I’m in a weird position where the Dagster model fits my use case perfectly, but some of the edges are causing problems — I’m trying to work within those edges.
d
To your second question - that bootup time is mostly from ECS fargate, so it would affect hybrid ECS as well assuming you're using fargate
j
ok, got it
d
you can also set up ECS on EC2 - that requires more operational work though
j
Is this a similar concept to cold starts with serverless functions? Like, is there a period of time where that boot might be lowered if I *just* had a run, because the resources are already provisioned?
d
I don't think that makes a huge difference in Fargate unfortunately
There is an angry mob complaining about this problem to AWS here https://github.com/aws/containers-roadmap/issues/696
We'd definitely like to make this better in dagster serverless though - not trying to just pass the blame to AWS
we have some ideas around having some run tasks ready to go when you press launch
j
Got it — is there a good knowledge base for ec2 backed deployments? I’m in a weird spot where I’m not looking for kafka-esque real time SLAs, but am aiming for a job to run and return a result in 1-3 minutes on a semi-regular basis — maybe slightly trying to fit a square peg into a round hole but the tooling makes sense outside of that
d
Would hybrid kubernetes/EKS ever be on the table? I think that's a lot less burden to set up than ECS+EC2
j
I’ve generally avoided it because I’m not super familiar with kubernetes, but if it makes this particular case work then I’d definitely be open to it
d
hey josh, can you share a little bit more about the impact of the latency on jobs, are you referring to jobs that would be run on an adhoc basis where you’re waiting for the results to come back?
j
general process is something like this: 1. User uploads a document of some variety (think PDF for this case) 2. It gets uploaded to S3 3. A Lambda trigger fires a Dagster job to pull the PDF from S3 and perform a bunch of text extraction, loading it into different dbs 4. Write back to the main application when the job is done to let the user know their document is ready
this would be multiple users in this case possibly running these jobs concurrently — ie 5 people upload a pdf at the same time
Generally all of the extraction stuff combined is taking like 10-90 seconds, but then I’m adding an additional boot up time and it makes it a bit unpredictable
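The Lambda-trigger step above could be sketched roughly like this, using Dagster's GraphQL API to launch the run. This is a hypothetical sketch: the endpoint URL, the `Dagster-Cloud-Api-Token` header, the `launchRun` mutation shape, and all job/op/config names are assumptions, not taken from this thread:

```python
import json
import urllib.request

DAGSTER_GRAPHQL_URL = "https://example.dagster.cloud/prod/graphql"  # hypothetical
DAGSTER_TOKEN = "agent-token-here"  # hypothetical


def build_launch_payload(bucket: str, key: str) -> dict:
    """Build a GraphQL request launching the extraction job for one upload.

    The mutation and field names are assumed from Dagster's public GraphQL
    API and may differ in your deployment.
    """
    run_config = {
        "ops": {"extract_text": {"config": {"s3_bucket": bucket, "s3_key": key}}}
    }
    return {
        "query": """
            mutation LaunchRun($executionParams: ExecutionParams!) {
              launchRun(executionParams: $executionParams) { __typename }
            }
        """,
        "variables": {
            "executionParams": {
                "selector": {
                    "repositoryLocationName": "extraction",   # assumed name
                    "repositoryName": "__repository__",        # assumed name
                    "jobName": "pdf_extraction_job",           # assumed name
                },
                "runConfigData": json.dumps(run_config),
            }
        },
    }


def handler(event, context):
    """AWS Lambda entry point for an S3 ObjectCreated event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    req = urllib.request.Request(
        DAGSTER_GRAPHQL_URL,
        data=json.dumps(build_launch_payload(bucket, key)).encode(),
        headers={
            "Dagster-Cloud-Api-Token": DAGSTER_TOKEN,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The boot-time concern discussed above sits between `handler` firing and the run's first op actually executing, on top of the 10-90 seconds of extraction work.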
d
i see, so this is to make Dagster work as a piece of infrastructure serving end users who are not your teammates, is that right? presumably the benefit here to using Dagster is that you get better visibility into the pipeline than writing a separate service for it, is that also right?
j
Yeah — that’s correct. I’ll say that it makes it much easier to process different types of data — I just configure different jobs for PDFs, docs, audio, etc. The service I’d design to do this process would probably just look similar to Dagster, without some of the material benefits like caching results, storing runs, etc. The SLA for end users is not “instant” — but a minute or a couple of minutes.
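The per-file-type routing described here could be as small as a lookup table in the trigger. The extension-to-job mapping below is illustrative — the job names are hypothetical, not from this thread:

```python
import os

# Hypothetical mapping mirroring the "different jobs for pdfs, docs,
# audio" setup described above; job names are placeholders.
JOB_BY_EXTENSION = {
    ".pdf": "pdf_extraction_job",
    ".docx": "doc_extraction_job",
    ".mp3": "audio_transcription_job",
}


def job_for_upload(key: str) -> str:
    """Pick the Dagster job to launch for an uploaded S3 object key."""
    _, ext = os.path.splitext(key.lower())
    try:
        return JOB_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no extraction job configured for {ext or key!r}")
```

Adding a new data type is then just a new job plus one more table entry, rather than a new service.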