# dagster-plus
j
Does hybrid deployment on my own infra allow for more than one simultaneous isolated run unlike serverless?
Also, does hybrid on ECS eliminate the 3 minute boot while provisioning resources?
d
Am I correct in thinking that by "one simultaneous isolated run" you meant "one simultaneous non-isolated run"? (Confusing naming)
j
Hmm, maybe I’m confused — isn’t the rub with serverless that you can either have: 1. Fast startup with a non-isolated run, but limited CPU and RAM because you share a VM, or 2. Slow startup with an isolated run, but lots of RAM/CPU because it’s its own machine
-and- that you can only have one isolated run at a time
d
Everything there is correct except for the last bit - there isn't a limit on the number of isolated runs you can have at once
it's non-isolated runs that have that limit (mostly because of the risk of making the task running them run out of memory)
j
I see, so I can set the parameters for the isolated run and theoretically have multiple runs going in parallel?
d
That's right
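To make the trade-off above concrete, here's a minimal Python sketch of choosing per-run tags. The tag names (`dagster/isolation`, `ecs/cpu`, `ecs/memory`) are assumptions based on the Dagster+ serverless docs as I recall them — check the current docs before relying on them:

```python
def run_tags(isolated: bool, cpu: int = 256, memory_mib: int = 1024) -> dict:
    """Build per-run tags selecting isolated vs non-isolated execution.

    Tag names here are assumed from Dagster+ serverless docs, not verified.
    """
    if not isolated:
        # Fast startup on the shared task: limited CPU/RAM, and a cap on
        # how many such runs can execute at once.
        return {"dagster/isolation": "disabled"}
    # Isolated run: slower boot, but a dedicated task sized independently;
    # per the thread above, there is no limit on concurrent isolated runs.
    return {"ecs/cpu": str(cpu), "ecs/memory": str(memory_mib)}
```

So five isolated runs launched at once would each get their own task with whatever CPU/memory the tags request.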
j
Ok, cool, that’s good news. I do have concerns about the 3 minute boot — though I’ve found that most of the time it’s a lot less than that. I’m in a weird position where the Dagster model fits my use case perfectly, but some of the edges are causing problems — I’m trying to work within those edges.
d
To your second question - that bootup time is mostly from ECS fargate, so it would affect hybrid ECS as well assuming you're using fargate
j
ok, got it
d
you can also set up ECS on EC2 - that requires more operational work though
j
Is this a similar concept to cold starts with serverless functions? Like, is there a period of time where that boot might be lowered if I *just* had a run, because the resources are already provisioned?
d
I don't think that makes a huge difference in Fargate unfortunately
There is an angry mob complaining about this problem to AWS here https://github.com/aws/containers-roadmap/issues/696
We'd definitely like to make this better in dagster serverless though - not trying to just pass the blame to AWS
we have some ideas around having some run tasks ready to go when you press launch
j
Got it — is there a good knowledge base for ec2 backed deployments? I’m in a weird spot where I’m not looking for kafka-esque real time SLAs, but am aiming for a job to run and return a result in 1-3 minutes on a semi-regular basis — maybe slightly trying to fit a square peg into a round hole but the tooling makes sense outside of that
d
Would hybrid kubernetes/EKS ever be on the table? I think that's a lot less burden to set up than ECS+EC2
j
I’ve generally avoided it because I’m not super familiar with kubernetes, but if it makes this particular case work then I’d definitely be open to it
d
hey josh, can you share a little bit more about the impact of the latency on jobs, are you referring to jobs that would be run on an adhoc basis where you’re waiting for the results to come back?
j
general process is something like this: 1. User uploads a document of some variety (think PDF for this case) 2. It gets uploaded to S3 3. A Lambda trigger fires a Dagster job to pull the PDF from S3 and perform a bunch of text extraction, loading it into different dbs 4. Write back to the main application when the job is done to let the user know their document is ready
this would be multiple users in this case possibly running these jobs concurrently — ie 5 people upload a pdf at the same time
Generally all of the extraction stuff combined is taking like 10-90 seconds, but then I’m adding an additional boot up time and it makes it a bit unpredictable
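The Lambda-trigger step above could be sketched roughly like this, using Dagster's GraphQL API to launch the run. This is a hypothetical sketch: the endpoint URL, the `Dagster-Cloud-Api-Token` header, the `launchRun` mutation shape, and all job/op/config names are assumptions, not taken from this thread:

```python
import json
import urllib.request

DAGSTER_GRAPHQL_URL = "https://example.dagster.cloud/prod/graphql"  # hypothetical
DAGSTER_TOKEN = "agent-token-here"  # hypothetical


def build_launch_payload(bucket: str, key: str) -> dict:
    """Build a GraphQL request launching the extraction job for one upload.

    The mutation and field names are assumed from Dagster's public GraphQL
    API and may differ in your deployment.
    """
    run_config = {
        "ops": {"extract_text": {"config": {"s3_bucket": bucket, "s3_key": key}}}
    }
    return {
        "query": """
            mutation LaunchRun($executionParams: ExecutionParams!) {
              launchRun(executionParams: $executionParams) { __typename }
            }
        """,
        "variables": {
            "executionParams": {
                "selector": {
                    "repositoryLocationName": "extraction",   # assumed name
                    "repositoryName": "__repository__",        # assumed name
                    "jobName": "pdf_extraction_job",           # assumed name
                },
                "runConfigData": json.dumps(run_config),
            }
        },
    }


def handler(event, context):
    """AWS Lambda entry point for an S3 ObjectCreated event."""
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]
    req = urllib.request.Request(
        DAGSTER_GRAPHQL_URL,
        data=json.dumps(build_launch_payload(bucket, key)).encode(),
        headers={
            "Dagster-Cloud-Api-Token": DAGSTER_TOKEN,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The boot-time concern discussed above sits between `handler` firing and the run's first op actually executing, on top of the 10-90 seconds of extraction work.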
d
i see, so this is to make Dagster work as a piece of infrastructure serving end users who are not your teammates, is that right? presumably the benefit here to using Dagster is that you get better visibility into the pipeline than writing a separate service for it, is that also right?
j
Yeah — that’s correct. I’ll say that it makes it much easier to process different types of data — I just configure different jobs for PDFs, docs, audio, etc. The service I’d design to do this process would probably just look similar to Dagster, without some of the material benefits like caching results, storing runs, etc. The SLA for end users is not “instant” — but a minute or a couple of minutes.
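The per-file-type routing described here could be as small as a lookup table in the trigger. The extension-to-job mapping below is illustrative — the job names are hypothetical, not from this thread:

```python
import os

# Hypothetical mapping mirroring the "different jobs for pdfs, docs,
# audio" setup described above; job names are placeholders.
JOB_BY_EXTENSION = {
    ".pdf": "pdf_extraction_job",
    ".docx": "doc_extraction_job",
    ".mp3": "audio_transcription_job",
}


def job_for_upload(key: str) -> str:
    """Pick the Dagster job to launch for an uploaded S3 object key."""
    _, ext = os.path.splitext(key.lower())
    try:
        return JOB_BY_EXTENSION[ext]
    except KeyError:
        raise ValueError(f"no extraction job configured for {ext or key!r}")
```

Adding a new data type is then just a new job plus one more table entry, rather than a new service.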