
Chris Roth

04/16/2020, 8:18 PM
haha so many questions for you guys today! it's deploy day. i'm also seeing a bunch of "This pipeline run has been marked as failed from outside the execution context"

alex

04/16/2020, 8:22 PM
lots of questions are good! helps us fix a lot of stuff. that message should only occur when you are using dagit to do executions and it unexpectedly loses the subprocess it was using for the execution
you may have something in stdout/stderr wherever dagit is running - this should only happen if the process crashes

Chris Roth

04/16/2020, 9:39 PM
hm i'm not seeing anything in the logs
also having items just get stuck in the "starting" status

alex

04/16/2020, 10:06 PM
hmm what type of machine is dagit running on?

Chris Roth

04/16/2020, 10:07 PM
ECS Fargate w/ 8GB memory, 1 vCPU
to be fair i am programmatically spawning about 50-100 pipeline runs at a time which i don't think is exactly a use case you guys are planning around

alex

04/16/2020, 10:09 PM
oo ok tell me more about how you're kicking these off
you may want to set max_concurrent_runs if they are executing via dagit https://docs.dagster.io/latest/deploying/instance/#dagit
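it'd look roughly like this in the dagster.yaml on your dagit instance - double-check the exact keys against those docs since the schema may differ by version:
```yaml
# dagster.yaml on the instance dagit is using
# key names assumed from the instance docs linked above - verify against your version
dagit:
  execution_manager:
    max_concurrent_runs: 10  # cap on concurrently executing runs; extra runs wait in a queue
```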

Chris Roth

04/16/2020, 10:10 PM
nope they are executing only in separate celery workers via redis

alex

04/16/2020, 10:12 PM
well there's where the “run” is happening and where the “steps” are happening
so i assume you are using the celery engine, which executes each step (based on the solid it came from) via celery
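(for context, that's the executor you opt into via the pipeline's run config - roughly like this, with the broker/backend URLs as placeholders for your redis setup:)
```yaml
# run config for a pipeline whose mode includes the celery executor
# broker/backend values are placeholders - point them at your actual redis instance
execution:
  celery:
    config:
      broker: "redis://localhost:6379/0"
      backend: "redis://localhost:6379/0"
```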

Chris Roth

04/16/2020, 10:13 PM
oh
hm

alex

04/16/2020, 10:13 PM
but there is also the overall run which is coordinating and putting stuff in to the queues

Chris Roth

04/16/2020, 10:13 PM
right

alex

04/16/2020, 10:13 PM
so how are you kicking off the runs?

Chris Roth

04/16/2020, 10:13 PM
i believe it is coordinated via dagit and each step is running on celery
i created a pipeline that spawns a bunch of new pipeline runs
using the RemoteDagitRunLauncher
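my run launcher config looks roughly like this - writing it from memory, so the module path and the address key might be slightly off:
```yaml
# dagster.yaml - launcher that hands runs off to a running dagit server
# module path and "address" key are from memory, treat this as a sketch
run_launcher:
  module: dagster_graphql.launcher
  class: RemoteDagitRunLauncher
  config:
    address: "http://dagit.internal:3000"  # placeholder URL for the dagit server
```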

alex

04/16/2020, 10:14 PM
ahhhh i seeeeeee

alex

04/16/2020, 10:15 PM
ok cool cool cool
well i'm guessing the issue is that even though actual execution isn't happening in those processes - it's too much for that one vCPU and 8GB of RAM to have hundreds of subprocesses going at the same time
some options are:
* set max_concurrent_runs on the dagit instance settings - this will use an in-memory queue so it isn't the safest, but should allow things to proceed without crashing
* give the dagit box more resources
* write your own run launcher that does whatever you can dream up for where to handle these processes

Chris Roth

04/16/2020, 10:21 PM
hm ok
i'll try max_concurrent_runs
what is the default for that?

alex

04/16/2020, 10:22 PM
when not set, no queue is used
so unbounded

Chris Roth

04/16/2020, 10:22 PM
ah ok

alex

04/16/2020, 10:23 PM
you could also try to introduce some delay between each run submission
since once the pipeline is running it's just sleeping and checking on celery - the majority of contention will be at startup time
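something like this in whatever is fanning out the runs - submit_run here is just a stand-in for however you actually launch each one (the RemoteDagitRunLauncher call in your case):
```python
import time

def submit_all(run_configs, submit_run, delay_seconds=5):
    """Submit runs one at a time with a pause between submissions.

    submit_run is a stand-in for however each run actually gets launched
    (e.g. via the RemoteDagitRunLauncher in this setup).
    """
    for run_config in run_configs:
        submit_run(run_config)
        # stagger submissions so dagit isn't spinning up 50-100 run processes at once
        time.sleep(delay_seconds)
```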

Chris Roth

04/16/2020, 10:24 PM
that makes sense
i'm gonna give both of those a shot