# announcements

Sam Rausser

06/04/2020, 2:23 AM
I've been getting a bunch of these errors. does anyone know what might be causing them?

max

06/04/2020, 2:28 AM
can you describe your system a little -- is this local, what OS, how many pipelines are you running concurrently, schedules, etc

Sam Rausser

06/04/2020, 2:46 AM
CentOS 6, running 14 pipelines, only about 8 running at any given time
in production

max

06/04/2020, 2:48 AM
hmm, i know nothing about centos. i would be interested if you could see whether there's an abnormal number of open files or of processes
also curious about disk space

Sam Rausser

06/04/2020, 2:49 AM
is there anything i can ask my systems team specifically that would be helpful?

max

06/04/2020, 2:50 AM
i'll note also that the LocalComputeLogManager is not really intended for production - are you running in a cloud?
i assume you're not running in a container -- you're running dagster right on the metal/VM

Sam Rausser

06/04/2020, 2:51 AM
systems team manages all the nodes we use in 2 datacenters
don't think we use VMs
everything is on prem

max

06/04/2020, 2:52 AM
gotcha

Sam Rausser

06/04/2020, 2:53 AM
what should i use instead of local_compute_log_manager? or is there a way to turn it off

max

06/04/2020, 2:54 AM
do you have an on-prem equivalent to an object store like S3? if not, i'd suggest configuring it to point at a shared filesystem if you have one. but i'm not certain that's the issue
i think it'd be good if your systems folks could run lsof and ps or equivalent and see if they see either an abnormal number of open files or a large number of python processes
if not, that'll at least rule some things out
how frequently do these pipelines run, and do you have any sense of roughly how long the server has been up and about how many pipelines in total it's run?
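For reference, the shared-filesystem suggestion above would look roughly like the following in dagster.yaml; the base_dir path is only a placeholder for a shared mount, and exact options can vary by version.
compute_logs:
    module: dagster.core.storage.local_compute_log_manager
    class: LocalComputeLogManager
    config:
        base_dir: /mnt/shared/dagster/compute_logs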

Sam Rausser

06/04/2020, 2:58 AM
we did have s3 set up for compute logs and one of the systems guys yelled at me saying there is no reason to store the logs there as they get sent to kafka anyway, so i had to turn it off.
they run as soon as they get 15-25k messages from kafka
so anywhere from 5 sec to the 5 min timeout
791 ran since 5pm
turned it off 10 min ago cause i didn't want to get paged throughout the night
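For context, the S3 compute log setup described above would correspond to something like this in dagster.yaml; the bucket and prefix are illustrative, and the dagster_aws module path may differ between versions.
compute_logs:
    module: dagster_aws.s3.compute_log_manager
    class: S3ComputeLogManager
    config:
        bucket: my-log-bucket
        prefix: dagster-compute-logs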

max

06/04/2020, 3:03 AM
are you running off master, or 0.7.15, or another version?

Sam Rausser

06/04/2020, 3:05 AM
0.7.13

max

06/04/2020, 3:05 AM
ok, as an interim step, i would turn the compute log manager off -- this is a totally fine way to run if you have some other facility that aggregates stdout/stderr
you should be able to do the following in your dagster.yaml

Sam Rausser

06/04/2020, 3:05 AM
cool, so just remove the compute_logs section from the prod yaml file?

max

06/04/2020, 3:06 AM
compute_logs:
    module: dagster.core.storage.local_compute_log_manager
    class: NoOpComputeLogManager
i have a hunch what might be causing this - it'd be helpful if you could provide those diagnostics - and we can dig in tomorrow
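A couple of notes on the snippet above: dagster.yaml is the instance config file read from $DAGSTER_HOME, so that is where the change needs to land; and with a no-op compute log manager, per-step stdout/stderr is no longer captured for viewing in Dagit, which is fine here since the logs already flow to Kafka. Depending on the Dagster version, NoOpComputeLogManager may live in dagster.core.storage.noop_compute_log_manager rather than local_compute_log_manager, so adjust the module line if the class isn't found.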

Sam Rausser

06/04/2020, 3:10 AM
will do, i'll deploy this and see if i get paged less, and we'll reconvene tomorrow to share the stats. thank you for your help!
no one is online to accept my diff, guess it'll have to wait till the morning

max

06/04/2020, 3:26 AM
apologies for this

alex

06/04/2020, 1:34 PM
> they run as soon as they get 15-25k messages from kafka
what exactly is the setup for kicking off pipeline runs? If the kafka listener is a long-lived process, it's possible the issue is a memory / file descriptor leak from accidentally holding on to references

Sam Rausser

06/04/2020, 2:45 PM
once the min number of messages is hit, the messages are pickled to disk and the pipeline is called