# announcements
l
I'm using Google Cloud to run dagster daemon and dagit. For the moment, I'm running it inside a tmux session to analyze how everything is running. In my pipeline, we're getting some JSON data and posting it to BigQuery every minute. Usually, the whole process takes 4s. However, when I look at CPU utilization in Google Monitoring, it keeps increasing at a constant rate. What can we do to keep CPU down? My impression is that dagster is opening more processes without closing old ones, or something like that.
d
Hi Laura - a couple of follow-up questions about this - which version of dagster are you on, and are you using any schedules and sensors currently?
(Depending on your version, there are a few recent improvements we've made that could help with this)
j
Hi @daniel, I work with Laura.
We are using dagster 0.10.7, which is the latest one, right?
There is a cron scheduler (* * * * *) calling the job
Dagster has been running for 17h and there are already 1082 threads open
full htop pic
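(For anyone trying to check the same thing without a screenshot: below is a minimal sketch, assuming a Linux instance with /proc mounted, that counts threads per process instead of reading them off htop. The helper names `parse_thread_count` and `threads_by_process` and the "dagster" match string are made up for illustration, not part of dagster. Keep in mind that htop lists threads as separate rows by default, so the number of rows is not necessarily the number of processes.)

```python
import os


def parse_thread_count(status_text):
    """Return the integer after 'Threads:' in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1].strip())
    return None


def threads_by_process(match="dagster", proc_root="/proc"):
    """Map pid -> (command line, thread count) for commands containing `match`.

    `match` and `proc_root` are illustrative defaults (assumptions).
    Returns an empty dict when proc_root does not exist (non-Linux hosts).
    """
    results = {}
    if not os.path.isdir(proc_root):
        return results
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
            cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
            if match not in cmdline:
                continue
            with open(os.path.join(proc_root, entry, "status")) as f:
                count = parse_thread_count(f.read())
        except OSError:
            continue  # the process exited while we were reading it
        results[int(entry)] = (cmdline, count)
    return results
```

Calling `threads_by_process()` a few minutes apart should show whether one long-lived process is accumulating threads or whether new processes keep appearing.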
d
Got it - are you using any sensors? Or just the scheduler?
j
just the scheduler
d
Got it - I'll do some investigation and see if I can reproduce this. We've actually made some big improvements in our process management in the daemon in 0.10.8 (going out later today) but I need to confirm whether this issue will be fixed as part of that.
Would it be possible to post your dagster.yaml for reproduction purposes?
j
d
ah perfect, thanks! Will report back.
j
we are still figuring out dagster, so code structure might not be the best
we also had another problem with broken imports, but we can discuss that later
d
Are there any errors or anything else suspicious in the output from dagster-daemon run?
I'm not really clear on why there would be so many dagster-daemon run processes in that htop - just confirming, it's set up in Google Cloud to only run a single dagster-daemon process?
particularly the fact that many of them only run for a couple of seconds is confusing - it's supposed to be a single long-running process, could something in google cloud be repeatedly trying to restart it for some reason?
j
daemon logs look fine
we just ran one process
the google cloud instance just has dagster and dagit installed
nothing else
d
hmmm, any idea where all those other dagster-daemon run could be coming from? I'm not an htop expert, but I'm not seeing that when I run the daemon locally in 0.10.7 and run htop
And I can't think of anything in dagster that automatically kicks off a "dagster-daemon run" process
j
we are running the daemon through a Makefile, https://github.com/RJ-SMTR/maestro/blob/main/Makefile
can this be a problem?
d
What triggers a 'make run-daemon' call? If that's running automatically in some way, that could explain this.
j
it is manually triggered
there are a lot of grpc jobs as well
I am running without Makefile and it is still creating dozens of daemons
d
So just to confirm the repro steps: You run "dagster-daemon run", without the Makefile, run htop, and then see dozens of "dagster-daemon" processes?
when you start up a new dagster-daemon process, does the deployment script do anything to ensure that the previous one stops?
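(One way a deployment script could guard against a second daemon starting next to an old one is a lock file. This is only a sketch under assumptions: it relies on POSIX `fcntl.flock`, so Linux or macOS only, and the helper name `acquire_single_instance_lock` and whatever lock path you pass in are hypothetical, not something dagster provides.)

```python
import fcntl
import os


def acquire_single_instance_lock(lock_path):
    """Try to take an exclusive, non-blocking lock on lock_path.

    Returns the open file object on success (keep it open for as long as
    the daemon should hold the lock), or None if someone else holds it.
    """
    handle = open(lock_path, "a+")
    try:
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another open handle already holds the lock.
        handle.close()
        return None
    # Record our pid so a human can see who owns the lock.
    handle.seek(0)
    handle.truncate()
    handle.write(str(os.getpid()))
    handle.flush()
    return handle
```

A wrapper script would call this first and only start "dagster-daemon run" when it gets a handle back. The lock is released when the handle is closed or the holding process exits, so a crashed daemon does not leave a stale lock behind.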
j
yes for the first question
no for the second. What should I do? It just runs through the pipeline
d
how strange - does that also happen to you if you run dagster-daemon locally? or just in Google Cloud?
Ah, so just backing up a sec - how often are you running 'make run-daemon'? Are you running it to manually kick off the dagster pipeline?
If you have a single pipeline that you want to manually trigger via command line, then I think what you want is the 'dagster pipeline launch' CLI command (or to launch the pipeline from dagit). You don't actually need the daemon (or a schedule) at all in this case. This is one of the nice things about dagster compared to, say, Airflow - you don't actually need a schedule or a scheduler at all if you want to manually trigger a pipeline.
I'd be happy to hop on a quick call to discuss too if that's easier
(You can also launch the pipeline manually from dagit if you have dagit running - that also doesn't require deploying the daemon unless you want some advanced features like sensors or limiting the number of runs happening at once)
l
hi daniel, we have daemon on because we're running a data acquisition process every minute
can we hop on a quick call?
d
definitely - is now good?
I can DM you
(just following up here quickly - Laura showed me the tmux output that shows a dangling thread, I'm going to try to reproduce that locally and will follow up here. That sounds like a different issue from the 'multiple daemon processes' topic that I was discussing with Joao)