
Qumber Ali

11/15/2021, 6:25 AM
Hi all, I'm facing a slowness issue. I have a job that completes in 3 to 4 minutes when I run it on its own, but when I run around 32 concurrent jobs, each job takes an hour or more, even though my EC2 server is large and only using about 20% of its resources. I'm not sure why increasing the number of concurrent jobs slows everything down.
Anyone?

daniel

11/15/2021, 3:56 PM
Hi Qumber - if no other resources are approaching their limits, one thing I'd check is whether writing to your DB might be the bottleneck (especially if you are using the default SQLite DB - switching to Postgres could dramatically improve your write performance; there are instructions for how to do that here: https://docs.dagster.io/deployment/dagster-instance#instance-configuration-yaml)
Quick note on response times: most of the team is in PST, so you might not see a response right away if you post outside of business hours in that timezone. There's also no need to post in multiple channels; dagster-support is the right place for technical questions like this. More on the Slack guidelines here: https://docs.dagster.io/community/code-of-conduct#slack-content-guidelines
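For reference, the Postgres switch described above is configured in the dagster.yaml file in $DAGSTER_HOME. A minimal sketch roughly following the linked instance docs - the username, hostname, and database name below are placeholders, not values from this thread:

# dagster.yaml - point run, event log, and schedule storage at Postgres
# instead of the default SQLite files (connection values are placeholders)
run_storage:
  module: dagster_postgres.run_storage
  class: PostgresRunStorage
  config:
    postgres_db: &pg
      username: dagster_user
      password:
        env: DAGSTER_PG_PASSWORD   # read the password from an environment variable
      hostname: my-postgres-host
      db_name: dagster
      port: 5432

event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db: *pg               # reuse the same connection block via a YAML anchor

schedule_storage:
  module: dagster_postgres.schedule_storage
  class: PostgresScheduleStorage
  config:
    postgres_db: *pg

After editing dagster.yaml, the daemon and dagit both need a restart so they pick up the new storage.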

Qumber Ali

11/16/2021, 12:48 PM
I have updated the log storage to Postgres, but I'm not sure why I'm getting this error - is it due to Postgres or something else? Can you please have a look?
Nov 16 12:40:30 ip-172-31-38-244 dagster-daemon[23386]: 2021-11-16 12:40:30 - SensorDaemon - ERROR - Sensor daemon caught an error for sensor zappos_to_walmart_new_file : Exception: Timed out waiting for gRPC server to start with arguments: "/usr/bin/python3 -m dagster.grpc --lazy-load-user-code --socket /tmp/tmpesr_c4ai --heartbeat --heartbeat-timeout 120 --fixed-server-id a016452f-e679-4bde-8873-7cb8f42b47d4 -f /home/ubuntu/apps/inventory-jobs/./repositories/products_matching_repo.py -d /home/ubuntu/apps/inventory-jobs/./". Most recent connection error: grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
Nov 16 12:40:30 ip-172-31-38-244 dagster-daemon[23386]:         status = StatusCode.UNAVAILABLE
Nov 16 12:40:30 ip-172-31-38-244 dagster-daemon[23386]:         details = "failed to connect to all addresses"
Nov 16 12:40:30 ip-172-31-38-244 dagster-daemon[23386]:         debug_error_string = "{"created":"@1637066430.571063850","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3159,"referenced_errors":[{"created":"@1637066430.571062845","description":"failed to connect to all addresses","file":"src/core/lib/transport/error_utils.cc","file_line":147,"grpc_status":14}]}"

johann

11/16/2021, 2:31 PM
^ This is from the daemon logs?

Qumber Ali

11/16/2021, 2:35 PM
yes

johann

11/16/2021, 2:38 PM
There may be other lines in the log which will have the actual error from the grpc server. Otherwise, you may be able to reproduce it by trying to start the grpc server yourself:
/usr/bin/python3 -m dagster.grpc --lazy-load-user-code --socket /tmp/tmpesr_c4ai --heartbeat --heartbeat-timeout 120 --fixed-server-id a016452f-e679-4bde-8873-7cb8f42b47d4 -f /home/ubuntu/apps/inventory-jobs/./repositories/products_matching_repo.py -d /home/ubuntu/apps/inventory-jobs/./
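One hedged way to dig for that underlying error, assuming the daemon runs as a systemd service named dagster-daemon (which the dagster-daemon[23386] prefix in the journal lines above suggests):

# Pull the daemon's log lines from around the timeout and look for the gRPC
# server's own traceback (the unit name is an assumption; adjust to your setup)
journalctl -u dagster-daemon --since "2021-11-16 12:30" --until "2021-11-16 12:45" | grep -i -B 5 -A 20 "error"

Running the command above by hand, as the same user the daemon runs as, should also surface any import error or slow module load in products_matching_repo.py directly on stderr.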

Qumber Ali

11/16/2021, 3:02 PM
@daniel @johann I have switched the log storage to Postgres, but when I run more than 20 jobs at the same time the jobs still get delayed by an hour or more. I'm really fed up with this issue, please help me with it.

johann

11/16/2021, 3:05 PM
You were able to get things working with the daemon?

Qumber Ali

11/16/2021, 3:05 PM
yeah

johann

11/16/2021, 3:05 PM
What was the issue?

Qumber Ali

11/16/2021, 3:06 PM
Oh, you're talking about that issue - no, that issue isn't solved yet.
The gRPC error still persists.

johann

11/16/2021, 3:07 PM
Hmm but that doesn’t block you from doing that performance test?

Qumber Ali

11/16/2021, 3:08 PM
After updating Dagster, the gRPC error still blocks things most of the time.

johann

11/16/2021, 3:13 PM

Qumber Ali

11/16/2021, 3:14 PM
it doesn't give me any error

johann

11/16/2021, 3:15 PM
If this connection issue is sporadic (and happens when you're trying your load test), it seems likely that you're hitting some sort of resource limit on your host that's causing issues spinning up new processes.
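A rough sketch of how to check for that while the load test is running - these are generic Linux commands rather than anything Dagster-specific, so treat them as starting points:

# Per-user process and open-file limits (each run and each gRPC server is an
# extra process with its own sockets and file handles)
ulimit -u    # max user processes
ulimit -n    # max open file descriptors

# What the host is actually doing while ~20-32 runs are in flight
nproc                            # CPU cores available
free -m                          # memory and swap usage
vmstat 5                         # run queue length, swapping, and I/O wait over time
ps -ef | grep dagster | wc -l    # how many dagster processes are alive

If one of these is pegged (for example, swap in use or the process count near the ulimit -u value), that would line up with gRPC servers timing out on startup under load.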