# dagster-plus

Robele Baker

02/22/2024, 8:23 AM
Another question - I modified the `branch_deployment.yml` and `deploy.yml` files to run with Python 3.11, and now the GH Actions jobs take around 28 minutes and often fail via timeout. The log will spam the following:
`Still waiting for agent to sync changes to analytics_dagster. This can take a few minutes.`
Sometimes I get lucky and it works, though. Is there a known fix?
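[Editor's note: for readers following along, the Python-version change described above is usually a one-line edit to the workflow file. The sketch below uses the generic `actions/setup-python` step; the actual Dagster Serverless workflow template may pin the version through a different input on the dagster-cloud action, so treat the step layout here as illustrative, not the official template.]

```yaml
# Hypothetical excerpt from .github/workflows/deploy.yml
# (job/step structure is illustrative, not the Dagster template)
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
```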

Mathieu Larose

02/22/2024, 2:11 PM
Hi, that's the agent waiting for your code location to finish loading. Can you provide a timeframe during which this error occurred, so I can peek at the logs and see if there are any suspicious traces that could help? Does it work locally?

Robele Baker

02/22/2024, 2:50 PM
I kicked off a Serverless Branch Deployment job for the `dooly` organization on Thu, 22 Feb 2024, 07:34:11 GMT. There was a series of failures for the branch deployment. Local runs work with no problems. Eventually I got the branch deployment workflow to succeed, and I pushed to prod with no issues so far.

Mathieu Larose

02/23/2024, 7:22 PM
@Robele Baker - Do you still have issues with this? It seems memory usage might be the problem there - it spikes pretty high and fast on startup.

Robele Baker

02/26/2024, 12:24 PM
Hey @Mathieu Larose - I haven't deployed to a branch in a while, but builds to the prod environment are working well.
Actually, it's still failing sometimes on prod deployments too 😞

Mathieu Larose

02/26/2024, 3:54 PM
Hi @Robele Baker, I would like to bump your resource allocation on our side. Is now a good time for me to proceed with that change?

Robele Baker

02/26/2024, 7:40 PM
Yes please

Mathieu Larose

02/26/2024, 9:42 PM
It's redeploying now - let us know if you still have problems afterwards, but this should help.

Robele Baker

02/27/2024, 3:47 PM
It's loading really quickly now!
Thanks @Mathieu Larose!
Hey @Mathieu Larose and Dagster team. The Dagster Serverless deploy jobs are failing again. Is it time to start containerizing my code and separating code locations?

Mathieu Larose

03/18/2024, 2:53 PM
Hi @Robele Baker, it seems the workload is a bit overwhelming for the size of the box it runs in; that is mostly why it times out. I'm not sure whether building your own image would help, but if you can split or optimize your workload, that would definitely help.

Robele Baker

03/18/2024, 2:59 PM
It's failing in the "build deploy Python executable" step. How does one optimize for this workflow?

Mathieu Larose

03/18/2024, 3:13 PM
Sorry for the ambiguity - I was not referring to the GitHub Actions workflow. I should have said "pipeline", referring to the overall size and complexity of what you have in a single code location. For example, I see that you are using dbt and have a couple hundred assets. I don't have a good sense of whether that's the root cause, but presumably that is what has been scaling up on your side, so refactoring into several code locations and jobs might help. I can escalate this for better guidance, tbh. Another option, if you really have demanding asset graphs to support, would be to move to a Hybrid deployment with ECS or Kubernetes, which would let you size the boxes to match your needs.
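[Editor's note: in Dagster+, splitting a deployment into multiple code locations is declared in `dagster_cloud.yaml`. A hypothetical sketch of carving the dbt assets out into their own location - the `location_name` and `package_name` values here are made up for illustration:]

```yaml
# dagster_cloud.yaml - hypothetical split into two code locations
# (location_name / package_name values are illustrative)
locations:
  - location_name: dbt_analytics
    code_source:
      package_name: analytics.dbt_defs
  - location_name: ingestion
    code_source:
      package_name: analytics.ingestion_defs
```

Each location is loaded in its own process, so a heavy dbt location no longer competes for memory with the rest of the pipeline at startup.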

Robele Baker

03/18/2024, 3:15 PM
I am looking to move to Hybrid eventually. We're a GCP house, and I need to dedicate time to either build a GCP Cloud Run/Docker solution or stand up a Kubernetes cluster. Could this be escalated in the meantime? Once I figure out a smooth Hybrid deployment, I'll switch to that.

Mathieu Larose

03/18/2024, 6:09 PM
Sorry for the delay. The two insights I have been given: we don't expect disabling PEX to help you much there, and the culprit is likely loading the dbt manifest. Apparently one frequent mistake is using the same dbt manifest multiple times, creating extra work loading the same assets. You might want to check for this.