# dagster-plus
r
Another question - I modified the `branch_deployment.yml` and `deploy.yml` files to run with Python 3.11, and now the GH Actions jobs take around 28 minutes and often fail via timeout. The log spams the following message before the job fails:
`Still waiting for agent to sync changes to analytics_dagster. This can take a few minutes.`
Sometimes, I get lucky, and it works, though. Is there a known fix?
m
Hi, that's the agent waiting for your code location to finish loading. Can you provide a timeframe during which this error occurred so that I can peek at the logs and see if there are any suspicious traces that could help? Does it work locally?
r
I kicked off a Serverless Branch Deployment job for the dooly organization on Thu, 22 Feb 2024, 07:34:11 GMT. There was a series of failures for the branch deployment. Local runs work with no problems. Eventually, I got the branch deployment workflow to work, and I pushed to prod with no issues so far.
m
@Robele Baker - Do you still have issues with this? It seems memory usage might be an issue there: it spikes pretty high and fast on startup.
r
Hey @Mathieu Larose - I haven't deployed to a branch in a while, but builds to the prod environment are working well.
Actually, it's still failing sometimes on prod deployments too 😞
m
Hi @Robele Baker, I would like to bump your resource allocation on our side. Is now a good time for me to proceed with that change?
r
Yes please
m
It's redeploying now - let us know if you still have problems afterwards, but this should help.
r
It's loading really quickly now!
Thanks @Mathieu Larose!
dagster yay 1
🤖 1
Hey @Mathieu Larose and Dagster team. The Dagster Serverless deploy jobs are failing again. Is it time to start containerizing my code and separating code locations?
m
Hi @Robele Baker, it seems the workload is a bit overwhelming for the size of the box it runs in, which is why it mostly times out. I am not sure if creating your own image would help, but if you can somehow split or optimize your workload, that would definitely help.
r
It's failing in the "build deploy Python executable" step. How does one optimize for this workflow?
m
Sorry for the ambiguity, I was not referring to the GitHub Actions workflow. I should have said "pipeline", referring to the overall size and complexity of what you have in a single code location. For example, I see that you are using dbt and have a couple hundred assets. I don't have a good sense of whether that's the root cause, but presumably that is what has been scaling up on your side, and if you could refactor this into several code locations and jobs, it might help? I can escalate this for better guidance, tbh. Another option, if you really have demanding asset graphs to support, would be to move to a Hybrid deployment with ECS or Kubernetes, which would let you configure the size of the boxes you're using to match your needs.
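For illustration, here is a minimal, hypothetical sketch of that kind of split using plain Dagster `Definitions` objects. The asset names and grouping below are made up, not taken from your project, and in practice each `Definitions` object would live in its own module and be registered as its own code location:

```python
# Hypothetical sketch only: asset names, grouping, and module layout are
# illustrative, not taken from the real analytics_dagster project.
from dagster import Definitions, asset


@asset
def raw_events():
    """Placeholder ingestion asset."""
    return [{"event": "signup"}, {"event": "login"}]


@asset
def daily_summary(raw_events):
    """Placeholder reporting asset downstream of the ingestion asset."""
    return {"event_count": len(raw_events)}


# In practice each Definitions object would live in its own module and be
# registered as a separate code location, so each location loads on its own
# and a heavy one does not block (or time out) the others.
ingestion_defs = Definitions(assets=[raw_events])
reporting_defs = Definitions(assets=[daily_summary])
```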
r
I am looking to move to Hybrid eventually. We're a GCP shop, and I need to dedicate time to either create a GCP Cloud Run/Docker solution or stand up a Kubernetes cluster. Could this be escalated in the meantime? Once I figure out a smooth Hybrid deployment, I'll switch to that.
👍 1
m
Sorry for the delay. The two insights I have been given: we don't expect disabling PEX to help you much there, and the culprit is likely loading the dbt manifest. Apparently one frequent mistake is loading the same dbt manifest multiple times, creating extra work loading the same assets. You might want to check for this.
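For what it's worth, a minimal sketch of the "load the manifest once" idea, assuming dagster-dbt and an already-built manifest (the path and the selection strings below are hypothetical):

```python
# Minimal sketch, assuming dagster-dbt and a prebuilt target/manifest.json.
# The manifest path and the dbt selection strings are illustrative.
import json
from pathlib import Path

from dagster import AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets

# Read and parse the manifest exactly once at import time, then share the
# parsed dict across asset definitions instead of re-reading the file.
DBT_MANIFEST = json.loads(Path("analytics_dbt/target/manifest.json").read_text())


@dbt_assets(manifest=DBT_MANIFEST, select="tag:staging")
def staging_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


@dbt_assets(manifest=DBT_MANIFEST, select="tag:marts")
def marts_dbt_assets(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()
```

Parsing the JSON once and sharing the dict between the `@dbt_assets` definitions avoids re-reading a large manifest for every definition at import time.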
👀 1