# dagster-plus
g
Why do I observe branch deploys with a TTL greater than the one specified in the agent?
My runner is Docker Compose.
Why is Dagster starting 6 containers for this code location whilst sitting idle?
The following `server_ttl` settings are set:

    server_ttl:
      full_deployments:
        enabled: true
        ttl_seconds: 60
      branch_deployments:
        ttl_seconds: 60
But somehow it has already spun up 20 containers by now.
j
The names and labels of the containers will probably give a bit more of an answer. Can you share the agent logs, which code locations you expect to exist, and which containers actually exist?
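For reference, a minimal sketch (not part of the original thread) of how to dump the name, status, and labels of every running container using the Docker Python SDK; the SDK itself and the assumption that the agent runs against the local Docker daemon are the only prerequisites:

```python
import docker  # the "docker" PyPI package (docker-py)

# Connect to the local Docker daemon, i.e. the same host the
# docker-compose-based agent is running on.
client = docker.from_env()

# Print the name, status, and labels of every running container so the
# agent-managed code location servers can be told apart from everything else.
for container in client.containers.list():
    print(container.name, container.status, container.labels)
```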
g
I have a single code location deployed and would expect one container to be up and running at any given time, and perhaps more if a job is launched. However, in Docker I see more than one container up and running.
<<codelocation1>>-eb3e4b and <<codelocation1>>-05f02b are currently online (I killed all the others and increased the timeouts again).
j
A few things jump out from the logs (where I only see reference to 3 containers):
• Were there multiple agents at one point? It looks like some of the code locations previously (or even currently) belonged to another agent and weren’t properly cleaned up.
• When a location is updated, the agent performs a blue/green deployment where it first stands up a new container and then shuts down the old one. Did you recently reload a location? It’s possible both the new and old containers are still up while it’s transitioning.
• Does the agent service any branch deployments?
g
Let me kill all containers and observe it for a couple of hours again.
I indeed had one agent and one for branch deployments,
but two Dagster deployments (both with branch deploys enabled).
And Dagster Cloud seems to have a restriction that there can only be one agent offering branch deploys (for now).
It seems to be better now.
@jordan I can observe it is better, but not fixed. Runs no longer cause stale containers to be kept around. However, a new deployment (i.e. a branch) or a push to main/prod keeps the OLD version(s) around and does not shut them down. How can these be removed as well? Is this a bug in dagster-cloud?
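As an interim, purely manual workaround (a hypothetical sketch, not an official dagster-cloud mechanism), the stale servers for a code location could be stopped by name prefix; the `<<codelocation1>>-` prefix and the keep-the-newest heuristic are assumptions based on the container names mentioned above:

```python
import docker

client = docker.from_env()

# Assumed naming convention, based on the containers seen earlier
# (e.g. <<codelocation1>>-eb3e4b).
PREFIX = "<<codelocation1>>-"

# Collect the running containers for this code location, newest first.
matching = [c for c in client.containers.list() if c.name.startswith(PREFIX)]
matching.sort(key=lambda c: c.attrs["Created"], reverse=True)

# Keep the most recently created container and stop the rest.
for stale in matching[1:]:
    print(f"Stopping stale container {stale.name}")
    stale.stop()
```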
j
Can you share the agent logs, which containers actually exist, and which containers you expect to exist?
g
I will do this after the next deployment, as I deleted the containers manually. Or would the old logs from the last couple of days be fine?
I PM'd you the logs.