#ask-community

Tomas Gatial

01/10/2023, 12:15 PM
Hi, has anyone observed any issues when scaling up User Code Deployments? After scaling my UC Deployment to 3 replicas, I observe the following:
• In Dagit, I periodically get a "Definitions Reloaded" popup.
• In Dagit, I also see the error message: Enum 'LocationStateChangeEventType' cannot represent value: <LocationStateChangeEventType instance>
• Dagit becomes less responsive.
• I also notice a slow, linear increase in the memory footprint of the dagit and daemon pods (Helm Deployment v1.1.9, tested in 2 different clusters: Azure and Rancher Desktop).

Adam Bloom

01/10/2023, 2:26 PM
Curious what your use case for running multiple replicas of user code is. The helm chart does not make the number of replicas configurable. I’ve always assumed more than 1 is not supported (or necessary) https://github.com/dagster-io/dagster/blob/master/helm/dagster/charts/dagster-user-deployments/templates/deployment-user.yaml#L13

Tomas Gatial

01/10/2023, 3:37 PM
Enabling the DefaultRunLauncher lets me handle low-resource / high-frequency / time-sensitive jobs directly on the user code deployment, without the overhead of the k8s orchestrator. I am aiming for a robust Dagster setup that can handle workloads on both ephemeral and non-ephemeral resources, as discussed here: https://dagster.slack.com/archives/C01U954MEER/p1669137566368989?thread_ts=1668960151.191789&cid=C01U954MEER The documentation (page Deployment -> Open Source) says code location replicas are supported: https://docs.dagster.io/deployment/overview#long-running-services

daniel

01/10/2023, 3:39 PM
I think adding the following to dagsterApiGrpcArgs will help with most of these issues (but replicas on the user code deployments aren't officially supported and I can't promise you won't run into other weirdness)
--fixed-server-id <some unique string for your user code deployment here>
looking into that error now, which is not expected
setting the fixed-server-id field will help indicate to dagit that each of the replicas represents the same location - right now it's getting confused because each replica has its own server ID, so it thinks the code is constantly updating
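For reference, here is a rough sketch of how that argument might be added to the Helm values for one user code deployment. The deployment name, image, code path, and server ID string are all hypothetical placeholders; only dagsterApiGrpcArgs and --fixed-server-id come from the suggestion above, and the sketch assumes the user code deployment is configured through the dagster-user-deployments section of the main chart's values.

```yaml
# Sketch only: name, image, path, and ID values are placeholders, not real settings.
dagster-user-deployments:
  deployments:
    - name: "my-user-code"                           # hypothetical deployment name
      image:
        repository: "example.registry/my-user-code"  # hypothetical image
        tag: "latest"
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "--python-file"
        - "/path/to/repo.py"                         # hypothetical code path
        - "--fixed-server-id"
        - "my-user-code-server"                      # placeholder; same string for every replica of this deployment
      port: 3030
```

Because the args are part of the deployment's pod template, every replica starts the gRPC server with the same ID, which is what lets Dagit treat them as a single location.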

Tomas Gatial

01/10/2023, 3:42 PM
Thanks Daniel! I am testing the arg now.

daniel

01/10/2023, 3:43 PM
the other big downside i think you'll run into right now if you use the default run launcher with replicas is that any runs that are still happening whenever you upgrade your code will be interrupted

Tomas Gatial

01/10/2023, 3:50 PM
Thanks for noting! Will the sensor runs be interrupted too?

daniel

01/10/2023, 4:01 PM
sensors should be fine
er sorry - to clarify, any runs would be interrupted, yeah, including runs launched from sensors
but running the sensors themselves should be fine - they will stop too but can pick up where they left off