# ask-community
Is there any downside to having a lot of code repositories? Let's say I separate by business function: if I have 20 repositories, each with 3-4 jobs, will that cause any latency or other issues? We would be deploying on k8s, so each repo would be its own pod, but we would use one Dagit for now.
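For reference, the one-Dagit, pod-per-repo layout described above maps onto Dagster's `workspace.yaml`: each repo's pod runs a gRPC server, and the single Dagit loads each one as a separate code location. A minimal sketch, assuming the standard gRPC setup; the hostnames and location names below are hypothetical k8s service names, not anything from this thread:

```yaml
# workspace.yaml loaded by the single Dagit instance.
# One grpc_server entry per business-function repo;
# each host is that repo's k8s service (hypothetical names).
load_from:
  - grpc_server:
      host: finance-code-location
      port: 4000
      location_name: finance
  - grpc_server:
      host: marketing-code-location
      port: 4000
      location_name: marketing
  # ...one entry per repo, up to all 20
```

Dagit polls each server independently, so a slow or crashing location shows up as an error for that location rather than taking down the whole UI.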
We're trending in this direction as well. We're up to ~10 repos, with anywhere from a handful to hundreds of assets/jobs each. So far, no issues besides the lineage graph refusing to render. I do have the same worry as we continue to scale, though: what does performance look like?
I'm curious about this too. We're moving to a Pants-managed monorepo where we'll be able to slice our repo into many different code locations. It seems to me that the underlying architecture is built for this kind of horizontal scaling, though. Every code location has its own agent, right? (Actually, I think each deployment has its own agent, not each code location, so the agent could end up being a bottleneck.) There don't really seem to be any cross-code-location dependencies at the architectural level. Each agent communicates with Dagit on its own through the storage backend, so wherever you have the storage (usually Postgres) deployed is the bottleneck at the end of the day, and Postgres can scale quite well.

A bigger issue than the number of code locations is probably the volume of job runs that need to be queued, which is somewhat independent of the number of code locations.

Are there still many cross-code-location restrictions? It used to be that you couldn't reference assets/jobs in other code locations very well, or reuse asset/op/job names across code locations, but I know there was work done to reduce that friction. I haven't had a chance to try it since I'm still on one code location for the moment.