Is there a best practice when determining the size and function of code repositories? How do the daemon workers utilize the code repositories? Would a large code repository create latency in worker creation?
03/02/2023, 9:32 PM
Hey Aaron - the daemon and dagit never load your code directly. They communicate over a gRPC interface with a separate process that loads your code, and only pass around structured summaries/snapshots of the Dagster code objects in your repositories. So the main thing that would cause the daemon to need more memory is a particularly large or intricate graph. The size of the code (or the amount of work that the code does, or the time that it takes to import it) shouldn't matter for daemon worker latency.
This is one big difference between Dagster and Airflow, which has one or more centralized scheduler processes that load your code directly.
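To make the separation concrete, here is a minimal conceptual sketch, not Dagster's actual implementation. It uses a plain subprocess and JSON in place of gRPC, and the `USER_CODE` string, the `fetch_snapshot` helper, and the snapshot fields are all hypothetical names made up for illustration. The point is only that the host process never imports the user code and sees just a small structured summary.

```python
import json
import subprocess
import sys

# Hypothetical stand-in for user code. In practice this could be a large
# code base with heavy imports; the host process never imports it.
USER_CODE = """
import json
graph = {"push_data_to_s3": ["extract", "upload"]}
summary = {
    "jobs": sorted(graph),
    "op_counts": {name: len(ops) for name, ops in graph.items()},
}
print(json.dumps(summary))
"""


def fetch_snapshot() -> dict:
    """Run the user code in a separate process and return only its summary.

    Assumption: a subprocess plus JSON stands in for the gRPC code server.
    """
    result = subprocess.run(
        [sys.executable, "-c", USER_CODE],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


# The host (the "daemon" in this analogy) only ever holds the snapshot.
snapshot = fetch_snapshot()
print(snapshot)
```

In this sketch the host's memory use scales with the size of the snapshot (the graph structure), not with the size of the code that produced it.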
03/02/2023, 9:34 PM
Awesome. Thank you for the insight. We are debating separating graphs into their own repositories, like push_data_to_s3, transform_other_data... etc. But I think that might be overkill
03/02/2023, 9:40 PM
The only performance-related reason I would recommend doing that proactively would be if you know it will make the time to import the code go way down
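One rough way to check whether import time is worth worrying about is to time the import itself. This is a sketch using only the standard library; `time_import` is a hypothetical helper, and `colorsys` is just a placeholder for whatever module holds your repository definitions. It only evicts the top-level module from the cache, so already-imported dependencies stay warm; for a truer cold-start number, running `python -X importtime -c "import your_module"` in a fresh interpreter is likely more reliable.

```python
import importlib
import sys
import time


def time_import(module_name: str) -> float:
    """Return the seconds taken to (re-)import module_name in this process."""
    # Evict the cached top-level module so the import actually executes.
    # Dependencies that are already imported remain cached.
    sys.modules.pop(module_name, None)
    start = time.perf_counter()
    importlib.import_module(module_name)
    return time.perf_counter() - start


# Placeholder module; substitute the module that defines your repository.
elapsed = time_import("colorsys")
print(f"colorsys imported in {elapsed:.4f}s")
```

Comparing this number for one combined module against the per-graph modules would show whether splitting actually reduces import time.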