# ask-community
e
Hey everyone, looking for a bit of guidance. If this is not the best place, please redirect me to the correct channel (or any other media/app) 🙂

I've inherited a number of Dagster Repositories (I will migrate them to Definitions whenever possible). Right now their deployment is one "main" server and several other servers connected via gRPC. They all share the same local storage and database (for run information storage, etc.). As I'm new to Dagster, I'm wondering if this is the best approach. The current UI is extremely slow, sometimes unresponsive when loading runs, for example. I'm thinking about proposing a migration to a different deployment system: one Dagit instance per repository, each one with its own local storage and DB. Each Definitions would be deployed in a Docker container.

I'm also wondering what's the best way of deleting old runs. Right now there is a job that uses the DagsterInstance object to load and delete runs older than a certain age. Is this the best approach?

Finally, I'm all ears (or eyes, in this case 😅) for tips on how to manage and scale Dagster. Any source of information/documentation on this would be highly appreciated. Thanks in advance!
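For concreteness, the Repository-to-Definitions move looks roughly like this - a minimal sketch, assuming a Dagster version that includes `Definitions` (introduced around 1.1.6); `say_hello` and `hello_job` are hypothetical stand-ins for the real jobs:

```python
from dagster import Definitions, ScheduleDefinition, job, op


# Hypothetical op/job standing in for whatever the repository holds today.
@op
def say_hello(context):
    context.log.info("hello")


@job
def hello_job():
    say_hello()


# Where a code location used to expose an @repository function, a single
# Definitions object per location now plays the same role.
defs = Definitions(
    jobs=[hello_job],
    schedules=[ScheduleDefinition(job=hello_job, cron_schedule="0 6 * * *")],
)
```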
a
> As I'm new to Dagster, I'm wondering if this is the best approach. The current UI is extremely slow, sometimes unresponsive when loading runs, for example.
That set-up should be fine, but it depends on the resources available to the processes in question. What version of Dagster is running? How much CPU/memory does the Dagit server have? How much CPU/memory does the database have? If this is an older instance, running `dagster instance migrate` / `dagster instance reindex` may improve database performance.
> I'm thinking about proposing a migration to a different deployment system: one Dagit instance per repository, each one with its own local storage and DB
This may address resource contention by making more resources available, and if this achieves the usability you desire then it should be a fine option.
> delete runs older than a certain age. Is this the best approach?
Yeah, at this point. I'm guessing it looks something like this: https://github.com/dagster-io/dagster/discussions/12047
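In the spirit of that discussion, a retention script might look roughly like this - a minimal sketch, assuming a Dagster version where `RunsFilter.created_before` and `RunRecord.dagster_run` are available; the 30-day window and batch size are made-up values:

```python
from datetime import datetime, timedelta, timezone

from dagster import DagsterInstance, RunsFilter

RETENTION_DAYS = 30  # hypothetical retention window


def purge_old_runs():
    # DagsterInstance.get() loads the instance from DAGSTER_HOME, so this
    # talks to the same run/event storage that Dagit uses.
    instance = DagsterInstance.get()
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    old_runs = instance.get_run_records(
        filters=RunsFilter(created_before=cutoff),
        limit=500,  # delete in batches to keep each pass bounded
    )
    for record in old_runs:
        # delete_run removes the run row and its associated event log entries.
        instance.delete_run(record.dagster_run.run_id)


if __name__ == "__main__":
    purge_old_runs()
```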
e
> That set-up should be fine, but it depends on the resources available to the processes in question.
- Dagster version: 1.0.12
- Server specs: t3a.xlarge (4 vCPU, 16 GiB). For the past 7 days, CPU has never gone above 70%, usually around 30%, and RAM has stayed between 3-6 GiB.
- DB specs: RDS db.t3.micro (2 vCPU, 1 GiB), Postgres 12.11. For the past 7 days, CPU has been around 10-20% and read IOPS around 75-150. I'm going to enable Performance Insights to get more visibility into this; however, using `pgcli`, the query times are not that high, maybe around 5-10 seconds to get all the runs, for example.

We already run `dagster instance migrate` as part of the startup script.
> This may address resource contention by making more resources available, and if this achieves the usability you desire then it should be a fine option.
thanks!
> Yeah, at this point. I'm guessing it looks something like this: https://github.com/dagster-io/dagster/discussions/12047
yes, it mostly looks like that 🙂
a
> I'm going to enable Performance Insights
Great - yeah, some additional profiling information will likely be very useful.
> The current UI is extremely slow, sometimes unresponsive when loading runs, for example.
Is this the page that shows a paginated list of recent runs, or the event log for a specific run? I'd be surprised if the former was an issue, but the latter can vary quite dramatically based on how you use the system. For example, using `context.log` in large volumes or with large data can drastically degrade performance.
e
> Is this the page that shows a paginated list of recent runs, or the event log for a specific run?
Sometimes the schedules do not even get displayed 😅. Deleting some runs did help with loading runs for a specific job. We do use `context.log` in every job. Would it be better to use a different logging method? What's the benefit of `context.log`, besides being stored with the runs?
a
> Sometimes the schedules do not even get displayed
Which page is this exactly? The schedule pages I'm thinking of have been rewritten since 1.0.12, so upgrading may address this issue.
> Deleting some runs did help with loading runs for a specific job
I believe a DB index was added to address this in 1.1.18, so upgrading may address this issue.
> Would it be better to use a different logging method? What's the benefit of `context.log`, besides being stored with the runs?
`context.log` entries show up next to the structured events in the main event feed. High-signal messages are useful to surface in this way. "Compute logs", or the raw stderr/stdout captured during execution, are a more efficient way to capture high-volume log output: https://docs.dagster.io/deployment/dagster-instance#compute-log-storage. These are available to view in Dagit alongside each run.
e
> Which page is this exactly? The schedule pages I'm thinking of have been rewritten since 1.0.12, so upgrading may address this issue.
Navbar -> Deployment -> Schedules. Anything under Deployment is very slow to load; sometimes it does not load at all.
> "Compute logs", or the raw stderr/stdout
Here you mean print statements and/or the built-in logger?
a
Either one - if it makes its way to stdout/stderr and compute logs are configured properly in Dagster, it will get captured.
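Putting that together, the split might look roughly like this - a minimal sketch with a hypothetical `process_rows` op: reserve `context.log` for a few high-signal messages, and send the high-volume output to stdout so it lands in compute logs instead of the event log database.

```python
from dagster import op


@op
def process_rows(context):
    rows = list(range(100_000))
    for i, _row in enumerate(rows):
        # High-volume chatter: print goes to stdout, which Dagster captures
        # as compute logs rather than as rows in the event log database.
        print(f"processing row {i}")
    # High-signal summary: context.log emits a structured event that is
    # stored with the run and shown in the Dagit event feed.
    context.log.info(f"processed {len(rows)} rows")
```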
e
cool
thanks for all your help!