#dagster-cloud

Alex Prykhodko

12/11/2022, 9:22 PM
memory availability for the job runners (update: on Cloud Serverless) seems to be fluctuating quite a bit. it’s typically ~32GB, but I got 3.8GB on this one and it causes the job to fail:

daniel

12/11/2022, 9:23 PM
Hey Alex - is this using Dagster Cloud serverless?

Alex Prykhodko

12/11/2022, 9:23 PM
yes.

daniel

12/11/2022, 9:25 PM
you can force it to be 32GB by checking the "Isolate Run Environment" checkbox in the Launchpad (the tradeoff is that the run will take a bit longer to start as it sets up its own isolated environment with 32GB). More information on this here: https://docs.dagster.io/dagster-cloud/deployment/serverless#run-isolation
👍 1

Alex Prykhodko

12/11/2022, 9:25 PM
thank you!

daniel

12/11/2022, 9:27 PM
No problem - just checked and I believe the limit is actually 16GB not 32GB as per https://docs.dagster.io/dagster-cloud/deployment/serverless#limitations
👍 2

Alex Prykhodko

12/11/2022, 9:28 PM
yep, I see that now… did not see that doc before. FYI, this is the code that gives me 32GB total memory for most runs:
import logging

def report_memory_usage(logger: logging.Logger):
    import psutil
    import os
    process = psutil.Process(os.getpid())
    # virtual_memory() figures are system-wide; rss covers this process only.
    # psutil returns bytes, so dividing by 1024 twice converts to MiB.
    total_mem = psutil.virtual_memory().total / 1024 / 1024
    available_mem = psutil.virtual_memory().available / 1024 / 1024
    process_mem = process.memory_info().rss / 1024 / 1024
    report_info = f'process memory usage: {process_mem} MB / total: {total_mem} MB / available: {available_mem} MB'
    logger.info(report_info)
@daniel how do I enable always isolated runs?
that’s the current deployment config:
run_queue:
  max_concurrent_runs: 10
  tag_concurrency_limits: []
run_monitoring:
  start_timeout_seconds: 1200
run_retries:
  max_retries: 0
sso_default_role: VIEWER
non_isolated_runs:
  max_concurrent_non_isolated_runs: 1

daniel

12/11/2022, 9:36 PM
I don't think there's currently a way to change the default (runs that are triggered by a schedule or sensor will always be isolated)

Alex Prykhodko

12/11/2022, 9:36 PM
gotcha. thanks. been doing it manually for now.

daniel

12/11/2022, 9:37 PM
Oh wait, yes there is:
non_isolated_runs:
  enabled: False
(that will make the checkbox go away and default everything to isolated)

Alex Prykhodko

12/11/2022, 9:38 PM
the default value for that is False, and my config does not specify it.
I will set it to False, but there still appears to be a problem either with the doc or impl.

daniel

12/11/2022, 9:39 PM
Ah yeah, the problem is the docs. Will fix, thanks!

Alex Prykhodko

12/11/2022, 9:39 PM
thank you much! go Dagster!
condagster 1

daniel

12/11/2022, 9:42 PM
that's interesting that psutil is saying 32GB are available - we're running in ECS and are definitely setting the memory to 16384. It's possible that it gives the process more memory but still kills it if it exceeds the set limit
👍 1

alex

12/12/2022, 3:23 PM
I’m not sure about ECS and psutil specifically - but one potential misreporting problem is that you are getting the resource information of the host machine and not your container's allocation of that resource
👍 1
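One way to test the host-vs-container theory is to read the memory limit from the Linux cgroup files and log it next to the psutil numbers. The sketch below is not from the thread. It assumes the standard cgroup v1/v2 paths are readable inside the container, which is not confirmed here for ECS. The helper names and the "unlimited" cutoff are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard cgroup mount points on Linux. Whether they are visible inside a
# given ECS task is an assumption, not something confirmed in this thread.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")

# cgroup v1 reports a very large number when no limit is set; treat anything
# at or above this (hypothetical) cutoff as "unlimited".
UNLIMITED_CUTOFF = 1 << 60


def parse_cgroup_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None if the cgroup is unlimited."""
    value = raw.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    limit = int(value)
    return None if limit >= UNLIMITED_CUTOFF else limit


def container_memory_limit_bytes() -> Optional[int]:
    """Best-effort lookup of the container's memory limit (None if unknown)."""
    for path in (CGROUP_V2_LIMIT, CGROUP_V1_LIMIT):
        try:
            return parse_cgroup_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next location
    return None
```

If `container_memory_limit_bytes()` comes back near 16 GiB while `psutil.virtual_memory().total` says ~32 GB, that would suggest psutil is reporting the host's memory rather than the container's allocation.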