# dagster-plus
t
Hi! I am using Dagster Cloud Hybrid (on AWS ECS, Dagster version 1.2.4). The code location server shuts down at a certain time because of the TTL setting. When I run a job while the code location server is shut down, I get the following error:
dagster._core.errors.DagsterUserCodeUnreachableError: Timed out waiting for call to user code GET_SUBSET_EXTERNAL_PIPELINE_RESULT [{value}]
New or dormant Branch Deployments can take time to become ready, try again in a little bit.
I know the cause of the error, but I do not know how to change the timeout setting. Could you please tell me how to do that?
d
Hello! You can configure this TTL by setting the following field in your EcsUserCodeLauncher config in your dagster.yaml:
user_code_launcher:
  module: dagster_cloud.workspace.ecs
  class: EcsUserCodeLauncher
  config:
    ...
    server_ttl:
      branch_deployments: <your value here in seconds - the default is 24 hours>
The tradeoff here is that the servers will stay around for longer in your cluster if you increase this setting.
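As a concrete illustration, the resulting dagster.yaml block with the TTL raised to two hours (the 7200 value here is only an example, not a recommendation) would look like:
user_code_launcher:
  module: dagster_cloud.workspace.ecs
  class: EcsUserCodeLauncher
  config:
    server_ttl:
      # Keep idle branch deployment servers alive for 2 hours (the default is 24 hours).
      branch_deployments: 7200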
t
Thank you for your response. Yes, I have made that setting, and I believe that when it is enabled the code location server should shut down (which is the intended behavior). However, after the server shuts down, I encounter the error I mentioned earlier when I try to execute a job. I think this is because the wait for the server to start back up is timing out, which is why I would like to adjust this timeout. I apologize if my English is not clear.
d
Ah, I don't think it's because the startup wait is timing out - I think it's because ECS tasks can take a few minutes to start up. What I'd expect to happen is that if you try again in a couple of minutes, the task will have started up and the job will start.
As soon as you load the code location for the branch deployment in Dagit, it will send a signal to your agent and start spinning the ECS task back up - the error you're seeing will happen if you start a job right away while the task is still starting.
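For what it's worth, if you launch runs from a script rather than from Dagit, one way to absorb that startup window (a minimal sketch, not an official pattern - the hostname, job, repository, and location names below are all placeholders) is to retry the submission with dagster-graphql's DagsterGraphQLClient, which raises DagsterGraphQLClientError when the submission fails:
import time

from dagster_graphql import DagsterGraphQLClient, DagsterGraphQLClientError

# Placeholder endpoint - point this at your own Dagit host.
client = DagsterGraphQLClient("dagit.example.com", port_number=3000)


def submit_with_retry(max_attempts=10, delay_seconds=30):
    # Retry the submission while the ECS task for the code server spins back up.
    for attempt in range(1, max_attempts + 1):
        try:
            # The job, location, and repository names are hypothetical examples.
            return client.submit_job_execution(
                "my_job",
                repository_location_name="my_code_location",
                repository_name="my_repository",
            )
        except DagsterGraphQLClientError:
            # Still failing - wait and try again, re-raising on the last attempt.
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)


run_id = submit_with_retry()
print(f"Launched run {run_id}")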
t
I understand. That's right - after encountering this error once, the job can be executed normally. I had assumed that even if the server is not running, it would start automatically when the job is executed. From now on, I plan to explicitly start the server before running the job (for example, from the deployment menu).
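If that explicit "start the server first" step ever needs to be scripted, one possibility (a sketch under the same placeholder names as above; whether the reload call blocks until the server is fully up may depend on your deployment, so treat this as a starting point) is the reload_repository_location call on the same client, which asks the agent to spin the code server back up before the run is submitted:
from dagster_graphql import DagsterGraphQLClient, ReloadRepositoryLocationStatus

# Placeholder endpoint - same assumptions as the previous sketch.
client = DagsterGraphQLClient("dagit.example.com", port_number=3000)

# Ask the agent to (re)load the code location; for a dormant server this
# kicks off the ECS task, much like opening the location in Dagit does.
result = client.reload_repository_location("my_code_location")
if result.status == ReloadRepositoryLocationStatus.SUCCESS:
    # The location reloaded - submit the job now that the server is reachable.
    run_id = client.submit_job_execution(
        "my_job",
        repository_location_name="my_code_location",
        repository_name="my_repository",
    )
    print(f"Launched run {run_id}")
else:
    print(f"Reload failed: {result.message}")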