Hi team, we are experiencing an issue in our Dagster deployment with a job we've written to execute a Python script inside a Dagster op container spawned specifically by the celery_docker_executor. When using the normal in-process executor, the job succeeds. Attached to this message is the specific error we are receiving. We are having trouble reproducing the issue; it only occurs in the op that actually executes the Python script, and it seems to fail at the exact point where we begin to execute the script. Thanks in advance for any assistance you can provide.
01/18/2023, 8:35 PM
Hey Jim - I believe exit code 137 corresponds to an out-of-memory error
01/18/2023, 9:09 PM
Hi Daniel, I had noticed that, but given the nature of the script that is executing, I am surprised. We are not loading any data into the container, just continuously polling an API and checking the response; once the response matches what we expect, the script completes. This same job configuration works when spawned with our in-process executor inside of one Docker container, and we have no container limits set in either case. I'm working on turning off our auto-remove setting so I can inspect the crashed container. With the celery executor the job seems to fail almost instantly upon script execution, whereas with the in-process executor it runs to completion.
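(Once auto-remove is off, something like the following can confirm whether the kernel's OOM killer was responsible. This is just a sketch; the container name below is hypothetical, so substitute the real crashed container's name or ID.)

```shell
# Hypothetical container name -- replace with the actual crashed container.
# OOMKilled=true means the kernel's OOM killer sent the SIGKILL.
docker inspect \
  --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' \
  dagster_op_container
```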
01/18/2023, 9:10 PM
137 is 128 + 9, i.e. the process was killed with signal 9 (SIGKILL) - memory isn't the only reason that can happen but it's by far the most common in my experience
Dagster doesn't send that signal so it'd be something in the OS or container that's sending it
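(For reference, the exit-code-to-signal mapping above follows the POSIX shell convention that an exit code of 128 + N means the process was terminated by signal N. A quick illustration, not part of the job itself:)

```python
import signal

exit_code = 137
# Exit codes above 128 conventionally mean "killed by signal (code - 128)".
sig = signal.Signals(exit_code - 128)
print(sig.name)  # SIGKILL
```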