mrdavidlaing
12/09/2021, 11:42 AM
I'm bumping into a `botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4)` error when using a (possibly overloaded) internal S3-compatible blobstore to store job compute logs - see 🧵 for details. Since this doesn't actually cause any of the `op()`s in the pipeline to fail, I'm wondering if there is a way to mark this error as "ignorable"?
dagster.core.errors.DagsterSubprocessError: During multiprocess execution errors occurred in child processes:
In process 29: botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate.
Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/child_process_executor.py", line 65, in _execute_command_in_child_process
    for step_event in command.execute():
  ...snip...
  File "/usr/local/lib/python3.7/site-packages/s3transfer/upload.py", line 694, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.7/site-packages/botocore/client.py", line 719, in _make_api_call
    raise error_class(parsed_response, operation_name)
In process 977: botocore.exceptions.ClientError: An error occurred (SlowDown) when calling the PutObject operation (reached max retries: 4): Please reduce your request rate.
Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/child_process_executor.py", line 65, in _execute_command_in_child_process
    for step_event in command.execute():
  File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 83, in execute
    instance=instance,
Alternatively, is there perhaps a way to configure the underlying boto library being used with a different set of retry options? https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html
prha
12/09/2021, 6:36 PM
Hi David. Thanks for reporting… I think we should probably guard against compute log manager errors and yield a special failure event instead of letting the error mark the run as failed. Re: configuring the underlying boto library, any chance you can set an AWS config file on your run workers?
@Dagster Bot issue Guard compute log manager errors to yield a custom failure event instead of failing the run
👍 2
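The guard prha proposes could look roughly like this (hypothetical helper names; the real change would live inside Dagster's compute log manager, not user code):

```python
def guard_compute_log_upload(upload, report_failure):
    """Attempt the compute-log upload; if it errors, report a failure
    event for the logs instead of letting the exception fail the run."""
    try:
        upload()
    except Exception as err:  # in practice, botocore.exceptions.ClientError
        report_failure(err)
```

The point of the design is that compute logs are auxiliary: losing them should degrade observability, not mark otherwise-successful ops as failed.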
Dagster Bot
12/09/2021, 6:36 PM
mrdavidlaing
12/09/2021, 6:38 PM
> Re: configuring the underlying boto library, any chance you can set an AWS config file on your run workers?
If that means setting an env variable then yes - I just can't figure out which env var to set 🙂
prha
12/09/2021, 6:41 PM
hmm, not a boto expert: does `AWS_MAX_ATTEMPTS` do what you want? (from https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html)
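For reference, the botocore config page linked above documents these environment variables; they are read when the client is constructed, so they must be set before boto3 builds its S3 client (e.g. in the run worker's environment):

```python
import os

# Both variables are documented in the botocore config reference.
os.environ["AWS_MAX_ATTEMPTS"] = "10"      # total attempts, including the first call
os.environ["AWS_RETRY_MODE"] = "adaptive"  # "legacy", "standard", or "adaptive"
```

In a Kubernetes or Docker deployment, the equivalent would be setting these as container environment variables rather than in Python.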
mrdavidlaing
12/10/2021, 11:28 AM
I'll give that a try. Feels like a bit of a "magic setting" - but if it works I'll submit a PR to document it better.
I can confirm that setting the `AWS_MAX_ATTEMPTS` env var does appear to have made the problem go away for me.