Abhinav Dhulipala
04/02/2023, 11:39 PM
dagster._core.errors.DagsterUserCodeUnreachableError: Could not reach user code server. gRPC Error code: UNKNOWN
File "/home/usr/.local/lib/python3.8/site-packages/dagster/_grpc/client.py", line 453, in start_run
res = self._query(
File "/home/usr/.local/lib/python3.8/site-packages/dagster/_grpc/client.py", line 157, in _query
self._raise_grpc_exception(
File "/home/usr/.local/lib/python3.8/site-packages/dagster/_grpc/client.py", line 140, in _raise_grpc_exception
raise DagsterUserCodeUnreachableError(
The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.UNKNOWN
details = "Exception calling application: [Errno 5] Input/output error"
debug_error_string = "{"created":"@1680457553.144815667","description":"Error received from peer unix:/tmp/tmpqx4b0zzk","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Exception calling application: [Errno 5] Input/output error","grpc_status":2}"
>
File "/home/usr/.local/lib/python3.8/site-packages/dagster/_grpc/client.py", line 155, in _query
return self._get_response(method, request=request_type(**kwargs), timeout=timeout)
File "/home/usr/.local/lib/python3.8/site-packages/dagster/_grpc/client.py", line 130, in _get_response
return getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
File "/home/usr/.local/lib/python3.8/site-packages/grpc/_channel.py", line 946, in __call__
return _end_unary_response_blocking(state, call, False, None)
File "/home/usr/.local/lib/python3.8/site-packages/grpc/_channel.py", line 849, in _end_unary_response_blocking
raise _InactiveRpcError(state)
I've tried changing the Python version, cleaning my venv, and debugging the Python install on my VM. Our transform stage works flawlessly on my local dev machine, but when deployed to the VM I get this error all the time. Any ideas as to what could be going wrong here? Has anyone run into something similar?
daniel
04/03/2023, 3:11 PM
Abhinav Dhulipala
04/03/2023, 7:15 PM
venv/lib/python3.8/site-packages/dagster/_grpc/server.py:1293: UserWarning: GrpcServerProcess is being destroyed without signalling to server that it should shut down. This may result in server processes living longer than they need to. To fix this, wrap the GrpcServerProcess in a contextmanager or call shutdown_server on it
The version of dagster I'm running is
dagster --version
dagster, version 1.2.4
I'll try reloading the code location; I'm very new to dagster and appreciate the suggestion. I'll try deploying dagster in Docker! Thank you for that suggestion.
daniel
04/03/2023, 7:18 PM
daniel
04/03/2023, 7:20 PM
Abhinav Dhulipala
04/03/2023, 7:22 PM
Abhinav Dhulipala
04/03/2023, 7:22 PM
daniel
04/03/2023, 7:22 PM
Abhinav Dhulipala
04/03/2023, 7:24 PM
daniel
04/03/2023, 7:25 PM
Abhinav Dhulipala
04/03/2023, 7:25 PM
Abhinav Dhulipala
04/03/2023, 7:25 PM
daniel
04/03/2023, 7:26 PM
Abhinav Dhulipala
04/03/2023, 7:28 PM
def make_s3_files_updated_sensor(job: JobDefinition) -> SensorDefinition:
    """Returns a sensor that launches the given job on s3 updates to a provided directory."""

    @sensor(name=f"{job.name}_on_files_updated", minimum_interval_seconds=300, job=job)
    def s3_files_updated_sensor(context: SensorEvaluationContext):
        since_key = context.cursor or None
        BUCKET_NAME = "<redacted>"
        with build_resources({"s3": s3_prod_resource}) as resources:
            new_s3_keys = get_s3_keys(
                BUCKET_NAME, prefix=MY_DIRECTORY, since_key=since_key, s3_session=resources.s3
            )
            context.log.info(f"new_s3_keys: {len(new_s3_keys)}")
            for key in filter(lambda k: k.endswith(".hyper"), new_s3_keys):
                yield RunRequest(
                    run_key=key, run_config={"ops": {"upload_to_db": {"config": {"filename": key}}}}
                )
                context.update_cursor(key)

    return s3_files_updated_sensor
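For readers following the sensor above, here is a minimal sketch of the cursor pattern it relies on, written without Dagster or S3 so it can be run on its own. The names `keys_after` and `evaluate_tick` are hypothetical; `keys_after` is only an in-memory stand-in for `get_s3_keys` and assumes the keys arrive in a stable order, which may not match how the real helper orders them.

```python
from typing import Dict, List, Optional, Tuple


def keys_after(all_keys: List[str], since_key: Optional[str]) -> List[str]:
    """Hypothetical stand-in for get_s3_keys: keys listed after since_key."""
    if since_key is None or since_key not in all_keys:
        # No usable cursor yet, so every key counts as new.
        return list(all_keys)
    return all_keys[all_keys.index(since_key) + 1:]


def evaluate_tick(
    all_keys: List[str], cursor: Optional[str]
) -> Tuple[List[Dict[str, str]], Optional[str]]:
    """One sensor tick: one request per new .hyper key, then advance the cursor."""
    requests: List[Dict[str, str]] = []
    for key in keys_after(all_keys, cursor):
        if key.endswith(".hyper"):
            # run_key mirrors the sensor above: one run per file name.
            requests.append({"run_key": key, "filename": key})
            cursor = key
    return requests, cursor


if __name__ == "__main__":
    keys = ["a.hyper", "notes.txt", "c.hyper"]
    first, cursor = evaluate_tick(keys, None)
    second, cursor = evaluate_tick(keys, cursor)
    print(len(first), len(second), cursor)
```

Because the cursor only moves forward when a `.hyper` key is seen, a second tick over the same listing produces no new requests in this sketch.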
daniel
04/03/2023, 7:29 PM
Abhinav Dhulipala
04/03/2023, 7:30 PM
daniel
04/03/2023, 7:36 PM
daniel
04/03/2023, 7:37 PM
Abhinav Dhulipala
04/03/2023, 7:44 PM
upload_to_db - STEP_FAILURE - Execution of step "upload_to_db" failed.
dagster._core.errors.DagsterExecutionStepExecutionError: Error occurred while executing op "upload_to_db"::
botocore.exceptions.ClientError: An error occurred (404) when calling the HeadObject operation: Not Found
This is because I retried a job, but forgot it had left our job queue.
daniel
04/03/2023, 7:44 PM
daniel
04/03/2023, 8:35 PM
daniel
04/03/2023, 8:35 PM
Abhinav Dhulipala
04/03/2023, 8:42 PM
daniel
04/03/2023, 8:45 PM
details = "Exception calling application: [Errno 5] Input/output error"
I don't see that in general when the code server becomes unavailable. You mentioned there are lots of resources but I wonder if it could be running out of disk or hitting some other I/O related limit?
Abhinav Dhulipala
04/03/2023, 9:00 PM
ERROR: grpcio-health-checking 1.53.0 has requirement grpcio>=1.53.0, but you'll have grpcio 1.47.5 which is incompatible.
From there I rm -rf'd my pip env and did the following
python -m venv venv && source venv/bin/activate && pip install -U pip wheel setuptools && pip install dagster dagit dagster-slack dagster-aws dagster-postgres
The grpcio dep seems to have been the offender.
daniel
04/03/2023, 9:05 PM
daniel
04/03/2023, 9:06 PM
Abhinav Dhulipala
04/03/2023, 9:46 PM
Abhinav Dhulipala
04/03/2023, 10:39 PM
daniel
04/03/2023, 10:40 PM
Uddhav Kapadia
04/25/2023, 7:40 PM
daniel
04/25/2023, 7:41 PM
Uddhav Kapadia
04/25/2023, 7:41 PM