
Zach

05/31/2022, 3:44 PM
Not sure if #dagster-support would be better for this, but since it's coming from the dagster-cloud library I'll post it here. I'm seeing this failure on a small percentage of ops in a large fan-out. All of the fan-out ops run the same step launcher and op code, but only some of them fail this way:
dagster_cloud.storage.errors.GraphQLStorageError: Error in GraphQL response: [{'message': 'Internal Server Error (Trace ID: 2565201103886791611)', 'locations': [{'line': 15, 'column': 13}], 'path': ['eventLogs', 'getLogsForRun']}]
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/execute_plan.py", line 230, in dagster_event_sequence_for_step
    for step_event in check.generator(step_events):
  File "/usr/local/lib/python3.9/site-packages/etxdagster/dnax/dnax_step_launcher.py", line 466, in launch_step
    step_run_ref = self._step_context_to_step_run_ref(
  File "/usr/local/lib/python3.9/site-packages/etxdagster/dnax/dnax_step_launcher.py", line 557, in _step_context_to_step_run_ref
    return step_context_to_step_run_ref(
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/external_step.py", line 191, in step_context_to_step_run_ref
    upstream_output_events, run_group = _upstream_events_and_runs(step_context)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/execution/plan/external_step.py", line 117, in _upstream_events_and_runs
    step_output_records = step_context.instance.all_logs(
  File "/usr/local/lib/python3.9/site-packages/dagster/utils/__init__.py", line 615, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1289, in all_logs
    return self._event_storage.get_logs_for_run(run_id, of_type=of_type)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/event_logs/storage.py", line 257, in get_logs_for_run
    res = self._execute_query(
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/event_logs/storage.py", line 237, in _execute_query
    res = self._graphql_client.execute(query, variable_values=variables)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/client.py", line 63, in execute
    return self._execute_retry(query, variable_values)
  File "/usr/local/lib/python3.9/site-packages/dagster_cloud/storage/client.py", line 117, in _execute_retry
    raise GraphQLStorageError(f"Error in GraphQL response: {str(result['errors'])}")

alex

05/31/2022, 4:25 PM
What version of dagster are you on? A slight mitigation for this went out last week, and more should be released this week.

Zach

05/31/2022, 4:26 PM
Ah okay, we're on 0.14.14. I'll try upgrading. Are there any consequences to upgrading user-code and agent versions while a job is running? This would be with the EcsRunLauncher.

alex

05/31/2022, 4:32 PM
I believe running processes should just continue with what they have, and upgrading should only improve runs launched afterwards. The agent should be fine to upgrade whenever.
👍 2
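
The traceback in this thread surfaces from `_execute_retry`, so the client appears to retry before it raises. The general pattern for absorbing a transient `Internal Server Error` across a large fan-out is retry with exponential backoff and jitter. Below is a minimal sketch of that pattern, not the dagster-cloud implementation; `TransientError` and `call_with_retries` are hypothetical names chosen for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable server-side failure."""


def call_with_retries(fn, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on TransientError with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # The base delay doubles on each attempt; the random jitter keeps
            # many concurrent fan-out steps from retrying at the same instant.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting. Whether a wrapper like this is useful depends on where the failing call sits; in this thread the call is inside the library, so upgrading remains the suggested fix.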