# ask-community
a
Hi everyone, I wanted to query missing partitions using the following GraphQL query in the GraphQL playground:
Copy code
query test(
  $repositorySelector: RepositorySelector!, 
  $partitionSetName: String!
)
  {
    partitionSetOrError(
      repositorySelector: $repositorySelector
      partitionSetName: $partitionSetName
    ) {
      ... on PartitionSet {
        id
        name
        pipelineName
        partitionsOrError {
          ... on Partitions {
            results {
              name
            }
          }
        }
        partitionStatusesOrError {
          __typename
          ... on PartitionStatuses {
            results {
              id
              partitionName
              runStatus
              runDuration
            }
          }
        }
      }
    }
  }
and I get the following response:
{"error": "Unexpected token '<', \"\n<html><hea\"... is not valid JSON"}
Am I missing something, or is it a known problem? I am using Dagster & Dagit version 1.0.17, deployed on K8s.
j
cc @dish
d
Hi Alexis, if you remove any elements of that query, does it respond correctly? Do other queries respond correctly?
a
I tried one of the sample queries in the docs; same result.
This one works, though:
Copy code
query FilteredRunsQuery {
  runsOrError(filter: { statuses: [FAILURE] }) {
    __typename
    ... on Runs {
      results {
        runId
        jobName
        status
        runConfigYaml
        stats {
          ... on RunStatsSnapshot {
            startTime
            endTime
            stepsFailed
          }
        }
      }
    }
  }
}
d
Can you open your browser dev tools and see what kind of http status code your request is returning?
The unparseable response is almost certainly coming from a non-200, so I’m wondering what kind of response code it actually has
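(For reference, a minimal sketch of sending the same request outside the browser so the HTTP status code is visible; the endpoint URL and the selector values below are placeholders, not values from this thread.)
Copy code
import requests

# Placeholder endpoint; point this at your dagit service's /graphql route.
DAGIT_GRAPHQL_URL = "http://localhost:3000/graphql"

QUERY = """
query test($repositorySelector: RepositorySelector!, $partitionSetName: String!) {
  partitionSetOrError(
    repositorySelector: $repositorySelector
    partitionSetName: $partitionSetName
  ) {
    ... on PartitionSet { id name }
  }
}
"""

variables = {
    "repositorySelector": {
        "repositoryName": "my_repository",        # placeholder
        "repositoryLocationName": "my_location",  # placeholder
    },
    "partitionSetName": "my_partition_set",       # placeholder
}

resp = requests.post(DAGIT_GRAPHQL_URL, json={"query": QUERY, "variables": variables})
print(resp.status_code)  # a non-200 (e.g. 502) explains the unparseable HTML body
print(resp.text[:200])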
a
I get a 502 HTTP status code
d
Perfect, thanks — cc @alex, any of these fields look suspicious to you in terms of leading to a 502?
a
how many failed runs do you have in your DB? I am guessing you are either timing out or OOM-ing your webserver.
You could check the logs on the webserver / the state of the k8s pods. You could also set a limit on your runsOrError call.
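(A minimal sketch of that bounded query, assuming the deployed GraphQL schema exposes a limit argument on runsOrError; worth confirming against your own schema in the playground. The endpoint URL is a placeholder.)
Copy code
import requests

DAGIT_GRAPHQL_URL = "http://localhost:3000/graphql"  # placeholder endpoint

# Same failed-runs check as the sample query above, capped at 25 results so the
# webserver does not have to serialize every failed run in one response.
LIMITED_FAILED_RUNS_QUERY = """
query FilteredRunsQuery {
  runsOrError(filter: { statuses: [FAILURE] }, limit: 25) {
    __typename
    ... on Runs {
      results { runId jobName status }
    }
  }
}
"""

resp = requests.post(DAGIT_GRAPHQL_URL, json={"query": LIMITED_FAILED_RUNS_QUERY})
print(resp.status_code)
print(resp.text[:500])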
a
I have around 700 failed runs in my DB. My original problem is related to the first query; the runsOrError one was just a test to see whether the issue was global or not.
Which kind of logs should I look for?
a
the 502 and the HTML response mean that the network ingress in your cluster, likely nginx, is serving the response, because the upstream dagit webserver that attempted to handle the request failed in some way
a
I have a lot of errors related to unreachable user code:
Copy code
Stack Trace:
 File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 535, in _load_location
 location = self._create_location_from_origin(origin)
 File "/usr/local/lib/python3.7/site-packages/dagster/_core/workspace/context.py", line 460, in _create_location_from_origin
 return origin.create_location()
 File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/origin.py", line 329, in create_location
 return GrpcServerRepositoryLocation(self)
 File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/repository_location.py", line 569, in __init__
 list_repositories_response = sync_list_repositories_grpc(self.client)
 File "/usr/local/lib/python3.7/site-packages/dagster/_api/list_repositories.py", line 19, in sync_list_repositories_grpc
 api_client.list_repositories(),
 File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 211, in list_repositories
 res = self._query("ListRepositories", api_pb2.ListRepositoriesRequest)
 File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 141, in _query
 raise DagsterUserCodeUnreachableError("Could not reach user code server") from e
{}
The above exception was caused by the following exception:
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
 status = StatusCode.UNAVAILABLE
 details = "DNS resolution failed for elt:3031: C-ares status is not ARES_SUCCESS qtype=A name=elt is_balancer=0: Could not contact DNS servers"
 debug_error_string = "{"created":"@1671033666.514173394","description":"DNS resolution failed for elt:3031: C-ares status is not ARES_SUCCESS qtype=A name=elt is_balancer=0: Could not contact DNS servers","file":"src/core/lib/transport/error_utils.cc","file_line":167,"grpc_status":14}"
I think it may be related to another problem I have with my infrastructure. Can I mention you in the related thread?
And a lot of this one too:
Copy code
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1671053109.330323489","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
I have more errors from the dagit pod; I am posting them in case they could be useful to you:
Copy code
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/executor.py", line 452, in resolve_or_error
    return executor.execute(resolve_fn, source, info, **args)
  File "/usr/local/lib/python3.7/site-packages/graphql/execution/executors/sync.py", line 16, in execute
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/schema/external.py", line 195, in resolve_schedules
    for schedule in self._repository.get_external_schedules()
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/schema/external.py", line 195, in <listcomp>
    for schedule in self._repository.get_external_schedules()
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 252, in get_schedule_state
    states = self._get(RepositoryDataType.SCHEDULE_STATES, schedule_name, 1)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 59, in _get
    self._fetch(data_type, limit)
  File "/usr/local/lib/python3.7/site-packages/dagster_graphql/implementation/loader.py", line 174, in _fetch
    instigator_type=InstigatorType.SCHEDULE,
  File "/usr/local/lib/python3.7/site-packages/dagster/_utils/__init__.py", line 640, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/instance/__init__.py", line 1979, in all_instigator_state
    repository_origin_id, repository_selector_id, instigator_type
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/schedules/sql_schedule_storage.py", line 54, in all_instigator_state
    if self.has_instigators_table() and self.has_built_index(SCHEDULE_JOBS_SELECTOR_ID):
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/schedules/sql_schedule_storage.py", line 237, in has_instigators_table
    return self._has_instigators_table(conn)
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/schedules/sql_schedule_storage.py", line 240, in _has_instigators_table
    table_names = db.inspect(conn).get_table_names()
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 267, in get_table_names
    conn, schema, info_cache=self.info_cache
  File "<string>", line 2, in get_table_names
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/reflection.py", line 55, in cache
a
StatusCode.DEADLINE_EXCEEDED
this means the user code server took longer to respond than the timeout (default of 60 seconds). I would guess you have a slow/heavy schedule or sensor
a
Would you say that a partitioned schedule with a lot of partitions could be considered heavy? I initially tried to query a schedule with over 100k partitions (each asset is partitioned per 15 minutes since early 2022, with 4 assets in the scheduled job)
a
so it's specifically the @schedule / @sensor decorated function that generates RunRequests / config that is taking longer than 60 seconds
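(For illustration, a minimal sketch of such a decorated function; the names are hypothetical and this is not the code from this thread. Whatever happens in the function body must finish within the gRPC timeout on every schedule tick.)
Copy code
from dagster import RunRequest, schedule


# Hypothetical example: this body runs on the user code server each time the
# schedule is evaluated, and it must complete within the gRPC timeout.
@schedule(cron_schedule="*/15 * * * *", job_name="my_job")
def my_fifteen_minute_schedule(context):
    partition_key = context.scheduled_execution_time.strftime("%Y-%m-%d %H:%M")
    # Expensive work here (for example, enumerating a very large partition set)
    # is what leads to StatusCode.DEADLINE_EXCEEDED.
    return RunRequest(run_key=partition_key, run_config={})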
a
What can I do to fix it?
a
make the function in question take less than 60 seconds, or increase the timeout using the environment variable DAGSTER_GRPC_TIMEOUT_SECONDS
a
I am using build_schedule_from_partitioned_job to create the schedule; should I create my own implementation of it then?
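(A minimal sketch of that construction, assuming default arguments and the my_job defined in the next message; the schedule name here is hypothetical.)
Copy code
from dagster import build_schedule_from_partitioned_job

# Assumes my_job is the partitioned asset job shown below; Dagster generates
# the schedule's evaluation function from the job's partitions definition.
my_job_schedule = build_schedule_from_partitioned_job(my_job)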
a
ah, that's useful context - looking at the implementation I don't see anything jump out. How is the partitioned job defined?
a
First the job itself:
Copy code
from dagster import define_asset_job

# TAGS is defined elsewhere in the project; fifteen_minute_partitions is shown below.
my_job = define_asset_job(
    "my_job",
    tags=TAGS,
    selection=[
        "asset1",
        "asset2",
        "asset3",
        "asset4",
        "asset5",
        "asset6",
    ],
    partitions_def=fifteen_minute_partitions,
)
The partition definition:
Copy code
from datetime import datetime

from dagster import TimeWindowPartitionsDefinition

fifteen_minute_partitions = TimeWindowPartitionsDefinition(
    cron_schedule="*/15 * * * *",
    start=datetime(2022, 1, 1, 0, 0, 0),
    fmt="%Y-%m-%d %H:%M",
    timezone="Europe/Paris",
)
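(For scale, a rough count of how many 15-minute partitions that definition yields, assuming it is evaluated around mid-December 2022, when this thread took place.)
Copy code
from datetime import datetime

# Back-of-the-envelope: 15-minute windows from the start date to mid-Dec 2022.
elapsed = datetime(2022, 12, 14) - datetime(2022, 1, 1)
partitions_per_asset = int(elapsed.total_seconds() // (15 * 60))
print(partitions_per_asset)      # roughly 33,000 partitions per asset
print(partitions_per_asset * 4)  # well over 100k across the 4 assets mentioned above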
a
flagged for folks on the team who know this area to take a look. If you can repro locally, you could identify the problem with a profiler like py-spy.
a
I will try to do so, thank you for your help! 🙂
s
here's a PR that should speed this up considerably: https://github.com/dagster-io/dagster/pull/11147. I'll see if we can get it into tomorrow's release. If not, it might need to wait until the new year.
❤️ 1
a
thank you @sandy