Hello I just updated to version 1 1 10 in order to solve the dagster #ask-community


Alexis Manuel

01/13/2023, 8:14 AM

Hello, I just updated to version 1.1.10 in order to solve the build_asset_reconciliation_sensor bug related to Source Asset. I now get the following error during the sensor tick:


dagster._core.errors.DagsterUserCodeUnreachableError: The sensor tick timed out due to taking longer than 60 seconds to execute the sensor function. One way to avoid this error is to break up the sensor work into chunks, using cursors to let subsequent sensor calls pick up where the previous call left off.
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/sensor.py", line 491, in _process_tick_generator
    sensor_debug_crash_flags,
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/sensor.py", line 558, in _evaluate_sensor
    instigator_data.cursor if instigator_data else None,
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/host_representation/repository_location.py", line 830, in get_external_sensor_execution_data
    cursor,
  File "/usr/local/lib/python3.7/site-packages/dagster/_api/snapshot_sensor.py", line 72, in sync_get_external_sensor_execution_data_grpc
    timeout=timeout,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 403, in external_sensor_execution
    custom_timeout_message=custom_timeout_message,
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 184, in _streaming_query
    e, timeout=timeout, custom_timeout_message=custom_timeout_message
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 137, in _raise_grpc_exception
    ) from e
The above exception was caused by the following exception:
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1673597250.934518131","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":81,"grpc_status":4}"
>
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 180, in _streaming_query
    method, request=request_type(**kwargs), timeout=timeout
  File "/usr/local/lib/python3.7/site-packages/dagster/_grpc/client.py", line 168, in _get_streaming_response
    yield from getattr(stub, method)(request, metadata=self._metadata, timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
    raise self
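
The error message's suggestion (break the work into chunks and use a cursor so the next tick resumes where the previous one stopped) can be sketched in plain Python. This is a simulation only, not Dagster code: `evaluate_tick`, `work_items` and `chunk_size` are hypothetical names. In a hand-written Dagster sensor the cursor would instead be read from `context.cursor` and saved with `context.update_cursor(...)`; a sensor produced by build_asset_reconciliation_sensor manages its own cursor, so this mainly illustrates the idea.

```python
from typing import List, Optional, Tuple


def evaluate_tick(
    work_items: List[str], cursor: Optional[str], chunk_size: int = 100
) -> Tuple[List[str], str]:
    """Handle at most chunk_size items per tick and return the updated cursor."""
    # The cursor is stored as a string; here it encodes the next index to process.
    start = int(cursor) if cursor else 0
    chunk = work_items[start : start + chunk_size]
    new_cursor = str(start + len(chunk))
    return chunk, new_cursor


# Simulate successive ticks over 250 hypothetical work items.
items = [f"item_{i}" for i in range(250)]
cursor = None
ticks = []
while True:
    chunk, cursor = evaluate_tick(items, cursor)
    if not chunk:
        break
    ticks.append(len(chunk))

print(ticks, cursor)
```

Each simulated tick does a bounded amount of work, so no single call has to cover the whole backlog within the timeout.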

I built the sensor using the following code:


from dagster import AssetSelection, build_asset_reconciliation_sensor

freshness_sla_sensor = build_asset_reconciliation_sensor(
    name="freshness_sla_sensor", asset_selection=AssetSelection.all()
)

My assets are mainly loaded from a dbt project which has around 30 different models and sources. Is there any way to find the root cause of the sensor tick timeout?

owen

01/13/2023, 5:41 PM

hi @Alexis Manuel! Thanks for the report -- I'm currently working on a host of performance improvements for this sensor, but I'm a bit surprised that you're hitting this timeout with only around 30 different models (generally, I would expect the current version of the sensor to start to take >60s closer to the ~1000 assets mark). A few questions just to help inform if the current fixes will solve your issue:
1. Are you using partitions at all?
2. Are you using FreshnessPolicies? If so, how many assets have freshness policies defined?
3. Do you have a large history of asset materializations before turning this sensor on?

Alexis Manuel

01/13/2023, 6:04 PM

1. I have 4 models out of the 40 which are partitioned, but they have quite a lot of partitions (they use 15-minute partitions starting on January 1st 2022).
2. I only have 3 assets using Freshness Policies, which are 3 dbt models defining their policies as dbt config. They are the reason why I am interested in testing the build_asset_reconciliation_sensor functionality to ensure efficient dbt graph scheduling.
3. The most surprising fact to me was that it occurred on a fresh local Postgres DB with no prior runs or materializations.
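
To give a rough sense of scale for the partitioned assets described above, the number of 15-minute partitions can be estimated with a short calculation. This is a back-of-the-envelope sketch: it assumes the partitions run from January 1st 2022 up to the date of this message (January 13th 2023), and `count_partitions` is a hypothetical helper, not a Dagster function.

```python
from datetime import datetime, timedelta


def count_partitions(start: datetime, end: datetime, step: timedelta) -> int:
    """Number of whole partition windows of length `step` between start and end."""
    return (end - start) // step


start = datetime(2022, 1, 1)   # partition start date from the message
as_of = datetime(2023, 1, 13)  # assumed "now": the date of the message

per_asset = count_partitions(start, as_of, timedelta(minutes=15))
total = per_asset * 4  # 4 partitioned assets

print(per_asset, total)  # 36192 144768
```

So although only 4 assets are partitioned, the sensor may have on the order of 145,000 asset partitions to reason about.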

owen

01/13/2023, 6:29 PM

Thanks for that info! I'm going to see if a) I can replicate your experience with a similar setup and b) if the planned fixes will solve the issue (I'm hopeful they will). I'm guessing these partitioned assets are upstream of your dbt graph?

Alexis Manuel

01/13/2023, 6:57 PM

Yes, they are raw data fetched every 15 minutes from an API

owen

01/17/2023, 10:32 PM

hi @Alexis Manuel -- just an update on this, I was able to replicate this performance issue, and can confirm that it will be resolved with the collection of performance improvements that we're rolling out this week.

🙏 1

Alexis Manuel

01/18/2023, 7:55 AM

Thank you @owen for the update

Charlie Bini

01/20/2023, 6:54 PM

hi @owen I just started having this same issue as well. are the improvements you mentioned already rolled out? my sensor covers only ~50 assets that are hourly partitioned starting yesterday

Alexis Manuel

01/20/2023, 6:55 PM

@Charlie Bini The 1.1.11 update solved everything for me

🧐 1

Charlie Bini

01/20/2023, 6:58 PM

dang I'm there as well

Alexis Manuel

01/20/2023, 7:13 PM

Maybe it's because you have far more partitioned assets than I do. I only have 4 partitioned ones out of the 50 I have now

Charlie Bini

01/20/2023, 10:12 PM

yeah that's gotta be it. it runs now after I broke the one sensor up into multiple sensors

owen

01/20/2023, 10:25 PM

@Charlie Bini I looked into this, and it is a performance issue with some of the partition mapping code (specifically impacting the first tick of the sensor after it's been added, as far as I can see). working towards a resolution at the moment

Charlie Bini

01/20/2023, 10:27 PM

thanks @owen!

Charlie Bini

01/20/2023, 10:28 PM

splitting them up into multiple sensors using AssetSelection seems to be a good workaround in the meantime
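
A minimal sketch of that workaround, assuming the assets are already organized into Dagster asset groups: the group names (`raw_api`, `dbt_staging`, `dbt_marts`), the sensor names and the `plan_sensors` helper are all hypothetical. The planning step is plain Python; the Dagster calls are shown as comments because they need dagster installed, and how the sensor treats dependencies that cross selection boundaries is not covered here.

```python
from typing import Dict, List


def plan_sensors(group_names: List[str], groups_per_sensor: int) -> Dict[str, List[str]]:
    """Map hypothetical sensor names to the asset groups each one would cover."""
    plan: Dict[str, List[str]] = {}
    for i in range(0, len(group_names), groups_per_sensor):
        sensor_name = f"reconciliation_sensor_{i // groups_per_sensor}"
        plan[sensor_name] = group_names[i : i + groups_per_sensor]
    return plan


# Hypothetical asset groups; replace with the groups defined in your project.
plan = plan_sensors(["raw_api", "dbt_staging", "dbt_marts"], groups_per_sensor=1)
print(plan)

# With dagster installed, each entry could then become its own sensor, e.g.:
#   from dagster import AssetSelection, build_asset_reconciliation_sensor
#   sensors = [
#       build_asset_reconciliation_sensor(
#           name=name, asset_selection=AssetSelection.groups(*groups)
#       )
#       for name, groups in plan.items()
#   ]
```

Each sensor then evaluates a smaller selection per tick, which is what made the ticks finish within the timeout in the case described above.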

Thomas Rolfsnes

03/17/2023, 1:23 PM

Hi @owen! I'm also experiencing this issue. Asset selection contains 4 weekly, 1 daily and one unpartitioned asset. Running on DagsterCloud.

Thomas Rolfsnes

03/17/2023, 1:24 PM

on a local dagit instance it runs fine!

Thomas Rolfsnes

03/17/2023, 1:24 PM

so I'm assuming it has to do with this point? 1. Do you have a large history of asset materializations before turning this sensor on?

owen

03/17/2023, 3:24 PM

hi @Thomas Rolfsnes! what version of dagster are you running on at the moment? there have been a bunch of performance improvements to the sensor in the past couple of months, so it's likely that upgrading dagster would resolve the issue

Thomas Rolfsnes

03/20/2023, 9:09 AM

@owen Hi! I was already at 1.1.21, but updated to 1.2.2 now. Do note that we're using DagsterCloud. However I still get the following error:


dagster._core.errors.DagsterUserCodeUnreachableError: Timed out waiting for call to user code GET_EXTERNAL_SENSOR_EXECUTION_DATA [7a43fc45-6780-4db0-9386-a5824a4f5026]
  File "/dagster-graphql/dagster_graphql/schema/instigation.py", line 227, in resolve_evaluationResult
    sensor_data = repository_location.get_external_sensor_execution_data(
  File "/dagster-cloud-backend/dagster_cloud_backend/user_code/workspace.py", line 647, in get_external_sensor_execution_data
    result = self.api_call(
  File "/dagster-cloud-backend/dagster_cloud_backend/user_code/workspace.py", line 382, in api_call
    return dagster_cloud_api_call(
  File "/dagster-cloud-backend/dagster_cloud_backend/user_code/workspace.py", line 131, in dagster_cloud_api_call
    for result in gen_dagster_cloud_api_call(
  File "/dagster-cloud-backend/dagster_cloud_backend/user_code/workspace.py", line 246, in gen_dagster_cloud_api_call
    raise DagsterUserCodeUnreachableError(

owen

03/20/2023, 5:23 PM

thanks for that info -- we did merge in a performance improvement for the unpartitioned -> partitioned asset case in the asset reconciliation logic (going out this week), which may resolve the problem, but it's definitely odd to me that an asset selection of that size would be hitting this timeout (as it's orders of magnitude smaller than other graphs that complete much faster). I'm thinking that this is somehow hitting an edge case and generating way more queries than necessary, so I'm going to dig in a bit and see if I can figure out why you're seeing this

Thomas Rolfsnes

03/20/2023, 5:42 PM

Let me know if there's any more info I can provide, thanks!
