# ask-community
i
Hey team, I have a load of pipelines currently stuck in the `starting` state. They're not giving errors, but I am getting errors from one of my sensors. Error:
dagster._core.errors.SensorExecutionError: Error occurred during the execution of evaluation_fn for sensor notify_on_failure
  File "/usr/local/lib/python3.8/site-packages/dagster/_grpc/impl.py", line 328, in get_external_sensor_execution
    return sensor_def.evaluate_tick(sensor_context)
  File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/errors.py", line 209, in user_code_error_boundary
    raise error_cls(
The above exception was caused by the following exception:
sqlalchemy.exc.ArgumentError: Column expression, FROM clause, or other columns clause element expected, got [Column('id', Integer(), table=<event_logs>, primary_key=True, nullable=False), Column('event', Text(), table=<event_logs>, nullable=False)]. Did you mean to say select(Column('id', Integer(), table=<event_logs>, primary_key=True, nullable=False), Column('event', Text(), table=<event_logs>, nullable=False))?
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/errors.py", line 202, in user_code_error_boundary
    yield
  File "/usr/local/lib/python3.8/site-packages/dagster/_grpc/impl.py", line 328, in get_external_sensor_execution
    return sensor_def.evaluate_tick(sensor_context)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/definitions/sensor_definition.py", line 424, in evaluate_tick
    result = list(self._evaluation_fn(context))
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/definitions/sensor_definition.py", line 594, in _wrapped_fn
    for item in result:
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/definitions/run_status_sensor_definition.py", line 460, in _wrapped_fn
    event_records = context.instance.get_event_records(
  File "/usr/local/lib/python3.8/site-packages/dagster/_utils/__init__.py", line 667, in inner
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/instance/__init__.py", line 1607, in get_event_records
    return self._event_storage.get_event_records(event_records_filter, limit, ascending)
  File "/usr/local/lib/python3.8/site-packages/dagster/_core/storage/event_log/sql_event_log.py", line 874, in get_event_records
    query = db.select(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/_selectable_constructors.py", line 493, in select
    return Select(*entities)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 5219, in __init__
    self._raw_columns = [
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/selectable.py", line 5220, in <listcomp>
    coercions.expect(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 413, in expect
    resolved = impl._literal_coercion(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 652, in _literal_coercion
    self._raise_for_expected(element, argname)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 1143, in _raise_for_expected
    return super()._raise_for_expected(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 711, in _raise_for_expected
    super()._raise_for_expected(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/sql/coercions.py", line 536, in _raise_for_expected
    raise exc.ArgumentError(msg, code=code) from err
And sensor code:
@run_failure_sensor(default_status=DefaultSensorStatus.RUNNING)
def notify_on_failure(context: RunFailureSensorContext):
    mode = utils.get_service_mode()
    url = utils.get_dagster_url(mode)

    failure_message = (...)

    sentry.capture_exception(...)

    if mode == 'prod':
        slack.send_message('#data-alerting', failure_message)
Seems to be SQLAlchemy-related, but I can't reproduce locally, which is strange
d
i
Ace, thanks!
d
We’ll be pushing out a pin for this today - sorry for the trouble here
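For context on the pin: the `ArgumentError` above complains that `select()` received a list of columns, which is consistent with SQLAlchemy 2.x having dropped the list-style `select([...])` call that 1.x accepted. Assuming that is the cause (the thread does not spell it out), a different SQLAlchemy version in the deployed image would also explain why it does not reproduce locally. A minimal sketch for comparing environments; the helper names are hypothetical and only the standard library is used:

```python
from importlib import metadata


def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '1.4.46'."""
    return int(version.split(".")[0])


def accepts_list_style_select(version: str) -> bool:
    # Assumption: SQLAlchemy 1.x accepts select([col_a, col_b]),
    # while 2.x requires select(col_a, col_b).
    return major_version(version) < 2


def installed_sqlalchemy_version():
    """Return the installed SQLAlchemy version string, or None if it is absent."""
    try:
        return metadata.version("SQLAlchemy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_sqlalchemy_version()
    if version is None:
        print("SQLAlchemy is not installed in this environment")
    else:
        print(version, "list-style select OK:", accepts_list_style_select(version))
```

Running this in both the local environment and the deployed image shows whether the two resolved different major versions.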
d
I am curious - I was experiencing the same issue but the errors were silent. Where could we enact a timeout so that if an op gets stuck in the `running` state, it will eventually fail?