Channels
announcements
dagster-airbyte
dagster-airflow
dagster-bigquery
dagster-cloud
dagster-cube
dagster-dask
dagster-dbt
dagster-de
dagster-ecs
dagster-feedback
dagster-kubernetes
dagster-noteable
dagster-releases
dagster-serverless
dagster-showcase
dagster-snowflake
dagster-support
dagster-wandb
dagstereo
data-platform-design
events
faq-read-me-before-posting
gigs-freelance
github-discussions
introductions
jobs
random
tools
豆瓣酱帮
dagster-support
  • m

    Marc Keeling

    04/20/2022, 10:25 PM
Is it possible to have a dual partition? Essentially, combine a date partition and a dynamic partition? So for each day, execute a query to get a list of customers, and for each customer do a certain operation.
    j
    p
    • 3
    • 3
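    A minimal sketch of one way to approximate the question above with a daily-partitioned job that fans out over customers at run time using DynamicOut; the query, op names, and customer ids are illustrative only:

    from dagster import DynamicOut, DynamicOutput, daily_partitioned_config, job, op

    @op(config_schema={"date": str}, out=DynamicOut(str))
    def customers_for_date(context):
        # hypothetical query returning the customers active on the partition's date
        for customer_id in ["cust_1", "cust_2"]:
            yield DynamicOutput(customer_id, mapping_key=customer_id)

    @op
    def process_customer(context, customer_id: str):
        context.log.info(f"processing {customer_id}")

    @daily_partitioned_config(start_date="2022-01-01")
    def per_day_config(start, _end):
        return {"ops": {"customers_for_date": {"config": {"date": start.strftime("%Y-%m-%d")}}}}

    @job(config=per_day_config)
    def per_customer_job():
        customers_for_date().map(process_customer)

    Each daily run then maps process_customer over whatever the query returned for that date.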
  • a

    Alec Ryan

    04/21/2022, 12:50 AM
I have 3 jobs: 1. load data to S3, 2. move data from S3 to Snowflake, 3. build dbt models. Jobs 1 and 2 rely on partition dates to run; dbt cannot rely on a partition_date directly. How can I pass the date from job 2 to dbt? Can I roll them all into the same job?
    :daggy-success: 1
    s
    • 2
    • 9
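    One possible shape for rolling all three steps into a single partitioned job so the dbt step sees the same date as the load steps; the op names are illustrative, and dbt is invoked here via subprocess with --vars rather than through dagster-dbt:

    import subprocess
    from dagster import In, Nothing, daily_partitioned_config, job, op

    @op(config_schema={"partition_date": str})
    def load_to_s3(context):
        context.log.info(f"loading {context.op_config['partition_date']} to s3")

    @op(ins={"start": In(Nothing)}, config_schema={"partition_date": str})
    def s3_to_snowflake(context):
        context.log.info(f"copying {context.op_config['partition_date']} into snowflake")

    @op(ins={"start": In(Nothing)}, config_schema={"partition_date": str})
    def build_dbt_models(context):
        date = context.op_config["partition_date"]
        # hand the date to dbt as a var; adjust to however your models consume it
        subprocess.run(["dbt", "run", "--vars", f"{{run_date: {date}}}"], check=True)

    @daily_partitioned_config(start_date="2022-01-01")
    def elt_config(start, _end):
        date = start.strftime("%Y-%m-%d")
        return {"ops": {name: {"config": {"partition_date": date}}
                        for name in ("load_to_s3", "s3_to_snowflake", "build_dbt_models")}}

    @job(config=elt_config)
    def elt_job():
        build_dbt_models(start=s3_to_snowflake(start=load_to_s3()))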
  • s

    Stefan Adelbert

    04/21/2022, 1:45 AM
Log Level of Events: I'm logging to Google Cloud Logging using a custom logger associated with each job, which allows me to monitor dagster using dashboards and alerts in Google Cloud Monitoring. I'd like to be able to capture certain dagster events in Google Cloud Logging, like
    RUN_SUCCESS
    , but these appear to be logged at the
    DEBUG
    log level. I could lower the log level of my custom logger to
    DEBUG
    , but I don't want to capture
    DEBUG
logging in general. Is there a way to elevate the log level of dagster events from
    DEBUG
    to
    INFO
    ?
    j
    d
    • 3
    • 3
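    A generic Python-logging workaround (not a Dagster-specific feature) that may fit the question above: keep the custom logger at DEBUG so the event records reach it, set the Cloud Logging handler to INFO, and attach a logger-level filter that promotes just the events of interest; the string match on the event name is a naive, illustrative check, and the logger/handler names are placeholders:

    import logging

    EVENTS_TO_ELEVATE = ("RUN_SUCCESS", "RUN_FAILURE")  # illustrative list

    class ElevateDagsterEvents(logging.Filter):
        """Promote selected DEBUG records to INFO so an INFO-level handler keeps them."""
        def filter(self, record):
            if record.levelno == logging.DEBUG and any(
                evt in record.getMessage() for evt in EVENTS_TO_ELEVATE
            ):
                record.levelno = logging.INFO
                record.levelname = "INFO"
            return True

    log = logging.getLogger("gcp_job_logger")   # hypothetical name of the custom logger
    log.setLevel(logging.DEBUG)                 # let Dagster's DEBUG events reach the filter
    log.addFilter(ElevateDagsterEvents())
    handler = logging.StreamHandler()           # stand-in for the Cloud Logging handler
    handler.setLevel(logging.INFO)              # generic DEBUG chatter is still dropped here
    log.addHandler(handler)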
  • h

    Huib Keemink

    04/21/2022, 7:41 AM
    Is it possible to have a different run launcher for each user deployment? For most of my jobs the standard k8s run launcher is perfect, but I have some jobs that need to run more frequently where I’d much rather run inside the (already running) user deployment pod to reduce the overhead
    j
    • 2
    • 4
  • p

    phương đinh

    04/21/2022, 7:47 AM
Hi team, how can an op's IOManager access output from an upstream op? In detail: I have an op that returns a list of ids, and I want to pass each id to mssql_io_manager to query some external data. 🙏
    j
    • 2
    • 2
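    A rough sketch of the pattern asked about above, assuming a custom IOManager whose load_input takes the upstream op's output (the list of ids) and performs the per-id lookup; the in-memory storage and fake query are purely illustrative and would need to be replaced with real persistence and a real MSSQL call:

    from dagster import IOManager, io_manager

    class MssqlLookupIOManager(IOManager):
        def __init__(self):
            # in-memory storage only works within a single process; illustration only
            self._outputs = {}

        def handle_output(self, context, obj):
            # obj is whatever the upstream op returned, e.g. a list of ids
            self._outputs[context.step_key] = obj

        def load_input(self, context):
            ids = self._outputs[context.upstream_output.step_key]
            # replace with a real MSSQL query per id
            return {i: f"row-for-{i}" for i in ids}

    @io_manager
    def mssql_lookup_io_manager(_init_context):
        return MssqlLookupIOManager()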
  • a

    Aman Saleem

    04/21/2022, 8:12 AM
Hello, I have 4 sensors queuing runs and have set a limit of 20 for each run tag, but at most 3 to 5 runs are in the
In Progress
state at a time, which makes processing the runs take a long time. Here is my configuration; any suggestion for making it faster would be helpful.
    run_coordinator:
      module: dagster.core.run_coordinator
      class: QueuedRunCoordinator
      config:
        max_concurrent_runs: 50
        tag_concurrency_limits:
        - key: GET_MERCHANT_LISTINGS_ALL_DATA
          limit: 20
        - key: GET_MERCHANT_LISTINGS_ALL_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
          limit: 20
        - key: GET_FBA_FULFILLMENT_REMOVAL_ORDER_DETAIL_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
          limit: 20
        - key: GET_FBA_FULFILLMENT_REMOVAL_SHIPMENT_DETAIL_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
          limit: 20
        - key: GET_FBA_FULFILLMENT_INVENTORY_HEALTH_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
          limit: 20
        - key: GET_FBA_MYI_UNSUPPRESSED_INVENTORY_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: GET_RESERVED_INVENTORY_DATA
          limit: 20
        - key: GET_RESERVED_INVENTORY_DATA
          limit: 2
          value:
            applyLimitPerUniqueValue: true
        - key: list_inventory_supply
          limit: 20
        - key: list_inventory_supply
          limit: 2
          value:
            applyLimitPerUniqueValue: true
    d
    • 2
    • 29
  • s

    Sara

    04/21/2022, 12:06 PM
I absolutely need to be able to write parameters in a job. I know it is not the protocol, but is there a way to do it? Thanks!
@job
def my_test_job():
    result_1 = my_op1("value")
    result_2 = my_op2(20)
    :daggy-success: 1
    s
    • 2
    • 1
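    One common workaround for the question above is to move the values into op config rather than passing them as call arguments; a minimal sketch:

    from dagster import job, op

    @op(config_schema={"value": str})
    def my_op1(context):
        return context.op_config["value"]

    @op(config_schema={"count": int})
    def my_op2(context):
        return context.op_config["count"]

    @job
    def my_test_job():
        my_op1()
        my_op2()

    # supply the "parameters" at launch time:
    # my_test_job.execute_in_process(run_config={
    #     "ops": {"my_op1": {"config": {"value": "value"}},
    #             "my_op2": {"config": {"count": 20}}}})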
  • m

    Mark Fickett

    04/21/2022, 3:00 PM
    I'm getting "Error occurred while loading input" that just says "Invariant failed." I think this worked with 0.14.7 and started failing with 0.14.9. Here's the stack trace:
    battery_data_job_local - c49da2c4-6a0f-41b2-a477-f9baa34afc1f - 1741477 - data_pipe_graph_sasquatch_anode.raw_data_graph._publish_raw_data_trace_context - STEP_FAILURE - Execution of step "data_pipe_graph_sasquatch_anode.raw_data_graph._publish_raw_data_trace_context" failed.
    
    dagster.core.errors.DagsterExecutionLoadInputError: Error occurred while loading input "pipe" of step "data_pipe_graph_sasquatch_anode.raw_data_graph._publish_raw_data_trace_context"::
    
    dagster.check.CheckError: Invariant failed.
    
    Stack Trace:
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/core/execution/plan/utils.py", line 47, in solid_execution_error_boundary
        yield
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/core/execution/plan/inputs.py", line 607, in _load_input_with_input_manager
        value = input_manager.load_input(context)
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/core/storage/fs_io_manager.py", line 152, in load_input
        context.add_input_metadata({"path": MetadataValue.path(os.path.abspath(filepath))})
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/core/execution/context/input.py", line 325, in add_input_metadata
        if self.asset_key:
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/core/execution/context/input.py", line 216, in asset_key
        check.invariant(len(matching_input_defs) == 1)
      File "/home/mfickett/Documents/mfickett/dev/data-pipeline/.direnv/python-3.9.5/lib/python3.9/site-packages/dagster/check/__init__.py", line 1167, in invariant
        raise CheckError("Invariant failed.")
    The op in question is:
    @op(
        ins={"start": In(Nothing)},
    )
    def _publish_raw_data_trace_context(context):
        publish_current_trace_context(context)
    And the
    pipe
parameter it mentions is an enum value (produced as a constant from a different op and passed through a @graph). I'm not really sure where to look next; any suggestions?
    j
    • 2
    • 8
  • m

    Mark Fickett

    04/21/2022, 3:37 PM
    What causes you to get
    Multiprocess executor: child process for step data_pipe_graph_sasquatch_anode.sanitation_graph._publish_sanitation_trace_context unexpectedly exited with code 1
    dagster.core.executor.child_process_executor.ChildProcessCrashException
    instead of the actual stack trace from the child? I've seen it a few times today. I got one for a keyring library error (I think it was a secretstorage.exceptions.ItemNotFoundException but didn't grab it out of the console).
    j
    • 2
    • 1
  • a

    Alec Ryan

    04/21/2022, 4:22 PM
    How can I pass a job run partition_key to another job?
    j
    o
    d
    • 4
    • 33
  • a

    Alec Ryan

    04/21/2022, 4:22 PM
For example, job 1 runs for 2022-04-21, and job 2 receives 2022-04-21 from job 1 (job 2 can't be partitioned).
    :daggy-success: 1
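    A sketch of one way to wire this up against a newer Dagster API than the 0.14 series discussed here (run_status_sensor with monitored_jobs/request_job, whose parameter names may differ by version), pulling the partition from job 1's run tags; all op and job names are placeholders:

    from dagster import DagsterRunStatus, RunRequest, job, op, run_status_sensor

    @op
    def upstream_op():
        ...

    @job  # stand-in for the partitioned job 1
    def job_1():
        upstream_op()

    @op(config_schema={"date": str})
    def use_date(context):
        context.log.info(context.op_config["date"])

    @job  # job 2, unpartitioned, receives the date via config
    def job_2():
        use_date()

    @run_status_sensor(run_status=DagsterRunStatus.SUCCESS, monitored_jobs=[job_1], request_job=job_2)
    def launch_job_2(context):
        date = context.dagster_run.tags.get("dagster/partition")  # set on partitioned runs
        return RunRequest(
            run_key=context.dagster_run.run_id,
            run_config={"ops": {"use_date": {"config": {"date": date}}}},
        )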
  • c

    Charlie Bini

    04/21/2022, 4:28 PM
    potentially silly question: is there a way to define multiple values for a
    Field
    in a
    config_schema
    ? Sort of like
    Any
    but restrict it to only one or two types
    z
    j
    • 3
    • 5
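    One option that may fit the question above is Selector, which restricts the config to exactly one of a fixed set of alternatives; the field and branch names here are made up:

    from dagster import Field, Selector, op

    @op(config_schema={"threshold": Field(Selector({"as_int": int, "as_str": str}))})
    def my_op(context):
        # run config provides exactly one branch, e.g. {"threshold": {"as_int": 5}}
        ((kind, value),) = context.op_config["threshold"].items()
        context.log.info(f"{kind} -> {value}")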
  • a

    Arun Kumar

    04/21/2022, 7:05 PM
Hi team, we are running dagster 0.12.12 and have been facing complete downtime on dagit since this morning. This was initially caused by the liveness probe, which we then removed manually. Now the dagit pods are in a healthy state, but dagit is still not up. From our DB monitoring tool we see the frequent query below performing badly. Any thoughts on what else we could check?
    SELECT job_ticks.id, job_ticks.tick_body 
      FROM job_ticks 
     WHERE job_ticks.job_origin_id = $1 
     ORDER BY job_ticks.id DESC LIMIT $2
    d
    a
    • 3
    • 45
  • l

    Liezl Puzon

    04/21/2022, 7:06 PM
Is there an easier way to jump to the set of runs from 12 hours ago if I have 100-1000s of runs per hour? I would like to jump through multiple pages at once rather than hit the “Older” button, but I'm not sure how to use this cursor to do that:
    cursor=cc5f4949-61b3-4269-a07e-fb49aac68e0e
    j
    d
    • 3
    • 5
  • n

    Nicholas Buck

    04/21/2022, 8:36 PM
My company uses SAS 9.4, and we usually send remote commands with PowerShell scripts. Can dagster support running processes that aren't Python-based? Thanks. Edit: our SAS servers run locally. We do use S3 for warehousing.
    p
    • 2
    • 1
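    Dagster ops are Python, but they can shell out to anything; a minimal sketch that wraps an existing PowerShell script (the path and flags are illustrative), with the dagster-shell integration being another option worth a look:

    import subprocess
    from dagster import job, op

    @op(config_schema={"script_path": str})
    def run_powershell_script(context):
        result = subprocess.run(
            ["powershell.exe", "-File", context.op_config["script_path"]],
            capture_output=True, text=True, check=True,
        )
        context.log.info(result.stdout)

    @job
    def sas_job():
        run_powershell_script()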
  • b

    Brooke Talcott

    04/21/2022, 9:03 PM
    Are the SkipReasons supposed to surface in dagit on the sensors page? I’m yielding a SkipReason with a message that I thought would show up but the tooltip in dagit shows the run was skipped because “the runs already exist for the existing keys”
    p
    d
    • 3
    • 18
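    For reference, a minimal sensor sketch that yields a SkipReason with a custom message; the work-detection logic is a placeholder:

    from dagster import RunRequest, SkipReason, job, op, sensor

    @op
    def do_work():
        ...

    @job
    def my_job():
        do_work()

    @sensor(job=my_job)
    def my_sensor(_context):
        new_keys = []  # placeholder for whatever check finds new work
        if not new_keys:
            yield SkipReason("no new keys found since the last tick")
            return
        for key in new_keys:
            yield RunRequest(run_key=key)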
  • j

    jasono

    04/21/2022, 9:21 PM
Hi, when I run a job that has a few ops with no interdependency, I keep getting this error (Event Type: Engine_Event) from the op that finishes first. Interestingly, the op that issues this error is marked green as a success. How can I fix this?
    Exception while cleaning up compute log capture. Exception: Timed out waiting for tail process to start.
    p
    • 2
    • 2
  • a

    Austin Bailey

    04/21/2022, 10:32 PM
Hi! I was recently trying out the "op_selection" functionality and was doing a bit of testing in Launchpad. I had a couple of graph issues, so this particular selection was returning some errors. I have since moved on to other things, but the Launchpad still pulls up the sub-selection and refuses to persist the run config I previously had in there. I have since run 3-4 runs from Launchpad and it keeps defaulting back to this view. I could probably clear the browser cache to fix this, but thought I'd give y'all a heads up. Not sure if this is a known bug... Quick edit: just tried clearing the cache and no dice... any thoughts? When I go to a run and click "Open in launchpad" it still shows the messed-up config.
    d
    • 2
    • 9
  • l

    Liezl Puzon

    04/21/2022, 10:35 PM
    I’m trying to terminate runs that are
    Starting
    with the GraphQL API but I’m getting this error. How do I force termination?
    "message": "Run 54c52064-4bb7-49ef-a924-42375ddfd201 could not be terminated due to having status STARTING."
    d
    • 2
    • 4
  • b

    Bryce Baker

    04/22/2022, 1:51 AM
    Hi, attempting to follow the getting started tutorial, but getting stuck at about every turn. I believe I have my environment all set up correctly now and am attempting to run the hello_cereal example. Getting the below when I try
    dagit -f hello_cereal.py
    from within the \jobs directory.
    Error loading repository location hello_cereal.py:FileNotFoundError: [WinError 2] The system cannot find the file specified: 'hello_cereal.py'
    :daggy-success: 1
    l
    • 2
    • 18
  • b

    Bryan Chavez

    04/22/2022, 3:17 AM
    Upgraded to 0.14.10 and the Launchpad pages are completely blank (no configs).
    d
    • 2
    • 3
  • s

    Son Giang

    04/22/2022, 4:07 AM
Hi everyone. I wonder how to pass the partition start and end to every op in a graph/job. Do we have some sort of global context for a graph/job that the ops can use when needed? It's such a pain to have to pass partition information (start/end time) to every op.
    p
    p
    • 3
    • 2
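    One pattern that may help with the question above, assuming the job uses a time-window partitioned config: expose the window through a shared resource so individual ops don't each need it wired through op config; all names here are illustrative:

    from dagster import daily_partitioned_config, job, op, resource

    @resource(config_schema={"start": str, "end": str})
    def partition_window(init_context):
        return init_context.resource_config

    @op(required_resource_keys={"partition_window"})
    def op_a(context):
        context.log.info(str(context.resources.partition_window))

    @op(required_resource_keys={"partition_window"})
    def op_b(context):
        context.log.info(str(context.resources.partition_window))

    @daily_partitioned_config(start_date="2022-01-01")
    def per_day(start, end):
        window = {"start": start.isoformat(), "end": end.isoformat()}
        return {"resources": {"partition_window": {"config": window}}}

    @job(config=per_day, resource_defs={"partition_window": partition_window})
    def my_partitioned_job():
        op_a()
        op_b()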
  • s

    Sanidhya Singh

    04/22/2022, 6:22 AM
When logging metadata for an asset, is there any option to plot it as a bar chart instead of a line?
    j
    d
    • 3
    • 2
  • g

    geoHeil

    04/22/2022, 11:05 AM
I face a problem in my dummy dagster job: https://github.com/geoHeil/dagster-ssh-demo. Sensors trigger the execution of some jobs. However, the dagit UI marks downstream assets as stale, even though the upstream sensor tick should have skipped the job. I know that https://github.com/dagster-io/dagster/issues/7434 is still open, but what concerns me is that I now seem to be getting a similar issue in the asset graph of the Jobs view as well (for assets), and I wonder whether this is a dagit bug again or whether I am somehow using dagster wrong.
    j
    b
    • 3
    • 3
  • m

    Mark Fickett

    04/22/2022, 1:46 PM
    Logging question: I have a few file loggers configured via
    dictConfig
    in an application I'm porting to Dagster. They make use of a logging filter as well as file handlers. Is there a way to make my
    logging.dictConfig
    call get along with Dagster logging setup? I can use the
    python_loggers
    configuration in
    dagster.yaml
    to capture their output into Dagit, but that only works if I remove the
    dictConfig
    call. I see that I could add handlers and formatters via
    dagster_handler_config
    as described in the docs, but it's a little painful to port my current logging config dict setup to
    dagster.yaml
    , and I would also lose the filters.
    j
    o
    • 3
    • 8
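    Not a definitive answer to the question above, but one thing that may be worth experimenting with: calling dictConfig with disable_existing_loggers set to False so the call does not silence loggers Dagster has already configured, while keeping your filters and file handlers attached only to your own named loggers; the logger, filter, and file names here are illustrative:

    import logging.config

    LOGGING = {
        "version": 1,
        # avoid clobbering loggers configured outside this call (e.g. by Dagster)
        "disable_existing_loggers": False,
        "filters": {"my_filter": {"name": "my_app"}},  # illustrative stdlib name filter
        "handlers": {
            "file": {
                "class": "logging.FileHandler",
                "filename": "my_app.log",
                "filters": ["my_filter"],
            },
        },
        "loggers": {"my_app": {"handlers": ["file"], "level": "INFO"}},
    }

    logging.config.dictConfig(LOGGING)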
  • a

    Aaron Bailey

    04/22/2022, 3:35 PM
Hey all - I'm really new to this and I'm a little confused. I'm trying to implement hooks (specifically msteams), and as I go through the documentation, everything for hooks refers to @solid and @pipeline. Is this just a matter of the documentation not keeping up with the pace of change? I believe we're supposed to use ops and jobs now, so I'm just trying to work through it. I apologize for the newbie question and thanks in advance.
    j
    • 2
    • 18
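    The op/job-era hook APIs do exist in core Dagster; a minimal sketch with a generic failure hook (the dagster-msteams integration also ships ready-made hooks, not shown here), where the op and hook names are illustrative:

    from dagster import HookContext, failure_hook, job, op

    @failure_hook
    def notify_on_failure(context: HookContext):
        # swap this log line for the real MS Teams notification
        context.log.warning(f"op {context.op.name} failed")

    @op
    def flaky_op():
        raise Exception("boom")

    @job(hooks={notify_on_failure})
    def my_job():
        flaky_op()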
  • g

    geoHeil

    04/22/2022, 4:03 PM
Dagster queued runs in Docker (mounting duckDB / warehouse_location): I am trying to get my dagster pipeline to work inside docker. For this I am following along with: - https://github.com/dehume/big-data-madison-dagster - https://github.com/dagster-io/dagster/tree/master/examples/deploy_docker In particular, https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/dagster.yaml#L11 suggests using
    DockerRunLauncher
    . For both of
    dagit
    and
    dagster-daemon
    I have enabled docker-in-docker by mounting: https://github.com/dagster-io/dagster/blob/master/examples/deploy_docker/docker-compose.yml#L61
    /var/run/docker.sock:/var/run/docker.sock
    But I only get:
    DockerException: Error while fetching server API version: ('Connection aborted.', PermissionError(13, 'Permission denied'))
    
     File "/opt/conda/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1698, in launch_run
        self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
      File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 152, in launch_run
        self._launch_container_with_command(run, docker_image, command)
      File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 97, in _launch_container_with_command
        client = self._get_client(container_context)
      File "/opt/conda/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 72, in _get_client
        client = docker.client.from_env()
      File "/opt/conda/lib/python3.9/site-packages/docker/client.py", line 96, in from_env
        return cls(
      File "/opt/conda/lib/python3.9/site-packages/docker/client.py", line 45, in __init__
        self.api = APIClient(*args, **kwargs)
      File "/opt/conda/lib/python3.9/site-packages/docker/api/client.py", line 197, in __init__
        self._version = self._retrieve_server_version()
      File "/opt/conda/lib/python3.9/site-packages/docker/api/client.py", line 221, in _retrieve_server_version
        raise DockerException(
    when executing:
    docker compose --profile dagster up --build
I am running Docker for Mac. How can I get dagster to work nicely in this setup?
    d
    p
    • 3
    • 114
  • g

    Gabriel Montañola

    04/22/2022, 4:16 PM
    SOLVED: Forgot to update Dagster on custom image.
    0.14.10
    was set only on Helm deployment.
    --- Hi there folks. I'm trying to use this feature from
    0.14.10
    but I'm not sure I'm doing it right. I added this to my
    deployments
    on the Helm chart:
    includeConfigInLaunchedRuns:
              enabled: true
    But jobs are still created without environment variables derived from
    env
    and
    envSecrets
    . Also,
    container_context
    from created jobs is
    null
    . Do I need to change other settings so this can work as intended?
    ✅ 1
    d
    • 2
    • 8
  • l

    Liezl Puzon

    04/22/2022, 5:55 PM
    For Dagster Cloud: How do I find and use an API key for querying https://XYZ.dagster.cloud/prod/graphql via
    curl
    ?
    ✅ 1
    :daggy-success: 1
    • 1
    • 1
  • a

    Arun Kumar

    04/22/2022, 9:59 PM
Hi team, we ran a migration from 0.12.12 to 0.14.5 yesterday. We had a small hiccup with a statement timeout while creating an index on the event logs table, but we were able to fix the issue and run another successful migration. I am seeing that some of today's runs are failing with the following error. However, it looks like the actual ops within the job run fine; the error appears at the end of the job run. Any thoughts on why this could happen?
    dagster.core.errors.DagsterInstanceMigrationRequired: Instance is out of date and must be migrated (Postgres event log storage requires migration). Database is at revision 9c5f00e80ef2, head is f4b6a4885876. Please run `dagster instance migrate`.
    
    Original exception:
    
    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
    cursor, statement, parameters, context
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
    psycopg2.errors.LockNotAvailable: canceling statement due to lock timeout
    
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
    File "/usr/local/lib/python3.7/site-packages/dagster/core/storage/sql.py", line 62, in handle_schema_errors
    yield
    File "/usr/local/lib/python3.7/site-packages/dagster_postgres/utils.py", line 166, in create_pg_connection
    yield conn
    File "/usr/local/lib/python3.7/site-packages/dagster_postgres/event_log/event_log.py", line 153, in store_event
    (res[0] + "_" + str(res[1]),),
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1296, in execute
    future=False,
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1602, in _exec_driver_sql
    distilled_parameters,
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1863, in _execute_context
    e, statement, parameters, cursor, context
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2044, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
    raise exception
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1820, in _execute_context
    cursor, statement, parameters, context
    File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
    cursor.execute(statement, parameters)
    sqlalchemy.exc.OperationalError: (psycopg2.errors.LockNotAvailable) canceling statement due to lock timeout
    
    [SQL: NOTIFY run_events, %s; ]
    [parameters: ('da8a1351-3190-4f25-8b26-825ac7ed42c5_14158049',)]
    (Background on this error at: <https://sqlalche.me/e/14/e3q8>)
    
    
      File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 748, in pipeline_execution_iterator
        for event in pipeline_context.executor.execute(pipeline_context, execution_plan):
      File "/usr/local/lib/python3.7/site-packages/dagster/core/executor/multiprocess.py", line 244, in execute
        event_specific_data=EngineEventData.multiprocess(os.getpid()),
      File "/usr/local/lib/python3.7/site-packages/dagster/core/events/__init__.py", line 897, in engine_event
        step_handle=step_handle,
      File "/usr/local/lib/python3.7/site-packages/dagster/core/events/__init__.py", line 312, in from_pipeline
        log_pipeline_event(pipeline_context, event)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/events/__init__.py", line 221, in log_pipeline_event
        dagster_event=event,
      File "/usr/local/lib/python3.7/site-packages/dagster/core/log_manager.py", line 336, in log_dagster_event
        self.log(level=level, msg=msg, extra={DAGSTER_META_KEY: dagster_event})
      File "/usr/local/lib/python3.7/site-packages/dagster/core/log_manager.py", line 351, in log
        self._log(level, msg, args, **kwargs)
      File "/usr/local/lib/python3.7/logging/__init__.py", line 1514, in _log
        self.handle(record)
      File "/usr/local/lib/python3.7/logging/__init__.py", line 1524, in handle
        self.callHandlers(record)
      File "/usr/local/lib/python3.7/logging/__init__.py", line 1586, in callHandlers
        hdlr.handle(record)
      File "/usr/local/lib/python3.7/logging/__init__.py", line 894, in handle
        self.emit(record)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/log_manager.py", line 243, in emit
        handler.handle(dagster_record)
      File "/usr/local/lib/python3.7/logging/__init__.py", line 894, in handle
        self.emit(record)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 135, in emit
        self._instance.handle_new_event(event)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/instance/__init__.py", line 1174, in handle_new_event
        self._event_storage.store_event(event)
      File "/usr/local/lib/python3.7/site-packages/dagster_postgres/event_log/event_log.py", line 153, in store_event
        (res[0] + "_" + str(res[1]),),
      File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.7/site-packages/dagster_postgres/utils.py", line 166, in create_pg_connection
        yield conn
      File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
        self.gen.throw(type, value, traceback)
      File "/usr/local/lib/python3.7/site-packages/dagster/core/storage/sql.py", line 83, in handle_schema_errors
        ) from None
    d
    p
    h
    • 4
    • 25
d

daniel

04/22/2022, 10:05 PM
Hi Arun - that error looks to me like the
dagster instance migrate
call may not have actually finished? Not sure if that's the root cause of the bug - but
Database is at revision 9c5f00e80ef2, head is f4b6a4885876
implies to me that it didn't actually get all the way to the end of the schema migration
Do you have the output from when you ran the schema migration? I'm assuming you ran through some version of the guide here https://docs.dagster.io/deployment/guides/kubernetes/how-to-migrate-your-instance#overview
✅ 1
just a shot in the dark here - there's no chance these jobs were running while the schema migration was happening, is there?
a

Arun Kumar

04/22/2022, 10:34 PM
Hi @daniel, unfortunately I don't have access to the migrate job logs anymore 😞 But I am quite sure it succeeded. When we ran the migration, it failed the first time (correction: it failed with a different statement timeout error, not the exact error above). Then we increased the timeout on the DB and ran the
dagster instance migrate
job again, which eventually succeeded. Not sure how that error still appears in the job. Is it possible that the DB went into an inconsistent state due to the first error and the second migration did not fix it?
Nope, we ran the migration yesterday night. These jobs started today
p

prha

04/22/2022, 10:41 PM
Are you using the same postgres db for all your storages? Can we check the alembic hash stored in the DB:
select * from alembic_version;
a

Arun Kumar

04/22/2022, 10:44 PM
Yes, we use the same DB. Yes, I can get that
p

prha

04/22/2022, 10:45 PM
Actually, is it possible that some of your run workers are still on an old version?
a

Arun Kumar

04/22/2022, 10:45 PM
I made a correction in my previous msg. During migration we did not see the above error, we saw a different statement timeout error which we fixed by increasing the timeout
@prha here is the alembic version
version_num
--------------
 9c5f00e80ef2
(1 row)
p

prha

04/22/2022, 10:47 PM
The error message
Database is at revision 9c5f00e80ef2, head is f4b6a4885876
reflects that your DB is in the most recent known migration state as of
0.14.5
, but that the code raising the exception believes that the last known migration is one from
0.12.11
a

Arun Kumar

04/22/2022, 10:54 PM
Where is the expected revision coming from? Also, does the lock timeout error provide any clue?
p

prha

04/22/2022, 10:57 PM
The last known revision is marked as
head
, which is
f4b6a4885876
. In our sequence of migrations,
f4b6a4885876
is an earlier, older migration than the revision that your DB is currently marked at, which is
9c5f00e80ef2
. This tells me that you did in fact successfully run the schema migration.
The fact that the last known revision in the stack trace is
f4b6a4885876
tells me that the dagster version of the code raising the exception is older than the dagster version of the code that actually migrated the DB.
I’m not sure about the significance of the lock timeout error…
My best guess is that it’s transient, but not sure.
h

Hebo Yang

04/22/2022, 11:26 PM
@daniel I thought the instance migrate error message is a wrapper around all psql errors?
d

daniel

04/22/2022, 11:29 PM
It wraps psql errors where it detects the alembic revision is different than what it thinks is the latest revision. prha's theory about why it is firing here (the version of dagster running the code that is storing the event is slightly older, so it has a different latest revision) makes sense to me as an explanation for why it's firing.
h

Hebo Yang

04/22/2022, 11:33 PM
I see...so the psql operation failed and resulted in a diff in rev.
p

prha

04/22/2022, 11:34 PM
yes, that’s my read on it…
h

Hebo Yang

04/22/2022, 11:34 PM
We are still using 0.12.12 for our user code repo, but I thought they were compatible with 0.14.5?
p

prha

04/22/2022, 11:37 PM
I do think that they are compatible… but it does mean that you will get some misleading error messages when some DB exceptions occur.
:thankyou: 1
✅ 1
d

daniel

04/22/2022, 11:38 PM
It's also possible that, despite being compatible, upgrading the version there might help (the code should still work unchanged, but writing events with the 0.14.5 dagster libraries may be more performant)
:thankyou: 1
✅ 1
h

Hebo Yang

04/22/2022, 11:42 PM
Thanks folks!
a

Arun Kumar

04/29/2022, 2:08 AM
Closing the loop here. Upgrading the repository to 0.14.5 fixed the issue 👍