# ask-community
p
hey, can anyone by any chance suggest something for this?
Copy code
This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.OperationalError) database is locked
...
but we’re using postgres and i don’t see any locks there. one thing to mention: we’re using celery with a big number of parallel tasks
j
How often do you see this?
p
it depends on the number of celery instances. with ~10 instances, 8 workers each, everything is fine; with 30+ instances it crashes every time
j
Strange. This is dagster-celery? Or dagster-celery-k8s?
I’m going to guess that part of your instance isn’t pointing at postgres, and is storing things on the host machines with sqlite (the default)
p
it’s dagster and celery, just installed separately
j
Could you share your dagster.yaml (with secrets removed)?
p
we don’t have it. yeah, you’re right, it’s dagster-k8s. here’s dagster-values:
Copy code
dagster-user-deployments:
  enabled: true
  deployments:
    - name: data-ops
      image:
        repository: us-central1-docker.pkg.dev/-dev/repository-dev/dagster
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - -m
        - data_ops
      port: 8000
      env:
        MELTANO_ENVIRONMENT: prod
        GOOGLE_APPLICATION_CREDENTIALS: /etc/service-key/credentials
      volumes:
        - name: service-key
          secret:
            secretName: service-key
      volumeMounts:
        - name: service-key
          mountPath: /etc/service-key
          readOnly: true

generateCeleryConfigSecret: false
global:
  celeryConfigSecretName: "dagster-celery-1111-config-secret"


flower:
  enabled: true

  image:
    repository: "<http://docker.io/mher/flower|docker.io/mher/flower>"
    tag: "0.9.5"
    pullPolicy: Always

  service:
    type: ClusterIP
    annotations: {}
    port: 5555

  # Support Node, affinity and tolerations for Flower pod assignment. See:
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#nodeselector
  # https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity
  # https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  annotations: {}
  nodeSelector: {}
  affinity: {}
  tolerations: []
  podSecurityContext: {}
  securityContext: {}

  # Override the default K8s scheduler
  # schedulerName: ~

  resources: {}
  # If you want to specify resources, uncomment the following lines, adjust them as necessary,
  # and remove the curly braces after 'resources:'.
  # limits:
  #   cpu: 100m
  #   memory: 128Mi
  # requests:
  #   cpu: 100m
  #   memory: 128Mi

  # Liveness probe detects when to restart flower.
  # https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes
  livenessProbe:
    tcpSocket:
      port: "flower"
    # initialDelaySeconds: 60
    periodSeconds: 20
    failureThreshold: 3


  startupProbe:
    tcpSocket:
      port: "flower"
    periodSeconds: 10
    failureThreshold: 6

postgresql:
  enabled: false
  postgresqlHost: 11.111.111.111
  postgresqlUsername: dagster
  postgresqlPassword: 1111
  postgresqlDatabase: dagster
  service:
    port: 5432
j
So you’re managing your own celery workers?
p
yes
j
(outside of the dagster helm chart)
What dagster executor are you using?
The default (multiprocess) and just calling celery task submit yourself?
p
yep
j
So the celery workers don’t touch dagster at all? (they don’t have the lib installed, etc.)
And where does the “database is locked” message get logged? The run pods? The webserver?
p
so, i’m not really experienced with it; the current setup was made by someone else and i’m just trying to get it working, but i’ll try to explain. we have a job with an op in it. the op creates a list of celery tasks, each running a method with celery
@app.task(autoretry_for=(,), retry_kwargs={'max_retries': 1, 'countdown': 10})
then we just wait for all the tasks in the list to finish
Copy code
result = group(self.tasks).apply_async()

# Wait for the tasks to complete
while not result.ready():
...
and then the pipeline continues. we have a celery deployment which is managed by KEDA; celery uses the same docker image as we’re using for the dagster deployment
“database is locked” occurs in 2 places: in the celery pod and in the dagster-run job
j
Do the celery tasks invoke dagster?
I’d guess that the celery workers don’t have the dagster configuration to point at postgres, and are instead writing the sqlite locally
p
nope, celery actually runs meltano
is there a way i can check that? whether it has it or not
j
If you look inside the webserver pod there will be a dagster.yaml that configures it (the helm chart sets it). The same file would be needed inside the celery workers.
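For reference, the storage section of that file usually looks roughly like this when the chart is pointed at postgres — the hostname and credentials below are placeholders, and if a worker has no such file under DAGSTER_HOME, dagster falls back to local sqlite:
Copy code
# sketch of a postgres-backed dagster.yaml; values are placeholders, not this cluster's
storage:
  postgres:
    postgres_db:
      hostname: your-postgres-host   # placeholder
      username: dagster
      password:
        env: DAGSTER_PG_PASSWORD     # the chart typically injects the password via this env var
      db_name: dagster
      port: 5432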
Is there more of a stack trace for
Copy code
This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.OperationalError) database is locked
...
?
p
Copy code
[2023-08-09 20:31:02,955: INFO/ForkPoolWorker-9] This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (sqlite3.OperationalError) database is locked
[2023-08-09 20:31:02,955: INFO/ForkPoolWorker-9] [SQL: UPDATE runs SET last_heartbeat_at=? WHERE runs.id = ?]
[2023-08-09 20:31:02,955: INFO/ForkPoolWorker-9] [parameters: ('2023-08-09 20:30:49.066415', 5)]
[2023-08-09 20:31:02,956: INFO/ForkPoolWorker-9] (Background on this error at: https://sqlalche.me/e/14/e3q8) (Background on this error at: https://sqlalche.me/e/14/7s2a)
it actually fails either on the heartbeat update or on the event_logs insert
i’ve increased the lock timeout on the pg instance and the number of locks
j
Yeah it seems like the celery workers are using dagster (or else they wouldn’t be trying to make these queries right?) but are using sqlite instead of postgres
p
yeah, i was also thinking the same. will try adding dagster.yaml there
j
If you set the DAGSTER_HOME env var to a directory inside the celery workers, and copy the dagster.yaml from the webserver to there (you can use the configmap), I’d expect it to work.
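Something roughly like this on the celery worker deployment spec should do it — the configmap and secret names below are just what the chart typically creates (they depend on your release name), so treat them as assumptions and adjust:
Copy code
# sketch for the celery worker Deployment; names are assumptions, not taken from this cluster
env:
  - name: DAGSTER_HOME
    value: /opt/dagster/dagster_home
  - name: DAGSTER_PG_PASSWORD              # lets the mounted dagster.yaml resolve the password
    valueFrom:
      secretKeyRef:
        name: dagster-postgresql-secret    # assumption: default secret name from the chart
        key: postgresql-password
volumeMounts:
  - name: dagster-instance
    mountPath: /opt/dagster/dagster_home/dagster.yaml
    subPath: dagster.yaml
volumes:
  - name: dagster-instance
    configMap:
      name: dagster-instance               # assumption: the dagster.yaml configmap the chart creates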
I’m still a little confused about the setup with the celery tasks, but if it works sounds good haha
p
haha i’m also confused because i don’t know what to do with it all 😄 i will try your solution, thanks. but if not, which celery task setup would you recommend? we expect to have up to 500 celery instances, preferably controlled by some autoscaler
@johann thanks for the help! unfortunately your solution hasn’t helped, and i was also not able to find any mentions of sqlite on the pods throwing that error. but what helped is tuning the db (i actually did that before, but for some reason it hadn’t been saved). increasing these two improved performance, it started processing much faster, and it fixed the issue with locks
Copy code
max_locks_per_transaction
lock_timeout
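for reference, these are postgresql.conf settings (they can also be set with ALTER SYSTEM); the values below are only illustrative, not the exact ones i used:
Copy code
# illustrative postgresql.conf values, not the actual ones from this setup
max_locks_per_transaction = 128   # postgres default is 64; changing it requires a server restart
lock_timeout = '30s'              # postgres default is 0, i.e. statements wait for locks indefinitely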
idk, should i open an issue that it’s showing the wrong db connection in the error text?
j
Tuning Postgres solved this??
p
yep, increasing the number of locks per transaction
sorry, it’s not related to dagster at all. the problem is with meltano — they both have sqlite and “runs” tables with similar structure, so if anyone has a similar issue, check meltano
j
Makes more sense, thanks for the followup