# ask-community
j
Hi everyone, I am curious how logging works in Dagster. As far as I understand, by default with dagster dev the logs are stored in SQLite - every Dagster run means one new SQLite database. If I change the dagster.yaml file as below, I can forward logs into Postgres (I skipped run_storage and schedule_storage):
event_log_storage:
  module: dagster_postgres.event_log
  class: PostgresEventLogStorage
  config:
    postgres_db:
      hostname:
        env: DAGSTER_POSTGRES_HOSTNAME
      username:
        env: DAGSTER_POSTGRES_USER
      password:
        env: DAGSTER_POSTGRES_PASSWORD
      db_name:
        env: DAGSTER_POSTGRES_DB
      port: 5432

compute_logs:
  module: dagster_aws.s3.compute_log_manager
  class: S3ComputeLogManager
  config:
    bucket: "***1234"
    skip_empty_files: true
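
A quick way to sanity-check that this dagster.yaml is picked up is a small script along these lines (it assumes DAGSTER_HOME points at the directory containing dagster.yaml, and that the event_log_storage / compute_log_manager properties are exposed as in recent Dagster releases):

from dagster import DagsterInstance

# Loads the instance configured by $DAGSTER_HOME/dagster.yaml.
instance = DagsterInstance.get()

# With the config above these should report PostgresEventLogStorage
# and S3ComputeLogManager.
print(type(instance.event_log_storage).__name__)
print(type(instance.compute_log_manager).__name__)
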
Is there a way to store event_logs in CloudWatch or S3? I can store compute_logs in S3 via dagster-aws (shown above), but the event_logs are still saved in Postgres. Given the number of assets/ops we run per day, this table will have billions of rows within a few days. What should we do with old logs? I tried just deleting them from Postgres and everything works fine - except that you can't see those logs any more, of course 🙂 Any best-practice hints for a case like this? Thanks!
c
Hi Jakub. Here's a response from a similar question: https://discuss.dagster.io/t/8549360/Hi-please-can-someone-explain-what-the-event-logs-table-is-i
We support three types of logs in Dagster:
1. Default structured logs. These are things emitted by the Dagster framework like “run started”, “run failed”. We use these events to keep track of run status, track asset materializations over time, power retry events, etc.
2. Custom structured logs. These are logs produced when, in your code, you do things like context.log.info("stuff").
3. Compute logs. These are the stdout / stderr that the code emits, including output from libraries like pyspark that might not be captured in the Python layer. This often includes the text output from 1 and 2 (a short example of 2 and 3 follows this list).
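
For illustration, here is a minimal asset sketch (the asset name is just a placeholder); the type 1 events, such as the run start and the materialization, are emitted by the framework automatically when it runs:

from dagster import asset

@asset
def my_asset(context):
    # Type 2: custom structured log; stored in the event log with a
    # null dagster_event_type.
    context.log.info("stuff")
    # Type 3: raw stdout, captured by the compute log manager
    # (the S3ComputeLogManager in the config above).
    print("hello from stdout")
    return 1
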
Historically, we read from the event log to present certain views in Dagit:
A. The run view, where all the events for a particular run are shown.
B. Asset materialization views, where the materialization events for a particular asset are shown.
C. Retries of runs, which read from the event log to see whether particular steps have succeeded or failed, in order to determine which steps should be executed in the retry.
D. Step duration stats, to determine the history of step durations for a particular run.
Azure blob storage would keep the history of log type 3. From the event log table, it should be safe to delete log type 2 (these rows have a dagster_event_type value of null), but it would affect the appearance of those events in scenario A. Deleting log type 1 would affect scenarios A/B/C/D, and so is much more complicated to do.
As the response above mentions, deleting structured logs would result in losing run history / asset history. If you have logs generated from context.log.info(...), you can delete these logs without consequence (as they don't have dagster events).
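
Along those lines, a retention job that prunes only those context.log.info rows could look roughly like the sketch below; the event_logs table name, the dagster_event_type and timestamp columns, and the 30-day cutoff are assumptions about the default dagster_postgres schema and should be checked against your Dagster version first:

import os
from datetime import datetime, timedelta, timezone

import psycopg2

RETENTION_DAYS = 30  # illustrative cutoff

# Naive UTC datetime, assuming the timestamp column stores UTC without a timezone.
cutoff = datetime.now(timezone.utc).replace(tzinfo=None) - timedelta(days=RETENTION_DAYS)

conn = psycopg2.connect(
    host=os.environ["DAGSTER_POSTGRES_HOSTNAME"],
    user=os.environ["DAGSTER_POSTGRES_USER"],
    password=os.environ["DAGSTER_POSTGRES_PASSWORD"],
    dbname=os.environ["DAGSTER_POSTGRES_DB"],
    port=5432,
)
with conn, conn.cursor() as cur:
    # Delete only rows without a dagster event (log type 2), so run status,
    # asset history and retries (scenarios A-D above) stay intact.
    cur.execute(
        "DELETE FROM event_logs "
        "WHERE dagster_event_type IS NULL AND timestamp < %s",
        (cutoff,),
    )
conn.close()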