using k8s run launcher I noticed that when a schedule is tri dagster #ask-ai

Rudy Jariwala

03/01/2024, 2:20 AM

using k8s run launcher, I noticed that when a schedule is triggered, the code location pod tried to connect to the database using the postgresql connection parameters that are configured in the dagster instance. Was curious why this needs to happen?

Scout

03/01/2024, 2:20 AM

When using the K8sRunLauncher with Dagster, the code location pod may need to connect to the database configured in the Dagster instance for several reasons. According to the documentation, the Dagster daemon, which is responsible for managing schedules and sensors, periodically checks the runs table in the PostgreSQL database for runs that are ready to be launched. It also submits runs from schedules and sensors. The daemon launches runs via the K8sRunLauncher, which creates a run worker job with the image specified in the user code deployment. This run worker is responsible for executing the launched Dagster runs. During the run, events that occur are written to the database and then displayed in the UI.

Therefore, when a schedule is triggered, the code location pod needs to connect to the database to:

1. Check the runs table to determine if there are any runs that need to be launched.
2. Write events to the database as the run progresses, which includes materializations, expectations, and any other structured events that occur during the execution of the run.

This connection to the database is essential for the proper functioning of the Dagster instance, as it allows for the coordination between the different components of the system, such as the webserver, the daemon, and the run workers. It ensures that the state of the runs is consistently tracked and that the UI can display the latest information about the runs.

The documentation also mentions that the webserver communicates with user code deployments via gRPC to fetch information needed to populate the Dagster UI, and it frequently checks whether the user code deployment has been updated to fetch new information. This is another reason why the code location pod may need to connect to the database, as part of the overall system's operation to maintain up-to-date information about the state of the runs and the code deployments.

Sources: Dagster k8s job run fails sql connection... | psycopg in a code location causes a conf... | Deploying with Helm | Dagster | Dagster Docs
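
The flow described in this answer (a run is recorded, a run worker writes events, the UI reads them back) can be sketched in a few lines. This is a toy model only, not Dagster code: `sqlite3` stands in for PostgreSQL so the sketch runs without a server, the two-table schema is a simplification, and every function name is hypothetical.

```python
import sqlite3


def make_storage() -> sqlite3.Connection:
    """Create an in-memory stand-in for the shared instance database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runs (run_id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE event_logs (run_id TEXT, event TEXT)")
    return conn


def daemon_submit_run(conn: sqlite3.Connection, run_id: str) -> None:
    """Hypothetical daemon step: a schedule tick records a queued run."""
    conn.execute("INSERT INTO runs VALUES (?, 'QUEUED')", (run_id,))


def run_worker_execute(conn: sqlite3.Connection, run_id: str) -> None:
    """Hypothetical run worker: writes events while the run executes."""
    conn.execute("UPDATE runs SET status = 'STARTED' WHERE run_id = ?", (run_id,))
    for event in ("STEP_START", "ASSET_MATERIALIZATION", "STEP_SUCCESS"):
        conn.execute("INSERT INTO event_logs VALUES (?, ?)", (run_id, event))
    conn.execute("UPDATE runs SET status = 'SUCCESS' WHERE run_id = ?", (run_id,))


def ui_fetch_run(conn: sqlite3.Connection, run_id: str):
    """Hypothetical webserver read: run status plus events, in write order."""
    status = conn.execute(
        "SELECT status FROM runs WHERE run_id = ?", (run_id,)
    ).fetchone()[0]
    rows = conn.execute(
        "SELECT event FROM event_logs WHERE run_id = ? ORDER BY rowid", (run_id,)
    )
    return status, [row[0] for row in rows]


storage = make_storage()
daemon_submit_run(storage, "run-1")
run_worker_execute(storage, "run-1")
status, events = ui_fetch_run(storage, "run-1")
print(status, events)
```

Every component in the sketch talks to the same storage, which is why each of them needs working connection parameters.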

Rudy Jariwala

03/01/2024, 2:24 AM

@Scout then why does the code location itself not have a postgresql configuration? I noticed that the dagster daemon pushed down the db connection string to the code location instead

Scout

03/01/2024, 2:24 AM

The code location itself does not have a separate PostgreSQL configuration because it operates within the context of the Dagster instance, which already has a configured connection to the PostgreSQL database. The Dagster instance is responsible for managing the overall state and execution of runs, and it centralizes the configuration for storage, run launchers, and other components.

When the Dagster daemon triggers a schedule and launches a run, it communicates with the code location to execute the run. The code location needs access to the PostgreSQL database to read and write run-related data, such as run logs, event records, and other metadata. Instead of each code location having its own separate database configuration, the Dagster daemon provides the necessary database connection information to the code location. This approach simplifies the configuration management and ensures consistency across the system.

By pushing down the database connection string to the code location, the Dagster daemon ensures that the code location can interact with the database as needed without requiring separate configuration management for each code location. This design allows for a more streamlined operation, where the central Dagster instance configuration dictates how each component, including code locations, interacts with the database.

This centralized configuration approach is consistent with the documentation's description of how Dagster instances are configured and how they manage the execution environment for runs. It allows for a single source of truth for database connections and other instance-wide settings, reducing the complexity of managing multiple configurations across different components.
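
To make the "push down" idea concrete, here is a minimal sketch of one component serializing its settings into a request and another rebuilding them from that request alone. It is an illustration under stated assumptions, not Dagster's actual wire format or API: `InstanceSettings`, `daemon_build_request`, `code_location_handle`, the schedule name, and the connection URL are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class InstanceSettings:
    """Hypothetical, simplified stand-in for the central instance configuration."""
    postgres_url: str
    run_launcher: str


def daemon_build_request(settings: InstanceSettings, schedule_name: str) -> str:
    """Daemon side: bundle the instance settings with the schedule tick request."""
    return json.dumps({"schedule": schedule_name, "instance": asdict(settings)})


def code_location_handle(request: str) -> str:
    """Code location side: rebuild the settings from the request alone."""
    payload = json.loads(request)
    settings = InstanceSettings(**payload["instance"])
    # A real server would open a database connection here; the sketch only
    # reports which URL it would use.
    return f"evaluating {payload['schedule']} against {settings.postgres_url}"


central = InstanceSettings(
    postgres_url="postgresql://dagster:***@db:5432/dagster",  # placeholder URL
    run_launcher="K8sRunLauncher",
)
message = code_location_handle(daemon_build_request(central, "nightly_schedule"))
print(message)
```

The code location in this sketch holds no configuration of its own, so changing the central settings changes what every code location connects to.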
