< Scout> what's the difference between the `celery k8s job e dagster #ask-ai


BC A

03/05/2024, 12:23 PM

@Scout what's the difference between the `celery_k8s_job_executor` and `k8s_job_executor`?

Scout

03/05/2024, 12:23 PM

The `celery_k8s_job_executor` and `k8s_job_executor` are both executors used in Dagster for running jobs on Kubernetes, but they use different execution mechanisms.

The `celery_k8s_job_executor` is designed to work with Celery to manage distributed task queues. With this executor, each step of a job is submitted as a Celery task, which in turn launches an ephemeral Kubernetes pod to execute that step. This setup lets you maintain global concurrency limits across multiple runs. It suits scenarios where you want to use Celery's capabilities for distributed task processing. The `celery_k8s_job_executor` must be used with the `CeleryK8sRunLauncher`, which is responsible for launching runs that use this executor.

On the other hand, the `k8s_job_executor` executes each step of a job directly as a Kubernetes job, without involving Celery. This executor is simpler and avoids the overhead of managing a Celery task queue. It can be used with the `K8sRunLauncher`, which can launch runs with various executors, including the `k8s_job_executor`.

In summary, the main difference is that the `celery_k8s_job_executor` uses Celery for distributed task management, while the `k8s_job_executor` uses Kubernetes jobs directly. The choice between the two depends on your requirements, such as the need for global concurrency control across runs or a preference for a simpler setup without Celery.
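Here is a toy model of the "global concurrency limits across multiple runs" point, written in plain Python rather than Dagster or Celery code. Steps from several runs are pushed onto one shared queue, and a fixed pool of workers drains it. As a result, no more than that many steps execute at once, however many runs are active. The names `simulate` and `NUM_WORKERS` are hypothetical and only illustrate the idea.

```python
import queue
import threading
import time

NUM_WORKERS = 2  # hypothetical: stands in for the number of Celery worker processes


def simulate(runs: dict[str, int]) -> tuple[int, list[str]]:
    """Push every step of every run onto one shared queue and drain it with
    NUM_WORKERS threads. Returns (peak concurrency, completed step ids)."""
    tasks: "queue.Queue[str]" = queue.Queue()
    for run_id, num_steps in runs.items():
        for i in range(num_steps):
            tasks.put(f"{run_id}.step{i}")

    lock = threading.Lock()
    running = 0
    peak = 0
    done: list[str] = []

    def worker() -> None:
        nonlocal running, peak
        while True:
            try:
                step = tasks.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker exits
            with lock:
                running += 1
                peak = max(peak, running)
            time.sleep(0.01)  # pretend to execute the step (e.g. in a pod)
            with lock:
                running -= 1
                done.append(step)

    threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak, done


peak, done = simulate({"run_a": 3, "run_b": 4, "run_c": 2})
print(f"{len(done)} steps finished, peak concurrency {peak}")
```

Adding more runs only lengthens the queue. The peak never exceeds `NUM_WORKERS`, which mirrors how a shared Celery queue caps concurrency across runs.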

BC A

03/05/2024, 2:55 PM

@Scout Does the Celery launcher require Redis or RabbitMQ?

Scout

03/05/2024, 2:56 PM

Yes, the Celery executor requires a message broker for distributing tasks, and Redis and RabbitMQ are the two brokers most commonly used with Celery. The documentation indicates that you can configure Celery with either Redis or RabbitMQ as the message broker for task queues. When deploying Dagster with the `CeleryK8sRunLauncher` using Helm, you have the option to enable Redis within the Helm chart configuration. RabbitMQ is also supported but is disabled in the example provided in the documentation. You would typically choose one of these brokers based on your preferences, existing infrastructure, and specific requirements. Here's an example snippet from the documentation showing Redis configuration within a Helm chart:

```yaml
rabbitmq:
  enabled: false

redis:
  enabled: true
  internal: false
  host: "redisHost"
  port: 6379
  brokerDbNumber: 0
  backendDbNumber: 0
```

In this example, RabbitMQ is disabled (`enabled: false`), and Redis is enabled and configured with the necessary connection details. The `internal: false` setting indicates that an external Redis instance is being used. The `host` and `port` fields specify how to reach that Redis instance. `brokerDbNumber` and `backendDbNumber` select the Redis databases used for the Celery broker and the result backend. To use Celery with Dagster, make sure your chosen message broker is properly configured and reachable by the Celery workers that poll for tasks and execute them.
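The `host`, `port`, `brokerDbNumber`, and `backendDbNumber` fields above combine into the standard `redis://host:port/db` URLs that Celery accepts for its broker and result backend. The plain-Python sketch below shows that mapping. `celery_redis_urls` is a hypothetical helper. The assumption that the chart builds its URLs exactly this way (passwords and TLS aside) is not taken from the chart templates.

```python
def celery_redis_urls(host: str, port: int = 6379,
                      broker_db: int = 0, backend_db: int = 0) -> tuple[str, str]:
    """Return (broker_url, backend_url) in Celery's redis:// URL format.

    Hypothetical helper mirroring the redis.host / redis.port /
    redis.brokerDbNumber / redis.backendDbNumber Helm values.
    Passwords and TLS (rediss://) are left out of this sketch.
    """
    if not host:
        raise ValueError("redis host must be set when using an external Redis")
    broker_url = f"redis://{host}:{port}/{broker_db}"
    backend_url = f"redis://{host}:{port}/{backend_db}"
    return broker_url, backend_url


broker, backend = celery_redis_urls("redisHost")
print(broker)   # redis://redisHost:6379/0
print(backend)  # redis://redisHost:6379/0
```

With both numbers set to 0, as in the snippet above, the broker and the result backend share one Redis database. Choosing different numbers keeps them apart.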

BC A

03/06/2024, 10:12 AM

@Scout Can the run launcher used for a k8s cluster work with only one executor, or can we use several?

Scout

03/06/2024, 10:12 AM

In a Dagster deployment, the run launcher is configured at the instance level, which means you have one run launcher per Dagster instance. The run launcher is responsible for starting each run on the compute resources specified in its configuration. While you can configure only one run launcher per instance, you can use multiple executors within that instance. Executors are responsible for executing the steps within a run, and they are chosen per job. This means different jobs in the same Dagster instance can use different executors, such as the `in_process_executor`, `multiprocess_executor`, or `k8s_job_executor`.

For example, if you are using the `K8sRunLauncher`, each run is launched as its own Kubernetes Job, and the executor attached to the job decides how that run's steps are executed. You set the executor in the job definition through the `executor_def` argument, and you can tune it through the run config when you launch the job. Executors are not configured in `workspace.yaml`, which only tells Dagster where to load your code locations from. Here's an example of two jobs with different executors:

```python
from dagster import Definitions, job, multiprocess_executor, op
from dagster_k8s import k8s_job_executor


@op
def say_hello() -> str:
    # Placeholder op for illustration
    return "hello"


# job_1: steps run as subprocesses inside the run's pod
@job(executor_def=multiprocess_executor)
def job_1():
    say_hello()


# job_2: each step runs in its own Kubernetes Job
@job(executor_def=k8s_job_executor)
def job_2():
    say_hello()


# Both jobs live in one code location and are launched by the instance's run launcher
defs = Definitions(jobs=[job_1, job_2])
```

In this example, `job_1` uses the `multiprocess_executor` and `job_2` uses the `k8s_job_executor`. Both jobs belong to the same Dagster instance and can be launched by the same `K8sRunLauncher`. The run launcher is a singleton at the instance level, but the executor lets you choose how each job's steps are executed. If you need even more customized behavior, consider implementing a custom run launcher or using different Dagster instances for different teams or use cases.
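When you launch `job_1` or `job_2`, their executor settings go under the `execution: config:` key of the run config. The plain-Python sketch below builds those run config dictionaries and prints them as JSON, which is also valid YAML. `run_config_for` is a hypothetical helper. The field names `max_concurrent` (for `multiprocess_executor`) and `job_namespace` (for `k8s_job_executor`) are assumptions to check against the executor documentation for your Dagster version, and the `"dagster"` namespace is a made-up value.

```python
import json


def run_config_for(executor_config: dict) -> dict:
    """Wrap executor settings in the run config layout assumed for jobs that
    define a single executor via executor_def: execution -> config."""
    return {"execution": {"config": dict(executor_config)}}


# job_1 (multiprocess_executor): cap parallel step subprocesses in the run pod
job_1_run_config = run_config_for({"max_concurrent": 4})

# job_2 (k8s_job_executor): hypothetical namespace for the step Jobs
job_2_run_config = run_config_for({"job_namespace": "dagster"})

print(json.dumps(job_1_run_config, indent=2))
print(json.dumps(job_2_run_config, indent=2))
```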

BC A

03/06/2024, 10:14 AM

@Scout What about the Helm configuration?

Scout

03/06/2024, 10:14 AM

For a Dagster deployment on Kubernetes with Helm, you define the run launcher and other instance settings in the `values.yaml` file that configures the Dagster Helm chart. The chart deploys the components of a Dagster instance: the web UI (Dagit, now called the Dagster webserver), the Dagster daemon, and user code deployments. It also renders the instance configuration (`dagster.yaml`), which includes the run launcher. As with any instance configuration, you have one run launcher for the entire Dagster instance, and the chart lets you choose which one to use. Here's an example snippet from a `values.yaml` file where the `K8sRunLauncher` is configured:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      # Configuration for the K8sRunLauncher, for example:
      imagePullPolicy: "Always"
```

This configuration sets up the `K8sRunLauncher` for the Dagster instance. Within the jobs themselves, you can still choose different executors, just as you would in a non-Helm deployment. The executor is selected in the job definition (`executor_def`) and tuned through the job's run config; it is not specified in the Helm chart. Each run started by the run launcher then uses its job's executor to execute that run's steps.

In summary, the Helm chart configuration sets up the Dagster instance, including the run launcher, while the executor is specific to each job and is not part of the Helm chart settings. This separation of concerns lets you keep a single run launcher for the instance while still using different executors for different jobs.
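In the chart's `values.yaml`, the run launcher is selected with a single top-level `runLauncher.type` key. This is why there is exactly one per instance, and no executor appears there. The plain-Python sketch below reads and checks that key from a values dictionary. `check_run_launcher` is hypothetical, and the allowed types (`K8sRunLauncher`, `CeleryK8sRunLauncher`, `CustomRunLauncher`) should be confirmed against the default `values.yaml` of your chart version.

```python
# Run launcher types assumed to be offered by the Dagster Helm chart
ALLOWED_RUN_LAUNCHERS = {"K8sRunLauncher", "CeleryK8sRunLauncher", "CustomRunLauncher"}


def check_run_launcher(values: dict) -> str:
    """Return the instance's run launcher type from a Helm values dict,
    raising ValueError if it is missing or not one of the chart's options."""
    launcher_type = (values.get("runLauncher") or {}).get("type")
    if launcher_type not in ALLOWED_RUN_LAUNCHERS:
        raise ValueError(f"unsupported runLauncher.type: {launcher_type!r}")
    return launcher_type


values = {
    "runLauncher": {
        "type": "K8sRunLauncher",
        "config": {"k8sRunLauncher": {"imagePullPolicy": "Always"}},
    }
}
print(check_run_launcher(values))  # K8sRunLauncher
```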

BC A

03/06/2024, 10:17 AM

@Scout give me an example of the Helm values for the deployment and the job definitions for the following setup: I choose K8sRunLauncher at the Dagster instance level; one job uses the default executor; I have another job I want to execute on the Celery executor.

Scout

03/06/2024, 10:18 AM

When deploying Dagster on Kubernetes with Helm, you configure the `K8sRunLauncher` at the instance level in your `values.yaml` file, and this configuration applies to the entire Dagster instance. Here's an example of how you might set up your Helm `values.yaml` to use the `K8sRunLauncher`:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      # Additional configuration for the K8sRunLauncher can be specified here
      imagePullPolicy: "Always"
```

This configuration sets the `K8sRunLauncher` as the run launcher for the Dagster instance. Now, let's define two jobs. The first uses the default executor (the multiprocess executor, if no executor is specified). The second explicitly uses the `celery_k8s_job_executor`. For the first job, you don't need to specify any executor in the job definition. Here's an example of a simple job definition:

```python
from dagster import job


@job
def my_default_executor_job():
    # Define your ops and job logic here
    pass
```

For the second job, which should use the `celery_k8s_job_executor`, you set the executor in the job definition with `executor_def`; the job's run config can then tune that executor. Here's an example of how you might define this job:

```python
from dagster import job
from dagster_celery_k8s import celery_k8s_job_executor


@job(executor_def=celery_k8s_job_executor)
def my_celery_executor_job():
    # Define your ops and job logic here
    pass
```

In this example, `my_celery_executor_job` is configured to use the `celery_k8s_job_executor`. However, as noted at the start of this thread, the `celery_k8s_job_executor` must be used with the `CeleryK8sRunLauncher`. With the `K8sRunLauncher` configured above, runs of this job fail with an error saying the executor is only compatible with a `CeleryK8sRunLauncher`; they do not execute on Celery. To run the Celery job, you would either switch the instance's run launcher to the `CeleryK8sRunLauncher` or run that job from a separate Dagster instance.

When using the `celery_k8s_job_executor`, you also need a Celery broker (such as Redis or RabbitMQ) and a Celery result backend. These are configured in your `values.yaml` file under the `rabbitmq` and `redis` sections. The actual configuration details may vary with your requirements and with the versions of Dagster and its Helm chart that you use, so check the official Dagster documentation for your version.
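The point from the start of this thread, that the `celery_k8s_job_executor` must be used with the `CeleryK8sRunLauncher`, can be checked before you deploy. This plain-Python sketch is not a Dagster API: `find_incompatible_jobs` and the job-to-executor mapping are hypothetical. The only rule it encodes is that one constraint. Every other executor is treated as compatible, so the sketch does not model any other restriction.

```python
# Executors that only run under a specific run launcher (the single rule from this thread)
REQUIRED_LAUNCHER = {"celery_k8s_job_executor": "CeleryK8sRunLauncher"}


def find_incompatible_jobs(run_launcher: str, job_executors: dict[str, str]) -> list[str]:
    """Return the names of jobs whose executor cannot run under run_launcher.
    Executors without an entry in REQUIRED_LAUNCHER are treated as compatible."""
    return sorted(
        job_name
        for job_name, executor in job_executors.items()
        if REQUIRED_LAUNCHER.get(executor, run_launcher) != run_launcher
    )


jobs = {
    "my_default_executor_job": "multiprocess_executor",
    "my_celery_executor_job": "celery_k8s_job_executor",
}
print(find_incompatible_jobs("K8sRunLauncher", jobs))  # ['my_celery_executor_job']
```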

BC A

03/06/2024, 10:20 AM

@Scout If I run this configuration, I won't have any Celery workers running in my cluster.

Scout

03/06/2024, 10:20 AM

You are correct; the configuration examples above do not include the setup required for running Celery workers in your Kubernetes cluster. To run jobs with the `celery_k8s_job_executor`, you need Celery workers deployed and connected to a message broker (like Redis or RabbitMQ). In the Dagster Helm chart, Celery workers are deployed when the instance uses the `CeleryK8sRunLauncher`, and their queues are configured under that run launcher's settings in `values.yaml`. Here's an example of how you might configure this in your `values.yaml` file:

```yaml
runLauncher:
  type: CeleryK8sRunLauncher
  config:
    celeryK8sRunLauncher:
      # Each entry starts Celery worker pods that consume one queue
      workerQueues:
        - name: "dagster"
          replicaCount: 2

rabbitmq:
  enabled: false  # Using Redis as the broker instead

redis:
  enabled: true
  internal: true  # Set to false and provide host/port to use an external Redis instance
  brokerDbNumber: 0
  backendDbNumber: 1
```

In this example, the `CeleryK8sRunLauncher` is selected, so the chart deploys Celery worker pods for the `dagster` queue (two replicas here). Redis is enabled as the Celery broker and result backend, and `brokerDbNumber` and `backendDbNumber` place each in a separate Redis database. If you use an external Redis instance, set `internal: false` and fill in `host` and `port` to match your deployment. Your job definitions also need the Celery executor, as shown in the previous example for `my_celery_executor_job`. Note that this is an instance-wide change: the instance's run launcher is now the `CeleryK8sRunLauncher` rather than the `K8sRunLauncher`. The actual configuration details and the structure of the `values.yaml` file may vary with the version of the Dagster Helm chart you use, so check the chart's default `values.yaml` and the official Dagster documentation for your version.
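With the `CeleryK8sRunLauncher`, the chart starts workers only for the queues listed in `runLauncher.config.celeryK8sRunLauncher.workerQueues` (assumed layout; check your chart version). A step routed to a queue that has no workers would sit in the broker waiting for one. The plain-Python sketch below looks for that mismatch. It assumes that steps are routed with the `dagster-celery/queue` op tag and that untagged steps default to the `dagster` queue. `unserved_queues`, the op names, and the `gpu` queue are hypothetical.

```python
QUEUE_TAG = "dagster-celery/queue"  # assumed op tag used for Celery queue routing
DEFAULT_QUEUE = "dagster"           # assumed default queue name


def unserved_queues(values: dict, op_tags: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Map each queue that ops are routed to, but that has no entry in the
    Helm values' workerQueues, to the names of the ops routed there."""
    worker_queues = (
        values.get("runLauncher", {})
        .get("config", {})
        .get("celeryK8sRunLauncher", {})
        .get("workerQueues", [])
    )
    served = {q["name"] for q in worker_queues}
    missing: dict[str, list[str]] = {}
    for op_name, tags in op_tags.items():
        queue_name = tags.get(QUEUE_TAG, DEFAULT_QUEUE)
        if queue_name not in served:
            missing.setdefault(queue_name, []).append(op_name)
    return missing


values = {
    "runLauncher": {
        "type": "CeleryK8sRunLauncher",
        "config": {
            "celeryK8sRunLauncher": {
                "workerQueues": [{"name": "dagster", "replicaCount": 2}]
            }
        },
    }
}
op_tags = {
    "extract": {},                      # no tag: default queue
    "train_model": {QUEUE_TAG: "gpu"},  # hypothetical queue with no workers
}
print(unserved_queues(values, op_tags))  # {'gpu': ['train_model']}
```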
