<@U0667DNC02Y> what's the difference between the `...
# ask-ai
b
@Scout what's the difference between the `celery_k8s_job_executor` and `k8s_job_executor`?
s
The `celery_k8s_job_executor` and `k8s_job_executor` are both executors used in Dagster for running jobs on Kubernetes, but they have different mechanisms for execution.

The `celery_k8s_job_executor` is designed to work with Celery to manage distributed task queues. When using this executor, each step of a job is submitted as a Celery task, which in turn launches an ephemeral Kubernetes pod to execute that step. This setup allows for maintaining global concurrency limits across multiple runs and is suitable for scenarios where you want to leverage Celery's capabilities for distributed task processing. The `celery_k8s_job_executor` must be used with the `CeleryK8sRunLauncher`, which is responsible for launching runs using the executor.

On the other hand, the `k8s_job_executor` is intended for executing each step of a job directly as a Kubernetes job without the involvement of Celery. This executor is simpler and does not require the overhead of managing a Celery task queue. It can be used with the `K8sRunLauncher`, which is capable of launching runs with various executors, including the `k8s_job_executor`.

In summary, the main difference lies in the use of Celery for distributed task management with the `celery_k8s_job_executor` and the direct use of Kubernetes jobs with the `k8s_job_executor`. The choice between the two would depend on your specific requirements for job execution, such as the need for global concurrency control or a preference for a simpler setup without Celery.
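The pairing rule above can be sketched as a small check in plain Python. This is only an illustration, not part of Dagster: the `REQUIRED_LAUNCHER` table and the `launcher_problem` helper are hypothetical names, and the table encodes only the one hard requirement stated above.

```python
from typing import Optional

# Executors that, per the explanation above, only work with one specific run launcher.
# Hypothetical lookup table for illustration; it is not a Dagster API.
REQUIRED_LAUNCHER = {"celery_k8s_job_executor": "CeleryK8sRunLauncher"}


def launcher_problem(executor: str, run_launcher: str) -> Optional[str]:
    """Return a human-readable problem, or None if the pairing looks fine."""
    required = REQUIRED_LAUNCHER.get(executor)
    if required is not None and run_launcher != required:
        return f"{executor} requires {required}, but the instance uses {run_launcher}"
    return None


if __name__ == "__main__":
    pairs = [
        ("k8s_job_executor", "K8sRunLauncher"),
        ("celery_k8s_job_executor", "CeleryK8sRunLauncher"),
        ("celery_k8s_job_executor", "K8sRunLauncher"),
    ]
    for executor, launcher in pairs:
        print(executor, "+", launcher, "->", launcher_problem(executor, launcher) or "ok")
```

Only the last pairing is reported as a problem, because the Celery executor is the one with a fixed launcher requirement.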
b
@Scout Does the Celery launcher require Redis or RabbitMQ?
s
Yes, the Celery executor requires a message broker for distributing tasks, and Redis and RabbitMQ are the two most commonly used message brokers with Celery. The documentation indicates that you can configure Celery with either Redis or RabbitMQ as the message broker for task queues.

When deploying Dagster with the `CeleryK8sRunLauncher` using Helm, you have the option to enable Redis within the Helm chart configuration. RabbitMQ is also supported but is disabled by default in the example provided in the documentation. You would typically choose one of these message brokers based on your preferences, existing infrastructure, and specific requirements. Here's an example snippet from the documentation showing Redis configuration within a Helm chart:
```yaml
rabbitmq:
  enabled: false

redis:
  enabled: true
  internal: false
  host: "redisHost"
  port: 6379
  brokerDbNumber: 0
  backendDbNumber: 0
```
In this example, RabbitMQ is disabled (`enabled: false`), and Redis is enabled and configured with the necessary connection details. The `internal: false` setting indicates that an external Redis instance is being used, and the `host` and `port` fields specify the connection details for the Redis broker.

To use Celery with Dagster, you would need to ensure that your chosen message broker is properly configured and accessible to the Celery workers that will be polling for tasks and executing them.
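To make the connection details concrete, here is a minimal sketch of how those Redis values would map onto Celery-style `redis://host:port/db` URLs. The `redis_url` helper is hypothetical, and the assumption that the chart combines the values exactly this way is mine rather than something confirmed by the documentation.

```python
def redis_url(host: str, port: int, db: int) -> str:
    """Build a Celery-style Redis URL of the form redis://host:port/db."""
    return f"redis://{host}:{port}/{db}"


# Mirrors the `redis` block of the Helm values shown above.
redis_values = {
    "host": "redisHost",
    "port": 6379,
    "brokerDbNumber": 0,
    "backendDbNumber": 0,
}

# Assumption: the broker and the result backend share a host and differ only by DB number.
broker_url = redis_url(redis_values["host"], redis_values["port"], redis_values["brokerDbNumber"])
backend_url = redis_url(redis_values["host"], redis_values["port"], redis_values["backendDbNumber"])

print("broker: ", broker_url)
print("backend:", backend_url)
```

With the values above, both URLs point at database 0 on `redisHost:6379`; using different database numbers keeps broker messages and task results apart.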
b
@Scout Can a k8s cluster use only one run launcher across the different executors, or can we use several?
s
In a Dagster deployment, the run launcher is configured at the instance level, which means that you typically have one run launcher per Dagster instance. This run launcher is responsible for initiating runs on the computational resources specified in its configuration.

However, while you can only configure one run launcher per instance, you can use multiple executors within that instance. Executors are responsible for executing the steps within a job run, and you can configure them at the job level. This means that different jobs within the same Dagster instance can use different executors, such as the `in_process_executor`, `multiprocess_executor`, or `k8s_job_executor`.

For example, if you are using the `K8sRunLauncher`, it can launch runs for jobs that use different executors, depending on the requirements of each job. The executor itself is chosen on each job's Python definition (the `executor_def` argument of `@job`), and its settings can be supplied in the run config when you launch a run. `workspace.yaml` only tells Dagster where to load code from; it has no per-job executor settings. Here's an example `workspace.yaml` that loads two jobs from separate code locations:

```yaml
load_from:
  - python_file:
      relative_path: "path_to_your_job_1.py"
      location_name: "location_for_job_1"

  - python_file:
      relative_path: "path_to_your_job_2.py"
      location_name: "location_for_job_2"
```

In this example, `job_1` and `job_2` live in separate code locations. If `job_1` is defined with the `multiprocess_executor` and `job_2` with the `k8s_job_executor`, both jobs are still part of the same Dagster instance and can be launched by the same `K8sRunLauncher`.

It's important to note that while the run launcher is a singleton at the instance level, the executor configuration provides the flexibility to choose how each job's steps are executed. If you need even more customized behavior, you might consider implementing a custom run launcher or using different Dagster instances for different teams or use cases.
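As a toy model of that split, the sketch below uses plain Python rather than Dagster's API. `ToyInstance` and its methods are hypothetical names chosen for illustration: one instance holds a single run launcher, while each job names its own executor.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyInstance:
    """Hypothetical stand-in for a Dagster instance: exactly one run launcher."""

    run_launcher: str
    # Maps job name -> executor name; each job picks its own executor.
    job_executors: Dict[str, str] = field(default_factory=dict)

    def add_job(self, name: str, executor: str) -> None:
        self.job_executors[name] = executor

    def describe_run(self, job_name: str) -> str:
        executor = self.job_executors[job_name]
        return f"{self.run_launcher} launches {job_name}; steps run via {executor}"


instance = ToyInstance(run_launcher="K8sRunLauncher")
instance.add_job("job_1", "multiprocess_executor")
instance.add_job("job_2", "k8s_job_executor")

for job_name in instance.job_executors:
    print(instance.describe_run(job_name))
```

The launcher is fixed once for the whole instance, and only the executor varies from job to job.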
b
@Scout what about the helm configuration
s
In the context of Helm configuration for a Dagster deployment on Kubernetes, you typically define the run launcher and other settings in the `values.yaml` file used to configure the Dagster Helm chart. The Helm chart is designed to deploy all the necessary components of a Dagster instance, including Dagit (the web UI), the Dagster daemon, user code deployments, and the run launcher.

The run launcher is specified at the instance level in the Helm chart, and similar to the instance configuration, you would typically have one run launcher for the entire Dagster instance. The Helm chart allows you to specify which run launcher to use, and you can configure it to work with Kubernetes. Here's an example snippet from a `values.yaml` file where the `K8sRunLauncher` is configured:
```yaml
runLauncher:
  type: K8sRunLauncher
  # Configuration for the K8sRunLauncher goes under config.k8sRunLauncher
```
This configuration sets up the `K8sRunLauncher` for the Dagster instance. However, within the jobs themselves, you can specify different executors as needed, just as you would in a non-Helm deployment. The executor configuration is part of the job's run config and is not specified in the Helm chart.

If you have multiple jobs that require different executors, you would configure those at the job level, not in the Helm chart. Each job would include its executor configuration in its run config, which determines how the steps of that job are executed when a run is launched by the run launcher.

In summary, the Helm chart configuration is primarily concerned with setting up the Dagster instance, including the run launcher, while the executor configuration is specific to each job and is not part of the Helm chart settings. This separation of concerns allows you to maintain a single run launcher for the instance while still having the flexibility to use different executors for different jobs.
b
@Scout give me an example of the Helm values deployment and the job definitions for the following: I choose K8sRunLauncher at the Dagster instance level with the default executor, and I have another job I want to execute on the Celery executor.
s
As per the documentation, when deploying Dagster on Kubernetes using Helm, you can configure the `K8sRunLauncher` at the instance level in your `values.yaml` file. This configuration will apply to the entire Dagster instance. For a job that requires a different executor, such as the `celery_k8s_job_executor`, you would specify this on the job definition. Here's an example of how you might set up your Helm `values.yaml` to use the `K8sRunLauncher`:
```yaml
runLauncher:
  type: K8sRunLauncher
  # Additional configuration for the K8sRunLauncher can be specified under config.k8sRunLauncher
```
This configuration sets the `K8sRunLauncher` as the run launcher for the Dagster instance.

Now, let's define two jobs: one that uses the default executor (the `multi_or_in_process_executor`, which runs steps in multiple processes unless configured otherwise) and another that explicitly uses the `celery_k8s_job_executor`.

For the first job, which uses the default executor, you don't need to specify any executor configuration in the job definition. Here's an example of a simple job definition:
```python
from dagster import job


@job
def my_default_executor_job():
    # Define your ops and job logic here
    pass
```
For the second job, which should use the `celery_k8s_job_executor`, you would set the executor on the job definition through `executor_def`. Here's an example of how you might define this job:
```python
from dagster import job
from dagster_celery_k8s import celery_k8s_job_executor


@job(executor_def=celery_k8s_job_executor)
def my_celery_executor_job():
    # Define your ops and job logic here
    pass
```
In this example, `my_celery_executor_job` is configured to use the `celery_k8s_job_executor`. As noted earlier, this executor must be used with the `CeleryK8sRunLauncher`, so runs of this job need an instance whose run launcher is the `CeleryK8sRunLauncher` rather than the `K8sRunLauncher`; its steps are then executed using Celery on Kubernetes.

Please note that when using the `celery_k8s_job_executor`, you must also have a Celery broker (such as Redis or RabbitMQ) and a Celery backend configured for result storage. These would be specified in your `values.yaml` file under the appropriate sections for Redis and RabbitMQ configurations.

Keep in mind that the actual configuration details may vary based on your specific requirements and the version of Dagster and its Helm chart that you are using. Always refer to the latest official Dagster documentation for the most accurate and up-to-date guidance.
b
@Scout If I run this configuration I will not have celery workers up in my cluster.
s
You are correct; the configuration examples provided do not include the setup required for running Celery workers in your Kubernetes cluster. To run jobs with the `celery_k8s_job_executor`, you need to have Celery workers deployed and configured to communicate with a message broker (like Redis or RabbitMQ).

To deploy Celery workers in your Kubernetes cluster using Helm, you would typically include additional configuration in your `values.yaml` file to enable Celery, set up the message broker, and configure the Celery workers. Here's an example of how you might configure this in your `values.yaml` file:
```yaml
# ... (other Dagster Helm values)

runLauncher:
  type: CeleryK8sRunLauncher
  config:
    celeryK8sRunLauncher:
      workerQueues:
        - name: "dagster"
          replicaCount: 2

rabbitmq:
  enabled: false  # Assuming you are using Redis as the broker

redis:
  enabled: true
  internal: true  # Set to false if you are using an external Redis instance

# ... (other configurations)
```

In this example, the run launcher is set to the `CeleryK8sRunLauncher`, which is what makes the Helm chart deploy Celery workers within your Kubernetes cluster; the `workerQueues` entry controls the queue name and how many worker replicas run. Redis is enabled and used as the Celery broker and backend. With `internal: true`, the chart deploys its own Redis service; if you are using an external Redis instance, set `internal: false` and provide its `host` and `port` instead.

Additionally, you would need to ensure that your job definitions include the necessary executor configuration to use Celery, as shown in the previous example for `my_celery_executor_job`.

Please note that the actual configuration details and the structure of the `values.yaml` file may vary based on the version of the Dagster Helm chart you are using. Always refer to the latest official Dagster documentation for the most accurate and up-to-date guidance on deploying Celery workers and configuring your Dagster instance on Kubernetes.
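Before installing the chart, the prerequisites discussed in this thread can be checked with a short script. This is a sketch under assumptions: it expects the Helm values already loaded into a Python dict, with top-level `runLauncher`, `redis`, and `rabbitmq` keys as in the Redis snippet earlier in the thread. The `celery_prerequisite_problems` helper is hypothetical, and the "exactly one broker" rule is my reading of "choose one of these message brokers".

```python
from typing import Any, Dict, List


def celery_prerequisite_problems(values: Dict[str, Any]) -> List[str]:
    """List what a Helm values dict is missing for the Celery executor."""
    problems: List[str] = []

    # The celery_k8s_job_executor must be paired with the CeleryK8sRunLauncher.
    launcher = values.get("runLauncher", {}).get("type")
    if launcher != "CeleryK8sRunLauncher":
        problems.append(f"run launcher is {launcher!r}, expected 'CeleryK8sRunLauncher'")

    # Celery needs a message broker; assume exactly one of Redis or RabbitMQ is enabled.
    brokers = [name for name in ("redis", "rabbitmq") if values.get(name, {}).get("enabled")]
    if len(brokers) != 1:
        problems.append(f"expected exactly one enabled broker, found {brokers}")

    return problems


# Example: a K8sRunLauncher setup with no broker enabled fails both checks.
values = {
    "runLauncher": {"type": "K8sRunLauncher"},
    "redis": {"enabled": False},
    "rabbitmq": {"enabled": False},
}
for problem in celery_prerequisite_problems(values):
    print("problem:", problem)
```

An empty list means both checks passed. It does not prove that the workers can actually reach the broker.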