Binoy Shah
04/12/2023, 2:35 PM
If I only deploy a Dagster chart version change together with a values.yml update setting migrate.enabled: true, would that be sufficient to perform the migration, or are there any other manual steps needed?
-------
I would then make another PR change to set migrate.enabled: false in values.yml after the successful deployment of the previous change.

Mark Fickett
04/13/2023, 1:55 PM

Charlie Bini
04/13/2023, 5:54 PM

Stephen Bailey
04/14/2023, 1:17 PM
Using an execute_k8s_job call, how do I raise an error within the underlying job code that will make Dagster recognize it as a failed execution?
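A general-purpose sketch (not a Dagster-specific API; the function names here are illustrative): execute_k8s_job watches the Kubernetes Job's status, so it is enough to let the container exit non-zero, for example via an uncaught exception or an explicit exit code:

```python
import sys


def do_work() -> bool:
    """Stand-in for the real job logic (hypothetical)."""
    return False


def main() -> int:
    try:
        if not do_work():
            raise RuntimeError("work did not complete")
    except Exception as exc:
        # A non-zero exit status fails the Kubernetes Job's pod, which
        # execute_k8s_job surfaces as a failed step in Dagster.
        print(f"error: {exc}", file=sys.stderr)
        return 1
    return 0
```

The container entrypoint would then call sys.exit(main()) so the pod's exit code reflects the result.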
Binoy Shah
04/17/2023, 3:21 PM
We use the env and valuesFrom directives from our custom container spec:
env:
  {{- if ($.s3).enabled }}
  - name: OVERRIDE_S3_ENDPOINT
    value: {{ include "s3.endpoint" $ }}
  {{- end }}
  - name: AWS_ACCESS_KEY_ID
    valueFrom:
      secretKeyRef:
        name: {{ include "s3.secret" $ }}-secret
        key: ACCESS_KEY
  - name: AWS_SECRET_ACCESS_KEY
    valueFrom:
      secretKeyRef:
        name: {{ include "s3.secret" $ }}-secret
        key: SECRET_KEY
I need to inject values like this for the job runs launched via Dagster too. Is there any provision that allows me to achieve a similar value copy from secrets?
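One avenue worth checking, as a sketch (the secret name below is hypothetical, and the exact schema should be verified against your chart version): the Dagster Helm chart's run launcher config accepts an envSecrets list, which exposes every key of the named Secrets as environment variables in run pods:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envSecrets:
        - name: my-s3-credentials  # hypothetical pre-existing Secret
```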
Le Yang
04/21/2023, 3:54 PM

Airton Neto
04/24/2023, 12:37 PM
We configured the celery_k8s_job_executor this way:
defs = Definitions(
    assets=all_assets,
    executor=celery_k8s_job_executor.configured(
        {
            "env_secrets": [".."],
        }
    ),
)
Is there a way to run the code in a local service, maybe using Celery without Kubernetes, or bypassing the cluster deployment entirely? I want an easier way to run it for testing purposes.
04/26/2023, 2:20 PMrunLauncher:
type: K8sRunLauncher
config:
k8sRunLauncher:
runK8sConfig:
podSpecConfig:
serviceAccountName: dagster
Anh Nguyen
04/27/2023, 4:46 AM
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/instance/__init__.py", line 1839, in launch_run
    self.run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/usr/local/lib/python3.7/site-packages/dagster_k8s/launcher.py", line 271, in launch_run
    self._launch_k8s_job_with_args(job_name, args, run)
  File "/usr/local/lib/python3.7/site-packages/dagster_k8s/launcher.py", line 251, in _launch_k8s_job_with_args
    body=job, namespace=container_context.namespace
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/batch_v1_api.py", line 210, in create_namespaced_job
    return self.create_namespaced_job_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/batch_v1_api.py", line 323, in create_namespaced_job_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 397, in request
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 282, in POST
    body=body)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 174, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 79, in request
    method, url, fields=fields, headers=headers, **urlopen_kw
  File "/usr/local/lib/python3.7/site-packages/urllib3/request.py", line 170, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/poolmanager.py", line 376, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 788, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/local/lib/python3.7/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/local/lib/python3.7/http/client.py", line 319, in begin
    version, status, reason = self._read_status()
  File "/usr/local/lib/python3.7/http/client.py", line 288, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
Abhishek Agrawal
04/27/2023, 6:34 AM
Our pullPolicy is set to Always. When we do a docker push, Dagster is not pulling the latest image. Has anyone faced this before? Any fix?

Joe
05/03/2023, 7:41 PM

Simon Frid
05/04/2023, 12:01 AM

Michel Rouly
05/10/2023, 4:18 AM
Is it possible to configure the K8sRunLauncher to set certain specific metadata on run pods only, but not on step pods?

Daniel
05/10/2023, 12:24 PM

Agon Shabi
05/11/2023, 7:45 PM
When using the k8s_job_executor (i.e. each op runs as its own Kubernetes job), cancellation from the UI manages to delete the "parent" k8s job, but its children (the op jobs) silently continue in the background.
I'm not sure that this was always the case, but I think you can tighten this up by including an owner reference (https://kubernetes.io/docs/concepts/overview/working-with-objects/owners-dependents/#owner-references-in-object-specifications) when creating the op jobs here: https://github.com/dagster-io/dagster/blob/eb26ca5d1fb69df0034742630e8fcc3f6799c34b/python_modules/libraries/dagster-k8s/dagster_k8s/executor.py#L234-L255
So when a user cancels from the UI, the K8sRunLauncher deletes the dagster-run-... job (https://github.com/dagster-io/dagster/blob/eb26ca5d1fb69df0034742630e8fcc3f6799c34b/python_modules/libraries/dagster-k8s/dagster_k8s/launcher.py#L313-L315), and then Kubernetes itself will delete the 'owned' dagster-step-... jobs automatically.
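For illustration, the ownerReference shape described in the linked Kubernetes docs, sketched as a hypothetical helper over plain dict manifests (this is not the actual dagster-k8s code):

```python
def with_owner_reference(step_job: dict, parent_job: dict) -> dict:
    """Mark step_job as owned by parent_job, so that deleting the parent
    Job cascades to the step Job (hypothetical helper, dict manifests)."""
    step_job.setdefault("metadata", {})["ownerReferences"] = [
        {
            "apiVersion": "batch/v1",
            "kind": "Job",
            "name": parent_job["metadata"]["name"],
            "uid": parent_job["metadata"]["uid"],
            "blockOwnerDeletion": True,
        }
    ]
    return step_job


# Example: a step job owned by its run job.
parent = {"metadata": {"name": "dagster-run-abc123", "uid": "11111111-2222"}}
step = with_owner_reference({"metadata": {"name": "dagster-step-def456"}}, parent)
```

The uid must be the live value the API server assigned to the parent Job, which is why the reference can only be attached after the run job exists.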
Loving all the work you guys are doing!

Eldan Hamdani
05/16/2023, 7:37 AM

Fraser Marlow
05/17/2023, 2:54 AM

Josh Lloyd
05/19/2023, 1:55 AM
I can run helm install and get all 4 pods (dagit, daemon, user-code, postgres) standing up without error from k8s's perspective. When I open the Dagit UI, however, the status of the user-code location shows that oh-so-annoying gRPC Error code: UNAVAILABLE error message.
When I run kubectl logs <user_code_pod> I get no logs back, not a single line. This is odd because I thought I'd at least see something like Started Dagster code server for file /example_project/example_repo/repo.py on port 3030 in process 1. In a separate, isolated Docker container I have proven that the dagster api grpc command works and starts the server with the expected message above, but I don't see it in my helm deployment.
My values.yaml file is minimally altered from the k8s deployment docs example, just enough to add some secrets and point to my local custom user-code image.
I have no idea where to look next to debug. Any thoughts?

Zack D.
05/19/2023, 9:58 PM
client = docker.DockerClient(base_url="unix://var/run/docker.sock")
We are using EKS 1.26 now. When we were on EKS 1.21 there was no error and everything worked fine. Now on EKS 1.26 we mount the volume in the K8sRunLauncher in values.yaml and set the securityContext with privileged as true and run_as_user as 0, but it's still not working.
Any ideas would be much appreciated. Below is our values.yaml configuration:
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      runK8sConfig:
        podSpecConfig:
          securityContext:
            run_as_user: 0
      envSecrets:
        - name: dagster-secrets
      jobNamespace: ~
      loadInclusterConfig: true
      imagePullPolicy: IfNotPresent
      volumeMounts:
        - mountPath: /var/run/docker.sock
          name: docker-socket-volume
      volumes:
        - name: docker-socket-volume
          hostPath:
            path: /var/run/docker.sock
      securityContext:
        privileged: true
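An editor's aside on the snippet above, offered as a hedged guess rather than a diagnosis: in the Kubernetes pod spec, privileged is a field of the container-level securityContext, not the pod-level one, so if the intent is a privileged container the setting may need to live under the container config instead, roughly:

```yaml
runK8sConfig:
  containerConfig:          # container-level, where 'privileged' is valid
    securityContext:
      privileged: true
```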
Mark Fickett
05/20/2023, 11:01 PM

Philippe Laflamme
05/22/2023, 2:53 AM
I have a bunch of tiny assets (… gcs). These can share a single pod's resources with no problem. I also have a few CPU-intensive assets which I'd like to isolate in separate pods. I believe what I need is a custom implementation of RunCoordinator that can launch runs with different RunLauncher implementations, potentially based on tags on those jobs/assets, e.g. foo/launcher: k8s.
Is there already support for something like this? I looked at the RunCoordinator interface and the different implementations, but I'm not sure I understand how they work; any guidance would be appreciated. Another approach is to use the K8sExecutor (a pod per step), but this assumes the use of the K8sLauncher, which means that every run also gets its own pod; I'd like to avoid that since it would be overkill for the tiny tasks. Is there another approach I could consider?
Haris Akhtar
05/22/2023, 8:06 AM

Saar Amitay
05/22/2023, 8:24 AM

Timothy Elder
05/22/2023, 8:43 PM

jonvet
05/30/2023, 4:53 PM
2023-05-30 16:50:58 +0000 - dagster.daemon.EventLogConsumerDaemon - ERROR - Caught error:
sqlalchemy.exc.ProgrammingError: (psycopg2.errors.UndefinedTable) relation "kvs" does not exist
LINE 2: FROM kvs
^
[SQL: SELECT kvs.key, kvs.value
FROM kvs
WHERE kvs.key IN (%(key_1_1)s, %(key_1_2)s)]
[parameters: {'key_1_1': 'EVENT_LOG_CONSUMER_CURSOR-PIPELINE_FAILURE', 'key_1_2': 'EVENT_LOG_CONSUMER_CURSOR-PIPELINE_SUCCESS'}]
(Background on this error at: https://sqlalche.me/e/14/f405)
Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/daemon.py", line 222, in core_loop
    yield from self.run_iteration(workspace_process_context)
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/auto_run_reexecution/event_log_consumer.py", line 45, in run_iteration
    persisted_cursors = _fetch_persisted_cursors(instance, DAGSTER_EVENT_TYPES, self._logger)
  File "/usr/local/lib/python3.7/site-packages/dagster/_daemon/auto_run_reexecution/event_log_consumer.py", line 111, in _fetch_persisted_cursors
    {_create_cursor_key(event_type) for event_type in event_types}
  File "/usr/local/lib/python3.7/site-packages/dagster/_core/storage/runs/sql_run_storage.py", line 1146, in get_cursor_values
    KeyValueStoreTable.c.key.in_(keys)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1380, in execute
    return meth(self, multiparams, params, _EMPTY_EXECUTION_OPTS)
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 335, in _execute_on_connection
    self, multiparams, params, execution_options
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1582, in _execute_clauseelement
    cache_hit=cache_hit,
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1944, in _execute_context
    e, statement, parameters, cursor, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 2125, in _handle_dbapi_exception
    sqlalchemy_exception, with_traceback=exc_info[2], from_=e
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
The above exception was caused by the following exception:
psycopg2.errors.UndefinedTable: relation "kvs" does not exist
LINE 2: FROM kvs
^
Stack Trace:
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1901, in _execute_context
    cursor, statement, parameters, context
  File "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
My best guess is that the DB schema has changed.
Question: is this guess correct? And is there a way to fix it by running some kind of DB migration?
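That guess is consistent with the error: the kvs table is created by Dagster's storage migrations. If I recall the tooling correctly, the schema migration is applied either by running the dagster instance migrate CLI command against the instance, or, on Kubernetes, by enabling the Helm chart's migration job for one deploy (a sketch; verify the key against your chart version):

```yaml
# values.yml: run the migration job on the next helm upgrade,
# then set this back to false in a follow-up change.
migrate:
  enabled: true
```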
Zack D.
06/01/2023, 8:35 PM
We set up the k8sRunLauncher, but we couldn't get it to fully work. After the configuration, the ops run under the tenant namespace with pod names like dagster-step-<uuid>. However, the job pod still runs under the default namespace with a pod name like dagster-run-<uuid>. We have configured the job's execution config to pass the job_namespace attribute, and its value should be dynamic depending on which tenant we are using. Do you have any ideas what other configuration we might have missed? Below is our execution config:
execution:
  config:
    env_secrets:
      - tenantX-secret
    job_namespace: tenantX
ops:
  op_1: ...
  op_2: ...
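One thing to check, offered as an editor's sketch rather than a confirmed diagnosis: the executor's job_namespace only covers step jobs, while the run job's namespace comes from the run launcher's jobNamespace setting, which was shown as ~ earlier in the thread and therefore falls back to the default. If a static per-deployment namespace is acceptable, it would look roughly like:

```yaml
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      jobNamespace: tenantX   # dagster-run-* jobs are created here
```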
Nikita Kolodeznov
06/02/2023, 8:46 AM

Charlie Bini
06/02/2023, 2:47 PM
… EnvVar? I'm trying out 1Password's Connect integration, which syncs credentials to k8s secrets, but the keys within the secrets it creates aren't globally unique, so turning them into environment variables won't work. Here's what one of them looks like:

Jayme Edwards
06/02/2023, 3:24 PM