
William Reed

08/05/2021, 11:08 PM
Hello #dagster-support, I use istio proxy sidecar containers, so the pods created by dagster job resources in my k8s cluster fail to reach a completed state: the istio proxy sidecars keep running even after the main containers finish executing their steps. All I need to do is add a
curl -X POST http://localhost:15020/quitquitquit
to the end of the jobs’ commands and it will shut down the sidecars. How can I do that with the Helm chart though? Thank you.

alex

08/06/2021, 3:12 PM
Do you run in a pod-per-solid or pod-per-pipeline-run setup? Either way, one approach would be to make a @resource that is a context manager and do this cleanup in the finally block. How many solids need to add this resource as a required resource key depends on your setup. We will look towards a better solution to problems like this in the future, but for now resource teardown should consistently happen at the end of any compute.
something like
from dagster import resource

@resource
def sidecar_teardown():
    try:
        # the resource value itself is unused; only the teardown matters
        yield 'placeholder'
    finally:
        # runs once the requiring compute has finished
        shutdown_sidecars()
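For context, a minimal sketch of how the helper and the wiring could look, building on the snippet above (the shutdown_sidecars name, the example solid, and the pipeline are illustrative; the endpoint is the istio pilot-agent /quitquitquit from the first message, and the 0.12-era solid/pipeline/ModeDefinition APIs are assumed):

import requests
from dagster import ModeDefinition, pipeline, solid

def shutdown_sidecars():
    # hypothetical helper: Python equivalent of `curl -X POST http://localhost:15020/quitquitquit`
    requests.post("http://localhost:15020/quitquitquit", timeout=5)

@solid(required_resource_keys={"sidecar_teardown"})
def my_solid(context):
    context.log.info("compute runs before the istio sidecars are shut down")

# sidecar_teardown is the @resource defined in the snippet above
@pipeline(mode_defs=[ModeDefinition(resource_defs={"sidecar_teardown": sidecar_teardown})])
def my_pipeline():
    my_solid()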

William Reed

08/06/2021, 4:42 PM
Thank you @alex, very helpful. The job pods are now completing successfully (great!), but the run pods are running into errors (I think during log fetching).
kubernetes.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 06 Aug 2021 16:26:46 GMT', 'Content-Length': '300'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"a container name must be specified for pod dagster-job-3186e80187c19a2343851c3f37992918-gn8ql, choose one of: [istio-validation istio-proxy dagster-job-3186e80187c19a2343851c3f37992918]","reason":"BadRequest","code":400}\n'

  File "/usr/local/lib/python3.7/site-packages/dagster_celery_k8s/executor.py", line 534, in _execute_step_k8s_job
    raw_logs = retrieve_pod_logs(pod_name, namespace=job_namespace)
  File "/usr/local/lib/python3.7/site-packages/dagster_k8s/utils.py", line 14, in retrieve_pod_logs
    return DagsterKubernetesClient.production_client().retrieve_pod_logs(pod_name, namespace)
  File "/usr/local/lib/python3.7/site-packages/dagster_k8s/client.py", line 501, in retrieve_pod_logs
    name=pod_name, namespace=namespace, _preload_content=False
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 22929, in read_namespaced_pod_log
    return self.read_namespaced_pod_log_with_http_info(name, namespace, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 23062, in read_namespaced_pod_log_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
Using CeleryK8sRunLauncher by the way.

alex

08/06/2021, 4:45 PM
yea, looks like we just need to specify the container name within the pod when we call read_namespaced_pod_log. @rex / @johann
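A minimal sketch of that fix with the kubernetes Python client, assuming an in-cluster client and that the dagster-job container name is known (this is only an illustration, not the actual patch):

from kubernetes import client, config

config.load_incluster_config()  # use config.load_kube_config() when running outside the cluster
core_api = client.CoreV1Api()

pod_name = "dagster-job-3186e80187c19a2343851c3f37992918-gn8ql"  # pod from the error above
namespace = "dagster"  # assumed namespace

# passing container= avoids the 400 "a container name must be specified" error
raw_logs = core_api.read_namespaced_pod_log(
    name=pod_name,
    namespace=namespace,
    container="dagster-job-3186e80187c19a2343851c3f37992918",
    _preload_content=False,
).data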

William Reed

08/06/2021, 4:46 PM
Looks like it, yes. I’m currently running my sidecar shutdown function at the end of the pipeline (not the end of each solid), FYI.
I’m not sure if we’re running pod-per-solid or pod-per-pipeline. How do I check?

johann

08/06/2021, 4:47 PM
If you’re using the celery setup, it’s pod per solid

alex

08/06/2021, 4:48 PM
CeleryK8sRunLauncher implies CeleryK8sJobExecutor, which issues each solid as a separate job (via celery).

William Reed

08/06/2021, 4:48 PM
Ah, then perhaps I need to be shutting my sidecars down with each solid!

johann

08/06/2021, 4:48 PM
Alternatively, the K8sRunLauncher + k8s_job_executor combination also offers pod-per-solid but doesn’t use read_namespaced_pod_log, so you might be able to skirt that particular issue.
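A hedged sketch of what opting into that setup could look like on the pipeline side, assuming the 0.12-era ModeDefinition/executor_defs APIs (the Helm chart’s run launcher would separately need to be switched to K8sRunLauncher):

from dagster import ModeDefinition, default_executors, pipeline, solid
from dagster_k8s import k8s_job_executor

@solid
def my_solid(context):
    context.log.info("each solid runs in its own Kubernetes job pod with this executor")

# adding k8s_job_executor makes it selectable in the run config's execution section
@pipeline(mode_defs=[ModeDefinition(executor_defs=default_executors + [k8s_job_executor])])
def my_pipeline():
    my_solid()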

William Reed

08/06/2021, 4:49 PM
Good to know actually, thanks.
On deeper inquiry, it looks like the run pod isn’t able to connect to postgres because the istio proxy container has been terminated.
Should I expect the istio proxy to have been shut down in the run pod even though the pipeline isn’t finished? Hmm…
I’ll keep trying combinations of things here and report back.
@johann @alex Got the example running with the K8sRunLauncher. Thanks! 🎉
Following up here, should I create a PR for this or can you guys handle it? @alex @johann

alex

08/09/2021, 9:03 PM
An issue or PR would be lovely, just so it doesn’t get lost.

William Reed

08/09/2021, 9:18 PM

Michael Clawar

12/16/2021, 3:49 PM
Hello, just ran into this same issue when setting up Crowdstrike on the cluster. Added the example to the issue: https://github.com/dagster-io/dagster/issues/4474#issuecomment-995940869