Hi Dagster folks In another Dagster instance of mine my team dagster #deployment-kubernetes

Kevin

08/21/2020, 12:10 AM

Hi Dagster folks! In another Dagster instance of mine (my team is kind of having a hunger games for Dagster instances 😄 ), I've come across an error trying to get GRPC to work. It's a K8s deployment at 0.9.2 with a modified dagster_aws module to allow boto3 endpoint_url. Attached is the error from dagit. Any help is appreciated! :D

GRPC-pipeline-packed-error.log

Kevin

08/21/2020, 2:26 AM

With the very recent update to 0.9.3 the error has changed to something along the lines of:

[generate_number_19.compute]: (TypeError) - TypeError: _execute_step_k8s_job() got an unexpected keyword argument 'user_defined_k8s_config_dict'

log file attached, again any help is appreciated :D

GRPC_user_defined_k8s_config_dict.log

max

08/21/2020, 1:04 PM

is everything on the same dagster version?

Kevin

08/21/2020, 2:23 PM

oh you betcha! 0.9.3!

sashank

08/21/2020, 4:16 PM

@Kevin my guess is that your celery workers aren’t actually running 0.9.3, since `user_defined_k8s_config_dict` should be a valid arg in 0.9.3

sashank

08/21/2020, 4:17 PM

There are three images you need to update when upgrading dagster: the `dagit` image, the `celery` image, and the `pipeline_run` image

sashank

08/21/2020, 4:17 PM

Is there any way you can verify which version of dagster the `celery` images are running?

Kevin

08/21/2020, 4:48 PM

@sashank okay i'll double check that, but i'm pretty sure they are all 0.9.3 as my `dagit` and `celery` images use the same image

Kevin

08/21/2020, 7:03 PM

hard to see how any of the dagster modules would not be the latest at 0.9.3, here's the docker image:

FROM python:3.7.7

ENV DAGSTER_VERSION=0.9.3

# Cron is required to use scheduling in Dagster
RUN apt-get update && apt-get install -yqq cron


# PIP install for git functions

RUN mkdir -p /opt/dagster/dagster_home /opt/dagster/app

RUN pip install \
     dagster==${DAGSTER_VERSION} \
     dagster-graphql==${DAGSTER_VERSION} \
     dagster-celery[flower,redis,kubernetes]==${DAGSTER_VERSION} \
     dagster-cron==${DAGSTER_VERSION} \
     dagit==${DAGSTER_VERSION} \
     dagster-postgres==${DAGSTER_VERSION} \
     dagster-pandas==${DAGSTER_VERSION} \
     dagster-gcp==${DAGSTER_VERSION} \
     dagster-k8s==${DAGSTER_VERSION} \
     dagster-airflow==${DAGSTER_VERSION} \
     dagster-celery-k8s==${DAGSTER_VERSION} \
     dagster-celery-docker==${DAGSTER_VERSION}

ADD build_cache/ /
RUN pip install -e dagster-aws

# Copy your pipeline code and entrypoint.sh to /opt/dagster/app
COPY *.py *.yaml /opt/dagster/app/
COPY .aws /root/.aws

# Copy dagster instance YAML to $DAGSTER_HOME
ENV DAGSTER_HOME=/opt/dagster/dagster_home
COPY dagster.yaml $DAGSTER_HOME

WORKDIR /opt/dagster/app

sashank

08/21/2020, 7:38 PM

That Dockerfile definitely looks right

sashank

08/21/2020, 7:38 PM

Is there any way you can ssh into the celery workers and check the dagster version?

sashank

08/21/2020, 7:53 PM

For example, using `kubectl`, you can do:

sashank

08/21/2020, 7:54 PM

kubectl get pods

to get the name of the pod running the celery workers

sashank

08/21/2020, 7:55 PM

kubectl exec {pod_name} -- dagster --version

sashank

08/21/2020, 7:55 PM

To run the dagster version command in that pod
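The `dagster --version` check only reports the core package. A minimal sketch of a broader check, run with the Python interpreter inside each pod, is below. It assumes `importlib.metadata` is available (Python 3.8+; on the 3.7 image from this thread the `importlib_metadata` backport provides the same calls). The helper names are hypothetical, and the package list is taken from the Dockerfile earlier in the thread.

```python
from importlib import metadata  # Python 3.8+; assumption: use the importlib_metadata backport on 3.7


def installed_versions(packages):
    """Return {package: version or None} for each requested distribution."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


def find_mismatches(versions, expected):
    """Return only the installed packages whose version differs from `expected`.

    Packages that are not installed (None) are skipped rather than reported.
    """
    return {
        name: version
        for name, version in versions.items()
        if version is not None and version != expected
    }


if __name__ == "__main__":
    # Package names copied from the Dockerfile in this thread.
    packages = [
        "dagster",
        "dagster-graphql",
        "dagster-celery",
        "dagster-k8s",
        "dagster-celery-k8s",
        "dagit",
    ]
    # An empty dict means every installed package matches the expected version.
    print(find_mismatches(installed_versions(packages), expected="0.9.3"))
```

Running the same script in the dagit pod and in a celery worker pod, then comparing the output, would show whether the two are really on the same release.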

Kevin

08/24/2020, 4:59 PM

hi @sashank

k exec -n <namespace> <pod> dagster --version

dagster, version 0.9.3

it seems as though i have encountered a different issue with each of my attempts to fix it:

FileNotFoundError: [Errno 2] No such file or directory: '/opt/dagster/app/celery_simple_pipeline.py'
  File "/usr/local/lib/python3.7/site-packages/dagster/cli/api.py", line 337, in _execute_run_command_body
    for event in execute_run_iterator(recon_pipeline, pipeline_run, instance):
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 74, in execute_run_iterator
    step_keys_to_execute=pipeline_run.step_keys_to_execute,
  File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py", line 558, in create_execution_plan
    pipeline_def = pipeline.get_definition()
  File "/usr/local/lib/python3.7/site-packages/dagster/core/definitions/reconstructable.py", line 98, in get_definition
    self.repository.get_definition()
  File "/usr/local/lib/python3.7/site-packages/dagster/core/definitions/reconstructable.py", line 37, in get_definition
    return repository_def_from_pointer(self.pointer)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/definitions/reconstructable.py", line 341, in repository_def_from_pointer
    target = def_from_pointer(pointer)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/definitions/reconstructable.py", line 290, in def_from_pointer
    target = pointer.load_target()
  File "/usr/local/lib/python3.7/site-packages/dagster/core/code_pointer.py", line 250, in load_target
    module = load_python_file(self.python_file, self.working_directory)
  File "/usr/local/lib/python3.7/site-packages/dagster/core/code_pointer.py", line 87, in load_python_file
    return import_module_from_path(module_name, python_file)
  File "/usr/local/lib/python3.7/site-packages/dagster/seven/__init__.py", line 115, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 724, in exec_module
  File "<frozen importlib._bootstrap_external>", line 859, in get_code
  File "<frozen importlib._bootstrap_external>", line 916, in get_data

always something XD thank you again for your help!

sashank

08/24/2020, 5:20 PM

cc @alex @daniel any ideas?

sashank

08/24/2020, 5:20 PM

Also @Kevin are you not running into the celery error anymore?

daniel

08/24/2020, 5:24 PM

Hi Kevin - just confirming, that file does in fact exist on the machine where the execution is happening (but Dagster is saying that it doesn't)?

daniel

08/24/2020, 5:26 PM

and I guess another question (just understanding your setup) - you said you're trying to get gRPC to work, that means you have a gRPC server running as well I assume, is that in the same image as the workers that are doing the execution?

Kevin

08/24/2020, 6:16 PM

@daniel @sashank If i exec onto the pod in question and run:

dagster pipeline execute -f <pipeline.py>

it executes no problem, and the celery error doesn't appear anymore given my changes to the values.yaml and configmap-instance.yaml (my attempts at fixing it). If there are any files you guys want to look at, do let me know!

daniel

08/24/2020, 6:18 PM

thanks, could we see your workspace/dagster.yaml as well?

alex

08/24/2020, 6:19 PM

how are you deploying into k8s, helm?

Kevin

08/24/2020, 6:30 PM

Hi 🙂 yes i am deploying via the helm chart from your repo, with my own values.yaml. here are the files 🙂

workspace.yaml dagster.yaml

alex

08/24/2020, 6:34 PM

can we see what `values.yaml` you are using too?

Kevin

08/24/2020, 6:40 PM

ask and you shall receive ;)

values.yaml

alex

08/24/2020, 6:43 PM

ok some things that stand out to me are `userDeployments.enabled` `false` and the `pipeline_run` section has been commented out

alex

08/24/2020, 6:45 PM

i think since you are trying to get grpc working - you want to set `userDeployments.enabled` `true`

Kevin

08/24/2020, 6:48 PM

Yeah, initially i tried setting `userDeployments.enabled` `true` but that led to the error i mentioned above of:

[generate_number_0.compute]: (TypeError) - TypeError: _execute_step_k8s_job() got an unexpected keyword argument 'user_defined_k8s_config_dict'

i'll uncomment the pipeline_run section but i seem to remember an error when i did that

Kevin

08/24/2020, 6:53 PM

okay something different, but i can't say whether it is better 😄

dagster.core.errors.DagsterSubprocessError: During celery execution errors occurred in workers:
[not_much_1.compute]: (ParameterCheckError) - dagster.check.ParameterCheckError: Param "text" is not a str. Got ApiException() which is type <class 'kubernetes.client.rest.ApiException'>.

alex

08/24/2020, 6:57 PM

so another concern i have is that if you are just using the “latest” tag - k8s isn’t going to redeploy the celery workers unless you are doing a full `helm uninstall` every time. Given the errors you have been seeing I am guessing this might be the case.

alex

08/24/2020, 7:05 PM

so I think you either want to:
* use an explicit tagging scheme
* do `helm uninstall`

I believe a lot of the confusing errors from above have to do with stale nodes in the cluster since k8s doesn’t know you have published a new `genimind/dagster-kcore:latest` and so won’t redeploy the long standing services `dagit` and `celery`. The `alwaysPull: True` trick only works for the `job` resources since they pull the newest copy of `latest` when they are created for run/step execution.
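To make the "explicit tagging scheme" point concrete, here is a small sketch of a pre-deploy check that flags image references which are untagged or tagged `latest`. The function name is hypothetical, and the `0.9.3` tag in the usage lines is an assumed example rather than a tag known to exist for that image.

```python
def uses_mutable_tag(image):
    """Return True if an image reference has no tag or uses the 'latest' tag."""
    if "@" in image:
        return False  # pinned by digest, e.g. name@sha256:...
    # Only inspect the final path segment so a registry host:port is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # an untagged reference resolves to 'latest'
    tag = last_segment.rsplit(":", 1)[1]
    return tag == "latest"


# Usage: the first reference is the one from this thread, the second uses an assumed explicit tag.
print(uses_mutable_tag("genimind/dagster-kcore:latest"))
print(uses_mutable_tag("genimind/dagster-kcore:0.9.3"))
```

With an explicit tag per release, changing the tag in values.yaml changes the pod spec, which gives k8s a reason to roll the long-running dagit and celery deployments.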

Kevin

08/24/2020, 9:16 PM

hi @alex @sashank, @johann helped me with the issue -- it comes from multiple dagster instances trying to access the same resource in my redis cluster. Specifying a redis backend/broker number beyond the default '0' fixed the issue! thanks again for all the help!!
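A minimal sketch of that fix, assuming each Dagster instance is given its own Redis database numbers above the default 0. The host name `redis-master` and the numbering scheme are hypothetical; the thread only states that moving off database 0 resolved the collision.

```python
def redis_url(host, db, port=6379):
    """Build a Redis URL that selects logical database `db`."""
    if db < 0:
        raise ValueError("Redis database numbers start at 0")
    return f"redis://{host}:{port}/{db}"


def celery_urls(host, instance_number):
    """Give Dagster instance N (1, 2, ...) its own broker and backend databases.

    Database 0 is never used, so instances left on the default cannot collide
    with the ones configured here.
    """
    if instance_number < 1:
        raise ValueError("instance_number must be 1 or greater")
    broker_db = 2 * instance_number - 1
    backend_db = 2 * instance_number
    return {
        "broker": redis_url(host, broker_db),
        "backend": redis_url(host, backend_db),
    }


# Usage: two instances sharing one Redis server get disjoint databases.
print(celery_urls("redis-master", 1))
print(celery_urls("redis-master", 2))
```

A stock Redis configuration ships with 16 logical databases (0 to 15), so this scheme covers only a handful of instances on one server before a separate Redis or a key-prefix approach is needed.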

alex

08/24/2020, 9:53 PM

sweet glad that that worked out - we should maybe namespace our celery entries to defend against that as well
