# deployment-kubernetes
Serj:
hey guys! running into some deep k8s stuff here. We are trying to run jobs in a namespace different from the control plane, with a service account in that namespace. Because security. Anyway, here is how we did the infra setup:
• created a new namespace, `foo`
• we deploy dagster with a helm chart, as is normal, to the `dagster` namespace
• we copy the db secret and the pipeline and instance config maps from the `dagster` namespace to `foo` after deployment
• we create the service account and roles needed so the dagster default account can start executor jobs in other namespaces and the `foo` KSA can create jobs in its own namespace
• we configure jobs that should run in `foo` by adding this to their config:
execution:
 config:
   job_namespace: "foo"
   service_account_name: "foo-dagster-ksa"
resources:
 io_manager:
   config:
     gcs_bucket: "foo-bucket"
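For anyone who builds run config in Python rather than YAML, here is a minimal sketch of the same settings as a dict. The keys are copied from the config above; the helper name `namespaced_run_config` is hypothetical and not part of dagster.

```python
from typing import Any, Dict


def namespaced_run_config(namespace: str, service_account: str, bucket: str) -> Dict[str, Any]:
    """Build run config that sends step jobs to another namespace.

    The keys mirror the YAML in this thread; the helper itself is hypothetical.
    """
    return {
        "execution": {
            "config": {
                # Namespace where the executor should create step jobs.
                "job_namespace": namespace,
                # Service account the step pods should run as.
                "service_account_name": service_account,
            }
        },
        "resources": {
            "io_manager": {"config": {"gcs_bucket": bucket}},
        },
    }


run_config = namespaced_run_config("foo", "foo-dagster-ksa", "foo-bucket")
```

Keeping the namespace and service account as arguments makes it harder to point a job at a namespace while forgetting to switch the account that goes with it.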
the jobs start! We can see that the runner and executor jobs start OK, but when the executor starts the very first pod, the pod dies with this:
Events:
  Type     Reason                Age   From            Message
  ----     ------                ----  ----            -------
  Normal   SuccessfulCreate      13s   job-controller  Created pod: dagster-job-5cf70b1ae7d5bed7b23a7b5083157808-8vqwr
  Warning  BackoffLimitExceeded  9s    job-controller  Job has reached the specified backoff limit
pod's log:
Traceback (most recent call last):
  File "/usr/local/bin/dagster", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/dist-packages/dagster/cli/__init__.py", line 50, in main
    cli(auto_envvar_prefix=ENV_PREFIX)  # pylint:disable=E1123
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/dist-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/dagster/cli/api.py", line 301, in execute_step_command
    if args.instance_ref
  File "/usr/local/lib/python3.7/dist-packages/dagster/core/instance/__init__.py", line 430, in from_ref
    run_launcher=instance_ref.run_launcher,
  File "/usr/local/lib/python3.7/dist-packages/dagster/core/instance/ref.py", line 265, in run_launcher
    return self.run_launcher_data.rehydrate() if self.run_launcher_data else None
  File "/usr/local/lib/python3.7/dist-packages/dagster/serdes/config_class.py", line 86, in rehydrate
    return klass.from_config_value(self, result.value)
  File "/usr/local/lib/python3.7/dist-packages/dagster_k8s/launcher.py", line 168, in from_config_value
    return cls(inst_data=inst_data, **config_value)
  File "/usr/local/lib/python3.7/dist-packages/dagster_k8s/launcher.py", line 79, in __init__
    kubernetes.config.load_incluster_config()
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/config/incluster_config.py", line 121, in load_incluster_config
    try_refresh_token=try_refresh_token).load_and_set(client_configuration)
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/config/incluster_config.py", line 54, in load_and_set
    self._load_config()
  File "/usr/local/lib/python3.7/dist-packages/kubernetes/config/incluster_config.py", line 73, in _load_config
    raise ConfigException("Service token file does not exist.")
kubernetes.config.config_exception.ConfigException: Service token file does not exist.
it looks like we are missing some secret ingredient in the `foo` namespace, but what is it??
I also tried setting `load_incluster_config: false`, although I am not sure it was set correctly....
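The traceback ends in `load_incluster_config()`, which needs the API server address in the environment and the service account files mounted into the pod. A rough diagnostic sketch that could be run inside a failing pod follows. It uses only the standard library; the function name `missing_incluster_pieces` is hypothetical, and the path is the usual default mount point, assumed not to be overridden in your cluster.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional

# Default mount point for service account credentials inside a pod.
SA_DIR = Path("/var/run/secrets/kubernetes.io/serviceaccount")


def missing_incluster_pieces(
    sa_dir: Path = SA_DIR, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """List what an in-cluster Kubernetes client would find missing."""
    env = os.environ if env is None else env
    problems = []
    # The API server address is injected into every pod as env vars.
    for var in ("KUBERNETES_SERVICE_HOST", "KUBERNETES_SERVICE_PORT"):
        if not env.get(var):
            problems.append(f"env var {var} is not set")
    # The token and CA cert are only present if the token is mounted.
    for name in ("token", "ca.crt"):
        if not (sa_dir / name).is_file():
            problems.append(f"{sa_dir / name} does not exist")
    return problems


if __name__ == "__main__":
    for problem in missing_incluster_pieces():
        print("missing:", problem)
```

If the token file shows up as missing, the service account token was not mounted into the pod, which matches the "Service token file does not exist." error above.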
Daniel:
Hi Serj - one thing to clarify here ... is the dagster-run pod (the run worker pod, not the dagster-job pod for each step) also running in the 'foo' namespace? or is that happening in the same namespace as the control plane?
I assume it's the control plane namespace from your post. And foo-dagster-ksa is a service account defined in the foo namespace?
somebody else ran into this a while back and it turned out there was a magic `automountServiceAccountToken` flag that needed to be set, curious if that's the case here too https://dagster.slack.com/archives/C01U954MEER/p1640024180400900?thread_ts=1639781536.357200&cid=C01U954MEER
Serj:
thank you Daniel! yes, yes it was automount_service_account_token = true
🎉 2
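For reference, a minimal sketch of what that flag looks like on a plain Kubernetes ServiceAccount manifest, built as a Python dict and printed as JSON. The thread does not say which tool the `automount_service_account_token = true` setting was written in, so this shows only the underlying Kubernetes field; the helper name `service_account_manifest` is hypothetical.

```python
import json
from typing import Any, Dict


def service_account_manifest(name: str, namespace: str) -> Dict[str, Any]:
    """Return a ServiceAccount manifest with token automounting enabled."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
        # Ask Kubernetes to mount the API token into pods using this account.
        "automountServiceAccountToken": True,
    }


manifest = service_account_manifest("foo-dagster-ksa", "foo")
print(json.dumps(manifest, indent=2))
```

A pod spec can carry its own `automountServiceAccountToken` value, which takes precedence over the service account's, so it is worth checking both places if the token is still missing.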
the saddest thing here is Charles is a good fellow from my team
I was not aware he had run into the same issue
to answer your question, we only configured the executor to look at another namespace; the runner starts in the same namespace as the control plane and the repo
tagging @Charles Leung for lulz
😆 1
Daniel:
haha glad we could connect you