Matt Callaway
05/28/2021, 12:44 PM
resources:
  io_manager:
    config:
      s3_bucket: ""
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ""
When I launch the execution, it fails, saying “” (the empty string) is an illegal bucket name. When I fill in “test-bucket”, it fails with “Unable to locate credentials”.
This is of course running locally on my Mac, so there shouldn’t be any credentials or any attempt to touch S3. It’s a walkthrough. Why would the walkthrough expect to use a real S3 bucket?
Shouldn’t a demo use the fs_io_manager? Can someone provide guidance on:
• How to get the walkthrough to work?
• How to specify the difference between a local dev/demo instance and a “real” instance on AWS?
• How to deploy changes to config and code in a k8s environment?
johann
05/28/2021, 12:57 PM
fs_io_manager. Generally, the Helm/Kubernetes part of the system serves as an option for users to productionize their Dagster deployment.
Matt Callaway
05/28/2021, 1:03 PM
Matt Callaway
05/28/2021, 1:04 PM
johann
05/28/2021, 1:09 PM
fs_io_manager or mem_io_manager work because there’s no isolation. If you use the multiprocess executor, then the steps will still be within one pod with a shared file system, so fs_io_manager works.
• If you’re using an executor that isolates each solid in its own pod (such as celery_k8s_job_executor or the new k8s_job_executor), then fs_io_manager won’t work, because pods generally won’t have access to a shared file system. You can set up a shared volume, but that’s generally not best practice and not recommended for production.
Matt Callaway
05/28/2021, 1:12 PM
mem_io_manager should work. How do I make use of them? If I update the config and change
resources:
  io_manager:
    config:
      s3_bucket: "test-bucket"
to something like:
resources:
  io_manager:
    config:
      fs_io_manager:
        ...
it shows a warning that it expects s3_bucket… as if fs_io_manager isn’t available as an option. How do I make it use one of these other IO managers?
johann
05/28/2021, 1:12 PM
> “It should run in any kubernetes deployment”, which could include k8s on mac, or GCP
Agreed. We have io_managers for GCS and S3. For local k8s, some users use MinIO.
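The executor/IO-manager rules from the 1:09 PM reply can be summarized as a small lookup. This is a plain-Python sketch, not a Dagster API: the executor labels "in_process" and "multiprocess" and the grouping are assumptions drawn from this thread, and s3_pickle_io_manager (the IO manager named in the traceback later in the thread) stands in for any object-store IO manager, including S3 pointed at MinIO.

```python
from typing import List

# Hypothetical summary of this thread's advice; not part of Dagster.
SHARED_MEMORY = {"in_process"}                      # every step runs in one process
SHARED_FILESYSTEM = {"in_process", "multiprocess"}  # steps share one pod's disk
POD_PER_STEP = {"celery_k8s_job_executor", "k8s_job_executor"}  # isolated pods


def viable_io_managers(executor: str) -> List[str]:
    """Return the IO managers that can hand outputs between steps for `executor`."""
    if executor not in SHARED_MEMORY | SHARED_FILESYSTEM | POD_PER_STEP:
        raise ValueError(f"unknown executor: {executor!r}")
    options = []
    if executor in SHARED_MEMORY:
        options.append("mem_io_manager")
    if executor in SHARED_FILESYSTEM:
        options.append("fs_io_manager")
    # Object storage is reachable over the network from any pod, so it works
    # even when each step runs in its own pod.
    options.append("s3_pickle_io_manager")
    return options


print(viable_io_managers("multiprocess"))  # ['fs_io_manager', 's3_pickle_io_manager']
```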
johann
05/28/2021, 1:35 PM
> How do I make it use one of these other IO managers?
This is confusing: the io_managers are defined as resources, which are selected using the pipeline mode. In the Dagit playground, there’s a mode dropdown above where you were writing the run config.
johann
05/28/2021, 1:37 PM
The default mode sets the io_manager to S3. As you’ve pointed out, this isn’t ideal; it’d be great if you could file a GitHub issue to fix that. The test mode leaves the io_manager as the system default, mem_io_manager.
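The mode mechanism described here can be modeled in a few lines. This is a toy stand-in, not Dagster's implementation: the MODES table mirrors what the thread says about the example pipeline ("default" binds io_manager to S3, "test" overrides nothing), and the helper names are hypothetical.

```python
from typing import Dict, Optional

# Toy model, not Dagster's API. Mirrors the example pipeline described above.
SYSTEM_DEFAULT_IO_MANAGER = "mem_io_manager"

MODES: Dict[str, Dict[str, str]] = {
    # "default" mode binds io_manager to S3 (plus the s3 resource it uses).
    "default": {"io_manager": "s3_pickle_io_manager", "s3": "s3_resource"},
    # "test" mode binds nothing, so io_manager falls back to the system default.
    "test": {},
}


def resolve_io_manager(mode: str) -> str:
    """Name of the IO manager a run in `mode` would use."""
    return MODES[mode].get("io_manager", SYSTEM_DEFAULT_IO_MANAGER)


def required_io_manager_config(mode: str) -> Optional[str]:
    """Config key the selected IO manager insists on, if any.

    Only the S3 IO manager needs config (its bucket), which is why the run
    config in "default" mode kept asking for s3_bucket.
    """
    if resolve_io_manager(mode) == "s3_pickle_io_manager":
        return "s3_bucket"
    return None
```

In Dagster 0.11 itself, this binding is declared on the pipeline with ModeDefinition(name=..., resource_defs={...}), and the mode is then picked from the Dagit playground dropdown.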
Matt Callaway
05/28/2021, 1:46 PM
johann
05/28/2021, 1:49 PM
Matt Callaway
05/28/2021, 1:49 PM
Matt Callaway
05/28/2021, 1:50 PM
Matt Callaway
05/28/2021, 1:50 PM
johann
05/28/2021, 1:54 PM
> My most important initial goal with dagster is “run the same workflow on my mac as I would in the cloud”, and to that end I’m trying to understand the “environment” or “infrastructural” setup parts as soon as I can
Overall, our approach here is to give you knobs on each part of the system that interacts with the environment/infrastructure, so that you can choose the simple approach (e.g. mem_io_manager here) when possible and use the more complicated one when necessary.
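One way to keep the "same workflow on my Mac as in the cloud" goal manageable is to bundle each environment's knobs (a mode plus its run config) under a name, which is the idea behind Dagster 0.11's PresetDefinition. The sketch below models that in plain Python; the preset names local and minio_k8s and the load_preset helper are hypothetical, and the run config values are the ones that appear in this thread (host.docker.internal is the endpoint the thread eventually settles on).

```python
import copy
from typing import Any, Dict, Tuple

# Solid config shared by every preset; values taken from the thread's run config.
SOLIDS: Dict[str, Any] = {
    "multiply_the_word": {"config": {"factor": 0}, "inputs": {"word": ""}},
}

# Hypothetical presets: same pipeline code, different environment knobs.
PRESETS: Dict[str, Dict[str, Any]] = {
    # Local demo: "test" mode keeps the in-memory IO manager, so no bucket
    # and no credentials are needed.
    "local": {"mode": "test", "run_config": {"solids": SOLIDS}},
    # Local Kubernetes + MinIO: "default" mode needs S3 resource config.
    "minio_k8s": {
        "mode": "default",
        "run_config": {
            "resources": {
                "io_manager": {"config": {"s3_bucket": "test-bucket"}},
                "s3": {
                    "config": {
                        "endpoint_url": "http://host.docker.internal:9000",
                        "region_name": "us-east-1",
                    }
                },
            },
            "solids": SOLIDS,
        },
    },
}


def load_preset(name: str) -> Tuple[str, Dict[str, Any]]:
    """Return (mode, run_config); the config is a copy so callers can edit it."""
    preset = PRESETS[name]
    return preset["mode"], copy.deepcopy(preset["run_config"])
```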
Matt Callaway
05/28/2021, 1:56 PM
pip installed dagit already), and my imagination sees where to go in moving that easy thing into a more complex “real” infrastructure. The difficulty is in finding examples that help me get there. Having a “cookbook” of examples would be really helpful.
johann
05/28/2021, 2:04 PM
> How to specify the difference between a local dev/demo instance and a “real” instance on AWS?
We do have a large set of knobs that have to be turned. The two main ones to consider are the instance (dagster.yaml), which controls system-wide settings, and presets/modes, which control individual pipelines. The defaults for the instance are good for local development, and if you’re using Helm, we set up a production dagster.yaml for you (it uses Postgres for storage, the Kubernetes run launcher, etc.). Pipelines need multiple presets and modes so that they have easy local execution (in-process or multiprocess executor, mem or fs storage), plus whatever other options you need for production.
johann
05/28/2021, 2:08 PM
> How to deploy changes to config and code in a k8s environment?
Sorry if this wasn’t what you were asking: helm upgrade is how you deploy new changes. When you’re working on Dagster pipeline code, you’ll want a deploy process that builds a new image (with a new tag) and runs `helm upgrade` with the new image tag. Dagit and other parts of the system don’t need to change when you update your pipeline code; you only need to change the image for your user-deployments.
Matt Callaway
05/28/2021, 2:18 PM
values.yaml to make changes:
env:
  AWS_ACCESS_KEY_ID: minioadmin
  AWS_SECRET_ACCESS_KEY: minioadmin
And ran the helm upgrade. I’m now switching back from “test” to “default” and launching a run. It shows a failure, but the logs include no errors.
Matt Callaway
05/28/2021, 2:19 PM
johann
05/28/2021, 2:24 PM
> It shows failure, but the logs include no errors.
This is really strange. Would it be possible for you to send the logs from that dagster-run-… job?
johann
05/28/2021, 2:26 PM
> So the dagster.yaml lives in the container or is that within the values.yaml?
In the Helm case, we generate it based on your values.yaml: https://github.com/dagster-io/dagster/blob/master/helm/dagster/templates/configmap-instance.yaml
Matt Callaway
05/28/2021, 2:30 PM
johann
05/28/2021, 2:36 PM
johann
05/28/2021, 2:39 PM
dagster debug export <run ID> output_file.gzip
(as an exec to the dagit pod). That will include the raw events from the database; if you share it with me, I can check whether we’re also missing the events.
johann
05/28/2021, 2:40 PM
kubectl exec and kubectl cp are useful here, lmk if you need any help)
Matt Callaway
05/28/2021, 2:42 PM
kubectl get pods shows me a set of 14 pods named dagster-run-…, and iterating over them with kubectl logs $POD, I see a few different sorts of errors. This one I was expecting:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
as I’m trying to supply creds for my new MinIO setup.
But then there’s also
2021-05-28 14:17:12 - dagster - ERROR - example_pipe - 06d59190-0d41-4523-b2aa-f03b86ac185e - 1 - PIPELINE_FAILURE - Execution of pipeline "example_pipe" failed. An exception was thrown during execution.
dagster.core.errors.DagsterResourceFunctionError: Error executing resource_fn on ResourceDefinition io_manager
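When many dagster-run-… pods fail, it can help to boil their logs down to the distinct exception types. A minimal sketch, assuming you have already collected each pod's kubectl logs output into a dict keyed by pod name; the function name and regex are illustrative, not part of any tool.

```python
import re
from typing import Dict, Set

# Dotted exception class names such as "botocore.exceptions.NoCredentialsError".
# Requires a prefix before "Error", so a bare "Error" or "ERROR" does not match.
EXCEPTION_RE = re.compile(r"\b([A-Za-z_][\w.]*(?:Error|Exception))\b")


def distinct_errors(pod_logs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each short exception name to the set of pods whose logs mention it."""
    found: Dict[str, Set[str]] = {}
    for pod, text in pod_logs.items():
        for match in EXCEPTION_RE.finditer(text):
            short_name = match.group(1).rsplit(".", 1)[-1]  # drop the module path
            found.setdefault(short_name, set()).add(pod)
    return found
```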
Matt Callaway
05/28/2021, 2:43 PM
{"__class__": "DagsterEvent", "event_specific_data": {"__class__": "PipelineFailureData", "error": {"__class__": "SerializableErrorInfo", "cause": {"__class__": "SerializableErrorInfo", "cause": null, "cls_name": "NoCredentialsError", "message": "botocore.exceptions.NoCredentialsError: Unable to locate credentials\n", "stack": [" File \"/usr/local/lib/python3.7/site-packages/dagster/core/errors.py\", line 184, in user_code_error_boundary\n yield\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 281, in single_resource_event_generator\n resource_or_gen = resource_def.resource_fn(context)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster_aws/s3/io_manager.py\", line 114, in s3_pickle_io_manager\n pickled_io_manager = PickledObjectS3IOManager(s3_bucket, s3_session, s3_prefix=s3_prefix)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster_aws/s3/io_manager.py\", line 17, in __init__\n self.s3.head_bucket(Bucket=self.bucket)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/client.py\", line 386, in _api_call\n return self._make_api_call(operation_name, kwargs)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/client.py\", line 692, in _make_api_call\n operation_model, request_dict, request_context)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/client.py\", line 711, in _make_request\n return self._endpoint.make_request(operation_model, request_dict)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/endpoint.py\", line 102, in make_request\n return self._send_request(request_dict, operation_model)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/endpoint.py\", line 132, in _send_request\n request = self.create_request(request_dict, operation_model)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/endpoint.py\", line 116, in create_request\n operation_name=operation_model.name)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/hooks.py\", line 356, in emit\n return self._emitter.emit(aliased_event_name, **kwargs)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/hooks.py\", line 228, in emit\n return self._emit(event_name, kwargs)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/hooks.py\", line 211, in _emit\n response = handler(**kwargs)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/signers.py\", line 90, in handler\n return self.sign(operation_name, request)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/signers.py\", line 162, in sign\n auth.add_auth(request)\n", " File \"/usr/local/lib/python3.7/site-packages/botocore/auth.py\", line 373, in add_auth\n raise NoCredentialsError()\n"]}, "cls_name": "DagsterResourceFunctionError", "message": "dagster.core.errors.DagsterResourceFunctionError: Error executing resource_fn on ResourceDefinition io_manager\n", "stack": [" File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py\", line 762, in pipeline_execution_iterator\n for event in pipeline_context.executor.execute(pipeline_context, execution_plan):\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/executor/in_process.py\", line 50, in execute\n output_capture=pipeline_context.output_capture,\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/api.py\", line 836, in __iter__\n yield from self.execution_context_manager.prepare_context()\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/utils/__init__.py\", line 430, in generate_setup_events\n obj = next(self.generator)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/context_creation_pipeline.py\", line 282, in execution_context_event_generator\n yield from resources_manager.generate_setup_events()\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/utils/__init__.py\", line 430, in generate_setup_events\n obj = next(self.generator)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 230, in resource_initialization_event_generator\n pipeline_def_for_backwards_compat=pipeline_def_for_backwards_compat,\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 182, in _core_resource_initialization_event_generator\n raise dagster_user_error\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 153, in _core_resource_initialization_event_generator\n for event in manager.generate_setup_events():\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/utils/__init__.py\", line 430, in generate_setup_events\n obj = next(self.generator)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 298, in single_resource_event_generator\n raise dagster_user_error\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/execution/resources_init.py\", line 292, in single_resource_event_generator\n \"Resource generator {name} must yield one item.\".format(name=resource_name)\n", " File \"/usr/local/lib/python3.7/contextlib.py\", line 130, in __exit__\n self.gen.throw(type, value, traceback)\n", " File \"/usr/local/lib/python3.7/site-packages/dagster/core/errors.py\", line 193, in user_code_error_boundary\n ) from e\n"]}}, "event_type_value": "PIPELINE_FAILURE", "logging_tags": {}, "message": "Execution of pipeline \"example_pipe\" failed. An exception was thrown during execution.", "pid": 1, "pipeline_name": "example_pipe", "solid_handle": null, "step_handle": null, "step_key": null, "step_kind_value": null}
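The serialized event above nests the real failure under "cause". A small sketch using only the standard json module walks that chain from the outermost error (DagsterResourceFunctionError) to the root cause (NoCredentialsError). It assumes the event shape shown in this dump; the function name is hypothetical.

```python
import json
from typing import List, Tuple


def error_chain(event_json: str) -> List[Tuple[str, str]]:
    """Return (cls_name, message) pairs, outermost error first, root cause last."""
    event = json.loads(event_json)
    error = event["event_specific_data"]["error"]
    chain = []
    while error is not None:
        chain.append((error["cls_name"], error["message"].strip()))
        error = error.get("cause")  # each SerializableErrorInfo may nest a cause
    return chain
```

Usage: `root_cls, root_msg = error_chain(raw_event_line)[-1]` picks out the underlying error without reading the whole stack.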
Matt Callaway
05/28/2021, 2:43 PM
johann
05/28/2021, 2:47 PM
dagster-run-06d59190-…
Matt Callaway
05/28/2021, 2:48 PM
johann
05/28/2021, 2:51 PM
Matt Callaway
05/28/2021, 2:52 PM
kc exec dagster-run-06d59190-0d41-4523-b2aa-f03b86ac185e-xhg7f -- bash
error: cannot exec into a container in a completed pod; current phase is Succeeded
Matt Callaway
05/28/2021, 2:54 PM
kc exec dagster-dagit-67cdffbdd8-zrnhj -- dagster debug export a457f2ba outfile.gzip
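The two-step export suggested earlier (run dagster debug export inside the long-lived dagit pod, then copy the file out with kubectl cp) can be scripted. This sketch only builds the command lines; the dagster namespace matches the one used in this thread, while the /tmp/outfile.gzip path inside the pod and the helper name are assumptions. Note that kubectl cp relies on tar being available in the container.

```python
from typing import List


def export_commands(dagit_pod: str, run_id: str, namespace: str = "dagster",
                    remote_path: str = "/tmp/outfile.gzip",
                    local_path: str = "outfile.gzip") -> List[List[str]]:
    """argv lists for exporting a run's events from the dagit pod and copying them out."""
    # Exec into the dagit pod: completed dagster-run-* pods can't be exec'd into.
    export = ["kubectl", "exec", "--namespace", namespace, dagit_pod, "--",
              "dagster", "debug", "export", run_id, remote_path]
    # kubectl cp addresses files in a pod as <namespace>/<pod>:<path>.
    copy_out = ["kubectl", "cp", f"{namespace}/{dagit_pod}:{remote_path}", local_path]
    return [export, copy_out]
```

Each list can be passed straight to `subprocess.run(cmd, check=True)`, which avoids shell quoting issues.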
Matt Callaway
05/28/2021, 2:55 PM
Matt Callaway
05/28/2021, 2:58 PM
resources:
  io_manager:
    config:
      s3_bucket: test-bucket
  s3:
    config:
      endpoint_url: http://localhost:9000
      profile_name: minio
      region_name: us-east-1
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ''
where the profile_name is probably not found.
Matt Callaway
05/28/2021, 2:59 PM
Matt Callaway
05/28/2021, 3:00 PM
Matt Callaway
05/28/2021, 3:04 PM
> kubectl get configmap dagster-dagster-user-deployments-k8s-dagster-lia-user-env -o json | jq '.data'
{
  "AWS_ACCESS_KEY_ID": "minioadmin",
  "AWS_SECRET_ACCESS_KEY": "minioadmin",
  "DAGSTER_HOME": "/opt/dagster/dagster_home",
  "DAGSTER_K8S_INSTANCE_CONFIG_MAP": "dagster-dagster-user-deployments-instance",
  "DAGSTER_K8S_PG_PASSWORD_SECRET": "dagster-postgresql-secret",
  "DAGSTER_K8S_PIPELINE_RUN_ENV_CONFIGMAP": "dagster-dagster-user-deployments-pipeline-env",
  "DAGSTER_K8S_PIPELINE_RUN_NAMESPACE": "dagster"
}
johann
05/28/2021, 3:07 PM
johann
05/28/2021, 3:09 PM
Matt Callaway
05/28/2021, 3:10 PM
kubectl logs dagster-dagit-67cdffbdd8-zrnhj doesn’t seem to have any “live updates”.
Matt Callaway
05/28/2021, 3:18 PM
johann
05/28/2021, 3:19 PM
> When I click the dagit button to go to raw logs, it just spins, loading…
This is an easy pitfall: the raw logs are stored by the computeLogManager (configured in values.yaml), and by default they aren’t accessible to Dagit. It needs to use S3/GCS/MinIO again for those logs.
Matt Callaway
05/28/2021, 3:20 PM
Matt Callaway
05/28/2021, 3:20 PM
johann
05/28/2021, 3:20 PM
Matt Callaway
05/28/2021, 3:21 PM
botocore.exceptions.NoCredentialsError: Unable to locate credentials
Matt Callaway
05/28/2021, 3:21 PM
Matt Callaway
05/28/2021, 3:25 PM
values.yml to provide the S3 credentials:
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-dagster-lia"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "-f"
        - "/example_project/example_repo/repo.py"
      port: 3030
      env:
        AWS_ACCESS_KEY_ID: minioadmin
        AWS_SECRET_ACCESS_KEY: minioadmin
johann
05/28/2021, 3:25 PM
johann
05/28/2021, 3:28 PM
johann
05/28/2021, 3:29 PM
extraManifests
johann
05/28/2021, 3:30 PM
Matt Callaway
05/28/2021, 3:32 PM
Matt Callaway
05/28/2021, 3:33 PM
env works:
> To enable Dagster to connect to S3, provide AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables via the env, envConfigMaps, or envSecrets fields under userDeployments in values.yaml
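Because the user-code deployment and the run pods can read their environment from different ConfigMaps (the user-env ConfigMap shown at 3:04 PM appears to name a separate DAGSTER_K8S_PIPELINE_RUN_ENV_CONFIGMAP for pipeline runs), it may be worth checking each ConfigMap for both variables the docs mention. A minimal sketch; the function names are hypothetical, and the input is the .data map or the raw output of kubectl get configmap NAME -o json.

```python
import json
from typing import Iterable, List, Mapping

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_env(env: Mapping[str, str], required: Iterable[str] = REQUIRED_AWS_VARS) -> List[str]:
    """Return the names in `required` that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]


def missing_from_configmap_json(configmap_json: str) -> List[str]:
    """Same check, applied to the JSON printed by `kubectl get configmap NAME -o json`."""
    data = json.loads(configmap_json).get("data") or {}
    return missing_env(data)
```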
Matt Callaway
05/28/2021, 3:37 PM
envConfigMaps:
  - name: config-map
as compared to the referenced k8s doc that has:
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  SPECIAL_LEVEL: very
  SPECIAL_TYPE: charm
So how should values.yaml look?
envConfigMaps:
  - name: config-map
    data:
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
??
johann
05/28/2021, 3:44 PM
Matt Callaway
05/28/2021, 3:45 PM
envConfigMaps entry look like?
Varun
05/28/2021, 3:50 PM
apiVersion: v1
kind: ConfigMap
metadata:
  name: special-config
  namespace: default
data:
  SPECIAL_LEVEL: very
  SPECIAL_TYPE: charm
and then specify its name in the envConfigMaps section of values.yaml like this.
envConfigMaps:
  - name: special-config
johann
05/28/2021, 3:53 PM
extraManifests of the values.yaml (so it gets created alongside the rest of the k8s resources), and then the second block goes in
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envConfigMaps:
        - name: special-config
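The key detail here is that the ConfigMap's metadata.name and the envConfigMaps entry must match. A sketch that builds both pieces from one manifest so they cannot drift apart; the function names are hypothetical, and aws-config-map and the dagster namespace are the names used later in this thread. JSON is valid YAML, so the printed output should generally be usable as a Helm values file, but treat that as an assumption to check against your Helm version. For real credentials, a Secret via envSecrets is the better fit.

```python
import json
from typing import Any, Dict


def configmap_manifest(name: str, namespace: str, data: Dict[str, str]) -> Dict[str, Any]:
    """A Kubernetes ConfigMap shaped like the special-config example above."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name, "namespace": namespace},
        "data": dict(data),
    }


def run_launcher_values(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """values.yaml fragment: create the ConfigMap and expose it to run pods."""
    name = manifest["metadata"]["name"]  # reuse the name so the two can't diverge
    return {
        "extraManifests": [manifest],
        "runLauncher": {
            "type": "K8sRunLauncher",
            "config": {"k8sRunLauncher": {"envConfigMaps": [{"name": name}]}},
        },
    }


cm = configmap_manifest(
    "aws-config-map", "dagster",
    {"AWS_ACCESS_KEY_ID": "minioadmin", "AWS_SECRET_ACCESS_KEY": "minioadmin"},
)
print(json.dumps(run_launcher_values(cm), indent=2))
```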
Matt Callaway
05/28/2021, 4:01 PM
values.yaml says that Config Maps are made from the env section:
# Additional environment variables to set.
# A Kubernetes ConfigMap will be created with these environment variables. See:
# https://kubernetes.io/docs/concepts/configuration/configmap/
#
# Example:
#
# env:
#   ENV_ONE: one
#   ENV_TWO: two
But then @johann says to use extraManifests, which I would guess looks like this:
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-dagster-lia"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "-f"
        - "/example_project/example_repo/repo.py"
      port: 3030
      envConfigMaps:
        - name: aws-config-map
extraManifests:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-config-map
      namespace: dagster
    data:
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
(Note that I’ve used a dagster namespace that I created with `kubectl`.) I think that’s right, given that --namespace dagster shows the right services:
> kubectl get services --namespace dagster
NAME                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
dagster-dagit                 ClusterIP   10.106.3.199     <none>        80/TCP     18h
dagster-postgresql            ClusterIP   10.100.82.214    <none>        5432/TCP   18h
dagster-postgresql-headless   ClusterIP   None             <none>        5432/TCP   18h
k8s-dagster-lia               ClusterIP   10.110.102.175   <none>        3030/TCP   18h
But then @johann suggests that I use runLauncher too? So then does that mean my example looks like this?
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-dagster-lia"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "-f"
        - "/example_project/example_repo/repo.py"
      port: 3030
      envConfigMaps:
        - name: aws-config-map
extraManifests:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-config-map
      namespace: dagster
    data:
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envConfigMaps:
        - name: aws-config-map
Matt Callaway
05/28/2021, 4:02 PM
secrets instead of Config Maps… but I’ll save that for later.)
Matt Callaway
05/28/2021, 4:12 PM
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://localhost:9000/test-bucket"
Matt Callaway
05/28/2021, 4:13 PM
Matt Callaway
05/28/2021, 4:13 PM
> aws --endpoint-url http://localhost:9000 s3 ls s3://test-bucket/
2021-05-28 09:12:01         29 date1.txt
johann
05/28/2021, 4:15 PM
Matt Callaway
05/28/2021, 4:16 PM
Matt Callaway
05/28/2021, 4:18 PM
http://host.docker.internal:9000
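Why this endpoint change helps: inside a pod, localhost refers to the pod itself, not to the Mac running MinIO, which is why the aws CLI on the Mac could list the bucket while the run pod got EndpointConnectionError. Docker Desktop provides the host.docker.internal name, which resolves to the host from inside containers. A small sketch of the rewrite, assuming a Docker Desktop based local cluster; the helper name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Hostnames that mean "this machine"; inside a pod that is the pod itself.
LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def endpoint_for_pod(url: str, host_alias: str = "host.docker.internal") -> str:
    """Point a loopback endpoint URL at the host machine instead.

    Assumes Docker Desktop, where `host.docker.internal` resolves to the host
    from inside containers. Non-loopback URLs are returned unchanged; any
    user:password@ part of a loopback URL is dropped.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOOPBACK_HOSTS:
        return url
    netloc = host_alias if parts.port is None else f"{host_alias}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(endpoint_for_pod("http://localhost:9000"))  # http://host.docker.internal:9000
```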
johann
05/28/2021, 4:18 PM
Matt Callaway
05/28/2021, 4:22 PM
resources:
  io_manager:
    config:
      s3_bucket: test-bucket
  s3:
    config:
      endpoint_url: http://host.docker.internal:9000
      region_name: us-east-1
solids:
  multiply_the_word:
    config:
      factor: 0
    inputs:
      word: ''
Matt Callaway
05/28/2021, 4:22 PM
values.yaml?
johann
05/28/2021, 4:23 PM
johann
05/28/2021, 4:24 PM
> should that also go into values.yaml?
You should have a separate file (it can be named values.yaml or otherwise) that stores your overrides of our default values. You specify your file when you run helm upgrade -f <file>.
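A simplified model of what helm upgrade -f <file> does with an overrides file: it layers your values over the chart's defaults. Nested maps merge key by key, while lists and scalars in your file replace the defaults wholesale. This sketch ignores Helm's extra rules (for example, a null override deleting a key); the function name and sample values are hypothetical.

```python
import copy
from typing import Any, Dict


def merge_values(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Layer `overrides` on top of `defaults` without mutating either input.

    Simplified model of Helm value coalescing: dicts merge recursively,
    everything else (lists included) is replaced by the override.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical chart defaults and an overrides file that bumps an image tag.
defaults = {
    "runLauncher": {"type": "K8sRunLauncher", "config": {}},
    "deployments": [{"name": "k8s-dagster-lia", "port": 3030, "image": {"tag": "latest"}}],
}
overrides = {"deployments": [{"name": "k8s-dagster-lia", "image": {"tag": "v2"}}]}

merged = merge_values(defaults, overrides)
# runLauncher survives untouched, but the deployments list was replaced, so
# "port" is gone: list entries must be written out in full in the overrides file.
```

This is why an overrides file only needs the keys you change, yet bumping an image tag for a new code release means repeating the whole deployments entry.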
Matt Callaway
05/28/2021, 4:26 PM
resources: section.
Matt Callaway
05/28/2021, 4:26 PM
dagit: section of values.yaml?
Matt Callaway
05/28/2021, 4:27 PM
dagster-user-deployments:
  enabled: true
  deployments:
    - name: "k8s-dagster-lia"
      image:
        repository: "docker.io/dagster/user-code-example"
        tag: latest
        pullPolicy: Always
      dagsterApiGrpcArgs:
        - "-f"
        - "/example_project/example_repo/repo.py"
      port: 3030
      envConfigMaps:
        - name: aws-config-map
extraManifests:
  - apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-config-map
      namespace: dagster
    data:
      AWS_ACCESS_KEY_ID: minioadmin
      AWS_SECRET_ACCESS_KEY: minioadmin
runLauncher:
  type: K8sRunLauncher
  config:
    k8sRunLauncher:
      envConfigMaps:
        - name: aws-config-map
dagit:
  resources:
    io_manager:
      config:
        s3_bucket: test-bucket
    s3:
      config:
        endpoint_url: http://host.docker.internal:9000
        region_name: us-east-1
  solids:
    multiply_the_word:
      config:
        factor: 0
      inputs:
        word: ''
Matt Callaway
05/28/2021, 4:27 PM
resources and solids inside the dagit section?
johann
05/28/2021, 4:32 PM
Matt Callaway
05/28/2021, 4:34 PM
Matt Callaway
05/28/2021, 5:50 PM