John Mav
06/17/2020, 10:08 PM

Andrey Alekseev
06/18/2020, 8:09 AM

Gaetan DELBART
06/18/2020, 12:30 PM
Some feedback on the dagster-k8s helm chart:
1. All the deployments use an init container with image: postgres:9.6.16. But in our production environments we use an external database running Postgres 11.6, so I had to change that manually to image: postgres:11.6. Could we use a variable for the tag of this image?
2. We don't use Celery at all in our production environments, and in the values.yaml file it is possible to disable Celery:

    ####################################################################################################
    # Celery
    ####################################################################################################
    celery:
      # The Celery workers can be deployed with a fixed image (no user code included)
      image:
        repository: ""
        tag: ""
        pullPolicy: Always
      enabled: true   # <- I've changed this to false

So I added an {{ if .Values.celery.enabled }} guard to deployment-celery.yaml and to deployment-celery-extras.yaml to prevent the Celery deployments from being created.
3. Since we don't use Celery, I also had to change configmap-instance.yaml, specifically the run_launcher section. The class: CeleryK8sRunLauncher entry is hardcoded and cannot be changed, so I replaced it manually with class: K8sRunLauncher and tweaked the config a bit to run pipelines directly as Kubernetes jobs. Maybe there could be a section in values.yaml letting the user choose which launcher to use?
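For reference, a hedged sketch of what such a configurable run_launcher block in configmap-instance.yaml might render to when Celery is disabled (the module/class names come from the discussion above; the commented config keys are illustrative and depend on the dagster version):

```yaml
run_launcher:
  module: dagster_k8s
  class: K8sRunLauncher   # instead of the hardcoded CeleryK8sRunLauncher
  # config: launcher-specific settings (job image, service account,
  # instance config map, ...) would go here; exact keys vary by version
```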
Finally, we use Traefik as our cluster router, and I've added some templates to the Helm release to add routes to the Dagit UI. Would you be interested in a PR implementing Traefik support in your Helm repository?

borgdrone7
06/19/2020, 11:35 AM

Cris
06/19/2020, 6:23 PM

Kevin
06/19/2020, 7:39 PM

user
06/19/2020, 11:36 PM

aj
06/19/2020, 11:43 PM

Ben Smith
06/20/2020, 10:19 PM
A path_to_object: my_object.yaml element in the config of my first solid adds "my_object" to resources, and it can be accessed via context.resources.my_object in all later solids.

sephi
06/21/2020, 7:57 AM
We have a bash_command_solid inside a composite_solid, and we need to set a dependency between the output of a solid and the bash_command_solid.
The pseudo code is as follows:
@composite_solid()
def func():
    path_to_file = save_file_solid()
    res = bash_command_solid(f"Rscript run_process.R {path_to_file}")
    return res
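As the error below suggests, the f-string in this composition function is evaluated at pipeline-construction time, when save_file_solid() has returned a dependency handle rather than a real path. A dependency-free sketch of the mechanism (the InvokedSolidOutputHandle class here is a stand-in for dagster's, not the real one):

```python
class InvokedSolidOutputHandle:
    """Stand-in for the handle dagster returns when a solid is
    'called' inside a composition function."""


def save_file_solid():
    # Inside a @composite_solid body, calling a solid does not execute it;
    # it only returns a handle used to wire up the DAG.
    return InvokedSolidOutputHandle()


path_to_file = save_file_solid()
cmd = f"Rscript run_process.R {path_to_file}"

# The handle's repr, not a runtime path, ends up in the command string,
# which is exactly what the shell error below is complaining about.
print(cmd)
```

The usual fix is to give the bash solid an explicit input for the path and pass save_file_solid()'s result as that argument, so the command string is built inside the solid body at execution time.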
The error we receive is as follows:
dagster.core.definitions.events.Failure: Bash command execution failed with output: /tmp/tmpxxxxx line 1: syntax error near unexpected token `newline'\n/tmp/tmpxxxxx: `Rscirpt run_process.R <dagster.core.definitions.composition.InvokedSolidOutputHandle object at 0x7f....>'\n", "label": "intentional-failure", "metadata_entries":[]}
From what we understand, the composite_solid is running within a pipeline and generating a temp path string output from the solid (without running the solid itself).
Running the bash command in a terminal works flawlessly.
In other composite_solids we are able to create dependencies, so I'm guessing that it is related to bash_command_solid.
What would be the correct approach for such a task?

Rafal
06/21/2020, 2:04 PM

wbonelli
06/21/2020, 5:21 PM
We occasionally get a sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread error from the Dask worker. This happens intermittently, including when the worker is configured to use just one thread. It doesn't cause the pipeline to fail; it just shows up in the worker's logs. I'm happy to PR a comment about this into the Dask executor docs (and maybe a recommendation to use the Postgres option?)

max
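The error above is easy to reproduce with the standard library alone: a sqlite3 connection created in one thread refuses to be used from another, because check_same_thread defaults to True. A minimal sketch, independent of Dask or dagster:

```python
import sqlite3
import threading

# A connection created in the main thread...
conn = sqlite3.connect(":memory:")

errors = []


def use_connection():
    # ...raises ProgrammingError when touched from a worker thread.
    try:
        conn.execute("SELECT 1")
    except sqlite3.ProgrammingError as exc:
        errors.append(str(exc))


t = threading.Thread(target=use_connection)
t.start()
t.join()
print(errors[0])
```

This is why a thread-based Dask worker may trip over SQLite-backed storage even with one configured work thread: the worker can still touch the connection from more than one OS thread. A client/server backend such as Postgres has no such restriction.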
06/22/2020, 11:56 AM

Ben Sully
06/22/2020, 3:56 PM
I'm trying to wrap a pipeline in reconstructable, but as soon as I do that I can't include it in a repository. The trimmed backtrace is below, although I think there's a bug there too; I think the actual root cause is that the repository decorator doesn't accept ReconstructablePipeline objects:
  File "/home/ben/repos/dataplatform-poc/pipelines/dataplatform/repository.py", line 6, in <module>
    @repository
  File "/home/ben/.pyenv/versions/3.7.5/envs/dataplatform-poc/lib/python3.7/site-packages/dagster/core/definitions/decorators/repository.py", line 225, in repository
    return _Repository()(name)
  File "/home/ben/.pyenv/versions/3.7.5/envs/dataplatform-poc/lib/python3.7/site-packages/dagster/core/definitions/decorators/repository.py", line 44, in __call__
    bad_definitions.append(i, type(definition))
TypeError: append() takes exactly one argument (2 given)
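The last frame also shows the secondary bug in dagster's error path: list.append takes a single argument, so the error-reporting code raises its own TypeError before the intended message is ever built. A minimal illustration in plain Python (the tuple fix is a guess at what the code meant, not dagster's actual patch):

```python
bad_definitions = []
msg = ""

try:
    # What the traceback shows dagster doing: two positional arguments.
    bad_definitions.append(0, str)
except TypeError as exc:
    # The message matches the traceback above.
    msg = str(exc)

# Presumably the intended call: append a single (index, type) tuple.
bad_definitions.append((0, str))
```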
Ben Sully
06/22/2020, 3:59 PM
I was just wrapping things in reconstructable until I got an error, so I think that needs documenting somewhere 🙂

matas
06/22/2020, 6:22 PM

matas
06/22/2020, 6:53 PM

sephi
06/23/2020, 8:22 AM
Working with dagster and spark, we are wondering what the optimal way is to use caching in a nested dagster pipeline.
Currently we are running spark (version 2.3) on YARN with a Cloudera distribution (we are running without a dagster storage config).
Our pipeline consists of composite_solids that have dependencies between them. The solids within the composite_solids process the data in various ways, including saving the data as intermediate steps.
We noticed that adding cache prevents some steps from being recalculated.
What is the best practice for including the cache in the solids?

Mathias Frey
06/23/2020, 9:05 AM

Leor
06/23/2020, 5:45 PM

Kevin
06/23/2020, 5:47 PM

Cris
06/24/2020, 3:55 PM

Tobias Macey
06/24/2020, 6:50 PM

Tobias Macey
06/25/2020, 1:59 PM

Cris
06/25/2020, 7:37 PM

John Helewa
06/26/2020, 12:47 AM

Shaun Ryan
06/27/2020, 10:00 PM

Cris
06/26/2020, 12:55 AM

user
06/26/2020, 1:15 AM

sashank
06/26/2020, 1:19 AM