Jenny Webster
08/08/2021, 10:18 PM

Peter B
08/09/2021, 2:30 AM

takan
08/09/2021, 4:56 AM
dagster pipeline launch -p hello_pipeline -w user-deployment/deploy_k8s/example_project/workspace.yaml -l example_repo2
and it worked in my terminal, but the run's result doesn't show up in the Dagit UI.
Isn't the dagster run command supposed to show the result in the UI? And if so, is there any way to do this with the CLI?
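One possibility, if the CLI and Dagit aren't sharing the same instance storage, is to submit the run through Dagit's GraphQL API instead, so the run is recorded against the instance Dagit reads from. A sketch assuming the dagster-graphql Python client available in 0.12-era releases; the host, port, and run_config here are illustrative:
```python
from dagster_graphql import DagsterGraphQLClient

# Point at the running Dagit service so the run lands in the instance
# that Dagit itself reads from (host/port are illustrative).
client = DagsterGraphQLClient("localhost", port_number=3000)

run_id = client.submit_pipeline_execution(
    pipeline_name="hello_pipeline",
    repository_location_name="example_repo2",  # matches the -l flag above
    run_config={},
    mode="default",
)
print(run_id)
```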

Devaraj Nadiger
08/09/2021, 9:07 AM

Daniel Salama
08/09/2021, 11:36 AM

Rubén Lopez Lozoya
08/09/2021, 3:20 PM
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.DEADLINE_EXCEEDED
details = "Deadline Exceeded"
debug_error_string = "{"created":"@1628474481.015524097","description":"Error received from peer ipv4:10.88.0.193:3030","file":"src/core/lib/surface/call.cc","file_line":1066,"grpc_message":"Deadline Exceeded","grpc_status":4}"
>
File "/usr/local/lib/python3.7/site-packages/dagster/scheduler/scheduler.py", line 212, in launch_scheduled_runs_for_schedule
debug_crash_flags,
File "/usr/local/lib/python3.7/site-packages/dagster/scheduler/scheduler.py", line 258, in _schedule_runs_at_time
scheduled_execution_time=schedule_time,
File "/usr/local/lib/python3.7/site-packages/dagster/core/host_representation/repository_location.py", line 687, in get_external_schedule_execution_data
scheduled_execution_time,
File "/usr/local/lib/python3.7/site-packages/dagster/api/snapshot_schedule.py", line 55, in sync_get_external_schedule_execution_data_grpc
else None,
File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 272, in external_schedule_execution
external_schedule_execution_args
File "/usr/local/lib/python3.7/site-packages/dagster/grpc/client.py", line 97, in _streaming_query
yield from response_stream
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
return self._next()
File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
raise self
I can manually run the backfill over this partition set though 😞

Tadas Barzdžius
08/09/2021, 4:02 PM

chrispc
08/09/2021, 4:38 PM

chrispc
08/09/2021, 5:09 PM

chrispc
08/09/2021, 5:20 PM

William Reed
08/09/2021, 7:35 PM

jeremy
08/10/2021, 1:28 AM
In the Playground tab, I get an error saying that celery-docker is not a valid config entry under the execution path. How can I configure this executor? Am I missing a particular step?
I have the dagster_celery_docker requirement installed in the environment that runs Dagit with the DefaultRunLauncher.
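In case it helps: the Playground only accepts executors that are declared on the pipeline's mode, so one likely cause is that celery_docker_executor isn't in the mode's executor_defs. A minimal sketch, assuming 0.12-era APIs (the pipeline name is illustrative):
```python
from dagster import ModeDefinition, default_executors, pipeline
from dagster_celery_docker import celery_docker_executor

# Adding celery_docker_executor to the mode is what makes "celery-docker"
# a valid entry under `execution:` in the Playground config.
@pipeline(
    mode_defs=[
        ModeDefinition(executor_defs=default_executors + [celery_docker_executor])
    ]
)
def my_pipeline():
    ...
```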

Pradithya Aria Pura
08/10/2021, 4:53 AM

takan
08/10/2021, 6:19 AM
dagster.core.errors.DagsterInvalidInvocationError: Compute function of solid 'test' has context argument, but no context was provided when invoking.
when I
dagster pipeline execute -f sample/repo.py -p source_graph
and my repo.py is
from dagster import op, graph, repository
from dagster_shell import create_shell_command_solid

@op
def source_op():
    command = "echo 'hi all'"
    a = create_shell_command_solid(command, 'test')
    a()

@graph
def source_graph():
    source_op()

@repository
def repo():
    return [source_graph]
I'm not sure what I'm doing wrong here. Could someone help me out on this?
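The error suggests the solid returned by create_shell_command_solid is being invoked directly inside the @op body, where no context exists. A hedged sketch of one possible restructuring, assuming 0.12-era dagster/dagster_shell APIs: build the shell solid once at module scope and compose it inside the graph:
```python
from dagster import graph, repository
from dagster_shell import create_shell_command_solid

# Build the shell solid once at module scope...
echo_hi = create_shell_command_solid("echo 'hi all'", name="test")

@graph
def source_graph():
    # ...and invoke it during graph composition, where dagster supplies
    # the context, instead of calling it inside an @op body.
    echo_hi()

@repository
def repo():
    return [source_graph]
```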

Peter B
08/10/2021, 6:54 AM

Kenneth Cruz
08/10/2021, 7:16 AM

George Pearse
08/10/2021, 8:33 AM

Suraj Narwade
08/10/2021, 9:48 AM

Daniel Salama
08/10/2021, 11:47 AM
@solid
def time_test():
    time.sleep(1)
it's running infinitely (supposed to finish executing after 1 second)
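For comparison, a self-contained version of this solid that completes in about a second when executed in-process, assuming 0.12-era APIs (the pipeline name is illustrative); if this also hangs, the problem likely lies with the executor or run launcher rather than the solid:
```python
import time

from dagster import execute_pipeline, pipeline, solid

@solid
def time_test(context):
    time.sleep(1)
    context.log.info("finished sleeping")

@pipeline
def time_pipeline():
    time_test()

if __name__ == "__main__":
    # Runs in-process and should finish in roughly one second.
    execute_pipeline(time_pipeline)
```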

Vladislav Ladenkov
08/10/2021, 2:19 PM

Brian Abelson
08/10/2021, 3:41 PM
With the QueuedRunCoordinator, this can, over time, lead to no new jobs launching if there are enough of these zombie jobs. Is there a way to configure, on a per-pipeline basis, auto-cancelling jobs once they've run for a certain time period?
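As far as I know there is no built-in per-pipeline timeout here; a rough workaround sketch, not a built-in feature, assuming 0.12-era instance APIs (MAX_SECONDS is illustrative and could instead come from a per-pipeline run tag):
```python
import time

from dagster import DagsterInstance
from dagster.core.storage.pipeline_run import PipelineRunsFilter, PipelineRunStatus

MAX_SECONDS = 4 * 60 * 60  # illustrative cutoff; could be read from a run tag

def cancel_stale_runs():
    instance = DagsterInstance.get()
    started = instance.get_runs(
        filters=PipelineRunsFilter(statuses=[PipelineRunStatus.STARTED])
    )
    now = time.time()
    for run in started:
        stats = instance.get_run_stats(run.run_id)
        if stats.start_time and now - stats.start_time > MAX_SECONDS:
            # Terminate via the configured run launcher, if it supports it.
            if instance.run_launcher.can_terminate(run.run_id):
                instance.run_launcher.terminate(run.run_id)

if __name__ == "__main__":
    cancel_stale_runs()
```
This could run on a cron schedule or as a Dagster schedule of its own.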

Wes Roach
08/10/2021, 6:43 PM
When I execute a pipeline I get a result object, and I'm able to view the output with result.output_for_solid("solid_name").show(), etc. Is there a way to access a similar interface for a previous run? For instance, I run the same pipeline via Dagit and it writes the intermediate and final DataFrames to their respective locations. I want to load the parameters of the pipeline by run_id, as an example parameter, and access the intermediate and final DataFrames (stored in Parquet fwiw) via a similar result.output_for_solid interface - is that possible?
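There is no result-style handle for historical runs that I know of in this era, but if the IO manager writes Parquet to a deterministic, run-scoped path, the outputs can be reloaded outside Dagster. A sketch under that assumption; the path scheme and helper name below are purely illustrative:
```python
import pandas as pd

def load_solid_output(base_path: str, run_id: str, solid_name: str) -> pd.DataFrame:
    # Mirror whatever convention the IO manager used when writing;
    # this layout is an assumption, not a dagster-defined scheme.
    return pd.read_parquet(f"{base_path}/{run_id}/{solid_name}/result.parquet")

df = load_solid_output("/dagster/storage", "<run_id>", "solid_name")
```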

Martim Passos
08/10/2021, 6:45 PM
I see dagster_pandas raises ColumnConstraintViolationException, but is there a way to use this with builtin types? Like printing out which column failed the test and the offending rows, for instance.
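For the dagster_pandas route, column constraints raise ColumnConstraintViolationException with the failing column and the offending rows in the message; whether the same reporting can be attached to the builtin types I can't say. A sketch of the constraint-based DataFrame type, with illustrative column names:
```python
from dagster_pandas import PandasColumn, create_dagster_pandas_dataframe_type

# Constraint failures on these columns raise ColumnConstraintViolationException,
# whose message names the failing column and the offending rows.
TripDataFrame = create_dagster_pandas_dataframe_type(
    name="TripDataFrame",
    columns=[
        PandasColumn.integer_column("bike_id", min_value=0),
        PandasColumn.datetime_column("start_time"),
    ],
)
```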

jeremy
08/10/2021, 8:00 PM
What determines the Snapshot ID of a pipeline run? Is it generated from a hash of the pipeline file? And further, is it possible to retrieve and run pipelines of a certain snapshot?

Utkarsh
08/11/2021, 5:23 AM

Kenneth Cruz
08/11/2021, 6:34 AM
Is there a way to get the name of the currently running solid from the context argument? This is to check which solid is currently invoked in a pipeline that uses the same solid more than once under different aliases.
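A minimal sketch, assuming 0.12-era context attributes: inside a solid body, context.solid.name gives the solid's name as it appears in the pipeline, which is the alias when the same definition is used more than once:
```python
from dagster import pipeline, solid

@solid
def which_am_i(context):
    # context.solid.name is the aliased name ("first" or "second" below),
    # while context.solid_def.name is the underlying definition's name.
    context.log.info(f"running as: {context.solid.name}")

@pipeline
def aliased_pipeline():
    which_am_i.alias("first")()
    which_am_i.alias("second")()
```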

Pradithya Aria Pura
08/11/2021, 7:54 AM
An unexpected exception was thrown. Please file an issue.
google.api_core.exceptions.Forbidden: 403 POST <https://storage.googleapis.com/upload/storage/v1/b/my-bucket/o?uploadType=multipart>: {
  "error": {
    "code": 403,
    "message": "Insufficient Permission",
    "errors": [
      {
        "message": "Insufficient Permission",
        "domain": "global",
        "reason": "insufficientPermissions"
      }
    ]
  }
}
: ('Request failed with status code', 403, 'Expected one of', <HTTPStatus.OK: 200>)
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_plan.py", line 193, in _dagster_event_sequence_for_step
for step_event in check.generator(step_events):
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 326, in core_dagster_event_sequence_for_step
for evt in _type_check_and_store_output(step_context, user_event, input_lineage):
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 380, in _type_check_and_store_output
for evt in _store_output(step_context, step_output_handle, output, input_lineage):
File "/usr/local/lib/python3.7/site-packages/dagster/core/execution/plan/execute_step.py", line 488, in _store_output
handle_output_res = output_manager.handle_output(output_context, output.value)
File "/usr/local/lib/python3.7/site-packages/dagster_gcp/gcs/io_manager.py", line 65, in handle_output
retry_on=(TooManyRequests, Forbidden),
File "/usr/local/lib/python3.7/site-packages/dagster/utils/backoff.py", line 64, in backoff
raise to_raise
File "/usr/local/lib/python3.7/site-packages/dagster/utils/backoff.py", line 58, in backoff
return fn(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2858, in upload_from_string
retry=retry,
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 2589, in upload_from_file
_raise_from_invalid_response(exc)
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 4355, in _raise_from_invalid_response
raise exceptions.from_http_status(response.status_code, message, response=response)
The following is the run configuration:
execution:
  k8s:
    config:
      env_config_maps:
        - dagster-pipeline-env
      image_pull_policy: Always
resources:
  io_manager:
    config:
      gcs_bucket: my-bucket
      gcs_prefix: dagster
solids:
  multiply_the_word:
    config:
      factor: 2
    inputs:
      word: test
The pipeline can be executed successfully by simply retrying, but the same issue randomly reappears. Any idea what went wrong?

Milos Tomic
08/11/2021, 11:17 AM
I get dagster not found when I try to do a multi-stage Docker build using python3.7-slim as the build image and Google's gcr.io/distroless/python3:debug as the runtime image.

Vlad Dumitrascu
08/11/2021, 5:40 PM
Can I run two pipelines at the same time?
I have a pipeline that used to take 30 minutes that now takes 4 hours, and I need to run the other pipelines I made because I depend on them now. Can I just start another one concurrently, or do I have to wait for the first to finish?
Full disclosure, I'm using Dagit to start the pipeline. Can I just start another at the same time and track them both in the "Runs" tab?
I don't want to lose 2-3 hours due to a crash or something.

Julian Knight
08/11/2021, 9:04 PM
I'm considering writing ComputeLogManager and EventLogStorage implementations which would delegate to the native ones but also delete data for old runs periodically, but before I do this I want to see if a solution exists already.
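No built-in retention exists here as far as I know; before writing delegating implementations, a lighter-weight sketch is a periodic job that deletes old runs through the instance, which removes their event logs as well. Assumes 0.12-era DagsterInstance methods; RETENTION_DAYS is illustrative, and compute logs would still need separate cleanup:
```python
from datetime import datetime, timedelta, timezone

from dagster import DagsterInstance

RETENTION_DAYS = 30  # illustrative retention window

def prune_old_runs():
    instance = DagsterInstance.get()
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    for run in instance.get_runs():
        stats = instance.get_run_stats(run.run_id)
        if stats.end_time and datetime.fromtimestamp(stats.end_time, timezone.utc) < cutoff:
            # delete_run removes the run record and its event log entries;
            # compute log files are managed separately.
            instance.delete_run(run.run_id)

if __name__ == "__main__":
    prune_old_runs()
```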