jonvet
07/06/2022, 11:09 AM
image tag inside the container_config of the graph. The job fails with
dagster.core.errors.DagsterInvariantViolationError: Could not find pipeline 'scale_model_training'. Found: .
  File "/home/ubuntu/pyenv/versions/3.9.8/lib/python3.9/site-packages/dagster/grpc/impl.py", line 82, in core_execute_run
    recon_pipeline.get_definition()
  File "/home/ubuntu/pyenv/versions/3.9.8/lib/python3.9/site-packages/dagster/core/definitions/reconstruct.py", line 180, in get_definition
    defn = self.repository.get_definition().get_pipeline(self.pipeline_name)
  File "/home/ubuntu/pyenv/versions/3.9.8/lib/python3.9/site-packages/dagster/core/definitions/repository_definition.py", line 1102, in get_pipeline
    return self._repository_data.get_pipeline(name)
  File "/home/ubuntu/pyenv/versions/3.9.8/lib/python3.9/site-packages/dagster/core/definitions/repository_definition.py", line 850, in get_pipeline
    return self._pipelines.get_definition(pipeline_name)
  File "/home/ubuntu/pyenv/versions/3.9.8/lib/python3.9/site-packages/dagster/core/definitions/repository_definition.py", line 155, in get_definition
    raise DagsterInvariantViolationError(
where scale_model_training is the name of my job. Any ideas what could be wrong?
executor_def=k8s_job_executor to the job definition and then specify which image to use in the run config
pod_spec_config with a toleration because I need to run it on a specific node pool. When I run the job, it creates the pod for the job on the correct node pool. However, the "step" job doesn't seem to keep those graph annotations, so the "step" job is scheduled on the default node pool.
tag argument in .to_job() but it doesn't seem to have an effect. The job is still scheduled on a default pod. Any idea what I can do?
from dagster import config_from_files, file_relative_path
from dagster_k8s import k8s_job_executor

# scale_model_training_graph is defined elsewhere in the module.
scale_model_training_job = scale_model_training_graph.to_job(
    name="scale_model_training",
    config=config_from_files(
        [
            file_relative_path(__file__, "scale_model_training.yaml"),
        ]
    ),
    executor_def=k8s_job_executor,
    tags={
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"memory": "10Gi"},
                    "limits": {"memory": "10Gi"},
                },
            },
            "pod_spec_config": {
                "tolerations": [
                    {"key": "nvidia.com/gpu", "operator": "Equal", "value": "present", "effect": "NoSchedule"}
                ],
            },
            "job_spec_config": {"ttl_seconds_after_finished": 3600},
        },
    },
)
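
A possible direction, not confirmed in this thread: with k8s_job_executor each step runs in its own pod, and tags set on the job may only reach the run pod. Below is a minimal sketch, assuming the dagster-k8s/config tag is also honored when set on individual ops, that builds the config once so the same dict can be attached in both places. The helper name k8s_config_tags and the op name train_model are hypothetical; the Dagster usage is shown only in comments and is not executed.

```python
def k8s_config_tags(memory="10Gi", toleration_key="nvidia.com/gpu"):
    """Hypothetical helper: build a tags dict keyed by "dagster-k8s/config"."""
    config = {
        "container_config": {
            "resources": {
                "requests": {"memory": memory},
                "limits": {"memory": memory},
            },
        },
        "pod_spec_config": {
            # Toleration copied from the job definition above.
            "tolerations": [
                {
                    "key": toleration_key,
                    "operator": "Equal",
                    "value": "present",
                    "effect": "NoSchedule",
                }
            ],
        },
    }
    return {"dagster-k8s/config": config}


GPU_TAGS = k8s_config_tags()

# Assumed usage (requires dagster; train_model is a hypothetical op name):
#
#   @op(tags=GPU_TAGS)
#   def train_model():
#       ...
#
#   scale_model_training_job = scale_model_training_graph.to_job(
#       name="scale_model_training",
#       executor_def=k8s_job_executor,
#       tags=GPU_TAGS,
#   )
```

Whether the step pods then land on the intended node pool would still need to be checked on the cluster, for example by inspecting the tolerations on the generated step pod spec.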