# deployment-kubernetes

Charlie Bini

05/11/2022, 3:14 PM
Is this the expected behavior on GKE when `max_retries` is > 0? When a job fails, it starts a new pod and resumes the previous job at the point of failure. However, I'm using the default IO manager (`fs_io_manager`, I believe), so the retry fails with `FileNotFoundError: [Errno 2] No such file or directory: '/opt/dagster/dagster_home/storage/030b6a20-fea9-481e-8ac2-c2fcca035741/query_data.get_count[storedgrades_q_0]/table'`. Is using the `gcs_pickle_io_manager` required for the retry feature?
If so, is there a way to set `gcs_pickle_io_manager` as the default for the deployment, or does it need to be added to every job?

daniel

05/11/2022, 3:19 PM
yeah, to be able to pick up where it left off, it needs to be able to pull the outputs of the steps that already ran, so either mounting a shared filesystem across pods or using an IO manager that actually persists results, like the GCS IO manager, is the way to go
For the second part: we're actually thinking right now about ways to set resources at the repo level, which would address this, but for now I think it would need to be configured on every job, yeah.

Charlie Bini

05/11/2022, 3:20 PM
gotcha, thanks for clearing that up
@marcos FYI ☝️