Does anyone use the fs_io_manager (for speed) and then move everything to S3 afterwards for traceability?
04/25/2022, 3:52 PM
Would you want to do this within the same run? Or are you imagining that in some scenarios you want to write to S3 and in others to the local filesystem?
04/25/2022, 4:00 PM
Not quite sure. I had been using the S3 IO manager without questioning it, but I'm currently debugging a pipeline and realising that the uploads and downloads take a lot of time. Wondering if there is a way to do both.
I can imagine a pipeline with a final op that uploads the related fs files to S3, which I can switch on and off via the job config.
04/25/2022, 4:01 PM
Yup, that's what I was imagining as well. You could make the output of the last op optional, so it only fires if you "switch it on" via config.
You could also create a schedule/sensor that bulk uploads to S3 on some cadence.
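For what it's worth, the upload step itself could be sketched roughly like this. It is plain Python rather than Dagster code, and `upload_local_outputs`, `upload`, `key_prefix` and `enabled` are all made-up names for illustration. The assumption is that the files written during the run live under one local directory and that you pass in whatever function actually does the upload.

```python
from pathlib import Path
from typing import Callable, List


def upload_local_outputs(
    storage_dir: str,
    upload: Callable[[str, str], None],
    key_prefix: str = "",
    enabled: bool = True,
) -> List[str]:
    """Upload every file under storage_dir and return the keys that were sent.

    `upload` is a hypothetical callable taking (local_path, key); `enabled`
    stands in for the on/off switch that would come from job config.
    """
    if not enabled:
        # Switched off: do nothing, so debugging runs stay fast.
        return []

    root = Path(storage_dir)
    uploaded: List[str] = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the directory layout in the key so objects stay traceable.
            key = key_prefix + path.relative_to(root).as_posix()
            upload(str(path), key)
            uploaded.append(key)
    return uploaded
```

Inside a final op, `enabled` would presumably be read from the op config, and `upload` could wrap something like boto3's `upload_file` on an S3 client. The same function could be called from a schedule or sensor for the bulk-upload variant.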
04/25/2022, 4:22 PM
It looks like the filesystem IO manager might actually be slower? That seems bizarre.
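One way to check whether local disk is really the slow part is to time a pickle round trip outside the pipeline. This is only a sketch: `time_pickle_round_trip` is a made-up helper, and it assumes the outputs are pickled to local files, so it measures serialization plus disk I/O and none of the framework overhead.

```python
import pickle
import tempfile
import time
from pathlib import Path
from typing import Any, Tuple


def time_pickle_round_trip(obj: Any, directory: str) -> Tuple[float, float, Any]:
    """Pickle obj to a file in directory, read it back, and time both steps."""
    target = Path(directory) / "output.pkl"

    start = time.perf_counter()
    with target.open("wb") as f:
        pickle.dump(obj, f)
    write_seconds = time.perf_counter() - start

    start = time.perf_counter()
    with target.open("rb") as f:
        loaded = pickle.load(f)
    read_seconds = time.perf_counter() - start

    return write_seconds, read_seconds, loaded


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Replace this list with an object that resembles a real op output.
        w, r, _ = time_pickle_round_trip(list(range(1_000_000)), tmp)
        print(f"write: {w:.3f}s, read: {r:.3f}s")
```

If these numbers are small compared with what the run reports, the extra time is probably coming from somewhere other than the disk writes.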