Michiel Ghyselinck

03/30/2023, 6:54 AM
Hello, I created an op that writes objects to S3, and another op that retrieves these objects, processes them and writes them to a database. Both ops use the same static partition (a list of user names). Currently I have one job per op: I first run the job that writes to S3, and when that is finished I run the job that writes to the database. I have a gut feeling that this isn't good practice. Some other ways I thought of doing this:
• Should I return the S3 keys from the first op and feed them into the second op? That way I can put my two ops in one job.
• Or should I make assets instead of ops?
Looking for some advice/feedback. I read the docs about ops and assets, but I still don't feel able to make the right distinctions.


04/03/2023, 4:36 PM
Hi Michiel, it sounds like you should have your ops in the same job. You could also use the S3 IO manager to pass data between the ops. As for ops vs assets, it sounds like assets might work here as well, but the same advice about IO managers and grouping everything in the same job applies.