05/19/2021, 8:12 PM
Document processing. Hello, I have a few questions about this use case. I have a bunch of documents I want to process, and they follow different templates. For example, I want to extract some metadata from two types of documents. First I will concentrate on the first type, then I might change my pipeline to handle both. I want my pipeline to execute only the minimum: documents that have already been extracted should be skipped, and the others should be extracted. How can I do that? How can I see whether an asset is already there? There may be other questions I have not thought of yet.
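To make the "already extracted, do nothing" behaviour concrete, here is a rough sketch in plain Python of what I mean. It is not tied to any framework API. The names `extract_metadata` and `process_documents`, and the convention of one JSON file per document, are assumptions made up for illustration.

```python
import json
from pathlib import Path
from typing import List


def extract_metadata(doc_path: Path) -> dict:
    # Hypothetical placeholder for the real, template-specific extraction.
    return {"name": doc_path.name, "size_bytes": doc_path.stat().st_size}


def process_documents(input_dir: Path, output_dir: Path) -> List[str]:
    """Extract metadata only for documents that have no output file yet."""
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for doc_path in sorted(input_dir.iterdir()):
        if not doc_path.is_file():
            continue
        # Assumed convention: one JSON output per input document.
        out_path = output_dir / f"{doc_path.name}.json"
        if out_path.exists():
            # Already extracted on a previous run: do nothing.
            continue
        out_path.write_text(json.dumps(extract_metadata(doc_path)))
        processed.append(doc_path.name)
    return processed
```

Calling `process_documents` a second time on the same folders would return an empty list, because every output file already exists. What I am looking for is the equivalent of this existence check inside the pipeline itself.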
Sometimes I also want to extract images from a document, so I wonder whether I should write a custom IO for this.