In government circles this process is called "Content Disarm and Reconstruction." But I can't stress this enough: if you don't trust the files, you shouldn't be processing them locally at all.
I built a similar tool that converts >300 formats to images in a remote sandbox. https://preview.ninja/
How do I safely read them from the remote box, assuming we really expect the box to get compromised?
If the box is compromised and I RDP in, I can get mashed by an RDP exploit. If I download the image or PDF, I can get mashed by an image or PDF exploit.
This is why I never trusted the Qubes image/PDF disarm workflow.
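For the download half of that concern, one way to shrink the attack surface is to never open a real image or PDF from the box at all, and instead accept only raw pixels in a format simple enough to validate by hand. Below is a minimal sketch of that idea, not any particular tool's implementation. The wire format (two big-endian uint32 values for width and height, followed by RGB bytes), the name `parse_raw_rgb`, and the `MAX_DIM` cap are all assumptions made up for illustration.

```python
import struct

# Hypothetical wire format (an assumption, not a real tool's protocol):
# 8-byte header = width, height as big-endian uint32, then width*height*3 RGB bytes.
HEADER = struct.Struct(">II")
MAX_DIM = 10_000  # arbitrary cap chosen for this sketch


def parse_raw_rgb(blob: bytes) -> tuple[int, int, bytes]:
    """Validate an untrusted raw-RGB blob and return (width, height, pixels)."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated header")
    width, height = HEADER.unpack_from(blob, 0)
    # Reject zero or absurd dimensions before doing any arithmetic with them.
    if not (0 < width <= MAX_DIM and 0 < height <= MAX_DIM):
        raise ValueError("dimensions out of range")
    pixels = blob[HEADER.size:]
    # The payload must be exactly as long as the header claims, no more, no less.
    if len(pixels) != width * height * 3:
        raise ValueError("pixel data length mismatch")
    return width, height, pixels
```

The only parsing the trusted side does here is one fixed-size header and a length check, so there is very little code for a malicious payload to target. It does nothing about the RDP or transport side of the problem, and whatever later displays the pixels still has to be trusted.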