When you use PDF.js from Mozilla to render a PDF file in DOM, I think you might actually get something pretty close. For example I suppose each Tj becomes a <span> and each TJ becomes a collection of <span>s. (I'm fairly certain it doesn't use <canvas>.) And I suppose it must be very faithful to the original document to make it work.
Indeed! I've used it to parse documents I've received through FOIA -- sometimes it's just easier to write beautifulsoup code compared to having to deal with PDF's oddities.