https://github.com/coolwanglu/pdf2htmlEX
A demo is available at http://coolwanglu.github.com/pdf2htmlEX/demo/demo.htm
Features:
- Precise rendering
- Formatting (fonts, positions) is preserved, avoiding images wherever possible
- Single HTML file output: no companion files
Background:
Recently I needed to convert LaTeX/PDF into HTML. I don't know which tools you are using, but I didn't find one that satisfied me.
I've tried pdftohtml from poppler, which drops formatting, and latex2html, which didn't work at all for a simple TeX file. I've also tried PDFMate, which was not good either.
So I decided to write one myself, which is now pdf2htmlEX.
It's not only for PDFs generated by LaTeX, but also for PDFs in general.
So far I've focused on font manipulation, so that text can be rendered precisely. Other objects (figures, images, and some font types) are rendered as images, and will be supported "natively" in the future.
I hope it will be useful to some of you, and I would appreciate any suggestions or bug reports.
Update:
Added two more demo pages:
http://coolwanglu.github.com/pdf2htmlEX/demo/cheat.html
http://coolwanglu.github.com/pdf2htmlEX ... eneve.html
- Completely removed the Boost dependency
- Relaxed the C++11 requirement; GCC 4.4.6 and later are now supported
- Links are now supported (in-document jumps are accurate to the page)
- Fixed an encoding problem for some fonts