
How do you get the source data (text) from a book? To me it is the major roadblock for LLM-based commercial content consumption.


Old books are on Gutenberg, archive.org etc.

Physical ones, I scan. Cutting the spine is easiest. But today you can also just take pics with your phone.

Many retailers also sell EPUBs, which are just zipped (X)HTML files.
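For the EPUB route, here is a rough sketch of pulling the plain text out using only the Python standard library. It assumes the file is DRM-free and simply walks the (X)HTML files in archive order rather than reading the package's spine, so chapter order is not guaranteed for every book. The names `epub_to_text` and `_TextCollector` are made up for this example.

```python
import zipfile
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def epub_to_text(path_or_file):
    """Return the text of every (X)HTML file in the EPUB, in archive order.

    Assumption: archive order roughly matches reading order. A more careful
    version would follow the spine listed in the package's .opf file.
    """
    chunks = []
    with zipfile.ZipFile(path_or_file) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextCollector()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("\n".join(parser.parts))
    return "\n\n".join(chunks)
```

Calling `epub_to_text("book.epub")` (any local DRM-free file) gives you one string you can chunk and feed to whatever model you like.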

Obviously, that’s all for private consumption only. (Unless you’re OpenAI I guess. :-P)


Oh, you got serious! Salute to you from a lazy dad.



