
spaCy is a decent suggestion here. It has pretty good support for writing rule-based tagging.

All of this does seem excessive just to choose a book genre, though. I would imagine the number of books left after a simple clustering pass would be small enough to flip through, so I really don't understand the use case.

If you have very few books (a few thousand), then you can apply more fine-grained analyses, such as contextualized embedding methods, with a reasonable amount of computation. But if the point is to select a book, there's no real benefit, since simple term-frequency methods that run in a couple of seconds would already narrow the choices down to only a few books.
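To make the "simple term-frequency method" concrete, here is a rough sketch of what I have in mind: a plain TF-IDF index with cosine scoring, standard library only. The function names, the smoothed IDF formula, and the three toy "books" are my own assumptions for illustration, not anything from the original question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic runs; crude, but enough for a sketch.
    return re.findall(r"[a-z']+", text.lower())


def build_index(docs):
    """docs maps title -> text. Returns (unit-length TF-IDF vectors, idf table)."""
    term_counts = {title: Counter(tokenize(text)) for title, text in docs.items()}
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())
    n_docs = len(docs)
    # Smoothed IDF (an assumption; other variants work too).
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {}
    for title, counts in term_counts.items():
        vec = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[title] = {t: w / norm for t, w in vec.items()}
    return vectors, idf


def search(query, vectors, idf, top_k=3):
    """Rank titles by cosine similarity to the query; drop zero-score titles."""
    counts = Counter(tokenize(query))
    q = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in q.values())) or 1.0
    scores = {
        title: sum((w / norm) * vec.get(t, 0.0) for t, w in q.items())
        for title, vec in vectors.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(title, score) for title, score in ranked[:top_k] if score > 0]


# Hypothetical toy data standing in for real book texts.
books = {
    "Fantasy A": "the wizard rode a dragon over the castle",
    "Space B": "the starship crew explored a distant planet",
    "Crime C": "the detective solved the murder in the city",
}
vectors, idf = build_index(books)
results = search("dragon wizard", vectors, idf)
```

On real book-length texts you would want stop-word handling and probably sublinear term-frequency scaling, but the shape of the approach is the same.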

If you have billions of books, contextualized embeddings become quite expensive to produce and use (several weeks or months of processing, petabytes of storage, etc.), so it's not really feasible as an individual. But the extra querying capability does help narrow the large set down.
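For a sense of scale on the storage side, a back-of-the-envelope estimate. Every number here is an assumption of mine: 100,000 tokens per book, one 768-dimensional float32 vector per 250-token chunk, no compression, decimal units.

```python
def embedding_storage_bytes(num_books, tokens_per_book=100_000,
                            tokens_per_vector=250, dims=768, bytes_per_value=4):
    # All defaults are illustrative assumptions, not measured values.
    vectors_per_book = tokens_per_book // tokens_per_vector
    return num_books * vectors_per_book * dims * bytes_per_value


few_thousand = embedding_storage_bytes(5_000)          # a personal library
billion = embedding_storage_bytes(1_000_000_000)       # "billions of books" scale

print(few_thousand / 1e9, "GB")   # gigabytes for a few thousand books
print(billion / 1e15, "PB")       # petabytes for a billion books
```

Under those assumptions a few thousand books land in the single-digit gigabyte range, while a billion books are already past a petabyte. Storing one vector per token instead of per chunk multiplies both figures by 250.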


