
It doesn't seem to be handling multi-word proper nouns as I thought it would:

"Pride and Prejudice is a good book." becomes "Pride/NNP and/CC Prejudice/NNP is/VBZ a/DT good/JJ book/NN ./." I would have thought "Pride and Prejudice" would be lumped together.



It's a little confusing, but this looks like a front end to Stanford's part-of-speech tagger. A POS tagger labels each token individually and does not group multi-word expressions. Grouping would be the role of a chunker or a parser.
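
To illustrate what a chunking step layered on top of the tagger could look like, here is a rough sketch that works directly on the word/TAG output quoted above. The function names (parse_tagged, chunk_proper_nouns) and the grouping rule are my own assumptions for illustration, not anything the Stanford tools provide: it merges runs of NNP tokens and lets a single CC sit between two NNPs.

```python
def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into (word, tag) pairs."""
    return [tuple(item.rsplit("/", 1)) for item in text.split()]


def chunk_proper_nouns(tagged):
    """Merge runs of NNP tokens, allowing one CC between two NNPs.

    Hypothetical rule for illustration only; a real chunker or parser
    uses far more context than this.
    """
    chunks = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag != "NNP":
            chunks.append((word, tag))
            i += 1
            continue
        # Start a proper-noun chunk and extend it as far as the rule allows.
        words = [word]
        i += 1
        while i < len(tagged):
            next_word, next_tag = tagged[i]
            if next_tag == "NNP":
                words.append(next_word)
                i += 1
            elif (next_tag == "CC" and i + 1 < len(tagged)
                  and tagged[i + 1][1] == "NNP"):
                # Absorb the conjunction together with the NNP after it.
                words.extend([next_word, tagged[i + 1][0]])
                i += 2
            else:
                break
        chunks.append((" ".join(words), "NNP"))
    return chunks


tagged = parse_tagged(
    "Pride/NNP and/CC Prejudice/NNP is/VBZ a/DT good/JJ book/NN ./."
)
print(chunk_proper_nouns(tagged))
```

On the example sentence this groups "Pride and Prejudice" into a single NNP chunk and leaves the other tokens alone. The same rule would also merge "Alice and Bob" into one chunk, which is usually wrong, and that is part of why this job is left to a proper chunker or parser rather than the tagger.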



