
See my comment below for some of the reasons I've had trouble testing against the commonly used segmentation corpora. I completely agree it would be great if there were a free (as in both speech and beer) common training set. One key requirement would be that this common training set provides either the exact text that should be run through the segmenter or exact instructions on how to produce that text (see the issue I mentioned below about the ambiguity in how to actually test against the Brown corpus).

