
Congratulations on the acquisition!

I'm assuming that puts an end to my dreams of Wit.ai supporting device-local voice recognition with context-limited vocabularies to drive user interfaces though...



... device-local voice recognition with context-limited vocabularies to drive user interfaces

That would be really interesting to me as well. Do you know of any other project or startup working on that?

Similar to the OpenCV library for computer vision, I wish there were an "OpenVC" for voice control.


I found pocketsphinx pretty easy to work with. I use it for voice command recognition for home automation stuff. It's even pretty accurate when using limited-vocabulary models, after some tweaking. There are Python bindings, though they lag slightly behind the C API, and even the C API is well commented and the code is clean.
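
For a taste, a one-shot limited-vocabulary decode with the Python bindings looks roughly like this (just a sketch; the model path, dictionary, grammar, and audio file names are all placeholders):

    # A minimal sketch of limited-vocabulary decoding with the pocketsphinx
    # Python bindings; every file path here is a placeholder.
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'model/en-us')      # acoustic model directory
    config.set_string('-dict', 'commands.dict')   # small, task-specific dictionary
    config.set_string('-jsgf', 'commands.jsgf')   # fixed command grammar
    decoder = Decoder(config)

    decoder.start_utt()
    with open('utterance.raw', 'rb') as f:        # 16 kHz, 16-bit mono PCM
        decoder.process_raw(f.read(), False, True)
    decoder.end_utt()

    if decoder.hyp() is not None:
        print(decoder.hyp().hypstr)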


Do you have a write-up/blog post or article about pocketsphinx? How do you create/train the vocabulary models? If you use Text-to-speech too (CMU Flite, eSpeak), maybe you know some good resources there as well?


I haven't written anything up yet. To summarize, I use the standard acoustic model (hub4wsj_sc_8k) with a combination of keyword activation and a fixed grammar (in JSGF format). It's normally listening for a wakeup keyword, and when it finds one, it switches into grammar mode until it hears a complete utterance, or times out, then switches back to keyword mode. It works pretty well, though tuning the keyword sensitivity is annoying.

The pocketsphinx-specific code is actually quite simple:

https://bitbucket.org/davidn/dom/src/default/listen/listen.c

You can see the keyword and JSGF files in that directory, for reference. The pronunciation dictionary is generated from one of the standard dictionaries, selecting just the words present in the grammar.
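
That step can be as simple as filtering one of the standard cmudict files down to the grammar's word list (a sketch; the file names and word set here are illustrative):

    # Sketch: build a small pronunciation dictionary by keeping only the
    # cmudict entries for words that appear in the grammar. File names and
    # the word set are illustrative.
    words = {'lights', 'on', 'off', 'kitchen'}
    with open('cmudict-en-us.dict') as full, open('commands.dict', 'w') as out:
        for line in full:
            head = line.split()[0]
            # drop alternate-pronunciation suffixes like "word(2)"
            if head.split('(')[0].lower() in words:
                out.write(line)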

Note: If you look in the whole directory, there's a mix of decent code and ugly hacks in there, and I didn't make any attempt at making it customizable. It's just for me. Btw, it all runs on a Raspberry Pi.
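
If it helps, the overall shape of the keyword/grammar switch is roughly this (a sketch against the pocketsphinx Python bindings, not the linked C code; the wake phrase and handle_command() are made up, and the timeout is left out for brevity):

    # A sketch of the keyword -> grammar switch described above, using the
    # pocketsphinx Python bindings rather than the linked C code. The wake
    # phrase and handle_command() are illustrative; the timeout is omitted.
    import sys
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', 'hub4wsj_sc_8k')     # acoustic model named above
    config.set_string('-dict', 'commands.dict')    # dictionary filtered to the grammar
    decoder = Decoder(config)

    # Register both searches once, then flip between them by name.
    decoder.set_keyphrase('wakeup', 'hey house')   # sensitivity: tune -kws_threshold
    decoder.set_jsgf_file('commands', 'commands.jsgf')
    decoder.set_search('wakeup')

    def handle_command(text):
        print('command:', text)                    # stand-in for real dispatch

    in_speech = False
    decoder.start_utt()
    while True:
        buf = sys.stdin.buffer.read(1024)          # 16 kHz, 16-bit mono PCM stream
        if not buf:
            break
        decoder.process_raw(buf, False, False)

        if decoder.get_search() == 'wakeup':
            if decoder.hyp() is not None:          # wake word spotted
                decoder.end_utt()
                decoder.set_search('commands')     # switch into grammar mode
                decoder.start_utt()
        else:
            if decoder.get_in_speech():
                in_speech = True
            elif in_speech:                        # speech just ended: utterance done
                decoder.end_utt()
                if decoder.hyp() is not None:
                    handle_command(decoder.hyp().hypstr)
                decoder.set_search('wakeup')       # back to keyword mode
                decoder.start_utt()
                in_speech = False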


Thanks!

Btw, their probabilistic parser (http://goo.gl/rRdRx4) might be useful for your project: "wake me up the first Friday of February 2014 at 7am"
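
Locally, the date part of that example can be resolved with python-dateutil (just a sketch of the idea, not Wit.ai's parser):

    # Not Wit.ai's parser; a local sketch of resolving "the first Friday of
    # February 2014 at 7am" with python-dateutil.
    from datetime import datetime
    from dateutil.relativedelta import relativedelta, FR

    wake_at = datetime(2014, 2, 1, 7, 0) + relativedelta(weekday=FR(1))
    print(wake_at)  # 2014-02-07 07:00:00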

Great to hear that the Raspberry Pi (and Python) is fast enough for it.


There's OpenEars, which is free: http://www.politepix.com/openears/.

Basically, you can define vocabulary sets to check against, and you provide a callback to know when a word has been detected. Depending on the complexity of the vocabularies, you might need to spring for the Rejecto plugin to improve the results.


Check out mindmeld.com (built by Expect Labs). It is a platform for building voice-driven interfaces with context-limited vocabularies.


Check out Robin Labs.



