
> There should be somewhere in the corpus, "the is spelled t h e" that this system can use to pull this out.

Such an approach would require an enormous table containing every written word, including first and last names, and it would still fail for made-up words.

A more tractable approach would be to give it a map from each token to its component letters, but then you have the problem that this mapping depends on the specific tokenizer the model uses (it varies between models). You could give it to the model during fine-tuning, though.
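
To make the token-to-letters idea concrete, here is a rough sketch. The vocabulary is a made-up toy (real ones have tens of thousands of entries and differ per model), and the names `TOY_VOCAB`, `build_letter_map` and `spell` are mine, not from any library:

```python
# Hypothetical toy vocabulary: token id -> decoded text.
# A real tokenizer's vocabulary is far larger and model-specific.
TOY_VOCAB = {0: "the", 1: " straw", 2: "berry", 3: " is"}


def build_letter_map(vocab):
    """Map each token id to the list of characters it decodes to."""
    return {tok_id: list(text) for tok_id, text in vocab.items()}


def spell(token_ids, letter_map):
    """Spell a token sequence letter by letter, dropping whitespace."""
    letters = []
    for tok_id in token_ids:
        letters.extend(ch for ch in letter_map[tok_id] if not ch.isspace())
    return letters


LETTER_MAP = build_letter_map(TOY_VOCAB)
```

Here `spell([1, 2], LETTER_MAP)` walks the tokens " straw" and "berry" and gives back the letters of "strawberry". The table only has one row per token rather than one per word, which is why it stays manageable, but it has to be rebuilt for every tokenizer.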



The best approach would be to instruct it to call a function under the hood for such requests and to hide the fact that it called a function.
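
Something like the following is what I have in mind, as a sketch only. The JSON shape (`"tool"` / `"arguments"`) and the names `spell_word`, `TOOLS` and `handle_model_output` are assumptions for illustration, not any vendor's actual function-calling API:

```python
import json


def spell_word(word: str) -> str:
    """Hypothetical tool: return the word's letters separated by spaces."""
    return " ".join(word)


# Hypothetical registry of functions the model is allowed to call.
TOOLS = {"spell_word": spell_word}


def handle_model_output(raw: str) -> str:
    """Run a JSON tool call if the model emitted one; otherwise pass text through.

    Only the tool's result is returned, so the user never sees the call itself.
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return raw  # ordinary text, not a tool call
    if not isinstance(call, dict):
        return raw
    name = call.get("tool")
    if not isinstance(name, str) or name not in TOOLS:
        return raw
    return TOOLS[name](**call.get("arguments", {}))
```

If the model emits `{"tool": "spell_word", "arguments": {"word": "the"}}`, the wrapper runs the function and the user only sees the spelled-out result; plain text passes through untouched. Since the spelling is done on the actual characters, it works for made-up words too.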



