This is really cool, but I'm curious about the performance implications of fuzzy matching. I noticed that the list of error types doesn't include transposition, and I wonder if the reason comes down to performance or technical constraints.
He talks about the time complexity ("O(M^2N), where M is the length of the regular expression and N is the length of the text"), and he discusses it further under "Predictable matching speed." It doesn't look like he covers how the fuzziness feature affects that, but I could have missed it.
Wow, I'm glad you linked to that article, otherwise I would have read the complexity as O(M^(2N)) instead of O((M^2)N). That said, the fact that I almost misread it means I should probably spend some time learning how regular expressions are implemented. Thanks for the link.
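The two readings are wildly different, which a quick back-of-the-envelope check makes concrete (the numbers below are just illustrative, not from the article):

```python
# Compare the polynomial reading O((M^2)N) with the
# exponential misreading O(M^(2N)) for a modest input.
M = 10    # length of the regular expression
N = 1000  # length of the text

poly = M ** 2 * N     # O((M^2)N): 100,000 steps
expo = M ** (2 * N)   # O(M^(2N)): a number with 2001 digits

print(poly)           # 100000
print(len(str(expo))) # 2001
```

So the correct, polynomial reading is entirely practical, while the exponential one would be hopeless even for tiny inputs.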
The fuzziness doesn't affect the worst-case time complexity, but it does enlarge the "working set" of possible states and slows down matching somewhat. Search time is still linear in the length of the text.
Thanks for the information and for writing this lib. It looks like it's the only regex library out there with fuzzy matching, and it's got Python bindings. Could not have asked for more!
I've thought about adding transposition, and I've received a number of requests for it, but I never got around to it. The reason isn't performance; a change like this would get quite involved with the guts of the library, and I simply don't have the time...
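For anyone curious what transposition means at the edit-distance level: the (restricted) Damerau extension counts a swap of two adjacent characters as a single error. In the plain distance recurrence it's only one extra case, which is why the cost of supporting it sits in the automaton machinery rather than the math. A self-contained sketch, not the library's code:

```python
def dl_distance(a, b):
    """Restricted Damerau-Levenshtein distance: like Levenshtein,
    but swapping two adjacent characters counts as one error."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
            # the one extra case: adjacent transposition
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]

# "abdc" is one adjacent swap away from "abcd"; plain
# Levenshtein would charge 2 (two substitutions).
print(dl_distance("abcd", "abdc"))  # 1
```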