Approximate Regex Matching in Python

rplnt · on Jan 20, 2012

Another approximate matching library is FuzzyWuzzy[1]. It has some handy functions and seems easier to use. Although might not be as powerful as this.

1. https://github.com/seatgeek/fuzzywuzzy

schwa · on Jan 20, 2012

Bonus points for the racist project name too.

coderdude · on Jan 20, 2012

If you're English, which they're not.

rcfox · on Jan 20, 2012

English is my first language, and I have no idea of what he's talking about.

burgerbrain · on Jan 20, 2012

http://www.urbandictionary.com/define.php?term=fuzzy%20wuzzy

Strictly an English thing.

Wilduck · on Jan 20, 2012

This is really cool, but I'm curious what the performance implications are when doing fuzzy matching? I noticed that the list of types of errors did not include transposition. I wonder if the reason for this comes down to performance/technical reasons.

coderdude · on Jan 20, 2012

The author wrote more about the library here: http://laurikari.net/tre/about/

He talks about the time complexity ("O(M^2N), where M is the length of the regular expression and N is the length of the text"), and he talks more about it under "Predictable matching speed." It doesn't look like he talks about how the fuzziness feature affects that but I could have missed it.

Wilduck · on Jan 20, 2012

Wow, I'm glad you linked to that article, otherwise I would have thought the complexity was O(M^(2N)) instead of O((M^2)N). Although, the fact that I almost mistook that complexity means I should probably spend some time learning about how regular expressions are implemented. Thanks for the link.

burgerbrain · on Jan 20, 2012

O(M^(2N)) would get absolutely ludicrous pretty damn quick.

metafunctor · on Jan 20, 2012

Author of the library here.

The fuzziness doesn't affect the worst case time complexity, but it does make the "working set" of possible states larger and slows down matching somewhat. But search time is still linear to the length of the text.

coderdude · on Jan 20, 2012

Thanks for the information and for writing this lib. It looks like it's the only regex library out there with fuzzy matching, and it's got Python bindings. Could not have asked for more!

metafunctor · on Jan 20, 2012

Adding transposition I've thought about, and received a number of requests for, but never got around to doing it. The reason isn't performance, but doing a change like this would get quite involved with the guts of the library, and I simply don't have the time...