Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Approximate Regex Matching in Python (hackerboss.com)
83 points by coderdude on Jan 20, 2012 | hide | past | favorite | 12 comments


Another approximate matching library is FuzzyWuzzy[1]. It has some handy functions and seems easier to use. Although might not be as powerful as this.

1. https://github.com/seatgeek/fuzzywuzzy


Bonus points for the racist project name too.


If you're English, which they're not.


English is my first language, and I have no idea of what he's talking about.



This is really cool, but I'm curious what the performance implications are when doing fuzzy matching? I noticed that the list of types of errors did not include transposition. I wonder if the reason for this comes down to performance/technical reasons.


The author wrote more about the library here: http://laurikari.net/tre/about/

He talks about the time complexity ("O(M^2N), where M is the length of the regular expression and N is the length of the text"), and he talks more about it under "Predictable matching speed." It doesn't look like he talks about how the fuzziness feature affects that but I could have missed it.


Wow, I'm glad you linked to that article, otherwise I would have thought the complexity was O(M^(2N)) instead of O((M^2)N). Although, the fact that I almost mistook that complexity means I should probably spend some time learning about how regular expressions are implemented. Thanks for the link.


O(M^(2N)) would get absolutely ludicrous pretty damn quick.


Author of the library here.

The fuzziness doesn't affect the worst case time complexity, but it does make the "working set" of possible states larger and slows down matching somewhat. But search time is still linear to the length of the text.


Thanks for the information and for writing this lib. It looks like it's the only regex library out there with fuzzy matching, and it's got Python bindings. Could not have asked for more!


Adding transposition I've thought about, and received a number of requests for, but never got around to doing it. The reason isn't performance, but doing a change like this would get quite involved with the guts of the library, and I simply don't have the time...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: