Right, and there would also be more semantic boundaries, (possibly greatly) redu...

scott_s · on March 10, 2009

The AST also makes it easier to use heuristics based on semantic information to reduce the number of subtrees (previously substrings) you need to consider.

For example, in a C-like language, you could decide to look only at subtrees that are at least complete statements, or go to a coarser granularity and look only at compound statements. You could also eliminate common idioms, such as

  for (i = 0; i < N; i++)

which will pop up all over the place and give false positives.
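
To make the filtering idea concrete, here is a rough sketch of how it might look. It uses Python's standard `ast` module as a stand-in for a C-like AST, so the idioms are Python analogues (`i = 0`, `i += 1`) rather than the C loop header. The names `COMMON_IDIOMS`, `COMPOUND`, `duplicate_subtrees` and `SAMPLE` are made up for illustration, and grouping by `ast.dump` output is just one simple way to compare subtrees, not necessarily what a real tool would do.

```python
import ast
from collections import defaultdict

# Hypothetical idiom list: statements so common that a match is just noise.
COMMON_IDIOMS = ["i = 0", "i += 1"]

# Node types treated as "compound statements" in this sketch.
COMPOUND = (ast.For, ast.While, ast.If, ast.With, ast.Try,
            ast.FunctionDef, ast.ClassDef)


def _fingerprint(node):
    # ast.dump leaves out line/column attributes by default, so identical
    # code at different positions produces the same string.
    return ast.dump(node)


IDIOM_FINGERPRINTS = {
    _fingerprint(ast.parse(src).body[0]) for src in COMMON_IDIOMS
}


def duplicate_subtrees(source, compound_only=False):
    """Return sorted groups of line numbers where identical subtrees start."""
    wanted = COMPOUND if compound_only else ast.stmt
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        # Heuristic 1: ignore anything smaller than a (compound) statement.
        if not isinstance(node, wanted):
            continue
        fp = _fingerprint(node)
        # Heuristic 2: ignore common idioms that would be false positives.
        if fp in IDIOM_FINGERPRINTS:
            continue
        groups[fp].append(node.lineno)
    return sorted(sorted(lines) for lines in groups.values() if len(lines) > 1)


SAMPLE = """
def f(xs):
    i = 0
    for x in xs:
        if x > 0:
            print(x)
        i += 1

def g(xs):
    i = 0
    for x in xs:
        if x > 0:
            print(x)
        i += 1
"""

print(duplicate_subtrees(SAMPLE))
print(duplicate_subtrees(SAMPLE, compound_only=True))
```

Passing `compound_only=True` is the coarser setting: only loops, conditionals and the like are compared, so a duplicated single-line statement no longer shows up on its own.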