Read the paper: only two of the inserts actually match. The other two were modified by the authors in such a way that they are made to match (Table 1).
Inserts 1 and 2 also both match a Streptococcus phage, but a bacteriophage match would of course not be as bold a claim as an HIV match.
Also, be aware that because of the scientific interest in HIV, hundreds of HIV strains have been sequenced, and the virus is known for its mutation rate (especially in these two proteins, gp120 and gag, which are under pressure to mutate in order to evade the immune system). In such a large library of protein sequences, one is bound to find a match for a short six-letter (amino acid) sequence. That is why E-values exist: to make a statement about the statistical significance of a match.
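To put a rough number on that, here is a minimal back-of-the-envelope sketch. It is not how BLAST computes E-values (those use a scoring matrix and Karlin-Altschul statistics). It assumes all 20 amino acids are equally likely and counts only exact matches; the function name and the database size are made up for illustration and are not taken from the paper.

```python
def expected_chance_matches(query_len, db_residues, alphabet_size=20):
    """Expected number of exact chance matches of one short query in a database.

    Simplifying assumptions (not from the paper): uniform, independent
    residue frequencies and exact matches only.
    """
    # Probability that a given database position matches the whole query.
    p_match = (1.0 / alphabet_size) ** query_len
    # Number of positions where a window of length query_len can start.
    positions = max(db_residues - query_len + 1, 0)
    return positions * p_match


# Hypothetical database size, chosen only for illustration.
hypothetical_db_residues = 100_000_000
print(expected_chance_matches(6, hypothetical_db_residues))
```

Under these assumed numbers, the expected count of purely random exact hits for a single six-residue query is on the order of one, and allowing a mismatch raises it further. That is why a hit this short needs an E-value before it means anything.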
A large HIV database inflating matches is indeed a big concern. But dismissing matches with one mismatch sounds arbitrary: these segments were not arbitrarily selected, but are real insertions on top of SARS.