Hacker News

Read the paper: they actually only match 2 inserts. The other two inserts were modified by the authors in such a way that they are made to match (Table 1).

Both inserts 1 and 2 also match Streptococcus phage, but a bacteriophage match would of course not be as bold a claim as an HIV match.

Also, be aware that because of the scientific interest in HIV, hundreds of HIV strains have been sequenced, and the virus is known for its high mutation rate (especially in these two proteins, gp120 and gag, which are under pressure to mutate in order to evade the immune system). So in such a large library of protein sequences, one is bound to find a match for a short 6-letter (amino acid) sequence. That's why E-values exist: to make a statement about statistical significance.
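To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes residues are independent and uniformly distributed over the 20 amino acids, which real proteins are not, and the database sizes are hypothetical placeholders rather than actual counts of HIV sequences. Real BLAST E-values come from Karlin-Altschul statistics with a scoring matrix, so treat this only as an illustration of the scaling.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def expected_exact_hits(k: int, db_positions: int, alphabet: int = AMINO_ACIDS) -> float:
    """Expected number of exact k-mer occurrences among db_positions start
    positions, assuming independent, uniformly distributed residues
    (a simplifying assumption, not how BLAST computes E-values)."""
    return db_positions * (1.0 / alphabet) ** k


def prob_at_least_one_hit(expected: float) -> float:
    """Poisson approximation of the probability of seeing at least one hit."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    k = 6  # length of the short insert discussed above
    # Hypothetical database sizes, measured in residue start positions.
    for db_positions in (10**6, 10**8, 10**10):
        e = expected_exact_hits(k, db_positions)
        p = prob_at_least_one_hit(e)
        print(f"{db_positions:.0e} positions: expected hits = {e:.3g}, P(>=1 hit) = {p:.3f}")
```

Under this toy model there are only 20^6 = 64 million possible 6-mers, so once the database holds a comparable number of residues, a chance hit is the expected outcome rather than a surprise.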



For posterity, here's the link to the BLAST results for the second insert: https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=39ACRKV...


Hello, posterity here! I wanted to reference these comments in a discussion, but the links have expired :/


A large HIV database inflating matches is indeed a big concern. But dismissing the matches that contain mismatches sounds arbitrary: these segments were not arbitrarily selected, but are real insertions on top of SARS.


Again, take any random six-letter amino acid sequence and chances are high that it matches some HIV protein.

I just did the experiment with my first name, which is coincidentally 6 letters long, and lo and behold: a match to the HIV env protein!

Has my first name now been designed by a bioweapons facility?
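Anyone who wants to try a version of this experiment without a real database can use a small simulation. This is only a sketch under stated assumptions: the "database" is one random string of uniformly drawn residues (not real HIV sequences), and `hit_fraction` and the sizes below are made-up names and numbers for illustration. With a toy database the hit rate for 6-letter queries starts low and climbs as the database grows, which is the point.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def random_sequence(length, rng):
    """Draw a random residue string, uniformly over the 20 amino acids."""
    return "".join(rng.choices(AMINO_ACIDS, k=length))


def hit_fraction(k, db_length, n_queries, seed=0):
    """Fraction of random k-letter queries that occur exactly somewhere
    in a random database string of db_length residues."""
    rng = random.Random(seed)
    database = random_sequence(db_length, rng)
    # Index every k-mer in the database once so each lookup is O(1).
    kmers = {database[i:i + k] for i in range(db_length - k + 1)}
    hits = sum(random_sequence(k, rng) in kmers for _ in range(n_queries))
    return hits / n_queries


if __name__ == "__main__":
    # Hypothetical database sizes; real sequence collections are far larger.
    for db_length in (10_000, 100_000, 1_000_000):
        frac = hit_fraction(k=6, db_length=db_length, n_queries=1000)
        print(f"db_length={db_length}: fraction of 6-mers found = {frac:.3f}")
```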


> Has my first name now been designed by a bioweapons facility?

No, of course not. Your parents were designed by a bioweapons facility, so that they would choose that name.

;-)


You revealed my secret



