> We may be able to point at a red object and call it red
That's still better than most people's pitch recognition. Play any note in the C scale to a random person (even someone who plays an instrument and has some musical skill) and their identification of the note will be barely better than a guess.
False comparison: there are about 10 octaves in the audible spectrum. Telling E4 from F4, which are only a semitone apart, is like distinguishing two slightly different bluish greens. A better comparison would be classifying sine waves as bass/mid/treble, which I'm pretty sure most people can do.
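For anyone who wants to check the numbers, here is a rough sketch. It assumes the usual nominal hearing range of 20 Hz to 20 kHz and 12-tone equal temperament with A4 = 440 Hz; the function names are made up for illustration.

```python
import math

# Assumed nominal limits of human hearing, in Hz (varies by person and age).
AUDIBLE_LOW_HZ = 20.0
AUDIBLE_HIGH_HZ = 20_000.0


def octaves_between(low_hz: float, high_hz: float) -> float:
    """Number of octaves (frequency doublings) between two frequencies."""
    return math.log2(high_hz / low_hz)


def note_frequency(semitones_from_a4: int, a4_hz: float = 440.0) -> float:
    """Equal-temperament frequency of a note a given number of semitones from A4."""
    return a4_hz * 2 ** (semitones_from_a4 / 12)


audible_octaves = octaves_between(AUDIBLE_LOW_HZ, AUDIBLE_HIGH_HZ)
e4 = note_frequency(-5)  # E4 is 5 semitones below A4
f4 = note_frequency(-4)  # F4 is 4 semitones below A4

# One semitone is 1/12 of an octave; express it as a share of the audible range.
semitone_share = (1 / 12) / audible_octaves

print(f"audible range: {audible_octaves:.2f} octaves")
print(f"E4 = {e4:.2f} Hz, F4 = {f4:.2f} Hz")
print(f"one semitone is {semitone_share:.1%} of the audible range")
```

Under those assumptions the E4 to F4 step covers under 1% of the audible range on a log scale, which is why it is closer to two neighbouring shades than to "red vs. not red".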