I've done a lot of work fixing up holes in bank telephone services over the years. I've got evidence of telephone banking customer service reps recording customers' voices and manually piecing together fragments in order to defeat biometric ID systems and the like. I've also seen "what is the 3rd letter of your secret word" type voice challenges being pieced together over time to reveal the full secret word. It's inevitable that all these vectors will be automated at some point.
Can you explain a bit more of this? The service reps are using customers' voices to get into what? Is this a targeted attack against specific people?
The secret word thing over time sounds even stranger. A single rep would need to take (length of secret word) calls with that customer to get their password. Where are they storing it, and what are they doing with it (that they can't already do using their customer rep-level system access)?
Certainly activity is higher amongst teams that deal with higher wealth individuals, so your question about specific people is broadly correct.
To get into what? To bypass the biometric ID systems that are common in telephone banking. Audio was recorded from customers in high fidelity via smartphones, then manually pieced together in an audio editor and played back down the phone to a biometric system in order to bypass detection. As an aside, certain banks in the U.K. have microphones hidden in the counters of physical branches that cross-reference your voice with known patterns, such is the prevalence of these systems.
In regard to secret words, it was a team working within the bank that shared information to crack words. High value CS teams are traditionally very small to keep "the personal touch". CS teams never get access to the full secret word. They get prompted with which questions to ask and what response to expect, so therefore gluing small answers together is the trick.
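To put a rough number on the "how many calls" question above, here is a minimal sketch. It assumes the system picks letter positions uniformly at random on each call, which is my assumption and not something stated in the thread; the function names are made up for illustration. Under that assumption this is the classic coupon collector problem, so a team pooling answers needs noticeably more calls than the length of the word.

```python
import random
from fractions import Fraction


def expected_calls_one_letter(word_length: int) -> float:
    """Expected calls until every position has been asked at least once,
    when exactly one uniformly random position is asked per call.
    This is the coupon collector result n * H_n."""
    harmonic = sum(Fraction(1, i) for i in range(1, word_length + 1))
    return float(word_length * harmonic)


def simulate_calls(word_length: int, letters_per_call: int,
                   trials: int = 10000, seed: int = 0) -> float:
    """Monte Carlo estimate when several distinct positions are asked per call."""
    rng = random.Random(seed)
    total_calls = 0
    for _ in range(trials):
        seen = set()
        while len(seen) < word_length:
            # Each call reveals `letters_per_call` distinct random positions.
            seen.update(rng.sample(range(word_length), letters_per_call))
            total_calls += 1
    return total_calls / trials


if __name__ == "__main__":
    print(expected_calls_one_letter(8))   # exact expectation, one letter per call
    print(simulate_calls(8, 2))           # estimate, two letters per call
```

The practical takeaway for defenders is that partial-letter challenges only slow down an insider who can pool answers across calls; they do not stop one.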
Banks are nowhere near to being on this page yet. 99% haven't even committed to a primary authentication method. It's a jumble of mobile apps, PIN sentry devices, fobs, voice, logic engines, SS7 network scanning via back door agreements with smaller telco network providers, location. It's a real mess.
Can someone bookmark this post where I say the first billion dollar external bank fraud success will happen within the next 18 months please.
I had something close to that with Bank Sabadell. They had 40 four-digit numbers on a card and you gave them whichever one of the 40 they asked for. They've now moved to a fancier app-based system.
Why? Have a page with a QR code seed in the internet banking. Scan it with a phone app, no interaction with customer service (unless you lose the phone).
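For anyone wondering what the phone app does with the QR seed: the comment doesn't name a scheme, but the usual one is TOTP (RFC 6238), so I'm assuming that here. A minimal stdlib-only sketch follows; the function names are mine, and the seed is assumed to be base32 text as it normally appears in the QR code.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(base32_seed: str, now: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with the counter set to the current time step."""
    key = base64.b32decode(base32_seed.upper())
    if now is None:
        now = time.time()
    return hotp(key, int(now // step), digits)


if __name__ == "__main__":
    # Seed used in the RFC test vectors (ASCII "12345678901234567890").
    seed = base64.b32encode(b"12345678901234567890").decode()
    print(totp(seed))
```

The bank stores the same seed and computes the same code, so nothing secret has to be read out to a human on the phone.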
If a bitcoin exchange can do it, I don't see why a bank couldn't (banking is easier - you can cancel transactions).
Insurance companies are doing this to detect stress levels and other factors to figure out if you are committing insurance fraud when reporting an accident.
I remember reading somewhere years ago that certain militaries (I think Israel and the US were two of them) used speech synthesis technology to do things like give false commands over radio to enemy fighters.
Would be impressive considering the state of the art that I know of[1] would not be able to fool anyone. I find the truthfulness of this statement questionable, especially when they could have done it more easily by having someone issue the false orders. Or maybe they pieced together several real samples to issue a full command?
Yes arrests made. No idea of the outcome though. Plenty of people getting away with it though. Simple fraud still works. Identity theft etc. Very easy stuff. There's a great Vice documentary about fraud online somewhere where one of the fraudsters opens up his lockup to reveal 100+ garbage bags full of stolen bank statements, utility bills etc that they use to piece together fake identity ammo.
I suspect that one of the reasons people have trouble discriminating between these synthetic voices and the real thing is that they have become accustomed to hearing compression artifacts on phone calls, which sound very similar to the stitching artifacts in synthesized voices. For someone like me who grew up talking on analog phones, the synthetic voice sounds pretty obviously synthetic.
The cell carrier oligopoly companies are cheap, and would rather use their scarce and immensely valuable resources (bandwidth) to provide new services and open new revenue streams, not boost the quality of existing services which pretty much everyone already pays for.
And, as with many other things, the revealed preference of many people is that they don't care much about the quality of the call relative to other things.
I was just having a conversation with someone earlier today about how the average quality of most of the voice calls we make is far lower than it was decades ago on Ma Bell landlines. Of course, calls are much cheaper now, we have mobile phones, etc. Nonetheless...
Sprint's early ads about hearing a pin drop on their fiber-optic network seem really quaint today.
Which, even stipulating that is the case, is not what most people use. Instead many phone conversations involve "Wait, are you still there" "Can you hear me?" [Shouted] "What?" No, we've most assuredly given up universal quality for convenience and lower prices.
We CAN provide high quality but we mostly don't want to make the tradeoffs to do so.
Opus has a lot of variables, but if you could get 53k connections consistently with a modem, you were probably getting quality slightly under G.711 (because of robbed-bit signaling and any analog line noise). PCM encoding has a lot less latency than Opus, and circuit switching means essentially zero jitter. If you don't mind adding some latency, and can control jitter, Opus can handle wider bandwidth audio, and is much more bitrate efficient.
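For reference, the arithmetic behind those numbers, as a small sketch. G.711 is 8 kHz sampling at 8 bits per sample; treating robbed-bit signaling as leaving 7 dependable bits per sample is the usual simplification behind the "56k" figure, and I'm using it here as an assumption. The function name is made up.

```python
SAMPLE_RATE_HZ = 8000  # G.711 samples narrowband audio at 8 kHz


def pcm_bitrate(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Raw PCM bitrate in bits per second."""
    return bits_per_sample * sample_rate_hz


if __name__ == "__main__":
    print(pcm_bitrate(8))  # full G.711 channel
    print(pcm_bitrate(7))  # assumed usable bits when the low bit may be robbed
```

A 53k modem connection sits just under the 7-bit figure, which is why it is a decent proxy for "slightly under G.711".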
Sure they do. People are ditching landlines to save money even though making do with just a cell phone often gives significantly worse quality. (Which was actually the context of the discussion I mentioned. I was debating giving up my landline--well, Xfinity Voice--even though just using my cell service would result in poorer average call quality.)
That isn't true at all. Most US networks, and many others globally, support VoLTE. The difference in sound quality is remarkable, to the point of being unnerving at first.
In order to make an end-to-end VoLTE call, everything needs to support it and have it enabled: the handsets, the network(s), call handling equipment, interconnections between networks, etc.
You can tell if your handset and network are enabled by making a call while on LTE; if your network indicator drops LTE when you make the call, it's not supported and enabled. Most carriers have a help page with a list of supported devices and how to enable it. I don't know if there are any good indicators that you have an end-to-end enabled call, unless you can tell from the sound quality.
Isn't it still possible to make a low-audio-quality call over the LTE _network_ (i.e. showing LTE when you make a call) due to negotiating down to older codecs if one of the other elements of the connection doesn't support higher-quality codecs?
I don't have them both together yet, but I will probably update this comment later today. I now notice that I might have been premature, though, as where I am doesn't currently have LTE coverage.
I know you're joking, but if you haven't had the experience of using Skype on a smartphone with someone else, it's worth the hassle. It's incredible how crystal-clear the audio is. It sounds like you're next to the person.
Skype has the best voice quality, followed by land lines, and cell phones are dead last. Cell phone voice quality hasn't improved since my first cell phone in the 90s.
I frequently lately have had to have people call me back on my cellphone because the Skype call quality has been awful. Skype seems to be much more dependent on network conditions.
OK, now both arbitrary audio and video can be synthesized from samples. So anything remote or recorded can be fake. But hey, that's no worse than text. As Raphael noted, key-based authentication will be needed. Implementation for mass media would be tough, though. In some cases, you'd be limited to authenticating the recorder.
What do you think about the possibility of nonspoofable, imperceptible "signatures" that constantly revolve like OTP, that one could play in the background while they speak to you over the phone? Your device could possibly transparently interpret these signals for you.
I think it's worth exploring and that there is business potential.
If it does exist, it is limited to military use afaik. I don't know of any commercial or FOSS platform like that. I was thinking more like a small USB-chargeable chip that you carry around that could be used no matter what phone you are calling from, or to record verifiable messages.
The hurdle is creating an algo that can derive all previous signals to verify non-live recordings, but that can't derive the future signals without a private key. I'm not sure what existing bodies of work there are in that domain.
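One existing construction with roughly that property is a one-way hash chain (the idea behind Lamport's one-time passwords). A minimal sketch, with made-up function names: the owner builds the chain from a secret seed, publishes the last value as an anchor, then releases values in reverse order. Anyone holding a released value can hash forward to recompute every earlier release and the anchor, but getting the next release means inverting SHA-256 or knowing the seed.

```python
import hashlib
from typing import List, Optional


def build_chain(seed: bytes, length: int) -> List[bytes]:
    """chain[0] is the secret seed; chain[-1] is the public anchor."""
    chain = [seed]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def verify(token: bytes, anchor: bytes, max_steps: int) -> Optional[int]:
    """Hash forward from token; return how many steps reach the anchor,
    or None if it is not on the chain within max_steps."""
    current = token
    for steps in range(max_steps + 1):
        if current == anchor:
            return steps
        current = hashlib.sha256(current).digest()
    return None


if __name__ == "__main__":
    chain = build_chain(b"secret seed kept on the device", 1000)
    anchor = chain[-1]            # published once, in advance
    token_for_period_3 = chain[-4]  # released during the third period
    print(verify(token_for_period_3, anchor, 1000))
```

This only covers the "past is derivable, future is not" part. It does not bind the token to what was actually said, and a token is replayable once someone has heard it, so embedding it imperceptibly in audio and tying it to the content are still open problems in this sketch.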
It's weird how nobody so far has mentioned using this technology for singing -- given there already are singing voice synthesizers that have had extremely significant cultural impact: https://en.wikipedia.org/wiki/Vocaloid
The way Vocaloid voice banks are created is a highly sophisticated and guided process that involves sampling every phoneme the engine supports at various different pitches. Vocaloid is merely the engine that pieces together phonemes and applies things like vibrato, intonation, and other modulation. Very different tech.
Might be an interesting twist on the somewhat common fraud tactic of submitting bogus invoices via email. Follow up the email with a convincing urgent voice mail from the CEO. "Jim, these guys are personal friends of board member XYZ, can we get them paid ASAP?"
I wonder how common illegal invoice fraud is, because most of these invoice scams are not fraud, legally. They are angling for a contract without claiming that one existed in the first place.
When you imitate someone else's voice to validate an invoice you're committing fraud at the very least. Most invoice scammers wouldn't do that because they want to stay on the legal, even if scummy, side of things.
It's because it tries to sell you a subscription but your ad blocker blocks the ad, leaving the content blank. I disabled javascript on the domain and then the article stays.
It is weird that so few so-called computer scientists know the science.
Any detection system is characterized by the fact that it will miss some true positives (false negatives) and flag some negatives as positive (false positives).
That is the 101 of science: there is a COST for detection and for failure.
Biometrics are based on detection, hence they can fail according to their sensitivity (false negatives) and the cost of cheating (false positives). This analysis varies with progress.
AI systems are detection systems too; they also fail and can be defeated. It is just a question of balancing a cost/benefit analysis with a good budget. Hence criminal organizations and governments will tend to be the first ones to exploit these systems, because they have larger budgets.
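To make the tradeoff concrete, here is a minimal sketch of how the two error rates move against each other as you shift the acceptance threshold of a voice matcher. The scores are invented purely for illustration and the function name is mine; higher score is assumed to mean "more likely the real customer".

```python
from typing import Sequence, Tuple


def error_rates(genuine_scores: Sequence[float],
                impostor_scores: Sequence[float],
                threshold: float) -> Tuple[float, float]:
    """Return (false_reject_rate, false_accept_rate) at a given threshold.
    A sample is accepted when its score is >= threshold."""
    false_rejects = sum(score < threshold for score in genuine_scores)
    false_accepts = sum(score >= threshold for score in impostor_scores)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


if __name__ == "__main__":
    # Made-up match scores, not real data.
    genuine = [0.9, 0.8, 0.6, 0.4]
    impostor = [0.7, 0.3, 0.2, 0.1]
    for threshold in (0.25, 0.5, 0.75):
        frr, far = error_rates(genuine, impostor, threshold)
        print(f"threshold={threshold}: FRR={frr:.2f} FAR={far:.2f}")
```

Raising the threshold locks out more real customers; lowering it lets more spliced or synthetic audio through. Where a bank sets it is a cost decision, which is the point being made above.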
>The volunteers recognised cloned speech as such only half the time (ie, no better than chance).
That's incorrect. It's like saying that one lottery ticket has a 50% chance of winning. If they're fooling people half the time, that's way better than chance, since there are so many ways of screwing things up.
I think they are saying the volunteers performed no better than chance, i.e., volunteers are no better at guessing real vs. clone compared to randomly guessing. It's a really good outcome for the clone.
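A quick way to see what "no better than chance" means in numbers. The article's actual trial counts aren't given here, so the 100 trials and 50 correct below are hypothetical, and the function name is made up. The sketch computes how likely a pure coin-flip guesser is to do at least that well.

```python
from math import comb


def p_at_least(correct: int, trials: int, p_guess: float = 0.5) -> float:
    """Probability that a random guesser gets at least `correct` of
    `trials` right, under a binomial model."""
    return sum(comb(trials, k) * p_guess ** k * (1 - p_guess) ** (trials - k)
               for k in range(correct, trials + 1))


if __name__ == "__main__":
    # Hypothetical: 50 correct identifications out of 100 clips.
    print(p_at_least(50, 100))
    # For contrast, a listener who really can tell: 70 out of 100.
    print(p_at_least(70, 100))
```

With the hypothetical 50 out of 100, a coin flipper matches or beats the volunteers more than half the time, so the result carries no evidence that listeners can tell clones apart.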