Imitating people’s speech patterns precisely could bring trouble

jdee · on April 23, 2017

I've done a lot of work fixing up holes in bank telephone services over the years. I've got evidence of telephone banking customer service reps recording customers' voices and manually piecing together fragments in order to defeat biometric ID systems and the like. I've also seen "what is the 3rd letter of your secret word" type voice challenges being pieced together over time to reveal the full secret word. It's inevitable that all these vectors will be automated at some point.

eternalvision · on April 23, 2017

"My voice is my passport. Verify me."

The flaw of this authentication mechanism was detailed in popular culture many decades ago.

nickgrosvenor · on April 24, 2017

One of the great underrated films of all time. Sneakers

brians · on April 24, 2017

That phrase is now the equivalent of "Password1!" for banks and registrars.

gregmac · on April 23, 2017

Can you explain a bit more of this? The service reps are using customers voices to get into what? Is this a targeted attack against specific people?

The secret word thing over time sounds even stranger. A single rep would need to take (length of secret word) calls with that customer to get their password. Where are they storing it, and what are they doing with it (that they can't already do using their customer rep-level system access)?

jdee · on April 23, 2017

Certainly activity is higher amongst teams that deal with higher wealth individuals, so your question about specific people is broadly correct. To get into what? Bypass biometric ID systems that are common in telephone banking systems. Audio was recorded in high fidelity via smartphones from customers and then manually pieced together in an audio editor and played back down the phone to a biometric system in order to bypass detection. As an adjunct, certain banks in the U.K. have microphones hidden in the counters of physical branches that cross reference your voice with known patterns such is the prevalence of such systems. In regard to secret words, it was a team working within the bank that shared information to crack words. High value CS teams are traditionally very small to keep "the personal touch". CS teams never get access to the full secret word. They get prompted with which questions to ask and what response to expect, so therefore gluing small answers together is the trick.
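To make the "gluing small answers together" point concrete, here is a minimal sketch of why partial-letter challenges leak the whole word once answers are pooled. The function name, the pooled notes and the word itself are made up for illustration; nothing here reflects any real bank system.

```python
def merge_observations(word_length, observations):
    """Combine (position, letter) pairs into a partially known word.

    Positions are 1-based, as in "what is the 3rd letter of your
    secret word". Positions nobody has seen yet are shown as '?'.
    """
    known = ["?"] * word_length
    for position, letter in observations:
        known[position - 1] = letter
    return "".join(known)


# Hypothetical answers noted by several reps over several calls.
pooled = [(3, "l"), (1, "f"), (6, "n"), (2, "a"), (5, "o"), (4, "c")]
print(merge_observations(6, pooled))        # falcon
print(merge_observations(6, pooled[:3]))    # f?l??n
```

Each individual challenge reveals very little, but the answers never change between calls, so every call adds to the same picture.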

literallycancer · on April 23, 2017

Wouldn't all these issues be solved by the customer reading a one time key over the phone?

jdee · on April 23, 2017

Banks are nowhere near to being on this page yet. 99% haven't even committed to a primary authentication method. It's a jumble of mobile apps, pin sentry devices, fobs, voice, logic engines, SS7 network scanning via back door agreements with smaller telco network providers, location. It's a real mess.

Can someone bookmark this post where I say the first billion dollar external bank fraud success will happen within the next 18 months please.

tim333 · on April 24, 2017

I had something close to that with Bank Sabadell. They had 40 four digit numbers on a card and you gave them one of the 40 which they chose. They've now moved to a fancier app based system.

jacquesm · on April 24, 2017

Guess who would have access to the software that generates and distributes the one time keys.

literallycancer · on April 24, 2017

Why? Have a page with a QR code seed in the internet banking. Scan it with a phone app, no interaction with customer service (unless you lose the phone).

If a bitcoin exchange can do it, I don't see why a bank couldn't (banking is easier - you can cancel transactions).
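A minimal sketch of the QR-seed idea, assuming it means the standard time-based one-time password scheme (TOTP, RFC 6238, built on HOTP from RFC 4226) that most authenticator apps implement; the comment does not name a specific algorithm. The key below is the RFC's published test key, and the function names are illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(key: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    now = time.time() if at is None else at
    return hotp(key, int(now // step), digits)


# The QR code would typically carry a Base32-encoded shared secret.
secret_b32 = base64.b32encode(b"12345678901234567890").decode()
key = base64.b32decode(secret_b32)

print(totp(key))                     # code the customer would read out now
print(totp(key, at=59, digits=8))    # RFC 6238 test vector: 94287082
```

The bank and the phone app both hold the seed, so either side can compute the current code without a rep ever seeing the secret itself.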

Markoff · on April 24, 2017

Insurance companies are doing this to detect stress levels and other factors to figure out if you are committing insurance fraud when reporting an accident.

TLDR - purpose can be to detect lying

secfirstmd · on April 24, 2017

I remember reading somewhere years ago that certain militaries (think Israel and the US were two of them) used speech synthesis technology to do things like give false commands over radio to enemy fighters.

RealityVoid · on April 24, 2017

Would be impressive considering the state of the art that I know of[1] would not be able to fool anyone. I find the truthfulness of this statement questionable, especially when they could have done it more easily by having someone issue the false orders. Or maybe they pieced together several real samples to issue a full command?

[1] https://news.ycombinator.com/item?id=13992454

tedmiston · on April 23, 2017

Even without a mole on the inside, social engineering is my biggest concern of any company where I have a financial account.

jacquesm · on April 24, 2017

Did anyone get charged?

jdee · on April 24, 2017

Yes arrests made. No idea of the outcome though. Plenty of people getting away with it though. Simple fraud still works. Identity theft etc. Very easy stuff. There's a great Vice documentary about fraud online somewhere where one of the fraudsters opens up his lockup to reveal 100+ garbage bags full of stolen bank statements, utility bills etc that they use to piece together fake identity ammo.

lisper · on April 23, 2017

I suspect that one of the reasons people have trouble discriminating between these synthetic voices and the real thing is that they have become accustomed to hearing compression artifacts on phone calls, which sound very similar to the stitching artifacts in synthesized voices. For someone like me who grew up talking on analog phones, the synthetic voice sounds pretty obviously synthetic.

wamatt · on April 23, 2017

What evidence is there to suggest this will always be the case, in light of continual improvements?

closeparen · on April 23, 2017

The cell carrier oligopoly companies are cheap, and would rather use their scarce and immensely valuable resources (bandwidth) to provide new services and open new revenue streams, not boost the quality of existing services which pretty much everyone already pays for.

ghaff · on April 23, 2017

And, as with many other things, the revealed preference of many people is that they don't care much about quality of the call relative to other things.

I was just having a conversation with someone earlier today about how the average quality of most of the voice calls we make is far lower than it was decades ago on Ma Bell landlines. Of course, calls are much cheaper now, we have mobile phones, etc. Nonetheless...

Sprint's early ads about hearing a pin drop on their fiberoptic network seem really quaint today.

literallycancer · on April 23, 2017

I'm pretty sure no landlines from decades ago come even close to a decently configured voice chat server using e.g. Opus.

ghaff · on April 23, 2017

Which, even stipulating that is the case, is not what most people use. Instead many phone conversations involve "Wait, are you still there" "Can you hear me?" [Shouted] "What?" No, we've most assuredly given up universal quality for convenience and lower prices.

We CAN provide high quality but we mostly don't want to make the tradeoffs to do so.

toast0 · on April 24, 2017

Opus has a lot of variables, but if you could get 53k connections consistently with a modem, you were probably getting quality slightly under G.711 (because of robbed-bit signaling and any analog line noise). PCM encoding has a lot less latency than Opus, and circuit switching means essentially zero jitter. If you don't mind adding some latency, and can control jitter, Opus can handle wider bandwidth audio, and is much more bitrate efficient.
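For reference, the arithmetic behind the G.711 comparison, as a small sketch. G.711 carries 8,000 samples per second at 8 bits each; the 7-bit figure is a simplified worst-case model (an assumption here) in which the low bit of every sample is treated as unusable because of robbed-bit signaling.

```python
SAMPLE_RATE_HZ = 8000  # G.711 sampling rate


def bitrate_kbps(bits_per_sample: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Raw PCM bitrate in kbit/s for a given sample width."""
    return bits_per_sample * sample_rate_hz / 1000


print(bitrate_kbps(8))  # 64.0, a full G.711 channel
print(bitrate_kbps(7))  # 56.0, if only 7 bits per sample can be trusted
```

A modem that consistently held 53k was therefore running close to the ceiling of that channel.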

TeMPOraL · on April 24, 2017

> revealed preferences

It's not like phone companies offered people "better call quality XOR another service", and people consistently picked the second option.

ghaff · on April 24, 2017

Sure they do. People are ditching landlines to save money even though making do with just a cell phone often gives significantly worse quality. (Which was actually the context of the discussion I mentioned. I was debating giving up my landline--well, Xfinity Voice--even though just using my cell service would result in poorer average call quality.)

mbrookes · on April 23, 2017

That isn't true at all. Most US networks, and many others globally, support VoLTE. The difference in sound quality is remarkable, to the point of being unnerving at first.

stavros · on April 23, 2017

Do both devices have to support it? What do I need to do to enable and try it, if my network supports it?

toast0 · on April 24, 2017

In order to make an end-to-end VoLTE call, everything needs to support it and have it enabled: the handsets, the network(s), call handling equipment, interconnections between networks, etc.

You can tell if your handset and network are enabled by making a call while on LTE; if your network indicator drops LTE when you make the call, it's not supported and enabled. Most carriers have a help page with a list of supported devices and how to enable it. I don't know if there's any good indicators that you have an end-to-end enabled call, unless you can tell from the sound quality.

bdonlan · on April 24, 2017

Isn't it still possible to make a low-audio-quality call over the LTE _network_ (i.e. showing LTE when you make a call) due to negotiating down to older codecs if one of the other elements of the connection doesn't support higher-quality codecs?

toast0 · on April 24, 2017

Yes, that's quite possible, but it's also still the best indicator I'm aware of that you might be getting better quality

Markoff · on April 24, 2017

you can just disable VoLTE in your mobile, no matter what network you use

stavros · on April 24, 2017

I see, thank you. I have two new Xiaomi phones that do list VoLTE in the options, so I'll try calling one from the other and seeing if it works.

Filligree · on April 24, 2017

Did it?

stavros · on April 24, 2017

I don't have them both together yet, but I will probably update this comment later today. I now notice that I might have been premature, though, as where I am doesn't currently have LTE coverage.

lisper · on April 23, 2017

Yeah, this. And also I didn't say it would always be the case, just that it's the case now. (And I'm only even advancing that as a hypothesis.)

psyc · on April 23, 2017

Do you have data to back up this definite claim? HN is all about the science.

mhb · on April 23, 2017

Cell calls sounded like crap 20 years ago and they still sound like crap. What other sort of evidence do you want?

psyc · on April 23, 2017

I'm going to need several statistically rigorous, peer-reviewed studies, that aren't behind paywalls. Please feed me links.

sillysaurus3 · on April 23, 2017

I know you're joking, but if you haven't had the experience of using Skype on a smartphone with someone else, it's worth the hassle. It's incredible how crystal-clear the audio is. It sounds like you're next to the person.

WalterBright · on April 23, 2017

Skype has the best voice quality, followed by land lines, and cell phones are dead last. Cell phone voice quality hasn't improved since my first cell phone in the 90s.

vidarh · on April 23, 2017

I frequently lately have had to have people call me back on my cellphone because the Skype call quality has been awful. Skype seems to be much more dependent on network conditions.

mattm · on April 23, 2017

The Line app (popular in Japan) has incredible voice quality. It's like someone standing right there whispering in my ear.

lisper · on April 23, 2017

Actually HN is all about startups. Sometimes that has to do with science, sometimes not.

mirimir · on April 23, 2017

OK, so now both arbitrary audio and video can be synthesized from samples. So anything remote or recorded can be fake. But hey, that's no worse than text. As Raphael noted, key-based authentication will be needed. Implementation for mass media would be tough, though. In some cases, you'd be limited to authenticating the recorder.

kakarot · on April 24, 2017

What do you think about the possibility of nonspoofable, imperceptible "signatures" that constantly revolve like OTP, that one could play in the background while they speak to you over the phone? Your device could possibly transparently interpret these signals for you.

I think it's worth exploring and that there is business potential.

mirimir · on April 24, 2017

There will be a need for something like that. But maybe it already exists. I'm pretty clueless about smartphones.

kakarot · on April 25, 2017

If it does exist, it is limited to military use afaik. I don't know of any commercial or FOSS platform like that. I was thinking more like a small USB-chargeable chip that you carry around that could be used no matter what phone you are calling from, or to record verifiable messages.

The hurdle is creating an algo that can derive all previous signals to verify non-live recordings, but that can't derive the future signals without a private key. I'm not sure what existing bodies of work there are in that domain.
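One existing construction with roughly this shape is a one-way hash chain (the idea behind Lamport-style one-time passwords). The sketch below is an assumption about how it could be applied, not an existing product: the secret seed stands in for the private key, and the function names are made up. Anyone holding the current token can hash forward to confirm earlier tokens, but producing the next token would require inverting SHA-256.

```python
import hashlib


def h(value: bytes) -> bytes:
    return hashlib.sha256(value).digest()


def build_chain(seed: bytes, length: int) -> list:
    """Return tokens in release order: tokens[0] is published first.

    The chain is computed from the secret seed and then reversed, so
    every token hashes to the token released just before it.
    """
    chain = [seed]
    for _ in range(length - 1):
        chain.append(h(chain[-1]))
    chain.reverse()
    return chain


def verify_earlier(current: bytes, earlier: bytes, max_steps: int) -> bool:
    """Check that `earlier` was released before `current` on the same chain."""
    value = current
    for _ in range(max_steps):
        value = h(value)
        if value == earlier:
            return True
    return False


tokens = build_chain(b"hypothetical secret seed", 5)
print(verify_earlier(tokens[3], tokens[1], 5))  # True: past tokens check out
print(verify_earlier(tokens[1], tokens[3], 5))  # False: the future is not derivable
```

On its own this only proves that a token came from the chain; tying a token to specific audio would need something extra, such as a MAC over the recording keyed by the token.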

zkms · on April 23, 2017

It's weird how nobody so far has mentioned using this technology for singing -- given there already are singing voice synthesizers that have had extremely significant cultural impact: https://en.wikipedia.org/wiki/Vocaloid

hatsunearu · on April 25, 2017

The way Vocaloid voice banks are created is a highly sophisticated and guided process that involves sampling every phoneme the engine supports at various different pitches. Vocaloid is merely the engine that pieces together phonemes and applies things like vibrato, intonation, and other modulation. Very different tech.

tyingq · on April 23, 2017

Might be an interesting twist on the somewhat common fraud tactic of submitting bogus invoices via email. Follow up the email with a convincing urgent voice mail from the CEO. "Jim, these guys are personal friends of board member XYZ, can we get them paid ASAP?"

lolc · on April 23, 2017

I wonder how common illegal invoice fraud is, because most of these invoice scams are not fraud, legally. They are angling for a contract without claiming that one existed in the first place.

When you imitate someone else's voice to validate an invoice you're committing fraud at the very least. Most invoice scammers wouldn't do that because they want to stay on the legal even if scummy side of things.

tyingq · on April 24, 2017

Hmm. An invoice is usually presented as net term billing for a service or product already delivered.

lolc · on April 24, 2017

There is the "art" of creating an offer that looks like an invoice. I thought that was the most common type of scam.

I don't actually know what the ratio is between these scams and actual fraud where a bogus invoice is sent.

azinman2 · on April 24, 2017

This WILL be used with celebrities and politicians in a satirical way, and it'll be awesome.

Side note: I've been wanting to do this with my own homebrew Alexa to create "NeNe" (of housewives fame)... a sassier and way funnier assistant.

aaron695 · on April 23, 2017

True.

But don't underestimate the amazing benefits that will come with it.

Education and Entertainment will be revolutionized. It will disrupt the world in a huge way as well as create new cyberpunk criminal endeavors.

daodedickinson · on April 23, 2017

I was surprised this wasn't used to fake a bombshell recording of Trump or Clinton in the last election. It's gonna happen more and more frequently.

strathmeyer · on April 23, 2017

Well I can see an article briefly then it disappears. 2017 we still can't get web content, guess I should go back to gopher.

JorgeGT · on April 23, 2017

It's because it tries to sell you a subscription but your ad blocker blocks the ad, leaving the content blank. I disabled javascript on the domain and then the article stays.

Markoff · on April 24, 2017

strange, worked fine for me in chrome+ublock origin and javascript enabled, though i loaded beta version of website automatically

cryptarch · on April 23, 2017

uBlock Origin fixed that for me.

Link for the lazy: https://github.com/gorhill/uBlock

Raphael · on April 23, 2017

We'll need Signal to beep after every sentence to indicate that it was signed with the supposed person's key.

stavros · on April 23, 2017

It already gives you an error message if the message couldn't be authenticated. That's the whole point of Signal.

anotheryou · on April 23, 2017

TL;DR: impersonation scams

SFJulie · on April 24, 2017

It is weird that no so-called computer scientists know science.

Any detection system is characterized by the fact that it will sometimes fail to detect true positives, and will sometimes tag negatives as positives.

That is the 101 of science: there is a COST for detection and for failure.

Biometrics are based on detection, hence they can fail according to their sensitivity (false negatives) and the cost of cheating (false positives). This analysis varies with progress.

AI systems too are detection systems; they also fail and can be defeated. It is just a question of balancing a cost/benefit analysis with a good budget. Hence criminal organizations and governments will tend to be the first ones to exploit these systems because they have larger budgets.
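The trade-off described above can be sketched with a score threshold. The similarity scores are made up for illustration, and the function name is hypothetical; the point is only that moving the threshold trades false rejections of the real customer against false acceptances of an impostor.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for a threshold.

    A sample is accepted when its score is >= threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))


# Made-up voice-match scores, for illustration only.
genuine = [0.91, 0.85, 0.78, 0.66, 0.95]
impostor = [0.30, 0.52, 0.71, 0.45, 0.60]

for threshold in (0.5, 0.7, 0.9):
    print(threshold, error_rates(genuine, impostor, threshold))
# 0.5 (0.0, 0.6)
# 0.7 (0.2, 0.2)
# 0.9 (0.6, 0.0)
```

Better spoofing shifts the impostor scores upward, which is why the same threshold gets worse over time.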

Chai-T-Rex · on April 23, 2017

>The volunteers recognised cloned speech as such only half the time (ie, no better than chance).

That's incorrect. It's like saying that one lottery ticket has a 50% chance of winning. If they're fooling people half the time, that's way better than chance, since there are so many ways of screwing things up.

obastani · on April 23, 2017

I think they are saying the volunteers performed no better than chance, i.e., volunteers are no better at guessing real vs. clone compared to randomly guessing. It's a really good outcome for the clone.
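A small sketch of what "no better than chance" means for a two-way real-vs-clone judgment, assuming each judgment is an independent coin flip with probability 0.5. The trial counts are hypothetical; the article's actual sample size is not given in this thread.

```python
from math import comb


def p_at_least(successes: int, trials: int, p: float = 0.5) -> float:
    """Probability of getting at least `successes` correct by pure guessing."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical: 100 judgments, 50 correct vs. 70 correct.
print(p_at_least(50, 100))  # above 0.5: entirely consistent with guessing
print(p_at_least(70, 100))  # tiny: guessing alone would rarely do this well
```

Listeners scoring around half is what coin-flipping produces, which is the sense in which the clone did well.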

jcoffland · on April 23, 2017

What is "chance"? In the article they mean no better than flipping a coin. You chose a lottery ticket analogy, but that choice was arbitrary.

bitwize · on April 23, 2017

My voice is my passport. Verify me.

Markoff · on April 24, 2017

just to save your time - the app from Candyvoice is not available yet

mrec · on April 23, 2017

Seems to be paywalled.

jwilk · on April 23, 2017

Archived copy:

https://archive.fo/bxUaX

jcoffland · on April 23, 2017

Doesn't all new technology bring some sort of trouble?

thomastjeffery · on April 23, 2017

"Did you see that ludicrous display last night?"

I couldn't resist.