Imitating people’s speech patterns precisely could bring trouble (economist.com)
133 points by augustocallejas on April 23, 2017 | 74 comments


I've done a lot of work fixing up holes in bank telephone services over the years. I've got evidence of telephone banking customer service reps recording customers' voices and manually piecing together fragments in order to defeat biometric ID systems and the like. I've also seen "what is the 3rd letter of your secret word" type voice challenges being pieced together over time to reveal the full secret word. It's inevitable that all these vectors will be automated at some point.


"My voice is my passport. Verify me."

The flaw of this authentication mechanism was detailed in popular culture many decades ago.


One of the great underrated films of all time: Sneakers.


That phrase is now the equivalent of "Password1!" for banks and registrars.


Can you explain a bit more about this? The service reps are using customers' voices to get into what? Is this a targeted attack against specific people?

The secret word thing over time sounds even stranger. A single rep would need to take (length of secret word) calls with that customer to get their password. Where are they storing it, and what are they doing with it (that they can't already do using their customer rep-level system access)?


Certainly activity is higher amongst teams that deal with higher-wealth individuals, so your question about specific people is broadly correct. To get into what? To bypass the biometric ID systems that are common in telephone banking. Audio was recorded in high fidelity from customers via smartphones, then manually pieced together in an audio editor and played back down the phone to a biometric system in order to bypass detection. As an adjunct, certain banks in the U.K. have microphones hidden in the counters of physical branches that cross-reference your voice with known patterns, such is the prevalence of these systems. In regard to secret words, it was a team working within the bank that shared information to crack words. High value CS teams are traditionally very small to keep "the personal touch". CS teams never get access to the full secret word. They get prompted with which questions to ask and what response to expect, so gluing small answers together is the trick.
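To put a rough number on the "gluing small answers together" point, here is a minimal Python sketch. It assumes (my assumption, not something stated above) that each call asks for one letter position chosen uniformly at random and that the team pools everything it hears. The function names and the example word are made up for illustration.

```python
import random


def piece_together(secret, rng):
    """Simulate pooled reps collecting one (position, letter) pair per call."""
    known = {}  # position -> letter, shared across the team
    calls = 0
    while len(known) < len(secret):
        pos = rng.randrange(len(secret))  # assumed: position picked uniformly at random
        known[pos] = secret[pos]          # the rep sees the expected answer for that position
        calls += 1
    recovered = "".join(known[i] for i in range(len(secret)))
    return recovered, calls


def expected_calls(n):
    """Coupon-collector expectation for n positions: n * (1 + 1/2 + ... + 1/n)."""
    return n * sum(1.0 / k for k in range(1, n + 1))


if __name__ == "__main__":
    word, calls = piece_together("sneakers", random.Random(0))
    print(word, calls, round(expected_calls(8), 1))
```

Under those assumptions an 8-letter word needs about 22 calls on average, which a small team handling a frequent caller could plausibly accumulate. Real systems that ask for two or three letters per call would leak faster.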


Wouldn't all these issues be solved by the customer reading a one time key over the phone?


Banks are nowhere near being on this page yet. 99% haven't even committed to a primary authentication method. It's a jumble of mobile apps, PIN sentry devices, fobs, voice, logic engines, SS7 network scanning via back door agreements with smaller telco network providers, and location. It's a real mess.

Can someone bookmark this post where I say the first billion dollar external bank fraud success will happen within the next 18 months please.


I had something close to that with Bank Sabadell. They had 40 four-digit numbers on a card and you gave them whichever one of the 40 they asked for. They've now moved to a fancier app-based system.


Guess who would have access to the software that generates and distributes the one time keys.


Why? Have a page with a QR code seed in the internet banking. Scan it with a phone app, no interaction with customer service (unless you lose the phone).

If a bitcoin exchange can do it, I don't see why a bank couldn't (banking is easier - you can cancel transactions).
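For reference, the QR-seed scheme described here is usually time-based one-time passwords (TOTP, RFC 6238, built on HOTP from RFC 4226). A minimal sketch using only the Python standard library follows. The `verify` helper and its one-step drift window are my own illustrative choices, not any bank's implementation.

```python
import hashlib
import hmac
import struct
import time


def hotp(key, counter, digits=6):
    """RFC 4226 HOTP: HMAC-SHA1 over an 8-byte big-endian counter."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation: low nibble of the last byte
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(key, now=None, step=30, digits=6):
    """RFC 6238 TOTP: HOTP with the counter set to the current 30-second step."""
    if now is None:
        now = time.time()
    return hotp(key, int(now) // step, digits)


def verify(key, candidate, now=None, step=30, digits=6, window=1):
    """Hypothetical server-side check that tolerates +/- `window` steps of clock drift."""
    if now is None:
        now = time.time()
    counter = int(now) // step
    return any(
        hmac.compare_digest(hotp(key, counter + drift, digits), candidate)
        for drift in range(-window, window + 1)
        if counter + drift >= 0
    )
```

The seed in the QR code is the shared `key`. The customer reads the six digits over the phone and the rep's system only ever sees a pass or fail, which is the property that matters for the insider problem discussed above.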


Insurance companies are doing this to detect stress levels and other factors to figure out if you are committing insurance fraud when reporting an accident.

TLDR - purpose can be to detect lying


I remember reading somewhere years ago that certain militaries (I think Israel and the US were two of them) used speech synthesis technology to do things like give false commands over radio to enemy fighters.


Would be impressive considering the state of the art that I know of[1] would not be able to fool anyone. I find the truthfulness of this statement questionable, especially when they could have done it more easily by having someone issue the false orders. Or maybe they pieced together several real samples to issue a full command?

[1] https://news.ycombinator.com/item?id=13992454


Even without a mole on the inside, social engineering is my biggest concern with any company where I have a financial account.


Did anyone get charged?


Yes, arrests were made. No idea of the outcome, though. Plenty of people are getting away with it. Simple fraud still works: identity theft etc. Very easy stuff. There's a great Vice documentary about online fraud somewhere, where one of the fraudsters opens up his lockup to reveal 100+ garbage bags full of stolen bank statements, utility bills etc. that they use to piece together fake identity ammo.


I suspect that one of the reasons people have trouble discriminating between these synthetic voices and the real thing is that they have become accustomed to hearing compression artifacts on phone calls, which sound very similar to the stitching artifacts in synthesized voices. For someone like me who grew up talking on analog phones, the synthetic voice sounds pretty obviously synthetic.


What evidence is there to suggest this will always be the case, in light of continual improvements?


The cell carrier oligopoly companies are cheap, and would rather use their scarce and immensely valuable resources (bandwidth) to provide new services and open new revenue streams, not boost the quality of existing services which pretty much everyone already pays for.


And, as with many other things, the revealed preference of many people is that they don't care much about the quality of the call relative to other things.

I was just having a conversation with someone earlier today about how the average quality of most of the voice calls we make is far lower than it was decades ago on Ma Bell landlines. Of course, calls are much cheaper now, we have mobile phones, etc. Nonetheless...

Sprint's early ads about hearing a pin drop on their fiber-optic network seem really quaint today.


I'm pretty sure no landlines from decades ago come even close to a decently configured voice chat server using e.g. Opus.


Which, even stipulating that is the case, is not what most people use. Instead many phone conversations involve "Wait, are you still there" "Can you hear me?" [Shouted] "What?" No, we've most assuredly given up universal quality for convenience and lower prices.

We CAN provide high quality but we mostly don't want to make the tradeoffs to do so.


Opus has a lot of variables, but if you could get 53k connections consistently with a modem, you were probably getting quality slightly under G.711 (because of robbed-bit signaling and any analog line noise). PCM encoding has a lot less latency than Opus, and circuit switching means essentially zero jitter. If you don't mind adding some latency, and can control jitter, Opus can handle wider bandwidth audio, and is much more bitrate efficient.
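The arithmetic behind "slightly under G.711", as a small Python sketch. G.711 is 8-bit PCM sampled at 8 kHz. The robbed-bit figure assumes a single link that takes the least significant bit of every sixth sample for signaling. That is my simplification, and real paths could cross more than one such link.

```python
G711_SAMPLE_RATE_HZ = 8000
G711_BITS_PER_SAMPLE = 8


def pcm_bitrate(sample_rate_hz, bits_per_sample):
    """Raw bitrate of uncompressed PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


def robbed_bit_payload(sample_rate_hz, bits_per_sample, robbed_every=6):
    """Usable bits per second when one bit is lost in every `robbed_every`-th sample (assumed)."""
    return pcm_bitrate(sample_rate_hz, bits_per_sample) - sample_rate_hz / robbed_every


if __name__ == "__main__":
    print(pcm_bitrate(G711_SAMPLE_RATE_HZ, G711_BITS_PER_SAMPLE))  # 64000
    print(robbed_bit_payload(G711_SAMPLE_RATE_HZ, G711_BITS_PER_SAMPLE))
```

So a clean digital landline channel carries 64 kbit/s of narrowband audio, and the robbed bits shave a little off that. Opus can spend a fraction of that bitrate on wider-band audio, at the cost of frame buffering latency.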


> revealed preferences

It's not like phone companies offered people "better call quality XOR another service", and people consistently picked the second option.


Sure they do. People are ditching landlines to save money even though making do with just a cell phone often gives significantly worse quality. (Which was actually the context of the discussion I mentioned. I was debating giving up my landline--well, Xfinity Voice--even though just using my cell service would result in poorer average call quality.)


That isn't true at all. Most US networks, and many others globally, support VoLTE. The difference in sound quality is remarkable, to the point of being unnerving at first.


Do both devices have to support it? What do I need to do to enable and try it, if my network supports it?


In order to make an end-to-end VoLTE call, everything needs to support it and have it enabled: the handsets, the network(s), call handling equipment, interconnections between networks, etc.

You can tell if your handset and network are enabled by making a call while on LTE; if your network indicator drops LTE when you make the call, it's not supported and enabled. Most carriers have a help page with a list of supported devices and how to enable it. I don't know if there are any good indicators that you have an end-to-end enabled call, unless you can tell from the sound quality.


Isn't it still possible to make a low-audio-quality call over the LTE _network_ (i.e. showing LTE when you make a call) due to negotiating down to older codecs if one of the other elements of the connection doesn't support higher-quality codecs?


Yes, that's quite possible, but it's also still the best indicator I'm aware of that you might be getting better quality.


You can just disable VoLTE on your phone, no matter what network you use.


I see, thank you. I have two new Xiaomi phones that do list VoLTE in the options, so I'll try calling one from the other and seeing if it works.


Did it?


I don't have them both together yet, but I will probably update this comment later today. I now notice that I might have been premature, though, as where I am doesn't currently have LTE coverage.


Yeah, this. And also I didn't say it would always be the case, just that it's the case now. (And I'm only even advancing that as a hypothesis.)


Do you have data to back up this definite claim? HN is all about the science.


Cell calls sounded like crap 20 years ago and they still sound like crap. What other sort of evidence do you want?


I'm going to need several statistically rigorous, peer-reviewed studies, that aren't behind paywalls. Please feed me links.


I know you're joking, but if you haven't had the experience of using Skype on a smartphone with someone else, it's worth the hassle. It's incredible how crystal-clear the audio is. It sounds like you're next to the person.


Skype has the best voice quality, followed by land lines, and cell phones are dead last. Cell phone voice quality hasn't improved since my first cell phone in the 90s.


I frequently lately have had to have people call me back on my cellphone because the Skype call quality has been awful. Skype seems to be much more dependent on network conditions.


The Line app (popular in Japan) has incredible voice quality. It's like someone standing right there whispering in my ear.


Actually HN is all about startups. Sometimes that has to do with science, sometimes not.


OK, now both arbitrary audio and video can be synthesized from samples. So anything remote or recorded can be fake. But hey, that's no worse than text. As Raphael noted, key-based authentication will be needed. Implementation for mass media would be tough, though. In some cases, you'd be limited to authenticating the recorder.


What do you think about the possibility of nonspoofable, imperceptible "signatures" that constantly revolve like OTP, that one could play in the background while they speak to you over the phone? Your device could possibly transparently interpret these signals for you.

I think it's worth exploring and that there is business potential.


There will be a need for something like that. But maybe it already exists. I'm pretty clueless about smartphones.


If it does exist, it is limited to military use afaik. I don't know of any commercial or FOSS platform like that. I was thinking more like a small USB-chargeable chip that you carry around that could be used no matter what phone you are calling from, or to record verifiable messages.

The hurdle is creating an algo that can derive all previous signals to verify non-live recordings, but that can't derive the future signals without a private key. I'm not sure what existing bodies of work there are in that domain.
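One existing construction with roughly that property is a one-way hash chain, as used in Lamport-style one-time passwords: values are released in the reverse of the order they were generated, so any released value lets you recompute every earlier one, while later ones require the seed. A minimal Python sketch follows. The function names are hypothetical, and this only authenticates time steps, not the audio itself.

```python
import hashlib
import hmac


def build_chain(seed, length):
    """Return values in release order: chain[0] is the public anchor, chain[i] is released at step i."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()  # now sha256(chain[i]) == chain[i - 1]
    return chain


def derive_earlier(value, step, earlier_step):
    """Anyone holding the step-`step` value can recompute the value of an earlier step."""
    for _ in range(step - earlier_step):
        value = hashlib.sha256(value).digest()
    return value


def verify_step(anchor, value, step):
    """Check a released value against the published anchor by hashing it `step` times."""
    return hmac.compare_digest(derive_earlier(value, step, 0), anchor)


if __name__ == "__main__":
    chain = build_chain(b"device secret (illustrative)", 10)
    print(verify_step(chain[0], chain[5], 5))
    print(derive_earlier(chain[5], 5, 3) == chain[3])
```

Going forward (from step 5 to step 6) would mean inverting SHA-256, so only the holder of the seed can produce future values. Binding the signal to what is actually being said would need something more, for example a MAC over the audio keyed with a value that has not been released yet. That part is not shown here.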


It's weird how nobody so far has mentioned using this technology for singing -- given there already are singing voice synthesizers that have had extremely significant cultural impact: https://en.wikipedia.org/wiki/Vocaloid


The way Vocaloid voice banks are created is a highly sophisticated and guided process that involves sampling every phoneme the engine supports at various different pitches. Vocaloid is merely the engine that pieces together phonemes and applies things like vibrato, intonation, and other modulation. Very different tech.


Might be an interesting twist on the somewhat common fraud tactic of submitting bogus invoices via email. Follow up the email with a convincing urgent voice mail from the CEO. "Jim, these guys are personal friends of board member XYZ, can we get them paid ASAP?"


I wonder how common illegal invoice fraud is, because most of these invoice scams are not fraud, legally. They are angling for a contract without claiming that one existed in the first place.

When you imitate someone else's voice to validate an invoice you're committing fraud at the very least. Most invoice scammers wouldn't do that because they want to stay on the legal, even if scummy, side of things.


Hmm. An invoice is usually presented as net term billing for a service or product already delivered.


There is the "art" of creating an offer that looks like an invoice. I thought that was the most common type of scam.

I don't actually know what the ratio is between these scams and actual fraud where a bogus invoice is sent.


This WILL be used with celebrities and politicians in a satirical way, and it'll be awesome.

Side note: I've been wanting to do this with my own homebrew Alexa to create "NeNe" (of housewives fame)... a sassier and way funnier assistant.


True.

But don't underestimate the amazing benefits that will come with it.

Education and Entertainment will be revolutionized. It will disrupt the world in a huge way as well as create new cyberpunk criminal endeavors.


I was surprised this wasn't used to fake a bombshell recording of Trump or Clinton in the last election. It's gonna happen more and more frequently.


Well, I can see the article briefly, then it disappears. It's 2017 and we still can't get web content; guess I should go back to gopher.


It's because it tries to sell you a subscription but your ad blocker blocks the ad, leaving the content blank. I disabled javascript on the domain and then the article stays.


Strange, it worked fine for me in Chrome + uBlock Origin with JavaScript enabled, though I was automatically served the beta version of the website.


uBlock Origin fixed that for me.

Link for the lazy: https://github.com/gorhill/uBlock


We'll need Signal to beep after every sentence to indicate that it was signed with the supposed person's key.


It already gives you an error message if the message couldn't be authenticated. That's the whole point of Signal.


TL;DR: impersonation scams


It is weird that no so-called computer scientists know science.

Every detection system is characterized by the fact that it will fail to detect some true positives (false negatives) and will flag some negatives as positives (false positives).

That is the 101 of science: there is a COST for detection and for failure.

Biometrics are based on detection, hence they can fail according to their sensitivity (false negatives) and the cost of cheating (false positives). This analysis varies with progress.

AI systems are detection systems too; they also fail and can be defeated. It is just a question of balancing a cost/benefit analysis with a good budget. Hence criminal organizations and governments will tend to be the first ones to exploit these systems, because they have larger budgets.
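The trade-off in this comment can be sketched for a voice-matching system that outputs a similarity score and accepts anything at or above a threshold. The scores below are invented purely for illustration, and `error_rates` is a hypothetical helper, not any vendor's API.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_accept_rate, false_reject_rate) for an accept-if-score>=threshold rule."""
    false_rejects = sum(score < threshold for score in genuine_scores)
    false_accepts = sum(score >= threshold for score in impostor_scores)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )


if __name__ == "__main__":
    genuine = [0.9, 0.8, 0.7, 0.6]     # made-up scores for the real customer
    impostor = [0.65, 0.4, 0.3, 0.2]   # made-up scores for cloned or spliced audio
    for threshold in (0.5, 0.75):
        print(threshold, error_rates(genuine, impostor, threshold))
```

Raising the threshold from 0.5 to 0.75 removes the false accept in this toy data but starts rejecting half of the genuine attempts. Better cloning pushes impostor scores up, which is the "varies with progress" part: the same threshold gets more expensive to hold over time.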


>The volunteers recognised cloned speech as such only half the time (ie, no better than chance).

That's incorrect. It's like saying that one lottery ticket has a 50% chance of winning. If they're fooling people half the time, that's way better than chance, since there are so many ways of screwing things up.


I think they are saying the volunteers performed no better than chance, i.e., volunteers are no better at guessing real vs. clone compared to randomly guessing. It's a really good outcome for the clone.


What is "chance"? In the article they mean no better than flipping a coin. You chose a lottery ticket analogy, but that choice was arbitrary.


My voice is my passport. Verify me.


Just to save you time: the app from Candyvoice is not available yet.


Seems to be paywalled.



Doesn't all new technology bring some sort of trouble?


"Did you see that ludicrous display last night?"

I couldn't resist.



