Hijacking YouTube to transmit your data

nnq · on July 27, 2016

> This is a fundamental hole in security with no logical workaround.

There is no hole in anything. You're not violating anyone's privacy or stealing anything from anyone. Even the bandwidth is given to you for free. It's how things are supposed to work.

You're just exercising your right to privacy by using such a thing.

One can tolerate someone re-inventing/re-discovering steno and making it sound like it's something new... but not someone having no f idea whatsoever of what "security" means and what his "right to privacy" is... ffs!

codeulike · on July 27, 2016

I think he's talking about situations where a network blocks certain things, and how steganography allows you to bypass that.

edit: stegano, not steno

seanhunter · on July 27, 2016

I think you guys mean 'steganography' rather than 'stenography'.

https://en.wikipedia.org/wiki/Steganography

codeulike · on July 27, 2016

haha oops

chriswarbo · on July 26, 2016

https://en.wikipedia.org/wiki/Steganography

Animats · on July 27, 2016

Right. The article author seems to have re-invented steganography.

The hard problem is finding a way to encode data in video in a way that will survive recompression, resizing, or other video processing. The watermarking people have struggled with this for years. There are various spread-spectrum-like schemes with good noise immunity that can do this.
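A toy sketch of the spread-spectrum idea, assuming a zero-mean 1-D host signal (for example audio samples or mean-subtracted pixel rows): each bit is multiplied by a pseudo-random ±1 "chip" sequence derived from a shared seed and added at low amplitude, and the receiver correlates with the same sequence and reads the bit from the sign. The seed, `chips_per_bit` and `alpha` are illustrative parameters, and additive Gaussian noise stands in for real processing; this does not model an actual video codec.

```python
import random


def pn_sequence(seed: int, length: int) -> list[int]:
    """Pseudo-random +/-1 chips; the seed acts as a key shared by both ends."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed(cover: list[float], bits: list[int], seed: int,
          chips_per_bit: int, alpha: float) -> list[float]:
    """Spread each bit over chips_per_bit samples, added at amplitude alpha."""
    needed = len(bits) * chips_per_bit
    if needed > len(cover):
        raise ValueError("cover signal too short for payload")
    pn = pn_sequence(seed, needed)
    out = list(cover)
    for i, bit in enumerate(bits):
        symbol = 1 if bit else -1
        for j in range(chips_per_bit):
            k = i * chips_per_bit + j
            out[k] += alpha * symbol * pn[k]
    return out


def detect(received: list[float], n_bits: int, seed: int,
           chips_per_bit: int) -> list[int]:
    """Correlate each segment with the same chips; the sign recovers the bit."""
    pn = pn_sequence(seed, n_bits * chips_per_bit)
    bits = []
    for i in range(n_bits):
        segment = range(i * chips_per_bit, (i + 1) * chips_per_bit)
        corr = sum(received[k] * pn[k] for k in segment)
        bits.append(1 if corr > 0 else 0)
    return bits
```

The host signal itself acts as noise in the correlation, so longer chip sequences (more samples per bit) trade bit rate for robustness.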

YouTube has an ongoing battle between the copyright-infringement identification system and versions of audio and video modified to evade it.

visarga · on July 27, 2016

Can't they simply use the least significant bit of each color channel to carry data? I think a single bit flip would change an 8-bit channel value by only 1 out of 255, undetectable to the eye. Of course, use compression, encryption and redundancy too.
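A minimal sketch of least-significant-bit embedding on a flat list of 8-bit channel values (a simplification: real frames are 2-D and multi-channel). Each value changes by at most 1; the catch is that this only survives lossless handling of the pixels.

```python
def embed_lsb(pixels: list[int], payload: bytes) -> list[int]:
    """Write payload bits (MSB first) into the lowest bit of successive values."""
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("payload too large for cover")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return out


def extract_lsb(pixels: list[int], n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits."""
    bits = [p & 1 for p in pixels[: n_bytes * 8]]
    return bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j : j + 8]))
        for j in range(0, len(bits), 8)
    )
```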

Animats · on July 27, 2016

That won't survive video compression.

jwatte · on July 27, 2016

Google "steganography."

You can significantly increase the bit rate. For example, overlay a QR code over each of four consecutive frames. You can do this without frame dropping. You only need to add about 6dB of the code for this to be recoverable. Similarly, if you know how the codec works, you can exploit that. (Your proposed method is actually pessimal for a modern B frame codec!)

Then there is hiding modem transmissions in techno music sound tracks ;-)

0xmohit · on July 27, 2016

One could achieve a similar thing using PDF files by utilizing a feature called "File Attachments" [0], [1].

There is nothing insecure about it either.

You don't even need commercial software for it; you could use TeX and friends [2] to achieve the same.

[0] https://blogs.adobe.com/insidepdf/2010/11/pdf-file-attachmen...

[1] https://wwwimages2.adobe.com/content/dam/Adobe/en/devnet/pdf...

[2] http://tex.stackexchange.com/questions/208012/attaching-file...

knorker · on July 27, 2016

Sigh.

Youtube deliberately has the "upload a video" feature. It's not a mistake. It's not a security hole.

Also see this misguided and confused soul:

http://seclists.org/fulldisclosure/2014/Mar/123

voltagex_ · on July 27, 2016

I wonder how difficult it would be to find the optimal "storage" method for data within YouTube videos.

On the video side, you're dealing with at least VP9 and H.264, which I'm assuming "destroy" your data somewhat in the encoding process. The audio side is Opus and AAC, with similar challenges.

niftich · on July 27, 2016

H.264, VP9, etc., are all macroblock-based DCT codecs with I-frames that contain a full still image (like a JPEG), and other kinds of frames that contain instructions in terms of motion vectors of how those macroblocks move around. They also use colorspace transforms and color subsampling, so they intentionally sacrifice some color accuracy.
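A slow, pure-Python sketch of the transform-and-quantize step described above, on one 8x8 block with a single uniform quantizer step. That is an assumption: real codecs use integer transforms, per-frequency scaling, prediction and rate control. It also shows why a least-significant-bit payload is fragile: ±1 changes land in coefficients far smaller than the quantizer step and are rounded away, while a payload carried by large brightness differences survives.

```python
import math

N = 8  # block edge in pixels


def _c(k: int) -> float:
    """Orthonormal DCT-II scale factor."""
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)


def _basis(pos: int, freq: int) -> float:
    return math.cos((2 * pos + 1) * freq * math.pi / (2 * N))


def dct2(block: list[list[float]]) -> list[list[float]]:
    """2-D DCT-II; result is indexed [u][v]."""
    return [[_c(u) * _c(v) * sum(block[x][y] * _basis(x, u) * _basis(y, v)
                                 for x in range(N) for y in range(N))
             for v in range(N)] for u in range(N)]


def idct2(coeffs: list[list[float]]) -> list[list[float]]:
    """Inverse of dct2; result is indexed [x][y]."""
    return [[sum(_c(u) * _c(v) * coeffs[u][v] * _basis(x, u) * _basis(y, v)
                 for u in range(N) for v in range(N))
             for y in range(N)] for x in range(N)]


def lossy_roundtrip(block: list[list[int]], step: int) -> list[list[int]]:
    """Quantize every coefficient to a multiple of step, then reconstruct."""
    quantized = [[round(c / step) * step for c in row] for row in dct2(block)]
    return [[min(255, max(0, round(p))) for p in row] for row in idct2(quantized)]
```

With `step=16`, a mid-grey block whose LSBs carry a checkerboard of bits comes back as flat 128 (every LSB lost), while flat blocks at 168 and 88 come back unchanged.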

But writing a data stream into a 2D still image in a way that can be decoded later is a solved problem, e.g. 2D barcodes like DataMatrix, QR Code, and Microsoft Tag (which has up to 8 colors to further increase data density). These formats have built-in error correction that compensates for some missing blocks. However, we can tune the format to be closer to the video codec's internal structure, so that the two play nicer together.

For example, we can set each barcode block to be between 50% and 100% of the pixel size of the video's macroblocks, to make it more likely that the video codec can reuse the macroblock with motion vectors in a P/B-frame, instead of having to spend more bits on it or accidentally mangling it.
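Back-of-the-envelope arithmetic for the block-size suggestion, assuming H.264-style 16x16 macroblocks (VP9 partitions frames differently), a hypothetical 1280x720 frame, and barcode modules aligned to the macroblock grid. These are raw bits per frame, before error correction, sync marks, or any other overhead.

```python
def module_size_range(macroblock_px: int) -> tuple[int, int]:
    """Module edge lengths between 50% and 100% of the macroblock edge."""
    return macroblock_px // 2, macroblock_px


def frame_capacity(width: int, height: int, module_px: int, palette_size: int) -> int:
    """Raw payload bits per frame: whole modules that fit, times bits per module."""
    cols, rows = width // module_px, height // module_px
    bits_per_module = palette_size.bit_length() - 1  # floor(log2(palette_size))
    return cols * rows * bits_per_module
```

For example, 16-pixel black/white modules give 80 x 45 = 3600 bits per frame (108,000 bit/s raw at 30 fps); 8-pixel modules with an 8-color palette give 160 x 90 x 3 = 43,200 bits per frame.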

Realistically, we can also increase our color palette, as we're not going to be scanning these barcodes in bad light conditions -- all we need to do is get the color mostly right. But the more we increase the palette, the less the video codec can reuse blocks; so this is something we'll want to experiment with.
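A minimal sketch of "getting the color mostly right": each module shows one palette color, and the decoder snaps whatever color the codec delivers to the nearest palette entry. The 8-color palette (the corners of the RGB cube, 3 bits per module) is a hypothetical choice; a bigger palette packs more bits per module but tolerates less drift.

```python
# Hypothetical palette: index = 4*r + 2*g + b for r, g, b in {0, 1}.
PALETTE = [(r, g, b) for r in (0, 255) for g in (0, 255) for b in (0, 255)]


def encode_symbols(symbols: list[int]) -> list[tuple[int, int, int]]:
    """Map 3-bit symbols to module colors."""
    return [PALETTE[s] for s in symbols]


def decode_color(rgb: tuple[int, int, int]) -> int:
    """Return the index of the palette color nearest (squared Euclidean) to rgb."""
    return min(range(len(PALETTE)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(PALETTE[i], rgb)))
```

With only 8 widely spaced colors, a module can drift by dozens of levels per channel and still decode correctly; halving the spacing to add colors halves that margin.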

The biggest problem for the barcode approach comes from the addition of the 3rd, temporal dimension. We can have each frame form its own independently scannable barcode, but if we do, we'll want to build in some temporal redundancy, i.e. have a chunk of data, or error correction for said chunk, be present in more than one frame -- to protect against occasional frame drops, very inconvenient frame drops (like when you lose an I-frame and the video is grey- or green-blocky for several more frames), and offer some extra protection against "normal" decoding errors.
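One simple form of temporal redundancy, sketched under the assumption that each frame carries one equal-length chunk: after every group of data frames, send one extra frame holding the XOR of the group. Any single lost frame in the group can then be rebuilt from the others. This is a swapped-in single-parity scheme for illustration; stronger codes such as Reed-Solomon can repair more losses.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: list[bytes]) -> list[bytes]:
    """Append one parity chunk: the XOR of all (equal-length) data chunks."""
    return chunks + [reduce(xor_bytes, chunks)]


def recover(received: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild at most one missing chunk (None), then drop the parity chunk."""
    missing = [i for i, c in enumerate(received) if c is None]
    if len(missing) > 1:
        raise ValueError("single parity can repair only one lost frame per group")
    if missing:
        present = [c for c in received if c is not None]
        received = list(received)
        # XOR of every other chunk (data + parity) equals the missing one.
        received[missing[0]] = reduce(xor_bytes, present)
    return received[:-1]
```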

By the way, there are existing implementations of this concept:

[1] http://thruglassxfer.com/

[2] Demo of above: https://www.youtube.com/watch?v=2_8GlFdlb0Y

[3] Same idea, some hackable code: https://github.com/Neohapsis/QRCode-Video-Data-Exfiltration

swiley · on July 27, 2016

2D barcodes solve a slightly different problem and end up wasting a lot of space on two things:

1. You don't need a header for every frame, but these barcodes have one.

2. (For QR this is the worst one.) A lot of space is wasted on helping detect and correct perspective distortion.

spiritus_ · on July 27, 2016

You know those YouTube videos which offer additional content in the description? We could just use a browser plugin to retrieve these files from a small corner of the video (displaying dynamic QR codes).

Would this be useful ?

brian-armstrong · on July 27, 2016

With https://github.com/quiet/quiet, which provides a wav file encoder, you can generate .wav files using all sorts of modulation methods, centered at the frequency of your choice, at transmission rates measured in kbps. You could then just add this wav on top of your video.

And if you're really feeling adventurous, libquiet provides floating-point output that can be put into any channel, like video, if you're willing to plumb it in there.

</plug>
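For a feel of what such a wav encoder does, here is a much simpler stand-in using only the standard library: plain binary FSK (one tone per bit value), written to a 16-bit mono wav, with a Goertzel-filter decoder. This is not libquiet's API or one of its modems; the sample rate, baud rate and tone frequencies are hypothetical, and nothing here says how well it survives AAC/Opus encoding.

```python
import io
import math
import struct
import wave

RATE = 44100               # samples per second (hypothetical)
BAUD = 300                 # bits per second (hypothetical)
MARK, SPACE = 1200.0, 2200.0  # tone in Hz for 1 and 0 bits (hypothetical)
SAMPLES_PER_BIT = RATE // BAUD


def fsk_samples(bits: list[int]) -> list[float]:
    """Continuous-phase FSK: switch tone per bit without resetting phase."""
    phase, out = 0.0, []
    for bit in bits:
        step = 2 * math.pi * (MARK if bit else SPACE) / RATE
        for _ in range(SAMPLES_PER_BIT):
            out.append(0.5 * math.sin(phase))
            phase += step
    return out


def write_wav(samples: list[float], fileobj) -> None:
    """Write samples in [-1, 1] as 16-bit little-endian mono PCM."""
    with wave.open(fileobj, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(RATE)
        w.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


def goertzel_power(block: list[float], freq: float) -> float:
    """Signal power at one frequency (single-bin DFT via the Goertzel recurrence)."""
    k = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in block:
        s1, s2 = x + k * s1 - s2, s1
    return s1 * s1 + s2 * s2 - k * s1 * s2


def fsk_demodulate(samples: list[float]) -> list[int]:
    """Per bit period, pick whichever tone carries more power."""
    bits = []
    for i in range(0, len(samples) - SAMPLES_PER_BIT + 1, SAMPLES_PER_BIT):
        block = samples[i:i + SAMPLES_PER_BIT]
        bits.append(1 if goertzel_power(block, MARK) > goertzel_power(block, SPACE) else 0)
    return bits
```

Usage: `write_wav(fsk_samples([1, 0, 1, 1]), open("payload.wav", "wb"))` produces a file that could be mixed into a soundtrack.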

nitrogen · on July 27, 2016

How well does the modulation scheme used survive MP3/AAC/Vorbis/Opus encoding?

awesomepantsm · on July 27, 2016

This makes no sense. If you have the ability to run software to decode the youtube video, then let's be honest, what is actually stopping you from just using Tor browser, or a proxy site to get to your content? Or just a USB stick with whatever data that you downloaded elsewhere?

0x0 · on July 26, 2016

I was kind of expecting to see a live stream with white noise "modem" audio/video.. :)

carey · on July 27, 2016

I thought it might at least be something like the SSTV messages in the Portal ARG, which sound a lot like modem noise.

melle · on July 27, 2016

Another example of steganography can be found in songs by Aphex Twin, e.g. Windowlicker (https://en.wikipedia.org/wiki/Windowlicker)

bcook · on July 27, 2016

He also put his own face into one song.

http://www.bastwood.com/?page_id=10

flashman · on July 27, 2016

> Replace every even frame with a copy of the subsequent odd frame

I think this is supposed to be previous odd frame, given that 1 2 3 4 5 6 7 8 9 10 becomes 1 1 3 3 5 5 7 7 9 9.
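A tiny sketch of the corrected reading, treating frames as a list numbered from 1: each even-numbered frame becomes a copy of the odd frame before it. How the article then carries data in the repeated pairs is not modeled here.

```python
def duplicate_odd_frames(frames: list) -> list:
    """Replace every even (1-based) frame with a copy of the previous odd frame."""
    # 0-based index i is an even 1-based frame when i is odd.
    return [frames[i - 1] if i % 2 == 1 else frames[i] for i in range(len(frames))]
```

Applied to frames 1 through 10, this yields 1 1 3 3 5 5 7 7 9 9.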

ryanmarsh · on July 27, 2016

Couldn't this be done with two live streams for TX/RX, effectively creating a VPN? As the author said, there are plenty of possible modulation techniques. Surely some much faster ones could be used.

The biggest downside I could think of would be the lag: data > render frames > encode frames > network > decode stream > render frames > scrape data

hoffcoder · on July 27, 2016

I think the author has not thought about frame reordering in error scenarios. If UDP is used, frames could even get dropped in the middle, and in the case of TCP, frames could arrive out of order. That would wreak havoc on the odd-odd numbering sequence of the encoding.
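A minimal sketch of the usual fix for ordering problems, assuming each frame's payload can carry a small header: prefix every chunk with a 4-byte big-endian sequence number (a hypothetical framing, not the article's format), then sort on receipt, ignore duplicates and report gaps. A receiver can only notice a dropped final chunk if the total count is also sent.

```python
import struct


def frame_payloads(data: bytes, chunk_size: int) -> list[bytes]:
    """Split data into chunks, each prefixed with its sequence number."""
    return [struct.pack(">I", seq) + data[off:off + chunk_size]
            for seq, off in enumerate(range(0, len(data), chunk_size))]


def reassemble(payloads: list[bytes]) -> tuple[bytes, list[int]]:
    """Reorder by sequence number; return the data and any missing numbers."""
    chunks: dict[int, bytes] = {}
    for p in payloads:
        (seq,) = struct.unpack(">I", p[:4])
        chunks.setdefault(seq, p[4:])  # keep the first copy of duplicates
    expected = range(max(chunks) + 1) if chunks else range(0)
    missing = [s for s in expected if s not in chunks]
    data = b"".join(chunks[s] for s in expected if s in chunks)
    return data, missing
```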

visarga · on July 27, 2016

Tornado codes to the rescue, then.

https://en.wikipedia.org/wiki/Tornado_code

ignoramous · on July 27, 2016

Afaik, YouTube streams over TCP.

visarga · on July 27, 2016

Maybe this can be used to distribute tracker IPs / seed information for p2p networks, eliminating the need for a root server.

megablast · on July 27, 2016

They can encode the magnet number inside trailers for the actual films the number represents.

akx · on July 27, 2016

So the root server is YouTube? :)

masukomi · on July 27, 2016

> Once they identify videos that might contain encrypted data, they can then begin to work on decrypting that data. The amount of video data on the internet is massive and it is growing at an exponential rate (the zettabytes of data they would have to sift through, I can't even imagine the headache).

um. they already do that. They scan all the uploaded videos for copyrighted audio, fingerprinting and comparing the uploaded audio of a bajillion videos against 1/2 a bajillion songs.

joebergeron · on July 27, 2016

While this is little more than simple steganography, I'd be curious to see what kinds of encrypted data size / video size ratios are achievable, perhaps using some more nuanced techniques or approaches.

It's definitely an interesting idea, but it's really nothing new. I remember a few years ago reading about people hiding compressed .zip archives inside of jpegs or something like that.
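The zip-inside-a-jpeg trick works because a JPEG decoder generally stops at the end-of-image marker, while a zip reader locates its central directory by scanning from the end of the file. A minimal sketch with the standard library; the "image" here is a hypothetical stub of JPEG start and end markers rather than a real picture, and this is a different technique from hiding data in video.

```python
import io
import zipfile


def hide_zip_in_image(image_bytes: bytes, files: dict[str, bytes]) -> bytes:
    """Build a zip archive in memory and append it after the image data."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return image_bytes + buf.getvalue()


def read_hidden_zip(blob: bytes) -> dict[str, bytes]:
    """zipfile tolerates data prepended to an archive, so open the whole blob."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```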

roddux · on July 27, 2016

I wonder how much data you could reliably transmit without the video/audio quality of the base video notably changing.

Does YouTube cut out audio frequencies that are beyond the hearing range?

libeclipse · on July 26, 2016

Hmm. The word encryption is pretty loosely used.

brian-armstrong · on July 27, 2016

Well, if you have a mechanism for sending data, you can always encrypt on top.

cellularmitosis · on July 27, 2016

Sure, but that doesn't change the fact that the author is confusing the terms "encrypt" and "encode"

palakchokshi · on July 26, 2016

he probably encoded HODOR HODOR HODOR multiple times in that video at the end.

rasz_pl · on July 26, 2016

>for a 30 frames/second video, a 15 bit/s transfer rate is obtained.

jimktrains2 · on July 27, 2016

If you're just sending a 128-bit key, we're only talking about less than 10 seconds of video.
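The quoted rate implies one bit per pair of frames (presumably from the frame-duplication step), so the key-transfer time works out as:

```latex
\frac{30\ \text{frames/s}}{2\ \text{frames/bit}} = 15\ \text{bit/s},
\qquad
\frac{128\ \text{bit}}{15\ \text{bit/s}} \approx 8.5\ \text{s}
```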

aji · on July 26, 2016

yeah, somehow I feel this is less than optimal

eximius · on July 27, 2016

clearly. It is trivially improvable by simply adding more sections to the video. Or not caring about the original video.

on July 27, 2016

[dead]

knorker · on July 27, 2016

Did you just write the same comment twice from two separate accounts?

gambiting · on July 27, 2016

Did you just write the same comment twice from two separate accounts?