Most of those frames are redundant.

lajamerr · on Dec 12, 2022

There's value in redundancy and in a continuous stream of images where one follows the other.

It would be nice to have a dataset of a couple "raising" a video recorder for 1 year as they would a baby. A continuous stream of data.

Could train a model to predict the next frames based on what it's seen so far.

mindcrime · on Dec 12, 2022

It would be nice to have a dataset of a couple "raising" a video recorder for 1 year as they would a baby. A continuous stream of data.

The project I'm working on right now is to build a sort of "body" for a (non-ambulatory, totally non-anthropomorphic) "baby AI" that senses the world using cameras, microphones, an accelerometer/magnetometer/gyroscope sensor, temperature sensors, GPS, etc. The idea is exactly to carry it around with me and "raise" it for long periods of time (a year? Sure, absolutely, in principle. But see below) and explore some ideas about how learning works in that regime.

The biggest (well, one of the biggest) challenge(s) is going to be data storage. Once I start storing audio and video the storage space required is going to ramp up quickly, and since I'm paying for this out of my own pocket I'm going to be limited in terms of how much data I can keep around. Will I be able to keep a whole year? Don't know yet.

There's also some legal and ethical stuff to work out, around times when I take the thing out in public and am therefore recording audio and video of other people.

lajamerr · on Dec 12, 2022

Glad to hear you are working on such a project. There will definitely be a lot of privacy concerns in any such project, so it may be difficult to open-source the data to the broad public.

But could still be useful to research institutes who follow privacy guidelines.

It might be best to do a short stint of 1 week to test the feasibility. That should give you a good estimate on future projections of how much data it will consume after a month, 3 months, and a year.

I imagine any intelligent system could work with reduced data quality/lossy data at least on the audio.

As long as it's consistent in the type/amount of compression, that is. So instead of WAV/FLAC/RAW, you could encode it to something like Opus at 100 kbps, and that would give you 394.2 gigabytes of data for a single year of audio.
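The 394.2 GB figure can be checked with a quick back-of-the-envelope calculation. Here is a minimal Python sketch, assuming a constant bitrate, decimal gigabytes (10^9 bytes), a 365-day year, and recording around the clock; the function name `yearly_storage_gb` is just illustrative:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000; assumes a non-leap year


def yearly_storage_gb(bitrate_kbps: float, hours_per_day: float = 24.0) -> float:
    """Storage in decimal gigabytes for one year of a constant-bitrate stream."""
    bits_per_second = bitrate_kbps * 1000
    # Scale the year down if the device only records part of each day.
    seconds_recorded = SECONDS_PER_YEAR * (hours_per_day / 24.0)
    total_bytes = bits_per_second * seconds_recorded / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    print(f"{yearly_storage_gb(100):.1f} GB")  # 394.2 GB
```

Real encoders often run in variable-bitrate mode, so an actual recording could come in somewhat above or below this estimate.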

As for video... it would definitely require a lot of tricks to store on a hobbyist level.

mindcrime · on Dec 12, 2022

Yep. Your reply here encapsulates a lot of what I've been thinking about for the past few weeks. I'd love to open-source at least some of the data I collect, but the privacy/ethics issues have to be considered. And as far as that goes, there are legal/ethical issues around simply collecting data even if I don't share it, that come into play where other people are involved.

It might be best to do a short stint of 1 week to test the feasibility. That should give you a good estimate on future projections of how much data it will consume after a month, 3 months, and a year.

Yep. That's basically the approach I took with "phase 1", where the only data being ingested was GPS / accelerometer data. I just let it run for a couple of weeks and then extrapolated out what the storage requirements would be for the future. Obviously audio and video are going to change the equation a lot, but I am planning to apply the same principle.
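That kind of extrapolation is simple to sketch. A minimal Python version follows, assuming the ingest rate stays roughly constant after the sample period (a linear projection); the function name `project_storage` and the sample numbers are made up for illustration, not measurements from the project:

```python
from typing import Dict


def project_storage(sample_bytes: float, sample_days: float,
                    horizon_days: Dict[str, int]) -> Dict[str, float]:
    """Linearly extrapolate a measured sample to longer horizons, in decimal GB."""
    bytes_per_day = sample_bytes / sample_days
    return {label: bytes_per_day * days / 1e9
            for label, days in horizon_days.items()}


if __name__ == "__main__":
    # Hypothetical sample: 14 days of logging produced 2.1 GB.
    horizons = {"1 month": 30, "3 months": 90, "1 year": 365}
    for label, gb in project_storage(2.1e9, 14, horizons).items():
        print(f"{label}: {gb:.1f} GB")
```

A linear projection will be off if usage patterns change (for example, if the device records fewer hours on some days), so a longer sample period should give a more trustworthy estimate.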

I imagine any intelligent system could work with reduced data quality/lossy data at least on the audio.

Yep, that's another area I've been thinking a lot about. The "instinct" is to capture everything at the highest possible resolution / sampling rate / etc. and store it in a totally lossless format. But that is also the most expensive scenario, and if it's not strictly required, then why do it? We know human hearing at least can work with relatively crappy audio. Look at the POTS phone system, with its roughly 3-4 kHz of voice bandwidth (8 kHz sampling in digital telephony), for example. Does that analogy hold for video? Good question.

As long as it's consistent in the type/amount of compression, that is. So instead of WAV/FLAC/RAW, you could encode it to something like Opus at 100 kbps, and that would give you 394.2 gigabytes of data for a single year of audio.

Agreed.

As for video... it would definitely require a lot of tricks to store on a hobbyist level.

Definitely. One thing that may help with costs in the short term is that I'm very explicitly not (for now anyway) using a cloud storage service. Data ingestion is to a server I own and physically have in my home. I can get away with this because, while the aggregate amount of data may wind up fairly big over longer periods of time, the rate at which I need to ingest data isn't all that high (there's only one of these devices sending to the server). And I can just keep adding 5TB or 10TB drives as needed. When one fills up, I can unplug it, replace it with another, label and store it, and move on. The big risk here is that I don't really have any redundancy in that scenario, especially if my home burns down or something. But in that case I have bigger problems to worry about anyway!

There are other downsides to this approach, like dealing with the case of needing to access the entire year's worth of data "at once" for analysis or training, but I'm not sure that need will ever even arise.

sharemywin · on Dec 12, 2022

There was an article on using latent embeddings for compression. Might be useful.

https://pub.towardsai.net/stable-diffusion-based-image-compr...

bena · on Dec 12, 2022

And unclassified. And of poor quality.

Babies have a much harder task. They have to construct a corpus of knowledge from absolutely nothing.

the8472 · on Dec 12, 2022

The upside is that babies get to interact with the environment they're training on. Image models can't move the camera a few cm to the right if they're interested in the perspective of a particular scene.

coolspot · on Dec 12, 2022

Not absolutely nothing: the neural net is initialized with some weights encoding basic things (breathing, sucking, crying, etc.). A newborn horse walks and follows its mother after the first 5-10 minutes.

trasz2 · on Dec 12, 2022

How do we know they start from nothing?

CamperBob2 · on Dec 12, 2022

In fact, we're pretty sure that they don't "start from nothing." E.g., https://en.wikipedia.org/wiki/The_Language_Instinct

bena · on Dec 12, 2022

We're not pretty sure of anything, e.g., https://en.wikipedia.org/wiki/Educating_Eve

CamperBob2 · on Dec 12, 2022

On the surface, that sounds like a reasonable position to take. ("Cowley proposes an alternative: that language acquisition involves culturally determined language skills, apprehended by a biologically determined faculty that responds to them. In other words, he proposes that each extreme is right in what it affirms, but wrong in what it denies. Both cultural diversity of language, and a learning instinct, can be affirmed; neither need be denied.")

GPT's ability to fool intelligent people into thinking that it is "intelligent" itself seems like a powerful argument that language, more than anything else, is what makes humans capable of higher thought. Language is all GPT has. (Well, that and a huge-ass cultural database.)

Intelligence is one of those areas in which, once you fake it well enough, you've effectively made it. Another 10x will be enough to tie the game against an average human player.

bena · on Dec 12, 2022

There's a really easy, yet unconscionably horrible experiment we could perform to test the assumption that we're preprogrammed with any sort of knowledge.

Take a baby and stick it in a room. Let it grow up with absolutely no stimulation whatsoever. It is given food and that's about it. What do you think it can demonstrate knowledge of by the time it reaches 5? 10? 15?

All behavior is learned behavior. People talk about sucking and breathing and walking horses and whatnot, but babies do have to learn how to latch and how to feed. Now, they can work it out themselves. But quick acquisition of a skill does not mean the skill already existed.

Not to mention it's a far cry from sucking to language. Or knowing what a person is. Or who a person is.