
Xoogler here (so I can’t help with any changes or feature requests now unfortunately) but yes, all of Google runs on Gmail. The amount of email I got as a software engineer there was crazy voluminous. I and most Googlers around me made heavy use of filters, labels, and all the text search operators available (see https://support.google.com/mail/answer/7190?hl=en&co=GENIE.P...). I also learned to operate Gmail purely via the keyboard with the built-in keyboard shortcuts.

I’d occasionally have the frustration of not finding what I was looking for, but usually if I combined a search with at least one other operator (who it was from, what label it might have received, etc.) I almost always found what I was looking for pretty quickly.

And as for the signature image attachments thing, I think that’s actually an artifact of how the sender compiles the email, not Gmail. The “has:attachment” operator is one I use a lot and is usually quite reliable.
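For example (using only the standard documented operators, nothing Google-internal), combining a sender, a label, an attachment filter, and a date range might look like:

    from:alice@example.com has:attachment label:launch-reviews after:2023/06/01

Narrowing by two or three of those at once is usually what got me to the message in one or two tries.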

Hope that provides a little insight!


Hey @code_brian, would Tavus make the conversational audio model available outside of the PALs and video models? Seems like this could be a great use case for voice-only agents as well.

You can reach out to our sales team. You can chat with our AI SDR here, and they will review it and reach out. https://www.tavus.io/demo

Just a plug for the book this content is derived from, Noam Wasserman's "The Founder's Dilemmas." It lays out so many facets of startup decisions that deserve thought from the outset to prevent issues. It also strikes a good balance IMO between being based in statistics and research, and including anecdotes from actual experiences that bring the statistics full circle. I'd highly recommend it.


I’ve never been on the Spotify train, but with an all-Apple household, including HomePod Minis in multiple rooms, I’ve been stuck in iTunes/Apple Music land. We own our music, which is nice. And I dutifully pay the $24.99 per year for iTunes Match so that I can tell Siri what to play on HomePods, but I will be 0% surprised when they deprecate that service.

Anyone have a good non-Apple way of getting Siri to play songs from a personal music collection on HomePods? My kids use it most.


Just yesterday I paid my annual $24.99 iTunes Match subscription to keep my music library synced between my laptops, phones, and HomePods. It’s a beautiful thing, but it feels tenuous every time that renewal goes in. Will it be my last? I hope not!

There is just something about actually owning the music that appeals to me and my wife (and yes, we’re children of the original iTunes era, when you could load up your playlist and then click the cool nuclear-looking button in iTunes to burn it to a CD). It won’t last forever, but I’ll keep with it till it dies because it works.


Every month, many newspapers publish a list of "Here's what's leaving Netflix this month."

Because of the "100 million songs!" marketing from the various streaming services, a lot of people don't realize this happens with music, too.

My wife is a streaming person, and occasionally something she likes will no longer be available when she looks for it. I think the most recent instance was some David Bowie album.

That can't happen to me, because I buy my music. It can't be taken away because some company's contract changed somewhere.

I like to believe I'm extra-resilient because I buy my music as MP3's and then import it into iTunes.


> I buy my music as MP3's

How?


Amazon.


Less a technical comment and more just a mind-blown comment, but I still can’t get over just how much data is compressed into and available in these downloadable models. Yesterday I was on a plane with no WiFi, but had gemma3:12b downloaded through Ollama. Was playing around with it and showing my kids, and we fired history questions at it, questions about recent video games, and some animal fact questions. It wasn’t perfect, but holy cow the breadth of information that is embedded in an 8.1 GB file is incredible! Lossy, sure, but a pretty amazing way of compressing all of human knowledge into something incredibly contained.


It's extremely interesting how powerful a language model is at compression.

When you train it to be an assistant model, it's better at compressing assistant transcripts than it is general text.

There is an eval I have a lot of interest in and respect for, UncheatableEval (https://huggingface.co/spaces/Jellyfish042/UncheatableEval), which tests how good a language model an LLM is by applying it to a range of compression tasks.

This task is essentially impossible to 'cheat'. Compression is a benchmark you cannot game!
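The core metric is simple: a better language model assigns higher probability to the actual next tokens of fresh, unseen text, and the negative log2-probability is exactly the number of bits an entropy coder would need. A minimal sketch of that scoring idea (not the actual UncheatableEval code; the model name is just a placeholder):

    # Bits-per-byte of a text under a causal LM, using the `transformers` library.
    # Lower is better: the model "compresses" the text into fewer bits per byte.
    import math, torch
    from transformers import AutoTokenizer, AutoModelForCausalLM

    def bits_per_byte(model_name: str, text: str) -> float:
        tok = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            # loss = mean cross-entropy (in nats) over the predicted tokens
            loss = model(ids, labels=ids).loss.item()
        total_bits = loss * (ids.numel() - 1) / math.log(2)
        return total_bits / len(text.encode("utf-8"))

    print(bits_per_byte("gpt2", "Some freshly written text the model has never seen..."))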


Knowledge is learning relationships by decontextualizing information into generalized components. Application of knowledge is recontextualizing these components based on the problem at hand.

This is essentially just compression and decompression. It's just that with prior compression techniques, we never tried leveraging the inherent relationships encoded in a compressed data structure, because our compression schemes did not leverage semantic information in a generalized way and thus did not encode very meaningful relationships other than "this data uses the letter 'e' quite a lot".

A lot of that comes from the sheer amount of data we throw at these models, which provide enough substrate for semantic compression. Compare that to common compression schemes in the wild, where data is compressed in isolation without contributing its information to some model of the world. It turns out that because of this, we've been leaving a lot on the table with regards to compression. Another factor has been the speed/efficiency tradeoff. GPUs have allowed us to put a lot more into efficiency, and the expectations that many language models only need to produce text as fast as it can be read by a human means that we can even further optimize for efficiency over speed.

Also, shout out to Fabrice Bellard's ts_zip, which leverages LLMs to compress text files. https://bellard.org/ts_zip/


And of course, once we extended lossy compression to make use of the semantic space, we started getting compression artifacts in semantic space - aka "hallucinations".


That seems worthy of a blog post!


I don't know, it's not that profound of an insight. You throw away color information, the image gets blocky. You throw away frequency information, the image gets blurry. You throw away semantic information, shit stops making sense :).

Still, if someone would turn that into a blog post, I'd happily read it.


There's more to it than that. You can draw strong analogies and also discuss where the analogy suffers. For example, you can compare decreased performance with accurately recalling specific information with high-frequency attenuation in lossy codecs.


Agreed. It's basically lossy compression for everything it's ever read. And the quantization impacts the lossiness, but since a lot of text is super fluffy, we tend not to notice as much as we would when we, say, listen to music that has been compressed in a lossy way.


It's a bit like if you trained a virtual band to play any song ever, then told it to do its own version of the songs. Then prompted it to play whatever specific thing you wanted. It won't be the same because it kinda remembers the right thing sorta, but it's also winging it.


I've been referring to LLMs as JPEG for all the world's data, and people have really started to come around to it. Initially most folks tended to outright reject this comparison.


Ted Chiang wrote a great piece about that: https://www.newyorker.com/tech/annals-of-technology/chatgpt-...

I think it's a solid description for a raw model, but it's less applicable once you start combining an LLM with better context and tools.

What's interesting to me isn't the stuff the LLM "knows" - it's how well an LLM system can serve me when combined with RAG and tools like web search and access to a compiler.

The most interesting developments right now are models like Gemma 3n which are designed to have as much capability as possible without needing a huge amount of "facts" baked into them.


Wikipedia is about 24GB, so if you're allowed to drop two-thirds of the details and make up the missing parts by splicing in random text, 8GB doesn't sound too bad.

To me the amazing thing is that you can tell the model to do something, even follow simple instructions in plain English, like make a list or write some python code to do $x, that's the really amazing part.


It blows my mind that I can ask for 50 synonyms, instantly get a great list with great meaning summaries.

Then ask for the same list sorted and get that nearly instantly.

These models have a short time context for now, but they already have a huge “working memory” relative to us.

It is very cool. And indicative that vastly smarter models are going to be achieved fairly easily, with new insight.

Our biology has had to ruthlessly work within our biological/ecosystem energy envelope, and with the limited value/effort returned by a pre-internet pre-vast economy.

So biology has never been able to scale. Just get marginally more efficient and effective within tight limits.

Suddenly, (in historical, biological terms), energy availability limits have been removed, and limits on the value of work have compounded and continue to do so. Unsurprising that those changes suddenly unlock easily achieved vast untapped room for cognitive upscaling.


> These models [...] have a huge “working memory” relative to us. [This is] indicative that vastly smarter models are going to be achieved fairly easily, with new insight.

I don't think your second sentence logically follows from the first.

Relative to us, these models:

- Have a much larger working memory.

- Have much more limited logical reasoning skills.

To some extent, these models are able to use their superior working memories to compensate for their limited reasoning abilities. This can make them very useful tools! But there may well be a ceiling to how far that can go.

When you ask a model to "think about the problem step by step" to improve its reasoning, you are basically just giving it more opportunities to draw on its huge memory bank and try to put things together. But humans are able to reason with orders of magnitude less training data. And by the way, we are out of new training data to give the models.


> But humans are able to reason with orders of magnitude less training data.

Common belief, but false. You start learning from inside the womb. The data flow increases exponentially when you open your eyes and then again when you start manipulating things with your hands and mouth.

> When you ask a model to "think about the problem step by step" to improve its reasoning, you are basically just giving it more opportunities to draw on its huge memory bank and try to put things together.

We do the same with children. At least I did it to my classmates when they asked me for help. I'd give them a hint, and ask them to work it out step by step from there. It helped.


> Common belief, but false. You start learning from inside the womb. The data flow increases exponentially when you open your eyes and then again when you start manipulating things with your hands and mouth.

But you don't get data equal to the entire internet as a child!

> We do the same with children. At least I did it to my classmates when they asked me for help. I'd give them a hint, and ask them to work it out step by step from there. It helped.

And I do it with my students. I still think there's a difference in kind between when I listen to my students (or other adults) reason through a problem, and when I look at the output of an AI's reasoning, but I admittedly couldn't tell you what that is, so point taken. I still think the AI is relying far more heavily on its knowledge base.


There seems to be lots of mixed data points on this, but to some extent there is knowledge encoded into the evolutionary base state of the new human brain. Probably not directly as knowledge, but "primed" to quickly establish relevant world models and certain types of reasoning with a small number of examples.


Your field of vision is equivalent to something like 500 Megapixels. And assume it’s uncompressed because it’s not like your eyeballs are doing H.264.

Given vision and the other senses, I’d argue that your average toddler has probably trained on more sensory information than the largest LLMs ever built long before they learn to talk.


There's an adaptation in there somewhere, though. Humans have a 'field of view' that constrains input data, and on the data processing side we have a 'center of focus' that generally rests wherever the eye rests (there's an additional layer where people learn to 'search' their vision by moving their mental center of focus without moving the physical focus point of the eye).

Then there's the whole slew of processes that pick up two or three key points of data and then fill in the rest (e.g. the moonwalking bear experiment [0]).

I guess all I'm saying is that raw input isn't the only piece of the puzzle. Maybe it is at the start before a kiddo _knows_ how to focus and filter info?

[0] https://www.youtube.com/watch?v=xNSgmm9FX2s


Attention is all you need. :)


You're an LLM, Harry!


> Have much more limited logical reasoning skills.

Relative to the best humans, perhaps, but I seriously doubt this is true in general. Most people I work with couldn’t reason nearly as well through the questions I use LLMs to answer.

It’s also worth keeping in mind that having a different approach to reasoning is not necessarily equivalent to a worse approach. Watch out for cherry-picking the cons of its approach and ignoring the pros.


> Relative to the best humans,

For some reason, the bar for AI is always against the best possible human, right now.


It seems that 90% of discussion about AI boils down to people who feel threatened by it in some way, and are lashing out in irrational ways as a result. (Source for 90% figure: Sturgeon's Law.)


But doesn't this also apply to the other side of the argument? People are invested in AI professionally, financially, or just emotionally because they want it to make their lives better, and so they lose sight of AI's flaws.

I don't know who is right, which IMHO is what makes this topic interesting.


1. X could happen.

2. I would hate if X happened.

3. Therefore X is not possible.


> And by the way, we are out of new training data to give the models.

Only easily accessible text data. We haven't really started using video at scale yet, for example. It looks like data for specific tasks goes really far too ... for example agentic coding interactions aren't something that has generally been captured on the internet. But capturing interactions with coding agents, in combination with the base training of existing programming knowledge already captured, is resulting in significant performance increases. The amount of specialized data we might need to gather or synthetically generate is perhaps orders of magnitude less than presumed with pure supervised learning systems. And for other applications like industrial automation or robotics we've barely started capturing all the sensor data that lives in those systems.


My response completely acknowledged their current reasoning limits.

But in evolutionary time frames, clearly those limits are lifting extraordinarily quickly. By many orders of magnitude.

And the point I made, that our limits were imposed by harsh biological energy and reward limits, vs. today's models (and their successors), which have access to relatively unlimited energy and, via sharing value with unlimited customers, unlimited rewards, still stands.

It is a much simpler problem to improve digital cognition in a global ecosystem of energy production, instant communication and global application, than it was for evolution to improve an individual animal's cognition within the limited resources of local habitats and their inefficient communication of advances.


Not to mention, Language Modeling is Compression https://arxiv.org/pdf/2309.10668

So text wikipedia at 24G would easily hit 8G with many standard forms of compression, I'd think. If not better. And it would be 100% accurate, full text and data. Far more usable.

It's so easy for people to not realise how massive 8GB really is, in terms of text. Especially if you use ascii instead of UTF.
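A quick back-of-the-envelope check on how much text 8 GB actually is (the 6 bytes/word figure is just an assumption, roughly five letters plus a space):

    bytes_total = 8 * 1024**3        # 8 GiB
    bytes_per_word = 6               # assumed ~5 letters + 1 space
    words = bytes_total / bytes_per_word
    print(f"{words:.2e}")            # ~1.4e9 words
    print(words / 100_000)           # ~14,000 novel-length books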


The 24G is the compressed number.

They host a pretty decent article here: https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia

The relevant bit:

> As of 16 October 2024, the size of the current version including all articles compressed is about 24.05 GB without media.


Nice link, thanks.

Well, I'll fall back to the position that one is lossy, the other not.


A neat project you (and others) might want to check out: https://kiwix.org/

Lots of various sources that you can download locally to have available offline. They're even providing some pre-loaded devices in areas where there may not be reliable or any internet access.


For reference (according to Google):

> The English Wikipedia, as of June 26, 2025, contains over 7 million articles and 63 million pages. The text content alone is approximately 156 GB, according to Wikipedia's statistics page. When including all revisions, the total size of the database is roughly 26 terabytes (26,455 GB)


A better point of reference might be pages-articles-multistream.xml.bz2 (current pages without edit/revision history, no talk pages, no user pages), which is 20GB

https://en.wikipedia.org/wiki/Wikipedia:Database_download#Wh...?


this is a much more deserving and reliable candidate for any labels regarding the breadth of human knowledge.


it barely touches the surface


regarding depth, not breadth, certainly


Wikipedia itself describes its size as ~25GB without media [0]. And it's probably more accurate and with broader coverage in multiple languages compared to the LLM downloaded by the GP.

https://en.wikipedia.org/wiki/Wikipedia:Size_of_Wikipedia


Really? I'd assume that an LLM would deduplicate Wikipedia into something much smaller than 25GB. That's its only job.


> That's its only job.

The vast, vast majority of LLM knowledge is not found in Wikipedia. It is definitely not its only job.


When trained on next-word prediction with the standard loss function, that is by definition its only job.


What happens if you ask this 8gb model "Compose a realistic Wikipedia-style page on the Pokemon named Charizard"?

How close does it come?


8.1 GB is a lot!

It is 64,800,000,000 bits.

I can imagine 100 bits sure. And 1,000 bits why not. 10,000 you lose me. A million? That sounds like a lot. Now 64 million would be a number I can't well imagine. And this is a thousand times 64 million!


The study of language models from an information theory/compression POV is a small field but increasingly important for efficiency/scaling - we did a discussion about this today https://www.youtube.com/watch?v=SWIKyLSUBIc&t=2269s


The Encyclopædia Britannica has about 40,000,000 words [1] or about 0.25 GB if you assume 6 bytes per word. It’s impressive but not outlandish that an 8.1 GB file could encode a large swath of human information.

[1]: https://en.wikipedia.org/wiki/Encyclopædia_Britannica


Intelligence is compression some say


Very much so!

The more and faster a “mind” can infer, the less it needs to store.

Think how many fewer facts a symbolic system that can perform calculus needs to store, vs. an algebraic, or just arithmetic, system, to cover the same numerical problem-solving space. Many orders of magnitude less.

The same goes for higher orders of reasoning. General or specific subject related.

And higher order reasoning vastly increases capabilities extending into new novel problem spaces.

I think model sizes may temporarily drop significantly, after every major architecture or training advance.

In the long run, “A circa 2025 maxed M3 Ultra Mac Studio is all you need!” (/h? /s? Time will tell.)


I don't know who else took notes by diffing their own assumptions against lectures/talks. There was a notion of what was really new compared to your previous conceptual state, of what added new information.


Some say that. But what I value even more than compression is the ability to create new ideas which do not in any way exist in the set of all previously-conceived ideas.


I'm toying with the phrase "precedented originality" as a way to describe the optimal division of labor when I work with Opus 4 running hot (which is the first one where I consistently come out ahead by using it). That model at full flog seems to be very close to the asymptote for the LLM paradigm on coding: they've really pulled out all the stops (the temperature is so high it makes trivial typographical errors, it will discuss just about anything, it will churn for 10, 20, 30 seconds to first token via API).

It's good enough that it has changed my mind about the fundamental utility of LLMs for coding in non-Javascript complexity regimes.

But it's still not an expert programmer, not by a million miles, there is no way I could delegate my job to it (and keep my job). So there's some interesting boundary that's different than I used to think.

I think it's in the vicinity of "how much precedent exists for this thought or idea or approach". The things I bring to the table in that setting have precedent too, but much more tenuously connected to like one clear precedent on e.g. GitHub, because if the thing I needed was on GitHub I would download it.


How well does that apply to robotics or animal intelligence? Manipulating the real world is more fundamental to human intelligence than compressing text.


Under the predictive coding model (and I'm sure some others), animal intelligence is also compression. The idea is that the early layers of the brain minimize how surprising incoming sensory signals are, so the later layers only have to work with truly entropic signal. But it has non-compression-based intelligence within those more abstract layers.


I just wonder if neuroscientists use that kind of model.


I doubt there's any consensus on one model, but it's certainly true that many neuroscientists are using the predictive coding at least some of the time

https://scholar.google.com/scholar?hl=en&as_sdt=0%2C36&q=pre...


Crystallized intelligence is. I am not sure about fluid intelligence.


Fluid intelligence is just how quickly you acquire crystallized intelligence.

It's the first derivative.


Speaking of that, people designed a memory game, dual n-back, which allegedly improves fluid intelligence.


I don't know why, but I was reminded of Douglas Hofstadter's talk: Analogy is cognition: https://www.youtube.com/watch?v=n8m7lFQ3njk&t=964s.


Back in the '90s, we joked about putting “the internet” on a floppy disk. It’s kind of possible now.


Yeah, those guys managed to steal the internet.


How does this compare to, say, the compression ratio of a lossless 8K video and a 240p Youtube stream of the same video?


I will never tire of pointing out that machine learning models are compression algorithms, not compressed data.


I kinda made an argument the other day that they are high-dimensional lossy decompression algorithms, which might be the same difference but looking the other way through the lens.


ML algorithms are compression algorithms, the trained models are compressed data.


they're an upgraded version of self-executable zip files that compress knowledge like MP3 compresses music, without knowing exactly wtf either music or knowledge are

the self-execution is the interactive chat interface.

wikipedia gets "trained" (compiled+compressed+lossy) into an executable you can chat with, and you can pass this through another pretrained A.I. that can talk out the text or transcribe it.

I think writing compilers is now officially a defunct skill, kept for historical and conservation purposes more than anything else; but I don't like saying "conservation", it's a bad framing, I'd rather say "legacy connectivity", which is a form of continuity or backwards compatibility


It is truly incredible.

One factor, is the huge redundancies pervasive in our communication.

(1) There are so many ways to say the same thing, that (2) we have to add even more words to be precise at all. Without a verbal indexing system we (3) spend many words just setting up context for what we really want to say. And finally, (4) we pervasively add a great deal of intentionally non-informational creative and novel variability, and mood inducing color, which all require even more redundancy to maintain reliable interpretation, in order to induce our minds to maintain attention.

Our minds are active resistors of plain information!

All four factors add so much redundancy that it's probably fair to say most of our communication, measured in bits, characters, or words, is pure redundancy (maybe 95%, 98%, or more!).

Another helpful compressor, is many facts are among a few “reasonably expected” alternative answers. So it takes just a little biasing information to encode the right option.

Finally, the way we reason seems to be highly common across everything that matters to us. Even though we have yet to identify and characterize this informal human logic. So once that is modeled, that itself must compress a lot of relations significantly.

Fuzzy Logic was a first approximation attempt at modeling human “logic”. But has not been very successful.

Models should eventually help us uncover that “human logic”, by analyzing how they model it. Doing so may let us create even more efficient architectures. Perhaps significantly more efficient, and even provide more direct non-gradient/data based “thinking” design.

Nevertheless, the level of compression is astounding!

We are far less complicated cognitive machines than we imagine! Scary, but inspiring too.

I personally believe that common PCs of today, maybe even high end smart phones circa 2025, will be large enough to run future super intelligence when we get it right, given internet access to look up information.

We have just begun to compress artificial minds.


Yea. Same for an 8 GB Stable Diffusion image generator. Sure, not the best quality, but there is so much information inside.


How big is Wikipedia text? Within 3X that size with 100% accuracy


Google AI response says this for compressed size of wikipedia:

"The English Wikipedia, when compressed, currently occupies approximately 24 GB of storage space without media files. This compressed size represents the current revisions of all articles, but excludes media files and previous revisions of pages, according to Wikipedia and Quora."

So 3x is correct but LLMs are lossy compression.


I've been doing the AI course on Brilliant lately, and it's mindblowing the techniques that they come up with to compress the data.


Same thing with image models. A 4 GB Stable Diffusion model can draw and represent anything humanity knows.


How about a full glass of wine? Filled to the brim.


I don't like the term "compression" used with transformers because it gives the wrong idea about how they function. Like that they are a search tool glued onto a .zip file, your prompts are just fancy search queries, and hallucinations are just bugs in the recall algo.

Although strictly speaking they have lots of information in a small package, they are F-tier compression algorithms because the loss is bad, unpredictable, and undetectable (i.e. a human has to check it). You would almost never use a transformer in place of any other compression algorithm for typical data compression uses.


A .zip is lossless compression. But we also have plenty of lossy compression algorithms. We've just never been able to use lossy compression on text.


>We've just never been able to use lossy compression on text.

...and we still can't. If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that? Where is the situation you would compress text with an LLM over a standard compression algo? (Other than to make an LLM).

Other lossy compression targets known superfluous information. MP3 removes sounds we can't really hear, and JPEG works by grouping uniform color pixels into single chunks of color.

LLM's kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. is plausible enough to be taken as correct), with no algorithmic way to discern which is which.

So while yes, they do compress data and you can measure it, the output of this "compression algorithm" puts in it the same family as a "randomly delete words and thesaurus long words into short words" compression algorithms. Which I don't think anyone would consider to compress their documents.


> If your lawyer sent you your case files in the form of an LLM trained on those files, would you be comfortable with that?

If the LLM-based compression method was well-understood and demonstrated to be reliable, I wouldn't oppose it on principle. If my lawyer didn't know what they were doing and threw together some ChatGPT document transfer system, of course I wouldn't trust it, but I also wouldn't trust my lawyer if they developed their own DCT-based lossy image compression algorithm.


> LLM's kind of do their own thing, and the data you get back out of them is correct, incorrect, or dangerously incorrect (i.e. is plausible enough to be taken as correct), with no algorithmic way to discern which is which.

Exactly like information from humans, then?


People summarize (compress) documents with LLMs all day. With legalese the application would be to summarize it in layman's terms, while retaining the original for legal purposes.


Yes, and we all know (ask teachers) how reliable those summaries are. They are randomly lossy, which makes them unsuitable for any serious work.

I'm not arguing that LLMs don't compress data, I am arguing that they are technically compression tools, but not colloquially compression tools, and the overlap they have with colloquial compression tools is almost zero.


At this moment LLMs are used for much of the serious work across the globe so perhaps you will need to readjust your line of thinking. There's nothing inherently better or more trustworthy to have a person compile some knowledge than, let's say, a computer algorithm in this case. I place my bets on the latter to have better output.


But lossy compression algorithms for e.g. movies and music are also non-deterministic.

I'm not making an argument about whether the compression is good or useful, just like I don't find 144p bitrate starved videos particularly useful. But it doesn't seem so unlike other types of compression to me.


> They are randomly lossy, which makes them unsuitable for any serious work.

Ask ten people and they'll give ten different summaries. Are humans unsuitable too?


Yes, which is why we write things down, and when those archives become too big we use lossless compression on them, because we cannot tolerate a compression tool that drops the street address of a customer or even worse, hallucinates a slightly different one.


SMS codes are kind of a lossy text-compression.


There is an excellent talk by Jack Rae called “compression for AGI”, where he shows (what I believe to be) a little-known connection between transformers and compression:

In one view, LLMs are SOTA lossless compression algorithms, where the number of weights doesn't count towards the description length. Sounds crazy, but it's true.
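The trick, as I understand it (so a sketch of the idea, not Rae's exact formulation): sender and receiver share the same deterministic training code and data order, so the weights never have to be transmitted; each next token is encoded with an arithmetic coder driven by the model's current prediction, and both sides update their copy of the model identically. The compressed size is then essentially the cumulative log loss plus the (small, fixed) size of the training code:

    # Toy sketch of the ideal code length of a token stream under a predictive model.
    # `model.prob(token, context)` is a hypothetical interface, not a real library call.
    import math

    def compressed_bits(tokens, model):
        total = 0.0
        context = []
        for t in tokens:
            p = model.prob(t, context)   # model's probability of the true next token
            total += -math.log2(p)       # bits an arithmetic coder would spend on it
            context.append(t)            # sender and receiver update identically
        return total                     # + size of the shared training code, not the weights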


his talk here https://www.youtube.com/watch?v=dO4TPJkeaaU

and his last before departing for Meta Superintelligence https://www.youtube.com/live/U-fMsbY-kHY?si=_giVEZEF2NH3lgxI...


A transformer that doesn't hallucinate (or knows what is a hallucination) would be the ultimate compression algorithm. But right now that isn't a solved problem, and it leaves the output of LLMs too untrustworthy to use over what are colloquially known as compression algorithms.


It is still task related.

Compressing a comprehensive command line reference via model might introduce errors and drop some options.

But for many people, especially new users, referencing commands and getting examples via a model would deliver many times the value.

Lossy vs. lossless are fundamentally different, but so are use cases.


NotebookLM audio overviews/podcasts have been an absolute boon for my homeschooled kids. They devour audiobooks and podcasts, and they love learning by listening to these first. Then when we come together for class, we discuss what was covered, and can spend time diving into specifics or doing activities based on the content. It’s super nice to have another option for a learning medium here.

To generate them, we’ve scanned the physical book pages, and then with a simple Python script fed the images into GCP’s Document AI to extract the text en-masse, and concatenated the results together into a text-only version of the chapter. Give that text to NotebookLM and run with it.
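For anyone curious, the script really is only a handful of lines. Roughly this shape (project, location, processor ID, and paths are placeholders; it assumes the google-cloud-documentai client library and an OCR processor already set up in GCP):

    from pathlib import Path
    from google.cloud import documentai

    PROJECT, LOCATION, PROCESSOR_ID = "my-project", "us", "my-ocr-processor"  # placeholders

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path(PROJECT, LOCATION, PROCESSOR_ID)

    chapter_text = []
    for page in sorted(Path("scans/chapter-01").glob("*.png")):
        raw = documentai.RawDocument(content=page.read_bytes(), mime_type="image/png")
        result = client.process_document(
            request=documentai.ProcessRequest(name=name, raw_document=raw)
        )
        chapter_text.append(result.document.text)

    Path("chapter-01.txt").write_text("\n".join(chapter_text))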


I've used them. They're very nifty. Google did good here.

One thing I'll note is they only cover the "high level" aspects. No depth. I'd recommend them for someone who is either already very knowledgeable or for someone not at all knowledgeable who is looking for an overview before they plan to do deeper learning/studying through reading.


> or for someone not at all knowledgeable who is looking for an overview before they plan to do deeper learning/studying through reading

Yep. This is what I have used them (sparingly) for — a scaffold to build the deeper learning onto. My brain struggles to retain information when it doesn’t have a high-level understanding of how/why a system works and how individual parts connect and interact, even if it is all eventually revealed later.


Very well said.


Why not simply upload the pdf version of the scanned book or document? Extracting the text out of a scanned document via GCP Document AI API sounds like unnecessary use of resources


I was running into context window issues doing this. I could have gone in and split up the scanned book into chapters or something to get around this, and did that for a couple of subjects. But it wasn't too much work (and literally cost me pennies, like six of them) to get the pure text extract, and it's pretty easy to work with now. (Besides, which random dev doesn't love a little side challenge to explore new APIs at home every now and then? ;) )


I hope you encourage your kids to actually read as well.


Oh don’t worry, they make excellent use of their library cards. :)


> for my homeschooled kids

Learning requires making mistakes. Kids need to learn social skills in low-stakes environments. School is the best environment for this. When a person misses this part of their childhood education, they may struggle to learn these skills later in life.


It sounds like you may be speaking from experience, and if so, I respect that.

My kids have done both public schooling and now homeschooling. For a variety of personal reasons, public schooling was not going to be an option for a couple of them, so we're trying this out now and it has been successful. We are tightly integrated into a very active church group, and they have lots of social interactions on a regular basis there, as well as opportunities with other homeschooled kids around town.

It's definitely a balance, and there's no one silver bullet on either side of the fence, but the best any of us can do is actively strive for giving each child the best and most appropriate experiences for them.


The ability to recognize sociopaths and manipulation is an important life skill which may not be obtainable at your church activities or with trusted families. People without these skills may be manipulated in the workplace and suffer avoidable career setbacks, stress, and attending health problems.



The challenge, until this is widely supported (caniuse.com currently pegs it at 46% globally [1]), will be using it as a progressive enhancement that does not provide a worse or unusable experience for users whose browsers don't support it yet.

In other words, don’t include critical information or functionality in the new styling that isn’t available in the underlying plain select element! But such is always a good practice anyway.

Very nice to see this taking shape though! Should be a huge improvement over the div monster that custom select box replacements often are. :)

[1] https://caniuse.com/mdn-css_properties_appearance_base-selec...
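For what it's worth, the opt-in itself already degrades gracefully: browsers that don't recognize `appearance: base-select` ignore the declaration and keep the native select, and the fancier picker styling can be gated behind a feature query. A sketch based on my reading of the current proposal, not battle-tested:

    select,
    ::picker(select) {
      appearance: base-select;
    }

    @supports (appearance: base-select) {
      ::picker(select) {
        border-radius: 0.5rem;
        box-shadow: 0 4px 12px rgb(0 0 0 / 0.2);
      }
    }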


I agree that this is a huge improvement, but it's also over a decade late IMO. This should've been accomplished well before now, especially given that the issue has been there since the beginning.


Because frontend is frontend, Javascript frameworks dominated the conversation even for silly things like basic web forms for the past 15 years. Basic HTML/CSS is now catching up to the fact that not everyone wants to run a Javascript monstrosity for custom styling on very basic tasks.


The prevalence of JS and JS backed components is due to the reluctance of browser vendors to introduce new HTML elements that everyone has been lobbying for in the same time period.

By and large, browser vendors have for the longest time (and even today, in many respects) repeatedly ignored pleas for more elements that cover common use cases.

Even when they do arrive, they can be half baked - like dialog or details / summary - and that doesn’t help matters


> Even when they do arrive, they can be half baked - like dialog or details / summary - and that doesn’t help matters

How are those half-baked? No smooth transition for details/summary, maybe?

Dialog seems to work well enough with little to no javascript required:

    <dialog>
        <h3>Warning:</h3>
        ...
        <button onclick='this.closest("dialog").close()'>Dismiss</button>
    </dialog>
My personal bugbear is the date/time input - FF doesn't even show a click element for time, you have to type in the time.


There are some quirks with the API around show() vs. showModal(); if you aren't aware of the accessibility implications, you may not even realize this is the case.

Forms have some special quirks inside of a dialog.

The biggest thing though, is for the life of me I don't understand why you can't open and close a dialog without JavaScript. There's no way to do it.


> The biggest thing though, is for the life of me I don't understand why you can't open and close a dialog without JavaScript. There's no way to do it.

You can use popovers like this without JavaScript:

    <button popovertarget="some-element" popovertargetaction="show">Open</button>
    
    <div id="some-element" popover="auto">
        <button popovertarget="some-element" popovertargetaction="hide">Close</button>
    </div>

You can mark a <dialog> element as open by default with the `open` attribute, and you can close it with a button using the `dialog` form method. No JavaScript required for that either.
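For completeness, a minimal no-JS example of those two pieces (the `open` attribute plus the `dialog` form method):

    <dialog open>
      <p>Visible without any script.</p>
      <form method="dialog">
        <button>Close</button>
      </form>
    </dialog>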

I don’t think there’s any way at present to open a `<dialog>` element specifically without JavaScript, but command/commandfor handles this and was recently added to the HTML specification:

https://github.com/whatwg/html/pull/9841


I'm with you... would be nice to have:

   <button type="open-dialog" target="dialogId">Open Dialog</button>
   ...
   <dialog id="dialogId">
     <button type="close-dialog">Close Dialog</button>
   </dialog>
It would just make so much sense.


> I'm with you... would be nice to have:

You can do this right now already:

    // For  opening
    <button onclick='document.querySelector("#dialogId").showModal()'>Open</button>

    // For closing
    <button onclick='this.closest("dialog").close()'>Close</button>
I think the problem here is that it is impossible to actually use the result from `close()`, as it can return a status, IIRC.

> It would just make so much sense.

That way above that I propose is about as sensible as the way you propose. If there are any problems with my proposal that are solved by your proposal, I'd love to hear it.


The point was to be able to do it without JS.


I understand, but from a pragmatic viewpoint, it is no more and no less complicated to do it with `onclick` JS than it would be to do it with some other attribute.

For all practical purposes, there is no difference between the two.

I understand that the `onclick` wouldn't fire in browsers where JS is turned off, but in that case (no JS) you're going to have an awful user experience using dialogs even with builtin open/close attributes for dialogs.



If you need to worry about ADA compliance (I always do, but not always paid to), this can be difficult to address.

I remember Opera, before it was bought, had the best support for html5 elements and date/time inputs were the best (along with almost all others). Sad to think that a leader in this area was sold off, and not to a buyer that cared about usability and the web the way Opera did.


You can't make enough HTML elements to make everyone happy. This was the wrong idea from the start.


The solution isn't to make a new html element for each use case.

The solution is to make html elements extensible.

With the `appearance: base-select` css rule, we now have a standardized way to extend <select> with html and css (and we have the potential to declaratively extend interactivity too, with invoker commands, without needing to reach for JS)


> The solution is to make html elements extensible.

The solution would be to make Apple ship this:

https://developer.mozilla.org/en-US/docs/Web/HTML/Global_att...


I would argue you can make enough HTML elements to make enough people happy. 80% for the 80%.


Forms are hard because they are the most stateful and most ubiquitous UI component that everyone comes in contact with. HTML has a few barebones tools to help you out, but they aren't good enough for the best user experience if you really want to polish a form: show errors exactly when you want to show them, only allow submission exactly when you want to allow it, and represent error states in much better ways.

It's especially obvious once you're making forms outside of HTML and you realize that it's not any simpler with any less UX consideration. The only thing that changed was the language you're writing that polish in.


Over TWO decades late.

It is crazy to think what we have only just gotten, without JavaScript, in the past 20 years.


It's like how border-radius was added after rounded corners via images fell out of fashion.


I somehow feel Safari drags its feet on basic platform improvements because they want to focus on iOS apps instead.


The narrative that Safari is behind and Apple doesn’t care about the web is so tired…

This 8,000+ word article on Safari 18.4 (released today, BTW) doesn’t read like an organization that doesn’t care about the web [1].

[1]: https://webkit.org/blog/16574/webkit-features-in-safari-18-4...


The pace of improvement in the web is slow.

I don’t want to debate if a mega corp cares or not, but compared to how UI frameworks were in the 90s, the developer experience is anemic.

I don’t know if HTML, CSS and JavaScript are the best path forward. Maybe they are doing exactly what their spec set out to do.

But we need something better that doesn’t leash one to an ecosystem that takes 30% of your revenue.


Eh; the standard is two weeks old. Written by someone working for Apple by the way.


I’m wondering where in my comment it sounded like I wanted this standard specially to be implemented yesterday?


This is true, and I try not to get eager about new browser improvements for this reason. But look at the progress over time in browser abilities; it's astounding.

The days are long but the years are short.


The perpetual 5 year problem of web development. I wish there was a way to do forward standards


> But such is always a good practice anyway.

One more reminder to develop for people who may not perceive color and shape as you do. If you're hiding critical information in your menu styles, that information is presumably inaccessible to people who are using a screen reader.


You'll probably still keep a <div>-based control in your page, and selectively hide the <select> based one or this one, or generate different HTML for different browsers if you can do that.


This might be a bad take, but I think developers should also consider exactly which users are using their app. If it’s the entire internet, then absolutely you need to consider backwards compatibility. If it’s an internal app, then consider not caring and using new APIs.


Very cool! I was just looking for something like this last night to generate quizzes for my middle- and grade-school kids as a supplement to their normal work. So far I’ve settled on NotebookLM.

One thing I’d love to see is some examples of what the generated quizzes look like/work like before I upload something. Right now, I have no idea what I’ll get before uploading something. Some examples or demos on the front page would be great.


That's a great suggestion! thanks for having a look. I just updated the landing page with some examples :D


Nice, great screenshots, thx!

"An unexpected error occurred. Please try again." Occurred after clicking the "Generate Questions" button. Console shows a 429 error. I'm sure your logging has picked it up, but just FYI.


thanks! it was just some rate limiting for the free version, I fixed that now

