It is more than the spam war... a lot of content is just no longer produced and exposed to the open web (think about how much content goes into TikTok, Discord, etc., and you will never get that into your search results). Google has less useful content to index, and algorithms can't fix that. There is more spam only because spam is the only open content still being added at scale.
The winners of this battle will be the places where content is generated (or curated) - and Reddit is perhaps the most important content hub generator (it's not just an aggregator anymore - comments about some news are often more interesting than the news itself). Indexers (and language models) are useless without content to scrape.
I can't believe that. There are still so many personal blogs by real people. They just don't show up in the first 3 pages unless you query for them very specifically.
Seems most queries result in something like 30% SEO spam sites, 30% quora, 30% reddit, 10% other.
Edit: I don't disagree that Discord/YouTube/other closed gardens have taken open searchable data away, but it's not like there's no authentic searchable data left at all. Perhaps Google also needs to learn to search those closed gardens better.
Google flourished because it could find forums (and blogs) and mine those, but much of that content has disappeared into Facebook and Discord (and YouTube - we must not discount how many things that would have been easily parseable blogs are now buried in livestreams and videos).
Discord is probably the worst of all. I'm not a gamer, and I hate how much tech content is now locked behind private Discord channels. Even Facebook is more discoverable than that.
Even when you are already on Discord, searching and trying to read old conversations is awful, because that's not at all what Discord was made to do.
So I've been working on a side project to make the content of a YouTube channel I watch more discoverable as text. I've had great results by scraping the YouTube transcription and running it through a few passes of GPT-3.5 with prompts telling it to essentially act as an editor. The original transcription was often terrible in spots - whole phrases or multiple words mistranscribed throughout. In almost all cases, GPT-3.5 was able to clean them up and restore the original meaning by understanding the context of the monologue and fixing obviously incorrect words or phrases.
I've watched through a sample of about 20 of the 3,000 videos I'm working through, and the corrected transcription really did an amazing job of restoring the original meaning of spoken words that were hard to understand in the original machine transcription.
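The commenter doesn't share any code, so here's a minimal sketch of what that kind of pipeline might look like. The package choices (`youtube-transcript-api`, the `openai` client), the prompt wording, and all function names are my assumptions, not the commenter's actual setup:

```python
# Hypothetical sketch of a transcript-cleanup pipeline: fetch the
# auto-generated YouTube transcript, split it into chunks that fit in
# one GPT-3.5 editing pass, and have the model fix mistranscriptions.
# Requires: pip install youtube-transcript-api openai

def chunk_transcript(segments, max_chars=3000):
    """Join raw transcript segments into chunks small enough for one
    editing pass, splitting only at segment boundaries."""
    chunks, current, size = [], [], 0
    for seg in segments:
        text = seg["text"].strip()
        if size + len(text) > max_chars and current:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text) + 1  # +1 for the joining space
    if current:
        chunks.append(" ".join(current))
    return chunks

# Invented prompt, for illustration only.
EDITOR_PROMPT = (
    "You are an editor. The following is an auto-generated YouTube "
    "transcript with mistranscribed words. Fix obvious transcription "
    "errors using context, without changing the speaker's meaning."
)

def clean_transcript(video_id, client):
    """Fetch a video's auto-transcript and run each chunk through the model."""
    from youtube_transcript_api import YouTubeTranscriptApi
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    cleaned = []
    for chunk in chunk_transcript(segments):
        resp = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": EDITOR_PROMPT},
                {"role": "user", "content": chunk},
            ],
        )
        cleaned.append(resp.choices[0].message.content)
    return "\n".join(cleaned)
```

Usage would be something like `clean_transcript("dQw4w9WgXcQ", OpenAI())`; chunking at segment boundaries keeps each pass within the model's context window without cutting sentences in half.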
That is exactly where LLMs are useful. (People thinking of them as "AI", meaning AGI, are just so wrong. Writing legal briefs??) Using them ex post facto to fix up transcripts and make them available and searchable is great.
>we must not discount how many things that would have been easily parseable blogs are now buried in livestreams and videos
on the flip side, would that content have been created at all if its creators weren't financially motivated by streaming/video to produce it?
there's a lot of discussion here about internet communities, but this comment raises the question of why blogs started to die down to begin with. At least with Reddit you get clout if you share stuff (useless clout, but sometimes you just want a pat on the back).
Blogs are parallel to research papers in a sense. They're useless without peer review unless you're already intimately familiar with the source material and able to critically evaluate the contents.
So blogs are more useful when they're aggregated through a site like Reddit, where users have already done the vetting on whether the linked page is valuable. Reddit comments add invaluable context to pages - noting when the content has become dated or inaccurate due to external changes, etc. Sites like Brian Krebs's blog are the exception, as the author is well known and respected. But general blogs? It takes time to earn that community respect.
Then beyond that, how often have you gone hunting for something obscure only to run across 3 or more blog pages that look entirely unique but have the exact same article pasted into them? It isn't that the contents are bad/wrong/inaccurate, but rather: who do you trust? How much effort are you going to put into finding which blog was the original, written by the expert, and which ones are bots copying the info?
>where users have already done the vetting on whether the linked page is valuable.
and ironically enough, if you post your own blog on Reddit to be critiqued, there's a good chance it gets removed for "self promotion". Funny how that "vetting" works, huh? So you're back to "how do I make my blog discoverable so it can be peer reviewed", and we're at square one again.
>How much effort are you going to put in to finding which blog was the original, written by the expert and which ones are bots copying the info?
A lot, if it's important. As it is, I already have to do that muckraking on Reddit to see who is trying to understand (or even read) the article and who just wants to soapbox their tangential pet rant. Tracing a source back is child's play in comparison.
For me YouTube is always on top: instead of the text pages where I can read the answer in a few seconds, Google pushes me their video platform, probably in the hope of making money. I am logged in, so I do not understand how those geniuses working at Google can think that videos in a language I do not know might be more relevant than text content.
> For me YouTube is always on top: instead of the text pages where I can read the answer in a few seconds, Google pushes me their video platform, probably in the hope of making money.
To be fair, I have the same problem with DuckDuckGo.
I wish I could blacklist sites from my search results. YouTube and Pinterest are not helpful for the things I look for.
How great is your wish? If you host your own instance of Whoogle, which proxies Google search results, you can set one of its environment variables to block particular websites from the results.
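As a sketch of what that looks like: I believe Whoogle reads a `WHOOGLE_CONFIG_BLOCK` environment variable holding a comma-separated list of domains to drop from results, but check the project's README for the current variable names before relying on this.

```shell
# Hedged sketch: run Whoogle with certain domains filtered out of results.
# The variable name is from memory of the Whoogle README; verify it upstream.
docker run -d -p 5000:5000 \
  -e WHOOGLE_CONFIG_BLOCK="youtube.com,pinterest.com" \
  benbusby/whoogle-search
```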
yeah, the GP really reads like it's regurgitating the notes of someone who attended an internal Googs meeting on why they are ranking news higher, repeated as a mantra
> a lot of content is just no longer produced and exposed to the open web (think about how much content goes into tiktok, discord, etc and you will never get that into your search results)
I see this all the time when trying to find information about old computers. So many of the good vintage computing resources are locked in social services or mailing lists that the information never shows up in search engines.
It feels a lot like the days when information was balkanized between AOL, GEnie, CompuServe, American PeopleLink, Delphi, etc.
Search engines were supposed to fix that and make all the world's information discoverable. They didn't.
There certainly is content - often content I could find two years ago but now cannot.
That's because the web is full of juvenile sub-normie content such as GeeksforGeeks (if you consider programming topics, for example). Highly SEO'd juvenile stuff shadows the very specific queries.