
Are agents actually capable of answering why they did things? An LLM can review the previous context, add your question about why it did something, and then use next token prediction to generate an answer. But is that answer actually why the agent did what it did?

It depends. If you have an LLM that uses reasoning, the explanation for why a decision was made can often be found in the reasoning token output. So if the agent later has access to that context, it could see why the decision was made.

Reasoning, in the majority of cases, is pruned at each conversation turn.

The cursor-mirror skill and the cursor_mirror.py script let you search through and inschpekt all of your chat histories, all of the thinking bubbles and prompts, all of the context assembly, all of the tool and MCP calls and parameters, and analyze what the agent did, even after Cursor has summarized and pruned and "forgotten" it -- it's all still there in the chat log and sqlite databases.

cursor-mirror skill and reverse engineered cursor schemas:

https://github.com/SimHacker/moollm/tree/main/skills/cursor-...

cursor_mirror.py:

https://github.com/SimHacker/moollm/blob/main/skills/cursor-...
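For a rough sense of what the script is doing, here's a minimal sketch in plain Python of poking at Cursor's sqlite chat storage. The storage path, the state.vscdb filename, and the table names here are assumptions based on the reverse-engineered schemas linked above; the real cursor_mirror.py handles many more cases and key formats:

  # Minimal sketch: list chat-related keys in Cursor's sqlite state databases.
  # Assumptions (hypothetical, not the real cursor_mirror.py): per-workspace
  # state lives in .../User/workspaceStorage/<id>/state.vscdb, and each db has
  # a key/value table named ItemTable and/or cursorDiskKV.
  import glob
  import os
  import sqlite3

  STORAGE = os.path.expanduser("~/.config/Cursor/User/workspaceStorage")  # Linux path; adjust per platform

  for db_path in glob.glob(os.path.join(STORAGE, "*", "state.vscdb")):
      con = sqlite3.connect(db_path)
      try:
          for table in ("ItemTable", "cursorDiskKV"):
              try:
                  rows = con.execute(f"SELECT key FROM {table}").fetchall()
              except sqlite3.OperationalError:
                  continue  # table not present in this database
              chatty = [k for (k,) in rows if "chat" in k.lower() or "composer" in k.lower()]
              if chatty:
                  print(db_path, table, f"{len(chatty)} chat-related keys")
      finally:
          con.close()

The values behind those keys are JSON blobs; decoding and searching them is essentially what the skill automates, plus cross-referencing thinking blocks, tool calls, and file edits.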

  The German Toilet of AI

  "The structure of the toilet reflects how a culture examines itself." — Slavoj Zizek

  German toilets have a shelf. You can inspect what you've produced before flushing. French toilets rush everything away immediately. American toilets sit ambivalently between.

  cursor-mirror is the German toilet of AI.

  Most AI systems are French toilets — thoughts disappear instantly, no inspection possible. cursor-mirror provides hermeneutic self-examination: the ability to interpret and understand your own outputs.

  What context was assembled?
  What reasoning happened in thinking blocks?
  What tools were called and why?
  What files were read, written, modified?

  This matters for:

  Debugging — Why did it do that?
  Learning — What patterns work?
  Trust — Is this skill behaving as declared?
  Optimization — What's eating my tokens?

  See: Skill Ecosystem for how cursor-mirror enables skill curation.
----

https://news.ycombinator.com/item?id=23452607

According to Slavoj Žižek, Germans love Hermeneutic stool diagnostics:

https://www.youtube.com/watch?v=rzXPyCY7jbs

>Žižek on toilets. Slavoj Žižek during an architecture congress in Pamplona, Spain.

>The German toilets, the old kind -- now they are disappearing, but you still find them. It's the opposite. The hole is in front, so that when you produce excrement, they are displayed in the back, they don't disappear in water. This is the German ritual, you know? Use it every morning. Sniff, inspect your shits for traces of illness. It's high Hermeneutic. I think the original meaning of Hermeneutic may be this.

https://en.wikipedia.org/wiki/Hermeneutics

>Hermeneutics (/ˌhɜːrməˈnjuːtɪks/)[1] is the theory and methodology of interpretation, especially the interpretation of biblical texts, wisdom literature, and philosophical texts. Hermeneutics is more than interpretive principles or methods we resort to when immediate comprehension fails. Rather, hermeneutics is the art of understanding and of making oneself understood.

----

Here's an example cursor-mirror analysis of an experiment: 23 runs with four agents playing several turns of Fluxx per run (1 run = 1 completion call), 1045+ events, 731 tool calls, 24 files created, 32 images generated, 24 custom Fluxx cards created:

Cursor Mirror Analysis: Amsterdam Fluxx Championship -- Deep comprehensive scan of the entire FAFO tournament development:

amsterdam-flux CURSOR-MIRROR-ANALYSIS.md:

https://github.com/SimHacker/moollm/blob/main/skills/experim...

amsterdam-flux simulation runs:

https://github.com/SimHacker/moollm/tree/main/skills/experim...
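The counting itself is the easy part once the events are out of the database. A sketch, assuming a hypothetical JSONL export where each event carries a "type" field (the real analysis in CURSOR-MIRROR-ANALYSIS.md is much richer than a tally):

  # Tally event types from a (hypothetical) JSONL export of mirrored events.
  import json
  from collections import Counter

  counts = Counter()
  with open("mirror-events.jsonl") as f:  # assumed export filename
      for line in f:
          event = json.loads(line)
          counts[event.get("type", "unknown")] += 1  # e.g. tool_call, thinking, file_write

  print(sum(counts.values()), "events total")
  for kind, n in counts.most_common():
      print(f"  {kind}: {n}")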


Just an update re German toilets: no toilet installed in the last 30 years (that I know of) uses a shelf anymore. This reduces water usage by about 50% per flush.

But then what do you have to talk about all day??!

LLMs often already "know" the answer starting from the first output token and then emulate "reasoning" so that it appears as if they came to the conclusion through logic. There's a bunch of papers on this topic. At least that used to be the case a few months ago; not sure about the current SOTA models.

Wait, that's not right, let me think through this more carefully...

Of course not, but it can often give a plausible answer, and it's possible that answer will actually happen to be correct - not because it did, or is capable of, any introspection, but because its token outputs in response to the question might semi-coincidentally become token inputs that change future outputs in the same way.
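A cheap way to see this for yourself: get the answer with no visible reasoning, then ask the same model why, and notice that the explanation is generated from the answer already sitting in context rather than from any privileged access to its own forward pass. A sketch against the OpenAI chat completions API (the model name and question are placeholders; any chat endpoint behaves the same way):

  # Sketch: post-hoc "why" explanations are conditioned on the prior answer,
  # not on introspection. Placeholder model name; requires OPENAI_API_KEY.
  from openai import OpenAI

  client = OpenAI()
  QUESTION = "Is 1019 prime? Answer yes or no only."

  # 1. Answer first, with no room for visible reasoning.
  answer = client.chat.completions.create(
      model="gpt-4o-mini",  # placeholder
      messages=[{"role": "user", "content": QUESTION}],
  ).choices[0].message.content

  # 2. Ask why afterwards. The explanation is just more next-token prediction
  #    over a context that already contains the answer.
  why = client.chat.completions.create(
      model="gpt-4o-mini",
      messages=[
          {"role": "user", "content": QUESTION},
          {"role": "assistant", "content": answer},
          {"role": "user", "content": "Why did you answer that?"},
      ],
  ).choices[0].message.content

  print(answer)
  print(why)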

Well, the entire field of explainable AI has mostly thrown in the towel.

LLMs are continuously improving, so something that didn't work a year ago became possible in November. If you tried to build Openclaw in 2024, it wouldn't have worked. Openclaw isn't groundbreaking, but it is right on the edge of the LLM capability curve.

The industrial revolution was extremely hard on individual craftspeople. Jobs became lower paying and lower skilled. People were forced to move into cities. Conditions didn't improve for decades. If AI is anything comparable, it's not going to get better in 5-10 years. It will be decades before the new 'jobs' come into place.

Seriously, it took nearly 150 years before people actually benefited from the industrial revolution. Saying that we need to condemn people to two lifetimes' worth of suffering to benefit literally a few thousand people out of billions is absolutely ludicrous.

But think about corporate aristocracy and their children!

This is basically not true. It's hard to debate this when we don't start from a position of truth.

It pretty much is, unless you think it's totally cool to work in highly dangerous jobs that paid poorly while being treated like chattel slaves. There is a reason why the 1800s had the most violent labor actions in the US, and it wasn't because workers were treated "well."

Completely disingenuous, learn your labor history.


People didn't feel the benefits for 150 years? Just absolute nonsense.

I think the AI sales orgs are just immature. It's hard to say this but Google's Gemini sales team might be more professional.

What do you like about the Gemini sales team?

AI isn’t good enough to do consulting yet.


What I don't get is how these free LLMs are getting funded. Who is paying $20-100 million to create an open-weights LLM? And long term, why would they keep doing it?


I see what you're saying, but it doesn't matter that much in the long run. If everything stopped right now, the state-of-the-art open source models could still solve a lot of problems. They may never solve coding, per se, but they're good enough.


Billionaires trying to hurt each other. Facebook released LLaMa hoping to hasten OpenAI's bankruptcy.


But it's not open, and in fact AFAIK it's not possible to use commercially.


It's possible, just not legal if they find out and you're worth suing.


Thanks for the pointless correction!


It now takes three button presses to switch tabs in mobile Safari. It used to take just two before Glass.


100%


It's insane. People just don't want to use their brains to communicate anymore, I guess. You've just experienced something traumatic like a layoff, and you can't even take a few hours to internalize it and be vulnerable online, rather than jumping immediately onto social media to use the opportunity to sound like a market analyst.


FAANG has been engaged in mass layoffs for two years now. How can you possibly make the claim that there is a surplus of people who can pass the interview loops? Obviously there isn't, because they are firing people who passed those loops.


You’re ignoring the part where FAANG massively overhired in the years preceding.

Meta and Amazon doubled their headcount in the 2-3 years of the pandemic.

Others like Google increased by 60+%.

You’re also forgetting about this little thing popularly called AI that happened in the intervening years.

There may be an argument that H1B isn't fit for purpose in a post-AI world (although that argument is also false if we think software engineering will remain a viable job going forward, but that's a different topic).

But it's much harder to argue that H1B hurt US workers when the industry that hired the majority of H1B employees in the first two decades of the 2000s also saw some of the highest growth in jobs while simultaneously posting the highest growth in salaries. (There may have been certain minor industries hiring a few thousand people, like oceanography, that had a slightly higher increase, but even that was likely not true, because BLS data doesn't factor in compensation in the form of stock options, which disproportionately provided wealth for SW engineers relative to other workers.)


>You’re ignoring the part where FAANG massively overhired in the years preceding.

Yes, because overhiring is a lie generated to justify layoffs. I'd hope by year 3 that we'd see through this. If they "overhired", why is hiring still up globally while down in the US?

>You’re also forgetting about this little thing popularly called AI that happened in the intervening years.

What about it? Hiring numbers are still up. It's clearly not replacing workers as of now.


Vibecoding is great for open source. Open source is already dominated by strong solo programmers like antirez, Linus, etc. -- people with very strong motivations to create software they see as necessary. Vibecoding makes creating open source projects easier. It makes it easier to get from an idea to "Hey guys, check this out!" The only downside for open source is the fly-by PRs vibecoding enables, which are currently draining maintainer time.


I think the solution to the latter is simply to maintain high standards in terms of structure and organization. I've always been a fan of the idea that KISS should override any other non-requirement of software. And by non-requirement, I mean anything that is just subjective. Don't create complexity you don't actually need, or that doesn't make an outsized contribution to making other areas of the code easier to reason about.

Sometimes having dozens of one-off scripts is easier/simpler than trying to create the uber-flexible, one-tool-does-all solution.
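A hypothetical illustration of the one-off-script end of that trade-off: it does exactly one thing, takes one argument, and is short enough to review in a drive-by PR.

  # One-off script: print the ten largest files under a directory.
  # The KISS version: no config file, no plugin system, no flags beyond a path.
  import os
  import sys

  root = sys.argv[1] if len(sys.argv) > 1 else "."
  sizes = []
  for dirpath, _dirs, files in os.walk(root):
      for name in files:
          path = os.path.join(dirpath, name)
          try:
              sizes.append((os.path.getsize(path), path))
          except OSError:
              pass  # broken symlink, permission error, etc.

  for size, path in sorted(sizes, reverse=True)[:10]:
      print(f"{size:>12}  {path}")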


And make one PR after another; I can see how happy Linus & Co. would be with all the garbage features ;-)

