
causalmodels · 2026-01-24T00:00:17 1769212817

Yeah this has always seemed very silly. It is trivial to use claude code to reverse engineer itself.

mi_lk · 2026-01-24T00:20:42 1769214042

looks like it's trivial to you because I don't know how to

n2d4 · 2026-01-24T00:59:41 1769216381

If you're curious to play around with it, you can use Clancy [1] which intercepts the network traffic of AI agents. Quite useful for figuring out what's actually being sent to Anthropic.

[1] https://github.com/bazumo/clancy

fragmede · 2026-01-24T01:46:10 1769219170

If only there were some sort of artificial intelligence that could be asked about asking it to look at the minified source code of some application.

Sometimes prompt engineering is too ridiculous a term for me to believe there's anything to it, other times it does seem there is something to knowing how to ask the AI juuuust the right questions.

lsaferite · 2026-01-24T13:33:37 1769261617

Something I try to explain to people I'm getting up to speed on talking to an LLM is that specific word choices matter. Mostly it matters that you use the right jargon to orient the model. Sure, it's good at getting the semantics of what you said, but if you adjust and use the correct jargon the model gets closer faster. I also explain that they can learn the right jargon from the LLM and that sometimes it's better to start over once you've adjusted your vocabulary.

adastra22 · 2026-01-24T01:53:13 1769219593

That is against ToS and could get you banned.

Der_Einzige · 2026-01-24T01:57:27 1769219847

GenAI was built on an original sin of mass copyright infringement that Aaron Swartz could only have dreamed of. Those who live in glass houses shouldn't throw stones, and Anthropic may very well get screwed HARD in a lawsuit against them from someone they banned.

Unironically, the ToS of most of these AI companies should be, and hopefully is, legally unenforceable.

adastra22 · 2026-01-24T02:35:02 1769222102

Are you volunteering? Look, people should be aware that bans are being handed out for this, lest they discover it the hard way.

If you want to make this your cause and incur the legal fees and lost productivity, be my guest.

mlrtime · 2026-01-24T11:55:27 1769255727

How would they know what you do on your own computer?

adastra22 · 2026-01-24T15:19:53 1769267993

Claude is run on their servers.

fragmede · 2026-01-24T02:59:57 1769223597

You're absolutely right! Hey Codex, Claude said you're not very good at reading obfuscated code. Can you tell me what this minified program does?

adastra22 · 2026-01-24T03:14:43 1769224483

I don't know what Codex's ToS are, but it would be against ToS to reverse engineer any agent with Claude.

chillfox · 2026-01-24T15:42:31 1769269351

Then use something like deepseek.

causalmodels · 2026-01-23T17:03:39 1769187819

It is fine to have criticisms of this, I have many, but saying that Yegge hasn't built real software is just not true.

anonymous908213 · 2026-01-23T17:04:38 1769187878

Yegge obviously built real software in the past. He has not built real software wherein he never looked at the code, as he is now promoting.

causalmodels · 2026-01-23T17:09:22 1769188162

Ok but this entire idea is very new. It's not an honest criticism to say no one has tried the new idea when they are actively doing it.

Honestly I don't get the hostility. Yegge is running an experiment. I don't think it will work, but it will be interesting and informative to watch.

anonymous908213 · 2026-01-23T17:19:06 1769188746

The 'experiment' isn't the issue. The problem is the entire culture around it. LLM tools are being shoved into everything, LLMs are soaking up trillions in investment, engineers are being told over and over that everything has changed and this garbage is making us obsolete, and software quality is decreasing where wide LLM usage is being mandated (e.g. Microsoft). Gas Town does not give the vibe of a neutral experiment but rather looks to be a full-on delve into AI psychosis with the way Yegge describes it.

To be clear, I think LLMs are useful technology. But the degree of increasing insanity surrounding it is putting people off for obvious reasons.

causalmodels · 2026-01-23T18:44:48 1769193888

I share the frustration with the hype machine. I just don't think a guy with a blog is an appropriate target for our frustration with corporate hype culture.

direwolf20 · 2026-01-23T19:26:31 1769196391

The experiment is fine if you treat it as an experiment. The problem is the state of the industry where it's treated as serious rather than silly — possibly even by Steve himself.

WesolyKubeczek · 2026-01-23T17:50:42 1769190642

> Ok but this entire idea is very new. It's not an honest criticism to say no one has tried the new idea when they are actively doing it.

Not really new. Back in the day companies used to outsource their stuff to the lowest bidder agencies in proverbial Elbonia, never looked at the code, and then panickedly hired another agency when the things visibly were not what was ordered. Case studies have abounded on TheDailyWTF for the last two decades.

Doing the same with agents will give you the same disastrous results for comparably the same money, just faster. Oh and you can't sue them, really.

Maybe it's better, who knows.

causalmodels · 2026-01-23T18:41:34 1769193694

Fair point on the Elbonia comparison. But we can't sue the SQLite maintainers either, and yet we trust them with basically everything. The reason is that open source developed its own trust mechanisms over decades. We don't have anything close to that with LLMs today. What those mechanisms might look like is an open question that is getting more important as AI generated code becomes more common.

WesolyKubeczek · 2026-01-23T19:50:49 1769197849

> But we can't sue the SQLite maintainers either, and yet we trust them with basically everything.

But you don’t pay them any money and don’t enter into contractual relationship with them either. Thus you can’t sue them. Well, you can try, of course, but.

You could sue an Elbonian company, though, for contract breach. LLMs are like usual Elbonian quality with two middlemen but quicker, and you only have yourself to blame when they inevitably produce a disaster.

swiftcoder · 2026-01-23T17:15:45 1769188545

> saying that Yegge hasn't built real software is just not true

I mean... I feel like it's somewhat telling that his wikipedia page spends half its words on his abrasive communication style, and the only thing approximating a product mentioned is a (lost) Rails-on-Javascript port, and 25 years spent developing a MUD on the side.

Certainly one doesn't get to stay a staff-level engineer at Google without writing code, but in terms of real, shipping software, Yegge's resume is a bit light for his tenure in BigTech.

causalmodels · 2026-01-22T19:59:49 1769111989

Because my local is a laptop and doesn't have a GPU cluster or TPU pod attached to it.

5d41402abc4b · 2026-01-23T08:41:51 1769157711

If you have enough RAM, you can run Qwen A3B models on the CPU.
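As a rough back-of-the-envelope sketch of what "enough RAM" might mean: the snippet below assumes "A3B" refers to a mixture-of-experts model with roughly 30 billion total parameters and about 3 billion active per token (an assumption, the comment does not say which model). Under that assumption, all experts still have to sit in memory, so the total parameter count drives RAM, while the active count mostly drives per-token compute. It counts weights only and ignores KV cache and runtime overhead.

```python
def weight_memory_gib(total_params: float, bits_per_weight: float) -> float:
    """Approximate RAM needed just to hold the model weights, in GiB."""
    return total_params * bits_per_weight / 8 / 2**30


# Assumed figure, not from the thread: ~30e9 total parameters.
TOTAL_PARAMS = 30e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(TOTAL_PARAMS, bits):.0f} GiB for weights")
```

Under these assumptions a 4-bit quantization lands around 14 GiB for weights alone, which is why a well-equipped laptop is plausible while 16-bit weights are not.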

quikoa · 2026-01-23T13:37:49 1769175469

RAM got a little more expensive lately for some reason.

causalmodels · 2026-01-20T22:55:33 1768949733

Interesting direction but the 98.8% FPR in Table 1 seems like a dealbreaker. Anyone understand what's going on with the contradictory results between the text and tables?

dwattttt · 2026-01-20T23:10:24 1768950624

> Empirically, CTVP attains very good detection rates with reliable false positives

A novel use of the word "reliable"? Jokes aside, either they mean the FPR as the opposite of what you'd expect, the table is not representative of their approach, or they're just... really optimistic?

godelski · 2026-01-21T06:17:58 1768976278

> Anyone understand what's going on with the contradictory results between the text and tables?

Well Figure 1 would also disagree. It shows a FPR of 47.5%.

From Sec 3, end of second to last paragraph

> The protocol is deterministic given fixed RNG seeds, caches model outputs by program hash, and *bounds false positives via the chosen percentile and gap parameters.*

I believe this is a choice, though I think it is suspect that the FPR is pushed this high to get the TP results.

Disclaimer: I only gave this a very cursory skim so don't rely on me too much
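For anyone following along, here is a minimal sketch of why a 98.8% FPR reads as a dealbreaker. The counts are hypothetical and NOT taken from the paper; they are chosen only so the false positive rate comes out to 98.8%. The function names are illustrative as well.

```python
def rates(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (true positive rate, false positive rate) from confusion-matrix counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def precision(tp: int, fp: int) -> float:
    """Fraction of flagged items that are actually positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical counts: 100 malicious programs, 1000 benign ones.
tp, fn = 95, 5
fp, tn = 988, 12

tpr, fpr = rates(tp, fp, tn, fn)
print(f"TPR={tpr:.1%} FPR={fpr:.1%} precision={precision(tp, fp):.1%}")
```

With numbers like these the detector catches almost everything, but only because it flags almost everything: nearly every benign program is flagged too, so a positive result carries very little information.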

causalmodels · 2026-01-17T09:18:18 1768641498

The article's Karim Khan example pretty deeply undercuts the thesis. Losing access to your bank account is the actual coercive power. Losing a Microsoft email is an inconvenience in comparison.

The real 'military bases' are banks.

a_humean · 2026-01-17T09:42:36 1768642956

If your business has everything on GCP/AWS/Azure (which is very common) and the Americans choose to weaponise US tech against your country or business, then unless you have non-US backups you are probably dead and all of your employees unemployed. If you are a state, all of your services and functions are probably dead and you have to rebuild from nothing. That is certainly true of my company, and there are some mutterings starting internally where I am about worst-case disaster recovery if one of these suppliers suddenly just disappeared.

In this new world you cannot trust that this will not happen. As a European relying on the Americans is honestly probably little better than relying on the Russians and probably on par with relying on the Chinese in terms of risk profile. Note we are actually for all intents and purposes at war with Russia.

The amount of leverage the Americans have over Europe is insane, and every capital should be trying to mitigate that risk asap.

formerly_proven · 2026-01-17T09:51:43 1768643503

European companies largely do not recognize this as a risk because they consider B2B contracts with Microsoft or AWS essentially iron-clad.

zwaps · 2026-01-17T10:01:07 1768644067

This is because Europeans can not listen.

Microsoft executives under oath said that they will not be able to honor those contracts if there is pressure from the US administration. We should know this, but we keep forgetting: laws, contracts, courts etc. always bow before political and military might. In peacetime, we delude ourselves into thinking it ain't so.

The situation is now clear as day. What op stated is 100 percent correct.

The US will have successfully invaded an EU country by 2027.

They will, if it comes to this, immediately and successfully weaponize all three hyperscalers.

It is abundantly clear where things are going.

If any country, organization or company is not prepared for this by mid 2026, they are blind and deaf.

surgical_fire · 2026-01-17T10:56:43 1768647403

You are correct. In Europe, governments and businesses should treat the US as a hostile foreign country, and relying on anything from there is a massive risk.

The only thing is that weaponizing the hyperscalers would also be disastrous for the hyperscalers. They would be liable to lose their assets in Europe, access to European markets, etc and so on. Which would as a consequence cause a tangible harm to the US economy itself.

Not that in Europe we should rely on it for anything. Any business is wise to move away from any sort of dependency that is subject to US pressure. Governments in particular should consider it a matter of life and death.

Roark66 · 2026-01-17T11:55:25 1768650925

We (Polish) have been raising the alarm about Russia since the first Chechen war, and it took an additional dozen-plus years and a land invasion of a European country before countries in the Western EU woke up.

Do you think they are going to be quicker reacting to danger from the other side?

I highly doubt it. The EU is like a huge steam ship. It takes a lot of effort to turn it. But once it gets going, good luck stopping it. This will have consequences for EU-US relations for the rest of this century.

In fact, it is exactly what a Russian agent would do if he managed to become president of the US. Putin's wet dream, basically. Be hostile enough towards Russia to preserve appearances (seize a tanker or two) while undermining long-term US and EU interests (the interests of these two are naturally aligned very well; it takes much more than an idiot to drive a wedge between them).

surgical_fire · 2026-01-17T13:26:14 1768656374

The thing is that the EU is a complex structure. The interests of countries such as Poland, Italy, Germany and Ireland differ wildly, which is why things are so slow to maneuver, politically speaking.

I always considered the over-reliance on the US a weakness. It was comfortable because it postponed some difficult discussions (for example, in terms of defense and military spending it is completely bonkers for the EU to not act as a federal entity). Since this subject is thorny, it was alright to rely on the US for defense and just kick this can down the road.

The US becoming hostile at least forces the countries in the EU to face reality a little, and perhaps speed some things up (see for example the recent EU-Mercosur trade agreement).

Eddy_Viscosity2 · 2026-01-17T13:41:40 1768657300

The other factor is that both Russia and the US have people 'on the inside' in the EU governments. They bought them. They own them and they do what they are told.

Tepix · 2026-01-17T10:07:42 1768644462

This used to be true but it's rapidly changing.

epolanski · 2026-01-17T10:17:50 1768645070

My clients are slowly but surely migrating to European cloud vendors, Scaleway especially.

general1465 · 2026-01-17T10:46:57 1768646817

> If your business has everything on GCP/AWS/Azure (which is very common) and the Americans choose to weaponise US tech against your country or business,

These companies have datacenters in Europe too. It is not wild to think that if push comes to shove and the US cuts off Europe, then Europeans can just take control over those European data centers and restore access to GCP/AWS/Azure in Europe, because these datacenters are on their soil and predominantly employ Europeans.

Roark66 · 2026-01-17T11:46:26 1768650386

> It is not wild to think that if push comes to shove and US cut off Europe, then Europeans can just take control over those European data centers and restore access to GCP/AWS/Azure in Europe because these datacenters are on their soil and predominantly employing Europeans.

Good luck with that. Those systems are extremely interconnected. We should be (and are) building sovereign EU equivalents to not just cloud providers but also major services like Google/MS 365 and so on.

Woodi · 2026-01-17T16:45:08 1768668308

> Google/ms/aws...

Meh.

The EU needs to start with its own PC hardware factories first. And PC-compatible designs. Which is unlikely: at the first sign of trouble they will buy everything from the US. As all good 3rd World countries do.

jsiepkes · 2026-01-17T09:27:42 1768642062

There are plenty of banks owned and operated within the EU. One bank folded under US pressure, but when push comes to shove the EU can force banks in the EU to uphold EU rules and regulations.

That's not the case for digital infrastructure like Google Workspace, Google cloud, Office 365, AWS, etc.

wickedsight · 2026-01-17T09:38:08 1768642688

> when push comes to shove the EU can force banks in the EU to uphold EU rules and regulations.

This made me realize that many people who are extremely critical of the power the EU has, have no idea how much that power is often protecting them.

This is not a dismissal of the fact that it's absolutely critical to stay vigilant about how that power is used. But it's quite clear that without that power, the US would've abused theirs way more within Europe.

phi0 · 2026-01-17T10:22:44 1768645364

When the US sanctioned Hong Kong’s Chief Executive in 2020, because of a law allowing extradition to China, not a single bank would let her open an account, including Chinese ones. She was receiving her salary fully in cash.

The EU compelling banks to do business despite US sanctions seems pretty unlikely even if relations continue to degrade.

tomjen3 · 2026-01-17T10:32:28 1768645948

AWS etc. have datacenters in the EU.

Microsoft relies on the EU's courts to recognise their property rights.

Qwertious · 2026-01-17T10:01:52 1768644112

>Losing a Microsoft email is an inconvenience in comparison.

Losing access to data is potentially worse than losing access to your bank account. I doubt Microsoft will let you grab a copy of all your emails after they block/ban you.

epolanski · 2026-01-17T10:16:43 1768645003

You may have tied your services, e.g. your digital bank account to your email.

This is a very major inconvenience.

causalmodels · 2025-12-11T23:25:51 1765495551

Brand name pharmaceuticals are sort of a different thing. Brand names must comply with the naming guidelines of the FDA, the European Medicines Agency, and Health Canada simultaneously. In practice, this makes it tricky to use actual words. So many companies adopt an 'empty vessel' naming approach. The empty vessels are nonsense words that (1) evoke an emotion (Wegovy is a good example), (2) can be trademarked, and (3) can survive brand pressure.

causalmodels · 2025-12-11T20:27:14 1765484834

I just asked it and it said that it uses the on device TTS capabilities.

furyofantares · 2025-12-11T21:08:37 1765487317

I find it very unlikely that it would be trained on that information or that Anthropic would put that in its context window, so it's very likely that it just made that answer up.

causalmodels · 2025-12-11T21:28:34 1765488514

No, it did not make it up. I was curious, so I asked it to imitate a posh British accent imitating a South Brooklyn accent while having a head cold, and it explained that it didn't have fine-grained control over the audio output because it was using a TTS. I asked it how it knew that and it pointed me towards [1] and highlighted the following.

> As of May 29th, 2025, we have added ElevenLabs, which supports text to speech functionality in Claude for Work mobile apps.

Tracked down the original source [2] and looked for additional updates but couldn't find anything.

[1] https://simonwillison.net/2025/May/31/using-voice-mode-on-cl...

[2] https://trust.anthropic.com/updates

furyofantares · 2025-12-11T21:37:54 1765489074

If it does a web search that's fine, I assumed it hadn't since you hadn't linked to anything.

Also it being right doesn't mean it didn't just make up the answer.

causalmodels · 2025-11-19T06:01:58 1763532118

The dashed lines on top of the data points and labels are making me wince.

causalmodels · 2025-10-21T17:43:12 1761068592

The ads you're going to need to worry about are not going to be shown on webpages.

bsparker · 2025-10-21T17:57:37 1761069457

Are you implying that they are going to be inside of the chat response?

zukzuk · 2025-10-21T18:03:47 1761069827

They are going to be the chat response.

wahnfrieden · 2025-10-21T18:09:18 1761070158

Yes, they are hiring for it. They want you to use their own apps instead of a web browser so that blocking tech cannot be created for it.

https://sandstormdigital.com/2025/10/16/openai-is-building-i...

https://www.contentgrip.com/openai-internal-ad-infrastructur...

hbn · 2025-10-21T19:53:27 1761076407

If the ads are just brought in as a stream of text from the same endpoint that's streaming you the response you're wanting, how can that be blocked in the browser anyway?

Another local LLM extension that reads the output and determines if part of it is too "ad-ey" so it can hide that part?

wahnfrieden · 2025-10-21T20:30:04 1761078604

It will depend on how they implement the sponsored content. If there are regulations that require marking it as sponsored, that makes it easy to block. If not, then sure maybe via LLMs.
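To make the "easy to block if it's marked" case concrete, here is a minimal sketch. It assumes a purely hypothetical markup where sponsored spans are wrapped in `[sponsored]...[/sponsored]`; no provider is known to use this format, and the names below are illustrative only.

```python
import re

# Hypothetical marker format (an assumption for illustration, not a real API):
# sponsored spans are wrapped in [sponsored]...[/sponsored].
SPONSORED = re.compile(
    r"\[sponsored\].*?\[/sponsored\]\s*",
    re.DOTALL | re.IGNORECASE,
)


def strip_sponsored(text: str) -> str:
    """Remove every explicitly labeled sponsored span from a completed response."""
    return SPONSORED.sub("", text).strip()


reply = (
    "Use a cast-iron pan. "
    "[sponsored]Try PanCo today![/sponsored] "
    "Preheat it for five minutes."
)
print(strip_sponsored(reply))
```

This only works on a completed response with explicit labels. A streamed response would need buffering until a closing marker arrives, and unlabeled ads woven into the answer itself are the case where something like a classifier or a local LLM would be the only option.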

causalmodels · 2025-10-02T20:26:55 1759436815

These numbers aren't that crazy when contextualized with the capex spend. One hundred million is nothing compared to a six hundred billion dollar data center buildout.

Besides, people are actively being trained up. Some labs are just extending offers to people who score very highly on their conscription IQ tests.