Hacker News | andy99's comments

This has almost never been my experience in ~20 years of working. Other than a few fleeting assholes, all of my work relationships have essentially been collegial, with all parties, regardless of position, looking at how we can best get the work done that’s in front of us. I’ve never felt exploited or used and never felt I was exploiting those I managed.

I think if one sees their work this way, maybe it comes true? It’s a very cynical way of looking at things.


You’ve never worked in retail or service industries, have you?

I have, and your reply is a pretty weak fallback from

  The relationship between owners and workers has always been extractive. The adversarial relationship is built in.
I’m sure extractive relationships exist, but it’s certainly not an iron law of work, and I’m not even sure it’s that common in most modern workplaces.

Why is this a weak fallback?

Since we’re speaking anecdotally: I’ve also worked in the service industry, and I have personally observed employers and managers abusing their power to elevate themselves at the expense of their employees. Does that make you reconsider? I’d hazard a guess it doesn’t.

My point being that anecdotal evidence isn’t particularly useful.

> I’m not even sure it’s that common in most modern workplaces.

I don’t know what to tell you, honestly. This is an incredibly naive take.

Edit:

I feel I’ve been uncharitable in responding to you. I think we are likely talking past each other about what an “extractive” relationship is. I don’t think people are malicious. Most people (IMO) are essentially good, and maliciousness is relatively rare. That said, if you work for an employer, you will always be resisting pressure from above to do more work for less pay. Maybe you’re lucky and have an excellent middle manager (I have had some) who is skilled at preventing shit from rolling downhill. The fact remains that the pressure exists, and eventually someone breaks. Maybe they have a bad day, or fall into financial distress, or the economy sucks. It doesn’t really matter. The people who pay the highest cost are the people closest to the bottom of that hierarchy.


> Issues and discussions can use AI assistance but must have a full human-in-the-loop. This means that any content generated with AI must have been reviewed and edited by a human before submission.

I can see this being a problem. I read a thread here a few weeks ago where someone was called out on submitting an AI slop article they wrote with all the usual tells. They finally admitted it but said something to the effect they reviewed it and stood behind every line.

The problem with AI writing is that at least some people appear incapable of critically reviewing it. Writing something yourself eliminates this problem because it forces you to pick your words (though there can be other problems, of course).

So the AI-blind will still submit slop under the policy but believe themselves to have reviewed it and “stand behind” it.


   For example, authors may have given an LLM a partial description of a citation and asked the LLM to produce bibtex
This is equivalent to a typo. I’d like to know which “hallucinations” are completely made up, and which have a corresponding paper but contain some error in how it’s cited. The latter I don’t think matters.

If you click on the article you can see a full list of the hallucinations they found. They did put in the effort to look for plausible partial matches, but most of them are some variation of "No author or title match. Doesn't exist in publication."

Here's a random one I picked as an example.

Paper: https://openreview.net/pdf?id=IiEtQPGVyV

Reference: Asma Issa, George Mohler, and John Johnson. Paraphrase identification using deep contextualized representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 517–526, 2018.

Asma Issa and John Johnson don't appear to exist. George Mohler does, but it doesn't look like he works in this area (https://www.georgemohler.com/). No paper with that title exists. There are some with sort of similar titles (https://arxiv.org/html/2212.06933v2 for example), but none that really make sense as a citation in this context. EMNLP 2018 exists (https://aclanthology.org/D18-1.pdf), but that page range is not a single paper. There are papers in there that contain the phrases "paraphrase identification" and "deep contextualized representations", so you can see how an LLM might have come up with this title.


It's not the equivalent of a typo. A typo would be immediately apparent to the reader. This is a semantic error that is much less likely to be caught by the reader.

Nature abhors a vacuum

What does that look like in your opinion, what do you use?

Depends on what you’re doing. Using the smaller / cheaper LLMs will generally make it way more fragile. The article appears to focus on creating a benchmark dataset with real examples. For lots of applications, especially if you’re worried about people messing with it, about weird behavior on edge cases, about stability, you’d have to do a bunch of robustness testing as well, and bigger models will be better.

Another big problem is that it’s hard to set objectives in many cases; for example, maybe your customer service chat still passes with a smaller model, but comes across worse.

I’d be careful, is all.


One point in favor of smaller/self-hosted LLMs: more consistent performance, and you control your upgrade cadence, not the model providers.

I'd push everyone to self-host models (even if it's on a shared compute arrangement), as no enterprise I've worked with is prepared for the churn of keeping up with the hosted model release/deprecation cadence.


Where can I find information on self-hosted model success stories? All of it seems like throwing tens of thousands away on compute for it to work worse than the standard providers. The self-hosted models seem to get out of date, too. Or there end up being good reasons (improved performance) to replace them.

How much you value control is one part of the optimization problem. Obviously self-hosting gives you more, but it costs more. And re: evals, I trust GPT, Gemini, and Claude a lot more than some smaller thing I self-host, and would end up wanting to do way more evals if I self-hosted a smaller model.

(Potentially interesting aside: I’d say I trust the new GLM models similarly to the big 3, but they’re too big for most people to self-host)


You may also be getting a worse result for higher cost.

For a medical use case, we tested multiple Anthropic and OpenAI models as well as MedGemma. We were pleasantly surprised when the LLM-as-judge scored gpt5-mini as the clear winner. I don't think I would have considered using it for the specific use cases, assuming higher reasoning was necessary.

Still waiting on human evaluation to confirm the LLM Judge was correct.


That's interesting. Similarly, we found out that for very simple tasks the older Haiku models are interesting as they're cheaper than the latest Haiku models and often perform equally well.

You obviously know what you’re looking for better than me, but personally I’d want to see a narrative that makes sense before accepting that a smaller model somehow just performs better, even if the benchmarks say so. There may be such an explanation; it feels very dicey without one.

You just need a robust benchmark. As long as you understand your benchmark, you can trust the results.

We have a hard OCR problem.

It's very easy to make high-confidence benchmarks for OCR problems (just type out the ground truth by hand), so it's easy to trust the benchmark. Think accuracy and token F1. I'm talking about highly complex OCR that requires a heavyweight model.
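For anyone unfamiliar, token F1 here just means the harmonic mean of token-level precision and recall between the model's transcription and the hand-typed ground truth. A minimal sketch (the function name and the plain whitespace tokenization are my own choices, not from any particular OCR library):

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between an OCR transcription and hand-typed ground truth."""
    pred_tokens = prediction.split()
    gt_tokens = ground_truth.split()
    # Both empty counts as a perfect match; one empty is a total miss.
    if not pred_tokens or not gt_tokens:
        return 1.0 if pred_tokens == gt_tokens else 0.0
    # Multiset intersection: each shared token counts as many times as it
    # appears in both strings.
    overlap = Counter(pred_tokens) & Counter(gt_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gt_tokens)
    return 2 * precision * recall / (precision + recall)
```

A stricter benchmark would normalize case and punctuation first, or use an order-aware metric like character error rate, but this is the basic shape of the measurement.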

Scout (Meta), a very small/weak model, is outperforming Gemini Flash. This is highly unexpected and a huge cost savings.

Some problems aren't so easily benchmarked.


Volume and statistical significance? I'm not sure what kind of narrative I would trust beyond the actual data.

It's the hard part of using LLMs and a mistake I think many people make. The only way to really understand or know is to have repeatable and consistent frameworks to validate your hypothesis (or in my case, have my hypothesis be proved wrong).

You can't get to 100% confidence with LLMs.


You're right. We did a few use cases, and I have to admit that while customer service is the easiest to explain, it's also where I'd not choose the cheapest model, for said reasons.

Is anyone aware of a more thorough argument for why this must be the case? Is it a commonly held view? It sounds realistic, but not necessarily an immutable law; I’d like to know what thought has been given to this.

It’s an incentive problem. If even one party defects in a society of pacifists, the pacifists have no real method of recourse besides refusing to interact with the defector, and how many people are going to do that if the defector starts killing people to enforce compliance?

Some subscribe to a soft pacifism where non-destructive violent resistance, like disarming the defector or disabling the defector with less-lethal technologies like a taser, would be fine. Pure pacifists who don’t believe in any kind of physical resistance whatsoever are almost exclusively religious practitioners who don’t ascribe a high degree of value to life in this world, because they believe non-resistance will bear spiritual fruit in the next world.


Under scrutiny I'm sure your comment falls apart, but it is accurate from orbit.

I happen to hold this philosophy under different words.


It’s also appropriate to remember that MLK was friends with Malcolm X, and both chose their own means to support the same end goal.

MLK chose nonviolent shows of force, whereas Malcolm X chose more direct forms of violence.

Governments could save face by negotiating with MLK, since he used nonviolent means. They couldn’t negotiate with Malcolm X, because of the whole “we cannot negotiate with criminals and terrorists” stance.


In Malcolm’s autobiography it was made explicit that they were not friends.

It's because people in positions of power can safely ignore nonviolence. They can't ignore the other option. Nonviolence on its own is not productive.

That's what disturbs me about yesteryear's protest marches, like MLK's March on Washington, compared to 50501 and No Kings.

MLK wanted a nonviolent show of force so as to stay "legal", but with a strong implicit threat of "well, you know, there's a LOT of us. We're peaceful for now". The bus boycotts almost bankrupted the transit system down in Atlanta, so money attacks also work.

But now, we have No Kings and 50501. The whole idea of mass protest as a 'nonviolent but imminent threat' is completely gone. Protests used to be a prelude to something to be done. Now, it's more of a political action rally, with not much of anything to follow up the initial energy.

Which is also why the protests - the pussy hat rebellion, 50501, No Kings - have all failed. There are no goals. It's just chanting and some signs.


Imo this is what happened once protests became a “right”. I know most people here won’t agree with the Canada trucker protest, but I remember when it happened, people were saying “ok, you’ve had your protest and exercised your rights, you’ve been heard, you can go home now” - framing it just like that, as a rally to show an opinion rather than a threat. It felt to me like “the establishment” just treats protests as performative, because as you say they usually are, and then doesn’t know what to do when it’s actually something they have to react to.

A commonly cited example: during the Battle of Seattle, the cops wanted to beat the shit out of a nonviolent sit-in, and the black bloc protected the sit-in through a combination of strength and diversion. The nonviolent people are there for the optics, and the violent people are there to assure that any move made on the nonviolent protesters will be answered swiftly.

The important part is that the violence mostly doesn't start until someone tries to hurt those who are there peacefully. If a group was there peacefully and gets attacked, retaliation becomes a possibility.


Yeah, the thorough argument is that people in power don't want people to rise up and challenge their authority.

It's absolutely not realistic. Every right we have was fought for, and people died trying to get it. This is especially true in America, where a fifth of the population was enslaved at its inception. Nothing has ever been given to us; it had to be taken from abusers of power, and there have always been abusers of power in this country.

I mean, Trump is no different than Washington. Washington routinely ignored laws; he tried to have his lackeys go get his "property" from free states while never being willing to go to court (a provision of the fugitive slave act).

John Adams called the men of Shays' Rebellion terrorists because they had the audacity to close down courts to stop foreclosures of farms (fun fact: that was the first time since the revolution that Americans fired artillery at other Americans, and it was a mercenary army paid by Boston merchants, killing over credit).

You can go down the list; it's always been there, but luckily there were always people fighting against it, trying to better society against those that simply dragged us down.


Yes, just clicked on the video, instantly nauseous. I get motion sick generally.

There was a COBOL LLM eval benchmark published a few years ago, looks like it hasn’t been maintained: https://github.com/zorse-project/COBOLEval

At least I think that’s the repo, there was an HN discussion at the time but the link is broken now: https://news.ycombinator.com/item?id=39873793


I recently learned that Bear blog (a small blogging platform, posts from which often appear on HN) has a “discover” section with a front-page-style ranking. Their algorithm is on the page:

  This page is ranked according to the following algorithm:
  Score = log10(U) + (S / (B * 86,400))

  Where,
  U = Upvotes of a post
  S = Seconds since Jan 1st, 2020
  B = Buoyancy modifier (currently at 14)
See https://bearblog.dev/discover/
