
denhaus · 2025-10-01T20:27:56 1759350476

the final one is not AI, it’s a glorb video from years ago: https://m.youtube.com/watch?v=NkYSK-_hVDQ

denhaus · 2025-06-05T00:28:42 1749083322

As a clarification, we used fine tuning more than prompt engineering because low or few-shot prompt engineering did not work for our use case.

denhaus · 2025-06-05T00:27:38 1749083258

Regarding point 3, my colleagues and i studied this for a use case in science: https://doi.org/10.1038/s41467-024-45563-x

caterama · 2025-06-05T01:17:21 1749086241

Can you provide a "so what?" summary?

melagonster · 2025-06-05T03:12:17 1749093137

>We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as a list of JSON objects. This approach represents a simple, accessible, and highly flexible route to obtaining large databases of structured specialized scientific knowledge extracted from research papers.

denhaus · 2025-06-09T02:35:32 1749436532

Short answer: It’s a way to generate structured databases for (most) scientific topics. Why? Apply data driven methods to these databases. So what? It’s a powerful way to ask and investigate scientific questions/trends otherwise hidden inside a million scientific papers.

Example: Consider what PDB has done for our understanding of protein folding, as well as the ML/computational techniques it has enabled (eg, AlphaFold). Most scientific questions and properties are not as data-rich as protein folding. What if they could be?

Longer answer: The last 15 years in computational/ML + science have shown that structured databases open up entirely new frontiers in discovery (eg Protein Data Bank, Materials Project). But most scientific topics/properties are NOT in structured DBs, they’re scattered about in millions of papers. It’s especially a huge problem in some topics in materials science. It’s not that these problems are data scarce, but that it’s hard to actually collate their data in a structured format. You literally cannot use most ML methods because structured DBs do not exist.

This paper is a way to generate massive structured databases of specialized, intricate, and hierarchical knowledge graphs from scientific literature. Fine tuning works, prompt engineering does not (at the time, perhaps this has changed). Once you have a database, you can analyze an entire subfield or topic in science with ML or stats methods.
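To make the "list of JSON objects" idea concrete, here is a minimal sketch of the downstream step: turning a model's JSON output into flat rows you can count or analyze. The field names (`host`, `dopants`) and the example output are hypothetical, chosen for illustration; they are not the schema or data from the paper.

```python
import json
from collections import Counter

# Hypothetical model output for one sentence. The field names are
# illustrative assumptions, not the schema used in the paper.
raw_output = '''
[
  {"host": "ZnO", "dopants": ["Al", "Ga"]},
  {"host": "TiO2", "dopants": ["Nb"]}
]
'''

REQUIRED_KEYS = {"host", "dopants"}


def parse_records(text):
    """Parse a JSON list of records, dropping any that lack required keys."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return []  # malformed output: keep nothing rather than guess
    if not isinstance(data, list):
        return []
    return [r for r in data if isinstance(r, dict) and REQUIRED_KEYS.issubset(r)]


def flatten(records):
    """Turn nested records into flat (host, dopant) rows for a table."""
    return [(r["host"], d) for r in records for d in r["dopants"]]


records = parse_records(raw_output)
rows = flatten(records)

# Once the rows exist, simple statistics over a whole corpus become trivial.
dopant_counts = Counter(dopant for _, dopant in rows)
```

Run over every paragraph in a corpus, the same flattening step is what gives you a table that stats or ML methods can consume.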

denhaus · on Jan 20, 2025

I share your viewpoint on this, that DFT is a poor proxy model for ML to approximate.

However, the problem with the alternative, for example using experimental data, is that the synthesis procedures, measurement parameters, sample impurities, and even differences between experimental apparatus mean training datasets of even modest size are insanely heterogeneous. So models are either trained to predict differences between materials due to experimental discrepancies, trained on very small datasets, or must have a slew of post-hoc physics-based adjustments added to get reasonable numbers.

Higher order computational methods (including simply more intensive, non-high throughput DFT) are accurate but expensive, as you know. Some of them have systematic error the way DFT does, and are essentially based on user choice of (many!) parameters. Charged defect calculations are one example of this. Finding large (>10^4) training sets with similar parameters for computation is difficult. “ML” for these kinds of calculations usually consists of, like, calculating a hundred (or 10) crystals within a narrow chemical system, doing a linear regression on one variable (eg, valence of cation on some site), and getting numbers ±10% of a “true” number.
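For readers who haven't seen this kind of small-data "ML", a rough sketch of the single-variable regression described above looks something like the following. The numbers are made up for illustration (arbitrary units), and the descriptor choice is just the valence example from the comment.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def relative_errors(xs, ys, slope, intercept):
    """Fractional error of each prediction against its reference value."""
    return [abs((slope * x + intercept) - y) / abs(y) for x, y in zip(xs, ys)]


# Hypothetical data: cation valence vs. a computed property (arbitrary units).
valence = [1, 2, 2, 3, 3, 4]
energy = [1.1, 1.9, 2.2, 3.1, 2.8, 4.2]

slope, intercept = fit_line(valence, energy)
errors = relative_errors(valence, energy, slope, intercept)

# How many points land within 10% of their reference value.
within_10_percent = sum(e <= 0.10 for e in errors)
```

With only a handful of points in one narrow chemical system, a fit like this says little about materials outside that system, which is the limitation being described.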

GGA/meta-GGA DFT, on the other hand, can be applied at a sufficient fidelity to get real(ish) numbers in a homogeneous way across huge numbers of crystals. So you are correct, you are predicting an approximate number for a property in many cases. But if we know the approximate number is wrong due to systematic error (and we can, in some situations) we can apply corrections or higher order methods to get the right(ish) answer. Moreover, it’s highly dependent on which property you’re interested in. Some properties, like band gap, can be off by a lot. Others, like formation energy, can be calculated pretty accurately even with run-of-the-mill GGA DFT. Elastic moduli are generally ok.

In summary, approximating DFT with ML is just the least messy way to get real-ish answers across a large number of materials. Of course, there’s a point at which low-fidelity DFT calculations are generally (1) so cheap and (2) so inaccurate that having an ML model approximate them is pointless. Most large DBs of materials now use good enough DFT that the numbers they calculate are not pointless for ML to learn from.

In the future, I think models trained on large numbers of DFT calculations will have to be applied to narrow sets of higher fidelity calculations by tuning. Much like you can fine tune a generalized LLM to do specific things. That might be where ML can actually bring real value to materials design.

Also, it’s worth considering that synthesizing novel materials can be insanely difficult. So 1 in 4 is not bad in my opinion.

denhaus · on Nov 22, 2024

i agree. in principle this is totally fine. in reality, highly paid lawyers throw the law around to accomplish whatever their objectives are. if they’re paid by a media company, and the media company’s objective is to reduce “illegal” reproduction of their content, principle is less important than winning.

denhaus · on July 22, 2024

More than half of the PhDs I know (in technical fields) have families. Actually, almost all of them.

denhaus · on July 8, 2024

On most firm conditions, I absolutely think you can rip it on fat skis with awful edges. Especially on groomers. I do it every year on 120mm waist skis. It is highly dependent on skill level. For an example, watch freestylers carve switch down some rock-solid melt frozen 45deg park feature on detuned $200 skis they picked up at a swap meet 8 years ago. The result is much more dependent on the rider, not the equipment. I’ve watched pro dudes rip harder than 99.999% of skiers on joke trash skis from the 90s (rusting, holes in the base, chunks missing from edges) and broken snowblades.

On true ice, NO ONE is ripping it except racers or ex-racers with good equipment.

denhaus · on July 8, 2024

I laughed out loud at the “fly off a cliff edge”

On the real however, getting down the mountain safely after dusk etc with dull vs. sharp edges will likely only affect intermediate skiers. Beginner skiers are going to crash no matter what they’re riding if on steep and icy terrain. Expert skiers know when and how to ride conservatively and can basically ride anything in any conditions “safely” (even if that means just sliding a firm patch rather than carving it), as long as they’re aware of the limitations of their gear. 90% of the year I ride pow skis in any conditions (including melt freeze etc) with super dull edges - it’s totally fine. The other 10% is just to have a little more fun on very firm days.

Intermediates, on the other hand, will be overly aggressive beyond their capabilities. They’ll bounce their helmet off a melt-frozen knoll at first opportunity, similar to what you said!

bamboozled · on July 8, 2024

What I didn't really like about the parent's comment is that it seems kind of "lazy" not to do your edges. My wife broke her arm recently when a kid fell over in front of her and she couldn't stop quickly. On steep icy terrain this is a concern. That was my true "fly off a cliff edge" example.

The comment seemed like a "I don't wear a helmet because it's uncool" sort of thing...just do your edges occasionally, what's the issue?

denhaus · on July 9, 2024

Well, I did as a racer and still do on my firm snow skis. But there is a slight performance trade-off depending on what kind of skiing you like to do. I keep my edges on my pow skis dull because it’s slightly easier to slash and dash in crud and makes very little difference on icy terrain. For fully cambered directional skis being ridden very aggressively, it would make more of a difference. Every once in a while I’ll do the edges on my firm snow skis though if I’m bored, since a few years ago I bought a full diamond edge tuning kit and feel like I need to get some use out of it.

Point being, I’ve never found sharp edges matter much for most recreational riding. None of this has any relation to being uncool. Maybe in your wife’s scenario, having razor sharp edges would have saved her from breaking her arm; almost certainly not.

bamboozled · on July 11, 2024

Why not just have "clean" edges though? Low bevel angle, but able to work on ice fine when required?

In pow and crud, I don't know if the edges really do anything, except if there was any rust, they'd probably grip the crud a bit.

denhaus · on July 8, 2024

It’s a racing thing, mostly. Most shops offer a basic service for a machine wax + edge sharpening so that’s when most people get it done (even if they don’t really know what they’re paying for)

denhaus · on July 1, 2024

For anyone interested, we wrote a paper on a similar topic: https://www.nature.com/articles/s41467-024-45563-x