More

thomasmg · 2026-01-25T17:08:22 1769360902

Schubfach, Ryū, Dragonbox etc support round-tripping and shortest-width, which (it sounds like) is not important for you. The idea of round-tripping is that if you convert a double to a string and then parse that, you get the exact same value. Shortest-width is to correctly round and generate the shortest possible text. I tried to implement a version that does _just_ round-tripping but is not shortest-width, it is around 290 lines for both parsing and toString [1]

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...

thomasmg · 2026-01-22T19:28:52 1769110132

There are many variants. It really depends on what features you need. Cuckoo filters were mentioned. If you want to add and remove entries and the regular counting Bloom filter are not good enough, I would look at the "succinct counting blocked Bloom filter" [1]: they only need about twice the space than regular Bloom filters. Sure, cuckoo filters need less memory, but they can fail basically any time, while these can not.

Tiny filters: Some time ago I worked on tiny statistics [2]. This includes some 64-bit HyperLogLog implementations; some use linear counting, which is basically a 64-bit Bloom filter, until some limit, and only then switch to HyperLogLog. This is great for distinct counts of columns in databases (cardinality estimation). This project also includes 64-bit approximate counts and histograms.

[1] https://github.com/FastFilter/fastfilter_java/blob/master/fa... [2] https://github.com/thomasmueller/tinyStats

Sesse__ · 2026-01-23T16:02:17 1769184137

FWIW, I found https://github.com/FastFilter/fastfilter_java/issues/28 a pretty good explanation of what's going on in the succinct counting blocked Bloom filters. (I'm not sure if the blocking is needed for the normal Bloom filter part, though, i.e., all bits don't necessarily need to fall within the same block, no? But the counts are stored separately for each block.)

thomasmg · 2026-01-23T16:12:27 1769184747

Yes, non-blocked is also possible. This would need a bit less space, and would be a bit slower. The counts > 1 (per bit that is set) are stored spearately, yes.

Sesse__ · 2026-01-23T18:29:37 1769192977

> This would need a bit less space, and would be a bit slower.

I guess that is because the count storage update is really slow, right, so it's better to have one than two (or whatever number of set bits) operations? At least the linked code seems to process it one by one bit when updating, and without some sort of “rank of n-th set bit” operation (which would accelerate the “select” version fairly well), I'm not sure it could be made much faster than that either.

Edit: I see https://github.com/FastFilter/fastfilter_java/blob/master/fa... tries to make that operation in O(1), but to be honest, with this many operations, the cure almost looks worse than the disease. :-) Haven't benchmarked, though.

thomasmg · 2026-01-25T06:32:16 1769322736

Yes the count storage update is a bit slow. It could be implemented as a background thread, so that it is not blocking. It depends on the use case wheter this is a problem. The 'blocked' variant might be faster to optimize I assume, if this is needed.

Sesse__ · 2026-01-25T08:56:42 1769331402

> It could be implemented as a background thread, so that it is not blocking.

How would you realistically do this without creating more overhead in the thread communication than what you're saving? I've never heard of offloading a 30–40 cycle operation to a thread before. Typically sending an atomic across CPUs is what, 200 cycles? (That's assuming you have the thread just sitting there in some kind of pool; firing up a new one is much, much more expensive. It also assumes you never need to hear back from it, so wouldn't work well for a decrease where you need to know the count.)

thomasmg · 2026-01-25T18:30:48 1769365848

Right, it would be wasteful if it's done for each entry separately. But that is not needed. (I should probably write an article about this and a better implementation).

The "succinct counting (blocked) Bloom filters" have two components: the regular Bloom filter for querying, and the count storage. The count storage is not "strictly" needed for querying: you will never get a false _negative_ is increments / decrements in the count storage are deferred. So updating the count storage can be done asynchronously. Both increments and decrements, if you want. So these can be buffered, eg 100 increments / decrements at a time. Sure, the false-positive rate is slightly higher than needed if the decrements are deferred, but with a large filter this effect is small.

So that means, you can buffer increments / decrements in the count storage. You still want to do the "or" operations in the the Bloom filter part synchronously, so that you don't get false negatives. And then, it is no longer just 30-40 cycles, but 3000-4000 cycles, or 30'000-40'000 cycles, or so. I understand this would not be trivial to implement, but also not very complex. I never really had a real-world use case so far, so I didn't work on a full implementation yet.

Sesse__ · 2026-01-25T20:54:41 1769374481

Sure, you can batch. But that means your decrements must be _entirely_ buffered, right? Because you cannot remove anything from the main filter until you know its count is zero, and you don't know its count until you've processed all increments. And you need to do that under a lock, or you'll have a race where you could have a false negative in the main filter for a short while.

thomasmg · 2026-01-25T23:55:09 1769385309

Yes exactly!

thomasmg · 2026-01-22T15:04:59 1769094299

I'm one of the authors. Feel free to ask anything.

kuhdungbingo · 2026-01-22T15:49:42 1769096982

Now would this be useful in high scaling games, specifically: determining if you might be a winner of game with a grid of 10^nx10^n (where n>5). A very large "cow bingo game", where the insertions are made randomly spawned on a grid? Seems one of the other filters would be more appropriate as they support dynamic insertion.

Still very neat, nice work!

thomasmg · 2026-01-23T05:56:21 1769147781

Yes, you would need a dynamic filter (eg. regular Bloom filter or cuckoo filter) for this, due to the insertions.

Static filters are good for eg. leaked password lists, LSM trees (LevelDB, RocksDB), and so on.

thomasmg · 2026-01-21T13:00:29 1769000429

The PDF standard uses base 85 encoding (Ascii 85).

thomasmg · 2026-01-12T16:05:18 1768233918

I wonder, how can a programming language have the productivity of a high-level language ("write like a high-level language"), if it has manual memory management? This just doesn't add up in my view.

I'm writing my own programming language that tries "Write like a high-level language, run like C.", but it does not have manual memory management. It has reference counting with lightweight borrowing for performance sensitive parts: https://github.com/thomasmueller/bau-lang

jhgb · 2026-01-13T03:21:26 1768274486

C is literally a high level language.

throwaway17_17 · 2026-01-13T05:40:11 1768282811

Seriously, in the discussion happening in this thread C is clearly not a high-level language in context.

I get your statement and even agree with it in certain contexts. But in a discussion where high-level languages are presumed (in context) to not have memory management, looping constructs are defined over a semantics inferred range of some given types, overloading of functions (maybe even operators), algebraic datatypes, and other functional language mixins: C most certainly IS NOT a high level language.

This is pedantic to the point of being derailing and in some ways seemed geared to end the discussion occurring by sticking a bar in the conversations spokes.

saidnooneever · 2026-01-13T11:16:34 1768302994

glad you bring up context in this note. i find C high level too but u are right, in a comparisson you can still say its really low level.

C was coined originally as high level because the alternatives were things like assembler. a term rooted in comparisson more than anything.

throwaway17_17 · 2026-01-13T14:06:52 1768313212

Thanks, my parent’s comment is almost a thought-terminating cliche in this kind of discussion. However, Chisnall’s now classic ‘C is not a low level language’ article is one of my favorite papers on language theory and potential hardware design. A discussion about the shortcomings of viewing C as a low level language can/could be profitable, deep, and interesting; but context is king.

afreire · 2026-01-13T09:30:53 1768296653

It has autofree and drop traits.

thomasmg · 2026-01-03T21:19:55 1767475195

Yes. I did the exact same thing a few weeks ago (in Java, but the idea is to port it to my own programming language). I wrote a terminal UI. Lots of small challenges. Many things to learn, like minimax, bitboards, immutable vs mutable.

thomasmg · 2025-12-11T08:41:57 1765442517

I'm surprised that the simple, ~80 lines version of stable-in-place merge sort (see link in the above comments) is not more widely known. It is O(n log n log n) and not all that hard to implement.

thomasmg · 2025-12-11T08:31:34 1765441894

There is stable in-place merge sort [1], which is O(n*log(n)^2) and not that complex or hard to implement (about 80 lines, and that includes the ~15 lines of binarySearch, which you might need anyway).

[1] https://github.com/thomasmueller/bau-lang/blob/main/src/test...

thomasmg · 2025-12-11T05:53:35 1765432415

There is stable in-place merge sort, it runs in O(n*log(n)^2). It is about 3 times more complex than shell sort. I implemented it here https://github.com/thomasmueller/bau-lang/blob/main/src/test... (most sort algos you mentioned above are in the same direcory btw)

You didn't mention heap sort. A simple implementation, which doesn't do any method calls just like shell sort (also next to the merge sort above) is about twice as complex than shell sort.

bxparks · 2025-12-11T16:30:54 1765470654

Thanks for the reference to in-place merge sort. GitHub is having a lot problems right now, I cannot see your code. I will look at it when I get a chance.

I think I ignored Heap sort because it uses O(N) extra RAM, which is precious on a resource-constrained microcontroller.

thomasmg · 2025-12-11T20:01:52 1765483312

Heap sort is in-place (and does not even need recursion, unlike quicksort). But yes, it is not stable, and usually slower than even shell sort (except for very large arrays).

thomasmg · 2025-11-12T06:34:51 1762929291

I saw that in Swift, a method can declare it throws an exception, but it doesn't (can't) declare the exception _type_. I'm not a regular user of Swift (I usually use Java - I'm not sure what other languages you are familiar with), but just thinking about it: isn't it strange that you don't know the exception type? Isn't this kind of like an untyped language, where you have to read the documentation on what a method can return? Isn't this a source of errors itself, in practise?

Someone · 2025-11-12T07:07:20 1762931240

> isn't it strange that you don't know the exception type?

Java experience taught us that, when writing an interface, it is common not to know the exception type. You often can’t know, for example, whether an implementation can time out (e.g. because it will make network calls) or will access a database (and thus can throw RollbackException). Consequently, when implementing an interface, it is common in Java to wrap exceptions in an exception of the type declared in the interface (https://wiki.c2.com/?ExceptionTunneling)

thomasmg · 2025-11-12T08:53:20 1762937600

Yes I know Java and the challenges with exceptions there (checked vs unchecked exceptions, errors). But at least (arguably) in Java, the methods (for checked exceptions at least) declares what class the exception / exceptions is. I personally do not think wrapping exceptions in other exception types, in Java, is a major problem. In Swift, you just have "throws" without _any_ type. And so the caller has to be prepared for everything: a later version of the library might suddenly return a new type of exception.

One could argue Rust is slightly better than Java, because in Rust there are no unchecked exceptions. However, in Rust there is panic, which is in a way like unchecked exceptions, which you can also catch (with panic unwinding). But at least in Rust, regular exceptions are fast.

Someone · 2025-11-12T12:11:10 1762949470

> And so the caller has to be prepared for everything: a later version of the library might suddenly return a new type of exception.

But you get the same with checked exceptions in Java. Yes, an interface will say foo can only throw FooException, but if you want to do anything when you get a FooException, you have to look inside to figure out what exactly was wrong, and what’s inside that FooException isn’t limited.

A later version of the library may suddenly throw a FooException with a BarException inside it.

magicalhippo · 2025-11-12T07:29:01 1762932541

What I liked about Bosst's error_code[1], which is part of the standard library now, is that it carties not just the error but the error category, and with it a machinery for categories to compare error_codes from other categories.

So as a user you could check for a generic file_not_found error, and if the underlying library uses http it could just pass on the 404 error_code with an http_category say, and your comparison would return true.

This allows you to handle very specific errors yet also allow users to handle errors in a more generic fashion in most cases.

[1]: https://www.boost.org/doc/libs/latest/libs/system/doc/html/s...

tcfhgj · 2025-11-12T08:44:00 1762937040

When using a language forcing checked exceptions, you would know, wouldn't you?

mayoff · 2025-11-12T06:58:47 1762930727

Swift gained limited support for “typed throws” in Swift 6.0 (2024).

https://github.com/swiftlang/swift-evolution/blob/main/propo...

I say limited because the compiler doesn't (yet, as of 6.2) perform typed throw inference for closures (a closure that throws is inferred to throw `any Error`). I have personally found this sufficiently limiting that I've given up using typed throws in the few places I want to, for now.

amomchilov · 2025-11-13T04:52:50 1763009570

Typed exceptions are unlike typed parameters or return values. They don’t just describe the interface of your function, but expose details about its implementation and constrain future changes.

That’s a huge limitation when writing libraries. If you have an old function that declares that it can throw a DatabaseError, you can’t e.g. add caching to it. Adding CacheError to the list of throwable types is an API breaking change, just like changing a return type.

Swift has typed errors now, but they shouldn’t be used carefully, and probably not be the default to reach for

nielsbot · 2025-11-13T00:32:12 1762993932

I don't think it's strange at all--my main uses of the returned errors are

1a. yes, there was some error 1b. there was an error--throw another local error and encapsulate the caught error 2. treat result of throwing call as `nil` and handle appropriately

I don't think typed throws add anything to the language. I think they will result in people wasting time pondering error types and building large error handling machines :)

When I used Java, I found typed exceptions difficult to reason about and handle correctly.