MutexProtected: A C++ Pattern for Easier Concurrency (awesomekling.github.io)
118 points by soheilpro on April 6, 2023 | hide | past | favorite | 132 comments


This is how Rust’s mutexes work as well - albeit using RAII rather than a callback, but with the same fundamental innovation of putting the data inside the mutex object. I’ve used it in my C++ code as well; it’s a great pattern.


Crucially, the Rust borrow checker makes it impossible to misuse this from (safe) Rust, whereas it's relatively easy to misuse either C++ approach, especially if nobody has actually sat down and explained why you're doing it. In C++ we can "just" leak a reference to the protected data; in (safe) Rust, programs with this mistake simply won't build.

Rust's ZSTs (zero-sized types, whose values take zero storage) make it reasonable to write Mutex<MyMarker> and then require that people show you their MyMarker (which they can only get by holding the lock) to do stuff, thus enforcing locking even when it's not really data you're protecting. Because these are zero-sized, the resulting machine code is unchanged, but the type checking means that if you forget to lock first your program won't even build, which shifts locking mistakes very hard left.


Can you give an example of using a ZST as a marker in a Mutex? I think I understand that you're suggesting this as a way of locking some section of code even if it's not a specific piece of data that you're locking, but I'm wondering how this could "enforce locking" then, since anything you do while locked could just as easily be accidentally done without the lock, right?


Sure:

https://rust.godbolt.org/z/74KT3Tcaa

The idea is we wrap code that ought to only run with a lock as a function, and we define the function such that one of its required arguments is a reference to the Marker type. When somebody tries to write code to call these functions, they're reminded by its signature that they'll need the Marker type, and the only way to get it is to lock some Mutex provided for the purpose.

You're correct that if they're able to do all the stuff which needs a lock themselves without calling these functions, they can hurt themselves, and if that's easy enough they might do it by accident. However it's not often that we've got something that's dangerous enough to need protecting this way and yet also so easy that people would rather re-implement it wrongly than read how to take the lock.

Rust's POSIX stdio features are protected this way, and obviously on some level you could bypass all that, and unsafely cause write syscalls with the right file descriptor number but like, I'd guess it'd take me an hour to write all that, whereas println!() and co. are right there in front of me, so...


You would have an `fn do_something_that_requires_held_lock<'a>(marker: ZstMarker<'a>)`. To call the function, you need a value of type `ZstMarker<'a>`, and you would structure your API such that the only way of getting that value is by locking the mutex.

Crucially, the lifetime parameter `'a` ensures that the marker can't outlive the lock, so it can't be used after the mutex is unlocked.

One example of this pattern is the `Python<'py>` marker in the `pyo3` library (https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html), which represents holding the Python GIL. The internals of `pyo3` do lots of `unsafe` FFI with the Python runtime, but the API exposed by the library is safe thanks to this pattern.


But is there a way to use Rust's approach of RAII guards in C++, rather than accessing the mutex inside a closure?


Of course. Boost.synchronized and the folly variant have been mentioned elsethread.


Adding my favorite to the pile: https://github.com/dragazo/rustex


> RAII

Resource acquisition is initialization


This is one of those cases where the acronym itself carries more meaning than the expansion.


Neat idea. Though I think having to pass a lambda for anything you want to do with the fields is awful ergonomics.

Maybe instead combine the two ideas. MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.



No, but Boost is one giant bear of a dependency with lots of duplication with the STL. Personally, I prefer to use a tiny includable file/lib for the specific functionality I need rather than keep that bear on my real estate.


This common criticism conflates two things.

Boost is not monolithic, nor are any of the sub-parts especially huge (like ... asio is big because an asynchrony system is a big project; regex is small because it can be). Nor, as other commenters have pointed out, is including those dependencies particularly onerous; many are single-file or just a handful of files.

I suspect your criticism has more to do with "one giant bear of dependency". Which: fair enough. Getting Boost set up is a pain; the pain does not reduce if you only want to use one sub-part of it, because you still have to use b2, their custom build language and associated cmake-ish system (yuck!) to prepare it (there are ways to avoid this, but they are not well-supported). Even CMake's (pretty good!) Boost discovery/inclusion support can only conceal that weirdness to a point, and often breaks in unexpected environments--well, breaks more often than the average CMake setup, which breaks a lot...

Similarly, getting your IDE to understand Boost's presence can be tricky, since many IDEs (looking at you, VS) have overly opinionated ways of incorporating it that don't jibe well with how a CLI/portable project build system pulls it in.

But the first and second paragraphs aren't the same thing. Initial setup pains are real and they suck, but they are one thing; the overhead and monolithic-ness of actually using Boost in your projects once it's set up are another. Initial setup pain is largely a one-time or infrequent cost, and the infrequency of that cost should be weighed against the not-inconsiderable convenience of Boost as a tool.

Not saying it's universally worth it; for some projects (especially ones that need to be built on a wide variety of environments, though this is rarer than most people, even some project authors, think) it's not appropriate. But many of the standard criticisms of Boost's runtime utility are specious.


I'll point out that if you're using header-only portions of Boost, you don't have to build Boost. Lots (most?) of Boost is header-only. You can just download the tarball and include headers out of it. It's barely even a dependency. Of all the deps my C++ app has, Boost is the easiest; I just download and extract and it's done, no build step. I don't need the actual compiled library; only the headers.


Boost is multiple libraries; you don't have to use all of it. Many of them are header-only as well.


Seriously, I have no desire to install this giant thing and decipher what can be extracted and how.

This is not to diminish the value of Boost, but it is just simply not for me.


Using only the header-only stuff is super easy; there's no deciphering or extracting at all. You just include the headers and _don't_ link the Boost library and see if it builds. If it doesn't, then I guess that wasn't a header-only library. You're not building or installing Boost at all; you're just downloading a copy of the headers. Only the template instantiations you actually use will end up in your binary.

(This sounds flippant but it's literally how I use boost in real life. Why decipher when the compiler can just tell you?)


not to mention that the documentation is generally utterly awful, with mostly only reference-style function documentation without exposition, manually scattered across lots of small one-page fragments.


This is in general a problem with C++ libraries, because doc generation is not standardized, unlike in Rust and other languages.


Doxygen was a great step forwards when it came out, but C++ documentation doesn’t seem to have evolved since, and it only fulfills the “reference” pillar of documentation.

Rust’s books (mdbook?) are amazing. Lots of libraries have good, clear documentation explaining how to use the library, on top of the automatic docs.rs output (which I still sometimes find difficult to navigate, but think is just my incomplete understanding). I have no idea how the community has managed to consistently achieve this.


I once tried to use a library that used boost and since I didn’t want to require my code to need the entirety of boost (which is gigabytes in size!) I tried to extract just the parts of boost that were needed. There are so many interdependencies between sublibraries of boost that after about two hours I decided I will never again use any library that relied on boost.


Boost isn't gigabytes in size, what are you talking about. All the headers are here in this 14 MB archive: https://github.com/ossia/sdk/releases/download/sdk25/boost_1... and that is enough to use 90% of the Boost libs, as they are mostly header-only


I don’t know. I’m just telling my experience. Whatever release I thought I needed at the time was huge.


That's how folly::Synchronized works: it supports both the same callback interface as MutexProtected and an interface that returns an RAII lock holder giving access to the underlying object. It is generally the preferred synchronization utility at Meta and thus is widely used.

https://github.com/facebook/folly/blob/main/folly/docs/Synch...


As someone who had to occasionally write C++ in FBCode but not enough to really feel comfortable with it, `folly::Synchronized` and its straightforward semantics were really nice.


When using locks I'm more paranoid about misusing the lock and don't mind typing a bit more to make the code obvious. Also, it seems that good auto-completion for that pattern is simple to achieve.

Alternatives to make sure you are not grabbing the wrong lock include the (much uglier) GUARDED_BY macro: http://clang.llvm.org/docs/ThreadSafetyAnalysis.html

I'd say the extra lambda is a fair price to pay


It’s a fair price to pay, but I think the suggested change is better because it makes you pay less for the same benefit. It still would be obvious that there’s a lock, but the code would IMO be simpler:

  thing->field.with([&](Field& field) {
    use(field);
  });
would become

  {
     auto field = Mutex.locker(thing.field); // or thing.field.lock();
     use(field);
  }
or

  use(Mutex.locker(thing.field));
Yes, that requires you to understand destructors exist, but if you’re programming C++, that’s a given.

I find that easier to understand (partly because the first doesn’t mention ‘lock’. I don’t think ‘with’ is the best name there)


> because it makes you pay less for the same benefit.

Requiring a lambda that captures stuff and gets passed around is not what I would call "paying less", when the alternative is just adding a block and instantiating a std::lock_guard


Fuller quote: “I think the suggested change is better because it makes you pay less for the same benefit”

The change suggested in this part of this discussion is by user “Blackthorn” in https://news.ycombinator.com/item?id=35464828. It says:

> Neat idea. Though I think having to pass a lambda for anything you want to do with the fields is awful ergonomics.

> Maybe instead combine the two ideas. MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.

User “dietr1ch” replied in https://news.ycombinator.com/item?id=35465610 that he wanted to pay with a lambda to make it clear a lock is taken.

I tried to clarify that Blackthorn's suggestion would make that clear without requiring the lambda.


Yes, that's a great way to improve the ergonomics of MutexProtected in simple cases! We should totally add that as a complement to the lambda API. :)


After sketching this out, it's nice that this:

    auto x = state.with([](auto& state) { return state.x; });
Becomes this:

    auto x = state.locked()->x;
But it also creates this very accessible footgun:

    auto& x = state.locked()->x;
Where it's way too easy to bind a reference to something that should be protected by the lock, but now isn't. So I'm not sure this is a great idea anymore.


At the end of the day, this is C++, there are no compiler tracked lifetimes and it is easy to leak references out of the lambda as well.

At least the synchronized pattern makes it easy to document which state is protected by which mutex.


> But it also creates this very accessible footgun:

Well, you said it yourself in the article, though. There are always ways around the locking; C++ doesn't really give you the ability to guarantee a field is locked when accessed. You'll need to either trust the users to some extent or use a style guide to disallow the pattern (I'd suggest only allowing `auto x = state.locked()` to avoid lifetime questions around `state.locked()->x`). You'd need compiler annotations to do any better.


Wait why is this a footgun?


Because now you're holding a reference to `x` which is supposed to be protected by a mutex, even after the mutex is unlocked.

With the lambda-only API, it's much harder to make this mistake, since a temporary reference like this will still go out of scope at the end of the lambda expression.


You specifically mentioned that this is a footgun:

> auto& x = state.locked()->x;

But I don't see how the reference here is gonna make a difference unless i am reading the lifetime of the lock here incorrectly. For example, this is perfectly fine right?

```
{
    auto& x = state.locked()->x;
}
```

This will only be a problem if you have an outside struct that holds a reference

```
int* a = nullptr;

{
    auto& x = state.locked()->x;
    a = &x; // the reference escapes the locked scope
}
```

Which can still happen even if you use a lambda.


Holding the reference to a field that is protected by a mutex implies there is another thread out there that will race with your reference in either reading or writing it.

Even just a read is racy, as there is no "atomic read" of a value of any size unless it is already wrapped in an atomic.


Yeah, the syntax/ergonomics is tortured. This is my problem with modern C++ in general. As someone who isn’t a full-time C++ programmer (but who maintains a library written in C++), I have to shy away from anything too fancy as the syntax is too baroque and hard to recall.


> Yeah, the syntax/ergonomics is tortured. This is my problem with modern C++ in general.

I disagree. I mean, the ergonomics in this case are indeed awful, but this has nothing to do with C++, modern or not. This is an API design problem, possibly made worse by people trying to be too clever for their own good.


IMO passing a lambda for synchronized code makes it much easier to read (going off of working with folly::Synchronized)


> Neat idea.

I don't agree. It sounds like a higher level path to deadlocks.

> Maybe instead combine the two ideas. MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.

It sounds like you're trying to invent std::lock_guard with extra steps.

https://en.cppreference.com/w/cpp/thread/lock_guard


> It sounds like ... std::lock_guard with extra steps.

I'm not the previous poster, but I think they are still suggesting that the value not be accessible until the lock is acquired.

Something like this (apologies, I haven't done C++ in a long time and this is just off the top of my head; it's based on the lock_guard example for the sake of comparison):

    guardthing<int> guarded_value;

    void safe_increment()
    {
        auto lock = guarded_value.lock();
        lock.value += 1;

        std::cout << "value: " << lock.value << "; in thread #"
                  << std::this_thread::get_id() << '\n';

        // the mutex in guarded_value is automatically released when lock
        // goes out of scope
    }


> It sounds like you're trying to invent std::lock_guard with extra steps.

No, because lock_guard doesn't guard fields. lock_guard is simply the RAII option they were talking about in the article.


> MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.

In fact, MutexProtected provides ::lock_exclusive() and ::lock_shared() methods which do exactly that. The article just fails to mention them. https://github.com/SerenityOS/serenity/blob/master/Kernel/Lo...


That looks even worse than a simple callback.

Does it preserve the correct ‘constness’ of the field in the case of shared mutexes?


Another feature of the MutexProtected implementation in SerenityOS is the ability to either obtain an exclusive lock (writable) or a shared lock (read-only). A shared lock provides a const reference to the protected value and therefore enforces read-only access semantics at compile-time by leveraging the C++ type system (specifically, const-correctness).

The initial PR introducing the ancestor to MutexProtected has some details on the motivation behind it and also unearthed a bunch of incorrect locking that the C++ compiler caught when introducing it: https://github.com/SerenityOS/serenity/pull/8851


I also use this pattern: https://github.com/alefore/edge/blob/master/src/concurrent/p...

I added an optional template Validator parameter that can be used to double-check invariants on the protected fields.

I also declared a Protected with condition subclass. Here is one example of how I use it to implement a "barrier" of sorts (the operation instance blocks in the destructor until execution of all callables passed to "Add" has completed): https://github.com/alefore/edge/blob/master/src/concurrent/o...

I found it slightly preferable over Abseil annotations; I think it's slightly more robust (i.e., make errors, like forgetting an annotation, less likely) and I like making this explicit in the type system.


Clang’s static thread safety analysis is pretty good for this too: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html


Yep, it's a useful pattern albeit not a game changer or anything. I've been using it in Kotlin for a while. Here's an implementation:

https://gist.github.com/mikehearn/1913202829403f65331123f047...

One of the nice things about doing it in Kotlin rather than C++ is the clean syntax using trailing lambda blocks and anonymous objects:

    private val state = Locker(object {
        var value = 1
        var another = "value"
    })


    val nextValue = state.locked { ++value }
This works because Kotlin lets you define anonymous types that can take part in type inference even though they're not denotable, and then use them as lambda receivers. So "this" inside the final block points to the unnamed singleton.

It has a few nice features:

• It's runtime overhead free. Kotlin frontend can inline the "locked" call so you end up with the equivalent of manual locking.

• There's no way to get to the state without holding the lock unless you deliberately leak it outside the lambda.

• You can control re-entrancy.

• It tells the compiler the lambda is only invoked once, which has various benefits for allowing more natural code constructs.

These days I'd probably code it with an explicit ReadWriteLock so it's Loom compatible and to allow explicit read/write blocks, but the JVM optimizes locks pretty well so if there's never any contention the memory for the lock is never allocated. I'd also experiment with making it value type so the overhead of the Locker object goes away. But it was never necessary so far.


[edit: deleted first half of my comment - it was just duplication of existing top comment]

-----

Even better, when possible (it isn't always), is to avoid using mutexes at all, and instead have the data structure owned by a relevant thread. When you need to access a data structure, pass a message to that thread, and then just access the data structure freely when you get there. (Of course the inter thread queue uses a mutex or some other synchronisation mechanism, and the queue itself effectively acts as a synchronisation mechanism.)

    struct SomeMsg {
        SomeMsg() {
            thread_foo->pass_message(std::bind_front(&SomeMsg::handle_foo, this));
        }
        void handle_foo(Foo& foo) {
            // ... use foo ...
            thread_bar->pass_message(std::bind_front(&SomeMsg::handle_bar, this));
        }
        void handle_bar(Bar& bar) {
            // ... etc. ...
        }
    };


Chrome does this, more or less, with different syntax. https://chromium.googlesource.com/chromium/src/+/main/docs/t...


You’ve just described Actors. Now take the next step and add a work-stealing scheduler for a fuller Actor system.


I’ve been toying with these lately, too. My solution was to give each actor an executor where it will run:

    actor<int> foo(stlab::default_executor, "name", 42);
    auto f = foo.send([](auto& x){
        std::cout << x << '\n'; // prints 42
    });
Each call to ‘send’ returns a stlab::future which can have a continuation chained, etc.

The nice thing about these is the executor based interface is generic. The actor could be dedicated to a thread, run on an OS-provided thread pool, etc.

At the time the actor is running, the thread it is on is temporarily given the name of the actor to ease debugging.


Almost. If each thread were an Actor (I think that's what you're saying?) then that would structure the code around each type of work e.g. here is the code for all the database operations. That's not a very useful way to group things for the developer. The way I wrote it above, the code is structured around a single activity across all the types of work it has to do.


> (Of course the inter thread queue uses a mutex or some other synchronisation mechanism, and the queue itself effectively acts as a synchronisation mechanism.)

I'd suggest using MPSC queues for this purpose (assuming the recipient is a single thread rather than a pool).


Or... you could use Clang's GUARDED_BY annotations and thread safety analysis...

https://clang.llvm.org/docs/ThreadSafetyAnalysis.html


Even if very similar, I prefer the MutexedObj class described in those two articles

https://fekir.info/post/extend-generic-thread-safe-mutexed_o...

https://fekir.info/post/sharing-data-between-threads/#_bind-...

In particular, it explains why providing getters or operator* like some other implementations do (not in this case) is not a feature, and it does not use multiple classes (like the linked SerenityOS implementation does), making the implementation simpler.

Also the "simple" implementation is just 20 lines of code...



folly::Synchronized<T> provides a similar pattern. It works well for very simplistic concurrency schemes.

  struct A { int a; int b; };

  folly::Synchronized<A> syncA;

  int c;
  {
    auto aReader = syncA.rlock();
    c = aReader->a;
    // aReader going out of scope drops the rlock
  }
  {
    auto aWriter = syncA.wlock();
    aWriter->b = 15;
    // aWriter going out of scope drops the wlock
  }
I would argue that the pseudo-pointer accessor objects are more usable than passing lambdas to a `with()` method, but if you want that, folly::Synchronized provides similar `withWLock()` and `withRLock()` methods.

And in fact, MutexProtected seems to provide APIs like Synchronized's `rlock()`/`wlock()`: `lock_shared()` and `lock_exclusive()`: https://github.com/SerenityOS/serenity/blob/master/Kernel/Lo... I don't know why the article doesn't touch on these.

https://github.com/facebook/folly/blob/main/folly/docs/Synch...

https://github.com/facebook/folly/blob/main/folly/Synchroniz...



It got added to the Concurrency TS during the C++ 23 process. It's not in standard C++ itself.


Ah, OK.


MutexProtected is almost Ada protected objects for C++:

https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...

This IS a very useful concurrency pattern.

Any language that implements concurrent access to shared data should have it or something very similar.


This is basically one of the ways to do RAII in GC-based languages: only provide the resources via lambdas, thus kind of simulating arena-like resource management; the optimizer will take care of inlining them anyway.


Another common addition to these lockable objects is having a free function variant of 'with' that allows visiting multiple objects:

    MutexProtected<X> x;
    MutexProtected<Y> y;
    ...
    with(x, y, [](X& x, Y& y) {....} );
Using the scoped_lock protocol this can avoid deadlocks.


I thought this was cool, and I really appreciate the low-level bottom-up description/introduction of how MutexProtected helps keep things safe.

I would have loved also getting a disassembly showing the resulting code, it's very hard for me (as an only-sometimes C++ programmer) to guess how the lambda is compiled.


If you want an mt-safe data structure, then may as well just build it that way in the first place, with a private std::mutex and getters/setters that use std::lock_guard.


In truth, I prefer using explicit lock and unlock operations like in C programming. The visibility of these operations aids in understanding the interdependencies within the code.


I could understand this argument against the RAII-style destructor unlock, but this lambda approach seems actually better in the visibility sense since the mutex access is a separately indented block.


It's obviously subjective, but it looks like as any other indented block. Named "lock" and "unlock" operations are more explicit to me.

Another potential disadvantage is that you lose the ability to control the unlock order when you have multiple nested locks.


An indented scope (that starts with a scoped_lock or some appropriately named function) is more explicit than lock/unlock calls interspersed in the code.

std::scoped_lock allows locking multiple locks.


> this lambda approach seems actually better in the visibility sense since the mutex access is a separately indented block.

When I'm writing multithreaded code, the first line of any block containing a RAII lock is the lock. Any subsection of code that needs another lock gets its own block.


I don't trust explicit lock and un lock operations abstractions in C, I always use inline asm to acquire and release my locks.


I don't trust asm. I write an array of machine code and execute that each time I want to lock and unlock a lock.


In C++, RAII locking/unlocking also handles exceptions thrown across the lock seamlessly. Otherwise all lock work would have to be try/catch wrapped and handled correctly.


> Otherwise all lock work would have to be try/catch wrapped and handled correctly.

Alternatively, you could ensure your codebase does not use exceptions. (Which is common for "C-style" programmers like GP.)


Also the C construct is easy to understand. Whereas the template one is yet another challenge to test your knowledge of C++.


> A C++ Pattern for Easier Concurrency

Shouldn't downplay sharing the idea, but come on: realizing Python-like with-contexts (and no, even Python isn't the inventor of that) in C++ with lambdas is a general pattern (think of scenarios like DB transactions, or cases where the RAII pattern would be useful except that you need a result from the destructor, or it could even fail) that shouldn't be too new. Since C++11, when lambdas were introduced, to be precise?


"Let's completely violate encapsulation by operating on Thing with random lambdas instead of Thing methods---oh but look how nicely the lambdas are forced to run under Thing's lock!"

> However, you can’t access the T directly! The only way we’ll let you access the T is by calling with() and passing it a callback that takes a T& parameter.

OK, but then, given this:

  thing->field.with([&](Field& field) {
    use(field);
  });
what prevents us from just replacing it with:

  use(thing->field);
and, secondly, how does use() actually use the field? If access is not allowed without using with(), then use() has to use with() too, and another callback, leading to infinite regress.


How does this compare with:

* Using std::atomic<T> ? https://en.cppreference.com/w/cpp/atomic/atomic

* folly::Synchronized, mentioned by ot57? https://github.com/facebook/folly/blob/main/folly/docs/Synch...

* Boost synchronized data structures, mentioned by mchicken? https://www.boost.org/doc/libs/1_81_0/doc/html/thread/sds.ht...

?

... it looks like the SerenityOS people developed this without considering the C++ ecosystem of today.


My understanding is that Serenity has a philosophy of not using dependencies outside of itself. Perhaps this type drew from external inspiration, but was ultimately written in light of that philosophy?


Well, std::atomic is wholly unrelated. MutexProtected seems fundamentally similar to boost and folly synchronized objects, just without the ergonomics and battle-testing the other major libraries have already gotten.


std::atomic really has different use cases. It is meant to be used for objects that can be modified via atomic RMW primitives and require T to be trivially copyable. While you can in principle implement arbitrary operations on top of those RMWs, it might not be the best fit.

For large Ts, it is indeed implemented using a mutex or, more likely, a lock pool, so I guess if you squint hard enough it can be considered related.


Nice pattern, but it only solves the problem of using a mutex correctly, which is almost never the hard part. Deadlocks are far nastier to handle.

In order to solve deadlocks, I created a mutex that first does a try lock, then if that fails it unlocks all mutexes of the thread and then relocks them in memory order.

Not very efficient sometimes, especially if there are many mutexes involved (i.e., maybe it doesn't scale up), but it allows my algorithms not to deadlock.

Perhaps it could be made more efficient with various tricks, I may have to do research on that idea...


I am working on a left-right concurrency control with multiple threads. You shard your data and define a merge operation that is commutative so you can always merge data of different threads. By sharding data by thread each thread can always read and write to its own shard without synchronization or contention.

I also have a global snapshot per thread which is the aggregated data across all threads that is local to a thread.

So it requires 4× the memory per thread, but it is extremely performant because each thread is independent of every other thread.


The "Barebones example in C" is enshrined in Java at the language level as every object has a built-in lock and condition variable, and locking is done through the synchronized keyword. https://en.wikipedia.org/wiki/Monitor_(synchronization)#Impl...


Or you could give `MutexLocker` pointer semantics to access the underlying object only through the lock, which would probably be more idiomatic C++ anyway.


Lifetimes might be a bit trickier, though.


Indeed, with lambdas you don't need to deal with locks outliving the underlying object (though I think std::unique_lock has the same issue with the underlying mutex?)

The downside with lambdas is that you cannot lock across function boundaries e.g. give ownership of a lock to the caller.

There's a similar tradeoff for all "context managers" (in Python parlance) where you could either CPS your way around the control flow (the lambda approach) or use c++'s RAII mechanism.

My impression is that RAII feels a bit more idiomatic and puts the burden on library writers instead of users.


Individual accesses are typically the wrong level of abstraction for synchronization. You’ll avoid simple data races but it’s easy to have accidental reentrancy this way.


We use something quite similar to this. It won't save you from accidental recursive locking, though, and I'm not sure any solution will.


If the problem is really just recursive lock calls on the same thread, then PTHREAD_MUTEX_RECURSIVE should do the trick?


Generally PTHREAD_MUTEX_RECURSIVE is a band-aid, not a solution.


Could you explain? A simple drive-by condemnation doesn't tell us why, or give us any reason to agree with you.


They mean you don't know and can't reason about lock state. To design concurrent systems with good performance and correctness, you need to be able to reason about lock state statically.

Here's some random article I found in 30 seconds of googling that seems to go into more depth on the subject: https://blog.stephencleary.com/2013/04/recursive-re-entrant-...


But you can reason about lock state statically with recursive locks.

I have a function A, that accesses and modifies a data structure X. I have another function, B, that modifies X, but that also calls A to do part of its work. And I have a function C that calls A, but never accesses X except through A.

I can perfectly well statically reason about a recursive mutex that both A and B take to protect access to X. It's not magic.


I mean it exists for a reason, there are some rare cases where it might be the right tradeoff to use it. However, these cases are pretty rare: what’s far more likely is that the critical region is not considered well and the recursive mutex is a patch for “I don’t really know what’s going on here, let’s just use the thing that works”. You can form your own opinion on this but I actually find it common for people to avoid designs like yours specifically to avoid the recursive lock; usually the alternative is about as ergonomic and it reduces the need to reason about recursive locking.



If you find yourself needing recursive locks your design is wrong.

They were invented as a dare.


That's not any easier than RAII, although I wish C++ had the lock syntax from C#

lock (object) { // do stuff }


That's commonly done in C++ with a macro, but the lock(object) syntax doesn't prevent doing stuff with object without locking the mutex, while the synchronized pattern does.


How does it handle exceptions?


Internally it uses a MutexLocker which will get destructed as an exception unwinds the stack. The destructor unlocks the mutex.

https://github.com/SerenityOS/serenity/blob/master/Kernel/Lo...

https://github.com/SerenityOS/serenity/blob/master/Kernel/Lo...


maybe add SerenityOS to the title? Almost sure that lock_guard is the idiomatic way to do this in C++; not buying the "forget to lock".


It’s a general C++ trick.


FWIW, I've encountered "forget to lock" in production C++.


If you want easier concurrency, don't use mutexes, they're a bad pattern.

It's well-established that the safe way to do concurrency is to use channels.


A mutex allows you to move a potentially unbounded amount of memory between threads in one locked, constant-time operation.

Channels imply immutable data structures so you're copying data into buffers and then synchronizing to send it.

BEAM actually copies the data between the heaps of its processes (lightweight threads).

I don't think it's always the case that channels are superior to mutexes in every case.

I personally try to write and understand lock-free algorithms, and I've tried avoiding mutexes, but I'm not using channels; I use things similar to left-right concurrency control.


I'm not going to argue for mutex vs channel, but channels don't require immutable data and copying. You can forward unique_ptrs through channels and pass ownership for example, or you can forward async operations to perform remotely instead.


You can hand off data with channels without any copy.

That you think mutexes work in constant time is funny. How long it takes to enter a mutual exclusion section is entirely non-deterministic and unbounded.


Once the mutex is acquired, you can move any amount of data with a pointer. This is what I meant.


While the mutex is held, everybody else that is in contention is waiting, possibly forever.

It's a terrible paradigm not suitable for any serious systems programming, especially if you have any sort of real-time requirements.


mutexes are a primitive that guarantee access to a resource is synchronized, and only one actor is accessing it at a time

this necessarily requires actors in contention to wait, possibly forever

it is a fundamental paradigm that is used extensively throughout every kind of programming, even in real-time systems


It isn't used in real-time systems no, it's fundamentally incompatible.

Check out wait-free algorithms.


synchronization requires serialized access, regardless of the nature of the consumer system (real-time or not)

wait-free algorithms might use spin-locks instead of typical mutexes, but spin-locks aren't inherently better or faster than mutexes; they just replace syscall-based waiting with busy-waiting


Channels are a good abstraction for message passing and high-level requests in general, which happen a lot in web services for example. But there are other domains (like kernels) and there is not one way to do concurrency. If you need low-latency and small-scope data structure access, you can't afford to synchronize with another worker and have the request processed in a different context. Fine-grained locking (or potentially lock-free concurrency) is much faster here.


That can work well for many programs, but no one has succeeded in writing a multithreaded kernel that way.


I think that DragonflyBSD makes good use of message passing. But sometimes a mutex in the slow path is indeed the easiest solution.


There are no mutexes in any of the major kernels. Mutexes are by definition a userland construct.


> There are no mutexes in any of the major kernels

https://docs.kernel.org/locking/mutex-design.html#when-to-us...

> Unless [specific conditions] ... always prefer [mutexes] to any other locking primitive.


I wouldn't call those mutexes myself, but fair enough, I suppose if linux uses this terminology, then that's what people would call it.

But this sort of pattern isn't valid everywhere in a kernel. For example, code handling interrupts wouldn't be able to use it.

Also, for performance reasons you probably don't want to use that for synchronizing anything performance-critical, where you'd try to use something like RCU instead. Probably mostly convenient for initial setup.


What's the problem with mutexes?


mutexes are inherently exclusive: they're a point where all of your threads have to serialize and run one at a time to perform a certain operation.

additionally, common implementations often have bad behavior as the number of threads increases. spinlocks don't work well if you have 32 threads trying to take the same lock, they can consume quite a lot of CPU time just trying to take the lock. they don't work at all if you have thousands of threads (which is a situation that occurs regularly in GPGPU programming!).

you can of course increase the number of locks but that's basically what SQL/RDBMS does with row locks - and now you have the problem of deadlock too.

obviously it all depends on specifics - how many threads doing how much work on their own, compared to how many locks they're trying to take. it's not that they are inherently bad, they are in fact one of the basic primitives in concurrent programming really. they just don't really scale well with increased concurrency.

unfortunately, the deeper solutions involve restructuring your program and your data processing (or the data itself) to expose higher levels of concurrency and less interdependency, which is its own can of worms.


If you have to ask, I suppose you shouldn't touch anything that involves concurrent or parallel execution.


channels almost always use mutexes internally


A pattern for easier slower concurrency, as lambdas aren't free. If the code under the mutex is trivial, the cost of using lambdas will be noticeable, especially on fast paths. Might be worth looking at the resulting assembly first if you are looking to adopt this in your codebase.


Lambdas are free. Unless the class type-erases the callback (and MutexProtected doesn't) or the callback is huge, the compiler is virtually guaranteed to inline it.


It can go both ways.

Clang will inline everything - https://godbolt.org/z/v19P1W9sj

MSVC will not, even with O2 - https://godbolt.org/z/naovTxncs

> Might be worth looking at the resulting assembly first if you are looking to adapt this in your codebase.


AFAIK if you use a templated type for the callback (as you should), it will be inlined by modern compilers at any optimization level other than -O0. :)


Aside from lambdas being cheap abstractions in C++, any code under a mutex is likely not the fast path anyway.


I mean, some code is all slow paths ;)



