This is how Rust’s mutexes work as well - albeit using RAII rather than a callback, with the same fundamental innovation of putting the data inside the mutex object. I’ve used it in my C++ code as well; it’s a great pattern.
Crucially, the Rust borrow checker means using this wrongly from (safe) Rust is impossible, whereas it's relatively easy to use either C++ approach wrongly, especially if nobody has actually sat down and explained why you're doing it. In C++ we can "just" leak a reference to the protected data; in (safe) Rust, of course, the borrow checker means that programs with this mistake won't build.
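For example, an attempt to stash a reference to the protected data past the guard's lifetime is rejected at compile time (a minimal sketch; the comment paraphrases the compiler error):

```
use std::sync::Mutex;

fn main() {
    let m = Mutex::new(vec![1, 2, 3]);
    let leaked: &Vec<i32>;
    {
        let guard = m.lock().unwrap();
        leaked = &*guard; // borrow of the protected data...
    } // ...but `guard` is dropped (and the mutex unlocked) here
    println!("{:?}", leaked); // ERROR: `guard` does not live long enough
}
```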
Rust's ZSTs (zero-sized types; unit-like types with zero storage size) make it reasonable to write Mutex<MyMarker> and then require that people show you their MyMarker (which they can only get by holding the lock) to do stuff, thus enforcing locking even when it's not really data you're protecting. Because these are zero-sized, the resulting machine code is unchanged, but the type checking means that if you forget to lock first your program won't even build, which shifts locking mistakes very hard left.
Can you give an example of using a ZST as a marker in a Mutex? I think I understand that you're suggesting this as a way of locking some section of code even if it's not a specific piece of data that you're locking, but I'm wondering how this could "enforce locking" then, since anything you do while locked could just as easily be accidentally done without the lock, right?
The idea is we wrap code that ought to only run with a lock as a function, and we define the function such that one of its required arguments is a reference to the Marker type. When somebody tries to write code to call these functions, they're reminded by its signature that they'll need the Marker type, and the only way to get it is to lock some Mutex provided for the purpose.
You're correct that if they're able to do all the stuff which needs a lock themselves without calling these functions, they can hurt themselves, and if that's easy enough they might do it by accident. However, it's not often that we've got something that's dangerous enough to need protecting this way and yet also so easy that people would rather re-implement it wrongly than read how to take the lock.
Rust's POSIX stdio features are protected this way, and obviously on some level you could bypass all that and unsafely cause write syscalls with the right file descriptor number, but like, I'd guess it'd take me an hour to write all that, whereas println!() and co. are right there in front of me, so...
You would have an `fn do_something_that_requires_held_lock<'a>(marker: ZstMarker<'a>)`. To call the function, you need a value of type `ZstMarker<'a>`, and you would structure your API such that the only way of getting that value is by locking the mutex.
Crucially, the lifetime parameter `'a` ensures that the marker can't outlive the lock guard, i.e. it can't be used after the mutex is unlocked.
One example of this pattern is the `Python<'py>` marker in the `pyo3` library (https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html), which represents holding the Python GIL. The internals of `pyo3` do lots of `unsafe` FFI with the Python runtime, but the API exposed by the library is safe thanks to this pattern.
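Putting this subthread together, a minimal sketch of the pattern (the `HardwareToken` name and the hardware scenario are invented for illustration):

```
use std::sync::Mutex;

// A zero-sized marker type. In real code its constructor would be
// private to this module, so the only way for outsiders to get a
// `&HardwareToken` is through the mutex below.
pub struct HardwareToken(());

pub static HARDWARE_LOCK: Mutex<HardwareToken> = Mutex::new(HardwareToken(()));

// The signature demands proof that the lock is held.
pub fn poke_hardware(_proof: &HardwareToken) {
    // ... the section that must only run under the lock ...
}

fn main() {
    let guard = HARDWARE_LOCK.lock().unwrap();
    poke_hardware(&guard); // the guard deref-coerces to &HardwareToken
    // `guard` is dropped (and the mutex unlocked) here; the borrow
    // checker prevents any `&HardwareToken` from escaping this scope.
}
```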
Neat idea. Though I think having to pass a lambda for anything you want to do with the fields is awful ergonomics.
Maybe instead combine the two ideas: MutexProtected<T>, but it can be locked with MutexProtected<T>::lock(), which returns a MutexLocked<T> object. That object then releases the lock when it goes out of scope, and also provides direct access to the enclosed type.
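For reference, this proposed API is essentially what Rust's std::sync::Mutex already does: lock() returns a MutexGuard that unlocks on drop and derefs to the protected data. A small illustration:

```
use std::sync::Mutex;

struct State { x: i32 }

fn main() {
    let state = Mutex::new(State { x: 41 });
    {
        // `lock()` returns a `MutexGuard<State>`: direct field access
        // through `Deref`, unlock on drop.
        let mut guard = state.lock().unwrap();
        guard.x += 1;
    } // unlocked here
    assert_eq!(state.lock().unwrap().x, 42);
}
```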
No, but Boost is one giant bear of a dependency with lots of duplication with the STL. Personally I prefer to use a tiny includable file/lib for the specific functionality I need rather than keep that bear on my real estate.
Boost is not monolithic, nor are any of the sub-parts especially huge (like ... asio is big because an asynchrony system is a big project; regex is small because it can be). Nor, as other commenters have pointed out, is including those dependencies particularly onerous; many are single-file or just a handful of files.
I suspect your criticism has more to do with "one giant bear of dependency". Which: fair enough. Getting Boost set up is a pain; the pain does not reduce if you only want to use one sub-part of it, because you still have to use b2, their custom build language and associated cmake-ish system (yuck!) to prepare it (there are ways to avoid this, but they are not well-supported). Even CMake's (pretty good!) Boost discovery/inclusion support can only conceal that weirdness to a point, and often breaks in unexpected environments--well, breaks more often than the average CMake setup, which breaks a lot...
Similarly, getting your IDE to understand Boost's presence can be tricky, since many IDEs (looking at you, VS) have overly opinionated ways of incorporating it that don't jibe well with how a CLI/portable project build system pulls it in.
But the first and second paragraphs aren't the same thing. Initial setup pains are real and they suck, but initial setup is one thing, and the overhead/monolithic-ness of actually using Boost in your projects once it's set up is another. Initial setup pain is largely a one-time or infrequent cost, and the infrequency of that cost should be weighed against the not-inconsiderable convenience of Boost as a tool.
Not saying it's universally worth it; for some projects (especially ones that need to be built on a wide variety of environments, though this is rarer than most people, even some project authors, think) it's not appropriate. But many of the standard criticisms of Boost's runtime utility are specious.
I'll point out that if you're using header-only portions of Boost, you don't have to build Boost. Lots (most?) of Boost is header-only. You can just download the tarball and include headers out of it. It's barely even a dependency. Of all the deps my C++ app has, Boost is the easiest; I just download and extract and it's done, no build step. I don't need the actual compiled library; only the headers.
Using only the header-only stuff is super easy; there's no deciphering or extracting at all. You just include the headers and _don't_ link the Boost library and see if it builds. If it doesn't, then I guess that wasn't a header-only library. You're not building or installing Boost at all; you're just downloading a copy of the headers. Only the template instantiations you actually use will end up in your binary.
(This sounds flippant but it's literally how I use boost in real life. Why decipher when the compiler can just tell you?)
Not to mention that the documentation is generally utterly awful: mostly just reference-style function documentation without exposition, manually distributed across lots of small one-page sections.
Doxygen was a great step forwards when it came out, but C++ documentation doesn’t seem to have evolved since, and it only fulfills the “reference” pillar of documentation.
Rust’s books (mdbook?) are amazing. Lots of libraries have good, clear documentation explaining how to use the library, on top of the automatic docs.rs output (which I still sometimes find difficult to navigate, but I think that's just my incomplete understanding). I have no idea how the community has managed to consistently achieve this.
I once tried to use a library that used boost and since I didn’t want to require my code to need the entirety of boost (which is gigabytes in size!) I tried to extract just the parts of boost that were needed. There are so many interdependencies between sublibraries of boost that after about two hours I decided I will never again use any library that relied on boost.
That's how folly::Synchronized works: it supports both the callback interface of MutexProtected and an interface that returns a RAII lock holder that gives access to the underlying object. It is generally the preferred synchronization utility at Meta and is thus widely used.
As someone who had to occasionally write C++ in FBCode but not enough to really feel comfortable with it, `folly::Synchronized` and its straightforward semantics were really nice.
When using locks I'm more paranoid about misusing the lock and don't mind typing a bit more to make the code obvious. Also, it seems that good auto-completion for that pattern is simple to achieve.
It’s a fair price to pay, but I think the suggested change is better because it makes you pay less for the same benefit. It still would be obvious that there’s a lock, but the code would IMO be simpler.
> because it makes you pay less for the same benefit.
Requiring a lambda that captures stuff and gets passed around is not what I would call "paying less", when the alternative is just adding a block and instantiating a std::lock_guard.
> Neat idea. Though I think having to pass a lambda for anything you want to do with the fields is awful ergonomics.
> Maybe instead combine the two ideas. MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.
auto x = state.with([](auto& state) { return state.x; });
Becomes this:
auto x = state.locked()->x;
But it also creates this very accessible footgun:
auto& x = state.locked()->x;
Where it's way too easy to bind a reference to something that should be protected by the lock, but now isn't. So I'm not sure this is a great idea anymore.
> But it also creates this very accessible footgun:
Well, you said it yourself in the article, though. There are always ways around the locking; C++ doesn't really give you the ability to guarantee a field is locked when accessed. You'll need to either trust the users to some extent or use a style guide to disallow the pattern (I'd suggest only allowing auto x = state.locked() to avoid lifetime questions around state.locked()->x). You'd need compiler annotations to do any better.
Because now you're holding a reference to `x` which is supposed to be protected by a mutex, even after the mutex is unlocked.
With the lambda-only API, it's much harder to make this mistake, since a temporary reference like this will still go out of scope at the end of the lambda expression.
You specifically mentioned that this is a footgun:
> auto& x = state.locked()->x;
But I don't see how the reference here is gonna make a difference, unless I am reading the lifetime of the lock here incorrectly. For example, this is perfectly fine, right?
```
{
    auto& x = state.locked()->x;
}
```
This will only be a problem if you have an outside struct that holds a reference.
Holding a reference to a field that is protected by a mutex implies there is another thread out there that will race with your reference in either reading or writing it.
Even just a read is racy, as there is no "atomic read" of a value of arbitrary size unless it is already wrapped as atomic.
Yeah, the syntax/ergonomics is tortured. This is my problem with modern C++ in general. As someone who isn’t a full-time C++ programmer (but who maintains a library written in C++), I have to shy away from anything too fancy, as the syntax is too baroque and hard to recall.
> Yeah, the syntax/ergonomics is tortured. This is my problem with modern C++ in general.
I disagree. I mean, the ergonomics in this case are indeed awful, but this has nothing to do with C++, modern or not. This is an API design problem, possibly made worse by people trying to be too clever for their own good.
I don't agree. It sounds like a higher-level path to deadlocks.
> Maybe instead combine the two ideas. MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.
It sounds like you're trying to invent std::lock_guard with extra steps.
> It sounds like ... std::lock_guard with extra steps.
I'm not the previous poster, but I think they are still suggesting that the value not be accessible until the lock is acquired.
Something like this
(Apologies, I haven't done C++ in a long time and this is just off the top of my head. It's based on the standard lock_guard example for the sake of comparison):
guardthing<int> guarded_value;

void safe_increment()
{
    auto lock = guarded_value.lock();
    lock.value += 1;
    std::cout << "value: " << lock.value << "; in thread #"
              << std::this_thread::get_id() << '\n';
    // the mutex in guarded_value is automatically released when lock
    // goes out of scope
}
> MutexProtected<T> but can be locked with MutexProtected<T>::lock() which returns a MutexLocked<T> object. That object then cleans up the lock when it goes out of scope, and also provides direct access to the enclosed type.
Another feature of the MutexProtected implementation in SerenityOS is the ability to either obtain an exclusive lock (writable) or a shared lock (read-only). A shared lock provides a const reference to the protected value and therefore enforces read-only access semantics at compile-time by leveraging the C++ type system (specifically, const-correctness).
The initial PR introducing the ancestor to MutexProtected has some details on the motivation behind it and also unearthed a bunch of incorrect locking that the C++ compiler caught when introducing it: https://github.com/SerenityOS/serenity/pull/8851
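For comparison, Rust's std::sync::RwLock enforces the same read-only/exclusive split through its guard types, so mutation under a shared lock is a compile-time error:

```
use std::sync::RwLock;

fn main() {
    let data = RwLock::new(vec![1, 2, 3]);

    {
        let r = data.read().unwrap(); // shared lock: derefs to &Vec<i32>
        println!("len = {}", r.len());
        // r.push(4); // compile error: cannot borrow as mutable
    }

    let mut w = data.write().unwrap(); // exclusive lock: &mut Vec<i32>
    w.push(4);
}
```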
I added an optional template Validator parameter that can be used to double-check invariants on the protected fields.
I also declared a Protected with condition subclass. Here is one example of how I use it to implement a "barrier" of sorts (the operation instance blocks in the destructor until execution of all callables passed to "Add" has completed): https://github.com/alefore/edge/blob/master/src/concurrent/o...
I found it slightly preferable to Abseil annotations; I think it's slightly more robust (i.e., it makes errors, like forgetting an annotation, less likely), and I like making this explicit in the type system.
One of the nice things about doing it in Kotlin rather than C++ is the clean syntax using trailing lambda blocks and anonymous objects:
private val state = Locker(object {
    var value = 1
    var another = "value"
})

val nextValue = state.locked { ++value }
This works because Kotlin lets you define anonymous types that can take part in type inference even though they're not denotable, and then use them as lambda receivers. So "this" inside the final block points to the unnamed singleton.
It has a few nice features:
• It's free of runtime overhead: the Kotlin frontend can inline the "locked" call, so you end up with the equivalent of manual locking.
• There's no way to get to the state without holding the lock unless you deliberately leak it outside the lambda.
• You can control re-entrancy.
• It tells the compiler the lambda is only invoked once, which has various benefits for allowing more natural code constructs.
These days I'd probably code it with an explicit ReadWriteLock so it's Loom-compatible and to allow explicit read/write blocks, but the JVM optimizes locks pretty well, so if there's never any contention the memory for the lock is never allocated. I'd also experiment with making it a value type so the overhead of the Locker object goes away. But it has never been necessary so far.
Even better, when possible (it isn't always), is to avoid using mutexes at all, and instead have the data structure owned by a relevant thread. When you need to access a data structure, pass a message to that thread, and then just access the data structure freely when you get there. (Of course the inter thread queue uses a mutex or some other synchronisation mechanism, and the queue itself effectively acts as a synchronisation mechanism.)
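A minimal sketch of this style in Rust, using an mpsc channel as the inter-thread queue and boxed closures as the messages (all names invented):

```
use std::sync::mpsc;
use std::thread;

// Messages are operations on the thread-owned data structure.
type Job = Box<dyn FnOnce(&mut Vec<i32>) + Send>;

fn main() {
    let (tx, rx) = mpsc::channel::<Job>();

    // Owner thread: the only place the Vec is ever touched, so no mutex.
    let owner = thread::spawn(move || {
        let mut data: Vec<i32> = Vec::new();
        for job in rx {            // the queue is the synchronization point
            job(&mut data);
        }
        println!("final: {:?}", data);
    });

    tx.send(Box::new(|d: &mut Vec<i32>| d.push(1))).unwrap();
    tx.send(Box::new(|d: &mut Vec<i32>| d.push(2))).unwrap();
    drop(tx);                      // closing the channel ends the loop
    owner.join().unwrap();
}
```

The send() calls return immediately; if callers need results back, the message can carry a reply channel.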
I’ve been toying with these lately, too. My solution was to give each actor an executor where it will run:
actor<int> foo(stlab::default_executor, "name", 42);

auto f = foo.send([](auto& x){
    std::cout << x << '\n'; // prints 42
});

Each call to `send` returns a stlab::future which can have a continuation chained, etc.
The nice thing about these is the executor based interface is generic. The actor could be dedicated to a thread, run on an OS-provided thread pool, etc.
At the time the actor is running, the thread it is on is temporarily given the name of the actor to ease debugging.
Almost. If each thread were an actor (I think that's what you're saying?), then that would structure the code around each type of work, e.g. here is the code for all the database operations. That's not a very useful way to group things for the developer. The way I wrote it above, the code is structured around a single activity, across all the types of work it has to do.
> (Of course the inter thread queue uses a mutex or some other synchronisation mechanism, and the queue itself effectively acts as a synchronisation mechanism.)
I'd suggest using MPSC queues for this purpose (assuming the recipient is a single thread rather than a pool).
In particular, it explains why providing getters or operator* like some other implementations do (not in this case) is not a feature, and it does not use multiple classes (like the linked SerenityOS implementation does), making the implementation simpler.
Also the "simple" implementation is just 20 lines of code...
folly::Synchronized<T> provides a similar pattern. It works well for very simplistic concurrency schemes.
struct A { int a; int b; };
folly::Synchronized<A> syncA;

int c;
{
    auto aReader = syncA.rlock();
    c = aReader->a;
    // aReader going out of scope drops the rlock
}

{
    auto aWriter = syncA.wlock();
    aWriter->b = 15;
    // aWriter going out of scope drops the wlock
}
I would argue that the pseudo-pointer accessor objects are more usable than passing lambdas to a `with()` method, but if you want that, folly::Synchronized provides similar `withWLock()` and `withRLock()` methods.
This is basically one of the ways to do RAII in GC-based languages: only provide the resources via lambdas, thus kind of simulating arena-like resource management. The optimizer will take care of inlining them anyway.
I thought this was cool, and I really appreciate the low-level bottom-up description/introduction of how MutexProtected helps keep things safe.
I would have loved also getting a disassembly showing the resulting code, it's very hard for me (as an only-sometimes C++ programmer) to guess how the lambda is compiled.
If you want an mt-safe data structure, then may as well just build it that way in the first place, with a private std::mutex and getters/setters that use std::lock_guard.
In truth, I prefer using explicit lock and unlock operations like in C programming. The visibility of these operations aids in understanding the interdependencies within the code.
I could understand this argument against the RAII-style destructor unlock, but this lambda approach seems actually better in the visibility sense since the mutex access is a separately indented block.
An indented scope (that starts with a scoped_lock or some appropriately named function) is more explicit than lock/unlock calls interspersed in the code.
> this lambda approach seems actually better in the visibility sense since the mutex access is a separately indented block.
When I'm writing multithreaded code, the first line of any block containing a RAII lock is the lock. Any subsection of code that needs another lock gets its own block.
In C++, RAII locking/unlocking also handles exceptions thrown across the lock seamlessly. Otherwise all lock work would have to be try/catch wrapped and handled correctly.
Shouldn't downplay sharing that idea, but come on: realizing Python-like with-contexts (and yes, even Python isn't the inventor of that...) in C++ with lambdas is a general pattern (think of scenarios like DB transactions, or stuff where the RAII pattern would be useful except that you need a result from the destructor, or it could even fail...) that shouldn't be too new... since C++11, when lambdas were introduced, to be precise?
"Let's completely violate encapsulation by operating on Thing with random lambdas instead of Thing methods---oh but look how nicely the lambdas are forced to run under Thing's lock!"
> However, you can’t access the T directly! The only way we’ll let you access the T is by calling with() and passing it a callback that takes a T& parameter.
And, secondly, how does use() actually use the field? If access is not allowed without using with(), then use() has to use with() as well, with another callback, leading to infinite regress.
My understanding is that Serenity has a philosophy of not using dependencies outside of itself. Perhaps this type drew from external inspiration, but was ultimately written in light of that philosophy?
Well, std::atomic is wholly unrelated. MutexProtected seems fundamentally similar to boost and folly synchronized objects, just without the ergonomics and battle-testing the other major libraries have already gotten.
std::atomic really has different use cases. It is meant for objects that can be modified via atomic RMW primitives, and it requires T to be trivially copyable. While you can in principle implement arbitrary operations on top of those RMWs, it might not be the best fit.
For large Ts, it is indeed implemented using a mutex or, more likely, a lock pool, so I guess if you squint hard enough it can be considered related.
Nice pattern, but it only solves the problem of having to use a mutex, which is almost never the problem. Deadlocks are way nastier to handle.
In order to solve deadlocks, I created a mutex that first does a try-lock; if that fails, it unlocks all mutexes held by the thread and then relocks them in memory-address order.
Not very efficient sometimes, especially if there are many mutexes involved (i.e. maybe it does not scale up), but it keeps my algorithms from deadlocking.
Perhaps it could be made more efficient with various tricks, I may have to do research on that idea...
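A sketch of the ordering half of that idea in Rust (using address order as the global lock order; the try-lock/back-off machinery is omitted):

```
use std::sync::{Mutex, MutexGuard};

// Acquire two mutexes in a globally consistent (address) order so no
// two threads ever lock them in opposite orders.
fn lock_both<'a, T>(
    a: &'a Mutex<T>,
    b: &'a Mutex<T>,
) -> (MutexGuard<'a, T>, MutexGuard<'a, T>) {
    assert!(!std::ptr::eq(a, b), "need two distinct mutexes");
    if (a as *const Mutex<T>) < (b as *const Mutex<T>) {
        let ga = a.lock().unwrap();
        let gb = b.lock().unwrap();
        (ga, gb)
    } else {
        let gb = b.lock().unwrap();
        let ga = a.lock().unwrap();
        (ga, gb)
    }
}

fn main() {
    let (m1, m2) = (Mutex::new(1), Mutex::new(2));
    let (g1, g2) = lock_both(&m1, &m2);
    println!("{} {}", *g1, *g2);
}
```

In C++, std::lock/std::scoped_lock address the same problem with a built-in deadlock-avoidance algorithm.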
I am working on a left-right concurrency control with multiple threads. You shard your data and define a merge operation that is commutative, so you can always merge data from different threads. By sharding data by thread, each thread can always read and write to its own shard without synchronization or contention.
I also have a global snapshot per thread which is the aggregated data across all threads that is local to a thread.
So it requires 4× the memory per thread, but it is extremely performant because each thread is independent of every other thread.
Or you could give `MutexLocker` pointer semantics so the underlying object is accessed only through the lock, which would probably be more idiomatic C++ anyway.
Indeed, with lambdas you don't need to deal with locks outliving the underlying object (though I think std::unique_lock has the same issue with the underlying mutex?)
The downside with lambdas is that you cannot lock across function boundaries e.g. give ownership of a lock to the caller.
There's a similar tradeoff for all "context managers" (in Python parlance), where you can either CPS your way around the control flow (the lambda approach) or use C++'s RAII mechanism.
My impression is that RAII feels a bit more idiomatic and puts the burden on library writers instead of users.
Individual accesses are typically the wrong level of abstraction for synchronization. You’ll avoid simple data races but it’s easy to have accidental reentrancy this way.
They mean you don't know and can't reason about lock state. To design concurrent systems with good performance and correctness, you need to be able to reason about lock state statically.
But you can reason about lock state statically with recursive locks.
I have a function A, that accesses and modifies a data structure X. I have another function, B, that modifies X, but that also calls A to do part of its work. And I have a function C that calls A, but never accesses X except through A.
I can perfectly well statically reason about a recursive mutex that both A and B take to protect access to X. It's not magic.
I mean it exists for a reason, there are some rare cases where it might be the right tradeoff to use it. However, these cases are pretty rare: what’s far more likely is that the critical region is not considered well and the recursive mutex is a patch for “I don’t really know what’s going on here, let’s just use the thing that works”. You can form your own opinion on this but I actually find it common for people to avoid designs like yours specifically to avoid the recursive lock; usually the alternative is about as ergonomic and it reduces the need to reason about recursive locking.
That's commonly done in C++ with a macro, but the lock(object) syntax doesn't prevent doing stuff with object without locking the mutex, while the synchronized pattern does.
A mutex allows you to move a potentially unbounded amount of memory between threads in one constant-time locked operation.
Channels imply immutable data structures, so you're copying data into buffers and then synchronizing to send it.
BEAM actually copies the data between the heaps of processes (lightweight threads).
I don't think channels are superior to mutexes in every case.
I personally try to write and understand lock-free algorithms, and I've tried avoiding mutexes, but I'm not using channels; rather, things similar to left-right concurrency control.
I'm not going to argue for mutex vs channel, but channels don't require immutable data and copying. You can forward unique_ptrs through channels and pass ownership for example, or you can forward async operations to perform remotely instead.
You can hand off data with channels without any copy.
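For example, in Rust sending an owned buffer through a channel moves only the pointer-sized handle; the payload is never copied (a minimal sketch):

```
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel::<Box<[u8]>>();

    let consumer = thread::spawn(move || {
        // Ownership of the buffer arrives here; only the small
        // handle crossed the channel, never the megabyte of payload.
        let buf = rx.recv().unwrap();
        println!("received {} bytes, zero copies", buf.len());
    });

    let buf = vec![0u8; 1 << 20].into_boxed_slice();
    tx.send(buf).unwrap(); // moves ownership into the channel
    consumer.join().unwrap();
}
```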
That you think mutexes work in constant time is funny. How long it takes to enter a mutual exclusion section is entirely non-deterministic and unbounded.
synchronization requires serialized access, regardless of the nature of the consumer system (real-time or not)
wait-free algorithms might use spin-locks instead of typical mutexes, but spin-locks aren't, like, better or faster than mutexes; they just replace syscall/IO waits with hot waits
Channels are a good abstraction for message passing and high-level requests in general, which happen a lot in web services for example. But there are other domains (like kernels) and there is not one way to do concurrency. If you need low-latency and small-scope data structure access, you can't afford to synchronize with another worker and have the request processed in a different context. Fine-grained locking (or potentially lock-free concurrency) is much faster here.
I wouldn't call those mutexes myself, but fair enough, I suppose if linux uses this terminology, then that's what people would call it.
But this sort of pattern isn't valid everywhere in a kernel. For example, things handling interrupts etc. wouldn't be able to use it.
Also, for performance reasons you probably don't want to use that for synchronizing anything performance-critical, where you'd try to use something like RCU instead. It's probably mostly convenient for initial setup.
Mutexes are inherently single-threaded code; they're exclusive. A mutex is a point where all of your threads have to serialize and run one at a time to perform a certain operation.
Additionally, common implementations often have bad behavior as the number of threads increases. Spinlocks don't work well if you have 32 threads trying to take the same lock; they can consume quite a lot of CPU time just trying to take the lock. They don't work at all if you have thousands of threads (which is a situation that occurs regularly in GPGPU programming!).
You can of course increase the number of locks, but that's basically what SQL/RDBMSes do with row locks - and now you have the problem of deadlock too.
Obviously it all depends on specifics - how many threads doing how much work on their own, compared to how many locks they're trying to take. It's not that mutexes are inherently bad; they are in fact one of the basic primitives in concurrent programming. They just don't scale well with increased concurrency.
Unfortunately, the deeper solutions involve restructuring your program and your data processing (or the data itself) to expose higher levels of concurrency and less interdependency, which is its own can of worms.
A pattern for easier but slower concurrency, as lambdas aren't free. If the code under the mutex is trivial, the cost of using lambdas will be noticeable, especially on fast paths. It might be worth looking at the resulting assembly first if you are looking to adopt this in your codebase.
Lambdas are free. As long as the class doesn't type-erase the callback (and MutexProtected doesn't) and the callback is not huge, the compiler is virtually guaranteed to inline it.
AFAIK if you use a templated type for the callback (as you should), it will be inlined by modern compilers at any optimization level other than -O0. :)