This year in LLVM (npopov.com)
102 points by ngaut on Dec 21, 2022 | 15 comments


Great overview article, thanks for sharing what happened in LLVM.

Regarding C++ compilation speed, and C++20, mentioned on the article, on VC++ when using C++20 modules, or the C++23 preview with import std, the experience and compile times are already quite good, the major issues being the Windows SDKs and Intellisense not always working (depending on how modules are laid out).
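
For anyone who hasn't tried it, the developer experience is basically this (a minimal sketch; exact flags vary by version, and MSVC additionally needs the standard library module built or otherwise available, see the MSVC docs):

  // main.cpp -- e.g. cl /std:c++latest /EHsc main.cpp
  import std;

  int main() {
      std::cout << "Hello from import std!\n";
  }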

Unfortunately, clang is quite far behind VC++ and GCC in its support for C++ modules.

On a positive note, there are other compilers still catching up with C++17.


OP doesn't provide any details on the methodology used to reach the conclusion that larger STL headers are the culprit for the build-time regression in only 1 out of 11 cases.

I'm personally not convinced, and I wonder why the other 10 projects didn't suffer from the same problem.

I would love to see the claim supported and checked by recompiling 7zip with -ftime-trace and post-processing the output through ninjatracing to get a nice, detailed flamegraph showing exactly where the build time is spent. It's surprising that this hasn't been done already.
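
For reference, the workflow I have in mind is roughly this (a sketch, with illustrative paths; ninjatracing is the script from https://github.com/nico/ninjatracing):

  # configure a clean build with time tracing enabled
  CXXFLAGS=-ftime-trace cmake -G Ninja .. && ninja
  # convert the ninja build log into a Chrome trace
  ninjatracing .ninja_log > trace.json
  # load trace.json in chrome://tracing (ninjatracing can also embed
  # the per-file -ftime-trace output; see its README)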

Also, I think the project sample distribution is not convincing. 9 out of 11 are very small and roughly the same size; the remaining two are larger, but still small. So the experiments are biased by definition, through not taking better care with the dataset distribution.


> I'm personally not convinced, and I wonder why the other 10 projects didn't suffer from the same problem.

This has a simple answer: the C++ projects in the test set are kimwitu++, Bullet, tramp3d-v4, and 7zip. kimwitu++ is built in c++14 mode and Bullet in gnu++98 mode, so they obviously aren't affected. tramp3d-v4 does show some impact, but it's a "single 2MB source file" style program, so STL headers don't dominate compile time. 7zip is the only program that both uses C++17 and doesn't have huge source files, so it's the only one showing the major impact this has.

> I would love to see the claim supported and checked by recompiling 7zip with -ftime-trace and post-processing the output through ninjatracing to get a nice, detailed flamegraph showing exactly where the build time is spent. It's surprising that this hasn't been done already.

This was a single paragraph in a large blog post; why the heck would I include a detailed analysis of how I reached that conclusion? For one out of dozens of compile-time regressions I investigated? Something I do all the time, and am likely an expert on?

If you really need to know, this claim was based on comparing callgrind profiles between -std=c++14 and -std=c++17 compilations.
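
In case anyone wants to reproduce that kind of comparison, it goes along these lines (file name illustrative; recent clang runs cc1 in-process by default, so profiling the driver captures the compilation itself):

  valgrind --tool=callgrind clang++ -std=c++14 -O0 -c file.cpp
  valgrind --tool=callgrind clang++ -std=c++17 -O0 -c file.cpp
  # then compare hot functions between the two pid-suffixed profiles
  callgrind_annotate callgrind.out.<pid14> > cxx14.txt
  callgrind_annotate callgrind.out.<pid17> > cxx17.txt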

> Also, I think the project sample distribution is not convincing. 9 out of 11 are very small and roughly the same size; the remaining two are larger, but still small. So the experiments are biased by definition, through not taking better care with the dataset distribution.

I would certainly love a larger testing corpus than CTMark currently provides.


> This was a single paragraph in a large blog post; why the heck would I include a detailed analysis of how I reached that conclusion? For one out of dozens of compile-time regressions I investigated? Something I do all the time, and am likely an expert on?

Yes, when you make a questionable claim and you do not provide any evidence whatsoever for how you reached that conclusion, you should be ready to be challenged. Especially given that this is a _technical_ blog targeted at people with technical expertise.

You also take the occasion in the article to subtly confuse uneducated readers by saying

> In particular, the large regression on the right is due to enabling C++17 by default.

as if it were some sort of universal truth that all C++ codebases are at risk if they switch to C++17, which I will counter later in my comment with a real-world multi-million-LoC C++ project that includes virtually every C++ header in existence (*).

You sort of try to support that claim by saying that the larger STL headers in C++17 are the culprit

> The close to two times slowdown in 7zip O0 builds comes down to STL headers becoming 2-3 times as large in C++17.

and at that point you really got me puzzled, because I've worked on C++14 codebases that made the transition to C++17, and some even to C++20, and I never recall discussions taking place about a build-time regression as big as the one implied by the article. This wouldn't go unnoticed on multi-million-LoC projects in the real world.

But you also continue with a snarky comment such as

> While this is sad, and I had to edit out some choice words on the C++ standardization committee (I hear that this gets even worse in C++20), at least this does not affect non-clang compilers (e.g. Rust) and C code.

You don't provide any constructive criticism, nor have I seen you open a bug for further investigation. All of this creates a bad impression, and it doesn't seem that your comments are coming from a good place either. It reads as a very biased PoV, which is something of a problem given that you're professionally involved in work on LLVM, clang, and Rust, and are paid by Red Hat for that type of work.

(*) I've cloned https://github.com/mysql/mysql-server. The whole codebase is roughly 3M lines of C and C++ code (among other types of source code). It was not a very scientific experiment, because the initial results weren't matching what the article showed, but I ran RelWithDebInfo builds against {clang, gcc} x {C++14, C++17}, and I repeated the experiment with two different versions of gcc and clang. TL;DR: neither the clang nor the gcc build variants show a "regression" larger than ~5% in wall-clock time.

1a8a111d8f855a31d0aeffc8f02309b2b82dd410 was the actual point in time when MySQL transitioned from C++14 to C++17, and this commit was used as the base for the C++17 builds. 1a8a111d8f855a31d0aeffc8f02309b2b82dd410~1 was used for the C++14 builds. No ccache or anything similar was involved. Everything was dockerized.

Build with gcc-7.3.1: 6m33s (C++14) vs 6m55s (C++17)

Build with clang-5.0.2: 3m52s (C++14) vs 3m58s (C++17)

Build with gcc-9.3.1: 5m6s (C++14) vs 5m22s (C++17)

Build with clang-8.0.0: 4m39s (C++14) vs 4m49s (C++17)
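
For anyone who wants to reproduce it, each variant was driven roughly like this (a sketch, minus the Docker wrapping; the CMake options shown are the standard MySQL ones, not necessarily the exact set I used):

  git clone https://github.com/mysql/mysql-server && cd mysql-server
  git checkout 1a8a111d8f855a31d0aeffc8f02309b2b82dd410   # append ~1 for the C++14 base
  mkdir build && cd build
  cmake .. -DCMAKE_BUILD_TYPE=RelWithDebInfo \
           -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ \
           -DDOWNLOAD_BOOST=1 -DWITH_BOOST=../boost
  time make -j"$(nproc)"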


Its lagging C++20 support made many tools, especially clang-tidy, useless for C++20 development.


I count only 4 small C++20 features not supported by Clang here:

https://en.cppreference.com/w/cpp/compiler_support#cpp20

For standard library features, you can just build GCC from master and pass

  --gcc-toolchain=/usr/local/gcc-dev

to use the latest libstdc++ and get all the C++23 stdlib features from GCC, while benefiting from the LLVM compiler toolchain.
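
For example (file name is hypothetical; the prefix is wherever you installed your GCC build):

  clang++ -std=c++23 --gcc-toolchain=/usr/local/gcc-dev main.cpp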

I haven't found "clangd" lacking at all for C++20 or C++23 development, but it does require that you configure it properly.


C++20 language support is almost all done. Even C++23 is nearing the finish line. I see this type of FUD spread more and more around HN, which hasn't been the case before. I wonder why.


The language support might be nearly done, but libc++ definitely does not have full C++20 support. For example, ranges, a HUGE feature in C++20, are disabled by default in libc++ (as of version 15, the latest stable release). As another example, a feature I want: std::source_location isn't implemented in libc++, even though it has existed in GCC/libstdc++ for years now. Clang has had the language support needed to implement std::source_location for a while, but it hasn't been merged into libc++ for various reasons (last I checked there was a diff on Phabricator that implemented it, but there was some debate about the ABI it introduced, or something of that nature).
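
For context, this is the kind of thing std::source_location gives you, and why the library side matters (a minimal sketch; builds with GCC/libstdc++ today, but not against libc++ as of 15):

  #include <iostream>
  #include <source_location>

  // captures the *caller's* location, without __FILE__/__LINE__ macros
  void log(const char* msg,
           std::source_location loc = std::source_location::current()) {
      std::cout << loc.file_name() << ":" << loc.line()
                << " (" << loc.function_name() << "): " << msg << "\n";
  }

  int main() { log("hello"); }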

I don't think it's necessarily FUD, many C++ users don't understand the difference between core language features and standard library features. And even when you do understand the difference, with many features (e.g. std::source_location) you can't actually use the feature if libc++ doesn't have it, so the fact that clang itself supports the language feature is a moot point and irrelevant to end users.

I saw another comment explaining how to use libstdc++ with clang. Sure you can do this, but who actually wants to compile their code this way? Especially since it means you need to upgrade your compiler and standard library separately, on different release cadences. For most people this is way too much headache, they'd rather just not use the new features and wait until they're in libc++. You can also compile llvm with custom options to enable experimental features that aren't fully developed (like ranges), but who really wants to do this? Building a custom compiler/libc++ is a huge headache and the features are disabled by default for a reason anyway.
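
(For completeness, that custom build is roughly the following, using the LLVM runtimes build; the incomplete-features switch is the one I remember from the 15.x era, so double-check the libc++ docs:)

  cmake -G Ninja -S runtimes -B build \
        -DLLVM_ENABLE_RUNTIMES="libcxx;libcxxabi" \
        -DLIBCXX_ENABLE_INCOMPLETE_FEATURES=ON
  ninja -C build cxx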

Just to be clear, I love LLVM/Clang/libc++ and am a huge advocate of them. But there's nothing wrong with pointing out flaws and where there are gaps compared to other implementations.


FWIW, if you're on a distro with a default GCC toolchain, it's pretty typical to use clang together with libstdc++. E.g. if you use clang on Fedora, you're using libstdc++ by default. You have to explicitly pass -stdlib=libc++ to use libc++ instead. So I wouldn't consider this a particularly unusual build configuration.
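
Concretely, on such a distro:

  clang++ -std=c++20 main.cpp                  # uses libstdc++ (the default)
  clang++ -std=c++20 -stdlib=libc++ main.cpp   # explicitly opt into libc++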

Of course, it's different on distros with a default Clang toolchain, like macOS, where building with anything but libc++ would be pretty unusual.


I use clang with libstdc++ in C++20 mode. Clang does this by default if you have both clang and g++ installed on Ubuntu.


What FUD? Clang 15 certainly can't compile my C++ projects written with C++20 modules, and cppreference clearly shows where it stands.


I wasn't referring to your comment specifically. But saying that clang is lagging and is unusable with C++20 when there are only 4 out of 69 (!) features missing in total is, by my book, spreading FUD.


I'm curious how making pointer types opaque impacts type-based aliasing optimizations.


Not at all. In LLVM-IR, all pointers were always allowed to alias each other, regardless of their type. Type-based no-alias information (as given by the frontend language's semantics) is added via metadata called TBAA [1].

[1] https://llvm.org/docs/LangRef.html#tbaa-metadata
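
To make that concrete, here is roughly what clang emits for a plain int store in C: the opaque ptr operand carries no type information, and the !tbaa node is what supplies the "this is an int" fact.

  store i32 1, ptr %p, align 4, !tbaa !5

  !5 = !{!6, !6, i64 0}                  ; access descriptor: int at offset 0
  !6 = !{!"int", !7, i64 0}
  !7 = !{!"omnipotent char", !8, i64 0}
  !8 = !{!"Simple C/C++ TBAA"}           ; TBAA root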


Thanks for the answer!



