Unreasonably poor performance with MSVC and Ninja #1420
-
The 5-15% figure comes from builds on Linux. On Windows the overhead is indeed higher and you shouldn't expect similar performance. I think it's slower mainly because spawning new processes is slower, but partly also because file systems tend to be slower.
No, there is a big difference: on a cache hit, ccache can find a match just by reading files (the direct mode), but on a cache miss it runs the preprocessor once in addition to executing the compiler. As mentioned above, this extra preprocessor call is more costly on Windows in relative terms. However, since version 4.7 the depend mode is available for Windows as well. I suggest trying it out; see the sketch below.
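A minimal sketch of turning depend mode on (assuming ccache ≥ 4.7 is on PATH; I believe ccache derives the include list from MSVC via /showIncludes in this mode, but check the docs):

```powershell
# Persistently enable depend mode so cache misses skip the extra
# preprocessor invocation; depend mode relies on direct mode, which
# is enabled by default:
ccache --set-config depend_mode=true

# Or enable it for the current session only via the environment:
$env:CCACHE_DEPEND = "1"
```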
-
Now I checked the ccache.exe binary and it looks like it's built using MinGW :/. Can you try to compile ccache with MSVC 2022 as a Release build and do the same tests? Compiling ccache is super easy, e.g.:

```powershell
cmake.exe `
    -S O:\Code\c\ccache\ccache `
    -B O:\Code\c\ccache\ccache-builds-cmake\release `
    -G Ninja `
    -D CMAKE_BUILD_TYPE:STRING=Release `
    -D CMAKE_INSTALL_PREFIX:PATH='O:/Code/c/ccache/_install/release'
cmake --build O:\Code\c\ccache\ccache-builds-cmake\release --target install
```

I would be really curious about the perf. difference between MinGW and MSVC.
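To compare the MinGW and MSVC binaries, one approach (just a sketch; the paths and build directory names are illustrative) is to point CMAKE_CXX_COMPILER_LAUNCHER at whichever ccache.exe you want to test and time identical clean builds:

```powershell
# Configure a test project against the MSVC-built ccache:
cmake -S . -B build-test -G Ninja `
    -D CMAKE_BUILD_TYPE:STRING=Release `
    -D CMAKE_CXX_COMPILER_LAUNCHER:FILEPATH='O:/Code/c/ccache/_install/release/bin/ccache.exe'

# Time a clean build; repeat with the MinGW ccache.exe instead to compare:
Measure-Command { cmake --build build-test }
```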
-
I'm gonna spam this thread a little bit, I'm sorry in advance 🙃 I have very interesting results. I tested it too, on my project with ~113 TUs, and these TUs compile pretty fast, which is very important (the quicker the TUs compile, the more accurate the comparisons are between OSes and different ccache configurations). The conclusion is that ccache is slower on Windows because of the […]. The filesystem has 0% impact on my results; I also tested everything with Redis using […]. But what is most interesting is that MSVC is 1-4% faster than Clang 18 with the lld linker on Linux if the […].
Setup: MSVC 2022 17.9.7 on Windows (ccache folder on a 980 Pro NVMe PCIe 4); the Linux build was Clang 18 with the lld linker (ccache folder on an 850 Evo). Build system: […]. Results: restoring from ccache is ~7s slower on Windows (bottleneck) 🤔 It would be interesting to know how much spawning new processes contributes to these bottlenecks. For sure, spawning a new process is slow on Windows, and instead of just spawning cl.exe, ccache.exe is spawned first and then itself spawns cl.exe, so 2 process spawns instead of 1. Question of the day: is the MSVC preprocessor really that slow? (i.e. does it add 20% to the build?)
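One way to get a rough answer (a sketch; the source file path is illustrative, and real numbers depend on your include graph) is to time the preprocessor alone against a full compile of the same TU:

```powershell
# Full compile of one translation unit (writes example.obj):
Measure-Command { cl /nologo /c src\example.cpp }

# Preprocessor only (/P writes example.i and stops); this approximates the
# extra work ccache does on a cache miss when depend mode is off:
Measure-Command { cl /nologo /P src\example.cpp }
```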
-
I tried to compile openmw with a PCH Debug configuration. This is how I configured it:

```powershell
cmake .. -DMYGUI_RENDERSYSTEM=1 -DMYGUI_BUILD_DEMOS=OFF -DMYGUI_BUILD_TOOLS=OFF `
    -DMYGUI_BUILD_PLUGINS=OFF -DCMAKE_INSTALL_PREFIX='E:/tmp/openmw_debug_1' `
    -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DPRECOMPILE_HEADERS_WITH_MSVC=ON
cmake --build . --config Debug
```

Initial ccache overhead was 4%, which is amazing.
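For reference, a minimal way to reproduce that overhead measurement (a sketch, assuming ccache is on PATH and the build directory is already configured):

```powershell
# Zero the statistics, time a clean build, then inspect hits and misses:
ccache --zero-stats
Measure-Command { cmake --build . --config Debug }
ccache --show-stats
```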
-
I want to mention that I'm also using ccache in my CI pipelines. I suppose ccache is much slower on those runners, but I never examined how much slower, as I was expecting it to be slower anyway. I got to the point where compilation took so long and used up so many hours on these GA runners that I had to migrate to self-hosted runners, which are hosted on my dev machine, and it was a great move. I recommend migrating to self-hosted for your project too; the benefits are huge. You can look at my pipelines here; 75% of them are self-hosted and I can invoke all of them as needed, e.g. through the GitHub UI or using […]
-
I've got some new data. GitLab finally changed their base image from MSVC 2019 to 2022, so I had to time everything again. None of these builds actually finished within the time limit, so for comparisons, I'll use the number of translation units that every job managed to compile, which was 1030. As I'm using the 4.9.1 release, I can't try the inode cache, and it complains if precompiled headers are used (I guess we use a flag that only gained support after 4.9.1 landed), so they're off for the ccache tests. All these tests are with Ninja. I wouldn't read too much into small differences, as there's noise in the build times.
So it's fair to say that the newer VM images: […]
If I'd seen these numbers before, I'd probably have just given up instead of bringing it here as a potential bug, which is a big step in a good direction, even if nothing's changed on ccache's end. I'm still left in an annoying situation: if I can't get a build with Ninja to finish within the time limit at all (despite being able to with MSBuild, which should be slower), I'm not going to manage it with ccache's initial build overhead on top. Hopefully, if I wait for a new release so I can also use the inode cache, and do something on my end so that Ninja builds stop being mysteriously slow without ccache, this will become viable.
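For when that release lands, enabling the inode cache should be a one-line config change (a sketch; I'd double-check the option name against the release notes, though the master branch mentioned elsewhere in this thread does call it inode_cache):

```powershell
# Enable the inode cache, which memoizes file content digests so repeated
# hashing of unchanged inputs is avoided:
ccache --set-config inode_cache=true

# Or per-session (the env-var name drops the underscore, I believe):
$env:CCACHE_INODECACHE = "1"
```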
-
You need to debug where the time is spent. I did the same for my build, using:

```powershell
Measure-Command {
    ccache debug=true debug_level=1 disable=false depend_mode=false recache=true inode_cache=true `
        cl -c -nologo -Zc:wchar_t -FS -Zc:rvalueCast <...all other build options...> `
        -Fodebug\ O:\Code\c\qMedia\TinyORM\TinyORM\src\orm\basegrammar.cpp
}
```

Then I opened […].
Here we can see that […]. But debugging this on CI is even harder: you would need to ssh into the run, invoke these compile commands manually, and examine where they spend most of their time; the problem or bottleneck can be anywhere. I don't think ccache is doing something wrong. I spent ~1 week in the ccache code, and even though it's patched all around because of the different compilers (the code wasn't designed and cleanly refactored for clang-cl, msvc, ...), it still has very good perf characteristics, e.g. no regexes (which is a great perf boost), and it's written pretty well from the perf perspective. But I didn't dive deep enough into the caching and retrieving logic. Doing perf tuning would be a good idea, meaning enabling valgrind and examining the results in kcachegrind, but that's for Linux, and on Windows different code branches are of course invoked (I'm also doing this from time to time for my library); on Windows it isn't that easy. You need to think about better machines for compiling; you can't expect free gcloud instances to be performant enough to compile 1400 TUs quickly, they aren't designed for this. It would be interesting to know where the bottleneck is in your case, but I won't even dare to guess what it could be exactly. If you want, here is the latest ccache master branch compiled (it contains inode_cache); it's a Release build.
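As a follow-up to the Measure-Command run above: debug=true drops per-object debug files next to the object files, and skimming them shows where the time goes (a sketch; I'm assuming the usual <output>.ccache-log naming that debug mode uses, so adjust the paths to your -Fo directory):

```powershell
# List the debug artifacts written next to the objects:
Get-ChildItem debug\*.ccache-log

# Log lines are timestamped, so gaps between consecutive lines reveal the
# expensive steps (preprocessing, hashing, cache lookup, compiler execution):
Get-Content debug\basegrammar.obj.ccache-log | Select-Object -First 40
```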
-
OpenMW would like to use ccache for our Windows CI builds. We've been using it for other platforms for a long time with great success, and have finally got around to setting it up for MSVC jobs. However, the performance when the cache has not yet been initialised is really poor. Here's some timing data from one of GitLab's Windows runners (a slow two-core VM; I think a Google Cloud n1-standard-2): […]
When the hit rate is good, the speedup is excellent, but there has to be a successful build to populate the cache first, and because a clean build with ccache takes 2.18 times as long as one without, we can't fit it within the time limit GitLab's CI imposes on us (that's where the 919 files figure comes from: it's how far it had got before the VM was spun down). I managed to coax it through the process by building a quarter of the project at a time, but that's not a viable solution long-term, as the cache isn't shared between forks or between all branches of a fork.
My understanding is that if everything were working properly, the impact of ccache on a clean build should be around 5-15%, not the 118% I'm seeing. 5-15% would be consistent with the 15.2% cost of the 99.53%-hit run compared with a clean non-ccache run: working out the cache key for a file and copying the result out of the cache when it exists should be about as expensive as working out the cache key and then copying the result into the cache because it didn't exist.
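For concreteness, the overhead arithmetic works out like this (a quick sanity check using only the numbers quoted in this post):

```powershell
# A clean ccache build taking 2.18x as long as a plain build means
# (2.18 - 1) * 100 = 118% overhead, matching the figure above:
$cleanRatio = 2.18
"{0:N0}% overhead on a clean (all-miss) build" -f (($cleanRatio - 1) * 100)

# By the same measure, the 99.53%-hit run at 15.2% over a plain clean build
# sits just above the expected 5-15% band, which is roughly consistent.
```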
One of my jobs that just builds part of the project gave the following output for `ccache --show-stats -v`: […] which doesn't immediately explain to me why it was so much slower than expected.
So far, this has all been done using 4.9.1. I don't see any issues or pull requests discussing bad performance with MSVC/Windows that happened since that release.