Using Rust in Mercurial
This page describes the plan and status for leveraging the Rust programming language in Mercurial.
Why use Rust?
Today, Mercurial is a Python application. It uses Python C extensions in various places to achieve better performance.
There are many advantages to being a Python application. But there are also significant disadvantages.
Performance is a significant pain point with Python. There are multiple facets to the performance problem:
Startup overhead
General performance overhead compared to *native* code
GIL interfering with parallel execution
It takes several dozen milliseconds to start a Python interpreter and load the Mercurial Python modules. If you have many extensions loaded, it can take well over 100ms just to reach a Mercurial command's main function. Startup times of over 250ms have been reported. While the command itself may complete in mere milliseconds, Python overhead has already made hg seem non-instantaneous to end users.
A few years ago, we measured that CPython interpreter startup overhead accounted for 10-18% of the run time of Mercurial's test harness. 100ms may not sound like a lot, but it is enough to give the perception that Mercurial is slower than tools like Git (which can run commands in under 10ms).
There are also situations like querying hg for shell prompts that require near-instantaneous execution.
Mercurial is also heavily scripted by tools like IDEs. We want these tools to provide results near instantaneously. If people are waiting over 100ms for results from hg, it makes these other tools feel sluggish.
There are workarounds for startup overhead problems: the CommandServer (start a persistent process and issue multiple commands to it) and CHg (a C binary that speaks with a Mercurial command server and enables chg commands to execute without Python startup overhead). chg exists only because we need hg to be a native binary in order to avoid Python startup overhead. If hg weren't a Python script, we wouldn't need chg to be a separate program.
Python is also substantially slower than native code. PyPy can deliver substantially better performance than CPython, and some workloads might even run faster under PyPy than as native code thanks to its JIT compiler. But overall, Python is slower than native code.
But even with PyPy's magical performance, we still have the GIL. Python doesn't allow you to execute CPU-bound Python code on multiple threads in parallel. If you are CPU bound, you need to either offload that work to an extension (which releases the GIL when it executes hot code) or spawn multiple processes. Since Mercurial needs to run on Windows (where new process overhead is ~10x worse than on POSIX, and which is a platform optimized for spawning threads, not processes), many of the potential speedups we can realize via concurrency are offset on Windows by new process overhead and Python startup overhead. We need thread-level concurrency on Windows to help with shorter-lived CPU-bound workloads. This includes things like revlog reading (which happens on nearly every Mercurial operation).
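To make the thread-level concurrency argument concrete, here is a minimal Rust sketch of splitting CPU-bound work across OS threads in a single process, which is exactly what the GIL prevents for pure Python code. It is an illustration under stated assumptions, not Mercurial code: the in-memory "chunks" and the `fnv1a` hash are hypothetical stand-ins for per-entry work such as decompressing or hashing revlog data, and `hash_chunks_parallel` is an invented name. It uses `std::thread::scope`, which requires Rust 1.63 or later.

```rust
use std::thread;

/// Hypothetical stand-in for CPU-bound per-entry work (for example,
/// hashing or decompressing a revlog chunk). This is plain 64-bit
/// FNV-1a so the example has no external dependencies.
fn fnv1a(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    for &byte in data {
        hash ^= byte as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV 64-bit prime
    }
    hash
}

/// Hash every chunk, splitting the input into contiguous batches that are
/// processed on up to `num_threads` OS threads. Output order matches input.
fn hash_chunks_parallel(chunks: &[Vec<u8>], num_threads: usize) -> Vec<u64> {
    let num_threads = num_threads.max(1);
    // Ceiling division so every chunk lands in some batch; `max(1)` avoids
    // `slice::chunks(0)`, which would panic on empty input.
    let per_thread = ((chunks.len() + num_threads - 1) / num_threads).max(1);
    thread::scope(|s| {
        // Scoped threads may borrow `chunks` directly: the scope guarantees
        // every thread is joined before `chunks` can go out of scope.
        let handles: Vec<_> = chunks
            .chunks(per_thread)
            .map(|batch| s.spawn(move || batch.iter().map(|c| fnv1a(c)).collect::<Vec<u64>>()))
            .collect();
        // Join in spawn order so results stay in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect::<Vec<u64>>()
    })
}

fn main() {
    // Synthetic input: 1000 chunks of 1 KiB each.
    let chunks: Vec<Vec<u8>> = (0..1000u32).map(|i| i.to_le_bytes().repeat(256)).collect();
    let parallel = hash_chunks_parallel(&chunks, 4);
    let serial: Vec<u64> = chunks.iter().map(|c| fnv1a(c)).collect();
    assert_eq!(parallel, serial);
    println!("hashed {} chunks on up to 4 threads", parallel.len());
}
```

Scoped threads let the workers borrow the input without copying it or wrapping it in `Arc`, and the borrow checker rejects any version in which a thread could outlive that data. Static batching keeps the sketch short; for uneven per-entry costs, a work-stealing pool such as the `rayon` crate would usually balance load better.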
In addition to performance concerns, Python is also hindering us because it is a dynamic programming language. Mercurial is a large project by Python standards, and large projects are harder to maintain. Using a statically typed programming language that finds bugs at compile time will enable us to make wide-sweeping changes more fearlessly. This will improve Mercurial's development velocity.
Today, when performance is an issue, Mercurial developers turn to C. But we treat C as a measure of last resort because it is just too brittle: it is too easy to introduce security vulnerabilities, memory leaks, etc. On top of vanilla C, the Python C API is somewhat complicated. It takes significantly longer to develop C components because the barrier to writing bug-free C is much higher...