@lifthrasiir
Last active September 27, 2017 14:17

Why is a Rust executable large? (DRAFT)

Suppose that you are a programmer primarily working with compiled languages. Somehow you've got tired of those languages (there may be multiple valid reasons) and heard of a trendy new programming language called Rust. Looking at some webpages and the official forum, it looks great, and you decide to try it out. Rust seems to have been a bit cumbersome to install in the past, but thanks to rustup that problem appears to be gone by now. Cargo seems to be great, so you follow the first sections of the Book and write a small greeting to the new language:

fn main() {
    println!("Hello, world!");
}

Amazingly, cargo run runs without a hassle. It is kind of a miracle, as you used to have to configure the build script, Makefile, project or whatever before building anything. Impressed, you notice that the executable is available at target/debug/hello. You instinctively type ls -al (or is it dir?) and cannot believe your eyes:

$ ls -al target/debug/hello
-rwxrwxr-x 1 lifthrasiir 650711 May 31 20:00 target/debug/hello*

650 kilobytes just to print something?! You remember that Rust is probably the only language that may possibly displace C++, and C++ is notorious for code bloat; does that mean Rust failed to fix one of C++'s big problems? Out of curiosity, you write the same program in C and compile it. The result is eye-opening:

$ cat hello-c.c
#include <stdio.h>
int main() {
    printf("Hello, World!\n");
}
$ make hello-c
$ ls -al hello-c
-rwxrwxr-x 1 lifthrasiir 8551 May 31 20:03 hello-c*

Maybe C has the benefit of bare-metal libraries, you think. This time you try a C++ program using iostream, which should be much safer than C's naive printf. Surprisingly, it is still tiny compared to Rust:

$ cat hello-cpp.cpp
#include <iostream>
using namespace std;
int main() {
    cout << "Hello, World!" << endl;
}
$ make hello-cpp
$ ls -al hello-cpp
-rwxrwxr-x 1 lifthrasiir 9168 May 31 20:06 hello-cpp*

What is wrong with Rust?


It seems that the surprisingly large size of Rust binaries is a massive concern for many. The question is by no means new: there is a well-known, albeit year-old, question on StackOverflow, and searching for "why is rust binary large" turns up several more. Given how often it comes up, it is a bit surprising that we don't yet have a definitive article or page on the topic. So this is my attempt to provide one.

Just to be cautious: is it a valid question after all? We have hundreds of gigabytes of storage, if not terabytes, and people should be using decent ISPs nowadays, so binary size should not be a concern, right? The answer is that it still may matter (though not as much as before):

  • Akamai's State of the Internet report shows that, while more than 80% of users in developed countries enjoy 4 Mbps or more, far fewer users do in developing countries. The average connection has improved a lot (almost every country is now past a 1 Mbps average), but the overall distribution is still stagnating. I am fortunate to live in a country where gigabit Ethernet costs only $30/mo (!), but many others are not.

  • Ordinary consumers have only a shallow understanding of computing, and they are likely to relate any problem to whatever they know of: one common sentiment is that executable bloat causes slowdown. That's unfortunate but true, and you would want to avoid triggering that sentiment.

For curious readers: all examples are tested with Rust 1.9.0 and 1.11.0-nightly (a967611d8 2016-05-30). Unless noted otherwise, the operating system used is Linux 3.13.0 on x86-64. Your mileage may vary.

Optimization Level

Ask about the above, and virtually every experienced Rust user would ask you back:

Have you enabled the release build?

It turns out that Cargo distinguishes the debug build (the default) from the release build (--release). The Cargo documentation explains the exact differences between them, but in general the release build gets rid of development-only routines and data and enables tons of optimizations. It is not the default because, well, the debug build is requested far more frequently than the release build. So let's try that!
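One concrete example of a development-only routine: debug_assert! checks, and anything guarded by cfg!(debug_assertions), are compiled in for Cargo's default debug profile and left out of the release profile (these are Cargo's defaults; a profile can override them). A small illustration, where build_kind and average are hypothetical helper names:

```rust
/// Reports which kind of build produced this binary. Cargo turns
/// `debug_assertions` on in the default (debug) profile and off in release.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}

/// Averages a slice. The `debug_assert!` check exists only in debug builds;
/// in release it is compiled out entirely.
fn average(xs: &[u32]) -> u32 {
    debug_assert!(!xs.is_empty(), "average of an empty slice");
    let sum: u64 = xs.iter().map(|&x| u64::from(x)).sum();
    // `max(1)` avoids a division by zero when the check has been compiled out.
    (sum / xs.len().max(1) as u64) as u32
}

fn main() {
    println!("{} build, average = {}", build_kind(), average(&[2, 4, 6]));
}
```

Building this with and without --release should print different build kinds; only the debug binary carries the assertion and its message string.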

$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 646467 May 31 20:10 target/release/hello*

And that didn't really make a difference! This is because the optimization only runs over the user code, and we don't have much user code. Almost all of the binary comes from the standard library, and there doesn't seem to be anything we can do about that...

Link-time Optimization (LTO)

…except that we can. Enter the world of link-time optimization.

So the story goes as follows: each crate can be optimized individually, and in fact all the standard libraries ship in optimized form. Once the compiler has produced optimized object code, a program called the "linker" combines it into a single executable. But we don't need the entirety of the standard library: a simple "Hello, world" program definitely does not need std::net, for example. Yet the linker is so stupid that it won't try to remove unused parts of crates; it just pastes them in.

There is actually a good reason the traditional linker behaves this way. The linker is commonly used with C and C++, among other languages, where each source file is compiled individually. This is a sharp difference from Rust, where the entire crate is compiled at once. As long as the required functions are not scattered throughout many files, the linker can fairly easily drop unused files as a whole. It's not perfect, but it reasonably approximates what we want: removing unused functions. One disadvantage is that the compiler is unable to optimize function calls that point into other files; it simply lacks the required information.

C and C++ folks had been fine with that approximation for decades, but over the last decade they have had enough and started to provide an option to enable link-time optimization (LTO). In this scheme the compiler produces optimized binaries without looking at the others, and the linker then actively looks at all of them together and tries to optimize them further. This is much harder than working with (internally simplified) sources, and it blows up the compilation time, but it is worth trying if a smaller and/or faster executable is needed.
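The whole-program pruning that LTO makes possible boils down to reachability: start from main, follow the calls, and drop whatever is never reached. Here is a toy model of that idea. It uses a hand-written call graph with hypothetical function names; it is not how rustc or LLVM actually represent code.

```rust
use std::collections::{HashMap, HashSet};

/// Returns every function reachable from `root` by following calls.
/// Anything not in the result could be dropped from the final executable.
fn reachable<'a>(calls: &HashMap<&'a str, Vec<&'a str>>, root: &'a str) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack = vec![root];
    while let Some(f) = stack.pop() {
        // `insert` returns false if we already visited `f`.
        if seen.insert(f) {
            if let Some(callees) = calls.get(f) {
                stack.extend(callees.iter().copied());
            }
        }
    }
    seen
}

/// A hypothetical call graph for a "Hello, world" program.
fn example_graph() -> HashMap<&'static str, Vec<&'static str>> {
    let mut calls = HashMap::new();
    calls.insert("main", vec!["println"]);
    calls.insert("println", vec!["write"]);
    calls.insert("write", vec![]);
    // Like std::net in a hello-world program: present, but never called.
    calls.insert("tcp_connect", vec!["write"]);
    calls
}

fn main() {
    let calls = example_graph();
    let live = reachable(&calls, "main");
    let mut dead: Vec<&str> = calls.keys().copied().filter(|f| !live.contains(f)).collect();
    dead.sort();
    println!("kept {} functions, dropped {:?}", live.len(), dead);
}
```

A file-at-a-time linker can only make this decision per file. With LTO the optimizer sees the whole program, so the decision can be made per function.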

So far we have talked about C and C++, but LTO is even more beneficial for Rust. Cargo.toml has an option to enable it:

[profile.release]
lto = true

Did that work? Well, sort of:

$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 615725 May 31 20:17 target/release/hello*

It had a larger effect than the optimization level, but still not much. Maybe it is time to look at the executable itself.

So what’s in my executable?

There are several tools that work directly with executables, but probably the most useful one is GNU binutils. It is available on every Unix-like system, and also on Windows (MinGW has a standalone install, for example).

There are many utilities in binutils, but strings is probably the simplest. It crawls the binary for runs of printable characters (at least four by default) followed by an unprintable byte, which catches the typical representation of a C string: printable characters terminated by a zero byte. Thus it extracts readable strings out of the binary, which is quite helpful for us. So let's try that, and prepare for the scroll:

$ strings target/release/hello | head -n 10
/lib64/ld-linux-x86-64.so.2
bki^
 B ,
libpthread.so.0
_ITM_deregisterTMCloneTable
_Jv_RegisterClasses
_ITM_registerTMCloneTable
write
pthread_mutex_destroy
pthread_self

And, wow, it already contains something we didn't expect: pthread. (More on that later, though.) There are indeed tons of strings in our executable:

$ strings target/release/hello | wc
   5893    7027   94339
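The core loop of a strings-like scanner can be sketched in a few lines. This sketch follows only the C-string case described above: runs of at least four printable ASCII bytes that end in a zero byte. GNU strings itself is looser and accepts a run followed by any unprintable byte, so the counts will differ. The name c_strings is just for illustration.

```rust
use std::env;
use std::fs;

/// Returns every run of at least `min_len` printable ASCII bytes (or tabs)
/// that is immediately followed by a zero byte, i.e. C-style strings.
fn c_strings(data: &[u8], min_len: usize) -> Vec<String> {
    let mut found = Vec::new();
    let mut start = 0; // index where the current printable run began
    for (i, &b) in data.iter().enumerate() {
        if b == b'\t' || (0x20..0x7f).contains(&b) {
            continue; // still inside a printable run
        }
        // A zero byte ends a C string; keep the run if it is long enough.
        if b == 0 && i - start >= min_len {
            found.push(String::from_utf8_lossy(&data[start..i]).into_owned());
        }
        start = i + 1; // the next run can only begin after this byte
    }
    found
}

fn main() {
    match env::args().nth(1) {
        Some(path) => match fs::read(&path) {
            Ok(data) => {
                for s in c_strings(&data, 4) {
                    println!("{}", s);
                }
            }
            Err(e) => eprintln!("cannot read {}: {}", path, e),
        },
        None => eprintln!("usage: c_strings <file>"),
    }
}
```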

That is 5893 strings and 94339 bytes in total. The byte count already includes the newline strings prints after each match, which stands in for each string's terminating byte. So over 90 KB, roughly one sixth of our executable, is for strings we don't really use! On closer inspection this is not quite correct, as strings also gives many false positives, but there are some significant strings as well:

  • Those starting with jemalloc_ and je_. These are names from jemalloc, a high-performance memory allocator. So that's what Rust uses for memory management, in place of the classic malloc/free. It is not a small library, however, and we don't do any dynamic allocation ourselves anyway.

  • Those starting with backtrace_ and DW_. These are names from libbacktrace, a library that produces stack traces. Rust uses it to print a helpful backtrace on panic (available with the RUST_BACKTRACE=1 environment variable). We don't panic, however.

  • Those starting with _ZN. They are “mangled” names from Rust standard libraries.
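The _ZN names follow Rust's legacy mangling scheme, which borrows the Itanium C++ ABI's nested-name form. A name starts with _ZN, each path component follows with its length as a prefix, a hash component comes last, and E closes the name. A minimal sketch that splits such a name into its components follows. It is deliberately simplified: it does not decode escapes such as $LT$ or .., and it does not handle the newer v0 scheme. The symbol in the test uses a made-up hash.

```rust
/// Splits a legacy-style mangled name such as
/// `_ZN3std2io5stdio6_print17h0123456789abcdefE` into its path components.
/// Simplified: escapes like `$LT$` are left undecoded.
fn demangle_simple(sym: &str) -> Option<Vec<&str>> {
    let mut rest = sym.strip_prefix("_ZN")?;
    let mut parts = Vec::new();
    while !rest.starts_with('E') {
        // Each component is prefixed by its length in decimal.
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits == 0 {
            return None; // malformed, or ran off the end without an `E`
        }
        let len: usize = rest[..digits].parse().ok()?;
        let end = digits.checked_add(len)?;
        parts.push(rest.get(digits..end)?);
        rest = &rest[end..];
    }
    Some(parts)
}

fn main() {
    let sym = "_ZN3std2io5stdio6_print17h0123456789abcdefE";
    if let Some(parts) = demangle_simple(sym) {
        // The last component is the hash; the rest form the Rust path.
        let (path, hash) = parts.split_at(parts.len() - 1);
        println!("{} (hash {})", path.join("::"), hash[0]);
    }
}
```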

Why do we have those strings in the first place? They are debug symbols, which give an appropriate (possibly human-readable) name to otherwise machine-processed binary code. Remember libbacktrace above? It has to use those debug symbols to print any useful information. Yet, since we are making a release build, we may choose not to include them. (Rust has no option for this by itself, since symbols are typically removed by an external utility called strip.) So let's look at what can be done about them.
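For comparison with today's toolchain (Rust 1.65 or later, not the versions tested in this post), the standard library can capture a backtrace directly through std::backtrace::Backtrace. Turning frame addresses into function names depends on the symbol information discussed above. Building this sketch in release mode and running it before and after strip is therefore a quick way to see the effect. I would expect fewer named frames after stripping, but the exact output depends on the platform and toolchain.

```rust
use std::backtrace::Backtrace;

/// Captures the current call stack as text. `force_capture` ignores the
/// RUST_BACKTRACE / RUST_LIB_BACKTRACE environment variables.
#[inline(never)]
fn current_stack() -> String {
    Backtrace::force_capture().to_string()
}

fn main() {
    // Frames whose addresses cannot be resolved to names are printed without
    // one; on unsupported platforms the text says the backtrace is unsupported.
    println!("{}", current_stack());
}
```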

Get off my lawn!

So we have three goals: no jemalloc, no libbacktrace, and no debug symbols. I’ve mentioned that strip strips debug symbols, so let’s do that first. Note that strip also comes with binutils, so you can just run that.

$ strip target/release/hello
$ target/release/hello
Hello, world!
$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 347648 May 31 20:23 target/release/hello*

Now that IS smaller! About half of the entire executable was debugging symbols. Having stripped the symbols, however, we can no longer get a nice backtrace or panic recovery:

$ sed -i.bak s/println/panic/ src/main.rs
$ cat src/main.rs
fn main() {
    panic!("Hello, world!");
}

$ cargo build --release && strip target/release/hello
$ RUST_BACKTRACE=1 target/release/hello
thread '<main>' panicked at 'Hello, world!', src/main.rs:4
stack backtrace:
   1:     0x7fde451c1e41 - <unknown>
Illegal instruction
$ mv src/main.rs.bak src/main.rs     # tidy it up

…and it somehow aborted. Probably a libbacktrace issue, I don't know, but it doesn't do much harm anyway.

We have knocked the debug symbols down; now let's get rid of the remaining libraries. The bad news: from this point on you are entering the realm of nightly Rust. The realm is not as scary as you might think, as it will not blow up in your face, but it may break in smaller ways (that's why we have nightlies!). That is the main reason nightly features are not yet available in stable: they may change. Fortunately, the features we are going to use have been quite stable, and you can probably follow the remainder of this post with more recent nightlies. But for posterity, I will stick to a particular nightly version.

The good news: installing nightlies (either the latest or a specific one) is very simple with rustup.

$ rustup override set nightly-2016-05-31     # or just `nightly` if you want
$ cargo build --release
$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 620490 May 31 20:35 target/release/hello*
$ strip target/release/hello
$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 351520 May 31 20:35 target/release/hello*

Okay, the size hasn't changed much. Let's knock jemalloc down first. It is well documented in the Book, but the gist is that it takes just two additional lines to change the allocator:

#![feature(alloc_system)]
extern crate alloc_system;

fn main() {
    println!("Hello, world!");
}

And that again does make a difference:

$ cargo build --release
$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 210364 May 31 20:39 target/release/hello*
$ strip target/release/hello
$ ls -al target/release/hello
-rwxrwxr-x 1 lifthrasiir 121792 May 31 20:39 target/release/hello*

Okay! We went down from a whopping 600 kilobytes to about 120 KB. Jemalloc is indeed a big library; it has tons of configuration options so that you can fine-tune its performance, and all that has to go somewhere.
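The alloc_system crate and #![feature(alloc_system)] belong to the 2016 nightly used here. On current stable Rust (1.28 and later), the allocator is chosen with the #[global_allocator] attribute. Since Rust 1.32 the system allocator is already the default, so jemalloc is no longer linked in unless you opt in. Below is a minimal sketch of the attribute in use: it wraps std::alloc::System with an allocation counter. CountingSystem and ALLOCATIONS are illustrative names, not part of any library, and std::hint::black_box needs Rust 1.66 or later.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Forwards every request to the system allocator (malloc/free on Unix)
/// and counts how many allocations were made.
struct CountingSystem;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingSystem {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every heap allocation in the program now goes through CountingSystem.
#[global_allocator]
static GLOBAL: CountingSystem = CountingSystem;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimizer from eliding this allocation.
    let v: Vec<u8> = std::hint::black_box(Vec::with_capacity(16));
    let after = allocation_count();
    println!("Hello, world! ({} allocation(s), capacity {})", after - before, v.capacity());
}
```

Using plain System instead of a wrapper (static GLOBAL: System = System;) is the direct modern counterpart of extern crate alloc_system. The counter here only makes the switch observable.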

Linkage

First, I have to admit that I was cheating with the size of C and C++ binaries. The fair comparison would be as follows:

$ make hello-c CFLAGS='-static'
cc -static    hello-c.c   -o hello-c
$ make hello-cpp CXXFLAGS='-static -static-libstdc++'
g++ -static -static-libstdc++    hello-cpp.cpp   -o hello-cpp
$ ls -al hello-c hello-cpp
-rwxrwxr-x 1 lifthrasiir  877175 May 31 20:10 hello-c*
-rwxrwxr-x 1 lifthrasiir 1653135 May 31 20:10 hello-cpp*

Okay, so it seems that Rust is actually far better than C and C++. But… why is this comparison "fair"? Isn't a 1 MB executable too much for such a simple program, regardless of the language?

A binary executable is not a simple data format. It is normally processed, and often altered, by an OS routine called the "dynamic linker" (not to be confused with the "linker", which combines assembled binaries into a single executable).
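The first line of the earlier strings output, /lib64/ld-linux-x86-64.so.2, is exactly such a dynamic linker. A dynamically linked ELF executable records that path in a PT_INTERP program header, and the kernel hands control to it before the program itself runs. Below is a minimal sketch that reads this header from the running executable. It assumes a 64-bit little-endian ELF file, as on the x86-64 Linux used throughout this post. A fully static binary, like the -static builds above, typically has no PT_INTERP and yields None.

```rust
use std::convert::TryInto;
use std::env;
use std::fs;

/// Program header type of the entry that names the dynamic linker.
const PT_INTERP: u32 = 3;

/// Reads `N` bytes at `off`, returning None instead of panicking if out of range.
fn le_bytes<const N: usize>(d: &[u8], off: usize) -> Option<[u8; N]> {
    d.get(off..off.checked_add(N)?)?.try_into().ok()
}

/// Returns the PT_INTERP path of a 64-bit little-endian ELF image, or None if
/// the image is not such a file or has no interpreter (e.g. a static binary).
fn elf_interpreter(d: &[u8]) -> Option<String> {
    // e_ident: magic, EI_CLASS = 2 (64-bit), EI_DATA = 1 (little-endian).
    if d.get(0..4)? != b"\x7fELF" || *d.get(4)? != 2 || *d.get(5)? != 1 {
        return None;
    }
    let phoff = u64::from_le_bytes(le_bytes(d, 0x20)?) as usize; // e_phoff
    let phentsize = u16::from_le_bytes(le_bytes(d, 0x36)?) as usize; // e_phentsize
    let phnum = u16::from_le_bytes(le_bytes(d, 0x38)?) as usize; // e_phnum
    for i in 0..phnum {
        let ph = phoff.checked_add(i.checked_mul(phentsize)?)?;
        if u32::from_le_bytes(le_bytes(d, ph)?) != PT_INTERP {
            continue; // p_type of this program header
        }
        let off = u64::from_le_bytes(le_bytes(d, ph + 8)?) as usize; // p_offset
        let size = u64::from_le_bytes(le_bytes(d, ph + 32)?) as usize; // p_filesz
        let raw = d.get(off..off.checked_add(size)?)?;
        // The path is stored NUL-terminated inside the segment.
        let path = raw.split(|&b| b == 0).next()?;
        return Some(String::from_utf8_lossy(path).into_owned());
    }
    None
}

fn main() {
    match env::current_exe().and_then(fs::read) {
        Ok(image) => match elf_interpreter(&image) {
            Some(path) => println!("dynamic linker: {}", path),
            None => println!("no PT_INTERP (static, or not a 64-bit little-endian ELF)"),
        },
        Err(e) => eprintln!("cannot read own executable: {}", e),
    }
}
```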
