# Understanding the role of `SONAME` in dynamic linkage and dynamic loading

Note that this is an ELF feature (`SONAME` is an entry of the ELF dynamic section); everything below describes its usage on Linux.

## The primary role of `SONAME`

Say *we* are a library `libfoo.so.3.4`, and some dependent binary artifact is being linked against us.

- Note that this dependent binary artifact may well be a final executable, _à la_ `./main`;
- but it may just as well be another library acting as "middleware", such as `libbar.so -> libfoo.so…`. Such a library may be a shared/dynamic library, or a static library. I haven't looked into the latter case, since it is not relevant here (and in general, things can get weird when implicitly mixing up these linkage modes).

Then, by having a `SONAME` attribute among our dynamic-section metadata (as shown by `objdump -p` or `readelf -d`), with a value of, say, `libfoo.so.3`, we are telling the linker that the name the dependent binary should use to refer back to us (_later on!_) ought to be `libfoo.so.3`, rather than whatever filesystem path had been used to refer to us. Usually, that path would have been a `libfoo.so` symlink pointing back to us, discovered either because we were in one of the system's default library directories, such as `/usr/lib`, or through a specific `-L` directory lookup flag having been provided to the linker.

The result of this can be observed in the `NEEDED` entry of the dependent binary:

- should there not have been a `SONAME`, the value here would have resulted from the filesystem path or whatnot:
  - if the library was referred to through the typical `-l foo` (+ optional `-L` and/or `/usr/lib` location having been used), then the resulting `NEEDED` "path"/name being used is `libfoo.so`, _unqualified_ (no `/` in that path).
  - but if the library was referred to _via_ an explicit `some/relative/or/abs/path/to/libfoo.so…` argument, then the resulting `NEEDED` path being used becomes that very `some/relative/or/abs/path/to/libfoo.so…`, verbatim.
- otherwise, if there was a `SONAME` set in `libfoo.so.3.4` (_via_ a `-soname=` _linker_ flag, _i.e._, a `-Wl,-soname=` `cc` linkage flag (`LDFLAGS`)), then the resulting `NEEDED` path being used is overridden to become the `SONAME` value itself (`libfoo.so.3` in our example), independently of whichever filepath and whatever reference method (`-lfoo` _vs._ `libfoo.so…`) had been used to refer to our `libfoo.so.3.4` file.

### The role of `NEEDED`

The `NEEDED` entries of a binary are then used in two (or two-and-a-half) cases:

- when _running_/_executing_ a `./main` standalone executable, the dynamic loader (`ld.so`, Linux's default dynamic-linkage runtime) gets implicitly invoked ("automatic" dynamic loading), tasked with looking up the dynamic libraries on which `./main` was declared to depend at link time, such as `libfoo`; it **tries to locate each of them using, as its "path", the value of the corresponding `NEEDED` entry** (so as to then _load_ the library and make it available to the current binary);
- similarly, when dynamically loading some `libbar.so` dynamic library, either through the mechanism of the previous bullet ("automatic" dynamic loading) or through explicit usage of the _fully_-dynamic loading machinery (`dlopen`, `LoadLibrary`, …), "automatic" dynamic loading is triggered here as well, to load the next layer of _transitive_ shared library dependencies.

  For instance, assuming `libbar.so` had been link-time-specified to depend on `libfoo.so` (a.k.a. "`libbar.so` had been 'linked against' `foo`"), the dynamic loader will be automatically invoked to look up and load some `libfoo.so`, at whichever path is specified by the `NEEDED` entry for `foo` within the dynamic section of the `libbar.so` file.

### Rationale

The main rationale and objective of this design is to enable some form of "smart SemVer" for shared libraries within a system, based on the following Linux sysadmin assumptions:

- every specific `libfoo.so.3.4` kind of artifact, when being produced, is given a `SONAME` of `libfoo.so.3`, that is, with every non-major version number stripped. This makes the artifact "back-reference" an eponymous `libfoo.so.3` file assumed to be in scope.
- Whenever such an artifact is "installed" in a proper directory, the following symlinks are to be updated:
  - `ln -sf libfoo.so.3.<n> libfoo.so.3`, with `<n>` representing, at the time of running that command, the highest `n` such that `libfoo.so.3.<n>` exists;
  - and likewise, `ln -sf libfoo.so.<latest> libfoo.so`, with `<latest>` being the highest version installed overall.

That way, we ought to end up with the following setup:

![image](https://gist.github.com/user-attachments/assets/f2a0d34d-9d87-4e16-a7b5-d81b696c33c8)

which, in the future, could become:

![image](https://gist.github.com/user-attachments/assets/7c19f4b1-0f42-4a08-905d-d53a8be1ea50)

> [!NOTE]
> There appears to be a special interaction between changing/updating these things and the "`ld.so` cache" (`/etc/ld.so.cache`), which is a memoized resolution of which libraries to load for a given name, and from where, across all the mess of different directory priorities and `SONAME`s and whatnot.
> I haven't found a single source which clearly explains this, and have not found it worth it to delve into this further.
> Just beware, you may need to "refresh the cache" (by running `ldconfig`) when installing new shared libraries, that's all.
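
The symlink-update rule above (point `libfoo.so.3` at the highest installed `libfoo.so.3.<n>`) boils down to a small selection over file names. Below is a minimal C sketch of that selection only; `pick_latest_minor` is a hypothetical helper, not part of any real tool, and it assumes names of the exact form `<stem>.<major>.<minor>` with no patch component. On real systems, `ldconfig` creates the `SONAME`-named links itself.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: among `names`, find the file of the form
 * "<stem>.<major>.<minor>" (e.g. "libfoo.so.3.5") with the requested major
 * version and the highest minor version, i.e. the file that the
 * "<stem>.<major>" symlink (e.g. "libfoo.so.3") should point to.
 * Returns the index into `names`, or -1 if there is no such file.
 * Assumes exactly two numeric components after the stem (no patch level).
 */
int pick_latest_minor(const char *const names[], size_t count,
                      const char *stem, long major)
{
    int best = -1;
    long best_minor = -1;
    size_t stem_len = strlen(stem);

    for (size_t i = 0; i < count; i++) {
        const char *name = names[i];
        if (strncmp(name, stem, stem_len) != 0 || name[stem_len] != '.')
            continue; /* not "<stem>.<something>" */

        char *end;
        const char *p = name + stem_len + 1;
        long maj = strtol(p, &end, 10); /* parse "<major>" */
        if (end == p || *end != '.' || maj != major)
            continue; /* e.g. the "libfoo.so.3" symlink itself, or another major */

        p = end + 1;
        long minor = strtol(p, &end, 10); /* parse "<minor>" */
        if (end == p || *end != '\0')
            continue;

        if (minor > best_minor) {
            best_minor = minor;
            best = (int)i;
        }
    }
    return best;
}
```

With the later state from the diagrams (`3.4`, `3.5`, `4.0` and `4.1` installed), this selects `libfoo.so.3.5` for major `3` and `libfoo.so.4.1` for major `4`, matching the `ln -s` edges.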

#### Application

##### (Dynamic) linkage of `./main` at some time `t0`, with the latest `libfoo` being `libfoo.so.3.4`

Now, say we are to link against `-lfoo`, with `libfoo.so.3.4` being the highest version of `libfoo` installed on the system at this time:

![image](https://gist.github.com/user-attachments/assets/6a09484a-23cb-4c32-88d7-551c30e9ec2d)

Graphviz source:

```dot
digraph {
    graph [fontname = "courier" style=dotted];
    node [fontname = "courier" shape=box];
    edge [fontname = "courier"];
    rankdir = BT

    subgraph cluster_filesystem {
        labelloc = b
        label = "in the FS (say 3 is the *latest* major version right now (time of linkage))"
        libfoo [label = "libfoo.so"]
        libfoo -> "libfoo.so.3.4" [
            label = "ln -s\n(2)"
            style = dashed
        ]
        "libfoo.so.3" -> "libfoo.so.3.4" [
            color = red
            label = " SONAME\n(3)"
            dir = back
        ]
        "libfoo.so.3" -> "libfoo.so.3.4" [
            label = " ln -s\n(unused atm) "
            style = dashed
        ]
    }

    subgraph cluster_compilation {
        label = "linkage of\ndependent binary"
        main -> libfoo [
            label = "-l foo\n(1)"
            style = dotted
        ]
    }

    main -> "libfoo.so.3" [
        color = red
        label = "NEEDED\n(result!)"
    ]

    {
        node [style = invis]
        {rank = same; a b }
        {rank = same; c d }
        a -> b [ style = dashed label = temporary ]
        c -> d [ label = permanent ]
        c -> a [style = invis]
    }
}
```
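
The link-time rules from the *The primary role of `SONAME`* section can be condensed into a small model. The C sketch below is hypothetical (`needed_entry` and `enum link_ref` are illustration-only names, not a linker API) and encodes the rules exactly as stated in this document rather than the linker's actual implementation; it also reports whether the resulting entry contains a `/`, which matters for how the loader will later treat it.

```c
#include <stdio.h>
#include <string.h>

/* How the library was designated on the link command line. */
enum link_ref {
    REF_DASH_L, /* `-l foo` (plus optional -L dirs): lookup by name */
    REF_PATH    /* explicit path argument, e.g. relative/path/to/libfoo.so */
};

/*
 * Hypothetical model of which string ends up in the dependent binary's
 * NEEDED entry, following the rules described above.
 *   soname: the library's SONAME, or NULL if it has none
 *   ref:    how the library was referred to at link time
 *   arg:    the `-l` name (e.g. "foo") or the path exactly as given
 * Writes the entry into `out` and returns 1 if it contains no '/' (the
 * loader will search for it by name), or 0 if it will be used as a path.
 */
int needed_entry(const char *soname, enum link_ref ref, const char *arg,
                 char *out, size_t out_size)
{
    if (soname != NULL)
        snprintf(out, out_size, "%s", soname);    /* SONAME overrides everything */
    else if (ref == REF_DASH_L)
        snprintf(out, out_size, "lib%s.so", arg); /* bare, unqualified file name */
    else
        snprintf(out, out_size, "%s", arg);       /* path kept verbatim */
    return strchr(out, '/') == NULL;
}
```

For the linkage at `t0` (`-l foo`, with `libfoo.so.3.4` carrying the `SONAME` `libfoo.so.3`), `needed_entry("libfoo.so.3", REF_DASH_L, "foo", buf, sizeof buf)` writes `libfoo.so.3` into `buf`, matching the red `NEEDED` edge of the diagram.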

All this machinery, definitions, and assumptions make it so our dependent binary, `./main` in the example, ends up `NEEDED`-referring to `libfoo.so.3`.

And now, imagine having installed newer versions of `libfoo.so…`, such as `3.5` (minor bump) and `4.…` (major bump).

##### Runtime! Executing `./main` at some later point `t1`, with the latest `libfoo` being `libfoo.so.4.1`

We end up with:

![image](https://gist.github.com/user-attachments/assets/34dc76e6-acba-4d2a-a5c8-98c8f8af1624)

Graphviz source:

```dot
digraph {
    graph [fontname = "courier" style=dotted];
    node [fontname = "courier" shape=box];
    edge [fontname = "courier"];
    rankdir = BT

    subgraph cluster_runtime {
        label = "at runtime: `./main`\nTime for the `ld` *loader* to kick in."
        main
    }

    main -> "libfoo.so.3" [
        label = " NEEDED\n(1)"
        color = red
    ]

    subgraph cluster_filesystem {
        labelloc = b
        "libfoo.so.3.5"
        "libfoo.so.4.0"
        label = "in the FS (say 4 is the *latest* major version now)"
        libfoo [label = "libfoo.so"]
        libfoo -> "libfoo.so.4.1" [
            label = "ln -s"
            style = dashed
        ]
        "libfoo.so.3.4"
        "libfoo.so.3.5" [ color = red ]
        "libfoo.so.3" -> "libfoo.so.3.5" [
            label = "ln -s \n(2)"
            style = dashed
            color = red
        ]
        "libfoo.so.3" -> "libfoo.so.3.5" [ label = " SONAME" dir = back ]
        "libfoo.so.3" -> "libfoo.so.3.4" [ label = "SONAME" dir = back ]
        "libfoo.so.4" -> "libfoo.so.4.0" [ label = "SONAME" dir = back ]
        "libfoo.so.4" -> "libfoo.so.4.1" [ label = "ln -s" style = dashed ]
        "libfoo.so.4" -> "libfoo.so.4.1" [ label = "SONAME" dir = back ]
    }
}
```

Notice how the binary properly loads a `minor`-compatible (_i.e._, API- and ABI-compatible) bumped version of `libfoo.so`, without falling into the trap of loading a `major`-incompatible (_e.g._, with some API or ABI incompatibility) version thereof!

## The secondary role of `SONAME`, a happy(?) byproduct: "fixing linkage-path sensitivity"

Go back and read very carefully the rules of the *The primary role of `SONAME`* section, but focusing on the difference between specifying the `libfoo.so` dependency as `-lfoo` (+ optionally, `-L .` or `-L /abs/path`) _vs._ as `libfoo.so | ./libfoo.so | /abs/path/to/libfoo.so`.

And now, consider the rules of `dlopen()` resolution:

- if the string given to `dlopen` is a mere library name, _identified by the lack of `/` in it!_, then all the shenanigans about dynamic library location ensue (`RPATH`, else `LD_LIBRARY_PATH`, else `RUNPATH`, else system directories, …).
- otherwise, the library is assumed to be located at the given path, with relative paths being _resolved relative to the current working directory_. That is, the working directory of whoever ran the original `./main` command, or even elsewhere if it changed before some explicit call to `dlopen()`. This working directory very much does not have to be the directory containing the `main` binary! (_e.g._, if the caller runs a `subdir/main` command, then the working directory is the _parent_ of `main`'s directory.)

> [!TIP]
> There is a special magical token on Linux, `$ORIGIN`, understood by the dynamic loader in `RPATH`/`RUNPATH` (and even `NEEDED`) entries, which acts like `./` but is resolved _relative to the directory containing the binary carrying that entry_ (be it the final executable or a shared library). This is what allows a deployment to follow a Windows-like pattern of packaging an app as a same-dir bundle, _à la_ `dir/{main,libfoo.so}`.

And this is where `SONAME` can then come and either ruin the day, or save it.
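
The two `dlopen()` resolution rules above can be mirrored by a small helper. This is a hypothetical sketch (`effective_lookup` is not a libc function): a name without `/` is left to the library search, while a relative path is simply joined onto the current working directory, obtained with POSIX `getcwd()`, which is exactly what makes the result depend on where the user launched the program from.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the dlopen() rules described above.
 * If `name` contains no '/', returns 0 and leaves `out` empty: the usual
 * search (RPATH, LD_LIBRARY_PATH, RUNPATH, system directories, …) applies.
 * Otherwise writes the path that would actually be opened into `out` and
 * returns 1: absolute paths as-is, relative paths joined onto the *current
 * working directory* (not the executable's directory).
 * Returns -1 if the working directory cannot be determined.
 */
int effective_lookup(const char *name, char *out, size_t out_size)
{
    out[0] = '\0';
    if (strchr(name, '/') == NULL)
        return 0; /* bare name: search-path lookup */

    if (name[0] == '/') {
        snprintf(out, out_size, "%s", name); /* absolute: used verbatim */
        return 1;
    }

    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == NULL)
        return -1;
    snprintf(out, out_size, "%s/%s", cwd, name); /* relative: CWD-based */
    return 1;
}
```

For instance, if a user sitting in a hypothetical `/home/me` runs `subdir/main` and a relative `relative/path/to/libfoo.so` comes into play, the helper yields `/home/me/relative/path/to/libfoo.so`, not a path under `subdir/`.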

### Application: how `SONAME` can fix an improper linkage specification

Let's consider the case of `libbar.so -> libfoo.so` (we won't care about version numbers here).

Now, imagine some app `dlopen()`ing `libbar.so`, using whichever path/reference makes _that_ direct layer of loading succeed.

Since `libbar.so` "had been linked against `libfoo.so`" (_i.e._, since there had been a link-time specification of `libbar` depending on `libfoo`, resulting in _some_ `NEEDED` entry inside `libbar.so` referring back to some _path or identifier_ of `libfoo.so`), the dynamic loader is not finished: it now needs to load `libfoo.so`. And to do so, it "simply" proceeds to do, in pseudo-code, a `dlopen(libbar.dynamic_deps[foo].NEEDED)`, _i.e._, it acts as if `dlopen()`ing the _path or identifier_ specified in its `NEEDED` entry for `foo`.

So, in the case where `libbar.so` had been "linked against `libfoo`" using some command line along the lines of `relative/path/to/libfoo.so` (rather than `-lfoo`, `-L relative/path/to`), _the default `NEEDED` value for `libfoo` in `libbar.so` is `relative/path/to/libfoo.so` rather than `libfoo.so`_.

Now, we have two possibilities:

- #### If `libfoo.so` has no `SONAME` to override the effective/actual `NEEDED` being used

  then `NEEDED` ends up being `relative/path/to/libfoo.so`.

  So, back to our `dlopen()`ing of `libbar.so`, and its transitive `dlopen()`, we end up with:

  ```c
  dlopen("relative/path/to/libfoo.so", RTLD_NOW); /* in effect (flags only illustrative) */
  ```

  which is Bad™ because, again, this is a relative path, resolved relative to the completely arbitrary working directory of the user.

  - Best case scenario, `libfoo.so` is not found, `dlopen`ing fails, and the program probably exits there and then, having failed.
  - Worse scenario, the program did not check for success of `dlopen()`, and it starts using a `NULL` pointer:
    - it probably segfaults;
    - it could lead to a remote code execution vulnerability, in some very contrived and sophisticated attack scenario.
  - Worst case scenario, there is a malicious `libfoo.so` located there, and control of the program is hijacked in an incredibly simple and trivial way.

- #### If `libfoo.so` does have a `SONAME` to override the effective/actual `NEEDED` being used

  and assuming that `SONAME` to be a sane _library identifier_, _i.e._, with no `/`s involved in the path, then we end up with:

  ```c
  dlopen("libfoo.so", RTLD_NOW); /* or "libfoo.so.3", or whatever the SONAME says */
  ```

  and the usual dynamic-library lookup rules ensue, and all is Fine™.
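
To check which of the two cases a given `libbar.so` falls into, one can list its `NEEDED` entries (what `readelf -d` shows) and look for a `/`. The C sketch below does that with the standard `<elf.h>` types; `print_needed` is a hypothetical helper, and it assumes a 64-bit ELF file in the host's byte order, reads the section headers (the loader itself goes through the `PT_DYNAMIC` program header instead), and performs no bounds checking, so it is only meant for trusted files.

```c
#include <elf.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: print the SONAME and NEEDED entries of a 64-bit ELF
 * file, flagging NEEDED entries that contain a '/', since the loader treats
 * those as paths (relative ones being resolved against the CWD).
 * Returns the number of NEEDED entries, or -1 on error / no dynamic section.
 */
int print_needed(const char *file)
{
    int fd = open(file, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    const unsigned char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping remains valid after close() */
    if (base == MAP_FAILED)
        return -1;

    int count = -1;
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 && eh->e_shoff != 0) {
        const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
        for (int i = 0; i < eh->e_shnum && count < 0; i++) {
            if (sh[i].sh_type != SHT_DYNAMIC)
                continue;
            /* A dynamic section's sh_link is the index of its string table. */
            const char *strtab =
                (const char *)(base + sh[sh[i].sh_link].sh_offset);
            count = 0;
            for (const Elf64_Dyn *d = (const Elf64_Dyn *)(base + sh[i].sh_offset);
                 d->d_tag != DT_NULL; d++) {
                if (d->d_tag == DT_SONAME) {
                    printf("SONAME  %s\n", strtab + d->d_un.d_val);
                } else if (d->d_tag == DT_NEEDED) {
                    const char *name = strtab + d->d_un.d_val;
                    count++;
                    printf("NEEDED  %s%s\n", name,
                           strchr(name, '/') ? "  <- contains '/': used as a path" : "");
                }
            }
        }
    }
    munmap((void *)base, size);
    return count;
}
```

Run on a `libbar.so` linked as in the first case, it would print a `NEEDED  relative/path/to/libfoo.so` line flagged as a path; in the second case, the entry would be the bare `SONAME` instead.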