In order to support its strong typing rules and the ability to provide function overloading, the C++ programming language encodes information about functions and objects, so that conflicts across object files can be detected during linking. (This encoding is also sometimes called, whimsically enough, mangling; the corresponding decoding is sometimes called demangling.) These rules tend to be unique to each individual implementation of C++.
The commentary for section 7.2.1 of The Annotated Reference Manual (ARM) describes a possible implementation that closely resembles the one used by the cfront compiler. The design used in GNU C++ differs from this model in a number of ways:
In addition to the basic types void, char, short, int, long, float, double, and long double, GNU C++ supports two additional types: wchar_t, the wide character type, and long long (if the host supports it). The encodings for these are `w' and `x' respectively.
According to the ARM, qualified names (e.g., `foo::bar::baz') are encoded with a leading `Q', followed by the number of qualifications (in this case, three) and the respective names; this example might be encoded as `Q33foo3bar3baz'. GNU C++ adds a leading underscore to the list, producing `_Q33foo3bar3baz'.
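The qualified-name rule above can be sketched in a few lines. This is only an illustration built from the example in the text, not code taken from the compiler: the helper names `encode_name' and `encode_qualified' are hypothetical, and the sketch assumes fewer than ten qualifications, since the text does not say how larger counts are written.

```cpp
#include <string>
#include <vector>

// Length-prefixed encoding of one identifier, as in the text: "foo" -> "3foo".
std::string encode_name(const std::string& name) {
    return std::to_string(name.size()) + name;
}

// Hypothetical helper: builds the encoding of a qualified name.
// gnu_style == true adds the leading underscore described for GNU C++;
// gnu_style == false gives the form the text attributes to the ARM.
// Assumption: fewer than ten components (single-digit count).
std::string encode_qualified(const std::vector<std::string>& parts,
                             bool gnu_style = true) {
    std::string out = gnu_style ? "_Q" : "Q";
    out += std::to_string(parts.size());
    for (const std::string& part : parts) {
        out += encode_name(part);
    }
    return out;
}
```

Calling `encode_qualified({"foo", "bar", "baz"})' reproduces the `_Q33foo3bar3baz' string given above, and passing `false' as the second argument reproduces the ARM form.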
The operator `*=' is encoded as `__aml', not `__amu', to match the normal `*' operator, which is encoded as `__ml'.
In addition to the normal operators, GNU C++ also offers the maximum and minimum operators `>?' and `<?', encoded as `__mx' and `__mn', and the conditional operator `?:', encoded as `__cn'.
Constructors are encoded as simply `__name', where name is the encoded name (e.g., `3foo' for the foo class constructor). Destructors are encoded as two leading underscores separated by either a period or a dollar sign, depending on the capabilities of the local host, followed by the encoded name. For example, the destructor `foo::~foo' is encoded as `_$_3foo'.
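As a rough illustration of the constructor and destructor rules just described, the following sketch builds both encodings for an unqualified class name. The function names are hypothetical, and the choice of separator is passed in as a parameter because the text says only that it depends on the host.

```cpp
#include <string>

// Hypothetical helper: "__" followed by the length-prefixed class name.
std::string encode_constructor(const std::string& cls) {
    return "__" + std::to_string(cls.size()) + cls;
}

// Hypothetical helper: two underscores separated by '$' or '.',
// followed by the length-prefixed class name.
// The separator is an input here; the text says it depends on the host.
std::string encode_destructor(const std::string& cls, char sep = '$') {
    std::string out = "_";
    out += sep;
    out += "_";
    out += std::to_string(cls.size()) + cls;
    return out;
}
```

With these definitions, `encode_destructor("foo")' gives the `_$_3foo' string from the example above, and `encode_destructor("foo", '.')' gives the period-separated variant.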
Virtual tables are encoded with a prefix of `_vt', rather than `__vtbl'. The names of their classes are separated by dollar signs (or periods), and not encoded as normal: the virtual table for foo is `__vt$foo', and the table for foo::bar is named `__vt$foo$bar'.
Static members are encoded as a leading underscore, followed by the encoded name of the class in which they appear, a separating dollar sign or period, and finally the unencoded name of the variable. For example, if the class foo contains a static member `bar', its encoding would be `_3foo$bar'.
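The static member rule can be written out the same way. This is a minimal sketch for a class with an unqualified name; `encode_static_member' is a hypothetical name, and the separator is again left to the caller.

```cpp
#include <string>

// Hypothetical helper: "_" + length-prefixed class name + separator
// + the member name left unencoded, as described in the text.
// Assumption: the class name is unqualified (no nested scopes).
std::string encode_static_member(const std::string& cls,
                                 const std::string& member,
                                 char sep = '$') {
    std::string out = "_";
    out += std::to_string(cls.size()) + cls;
    out += sep;
    out += member;
    return out;
}
```

For the example in the text, `encode_static_member("foo", "bar")' yields `_3foo$bar'.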
GNU C++ is less aggressive than other compilers about always generating `Fv' for functions with no arguments. In particular, the compiler does not add the sequence to conversion operators. The function `foo::bar()' is encoded as `bar__3foo', not `bar__3fooFv'.
The argument list for methods is not prefixed by a leading `F'; it is considered implied.
GNU C++ approaches the task of saving space in encodings differently from the approach described in the ARM. It does use the `Tn' and `Nxy' codes to signify copying the nth argument's type, and making the next x arguments be the type of the yth argument, respectively. However, the values for n and y begin at zero with GNU C++, whereas the ARM describes them as starting at one. For the function `foo (bartype, bartype)', GNU C++ uses `foo__7bartypeT0', while compilers following the ARM example generate `foo__7bartypeT1'.
GNU C++ does not bother using the space-saving methods for types whose encoding is a single character (like an integer, encoded as `i'). This helps in the most common cases: with the space-saving codes, two ints would be encoded using three characters, instead of just `ii'.
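The two points about space-saving codes can be illustrated together. The sketch below is an assumption-laden simplification rather than the compiler's algorithm: it takes argument types that are already encoded, handles only the `Tn' code (not `Nxy'), and exposes the starting index and the single-character exception as parameters so the two conventions can be compared. The name `encode_args' is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: joins already-encoded argument types, replacing a
// repeated type with "T<n>", where n refers to the first argument that had
// the same type.
//   first_index       0 for the numbering the text describes for GNU C++,
//                     1 for the numbering it attributes to the ARM.
//   skip_single_char  when true, one-character encodings such as "i" are
//                     never replaced, as the text describes for GNU C++.
// Simplification: the "Nxy" run-length code is not implemented.
std::string encode_args(const std::vector<std::string>& encoded_args,
                        int first_index,
                        bool skip_single_char) {
    std::string out;
    for (std::size_t i = 0; i < encoded_args.size(); ++i) {
        const std::string& type = encoded_args[i];

        // Find the earliest previous argument with the same encoded type.
        std::size_t prev = 0;
        while (prev < i && encoded_args[prev] != type) {
            ++prev;
        }
        const bool repeated = prev < i;
        const bool exempt = skip_single_char && type.size() == 1;

        if (repeated && !exempt) {
            out += "T" + std::to_string(prev + first_index);
        } else {
            out += type;
        }
    }
    return out;
}
```

Appending `encode_args({"7bartype", "7bartype"}, 0, true)' to `foo__' gives the `foo__7bartypeT0' string quoted above, and a `first_index' of 1 gives `foo__7bartypeT1'. For two ints, the call with `skip_single_char' set returns `ii', while clearing it returns the three-character `iT0'.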