Name mangling

In software compiler engineering, name mangling (more properly called name decoration, although this term is rarely used in practice) is a technique for giving programming entities unique names in the object code produced for many modern programming languages. The need arises where a language allows different entities to be named with the same identifier as long as they occupy different namespaces (where a namespace is typically defined by a module, class, or explicit namespace directive).

Name mangling in Microsoft Windows

While C, Pascal and other languages that do not support function overloading generally neither require nor use name mangling, some compilers for them still use it to encode additional information about a function. For example, compilers targeting Microsoft Windows support a variety of calling conventions, which determine how parameters are passed to subroutines and how results are returned. Because these conventions are mutually incompatible, compilers mangle symbols with codes describing the calling convention. The mangling scheme was established by Microsoft and has been informally followed by other compilers, including Borland's and GNU GCC. The scheme even applies to other languages, such as Pascal, Delphi and C#, allowing subroutines written in those languages to call, or be called by, existing Windows libraries that use a calling convention different from their default.

When compiling the following C examples:

int _cdecl    f(int x) { return 0; }
int _stdcall  g(int y) { return 0; }
int _fastcall h(int z) { return 0; }

_cdecl is the default for C functions, if no calling convention is stated explicitly.

32-bit compilers emit, respectively:

_f
_g@4
@h@4

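In the stdcall and fastcall mangling schemes, the function is encoded as _name@X and @name@X respectively, where X is the number of bytes, in decimal, of the argument(s) in the parameter list (including those passed in registers, for fastcall). The cdecl scheme simply prefixes the name with an underscore.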
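The decoration rules above are simple enough to express as a small function. The following C++ sketch is a minimal model that assumes 32-bit x86, where every stack argument occupies a multiple of 4 bytes; the names CallConv, slotBytes and decorate are hypothetical helpers, not part of any Microsoft API, and the sketch covers only the three C conventions shown here.

```cpp
#include <cstddef>
#include <string>

// Hypothetical model of the three calling conventions discussed above.
enum class CallConv { Cdecl, Stdcall, Fastcall };

// Rounds a parameter's size up to the 4-byte stack slot it occupies on
// 32-bit x86 (assumption: a char or short still counts as 4 bytes).
std::size_t slotBytes(std::size_t paramSize) {
    return (paramSize + 3) / 4 * 4;
}

// Builds the decorated symbol for a C function. argBytes is the total
// size of the parameter list in bytes; cdecl ignores it.
std::string decorate(const std::string& name, CallConv conv, std::size_t argBytes) {
    switch (conv) {
    case CallConv::Cdecl:
        return "_" + name;                                   // e.g. _f
    case CallConv::Stdcall:
        return "_" + name + "@" + std::to_string(argBytes);  // e.g. _g@4
    case CallConv::Fastcall:
        return "@" + name + "@" + std::to_string(argBytes);  // e.g. @h@4
    }
    return name;  // not reached for valid CallConv values
}
```

Under these assumptions, a stdcall function taking an int and a double would be decorated with @12 (4 + 8 bytes), since the byte count reflects the whole parameter list rather than the number of parameters.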

Name mangling in C++

C++ compilers are the most widespread, and yet least standard, users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers which produced machine code or assembler directly, the system's linker generally did not support C++ symbols, and mangling was still required.

Simple example

Consider the following two definitions of f() in a C++ program:

int f (void) { return 1; }
int f (int)  { return 0; }
void g (void) { int i = f(), j = f(0); }

These are distinct functions, with no relation to each other apart from the name. If they were naïvely translated into C with no changes, the result would be an error: C does not permit two functions with the same name. The compiler therefore encodes the type information in the symbol name, producing something resembling:

int __f_v (void) { return 1; }
int __f_i (int)  { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }

Notice that g() is mangled even though there is no conflict; name mangling applies to all symbols.

Complex example

For a more complex example, consider a real-world name mangling implementation: the one used by GNU GCC 3.x, applied to the following example class. The mangled symbol is shown below the respective identifier name.

#include <ostream>
#include <string>

namespace wikipedia {
class article {
public:
    std::string format (void);
        /* = _ZN9wikipedia7article6formatEv */

    bool print_to (std::ostream&);
        /* = _ZN9wikipedia7article8print_toERSo */

    class wikilink {
    public:
        wikilink (std::string const& name);
            /* = _ZN9wikipedia7article8wikilinkC1ERKSs */
    };
};
}

The name mangling scheme used here is relatively simple. All mangled symbols begin with _Z (identifiers beginning with an underscore followed by a capital letter are reserved in C and C++, so conflicts with user identifiers are avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length,id> pairs (the length being the number of characters in the identifier that follows), and finally E. For example, wikipedia::article::format becomes

_ZN·9wikipedia·7article·6format·E  

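For functions, this is then followed by the type information for the parameters (the return type is not encoded for ordinary, non-template functions). As format() takes no parameters, its parameter list is encoded simply as v (for void); hence: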

_ZN·9wikipedia·7article·6format·E·v

For print_to, a standard type std::ostream (or more properly std::basic_ostream<char, std::char_traits<char> >) is used, which has the special abbreviation So; a reference to this type is therefore RSo, and the complete name for the function is:

_ZN·9wikipedia·7article·8print_to·E·RSo
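The nested-name rules just described can be sketched in a few lines of C++. This is a deliberately small subset of the scheme, assuming the caller already knows the code for each parameter type (i, c, RSo and so on); it ignores substitutions, templates, const member functions and operator names, and mangleNested and sourceName are hypothetical helpers rather than functions from any real library.

```cpp
#include <string>
#include <vector>

// Encodes one identifier as <length><id>, e.g. "article" -> "7article".
std::string sourceName(const std::string& id) {
    return std::to_string(id.size()) + id;
}

// Mangles a function name: _Z, then N <source-names> E when the function is
// nested in namespaces/classes, then one code per parameter (v if none).
std::string mangleNested(const std::vector<std::string>& scopes,
                         const std::string& function,
                         const std::vector<std::string>& paramCodes) {
    std::string out = "_Z";
    const bool nested = !scopes.empty();
    if (nested) out += "N";
    for (const auto& scope : scopes) out += sourceName(scope);
    out += sourceName(function);
    if (nested) out += "E";
    if (paramCodes.empty()) {
        out += "v";  // an empty parameter list is encoded as void
    } else {
        for (const auto& code : paramCodes) out += code;
    }
    return out;
}
```

With no enclosing scopes the N...E wrapper is omitted, which gives the short forms such as _Z1hi for a free function h(int).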

How different compilers mangle the same functions

There is no standard scheme by which even trivial C++ identifiers are mangled, and consequently different compiler vendors (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different, and thus incompatible, ways. Consider how different C++ compilers mangle the same functions:

Compiler                      void h(int)           void h(int, char)      void h(void)
GNU GCC 3.x                   _Z1hi                 _Z1hic                 _Z1hv
GNU GCC 2.9x                  h__Fi                 h__Fic                 h__Fv
Intel C++ 8.0 for Linux       _Z1hi                 _Z1hic                 _Z1hv
Microsoft VC++ v6             ?h@@YAXH@Z            ?h@@YAXHD@Z            ?h@@YAXXZ
Borland C++ v3.1              @h$qi                 @h$qizc                @h$qv
OpenVMS C++ V6.5 (ARM mode)   H__XI                 H__XIC                 H__XV
OpenVMS C++ V6.5 (ANSI mode)  CXX$__7H__FI0ARG51T   CXX$__7H__FIC26CDH77   CXX$__7H__FV2CB06E8
OpenVMS C++ X7.1 IA-64        CXX$_Z1HI2DSQ26A      CXX$_Z1HIC2NP3LI4      CXX$_Z1HV0BCA19V
SunPro CC                     __1cBh6Fi_v_          __1cBh6Fic_v_          __1cBh6F_v_
HP aC++ A.05.55 IA-64         _Z1hi                 _Z1hic                 _Z1hv
HP aC++ A.03.45 PA-RISC       h__Fi                 h__Fic                 h__Fv
Tru64 C++ V6.5 (ARM mode)     h__Xi                 h__Xic                 h__Xv
Tru64 C++ V6.5 (ANSI mode)    __7h__Fi              __7h__Fic              __7h__Fv

Notes:

  • The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in The Annotated C++ Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became increasingly unsuitable: it could not encode certain function types, or it produced identical mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features but was not backward compatible.
  • On IA-64, a standard ABI exists (see external links) which defines, among other things, a standard name mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.x has, in addition, adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
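Toolchains that follow this ABI can also reverse the process at run time. GCC's libstdc++ and LLVM's libc++abi both provide abi::__cxa_demangle in <cxxabi.h>, which the sketch below wraps. It assumes one of those toolchains (Microsoft's tools instead offer UnDecorateSymbolName in DbgHelp), and demangle is a hypothetical wrapper name; the c++filt tool listed under External links performs the same transformation from the command line.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// __cxa_demangle returns a buffer allocated with malloc, so free it with free.
struct FreeDeleter {
    void operator()(char* p) const { std::free(p); }
};

// Returns the demangled form of an Itanium-ABI symbol, or the input
// unchanged if it is not a valid mangled name (e.g. a plain C symbol).
std::string demangle(const char* mangled) {
    int status = 0;
    std::unique_ptr<char, FreeDeleter> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status));
    if (status != 0 || !result) {
        return mangled;
    }
    return result.get();
}
```

On such a toolchain, demangle("_Z1hic") should yield h(int, char), and demangle("_ZN9wikipedia7article6formatEv") should yield wikipedia::article::format(); the return type does not appear because it was never encoded.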

Handling of C symbols when linking from C++

The job of the common C++ idiom:

#ifdef __cplusplus 
extern "C" {
#endif
    /* ... */
#ifdef __cplusplus
}
#endif

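is to ensure that the symbols declared within it are "unmangled": that the compiler emits a binary file with their names undecorated, as a C compiler would. Because C language definitions are unmangled, the C++ compiler must also avoid mangling references to these identifiers. The #ifdef __cplusplus guards hide the extern "C" lines from C compilers, which do not recognise them, so the same header can be used from both C and C++.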

For example, the standard string library header, <string.h>, usually contains something resembling:

#ifdef __cplusplus
extern "C" {
#endif
 
void *memset (void *, int, size_t);
char *strcat (char *, const char *);
int   strcmp (const char *, const char *);
char *strcpy (char *, const char *);
 
#ifdef __cplusplus
}
#endif

Thus, code such as:

if (strcmp(argv[1], "-x") == 0) 
    strcpy(a, argv[2]);
else 
    memset(a, 0, sizeof(a));

uses the correct, unmangled strcmp, strcpy and memset. If the extern "C" block had not been used, a C++ compiler using the SunPro mangling scheme (see the table above) would produce code equivalent to:

if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0) 
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else 
    __1cGmemset6FpviI_0_(a, 0, sizeof(a));

Since those symbols do not exist in the C runtime library (e.g. libc), link errors would result.
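The same idiom also works in the other direction: a function written in C++ can be given C linkage so that C code can call it. The sketch below is a minimal example assuming a GCC- or Clang-style toolchain that follows the Itanium C++ ABI; c_linkage_add and cpp_linkage_add are hypothetical names. Inspecting the compiled object with a tool such as nm should show c_linkage_add undecorated and cpp_linkage_add as _Z15cpp_linkage_addii, although the exact output depends on the platform (some targets, for instance, prefix C symbols with an underscore).

```cpp
// Defined in C++ but given C language linkage: the emitted symbol is the
// plain name, so C code can declare and call it directly.
extern "C" int c_linkage_add(int a, int b) {
    return a + b;
}

// An ordinary C++ function: its symbol is mangled. Under the Itanium C++ ABI
// the expected name is _Z15cpp_linkage_addii (15 = length of the identifier,
// ii = two int parameters).
int cpp_linkage_add(int a, int b) {
    return a + b;
}

// A C-linkage symbol carries no type information, so a second extern "C"
// function with the same name but different parameters is rejected, e.g.:
//   extern "C" double c_linkage_add(double a, double b);  // compile error
```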

Standardised name mangling in C++

While it is a relatively common belief that standardised name mangling in the C++ language would lead to greater interoperability between implementations, this is not really the case. Name mangling is only one of several ABI issues in a C++ implementation; other details such as exception handling, virtual table layout and structure padding would still render differing implementations incompatible. Further, requiring a particular form of mangling would cause problems on systems where implementation limits (e.g. on symbol length) dictate a particular mangling scheme. A standardised requirement for name mangling would also rule out an implementation where mangling was not required at all, for example one whose linker understood the C++ language.

The C++ standard therefore does not attempt to standardise name mangling, leaving this to other, third-party standards.

On the other hand, some implementors (including ARM, the CPU vendor) actively encourage compiler vendors whose ABIs are not otherwise compatible to use distinct name mangling schemes (in the absence of an agreed-upon ABI), so that incompatible binaries fail to link rather than appearing to be ABI-compatible. Other vendors work in the opposite direction, standardising the entire ABI, including other C++-related issues such as exception handling and virtual table layout.

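Some platforms also attach version information to symbols. GNU/Linux, for example, uses versioned symbols (displayed as, for example, memcpy@GLIBC_2.2.5), which let a shared library such as the GNU C library export several ABI-incompatible versions of the same function side by side.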

Real-world effects of C++ name mangling

As C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or, in many cases, different versions of the same compiler) produce such binaries under different name decoration schemes, so symbols are frequently unresolved if the compilers used to build the library and the program using it employed different schemes. For example, on a system with multiple C++ compilers installed (e.g. GNU GCC and the OS vendor's compiler), a library such as Boost would have to be compiled twice: once for the vendor compiler and once for GCC.

For this reason name decoration is an important aspect of any C++-related ABI.

Name mangling in Java

As the language, compiler, and .class file format were all designed together (and had object orientation in mind from the start), the primary problem solved by name mangling does not arise in implementations of the Java runtime. There are, however, cases where an analogous transformation and qualification of names is necessary.

Creating unique names for inner and anonymous classes

The scope of an inner class is confined to its enclosing class, so the compiler must produce a "qualified" public name for the inner class to avoid conflicts where other classes (inner or not) with the same name exist in the same namespace. Similarly, anonymous classes must have "fake" public names generated for them, as the concept of anonymous classes exists only in the compiler, not the runtime. So, compiling the following Java program

 public class foo {
   class bar {
     public int x;
   }
   public void zark (){
     Object f = new Object () {
       public String toString() {
         return "hello";
       }
     };
   }
 }

will produce three .class files:

  • foo.class, containing the main (outer) class foo
  • foo$bar.class, containing the named inner class foo.bar
  • foo$1.class, containing the anonymous inner class (local to method foo.zark)

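All of these class names are valid, as $ symbols are permitted in class names by the JVM specification. They are also reasonably "safe" for the compiler to generate: although the Java Language Specification does allow $ in identifiers, it states that $ should be used only in mechanically generated source code (or, rarely, to access pre-existing names on legacy systems), so hand-written classes seldom clash with these names.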
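The naming convention described above can be modelled in a few lines. The sketch below is written in C++ to match the other examples in this article; it assumes the Outer$Inner and Outer$N forms shown for foo$bar and foo$1, with anonymous classes numbered per enclosing class, and ClassNamer is a hypothetical helper, not part of any Java compiler's API. A real compiler may assign the numbers in a different order.

```cpp
#include <map>
#include <string>

// Hypothetical generator of binary class names for inner classes.
class ClassNamer {
public:
    // A named member class: Outer$Inner (stored in Outer$Inner.class).
    std::string named(const std::string& outer, const std::string& inner) const {
        return outer + "$" + inner;
    }

    // An anonymous class: Outer$1, Outer$2, ... counted per enclosing class.
    std::string anonymous(const std::string& outer) {
        return outer + "$" + std::to_string(++counters_[outer]);
    }

private:
    std::map<std::string, int> counters_;  // next anonymous index per class
};
```

Because the enclosing class's own generated name is used as the prefix, an anonymous class inside foo.bar would come out as foo$bar$1 under this model.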

Name resolution in Java is further complicated at runtime, because fully qualified class names are unique only within a specific classloader instance. When two different classloader instances (e.g. two different applets) contain classes with the same name, the runtime resolves the ambiguity, and keeps references between one and the other distinct, in an internal and generally undocumented manner.

Handling issues with the Java Native Interface

Java's native method support allows Java programs to call out to code written in another language (generally either C or C++). There are two name-resolution concerns here, neither of which is implemented in a particularly standard manner.

Name mangling in Python

A Python programmer can explicitly designate that an identifier is a "private name" (i.e. that its scope is confined to the class) by beginning it with two underscores. On encountering such a name inside a class body, the Python compiler rewrites it by prepending a single underscore and the name of the enclosing class. Names that also end with two underscores, such as __init__ or __doc__, are not mangled.

So, for example,

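class Test:
    def __privateSymbol(self):
        pass
    def normalSymbol(self):
        pass

# Hide the many special (double-underscore) attributes that dir() also lists.
print([name for name in dir(Test) if not name.startswith('__')])

will output:

 ['_Test__privateSymbol', 'normalSymbol']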
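The rewriting rule can be sketched as a small C++ function. It assumes CPython's behaviour: leading underscores are stripped from the class name before it is prepended, names that also end in two underscores (such as __init__) are left untouched, and a class whose name consists only of underscores causes no mangling. manglePrivateName is a hypothetical name used for illustration.

```cpp
#include <cstddef>
#include <string>

// Returns the name CPython would store for `name` when it appears inside
// the body of class `className`.
std::string manglePrivateName(const std::string& className, const std::string& name) {
    const std::size_t n = name.size();
    const bool startsWithDunder = n >= 2 && name[0] == '_' && name[1] == '_';
    const bool endsWithDunder = n >= 2 && name[n - 1] == '_' && name[n - 2] == '_';
    if (!startsWithDunder || endsWithDunder) {
        return name;  // not a private name: left unchanged
    }
    // Leading underscores of the class name are dropped before prepending.
    const std::size_t first = className.find_first_not_of('_');
    if (first == std::string::npos) {
        return name;  // class name made only of underscores: no mangling
    }
    return "_" + className.substr(first) + name;
}
```

Applied to the example above, __privateSymbol in class Test becomes _Test__privateSymbol, while normalSymbol passes through unchanged.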

Name mangling in Objective-C

Essentially two forms of method exist in Objective-C: the class ("static") method and the instance method. A method declaration in Objective-C has the following form:

+ (return type) name1: (type) parameter1 name2: (type) parameter2 ...
- (return type) name1: (type) parameter1 name2: (type) parameter2 ...

Class methods are signified by +, instance methods use -. A typical class method declaration may then look like:

+ (id) initWithX: (int) x andY: (int) y;
+ (id) new;

with instance methods looking like

- (id) value;
- (id) setValue: (id) new_value;

Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme:

_c_Class_methodname_name1_name2_ ...

for class methods, and

_i_Class_methodname_name1_name2_ ...

for instance methods.

The colons in the Objective-C syntax are translated to underscores. So, the class method initWithX:andY:, if it belonged to the Point class, would translate as _c_Point_initWithX_andY_, and the instance method value (belonging to the same class) would translate to _i_Point_value.
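The naming scheme above can be sketched as a short C++ function. It follows the scheme exactly as this section states it (a prefix, the class name, then the selector with each colon turned into an underscore); real Objective-C compilers may differ in detail, and objcMethodSymbol is a hypothetical helper.

```cpp
#include <algorithm>
#include <string>

// Builds the internal symbol for a method, per the scheme described above.
std::string objcMethodSymbol(bool isClassMethod, const std::string& className,
                             std::string selector) {
    // Each colon in the selector becomes an underscore.
    std::replace(selector.begin(), selector.end(), ':', '_');
    // _c_ marks a class (+) method, _i_ an instance (-) method.
    return std::string(isClassMethod ? "_c_" : "_i_") + className + "_" + selector;
}
```

A selector ending in a colon, such as setValue:, therefore produces a symbol ending in an underscore (_i_Point_setValue_).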

Each of the methods of a class is labeled in this way. However, looking up the methods a class may respond to would be tedious if every method were represented only in this textual form. Instead, each method name is assigned a unique symbol (such as an integer), known as a selector. In Objective-C, selectors can be managed directly: they have their own type, SEL.

During compilation, a table is built that maps each textual method name (such as value or initWithX:andY:) to a selector of type SEL. Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector matches only a method's name, not the class it belongs to: different classes can have different implementations of a method with the same name. Because of this, each implementation of a method is given its own identifier too; these are known as implementation pointers and have a type of their own, IMP.

Message sends are encoded by the compiler as calls to the id objc_msgSend(id receiver, SEL selector, ...) function, or one of its cousins, where receiver is the receiver of the message and selector determines the method to call. Each class has its own table that maps selectors to their implementations; the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the SEL to IMP lookup tables, the functions are essentially anonymous.

The SEL value for a selector does not vary between classes. This enables polymorphism.
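The relationship between selectors, per-class tables and message dispatch can be illustrated with a toy model. The C++ sketch below is not the Objective-C runtime API: Sel, Imp, SelectorTable and ToyClass are hypothetical stand-ins that only mimic the ideas described above, namely that one selector is shared by every class while each class maps it to its own implementation.

```cpp
#include <functional>
#include <string>
#include <unordered_map>

using Sel = int;                           // stand-in for SEL
using Imp = std::function<std::string()>;  // stand-in for IMP

// Interns method names: the same name always yields the same Sel,
// regardless of which class will implement it.
class SelectorTable {
public:
    Sel registerName(const std::string& name) {
        auto it = ids_.find(name);
        if (it != ids_.end()) {
            return it->second;
        }
        const Sel sel = static_cast<Sel>(ids_.size());
        ids_.emplace(name, sel);
        return sel;
    }

private:
    std::unordered_map<std::string, Sel> ids_;
};

// Each class owns its own Sel -> Imp table.
struct ToyClass {
    std::string name;
    std::unordered_map<Sel, Imp> methods;

    // Rough analogue of objc_msgSend: dispatch on the selector alone.
    std::string send(Sel sel) const {
        auto it = methods.find(sel);
        if (it == methods.end()) {
            return name + " does not respond";
        }
        return it->second();
    }
};
```

Registering "value" twice yields the same Sel, and sending that Sel to two different ToyClass objects runs two different implementations, which is the polymorphism described above.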

The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.

Since Objective-C does not support namespaces, class names (which do appear as symbols in generated binaries) need no mangling.

External links

  • Linux Itanium ABI for C++ (http://www.codesourcery.com/cxx-abi/abi.html#mangling), including name mangling scheme.
  • c++filt (http://sources.redhat.com/binutils/docs-2.15/binutils/c--filt.html) — filter to demangle encoded C++ symbols
  • Citations from CiteSeer (http://citeseer.ist.psu.edu/cis?q=Name+and+mangling)
  • The Objective-C Runtime System (http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/4objc_runtime_overview/chapter_4_section_1.html#//apple_ref/doc/uid/20001425=) — From Apple's The Objective-C Programming Language (http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/)
