vtables and vptrs for C++ run-time polymorphism

A function pointer table is an effective technique for creating O(1) dispatch over a set of functions.


    // FunctionA..FunctionD are assumed to be declared elsewhere as void f(size_t).
    void (*pDispatchTable[4])(size_t arg) = {
        &FunctionA,
        &FunctionB,
        &FunctionC,
        &FunctionD
    };
  

Applying the clockwise spiral rule, this declares `pDispatchTable` as an array of 4 pointers to functions that take a `size_t` argument and return `void`.
In short, the array is a function jump table allowing function dispatch at run-time.
The vtable (virtual table) is a similar concept to a function pointer table, except that it is generated by the compiler for classes with virtual methods.

A Naive Example for Demonstration


    #include <cstdint>  // uint32_t

    class Person {
      public:
        Person() = default;
        Person(const Person& p) = delete;
        virtual void print() {}

      protected:
        uint32_t age;
        bool gender;
    };

    class Citizen : public Person {
      public:
        Citizen() = default;
        Citizen(const Citizen& c) = delete;

        // Overrides Person::print(); final forbids further overriding.
        virtual void print() override final {}

      protected:
        uint32_t id;
        uint32_t income;
    };
  

Consider the virtual method `print`.
Calling `print()` through a `Person` pointer or reference that actually refers to a `Citizen` object uses the vtable to determine at run time which version of `print()` to execute.

Dump Class Layout at Compile Time - Leveraging Clang's AST and IRgen Dumps


    lhan@luyaosmac src % make dump
    c++ -Xclang -fdump-vtable-layouts -Xclang -fdump-record-layouts  main.cpp

    *** Dumping AST Record Layout
             0 | class Person
             0 |   (Person vtable pointer)
             8 |   uint32_t age
            12 |   _Bool gender
               | [sizeof=16, dsize=13, align=8,
               |  nvsize=13, nvalign=8]

    *** Dumping AST Record Layout
             0 | class Citizen
             0 |   class Person (primary base)
             0 |     (Person vtable pointer)
             8 |     uint32_t age
            12 |     _Bool gender
            16 |   uint32_t id
            20 |   uint32_t income
               | [sizeof=24, dsize=24, align=8,
               |  nvsize=24, nvalign=8]

    *** Dumping IRgen Record Layout
  

The compiler has added an 8-byte vtable pointer (vptr) at the beginning of the `Person` class. The member variables of both classes are laid out sequentially in memory after the vptr.
The vptr does not appear in the class declaration: it is a hidden member variable that points to the vtable for the class.

Where Is the Function Table - Locating the vtable


    lhan@luyaosmac src % objdump -t executable --demangle

    executable:	file format mach-o arm64

    SYMBOL TABLE:
    0000000100000494  w    F __TEXT,__text .hidden Citizen::Citizen()
    00000001000004c0  w    F __TEXT,__text .hidden Citizen::Citizen()
    00000001000004fc  w    F __TEXT,__text .hidden Person::Person()
    0000000100000520  w    F __TEXT,__text .hidden Citizen::print()
    0000000100000530  w    F __TEXT,__text .hidden Person::print()
    0000000100000540  w    O __TEXT,__const .hidden typeinfo name for Citizen
    0000000100000549  w    O __TEXT,__const .hidden typeinfo name for Person
    0000000100004000  w    O __DATA_CONST,__const .hidden vtable for Citizen
    0000000100004018  w    O __DATA_CONST,__const .hidden typeinfo for Person
    0000000100004028  w    O __DATA_CONST,__const .hidden typeinfo for Citizen
    0000000100004040  w    O __DATA_CONST,__const .hidden vtable for Person
    0000000000000000      d  *UND*
    0000000000000000      d  *UND* /Users/lhan/Projects/vtable-dbg/src/
    0000000000000000      d  *UND* main.cpp
    000000006987021f      d  *UND* /var/folders/fw/s396y35s0t5g__pl427xz9yr0000gn/T/main-fc4a8a.o
    0000000100000448      d  *UND*
    0000000100000448      d  *UND* _main
    000000000000004c      d  *UND*
    0000000100000448      d  *UND*
    0000000100000494      d  *UND*
    0000000100000494      d  *UND* Citizen::Citizen()
    000000000000002c      d  *UND*
    0000000100000494      d  *UND*
    00000001000004c0      d  *UND*
    00000001000004c0      d  *UND* Citizen::Citizen()
    000000000000003c      d  *UND*
    00000001000004c0      d  *UND*
    00000001000004fc      d  *UND*
    00000001000004fc      d  *UND* Person::Person()
    0000000000000024      d  *UND*
    00000001000004fc      d  *UND*
    0000000100000520      d  *UND*
    0000000100000520      d  *UND* Citizen::print()
    0000000000000010      d  *UND*
    0000000100000520      d  *UND*
    0000000100000530      d  *UND*
    0000000100000530      d  *UND* Person::print()
    0000000000000010      d  *UND*
    0000000100000530      d  *UND*
    0000000000000000      d  *UND* typeinfo name for Citizen
    0000000000000000      d  *UND* typeinfo name for Person
    0000000000000000      d  *UND* vtable for Citizen
    0000000000000000      d  *UND* typeinfo for Person
    0000000000000000      d  *UND* typeinfo for Citizen
    0000000000000000      d  *UND* vtable for Person
    0000000000000000      d  *UND*
    0000000100000000 g     F __TEXT,__text __mh_execute_header
    0000000100000448 g     F __TEXT,__text _main
    0000000000000000         *UND* vtable for __cxxabiv1::__class_type_info
    0000000000000000         *UND* vtable for __cxxabiv1::__si_class_type_info
  

The vtables for both classes are located in the __DATA_CONST segment, which is a read-only data segment.
The vtable for `Citizen` is located at address 0x0000000100004000, and the vtable for `Person` is located at address 0x0000000100004040.
Another observation is that the virtual function `Citizen::print()` is located at address 0x0000000100000520, and the virtual function `Person::print()` is located at address 0x0000000100000530.

Exploring the vtable in a Debugger


    Citizen c;
    Person* p = &c;
    p->print();
  

    (lldb) target create "executable"
    Current executable set to '/Users/lhan/Projects/vtable-dbg/src/executable' (arm64).
    (lldb) b main
    Breakpoint 1: where = executable`main + 32 at main.cpp:5:13, address = 0x0000000100000468
    (lldb) n
    Process 22678 stopped
    * thread #1, queue = 'com.apple.main-thread', stop reason = step over
        frame #0: 0x0000000100000474 executable`main at main.cpp:7:5
       4   	int main() {
       5   	    Citizen c;
       6   	    Person* p = &c;
    -> 7   	    p->print();
       8
       9   	    return 0;
       10  	}
  

Now print the Person pointer `p` in this frame.


    (lldb) frame variable -L *p
      0x000000016fdfee30: (Citizen) *p = {
      0x000000016fdfee30:   Person = {
      0x000000016fdfee38:     age = 4000517568
      0x000000016fdfee3c:     gender = true
        }
      0x000000016fdfee40:   id = 4003310024
      0x000000016fdfee44:   income = 1
      }
  

The `Citizen` object that `p` points to starts at address 0x000000016fdfee30.
From the record layout dumped earlier, the vptr sits at offset zero of the object, so it occupies the first 8 bytes at that address.
Now read those 8 bytes to get the value of the vptr, which is an address inside the vtable.


    (lldb) p/x *(uintptr_t*)0x000000016fdfee30
    (uintptr_t) 0x0000000100004010
  

Note that the vptr points to address 0x0000000100004010. However, in the symbol table, the vtable for `Citizen` is located at address 0x0000000100004000.
Let's revisit this discrepancy in the next section.


    (lldb) image lookup -a 0x0000000100004010
    Address: executable[0x0000000100004010] (executable.__DATA_CONST.__const + 16)
    Summary: executable`vtable for Citizen + 16
  

The address 0x0000000100004010 lies 16 bytes into the vtable for `Citizen`; it is the address of a table entry, not of a function.
Recall the earlier example, `void (*pDispatchTable[4])(size_t arg) = { &FunctionA, &FunctionB, &FunctionC, &FunctionD };`: the vptr plays the role of `pDispatchTable`, and each slot it points at holds a function address.
Reading the 8 bytes stored at that address gives the address of the first virtual function in the vtable, `Citizen::print()`.


    (lldb) x/g 0x0000000100004010
    0x100004010: 0x0000000100000520

    (lldb) image lookup -a 0x0000000100000520
      Address: executable[0x0000000100000520] (executable.__TEXT.__text + 216)
      Summary: executable`Citizen::print() at Citizen.hpp:12
  

0x0000000100000520 matches the symbol table entry for `Citizen::print()`.

Why is the vtable address different from the symbol table entry?

In the debugger, the vptr of the `Citizen` object holds the address 0x0000000100004010.
However, in the symbol table, the vtable for `Citizen` is located at address 0x0000000100004000.
There is a 0x10 (16 bytes) offset between the vtable address in the symbol table and the address that the vptr points to.

The 16-Byte Offset

Clang's vtable layout dump shows the layout of the vtable for both the `Person` and `Citizen` classes in more detail.


    lhan@luyaosmac src % make dump_vtable_layout
    c++ -Xclang -fdump-vtable-layouts main.cpp
    Original map
    Vtable for 'Person' (3 entries).
       0 | offset_to_top (0)
       1 | Person RTTI
           -- (Person, 0) vtable address --
       2 | void Person::print()

    VTable indices for 'Person' (1 entries).
       0 | void Person::print()

    Original map
     void Citizen::print() -> void Person::print()
    Vtable for 'Citizen' (3 entries).
       0 | offset_to_top (0)
       1 | Citizen RTTI
           -- (Citizen, 0) vtable address --
           -- (Person, 0) vtable address --
       2 | void Citizen::print()

    VTable indices for 'Citizen' (1 entries).
       0 | void Citizen::print()
  

For each class, `print()` sits at index 2 in the vtable. This translates to an offset of 0x10 (16 bytes) from the start of the vtable (2 entries × 8 bytes), since the build targets the arm64 architecture, where each pointer is 8 bytes.
The first two entries in the vtable are reserved for internal use by the C++ runtime: the offset_to_top and the RTTI (Run-Time Type Information) pointer.
The symbol table records the starting address of the entire vtable symbol, while the vptr points to the location of the first virtual function inside the vtable.

How are the offset_to_top and RTTI pointer used?


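    #include <typeinfo>  // required before using typeid

    Person* p = new Citizen();

    // dynamic_cast locates the complete object via offset_to_top,
    // then checks the requested type against the RTTI entry.
    Citizen* c = dynamic_cast<Citizen*>(p);

    // typeid on a polymorphic object reads the RTTI entry.
    const std::type_info& info = typeid(*p);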
  

Conclusion

1. If a class contains virtual methods, the compiler adds a hidden pointer member, the vptr; in the layout examined here it sits at offset 0 of each instance.
2. Each instance of such a class carries its own vptr.
3. The vptr is initialized during object construction to point into the vtable of the class.
4. The vtable is a read-only data structure that contains pointers to the virtual functions of the class, located in the __DATA_CONST (RODATA) segment of the executable.
5. The first two entries in the vtable are reserved for internal use by the C++ runtime: offset_to_top and RTTI pointer. The actual virtual function pointers start at an offset (0x10 in this case) from the start of the vtable.

Depending on the compiler and platform, the layout of the vtable may vary. Multiple inheritance and virtual base classes complicate the vtable structure (potentially resulting in multiple vtables and offset adjustments), but the core principles remain unchanged.