Descriptor Table Approach: Technical Specification
Specification Status
This document provides a detailed technical specification for the descriptor table approach to the Itanium C++ Contracts ABI. This revision incorporates the following robustness and portability changes:
Header expanded: explicit data_size, header_size, flags, native-endian definition
Wider counts: 16-bit num_entries and 32-bit offsets (raises the limits to 65,535 entries and offsets just under 4 GiB)
Alignment rules defined; padding in static_data allowed/required as needed
Bounds and alignment validation rules for runtimes
Corrected field ID ranges and type assignments
Vendor field namespacing defined to avoid collisions
Optional sorted entries and optional index for faster lookup
Clarified deduplication scope (within link unit; not across DSOs)
Reentrancy/termination rules for handlers
Minimal dynamic_data TLV defined for future-proofing
Overview
This document specifies the descriptor table approach for the Itanium C++ Contracts ABI. The approach separates metadata describing data layout from the field data itself, enabling ABI-stable evolution, efficient field omission, and vendor extensibility without coordination.
Core Concept
Contract violation data consists of two components:
Metadata (descriptor table): Describes what fields exist and where they are located within the static data blob
Data (static_data blob): Tightly packed field values (pointers, strings, scalars), with padding as required for alignment
Properties:
ABI stability: Metadata can describe any layout without breaking compatibility
Efficiency: Metadata is shared across contracts; data is compact and aligned
Extensibility: New field types are added without changing existing structures
ABI Interface
Entrypoint Function
namespace __cxxabiv1 {
// Primary entrypoint - all parameters explicit
[[noreturn]]
void __cxa_contract_violation_entrypoint(
    const __cxa_descriptor_table_t* static_descriptor,
    const void* static_data,
    __cxa_detection_mode_t mode,
    __cxa_evaluation_semantic_t semantic,
    const __cxa_runtime_data_t* dynamic_data, // see TLV below (may be nullptr)
    void* reserved
);
} // namespace __cxxabiv1
Parameters:
static_descriptor: Pointer to descriptor table (compile-time constant)
static_data: Pointer to packed field data (compile-time constant)
mode: How the violation was detected (predicate_false or evaluation_exception)
semantic: Evaluation semantic (enforced or observed)
dynamic_data: Optional runtime-generated TLV data (nullptr if none)
reserved: Reserved for future use
Runtime behavior requirements:
The entrypoint constructs std::contract_violation and invokes the registered violation handler.
If the handler returns or throws, the runtime must force termination (e.g., std::terminate()).
A reentrancy guard must prevent infinite recursion if a contract fails within the handler; on reentry, terminate immediately.
Binary Format
Endianness
All multi-byte integer fields in the descriptor and entries are encoded in the platform’s native endianness.
Examples below use little-endian for illustration.
Header (v2)
The descriptor begins with a fixed-size header followed by an entry array. The header includes a header_size field to allow future extension.
struct __cxa_descriptor_table_t {
    uint8_t  version;        // = 2 for this revision
    uint8_t  vendor_id;      // 0=standard, 1=GCC, 2=Clang, ...
    uint8_t  flags;          // bit0: entries_sorted_by_type
                             // bit1: has_optional_index (after entries)
                             // other bits: reserved (0)
    uint8_t  reserved0;      // must be 0 for v2
    uint16_t num_entries;    // number of descriptor entries
    uint16_t header_size;    // size of this header in bytes (>= 16)
    uint32_t data_size;      // total size in bytes of static_data blob
    uint8_t  data_alignment; // required base alignment for static_data in bytes (power of two)
    uint8_t  reserved1[3];   // must be 0
    // Followed by: __cxa_descriptor_entry_t entries[num_entries];
    // Optionally followed by an index section if flags.has_optional_index=1
};
// sizeof(__cxa_descriptor_table_t) == 16 bytes for v2
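The 16-byte size stated above only holds if the compiler inserts no padding between members. A minimal sketch of how an implementation might pin the layout at compile time follows; the unprefixed names (descriptor_header, flags_valid, and the flag constants) are hypothetical stand-ins for the reserved __cxa_ identifiers, and the expected offsets are taken from the field order in the header definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the v2 header; names are illustrative only.
struct descriptor_header {
    std::uint8_t  version;
    std::uint8_t  vendor_id;
    std::uint8_t  flags;
    std::uint8_t  reserved0;
    std::uint16_t num_entries;
    std::uint16_t header_size;
    std::uint32_t data_size;
    std::uint8_t  data_alignment;
    std::uint8_t  reserved1[3];
};

// Fail the build if the target ABI lays the header out differently.
static_assert(sizeof(descriptor_header) == 16, "v2 header must be 16 bytes");
static_assert(offsetof(descriptor_header, num_entries) == 4, "num_entries at byte 4");
static_assert(offsetof(descriptor_header, header_size) == 6, "header_size at byte 6");
static_assert(offsetof(descriptor_header, data_size) == 8, "data_size at byte 8");
static_assert(offsetof(descriptor_header, data_alignment) == 12, "data_alignment at byte 12");

// Flag bits as described in the header comments.
constexpr std::uint8_t flag_entries_sorted = 0x01;  // bit0
constexpr std::uint8_t flag_has_index      = 0x02;  // bit1
constexpr std::uint8_t flag_reserved_mask  = 0xFC;  // all remaining bits

// True when no reserved flag bit is set.
constexpr bool flags_valid(std::uint8_t flags) {
    return (flags & flag_reserved_mask) == 0;
}
```

The checks cost nothing at run time and turn a silent layout mismatch into a compile error.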
Descriptor Entry (v2)
Each entry maps a field identifier to an offset within the static_data blob. Offsets are 32-bit and must respect alignment constraints (see below).
struct __cxa_descriptor_entry_t {
    uint16_t field_type; // field identifier, see Field Type Encoding
    uint16_t reserved;   // must be 0 for v2 (alignment/flags future use)
    uint32_t offset;     // byte offset from start of static_data
};
// sizeof(__cxa_descriptor_entry_t) == 8 bytes
Optional Index (v2)
If flags.has_optional_index=1, an index immediately follows the entries to accelerate lookups. The index format is intentionally simple and optional; a runtime may ignore it.
// Pseudo-layout: index_count is followed directly by index_count records.
uint16_t index_count;      // number of index records
struct index_record {
    uint16_t field_type;   // key
    uint16_t entry_start;  // start (inclusive) in entries[] for this key or key range
    uint16_t entry_count;  // count of entries for this key or key range
} records[index_count];
When entries_sorted_by_type=1, index_count is typically small (e.g., one record per distinct field_type present). In that case, runtimes may binary-search either the entries or the index.
Field Type Encoding
Standard fields occupy the 0x0001–0x00FF range. Vendor-specific fields are namespaced to avoid collisions.
Standard field (namespace 0): 0x0001–0x00FF
Vendor-specific field (namespace vendor_id): 0x8000 | (vendor_id << 8) | local_id
- vendor_id is the same code stored in header.vendor_id. Because bit 15 is the vendor marker, only values up to 0x7F encode unambiguously; a larger vendor_id would alias a smaller one.
- local_id is vendor-local (0x01–0xFF)
- Runtimes must only interpret vendor fields if the embedded vendor_id matches header.vendor_id. Otherwise, ignore the entry.
Recommended standard field assignments (v2):
enum class __cxa_contract_violation_field_t : uint16_t {
    // Pointers
    source_location_ptr = 0x0001, // const __cxa_source_location*
    source_text_ptr     = 0x0002, // const char*
    contract_label_ptr  = 0x0003, // const char*
    // Scalars
    assertion_kind_u8   = 0x0011, // uint8_t
    // detection_mode and evaluation_semantic are passed as entrypoint params
    // and are not stored in static_data
};
Type ranges:
0x0000: Reserved (invalid)
0x0001–0x00FF: Standard fields (Itanium ABI committee)
0x0100–0x7FFF: Reserved for future standard expansion
0x8000–0xFFFF: Vendor fields with embedded vendor_id
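The ranges and the vendor encoding above reduce to a few bit operations. The sketch below is illustrative only: the helper names (classify_field, make_vendor_field, embedded_vendor_id, should_interpret) are hypothetical, and the restriction of vendor_id to 0x01–0x7F is an assumption made here because bit 15 is already used as the vendor marker.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers; only the bit layout comes from the encoding rules.
enum class field_namespace { invalid, standard, reserved, vendor };

// Maps a field_type to the range it falls in.
constexpr field_namespace classify_field(std::uint16_t field_type) {
    if (field_type == 0x0000) return field_namespace::invalid;
    if (field_type <= 0x00FF) return field_namespace::standard;
    if (field_type <= 0x7FFF) return field_namespace::reserved;
    return field_namespace::vendor;
}

// Builds 0x8000 | (vendor_id << 8) | local_id.
// Assumption: vendor_id is limited to 0x01-0x7F so it cannot spill into bit 15.
inline std::uint16_t make_vendor_field(std::uint8_t vendor_id, std::uint8_t local_id) {
    assert(vendor_id >= 0x01 && vendor_id <= 0x7F);
    assert(local_id >= 0x01);
    return static_cast<std::uint16_t>(
        0x8000u | (static_cast<unsigned>(vendor_id) << 8) | local_id);
}

// Extracts the vendor id embedded in bits 8-14 of a vendor field.
constexpr std::uint8_t embedded_vendor_id(std::uint16_t field_type) {
    return static_cast<std::uint8_t>((field_type >> 8) & 0x7F);
}

// A runtime interprets standard fields always, and vendor fields only when
// the embedded vendor id matches the descriptor header's vendor_id.
constexpr bool should_interpret(std::uint16_t field_type, std::uint8_t header_vendor_id) {
    return classify_field(field_type) == field_namespace::standard ||
           (classify_field(field_type) == field_namespace::vendor &&
            embedded_vendor_id(field_type) == header_vendor_id);
}
```

For example, make_vendor_field(1, 0x05) produces 0x8105, which a descriptor with vendor_id=1 would interpret and a descriptor with vendor_id=2 would skip.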
Static Data Layout and Alignment
The static_data blob contains field values at specified offsets. Compilers may insert padding to satisfy alignment. Offsets must respect the natural alignment requirements of the referenced types on the target platform.
Rules:
The static_data base address must be aligned to header.data_alignment bytes (a power of two). Recommended minimum is alignof(void*).
For each entry, offset % alignof(T) == 0 must hold, where T is the type of the value stored for that field:
- For pointer fields: alignof(void*)
- For uint8_t scalars: alignof(uint8_t) (i.e., 1)
The runtime may validate these constraints for standard fields.
For each standard field, offset + sizeof(T) <= data_size must hold.
Vendors are responsible for alignment of their own field types; unknown vendor fields are ignored by standard runtimes.
Example layout (location + text + kind):
struct {
    const __cxa_source_location* location;       // offset 0, size 8 (on LP64)
    const char*                  source_text;    // offset 8, size 8
    uint8_t                      assertion_kind; // offset 16, size 1
    // padding may be present but is not required by the format
};
Binary example (little-endian, LP64):
static_data @ 0x6000 (data_size=17):
0x6000: 00 70 00 00 00 00 00 00 // location ptr → 0x7000
0x6008: 00 80 00 00 00 00 00 00 // text ptr → 0x8000
0x6010: 01 // kind = precondition
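The offsets in this example follow from placing each field at the next suitably aligned position. A minimal sketch of that calculation is shown here; the names field_layout, align_up, and assign_offsets are hypothetical, and the caller is assumed to supply the target's sizes and alignments (8 and 8 for pointers on LP64).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of one field's storage requirements.
struct field_layout {
    std::uint32_t size;       // bytes occupied by the value
    std::uint32_t alignment;  // required alignment, a power of two
};

// Rounds offset up to the next multiple of alignment (power of two).
constexpr std::uint32_t align_up(std::uint32_t offset, std::uint32_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Assigns an offset to each field in order, inserting padding where needed.
// Writes the offsets to out_offsets and returns the resulting data_size.
inline std::uint32_t assign_offsets(const field_layout* fields, std::size_t count,
                                    std::uint32_t* out_offsets) {
    std::uint32_t cursor = 0;
    for (std::size_t i = 0; i < count; ++i) {
        const std::uint32_t a = fields[i].alignment;
        assert(a != 0 && (a & (a - 1)) == 0);  // must be a power of two
        cursor = align_up(cursor, a);
        out_offsets[i] = cursor;
        cursor += fields[i].size;
    }
    return cursor;  // no tail padding: the format does not require it
}
```

For two 8-byte pointers followed by a 1-byte scalar this yields offsets 0, 8, and 16 with a data_size of 17, matching the dump above. Placing the 1-byte scalar before a pointer instead would put the pointer at offset 8, leaving 7 bytes of padding.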
Descriptor Examples
Example: Three-entry descriptor (v2)
Header (16 bytes):
0x5000: 02 // version=2
0x5001: 02 // vendor_id=2 (Clang, example)
0x5002: 01 // flags: entries_sorted_by_type=1
0x5003: 00 // reserved0
0x5004: 03 00 // num_entries=3
0x5006: 10 00 // header_size=16
0x5008: 11 00 00 00 // data_size=17
0x500C: 08 // data_alignment=8
0x500D: 00 00 00 // reserved1
Entries (3 × 8 bytes):
0x5010: 01 00 00 00 00 00 00 00 // field_type=0x0001 (source_location_ptr), offset=0
0x5018: 02 00 00 00 08 00 00 00 // field_type=0x0002 (source_text_ptr), offset=8
0x5020: 11 00 00 00 10 00 00 00 // field_type=0x0011 (assertion_kind_u8), offset=16
Total size: 16 + 24 = 40 bytes (descriptor)
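To make the byte layout concrete, the sketch below decodes the 40 bytes of this example the way an external tool such as a pretty-printer might. It reads the fields explicitly as little-endian so that it behaves the same on any host; a runtime on the target would simply read the native-endian fields directly. The names read_le16, read_le32, decoded_entry, and decode_entry are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The 40 descriptor bytes from the example (little-endian).
static const unsigned char example_descriptor[40] = {
    0x02, 0x02, 0x01, 0x00, 0x03, 0x00, 0x10, 0x00,  // version..header_size
    0x11, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00,  // data_size, alignment
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,  // entry 0
    0x02, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00,  // entry 1
    0x11, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,  // entry 2
};

inline std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline std::uint32_t read_le32(const unsigned char* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

struct decoded_entry {
    std::uint16_t field_type;
    std::uint32_t offset;
};

// Decodes entry `index`; returns false if the index is out of range or the
// entry would lie outside the supplied buffer.
inline bool decode_entry(const unsigned char* desc, std::size_t desc_len,
                         std::size_t index, decoded_entry& out) {
    if (desc_len < 16) return false;
    const std::uint16_t num_entries = read_le16(desc + 4);
    const std::uint16_t header_size = read_le16(desc + 6);
    if (header_size < 16 || index >= num_entries) return false;
    const std::size_t pos = header_size + index * 8;  // entries are 8 bytes each
    if (pos + 8 > desc_len) return false;
    out.field_type = read_le16(desc + pos);
    out.offset = read_le32(desc + pos + 4);
    return true;
}
```

Because the entry array is located through header_size rather than a fixed 16, the same decoder keeps working if a later revision grows the header.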
Code Generation
Basic Example
// Source
void withdraw(int amount)
pre(amount > 0)
{
balance -= amount;
}
Compiler Output (Pseudo-assembly)
withdraw:
cmp edi, 0
jg .L_contract_passed
.L_contract_failed:
lea rdi, [rip + .L_descriptor_v2] # static_descriptor
lea rsi, [rip + .L_static_data] # static_data
xor edx, edx # mode = predicate_false
xor ecx, ecx # semantic = enforced
xor r8d, r8d # dynamic_data = nullptr
xor r9d, r9d # reserved = nullptr
call __cxa_contract_violation_entrypoint
ud2
.L_contract_passed:
# ...
ret
.section .rodata
.p2align 4
.L_descriptor_v2:
.byte 2 # version
.byte 2 # vendor_id (Clang, example)
.byte 1 # flags: entries_sorted_by_type
.byte 0 # reserved0
.short 3 # num_entries
.short 16 # header_size
.long 17 # data_size
.byte 8 # data_alignment
.byte 0,0,0 # reserved1
# entries
.short 0x0001 # source_location_ptr
.short 0 # reserved
.long 0 # offset
.short 0x0002 # source_text_ptr
.short 0
.long 8
.short 0x0011 # assertion_kind_u8
.short 0
.long 16
.p2align 3
.L_static_data:
.quad .L_source_location
.quad .L_source_text
.byte 0x01 # precondition
.p2align 3 # realign: the pointers below need 8-byte alignment
.L_source_location:
.quad .L_file_name # file_name pointer
.quad .L_function_name # function_name pointer
.long 42 # line
.long 8 # column
.L_source_text:
.asciz "amount > 0"
.L_file_name:
.asciz "bank.cpp"
.L_function_name:
.asciz "withdraw"
Field Omission Example¶
With -fno-contract-source-text:
.L_descriptor_v2_no_text:
.byte 2
.byte 2
.byte 1
.byte 0
.short 2 # 2 entries
.short 16
.long 9 # data_size
.byte 8 # data_alignment
.byte 0,0,0
.short 0x0001 # source_location_ptr
.short 0
.long 0
.short 0x0011 # assertion_kind_u8
.short 0
.long 8
.p2align 3
.L_static_data_no_text:
.quad .L_source_location
.byte 0x01
Savings: 8 bytes in static_data and one entry (8 bytes) in the descriptor. The source-text string itself also no longer needs to be emitted.
Vendor Extensions
Vendor field namespace
Vendor fields must use field_type = 0x8000 | (vendor_id << 8) | local_id.
Runtimes must only interpret vendor fields when the embedded vendor_id equals header.vendor_id; otherwise ignore.
Standard fields always use 0x0001–0x00FF.
Example (GCC local field 0x05):
# vendor_id=1 (GCC)
.byte 2 # version
.byte 1 # vendor_id
...
.short 0x8105 # 0x8000 | (1 << 8) | 0x05
.short 0
.long 0x11
Runtime Implementation
Descriptor Parsing and Validation (v2)
Runtimes should validate descriptors in hardened or debug builds:
header_size >= 16 and no larger than an implementation-chosen upper bound
flags reserved bits are zero
data_alignment is a nonzero power of two
num_entries * sizeof(entry) does not overflow
For each standard field entry, where T is the type of the stored value:
- offset + sizeof(T) <= data_size
- offset % alignof(T) == 0
Optionally verify non-decreasing offsets (advisory)
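The checks listed above can be sketched as a single validation pass. This is a minimal illustration under stated assumptions, not a reference implementation: the struct and function names are hypothetical mirrors of the ABI types, the upper bound on header_size (256) is an arbitrary value chosen for the example, and unknown or vendor fields are skipped rather than rejected.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirrors of the v2 header and entry.
struct descriptor_header {
    std::uint8_t version, vendor_id, flags, reserved0;
    std::uint16_t num_entries, header_size;
    std::uint32_t data_size;
    std::uint8_t data_alignment;
    std::uint8_t reserved1[3];
};
struct descriptor_entry {
    std::uint16_t field_type, reserved;
    std::uint32_t offset;
};

enum class validation_result {
    ok, bad_header_size, bad_flags, bad_alignment,
    entry_out_of_bounds, entry_misaligned
};

// Size and alignment of the value a standard field stores.
// Returns false for fields this sketch does not know about.
inline bool standard_field_shape(std::uint16_t type, std::uint32_t& size,
                                 std::uint32_t& align) {
    switch (type) {
    case 0x0001: case 0x0002: case 0x0003:  // pointer fields
        size = sizeof(void*); align = alignof(void*); return true;
    case 0x0011:                             // assertion_kind_u8
        size = 1; align = 1; return true;
    default:
        return false;
    }
}

inline validation_result validate(const descriptor_header& h,
                                  const descriptor_entry* entries) {
    constexpr std::uint16_t max_header_size = 256;  // assumed bound, not from the spec
    if (h.header_size < 16 || h.header_size > max_header_size)
        return validation_result::bad_header_size;
    if (h.flags & 0xFC) return validation_result::bad_flags;
    if (h.data_alignment == 0 || (h.data_alignment & (h.data_alignment - 1)) != 0)
        return validation_result::bad_alignment;
    for (std::uint16_t i = 0; i < h.num_entries; ++i) {
        std::uint32_t size = 0, align = 0;
        if (!standard_field_shape(entries[i].field_type, size, align))
            continue;  // unknown or vendor field: ignored
        // Written as a subtraction so offset + size cannot wrap around.
        if (entries[i].offset > h.data_size || h.data_size - entries[i].offset < size)
            return validation_result::entry_out_of_bounds;
        if (entries[i].offset % align != 0)
            return validation_result::entry_misaligned;
    }
    return validation_result::ok;
}
```

Returning a distinct code per failure lets a debug build report which rule a malformed descriptor violated.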
Entry lookup
If flags.entries_sorted_by_type=1, use binary search; else linear scan.
Runtimes may cache resolved offsets per std::contract_violation instance to avoid repeated scans when multiple accessors are called.
Entrypoint
namespace __cxxabiv1 {
[[noreturn]]
void __cxa_contract_violation_entrypoint(
    const __cxa_descriptor_table_t* static_descriptor,
    const void* static_data,
    __cxa_detection_mode_t mode,
    __cxa_evaluation_semantic_t semantic,
    const __cxa_runtime_data_t* dynamic_data,
    void* /*reserved*/)
{
    // Reentrancy guard (required): a contract failing inside the handler
    // terminates immediately. Sketch: one flag per thread.
    static thread_local bool in_handler = false;
    if (in_handler) std::terminate();
    in_handler = true;
    __cxa_contract_violation_info_t cv_info{
        .static_descriptor = static_descriptor,
        .static_data = static_data,
        .mode = mode,
        .semantic = semantic,
        .dynamic_data = dynamic_data
    };
    std::contract_violation cv(&cv_info);
    auto handler = get_contract_violation_handler();
    try {
        handler(cv);
    } catch (...) {
        std::terminate(); // handler threw: force termination
    }
    std::terminate(); // handler returned: force termination
}
} // namespace __cxxabiv1
Field Accessor API
namespace __cxxabiv1 {
bool __cxa_get_contract_violation_field(
    const __cxa_contract_violation_info_t* cv_info,
    __cxa_contract_violation_field_t field,
    void* out_ptr);
} // namespace __cxxabiv1
Implementation sketch (v2):
static bool read_entry(const __cxa_descriptor_table_t* desc,
                       __cxa_contract_violation_field_t field,
                       const __cxa_descriptor_entry_t*& out) {
    // Entries start header_size bytes after the descriptor base.
    const auto* entries = reinterpret_cast<const __cxa_descriptor_entry_t*>(
        reinterpret_cast<const unsigned char*>(desc) + desc->header_size);
    uint16_t n = desc->num_entries;
    uint16_t key = static_cast<uint16_t>(field);
    if (desc->flags & 0x01) { // sorted
        // binary search by field_type
        int l = 0, r = n - 1;
        while (l <= r) {
            int m = (l + r) >> 1;
            uint16_t t = entries[m].field_type;
            if (t < key) {
                l = m + 1;
            } else if (t > key) {
                r = m - 1;
            } else {
                out = &entries[m];
                return true;
            }
        }
        return false;
    } else {
        // unsorted: linear scan
        for (uint16_t i = 0; i < n; ++i) {
            if (entries[i].field_type == key) {
                out = &entries[i];
                return true;
            }
        }
        return false;
    }
}
bool __cxa_get_contract_violation_field(
    const __cxa_contract_violation_info_t* cv_info,
    __cxa_contract_violation_field_t field,
    void* output_ptr)
{
    // Direct parameters (not in descriptor)
    switch (field) {
    // If you expose detection_mode or evaluation_semantic via this API,
    // handle them here as direct outputs.
    default: break;
    }
    const auto* desc = cv_info->static_descriptor;
    const auto* base = static_cast<const unsigned char*>(cv_info->static_data);
    const __cxa_descriptor_entry_t* e = nullptr;
    if (!read_entry(desc, field, e)) return false;
    // Bounds check for standard fixed-size fields
    // (written as a subtraction so that offset + size cannot overflow)
    auto within = [&](uint32_t size) {
        return (e->offset <= desc->data_size) &&
               (desc->data_size - e->offset >= size);
    };
    switch (field) {
    case __cxa_contract_violation_field_t::source_location_ptr: {
        if (!within(sizeof(void*))) return false;
        auto ptr = *reinterpret_cast<const __cxa_source_location* const*>(base + e->offset);
        *static_cast<const __cxa_source_location**>(output_ptr) = ptr;
        return true;
    }
    case __cxa_contract_violation_field_t::source_text_ptr:
    case __cxa_contract_violation_field_t::contract_label_ptr: {
        if (!within(sizeof(void*))) return false;
        auto ptr = *reinterpret_cast<const char* const*>(base + e->offset);
        *static_cast<const char**>(output_ptr) = ptr;
        return true;
    }
    case __cxa_contract_violation_field_t::assertion_kind_u8: {
        if (!within(sizeof(uint8_t))) return false;
        *static_cast<uint8_t*>(output_ptr) = *(base + e->offset);
        return true;
    }
    default:
        return false; // Unknown standard field
    }
}
std::contract_violation
namespace std {
class contract_violation {
public:
    explicit contract_violation(const __cxxabiv1::__cxa_contract_violation_info_t* info)
        : m_info(info) {}
    source_location location() const {
        const __cxxabiv1::__cxa_source_location* loc = nullptr;
        if (__cxxabiv1::__cxa_get_contract_violation_field(
                m_info, __cxxabiv1::__cxa_contract_violation_field_t::source_location_ptr, &loc)
            && loc) {
            // Implementation-internal construction: std::source_location has
            // no public constructor of this form.
            return source_location{ loc->line, loc->column, loc->file_name, loc->function_name };
        }
        return source_location{}; // field omitted
    }
    string_view comment() const {
        const char* text = nullptr;
        if (__cxxabiv1::__cxa_get_contract_violation_field(
                m_info, __cxxabiv1::__cxa_contract_violation_field_t::source_text_ptr, &text)
            && text) {
            return string_view{text};
        }
        return string_view{}; // field omitted
    }
    // ... other accessors ...
private:
    const __cxxabiv1::__cxa_contract_violation_info_t* m_info;
    // Implementations may memoize looked-up offsets for repeated access
};
} // namespace std
Dynamic Data (TLV, optional)
Call sites may pass additional transient data to the runtime using a simple TLV encoding via the entrypoint's dynamic_data parameter. This avoids future ABI surface changes.
struct __cxa_tlv_header {
    uint16_t field_type; // same encoding rules as descriptor field_type
    uint16_t length;     // length of payload in bytes (0 == terminator when field_type==0)
    // uint8_t payload[length];
};
struct __cxa_runtime_data_t {
    const uint8_t* base; // points to a sequence of (header, payload) ... terminated by (0,0)
};
Consumers should ignore unknown TLVs.
The TLV stream is independent of static_data; it is not covered by data_size.
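A consumer that ignores unknown TLVs only needs to step over each record until it reaches the (0,0) terminator. The sketch below shows one way to do that; tlv_view and find_tlv are hypothetical names, and the code assumes that records are packed back to back with no padding, which the text above does not state explicitly. Because payload lengths can be odd, headers are copied out with memcpy instead of being read through a possibly misaligned pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical view of one decoded TLV record.
struct tlv_view {
    std::uint16_t field_type;
    std::uint16_t length;
    const std::uint8_t* payload;  // points into the stream, not owned
};

// Finds the first record whose field_type equals `wanted`.
// Unknown records are skipped; the (0,0) terminator ends the search.
// Assumption: the stream is well formed and terminated by its producer.
inline bool find_tlv(const std::uint8_t* base, std::uint16_t wanted, tlv_view& out) {
    if (base == nullptr) return false;  // no dynamic data supplied
    const std::uint8_t* cursor = base;
    for (;;) {
        std::uint16_t type = 0;
        std::uint16_t length = 0;
        std::memcpy(&type, cursor, sizeof type);          // native-endian header
        std::memcpy(&length, cursor + 2, sizeof length);
        if (type == 0 && length == 0) return false;       // terminator reached
        if (type == wanted) {
            out = {type, length, cursor + 4};
            return true;
        }
        cursor += 4 + length;  // step over header and payload
    }
}
```

A hardened runtime would likely also want an upper bound on the number of bytes walked, since a stream without a terminator would otherwise be read past its end.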
Advantages
ABI-stable evolution with field-level extensibility
True field omission (0 bytes for omitted fields)
Vendor extensibility without collisions via namespacing
Compact descriptor; static_data tightly packed but aligned
Optional indexing for faster lookups; otherwise linear/binary search
Hardened with bounds and alignment checks
Disadvantages and Trade-offs
Slightly larger header (16 bytes) than v1, though this is negligible compared to code size
More specification detail (alignment, validation)
Optional index increases complexity, but is purely optional
Deduplication Scope
Descriptors may be deduplicated by the linker within a link unit (e.g., via COMDAT/section folding).
Deduplication does not occur across DSOs at runtime.
Validation and Debugging
Runtimes should provide debug-mode validation that reports malformed descriptors through assertion failures.
Pretty-printers can show header, entries, and decode standard fields.
Recommendation
Adopt the v2 descriptor with native-endian encoding, 32-bit offsets, explicit sizes, and alignment semantics. This preserves the core benefits of the descriptor approach while addressing portability, safety, and extensibility concerns.