Calling Conventions
Before we go any further… It is important to understand that this section isn’t a general purpose description of the present calling
conventions. It merely explains the calling conventions for the parameter/return types supported by
dyncall (not for e.g. unsupported types like SIMD data types (__m64, __m128, __m128i, __m128d),
etc.).
We strongly advise the reader not to use this document as a general purpose calling convention
reference.
x86 Calling Conventions
Overview
On this processor, a word is defined to be 16 bits in size, a dword 32 bits and a qword 64
bits.
There are numerous different calling conventions on the x86 processor architecture, like cdecl [8], MS
fastcall [10], GNU fastcall [11], Borland fastcall [12], Watcom fastcall [13], Win32 stdcall [9], MS thiscall [14],
GNU thiscall [15], the pascal calling convention [16] and a cdecl-like version for Plan9 [17] (dubbed plan9call
by us), etc.
Name | # of regs for params | # of regs to preserve | push order | cleanup by | 64bit args via regs? |
cdecl | 0 | 4 | ← | caller | - |
MS fastcall | 2 | 4 | ← | callee | Y |
GNU fastcall | 2 | 4 | ← | callee | N |
Borland fastcall | 3 | 4 | → | callee | N |
Watcom fastcall | 4 | 2-6 | ← | callee | N |
win32 stdcall | 0 | 4 | ← | callee | - |
MS thiscall | 1 | 4 | ← | callee | N |
GNU thiscall | 0 | 4 | ← | caller | - |
pascal | 0 | 4 | → | callee | - |
plan9call | 0 | 0 | ← | caller | - |
Currently cdecl, stdcall, fastcall (MS and GNU), thiscall (MS and GNU) and plan9call are
supported.
Dyncall can also be used to issue syscalls on Linux and *BSD by using the syscall number as target parameter
and selecting the correct mode.
cdecl
Registers and register usage
- stack parameter order: right-to-left
- caller cleans up the stack
- all arguments are pushed onto the stack (as dwords)
- arguments > 64 bits are pushed as a sequence of dwords
- aggregates (structs, unions) are pushed as a sequence of dwords
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning via the stack), and callee writes return value to this space; the ptr to the aggregate is returned in eax
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers and aggregates (structs, unions) > 32 and <= 64 bits are returned via the eax and edx registers
- return values > 64 bits (e.g. aggregates) are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit first parameter (this means, on the stack)
- floating point types are returned via the st0 register (except on Minix, where they are returned as integers are)
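As a rough illustration of the "pushed as dwords" rule above, the following sketch computes how many bytes of stack a cdecl call occupies, which is also the amount the caller has to pop after the call returns. The function names are hypothetical and not part of dyncall; argument sizes are given in bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Round an argument size (in bytes) up to a whole number of dwords. */
static size_t dword_slot_bytes(size_t arg_bytes)
{
    return (arg_bytes + 3u) & ~(size_t)3u;
}

/* Total stack bytes used by the arguments of one cdecl call.
 * With cdecl, the caller removes exactly this amount after the call. */
static size_t cdecl_stack_bytes(const size_t *arg_bytes, size_t nargs)
{
    size_t total = 0;
    assert(nargs == 0 || arg_bytes != NULL);
    for (size_t i = 0; i < nargs; ++i)
        total += dword_slot_bytes(arg_bytes[i]);
    return total;
}
```

For example, arguments of 1, 4, 8 and 10 bytes occupy 4 + 4 + 8 + 12 = 28 bytes.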
Stack directly after function prolog:
MS fastcall
Registers and register usage
Name | Brief description |
eax | scratch, return value |
ebx | preserve |
ecx | scratch, parameter 0 |
edx | scratch, parameter 1, return value |
esi | preserve |
edi | preserve |
ebp | preserve |
esp | stack pointer |
st0 | scratch, floating point return value |
st1-st7 | scratch |
- stack parameter order: right-to-left
- called function cleans up the stack
- first two integers/pointers (<= 32bit) are passed via ecx and edx (even if preceded by other arguments)
- if first argument is a 64 bit integer, it is passed via ecx and edx
- all other parameters are pushed onto the stack (as dwords)
- arguments > 64 bits are pushed as a sequence of dwords
- aggregates (structs, unions) are pushed as a sequence of dwords, but are never split between registers and stack (if registers are still available and aggregate doesn’t fit entirely into ecx and edx, it is passed via the stack and remaining registers are free for subsequent arguments)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- return values of pointer or integral type, as well as aggregates (structs, unions) <= 64 bits are returned via the eax and edx registers
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning via ecx), and callee writes return value to this space; the ptr to the aggregate is returned in eax
- return values > 64 bits (e.g. aggregates) are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit first parameter (always via the stack, never via a register)
- floating point types are returned via the st0 register
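The register assignment rules listed above can be sketched as a small model. This is only an illustration under simplifying assumptions: it covers integers/pointers of at most 32 bits, 64 bit integers and floating point arguments, and leaves aggregates out entirely. All names are hypothetical and not part of dyncall.

```c
#include <assert.h>
#include <stddef.h>

enum arg_kind { ARG_INT32, ARG_INT64, ARG_FP };  /* ARG_INT32 includes pointers */
enum arg_loc  { LOC_STACK, LOC_ECX, LOC_EDX, LOC_ECX_EDX };

/* Where does argument `index` of an MS fastcall call end up,
 * following the rules described in the list above? */
static enum arg_loc ms_fastcall_loc(const enum arg_kind *kinds, size_t n, size_t index)
{
    int regs_used = 0;  /* 0: ecx is next, 1: edx is next, 2: none left */
    assert(index < n);
    for (size_t i = 0; i <= index; ++i) {
        enum arg_loc loc = LOC_STACK;
        if (i == 0 && kinds[i] == ARG_INT64) {
            loc = LOC_ECX_EDX;            /* only a leading 64 bit integer uses both */
            regs_used = 2;
        } else if (kinds[i] == ARG_INT32 && regs_used < 2) {
            loc = (regs_used == 0) ? LOC_ECX : LOC_EDX;
            ++regs_used;
        }
        if (i == index)
            return loc;
    }
    return LOC_STACK;  /* not reached */
}
```

In this model, a call taking (double, int, int, int) places the first two ints in ecx and edx although a stack argument precedes them, and the third int goes to the stack.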
Stack directly after function prolog:
GNU fastcall
Registers and register usage
Name | Brief description |
eax | scratch, return value |
ebx | preserve |
ecx | scratch, parameter 0 |
edx | scratch, parameter 1, return value |
esi | preserve |
edi | preserve |
ebp | preserve |
esp | stack pointer |
st0 | scratch, floating point return value |
st1-st7 | scratch |
- stack parameter order: right-to-left
- called function cleans up the stack
- first two integers/pointers (<= 32bit) are passed via ecx and edx (even if preceded by other arguments)
- arguments > 32 bits are pushed onto the stack as a sequence of dwords (never passed via registers, any respective register is skipped and not used for subsequent args)
- all other parameters are pushed onto the stack (as dwords)
- aggregates (structs, unions) are pushed as a sequence of dwords, and never passed via registers (no matter their size, any respective register is skipped and not used for subsequent args)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- varargs are always passed via the stack
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers > 32 and <= 64 bits are returned via the eax and edx registers
- aggregates (structs, unions) of any size are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit first parameter (always via ecx), that same pointer is returned in eax
- floating point types are returned via the st0 register
Stack directly after function prolog:
Borland fastcall
Also called register convention by Borland.
Registers and register usage
Name | Brief description |
eax | scratch, parameter 0, return value |
ebx | preserve |
ecx | scratch, parameter 2 |
edx | scratch, parameter 1, return value |
esi | preserve |
edi | preserve |
ebp | preserve |
esp | stack pointer |
st0 | scratch, floating point return value |
st1-st7 | scratch |
- stack parameter order: left-to-right
- called function cleans up the stack
- first three integers/pointers (with exception of method pointers) (<= 32bit) are passed via eax, edx and ecx, in this order (preceding or interleaved arguments that are not passed via registers are pushed onto the stack)
- arguments > 32 bits are passed as a pointer to the value
- aggregates (structs, unions) are pushed as a sequence of dwords, and never passed via registers (no matter their size)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- varargs are always passed via the stack
- all other parameters are pushed onto the stack
- the direction flag is clear on entry and must be returned clear
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning via ecx), and callee writes return value to this space; the ptr to the aggregate is returned in eax
- integers and aggregates (structs, unions) > 32 and <= 64 bits are returned via the eax and edx registers
- floating point types are returned via the st0 register
- return values > 32 bits (e.g. aggregates, long long, ...) are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit last parameter
Stack directly after function prolog:
Watcom fastcall
Registers and register usage
Name | Brief description |
eax | scratch, parameter 0, return value |
ebx | scratch when used for parameter, otherwise preserve, parameter 2 |
ecx | scratch when used for parameter, otherwise preserve, parameter 3 |
edx | scratch when used for parameter, otherwise preserve, parameter 1, return value |
esi | scratch when used for return pointer, otherwise preserve |
edi | preserve |
ebp | preserve |
esp | stack pointer |
st0 | scratch, floating point return value |
st1-st7 | scratch |
- stack parameter order: right-to-left
- called function cleans up the stack
- first four integers/pointers (<= 32bit) are passed via eax, edx, ebx and ecx (even if preceded by other arguments)
- arguments > 32 bits, as well as all subsequent arguments, are passed via the stack
- aggregates (structs, unions) are passed as a pointer to the aggregate (a copy, if needed, to guarantee by-value semantics)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- all other parameters are pushed onto the stack
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers > 32 bits and <= 64 bits are returned via the eax and edx registers
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee via esi, and callee writes return value to this space; the ptr to the aggregate is returned in eax
- aggregates (structs, unions) <= 32 bits are returned in eax
- aggregates (structs, unions) > 32 bits are returned by the caller allocating the space and passing a pointer to the callee via esi, that same pointer is returned in eax
Stack directly after function prolog:
win32 stdcall
Registers and register usage
- stack parameter order: right-to-left
- called function cleans up the stack
- all parameters are pushed onto the stack (as dwords)
- arguments > 64 bits are pushed as a sequence of dwords
- aggregates (structs, unions) are pushed as a sequence of dwords
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- stack is usually 4 byte aligned (GCC >= 3.x seems to use a 16byte alignment)
- the direction flag is clear on entry and must be returned clear
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers > 32 and <= 64 bits are returned via the eax and edx registers
- for aggregates and integer return values > 64 bits, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning via stack), and callee writes return value to this space; the ptr to the aggregate is returned in eax
- floating point types are returned via the st0 register
Stack directly after function prolog:
MS thiscall
Registers and register usage
- stack parameter order: right-to-left
- called function cleans up the stack (except for variadic functions where the caller cleans up)
- first parameter (this pointer) is passed via ecx
- all other parameters are pushed onto the stack
- arguments > 64 bits are pushed as a sequence of dwords
- aggregates (structs, unions) are pushed as a sequence of dwords
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers > 32 bits and <= 64 bits are returned via the eax and edx registers
- aggregates (structs, unions) of any size are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit first parameter, that same pointer is returned in eax
- floating point types are returned via the st0 register
Stack directly after function prolog:
GNU thiscall
This is equivalent to the cdecl calling convention, with the first parameter being the this pointer.
pascal
The best known uses of the pascal calling convention are the 16 bit OS/2 APIs, Microsoft Windows 3.x and Borland Delphi 1.x. It is a variation of stdcall; however, arguments are passed from left to right. Since this calling convention is for 16-bit APIs, it is not discussed in further detail here.
plan9call
Registers and register usage
- stack parameter order: right-to-left
- caller cleans up the stack
- all parameters are pushed onto the stack (as dwords)
- arguments > 64 bits are pushed as a sequence of dwords
- aggregates (structs, unions) are pushed as a sequence of dwords
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- return values of pointer or integral type (<= 32 bits) are returned via the eax register
- integers > 32 bits and aggregates (structs, unions) of any size are returned by the caller allocating the space and passing a pointer to the callee as a new, implicit first parameter, that same pointer is returned in eax
- floating point types are returned via the st0 register (called F0 in plan9 8a’s terms)
Note there is no register save area at all. Stack directly after function prolog:
Linux syscalls
Parameter passing
- syscall is issued by triggering interrupt 80h
- syscall number is set in eax
- params are passed in the following registers in this order: ebx, ecx, edx, esi, edi, ebp
- for more than six arguments, ebx points to the list of further arguments (not used in practice, as Linux syscalls use a maximum of 5 arguments)
- register eax holds the return value
*BSD syscalls
Parameter passing
- syscall is issued by triggering interrupt 80h
- syscall number is set in eax
- params are passed on the stack as with the cdecl calling convention
x64 Calling Conventions
Overview
The x64 (64bit) architecture designed by AMD is based on Intel’s x86 (32bit) architecture,
supporting it natively. It is sometimes referred to as x86-64, AMD64, or, cloned by Intel, EM64T or
Intel64.
On this processor, a word is defined to be 16 bits in size, a dword 32 bits and a qword 64 bits. Note that this
is due to historical reasons (terminology didn’t change with the introduction of 32 and 64 bit
processors).
The x64 calling convention for MS Windows [25] differs from the SystemV x64 calling convention [26] used by
Linux/*BSD/... Note that this is not the only difference between these operating systems. The
64 bit programming model in use by 64 bit Windows is LLP64, meaning that the C types int
and long remain 32 bits in size, whereas long long becomes 64 bits. Under Linux/*BSD/... it’s
LP64.
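The difference between the two programming models can be observed from C by looking at type sizes. The following is a minimal sketch; the function name is hypothetical, and anything that is neither LLP64 nor LP64 is simply reported as "other".

```c
#include <assert.h>
#include <string.h>

/* Guess the data model of the current target from its type sizes. */
static const char *data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(int) == 4 && sizeof(long long) == 8) {
        if (sizeof(long) == 4)
            return "LLP64";  /* long stays 32 bits, e.g. 64 bit Windows */
        if (sizeof(long) == 8)
            return "LP64";   /* long is 64 bits, e.g. Linux/*BSD on x64 */
    }
    return "other";
}
```

Code that must behave identically under both models is usually better off with the fixed-width types from `<stdint.h>` than with `long`.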
Compared to the x86 architecture, the 64 bit versions of the registers are called rax, rbx, etc. Furthermore,
there are eight new general purpose registers r8-r15.
dyncall support
Currently, the MS Windows and System V calling conventions are supported.
Dyncall can also be used to issue syscalls on System V platforms by using the syscall number as target
parameter and selecting the correct mode.
MS Windows
Registers and register usage
Name | Brief description |
rax | scratch, return value |
rbx | permanent |
rcx | scratch, parameter 0 if integer or pointer |
rdx | scratch, parameter 1 if integer or pointer |
rdi | permanent |
rsi | permanent |
rbp | permanent, may be used as frame pointer |
rsp | stack pointer |
r8-r9 | scratch, parameter 2 and 3 if integer or pointer |
r10-r11 | scratch, permanent if required by caller (used for syscall/sysret) |
r12-r15 | permanent |
xmm0 | scratch, floating point parameter 0, floating point return value |
xmm1-xmm3 | scratch, floating point parameters 1-3 |
xmm4-xmm5 | scratch, permanent if required by caller |
xmm6-xmm15 | permanent |
- stack parameter order: right-to-left
- caller cleans up the stack
- first 4 integer/pointer parameters are passed via rcx, rdx, r8, r9 (from left to right), others are pushed on stack (there is a spill area for the first 4)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- aggregates (structs and unions) <= 64 bits are passed like equal-sized integers
- float and double parameters are passed via xmm0l-xmm3l
- first 4 parameters are passed via the correct register depending on the parameter type - with mixed float and int parameters, some registers are left out (e.g. first parameter ends up in rcx or xmm0, second in rdx or xmm1, etc.)
- parameters in registers are right justified
- parameters < 64bits are not zero extended - zero the upper bits containing garbage if needed (but they are always passed as a qword)
- parameters > 64 bits are passed via a pointer to a copy (for aggregate types, that caller-allocated memory must be 16-byte aligned)
- if callee takes address of a parameter, first 4 parameters must be dumped (to the reserved space on the stack) - for floating point parameters, value must be stored in integer AND floating point register
- caller cleans up the stack, not the callee (like cdecl)
- stack is always 16byte aligned - since return address is 64 bits in size, stacks with an odd number of parameters are already aligned
- ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision as required by ellipsis calls)
- if size of parameters > 1 page of memory (usually between 4k and 64k), chkstk must be called
- return values of pointer, integral or aggregate (structs and unions) type (<= 64 bits) are returned via the rax register
- floating point types are returned via the xmm0 register
- for any other type > 64 bits (or for non-trivial C++ aggregates of any size), a hidden first parameter, with an address to the return value is passed (for C++ thiscalls it is passed as second parameter, after the this pointer)
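The positional register assignment described above (each of the first four parameters has a fixed slot, and the parameter type selects between the integer and the floating point register of that slot) can be sketched as follows. The function name is hypothetical; only scalar integer/pointer and float/double parameters are modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Location of parameter number `index` (0-based, counted from the left)
 * in the MS Windows x64 calling convention. */
static const char *ms_x64_param_location(size_t index, int is_floating_point)
{
    static const char *const int_regs[4] = { "rcx", "rdx", "r8", "r9" };
    static const char *const fp_regs[4]  = { "xmm0", "xmm1", "xmm2", "xmm3" };

    if (index >= 4)
        return "stack";  /* fifth and later parameters are pushed */
    return is_floating_point ? fp_regs[index] : int_regs[index];
}
```

For a function taking (int, double, int, float, int), this yields rcx, xmm1, r8, xmm3 and the stack; rdx, r9, xmm0 and xmm2 stay unused.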
Stack frame is always 16-byte aligned. Stack directly after function prolog:
System V (Linux / *BSD / Mac OS X)
Registers and register usage
Name | Brief description |
rax | scratch, return value, special use for varargs (in al, see below) |
rbx | permanent |
rcx | scratch, parameter 3 if integer or pointer |
rdx | scratch, parameter 2 if integer or pointer, return value |
rdi | scratch, parameter 0 if integer or pointer |
rsi | scratch, parameter 1 if integer or pointer |
rbp | permanent, may be used as frame pointer |
rsp | stack pointer |
r8-r9 | scratch, parameter 4 and 5 if integer or pointer |
r10-r11 | scratch |
r12-r15 | permanent |
xmm0-xmm1 | scratch, floating point parameters 0-1, floating point return value |
xmm2-xmm7 | scratch, floating point parameters 2-7 |
xmm8-xmm15 | scratch |
st0-st1 | scratch, 16 byte floating point return value |
st2-st7 | scratch |
- stack parameter order: right-to-left
- caller cleans up the stack
- first 6 integer/pointer parameters are passed via rdi, rsi, rdx, rcx, r8, r9
- first 8 floating point parameters <= 64 bits are passed via xmm0l-xmm7l
- parameters in registers are right justified
- parameters that are not passed via registers are pushed onto the stack (with their sizes rounded up to qwords)
- parameters < 64bits are not zero extended - zero the upper bits containing garbage if needed (but they are always passed as a qword)
- integer/pointer parameters > 64 bit are passed via 2 registers
- for variadic (varargs) calls, the number of used xmm registers is passed silently in al (the passed number doesn’t need to be exact, but must be an upper bound on the number of used xmm registers)
- aggregates (structs, unions (and arrays within those)) follow a more complicated logic (the following only considers field types supported by dyncall):
  - non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
  - aggregates > 16 bytes are always passed entirely via the stack
  - all other aggregates are classified per qword, by looking at all fields occupying all or part of that qword, recursively:
    - if any field would be passed via the stack, the entire qword will
    - otherwise, if any field would be passed like an integer/pointer value, the entire qword will
    - otherwise the qword is passed like a floating point value
  - after qword classification, the logic is:
    - if any qword is classified to be passed via the stack, the entire aggregate will
    - if the size of the aggregate is > 2 qwords, it is passed via the stack (except for single floating point values > 128bits)
    - all others are passed qword by qword according to their classification, like individual arguments
    - however, an aggregate is never split between registers and the stack, if it doesn’t fit into available registers it is entirely passed via the stack (freeing such registers for subsequent arguments)
- stack is always 16byte aligned - since return address is 64 bits in size, stacks with an odd number of parameters are already aligned
- no spill area is used on stack, iterating over varargs requires a specific va_list implementation
- return values of pointer or integral type are returned via the rax register (and rdx if needed)
- floating point types are returned via the xmm0 register (and xmm1 if needed)
- aggregates are first classified in the same way as when passing them by value, then:
  - for aggregates that would be passed via the stack (or for non-trivial C++ aggregates of any size), a hidden pointer to a non-shared, caller provided space is passed as hidden, first argument; this pointer will be returned via rax
  - otherwise, qword by qword is passed, using rax and rdx for integer/pointer qwords, and xmm0 and xmm1 for floating point ones
- floating point values > 64 bits are returned via st0 and st1
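The per-qword classification of aggregates described in the list above can be sketched in a few lines. This is a simplified model under stated assumptions: fields are scalar integer/pointer or float/double values described by offset and size, long double and non-trivial C++ aggregates are not handled, and register availability is ignored. All names are hypothetical and not part of dyncall.

```c
#include <assert.h>
#include <stddef.h>

/* Ordered so that a "stronger" class has a higher value. */
enum qword_class { QW_UNUSED, QW_SSE, QW_INTEGER, QW_MEMORY };

struct field {
    size_t offset;             /* byte offset inside the aggregate */
    size_t size;               /* size in bytes */
    int    is_floating_point;  /* nonzero for float/double fields */
};

/* Classify one qword (index 0 or 1) of an aggregate. */
static enum qword_class classify_qword(const struct field *fields, size_t nfields,
                                       size_t aggregate_size, size_t qword_index)
{
    enum qword_class result = QW_UNUSED;

    if (aggregate_size > 16)
        return QW_MEMORY;                      /* always entirely via the stack */

    for (size_t i = 0; i < nfields; ++i) {
        size_t first, last;
        enum qword_class c;
        assert(fields[i].size > 0);
        first = fields[i].offset / 8;
        last  = (fields[i].offset + fields[i].size - 1) / 8;
        if (qword_index < first || qword_index > last)
            continue;                          /* field does not touch this qword */
        c = fields[i].is_floating_point ? QW_SSE : QW_INTEGER;
        if (c > result)
            result = c;                        /* integer wins over floating point */
    }
    return result;
}
```

Under these assumptions, `struct { double d; int i; }` is classified as one floating point qword followed by one integer qword, while `struct { float f; int i; }` fits in a single qword that is passed like an integer.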
Stack frame is always 16-byte aligned. The 128 byte zone beyond the location pointed to by the stack
pointer is referred to as the “red zone”; it is considered reserved and is not modified by signal or interrupt
handlers (useful for temporary data that need not be preserved across calls, and for optimizations for leaf
functions). Stack directly after function prolog:
System V syscalls
Parameter passing
- syscall is issued via the syscall instruction
- kernel destroys registers rcx and r11
- syscall number is set in rax
- params are passed in the following registers in this order: rdi, rsi, rdx, r10, r8, r9 (r10 takes the place of rcx, which the syscall instruction overwrites)
- no stack in use, meaning syscalls are in theory limited to six arguments
- register rax holds the return value (values in between -4095 and -1 indicate errors)
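As an illustration of the points above, the sketch below issues a syscall with up to three arguments and checks the result against the error range. It assumes GCC or Clang; the inline assembly path is only used on x86-64 Linux, and on other targets the code falls back to the libc `syscall()` wrapper and converts its errno-based result to the same form. The function names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Issue a syscall with up to three arguments, returning the raw result. */
static long raw_syscall3(long nr, long a0, long a1, long a2)
{
#if defined(__linux__) && defined(__x86_64__)
    long ret;
    /* number in rax, arguments in rdi, rsi, rdx; rcx and r11 are clobbered */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr), "D"(a0), "S"(a1), "d"(a2)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Fallback (assumption): emulate the raw result via the libc wrapper. */
    long ret = syscall(nr, a0, a1, a2);
    return (ret == -1) ? -(long)errno : ret;
#endif
}

/* Values between -4095 and -1 indicate an error (a negated errno). */
static int is_syscall_error(long ret)
{
    return ret >= -4095 && ret <= -1;
}
```

A call such as `raw_syscall3(SYS_getpid, 0, 0, 0)` returns the process id, whereas closing an invalid file descriptor yields a value inside the error range.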
PowerPC (32bit) Calling Conventions
Overview
- Word size is 32 bits
- Big endian (MSB) and little endian (LSB) operating modes
- Processor operates directly on double precision (IEEE-754) floating point values (single precision values are converted on the fly)
- Apple macos/Mac OS X/Darwin PPC is specified in “Mac OS X ABI Function Call Guide” [32]. It uses Big Endian (MSB)
- Linux PPC 32-bit ABI is specified in “LSB for PPC” [33], which is based on “System V ABI”. It uses Big Endian (MSB)
- PowerPC EABI is defined in the “PowerPC Embedded Application Binary Interface 32-Bit Implementation” [34]
- There is also the “PowerOpen ABI” [36]; a nearly identical version of it is used in AIX
Dyncall and dyncallback are supported for PowerPC (32bit) Big Endian (MSB), for Darwin’s and System V’s calling convention.
Dyncall can also be used to issue syscalls by using the syscall number as target parameter and selecting the correct mode.
Mac OS X/Darwin
Registers and register usage
Name | Brief description |
gpr0 | scratch |
gpr1 | stack pointer |
gpr2 | scratch |
gpr3,gpr4 | return value, parameter 0 and 1 for integer or pointer, scratch |
gpr5-gpr10 | parameter 2-7 for integer or pointer parameters, scratch |
gpr11 | preserve |
gpr12 | branch target for dynamic code generation |
gpr13-31 | preserve |
fpr0 | scratch |
fpr1 | floating point return value, floating point parameter 0 (always double precision) |
fpr2-fpr13 | floating point parameters 1-12 (always double precision) |
fpr14-fpr31 | preserve |
v0-v1 | scratch |
v2-v13 | vector parameters |
v14-v19 | scratch |
v20-v31 | preserve |
lr | link-register, scratch |
ctr | count-register, scratch |
cr0-cr7 | condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch) |
- stack grows down
- stack parameter order: right-to-left
- caller cleans up the stack
- the first 8 integer parameters are passed in registers gpr3-gpr10
- the first 13 floating point parameters are passed in registers fpr1-fpr13
- 64 bit arguments are passed as if they were two 32 bit arguments, without skipping registers for alignment (this means passing half via a register and half via the stack is allowed)
- if a float parameter is passed via a register, gpr registers are skipped for subsequent integer parameters (based on the size of the float - 1 register for single precision and 2 for double precision floating point values)
- the caller pushes subsequent parameters onto the stack
- for every parameter passed via a register, space is reserved in the stack parameter area (in order to spill the parameters if needed - e.g. varargs)
- ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision as required by ellipsis calls)
- all nonvector parameters are aligned on 4-byte boundaries
- vector parameters are aligned on 16-byte boundaries
- composite parameters with size of 1 or 2 bytes occupy low-order bytes of their 4-byte area; this is inconsistent with other 32-bit PPC binary interfaces (in AIX and Mac OS 9, padding bytes always follow the data structure)
- composite parameters 3 bytes or larger in size occupy high-order bytes
- integer parameters < 32 bit are right-justified (meaning occupy higher-address bytes) in their 4-byte slot on the stack, requiring extra-care for big-endian targets
- aggregates (struct, union) with only one (non-aggregate / non-array) field are passed as if the field itself would be passed
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- all other aggregates are passed as a sequence of words (like integer parameters)
- return values of integer <= 32bit or pointer type use gpr3
- 64 bit integers use gpr3 and gpr4 (hiword in gpr3, loword in gpr4)
- floating point values are returned via fpr1
- for all aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in gpr3), and callee writes return value to this space; the ptr to the aggregate is returned in gpr3
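The interplay between floating point parameters and skipped gprs described above can be modeled by counting words in the parameter area: every parameter consumes one or two words, and the first eight words correspond to gpr3-gpr10. The following is a simplified sketch for scalar parameters only (no vectors, no aggregates); names are hypothetical and not part of dyncall.

```c
#include <assert.h>
#include <stddef.h>

enum ppc_arg { PPC_INT32, PPC_INT64, PPC_FLOAT, PPC_DOUBLE };

/* Number of 32 bit words a parameter occupies in the parameter area. */
static size_t arg_words(enum ppc_arg kind)
{
    return (kind == PPC_INT64 || kind == PPC_DOUBLE) ? 2 : 1;
}

/* For the integer parameter at `index`, return the number of the gpr that
 * holds its first word (3..10), or 0 if it starts on the stack. */
static int darwin_ppc32_first_gpr(const enum ppc_arg *args, size_t nargs, size_t index)
{
    size_t word = 0;
    assert(index < nargs);
    assert(args[index] == PPC_INT32 || args[index] == PPC_INT64);
    for (size_t i = 0; i < index; ++i)
        word += arg_words(args[i]);   /* floats skip gprs, too */
    return (word < 8) ? (int)(3 + word) : 0;
}
```

With this model, the int in (double, int) lands in gpr5 and the int in (float, int) in gpr4. A 64 bit integer following seven ints starts in gpr10, so its second half ends up on the stack.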
Stack frame is always 16-byte aligned. Prolog opens frame with additional, fixed space for a linkage area,
to hold a number of values (not all of them are required to be saved, though). Stack directly after function
prolog:
System V PPC 32-bit
Registers and register usage
Name | Brief description |
r0 | scratch |
r1 | stack pointer, preserve |
r2 | system-reserved |
r3-r4 | parameter passing and return value, scratch |
r5-r10 | parameter passing, scratch |
r11-r12 | scratch |
r13 | small data area pointer register |
r14-r30 | local variables, preserve |
r31 | used for local variables or environment pointer, preserve |
f0 | scratch |
f1 | parameter passing and return value, scratch |
f2-f8 | parameter passing, scratch |
f9-f13 | scratch |
f14-f31 | local variables, preserve |
cr0-cr7 | condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch) |
lr | link register, scratch |
ctr | count register, scratch |
xer | fixed-point exception register, scratch |
fpscr | floating-point Status and Control Register |
- Stack pointer (r1) is always 16-byte aligned. The EABI differs here - it is 8-byte alignment
- 8 general-purpose registers (r3-r10) for integer and pointer types
- 8 floating-point registers (f1-f8) for float (promoted to double) and double types
- Additional arguments are passed on the stack directly after the back-chain and saved return address (an 8-byte structure) on the caller’s stack frame
- 64-bit integer data types are passed as a whole in two 32-bit general-purpose registers (an odd and an even one, e.g. r3 and r4), skipping an even register if necessary, or are passed on the stack; they are never split into a register and a stack part
- Ellipsis calls set CR bit 6
- integer parameters < 32 bit are right-justified (meaning occupy high-order bytes) in their 4-byte area, requiring extra-care for big-endian targets
- no spill area is used on stack, iterating over varargs requires a specific va_list implementation
- aggregates (struct, union) and types > 64 bits are passed indirectly, as a pointer to the data (or a copy of it, if necessary to avoid modification)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- 32-bit integers use register r3, 64-bit use registers r3 and r4 (hiword in r3, loword in r4)
- floating-point values are returned using register f1
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in gpr3), and callee writes return value to this space; the ptr to the aggregate is returned in gpr3
- aggregates (struct, union) <= 64 bits use gpr3 and gpr4
- for all other aggregates and types > 64 bits, a hidden first parameter with an address to a caller allocated space is passed to the function (in gpr3), which is written to by the callee
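The register pairing rule for 64-bit integers described above (pairs start on an odd register, and a value is never split between a register and the stack) can be sketched as follows. The function is hypothetical and only decides where a single 64-bit integer goes, given the next free general-purpose register.

```c
#include <assert.h>

/* next_gpr: number of the next free parameter register (3..10),
 * or 11 if all parameter registers are already in use.
 * Returns the first register of the pair (3, 5, 7 or 9),
 * or 0 if the value has to be passed on the stack. */
static int sysv_ppc32_int64_pair(int next_gpr)
{
    assert(next_gpr >= 3 && next_gpr <= 11);
    if ((next_gpr & 1) == 0)
        ++next_gpr;               /* skip an even register, pairs start odd */
    return (next_gpr <= 9) ? next_gpr : 0;
}
```

So with r4 being the next free register, r4 is skipped and the value is passed in r5 and r6; with only r10 left, the value goes to the stack as a whole.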
Stack frame is always 16-byte aligned. Stack directly after function prolog:
System V PPC 32-bit / Linux Standard Base version
This is in essence the same as the System V PPC 32-bit calling convention, but differs for aggregate return values:
- for all aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in gpr3), and callee writes return value to this space; the ptr to the aggregate is returned in gpr3
System V syscalls
Parameter passing
- syscall is issued via the sc instruction
- kernel destroys register r13
- syscall number is set in r0
- params are passed in registers r3 through r10
- no stack in use, meaning syscalls are in theory limited to eight arguments
- register r3 holds the return value, overflow flag in condition register field cr0 signals errors in syscall
PowerPC (64bit) Calling Conventions
Overview
- Word size is 32 bits for historical reasons
- Doubleword size is 64 bits
- Big endian (MSB) and little endian (LSB) operating modes
- Apple Mac OS X/Darwin PPC is specified in “Mac OS X ABI Function Call Guide” [32]. It uses Big Endian (MSB)
- Linux PPC 64-bit ABI is specified in “64-bit PowerPC ELF Application Binary Interface Supplement” [37], which is based on “System V ABI”
Dyncall and dyncallback are supported for PowerPC (64bit) Big Endian and Little Endian ELF ABIs on System V systems. Mac OS X is not supported.
Dyncall can also be used to issue syscalls by using the syscall number as target parameter and selecting the correct mode.
PPC64 ELF ABI
Registers and register usage
Name | Brief description |
gpr0 | scratch |
gpr1 | stack pointer |
gpr2 | TOC base ptr (offset table and data for position independent code), scratch |
gpr3 | return value, parameter 0 for integer or pointer, scratch |
gpr4-gpr10 | parameter 1-7 for integer or pointer parameters, scratch |
gpr11 | env pointer if needed, scratch |
gpr12 | used for exception handling and glink code, scratch |
gpr13 | used for system thread ID, preserve |
gpr14-31 | preserve |
fpr0 | scratch |
fpr1-fpr4 | floating point return value, floating point parameter 0-3 (always double precision) |
fpr5-fpr13 | floating point parameters 4-12 (always double precision) |
fpr14-fpr31 | preserve |
v0-v1 | scratch |
v2-v13 | vector parameters |
v14-v19 | scratch |
v20-v31 | preserve |
lr | link-register, scratch |
ctr | count-register, scratch |
xer | fixed point exception register, scratch |
fpscr | floating point status and control register, scratch |
cr0-cr7 | condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch) |
- stack grows down
- stack parameter order: right-to-left
- caller cleans up the stack
- stack is always 16 byte aligned
- the stack pointer must be atomically updated (to avoid any timing window in which an interrupt can occur with a partially updated stack), usually with the stdu (store doubleword with update) instruction
- the first 8 integer parameters are passed in registers gpr3-gpr10
- the first 13 floating point parameters are passed in registers fpr1-fpr13
- preserved registers are saved using a defined order (from high to low addresses): fpr* (64bit aligned), gpr*, VRSAVE save word (32 bits), padding for alignment (4 or 12 bytes), v* (128bit aligned)
- if a floating point parameter is passed via a register, a gpr register is skipped for subsequent integer parameters
- the caller pushes subsequent parameters onto the stack
- single precision floating point values use the second word in a doubleword
- a quad precision floating point argument is passed as two consecutive double precision ones
- integer types < 64 bit are sign or zero extended and use a doubleword
- ellipsis calls take floating point values in int and float registers (single precision floats are promoted to double precision as required by ellipsis calls)
- space for all potential gpr* register passed arguments is reserved in the stack parameter area (in order to spill the parameters if needed - e.g. varargs), meaning a minimum of 64 bytes to hold gpr3-gpr10
- all nonvector parameters are aligned on 8-byte boundaries
- vector parameters are aligned on 16-byte boundaries
- integer parameters < 64 bit are right-justified (meaning they occupy the higher-address bytes) in their 8-byte slot on the stack, requiring extra care for big-endian targets
- aggregates (struct, union) are passed as a sequence of doublewords (following above rules for doublewords)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- return values of integer <= 32bit or pointer type use gpr3 and are zero or sign extended depending on their type
- 64 bit integers use gpr3
- floating point values are returned via fpr1
- for any aggregate (struct, union), the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in gpr3), and callee writes return value to this space; the ptr to the aggregate is returned in gpr3
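The register assignment rules above can be sketched as a small simulation. This is not dyncall's implementation: the names ppc64_slot and ppc64_assign are hypothetical, and the sketch assumes only integer/pointer and floating point scalars (no vectors, aggregates, quad precision values or ellipsis calls).

```c
#include <stddef.h>

/* Hypothetical description of where one argument ends up. */
typedef struct {
    int in_fpr;       /* 1 if passed in fpr<reg> */
    int in_gpr;       /* 1 if passed in gpr<reg> */
    int reg;          /* register number, -1 if passed via the stack */
    int stack_offset; /* byte offset of the argument's doubleword in the parameter area */
} ppc64_slot;

/* kinds[i]: 'i' = integer or pointer, 'f' = floating point (float or double).
   Every argument owns one doubleword of the parameter area. The first 8
   doublewords are shadowed by gpr3-gpr10, and a float passed in an fpr
   still consumes (skips) the gpr belonging to its doubleword. */
void ppc64_assign(const char *kinds, size_t n, ppc64_slot *out)
{
    int next_fpr = 1; /* fpr1-fpr13 are available for parameters */
    size_t i;
    for (i = 0; i < n; ++i) {
        int dw = (int)i; /* doubleword index of this argument */
        out[i].in_fpr = 0;
        out[i].in_gpr = 0;
        out[i].reg = -1;
        out[i].stack_offset = dw * 8;
        if (kinds[i] == 'f' && next_fpr <= 13) {
            out[i].in_fpr = 1;
            out[i].reg = next_fpr++;
        } else if (dw < 8) {
            out[i].in_gpr = 1;
            out[i].reg = 3 + dw;
        }
    }
}
```

For a call taking (int, double, int), this sketch assigns gpr3, fpr1 and gpr5; gpr4 is skipped because the double occupies the second doubleword of the parameter area.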
Stack frame is always 16-byte aligned. Stack directly after function prolog:
System V syscalls
Parameter passing
- syscall is issued via the sc instruction
- kernel destroys registers r13
- syscall number is set in r0
- params are passed in registers r3 through r10
- no stack in use, meaning syscalls are in theory limited to eight arguments
- register r3 holds the return value, overflow flag in conditional register cr0 signals errors in syscall
ARM32 Calling Conventions
Overview The ARM32 family of processors is based on the Advanced RISC Machines (ARM) processor architecture
(32 bit RISC). The word size is 32 bits (and the programming model is ILP32).
Basically, this family of microprocessors can be run in 2 major modes:
Mode | Description |
ARM | 32bit instruction set |
THUMB | compressed instruction set using 16bit wide instruction encoding |
For more details, take a look at the ARM-THUMB Procedure Call Standard (ATPCS) [18], the Procedure Call Standard for the ARM Architecture (AAPCS) [19], as well as Debian’s ARM EABI port [23] and hard-float [24] wiki pages.
dyncall support
Currently, the dyncall library supports the ARM and THUMB mode of the ARM32 family (ATPCS [18],
EABI [23], the ARM hard-float (armhf) [24] variant, as well as Apple’s calling convention based on the
ATPCS), excluding manually triggered ARM-THUMB interworking calls.
Also supported is armhf, a calling convention with register support to pass floating point numbers. The FPA and
VFP (scalar mode) procedure call standards, as well as some instruction sets accelerating DSP and
multimedia applications like the ARM Jazelle Technology (direct Java bytecode execution, providing
acceleration for some bytecodes while calling software code for others), etc., are not supported by the dyncall
library.
ATPCS ARM mode
Registers and register usage In ARM mode, the ARM32 processor has sixteen 32 bit general purpose registers, namely
r0-r15:
Name | Alias | Brief description |
r0 | a1 | parameter 0, scratch, return value |
r1 | a2 | parameter 1, scratch, return value |
r2,r3 | a3,a4 | parameters 2 and 3, scratch |
r4-r9 | v1-v6 | permanent |
r10 | sl | permanent |
r11 | fp | frame pointer, permanent |
r12 | ip | scratch |
r13 | sp | stack pointer, permanent |
r14 | lr | link register, permanent |
r15 | pc | program counter (note: due to pipeline, r15 points to 2 instructions ahead) |
- stack parameter order: right-to-left
- caller cleans up the stack
- first four words are passed using r0-r3
- subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
- if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
- parameters <= 32 bits are passed as 32 bit words
- 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn’t seem to be specified in the ATPCS)
- aggregates (struct, union) are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words (splitting across registers and stack is allowed)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
- return values <= 32 bits use r0
- 64 bit return values use r0 and r1
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
- aggregates (struct, union) <= 32 bits are returned like an integer (in r0)
- for aggregates (struct, union) > 32 bits, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
- for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
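The word-based parameter passing described above can be modeled as one flat sequence of 32 bit words, where the first four words correspond to r0-r3 and the remainder to the stack. The following is a minimal sketch under that assumption; atpcs_args and its functions are hypothetical names, not part of dyncall, and a little-endian word order is assumed for 64 bit values.

```c
#include <stddef.h>
#include <stdint.h>

enum { ATPCS_MAX_WORDS = 16 };

/* Hypothetical argument buffer: words[0..3] model r0-r3, the rest the stack. */
typedef struct {
    uint32_t words[ATPCS_MAX_WORDS];
    size_t   count;
} atpcs_args;

void atpcs_push32(atpcs_args *a, uint32_t v)
{
    if (a->count < ATPCS_MAX_WORDS) /* sketch only: overflow is silently ignored */
        a->words[a->count++] = v;
}

/* A 64 bit value is passed as two words without alignment padding, so it
   may end up partly in r3 and partly on the stack. */
void atpcs_push64(atpcs_args *a, uint64_t v)
{
    atpcs_push32(a, (uint32_t)(v & 0xffffffffu)); /* low word first (little-endian) */
    atpcs_push32(a, (uint32_t)(v >> 32));
}

/* Number of words that spill onto the stack. */
size_t atpcs_stack_words(const atpcs_args *a)
{
    return a->count > 4 ? a->count - 4 : 0;
}
```

Pushing three 32 bit values followed by one 64 bit value leaves the low half of the latter in r3 (words[3]) and the high half as the first stack word (words[4]).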
Stack directly after function prolog:
ATPCS THUMB mode
Status Registers and register usage In THUMB mode, the ARM32 processor family supports eight 32 bit general purpose registers r0-r7 and
access to high order registers r8-r15:
Name | Alias | Brief description |
r0 | a1 | parameter 0, scratch, return value |
r1 | a2 | parameter 1, scratch, return value |
r2,r3 | a3,a4 | parameters 2 and 3, scratch |
r4-r6 | v1-v3 | permanent |
r7 | v4 | frame pointer, permanent |
r8-r11 | v5-v8 | permanent |
r12 | ip | scratch |
r13 | sp | stack pointer, permanent |
r14 | lr | link register, permanent |
r15 | pc | program counter (note: due to pipeline, r15 points to 2 instructions ahead) |
- stack parameter order: right-to-left
- caller cleans up the stack
- first four words are passed using r0-r3
- subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
- if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
- parameters <= 32 bits are passed as 32 bit words
- 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn’t seem to be specified in the ATPCS)
- aggregates (struct, union) are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words (splitting across registers and stack is allowed)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
- return values <= 32 bits use r0
- 64 bit return values use r0 and r1
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
- aggregates (struct, union) <= 32 bits are returned like an integer (in r0)
- for aggregates (struct, union) > 32 bits, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
- for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
Stack directly after function prolog:
EABI (ARM and THUMB mode)
The ARM EABI is very similar to the ABI outlined in the ARM-THUMB procedure call standard (ATPCS) [18].
However, the EABI requires the stack to be 8-byte aligned at function entries, and it also requires 8-byte
alignment for 64 bit parameters. The latter are aligned on 8-byte boundaries on the stack, and on even-numbered
register pairs when passed via registers. In order to achieve such an alignment, a register might have to be skipped for parameters passed
via registers, or 4 bytes on the stack for parameters passed via the stack. Refer to the Debian ARM EABI port
wiki for more information [23].
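The alignment rule for 64 bit parameters can be sketched as follows. This is an illustration only, not dyncall code: eabi_args, eabi_push32 and eabi_push64 are hypothetical names, arguments are modeled as a flat sequence of 32 bit words (words 0-3 standing for r0-r3, the rest for the stack), and little-endian word order is assumed. Because the stack is 8-byte aligned at function entry, an even word index corresponds to an 8-byte boundary.

```c
#include <stddef.h>
#include <stdint.h>

enum { EABI_MAX_WORDS = 16 };

typedef struct {
    uint32_t words[EABI_MAX_WORDS]; /* words[0..3] model r0-r3, the rest the stack */
    size_t   count;
} eabi_args;

/* Returns 1 on success, 0 if the buffer is full. */
int eabi_push32(eabi_args *a, uint32_t v)
{
    if (a->count >= EABI_MAX_WORDS)
        return 0;
    a->words[a->count++] = v;
    return 1;
}

/* 64 bit values start at an even word index: one register (or 4 bytes of
   stack) is skipped if needed, and the skipped slot is left unused. */
int eabi_push64(eabi_args *a, uint64_t v)
{
    if ((a->count & 1u) && !eabi_push32(a, 0)) /* padding word */
        return 0;
    return eabi_push32(a, (uint32_t)v) && eabi_push32(a, (uint32_t)(v >> 32));
}
```

With one 32 bit argument followed by one 64 bit argument, r1 is skipped and the 64 bit value occupies r2 and r3.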
ARM on Apple’s iOS (Darwin) Platform (ARM and THUMB mode)
iOS runs on ARMv6 (iOS 2.0) and ARMv7 (iOS 3.0) architectures. Both ARM and THUMB modes are
available; code is usually compiled in THUMB mode.
Register usage
Name | Alias | Brief description |
r0 | | parameter 0, scratch, return value |
r1 | | parameter 1, scratch, return value |
r2,r3 | | parameters 2 and 3, scratch |
r4-r6 | | permanent |
r7 | | frame pointer, permanent |
r8 | | permanent |
r9 | | permanent (iOS 2.0) / scratch (since iOS 3.0) |
r10-r11 | | permanent |
r12 | | scratch, intra-procedure scratch register (IP) used by dynamic linker |
r13 | sp | stack pointer, permanent |
r14 | lr | link register, permanent |
r15 | pc | program counter (note: due to pipeline, r15 points to 2 instructions ahead) |
cpsr | | program status register |
d0-d7 | | scratch, aliases s0-s15, on ARMv7 also as q0-q3; not accessible from Thumb mode on ARMv6 |
d8-d15 | | permanent, aliases s16-s31, on ARMv7 also as q4-q7; not accessible from Thumb mode on ARMv6 |
d16-d31 | | only available in ARMv7, aliases q8-q15 |
fpscr | | VFP status register |
The ABI is based on the AAPCS but with the following important differences:
- in ARM mode, r7 is used as frame pointer instead of r11 (so both ARM and THUMB mode use the same convention)
- r9 does not need to be preserved on iOS 3.0 and greater
Stack directly after function prolog:
ARM hard float (armhf)
Most Debian-based Linux systems on ARMv7 (or ARMv6 with FPU) platforms use a calling convention referred to as armhf, using 16 32-bit floating point registers of the FPU of the VFPv3-D16 extension to the ARM architecture. Refer to the Debian wiki for more information [24].
Code is little-endian; the rest is similar to EABI, with an 8-byte aligned stack, etc.
Register usage
Name | Alias | Brief description |
r0 | a1 | parameter 0, scratch, non floating point return value |
r1 | a2 | parameter 1, scratch, non floating point return value |
r2,r3 | a3,a4 | parameters 2 and 3, scratch |
r4-r9 | v1-v6 | permanent |
r10 | sl | permanent |
r11 | fp | frame pointer, permanent |
r12 | ip | scratch, intra-procedure scratch register (IP) used by dynamic linker |
r13 | sp | stack pointer, permanent |
r14 | lr | link register, permanent |
r15 | pc | program counter (note: due to pipeline, r15 points to 2 instructions ahead) |
cpsr | | program status register |
s0 | | floating point argument, floating point return value, single precision |
d0 | | floating point argument, floating point return value, double precision, aliases s0-s1 |
s1-s15 | | floating point arguments, single precision |
d1-d7 | | aliases s2-s15, floating point arguments, double precision |
fpscr | | VFP status register |
- stack parameter order: right-to-left
- caller cleans up the stack
- first four non-floating-point words are passed using r0-r3
- out of those, 64bit parameters use 2 registers, either r0,r1 or r2,r3 (skipped registers are left unused)
- first 16 single-precision, or 8 double-precision arguments are passed via s0-s15 or d0-d7, respectively (note that since s and d registers are aliased, already used ones are skipped)
- subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
- note that as soon as one floating point parameter is passed via the stack, subsequent single precision floating point parameters are also pushed onto the stack even if there are still free S* registers
- float and double vararg function parameters (no matter if in ellipsis part of function, or not) are passed like int or long long parameters, vfp registers aren’t used
- if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words (for first 4 integer arguments) to a reserved stack area adjacent to the other parameters on the stack
- parameters <= 32 bits are passed as 32 bit words
- aggregates (struct, union) with 1 to 4 identical floating-point members (either float or double) are passed field-by-field, except if passed as a vararg
- aggregates that could be passed via floating point register are never split across those and the stack, so if not enough registers are available an aggregate is passed entirely via the stack (implying above rule that any still unused float registers will be skipped for any subsequent arg)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- all other aggregates (struct, union), after rounding up the size to the nearest multiple of 4, are passed as a sequence of words, like integers (splitting across registers and stack is allowed)
- callee spills, caller reserves spill area space, though
- non floating point return values <= 32 bits use r0
- non floating point 64-bit return values use r0 and r1
- floating point return value uses s0 (for float) or d0 (for double), respectively
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
- aggregates (struct, union) with 1 to 4 identical floating-point members are returned in s0-s3 (for float) or d0-d3 (for double), respectively
- all other aggregates <= 32 bits are returned via r0
- for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
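The way aliased s and d registers are handed out, including the skipping of already used ones and the switch to the stack, can be sketched as a small allocator. This is a simplified model for non-variadic scalar float and double arguments only; armhf_vfp, armhf_alloc_float and armhf_alloc_double are hypothetical names and not part of dyncall.

```c
#include <stdint.h>

typedef struct {
    uint16_t used;     /* bit n set: s<n> is taken (d<k> covers s<2k> and s<2k+1>) */
    int      on_stack; /* set once a floating point argument went to the stack */
} armhf_vfp;

/* Returns the s register index (0-15) for a float, or -1 if it goes to the stack. */
int armhf_alloc_float(armhf_vfp *st)
{
    int n;
    if (!st->on_stack) {
        for (n = 0; n < 16; ++n) {
            if (!(st->used & (1u << n))) {
                st->used |= (uint16_t)(1u << n);
                return n;
            }
        }
    }
    st->on_stack = 1;
    return -1;
}

/* Returns the d register index (0-7) for a double, or -1 if it goes to the stack. */
int armhf_alloc_double(armhf_vfp *st)
{
    int k;
    if (!st->on_stack) {
        for (k = 0; k < 8; ++k) {
            if (!(st->used & (3u << (2 * k)))) {
                st->used |= (uint16_t)(3u << (2 * k));
                return k;
            }
        }
    }
    st->on_stack = 1;
    return -1;
}
```

For the argument sequence (float, double, float), the sketch yields s0, d1 and s1: the double cannot use d0 because s0 is taken, and the second float fills the gap left in s1.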
Stack directly after function prolog:
Architectures
The ARM architecture family contains several revisions with different capabilities and extensions (such as thumb-interworking, more vector registers, ...). The following table sums up the most important properties of the various architecture standards, from a calling convention perspective.
Arch | Platforms | Details |
ARMv4 | ||
ARMv4T | ARM 7, ARM 9, Neo FreeRunner (OpenMoko) | |
ARMv5 | ARM 9E | BLX instruction available |
ARMv6 | | No vector registers available in thumb |
ARMv7 | iPod touch, iPhone 3GS/4, Raspberry Pi 2 | VFP, armhf convention on some platforms |
ARMv8 | iPhone 6 and higher | 64bit support |
ARM64 Calling Conventions
Overview ARMv8 introduced the AArch64 calling convention. ARM64 chips can be run in 64 or 32bit mode, but not
by the same process. Interworking is only possible between processes (inter-process).
The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology
didn’t change from ARM32).
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture
[20].
dyncall support
The dyncall library supports the ARM 64-bit AArch64 PCS ABI, as well as Apple’s and Microsoft’s conventions which are derived from it, for both calls and callbacks.
AAPCS64 Calling Convention
Registers and register usage ARM64 features thirty-one 64 bit general purpose registers, namely r0-r30, which are referred to as either
x0-x30 for 64bit access, or w0-w30 for 32bit access (with upper bits either cleared or sign extended on
load).
Also, there is sp/xzr/wzr, a register with restricted use, used for the stack pointer in instructions
dealing with the stack (sp) or as a hardware zero register for all other instructions (xzr/wzr), and
pc, the program counter. Additionally, there are thirty-two 128 bit registers v0-v31, to be used
as SIMD and floating point registers, referred to as q0-q31, d0-d31 and s0-s31, respectively
(in contrast to AArch32, those do not overlap multiple narrower registers), depending on their
use:
Name | Brief description |
x0-x7 | parameters, scratch, return value |
x8 | indirect result location pointer |
x9-x15 | scratch |
x16 | permanent in some cases, can have special function (IP0), see doc |
x17 | permanent in some cases, can have special function (IP1), see doc |
x18 | reserved as platform register, advised not to be used for handwritten, portable asm, see doc |
x19-x28 | permanent |
x29 | permanent, frame pointer |
x30 | permanent, link register |
sp | permanent, stack pointer |
pc | program counter |
v0-v7 | scratch, float parameters, return value |
v8-v15 | lower 64 bits are permanent, scratch |
v16-v31 | scratch |
xzr | zero register, always zero |
- stack parameter order: right-to-left
- caller cleans up the stack
- first 8 integer arguments are passed using x0-x7
- first 8 floating point arguments are passed using d0-d7
- subsequent parameters are pushed onto the stack
- if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
- aggregates (struct, union) with 1 to 4 identical floating-point members (either float or double) are passed field-by-field (8-byte aligned if passed via stack), except if passed as a vararg
- other aggregates (struct, union) > 16 bytes in size are passed indirectly, as a pointer to a copy (if needed)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- all other aggregates (struct, union), after rounding up the size to the nearest multiple of 8, are passed as a sequence of dwords, like integers
- aggregates are never split across registers and stack, so if not enough registers are available an aggregate is passed via the stack (for aggregates that would’ve been passed as floating point values, any still unused float registers will be skipped for any subsequent arg)
- stack is required throughout to be sixteen-byte aligned
- integer return values use x0
- floating-point return values use d0
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee via x8, and callee writes return value to this space; the ptr to the aggregate is returned in x0
- aggregates (struct, union) that would be passed via registers if passed as a first param, are returned via those registers
- for aggregates not returnable via registers (e.g. if regs exhausted, or > 16 bytes, ...), the caller allocates space, passes pointer to it to the callee through x8, and callee writes return value to this space (note that this is not a hidden first param, as x8 is not used for passing params); the ptr to the aggregate is returned in x0
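The decision of how an aggregate argument is passed, as listed above, can be sketched as a small classification function. This is a hedged illustration and not dyncall's implementation; the type a64_aggregate, its fields and the function names are hypothetical, and the caller is assumed to already know whether the aggregate consists of 1 to 4 identical floating point members.

```c
#include <stddef.h>

typedef enum {
    A64_PASS_FP_FIELDS,  /* field by field, via floating point registers or the stack */
    A64_PASS_INT_DWORDS, /* as a sequence of dwords, like integers */
    A64_PASS_INDIRECT    /* via a pointer to a copy */
} a64_pass_kind;

/* Hypothetical description of an aggregate argument. */
typedef struct {
    size_t size;       /* size of the aggregate in bytes */
    int    fp_members; /* count of identical float/double members, 0 if not homogeneous */
    int    nontrivial; /* non-trivial C++ aggregate */
    int    is_vararg;  /* passed in the variable part of an ellipsis call */
} a64_aggregate;

a64_pass_kind a64_classify(const a64_aggregate *a)
{
    if (a->nontrivial)
        return A64_PASS_INDIRECT;
    if (a->fp_members >= 1 && a->fp_members <= 4 && !a->is_vararg)
        return A64_PASS_FP_FIELDS;
    if (a->size > 16)
        return A64_PASS_INDIRECT;
    return A64_PASS_INT_DWORDS;
}

/* Number of dwords used after rounding the size up to a multiple of 8. */
size_t a64_dwords(size_t size)
{
    return (size + 7) / 8;
}
```

A struct of four doubles (32 bytes) is classified as passed field by field, whereas a 24-byte struct of integers is passed indirectly and a 12-byte one as two dwords.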
Stack directly after function prolog:
Apple’s ARM64 Function Calling Convention
Overview Apple’s ARM64 calling convention is based on the AAPCS64 standard, but diverges in some ways. Only the differences are listed here; for more details, take a look at Apple’s official documentation [21].
- arguments passed via stack use only the space they need, but are subject to type alignment requirements (which is 1 byte for char and bool, 2 for short, 4 for int and 8 for every other type)
- caller is required to sign- or zero-extend arguments smaller than 32 bits
- empty aggregates (allowed in C++, but non-standard in C, however compiler extensions exist) as parameters:
- allowed to be ignored in C
- allowed to be ignored in C++, if aggregate is trivial, otherwise it’s treated as an aggregate with one byte field
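The difference in stack layout between Apple's convention and plain AAPCS64 can be illustrated by computing stack offsets for a sequence of scalar arguments. This is a minimal sketch with hypothetical helper names (a64_align_up, a64_stack_place), and it assumes scalar arguments of size 1, 2, 4 or 8 bytes only.

```c
#include <stddef.h>

/* Round offset up to a multiple of align (align must be a power of two). */
size_t a64_align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}

/* Places one stack argument of the given size, returns its offset and
   advances *cursor past it. natural_align != 0 models Apple's packing
   (alignment equals the type's size, at most 8), natural_align == 0
   models 8-byte slots as used by AAPCS64. */
size_t a64_stack_place(size_t *cursor, size_t size, int natural_align)
{
    size_t align = natural_align ? (size < 8 ? size : 8) : 8;
    size_t off = a64_align_up(*cursor, align);
    *cursor = off + (natural_align ? size : a64_align_up(size, 8));
    return off;
}
```

For stack arguments (char, int, short, double), the Apple variant of this sketch places them at offsets 0, 4, 8 and 16, using 24 bytes in total, whereas the 8-byte slot variant uses offsets 0, 8, 16 and 24.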
Microsoft’s ARM64 Function Calling Convention
Overview Microsoft’s ARM64 calling convention is based on the AAPCS64 standard, but diverges for variadic functions. Only the differences are listed here; for more details, take a look at Microsoft’s official documentation [22].
- variadic function calls do not use any SIMD or floating point registers (for fixed and variable args), meaning first 8 params are passed via x0-x7, the rest via the stack
- a function that returns an aggregate indirectly via a pointer passed via x8 does not seem to be required to put that address in x0 on return (but it should be safe to do so)
MIPS32 Calling Conventions
Overview Multiple revisions of the MIPS Instruction set exist, namely MIPS I, MIPS II, MIPS III, MIPS IV, MIPS32
and MIPS64. Nowadays, MIPS32 and MIPS64 are the main ones used for 32-bit and 64-bit instruction sets,
respectively.
Given MIPS processors are often used for embedded devices, several add-on extensions exist for the MIPS
family, for example:
- MIPS-3D: simple floating-point SIMD instructions dedicated to common 3D tasks.
- MDMX (MaDMaX): more extensive integer SIMD instruction set using 64 bit floating-point registers.
- MIPS16e: adds compression to the instruction stream to make programs take up less room (allegedly a response to the THUMB instruction set of the ARM architecture).
- MIPS MT: multithreading additions to the system similar to HyperThreading.
Unfortunately, there is actually no such thing as ”The MIPS Calling Convention”. Many possible
conventions are used by many different environments such as O32[38], O64[39], N32[40], N64[40], EABI[41]
and NUBI[42].
dyncall support
Currently, for MIPS 32-bit architectures, dyncall supports the widely-used O32 calling convention (for all four combinations of big/little-endian and soft/hard-float targets), as well as EABI (little-endian/hard-float, which is used by the Homebrew SDK for the Playstation Portable). dyncall currently does not support MIPS16e (contrary to the like-minded ARM-THUMB, which is supported). Both calls and callbacks are supported.
MIPS EABI 32-bit Calling Convention
Register usage
Name | Alias | Brief description |
$0 | $zero | hardware zero, scratch |
$1 | $at | assembler temporary, scratch |
$2-$3 | $v0-$v1 | integer results, scratch |
$4-$11 | $a0-$a7 | integer arguments, or double precision float arguments, scratch |
$12-$15,$24 | $t4-$t7,$t8 | integer temporaries, scratch |
$25 | $t9 | integer temporary, address of callee for PIC calls (by convention), scratch |
$16-$23 | $s0-$s7 | preserve |
$26,$27 | $kt0,$kt1 | reserved for kernel |
$28 | $gp | global pointer, preserve |
$29 | $sp | stack pointer, preserve |
$30 | $s8/$fp | frame pointer (some assemblers name it $fp), preserve |
$31 | $ra | return address, preserve |
hi, lo | | multiply/divide special registers |
$f0,$f2 | | float results, scratch |
$f1,$f3,$f4-$f11,$f20-$f23 | | float temporaries, scratch |
$f12-$f19 | | single precision float arguments, scratch |
- Stack grows down
- Stack parameter order: right-to-left
- Caller cleans up the stack
- first 8 integers (<= 32bit) are passed in registers $a0-$a7
- first 8 single precision floating point arguments are passed in registers $f12-$f19
- 64-bit stack arguments are always aligned to 8 bytes
- 64-bit integers or double precision floats are passed in two general purpose registers starting at an even register number, skipping one odd register
- if either integer or float registers are used up, the stack is used
- if the callee takes the address of one of the parameters and uses it to address other unnamed parameters (e.g. varargs) it has to copy - in its prolog - the argument registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
- float registers don’t seem to ever need to be saved that way, because floats passed to an ellipsis function are promoted to doubles, which in turn are passed in $a? register pairs, so only $a0-$a7 need to be spilled
- aggregates (struct, union) <= 32bit are passed like an integer
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- all other aggregates (struct, union) are passed indirectly, as a pointer to a copy (if needed, and for vararg arguments required to be copied by the caller) of the struct
- results are returned in $v0 (32-bit), $v0 and $v1 (64-bit), $f0 or $f0 and $f2 (2 × 32 bit float e.g. complex)
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in $a0), and callee writes return value to this space; the ptr to the aggregate is returned in $v0
- aggregates (struct, union) <= 64bit are returned like an integer (aligned within the register according to endianness)
- all other aggregates (struct, union) are returned in a space allocated by the caller, with a pointer to it passed as first parameter to the function called (meaning in $a0); the ptr to the aggregate is returned in $v0
Stack directly after function prolog:
MIPS O32 32-bit Calling Convention
Register usage
Name | Alias | Brief description |
$0 | $zero | hardware zero |
$1 | $at | assembler temporary |
$2-$3 | $v0-$v1 | return value (only integer on hard-float targets), scratch |
$4-$7 | $a0-$a3 | first arguments (only integer on hard-float targets), scratch |
$8-$15,$24 | $t0-$t7,$t8 | temporaries, scratch |
$25 | $t9 | temporary, holds address of called function for PIC calls (by convention) |
$16-$23 | $s0-$s7 | preserved |
$26,$27 | $k0,$k1 | reserved for kernel |
$28 | $gp | global pointer, preserved by caller |
$29 | $sp | stack pointer, preserve |
$30 | $s8/$fp | frame pointer (some assemblers name it $fp), preserve |
$31 | $ra | return address, preserve |
hi, lo | | multiply/divide special registers |
$f0-$f3 | | only on hard-float targets: float return value, scratch |
$f4-$f11,$f16-$f19 | | only on hard-float targets: float temporaries, scratch |
$f12-$f15 | | only on hard-float targets: first floating point arguments, scratch |
$f20-$f31 | | only on hard-float targets: preserved |
- Stack grows down
- Stack parameter order: right-to-left
- Caller cleans up the stack
- Caller is required to always leave a 16-byte spill area for $a0-$a3 at the end of its frame, to be used and spilled to by the callee, if needed
- The different stack areas (local data, register save area, parameter area) are each aligned to 8 bytes
- generally, first four 32bit arguments are passed in registers $a0-$a3, respectively (only on hard-float targets: see below for exceptions if first arg is a float)
- subsequent parameters are passed via the stack
- 64-bit params passed via registers are passed using either two registers (starting at an even register number, skipping an odd one if necessary), or via the stack using an 8-byte alignment
- only on hard-float targets: if the very first call argument is a float, up to 2 floats or doubles can be passed via $f12 and $f14, respectively, for first and second argument
- only on hard-float targets: if any arguments are passed via float registers, skip $a0-$a3 for subsequent arguments as if the values were passed via them
- only on hard-float targets: note that if the second argument is a float but the first one is not, the float is passed via the $a? registers
- single precision float parameters (32 bit) are right-justified in their 8-byte slot on the stack on big endian targets, as they aren’t promoted
- aggregates (struct, union) are passed as a sequence of words like integers, no matter the fields or if hard-float target (splitting across registers and stack is allowed)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- results are returned in $v0 and $v1, with $v0 for all values < 64bit (only integer on hard-float targets)
- only on hard-float targets: floating point results are returned in $f0 (32-bit float), or $f0 and $f1 (64bit float)
- aggregates (struct, union) are returned in a space allocated by the caller, with a pointer to it passed as first parameter to the function called (meaning in $a0); the ptr to the aggregate is returned in $v0
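The interplay between integer and float registers on hard-float O32 targets, as listed above, can be sketched with a small placement routine. This is a simplified model for scalar arguments only and not dyncall's implementation; o32_loc and o32_assign are hypothetical names, and the argument area is modeled as a sequence of 32 bit words, of which the first four shadow $a0-$a3.

```c
#include <stddef.h>

typedef struct {
    char where; /* 'a' = $a<num>, 'f' = $f<num>, 's' = stack, at word index num */
    int  num;
} o32_loc;

/* kinds[i]: 'i' = 32 bit integer or pointer, 'l' = 64 bit integer,
             'f' = float, 'd' = double (hard-float target assumed). */
void o32_assign(const char *kinds, size_t n, o32_loc *out)
{
    int word = 0;    /* index into the argument area, words 0-3 shadow $a0-$a3 */
    int fp_regs = 0; /* float registers handed out so far (at most 2) */
    int only_fp = 1; /* all arguments so far were float or double */
    size_t i;
    for (i = 0; i < n; ++i) {
        int is_fp = (kinds[i] == 'f' || kinds[i] == 'd');
        int words = (kinds[i] == 'l' || kinds[i] == 'd') ? 2 : 1;
        if (words == 2 && (word & 1))
            ++word; /* 8-byte alignment, skips an odd register or stack word */
        if (is_fp && only_fp && fp_regs < 2) {
            out[i].where = 'f';
            out[i].num = 12 + 2 * fp_regs++; /* $f12, then $f14 */
        } else if (word < 4) {
            out[i].where = 'a';
            out[i].num = word; /* 64 bit values also use the following register */
        } else {
            out[i].where = 's';
            out[i].num = word;
        }
        if (!is_fp)
            only_fp = 0;
        word += words; /* float register arguments still consume $a slots */
    }
}
```

For (double, int) the sketch yields $f12 and $a2, since the double consumes the slots of $a0 and $a1, while for (int, float) both arguments end up in $a0 and $a1.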
Stack directly after function prolog:
MIPS64 Calling Conventions
Overview There are two main ABIs in use for MIPS64 chips, N64[40] and N32[40]. Both are basically the same,
except that N32 uses ILP32 as programming model (32-bit pointers and long integers), whereas N64 uses
LP64 (64-bit pointers and long integers). All registers of a MIPS64 chip are considered to be 64-bit wide, even
for the N32 calling convention.
The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology
didn’t change from MIPS32).
Other than that, there are corresponding 64-bit versions of other MIPS32 ABIs, e.g. the EABI[41] and
O64[39].
dyncall support
For MIPS 64-bit machines, dyncall supports the N64 calling conventions for calls and callbacks (for all four combinations of big/little-endian and soft/hard-float targets). The N32 calling convention might work, as it used to, but it hasn’t been tested recently.
MIPS N64 Calling Convention
Register usage
Name | Alias | Brief description |
$0 | $zero | hardware zero |
$1 | $at | assembler temporary, scratch |
$2-$3 | $v0-$v1 | return value (only integers on hard-float targets), scratch |
$4-$11 | $a0-$a7 | first arguments (only integers on hard-float targets), scratch |
$12-$15,$24 | $t4-$t7,$t8 | temporaries, scratch |
$25 | $t9 | temporary, address callee for all PIC calls (by convention), scratch |
$16-$23 | $s0-$s7 | preserve |
$26,$27 | $kt0,$kt1 | reserved for kernel |
$28 | $gp | global pointer, preserve |
$29 | $sp | stack pointer, preserve |
$30 | $s8 | frame pointer, preserve |
$31 | $ra | return address, preserve |
hi, lo | | multiply/divide special registers |
$f0,$f2 | | only on hard-float targets: float return values, scratch |
$f1,$f3,$f4-$f11 | | only on hard-float targets: float temporaries, scratch |
$f12-$f19 | | only on hard-float targets: float arguments, scratch |
$f20-$f23 | | only on hard-float targets: float temporaries, scratch |
$f24-$f31 | | only on hard-float targets: preserved |
- Stack grows down
- Stack parameter order: right-to-left
- Caller cleans up the stack
- generally, first 8 params <= 64-bit are passed via registers
- for hard-float targets: register arguments are passed via $a0-$a7 for integers and $f12-$f19 for floats - with mixed float and int parameters, some registers are left out (e.g. first parameter ends up in $a0 or $f12, second in $a1 or $f13, etc.)
- for soft-float targets: register arguments are passed via $a0-$a7
- subsequent arguments are pushed onto the stack
- all stack entries are 64-bit aligned
- all stack regions are 16-byte aligned
- if the callee takes the address of one of the parameters and uses it to address other unnamed parameters (e.g. varargs) it has to copy - in its prolog - the argument registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
- float arguments passed in the variable part of a vararg call are passed like integers, meaning float registers don’t ever need to be saved that way, so only $a0-$a7 need to be spilled
- quad precision float arguments are passed in even-odd register pairs, skipping one register if needed
- integer parameters < 64 bit are right-justified (meaning they occupy the higher-address bytes) in their 8-byte slot on the stack, requiring extra care on big-endian targets
- single precision float parameters (32 bit) are left-justified in their 8-byte slot on the stack, but are right-justified in fp-registers on big-endian targets, as they aren’t promoted (the official docs actually say "undecided", but real-world implementations seem to use what is described here)
- aggregates (struct, union) are passed as a sequence of dwords (in integer registers and on the stack), with the following particularities:
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer
to a copy of the aggregate
- if a dword happens to be a double precision floating point struct field, it is passed in a floating point register
- array and union fields are always passed like integers (even if their type is float or double)
- splitting an argument across registers and the stack is fine
- results are returned in $v0, and for a second one $v1 is used
- only on hard-float targets: floating point results are returned in $f0 (and $f2 if needed)
- only on hard-float targets: structs with only one or two floating point fields are returned in $f0 (and $f2 if necessary), field-by-field
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in $a0), and callee writes return value to this space; the ptr to the aggregate is returned in $v0
- any other aggregates (struct, union) <= 16 bytes are returned via registers $v0 (and $v1 if necessary), dword-by-dword
- all other aggregates (struct, union) >16 bytes are returned in a space allocated by the caller, with a pointer to it passed as first parameter to the function called (meaning in $a0); the ptr to the aggregate is returned in $v0
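The position-based register assignment for hard-float targets described above (first parameter in $a0 or $f12, second in $a1 or $f13, and so on) can be illustrated with a small sketch. This is not dyncall code: the type `arg_kind` and the function `n64_arg_location` are hypothetical names introduced here, and the sketch assumes a non-vararg call whose arguments each fit into one 64-bit slot (no aggregates, no quad precision floats). Stack offsets are given relative to the first stack argument slot.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument kinds for this sketch (not dyncall types). */
typedef enum { ARG_INT, ARG_FLOAT } arg_kind;

/*
 * Writes the location of the argument at zero-based position `pos` into
 * `buf` and returns `buf`.
 * Assumptions: hard-float target, non-vararg call, every argument fits
 * into a single 64-bit slot.
 */
const char *n64_arg_location(int pos, arg_kind kind, char *buf, size_t len)
{
    assert(pos >= 0 && buf != NULL && len > 0);
    if (pos < 8) {
        /* The position selects the register index; the kind selects the
           register file, so the "other" register at this position is
           left unused. */
        if (kind == ARG_FLOAT)
            snprintf(buf, len, "$f%d", 12 + pos);
        else
            snprintf(buf, len, "$a%d", pos);
    } else {
        /* Ninth and later arguments occupy 64-bit aligned stack slots. */
        snprintf(buf, len, "stack+%d", (pos - 8) * 8);
    }
    return buf;
}
```

For a hypothetical signature `f(double, int, double)`, this sketch yields $f12, $a1 and $f14, leaving $a0, $f13 and $a2 unused.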
Stack directly after function prolog:
MIPS N32 Calling Convention
Despite what one might think given the name, this is a MIPS 64-bit calling convention. As mentioned in the overview of this chapter, it is nearly identical to the N64 one, the differences being:
- uses ILP32 as programming model instead of LP64
- floating point registers $f20-$f23 are to be preserved
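Since the programming model is the main difference between N32 and N64, a program can tell the two apart by looking at type sizes. The following is a minimal sketch under that assumption; `data_model`, `classify_data_model` and `current_data_model` are hypothetical helper names, not part of dyncall.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification used only in this sketch. */
typedef enum { MODEL_ILP32, MODEL_LP64, MODEL_OTHER } data_model;

/* Classifies a programming model from the sizes (in bytes) of int, long
   and pointers. */
data_model classify_data_model(size_t int_size, size_t long_size, size_t ptr_size)
{
    assert(int_size > 0 && long_size > 0 && ptr_size > 0);
    if (int_size == 4 && long_size == 4 && ptr_size == 4)
        return MODEL_ILP32;  /* what N32 uses */
    if (int_size == 4 && long_size == 8 && ptr_size == 8)
        return MODEL_LP64;   /* what N64 uses */
    return MODEL_OTHER;
}

/* Classifies the model of the compiler/target this file is built for. */
data_model current_data_model(void)
{
    return classify_data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

Note that the result only reflects the programming model; registers are 64-bit wide in both cases, as stated in the overview.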
SPARC Calling Conventions
Overview The SPARC family of processors is based on the SPARC instruction set architecture, which comes in
basically three revisions, V7, V8[29][27] and V9[30][28]. The former two are 32-bit whereas the
latter refers to the 64-bit SPARC architecture (see next chapter). SPARC uses big endian byte
order.
The word size is defined to be 32 bits.
dyncall support
SPARC (32-bit) Calling Convention
Register usage
- 32 single precision floating point registers (f0-f31, usable as 8 quad precision q0,q4,q8,...,q28, 16 double precision d0,d2,d4,...,d30)
- 32 32-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
- calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
Name | Alias | Brief description |
%g0 | %r0 | Read-only, hardwired to 0 |
%g1-%g7 | %r1-%r7 | Global |
%o0,%o1 and %i0,%i1 | %r8,%r9 and %r24,%r25 | Output and input argument registers, return value |
%o2-%o5 and %i2-%i5 | %r10-%r13 and %r26-%r29 | Output and input argument registers |
%o6 and %i6 | %r14 and %r30, %sp and %fp | Stack and frame pointer |
%o7 and %i7 | %r15 and %r31 | Return address (caller writes to o7, callee uses i7) |
%l0-%l7 | %r16-%r23 | preserve |
%f0,%f1 | | Floating point return value |
%f2-%f31 | | scratch |
- stack grows down
- stack parameter order: right-to-left
- caller cleans up the stack
- stack always aligned to 8 bytes
- first 6 integers/pointers and floats are passed independently in registers using %o0-%o5
- for every other argument the stack is used
- all arguments <= 32 bit are passed as 32 bit values
- 64 bit arguments are passed like two consecutive <= 32 bit values (which allows for an argument to be split between the stack and %i5)
- aggregates (struct, union) of any size, as well as quad precision values are passed indirectly as a pointer to a copy of the aggregate (like: struct s2 = s; callee(&s2);)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- minimum stack size is 64 bytes, b/c stack pointer must always point at enough space to store all %i* and %l* registers, used when running out of register windows
- if needed, register spill area is adjacent to parameters
- results are expected by caller to be returned in %o0/%o1 (after reg window restore, meaning callee writes to %i0/%i1) for integers
- %f0/%f1 are used for floating point values
- aggregates (struct, union) and quad precision values are returned in a space allocated by the caller, with a pointer to it passed as an additional, hidden stack parameter (always at %sp+64 for the caller, see below); that pointer is returned in %o0
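The word-wise argument passing described above, including a 64-bit argument being split between the last register and the stack, can be sketched as follows. The functions are hypothetical helpers written for illustration, not dyncall code; locations are given from the caller's point of view (%o*), and "stack word" indices are relative to the first stack argument word. Aggregates and quad precision values are not modeled, as they are passed via pointer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Number of 32-bit words an argument of `size_bits` occupies (1 or 2). */
int sparc32_arg_words(int size_bits)
{
    assert(size_bits > 0 && size_bits <= 64);
    return size_bits <= 32 ? 1 : 2;
}

/* Index of the first 32-bit word used by argument `n` (zero-based), given
   the bit sizes of all arguments in left-to-right order. */
int sparc32_first_word(const int *size_bits, int n)
{
    int word = 0;
    for (int i = 0; i < n; ++i)
        word += sparc32_arg_words(size_bits[i]);
    return word;
}

/* Describes where the 32-bit word with index `word` is placed. */
const char *sparc32_word_location(int word, char *buf, size_t len)
{
    assert(word >= 0 && buf != NULL && len > 0);
    if (word < 6)
        snprintf(buf, len, "%%o%d", word);             /* first six words */
    else
        snprintf(buf, len, "stack word %d", word - 6); /* everything else */
    return buf;
}
```

With five 32-bit arguments followed by one 64-bit argument, the 64-bit value starts at word 5, so its first half is placed in %o5 and its second half in the first stack word.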
Stack directly after function prolog:
SPARC64 Calling Conventions
Overview The SPARC family of processors is based on the SPARC instruction set architecture, which comes in
basically three revisions, V7, V8[29][27][31] and V9[30][28][31]. The former two are 32-bit (see previous
chapter) whereas the latter refers to the 64-bit SPARC architecture. SPARC uses big endian byte
order; V9 additionally supports little endian byte order, but only for data access, not for instruction
access.
There are two proposals, one from Sun and one from Hal, which disagree on how to handle some aspects of
this calling convention.
dyncall support
SPARC (64-bit) Calling Convention
- 32 double precision floating point registers (d0,d2,d4,...,d62, usable as 16 quad precision ones q0,q4,q8,...,q60; the first half of them is also usable as 32 single precision registers f0-f31)
- 32 64-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
- calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
- stack and frame pointer are offset by a BIAS of 2047 (see official doc for reasons)
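As a short illustration of the BIAS: the value held in %sp or %fp is 2047 lower than the address it refers to, so the bias has to be added before the memory is accessed. The macro and function names below are hypothetical and only serve this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Stack bias on 64-bit SPARC, as stated in the text above. */
#define SPARC64_STACK_BIAS 2047u

/* Converts a biased %sp/%fp register value into the actual address. */
uint64_t sparc64_unbias(uint64_t reg_value)
{
    return reg_value + SPARC64_STACK_BIAS;
}

/* Converts an actual stack address into the biased register value. */
uint64_t sparc64_bias(uint64_t address)
{
    assert(address >= SPARC64_STACK_BIAS);
    return address - SPARC64_STACK_BIAS;
}
```

A consequence worth keeping in mind when debugging is that the register value itself looks misaligned; only the unbiased address satisfies the 16-byte frame alignment.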
Name | Alias | Brief description |
%g0 | %r0 | Read-only, hardwired to 0 |
%g1-%g7 | %r1-%r7 | Global |
%o0-%o3 and %i0-%i3 | %r8-%r11 and %r24-%r27 | Output and input argument registers, return value |
%o4,%o5 and %i4,%i5 | %r12,%r13 and %r28,%r29 | Output and input argument registers |
%o6 and %i6 | %r14 and %r30, %sp and %fp | Stack and frame pointer (NOTE, offset with a BIAS of 2047) |
%o7 and %i7 | %r15 and %r31 | Return address (caller writes to o7, callee uses i7) |
%l0-%l7 | %r16-%r23 | preserve |
%d0,%d2,%d4,%d6 | | scratch, Floating point arguments, return value |
%d8,%d10,...,%d14 | | scratch, Floating point arguments |
%d16,%d18,...,%d30 | | scratch (preserve for Hal), Floating point arguments |
%d32,%d34,...,%d62 | | scratch (preserve for Hal) |
- stack grows down
- stack parameter order: right-to-left
- caller cleans up the stack
- stack frame is always aligned to 16 bytes
- first 6 integers are passed in registers using %o0-%o5
- first 8 quad precision floating point args (or 16 double precision, or 32 single precision) are passed in floating point registers (%q0,%q4,...,%q28 or %d0,%d2,...,%d30 or %f0-%f31, respectively)
- for every other argument the stack is used
- single precision floating point args are passed in odd %f* registers, and are ”right aligned” in their 8-byte space on the stack
- for every argument passed, corresponding %o*, %f* register or stack space is skipped (e.g. passing a double as 3rd call argument, %d4 is used and %o2 is skipped)
- all arguments <= 64 bit are passed as 64 bit values
- minimum stack size is 128 bytes, b/c stack pointer must always point at enough space to store all %i* and %l* registers, used when running out of register windows
- if needed, register spill area (both integer and float arguments are spilled in order) is adjacent to parameters
- structs with only one field are passed as if the param were the field itself
- structs <= 16 bytes (which have more than one field) are passed field-by-field, however evaluated as a
sequence of 8-byte parameter slots
- note that due to aggregate alignment rules, any floating point value is either the entire slot (for double precision) or exactly one half
- fields are left justified in register or stack slots
- integers in a slot are passed as such (either via %o* registers or the stack)
- single precision floats (using half of the slot) use even numbered %f* registers when they occupy the left half, odd numbered ones otherwise (no register skipping logic applied within a slot)
- splitting struct fields between registers and stack is allowed
- unions <= 16 bytes passed by-value are passed like integers in left-justified 8-byte slots (either via %o* registers or the stack)
- aggregates (struct, union) and types > 16 bytes are passed indirectly, as a pointer to a correctly aligned copy of the data (that copy can be avoided under certain conditions)
- non-trivial C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
- results are expected by caller to be returned in %o0-%o3 (after reg window restore, meaning callee writes to %i0-%i3) for integers
- %d0,%d2,%d4,%d6 are used for floating point values
- for non-trivial C++ aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in %o0), and callee writes return value to this space; the ptr to the aggregate is returned in the same register (after reg window restore)
- the fields of aggregates (struct, union) <= 32 bytes are returned via registers mentioned above (which are assigned following the same logic as when passing the aggregate as a first argument to a function)
- aggregates (struct, union) >32 bytes are returned in a space allocated by the caller, with a pointer to it passed as first parameter to the function called (meaning in %o0)
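The slot-based register assignment for scalar arguments described above can be sketched as follows. This is an illustration only, not dyncall code: `s64_kind` and `sparc64_slot_location` are hypothetical names, the sketch assumes a non-vararg call, and it covers only integers, doubles and single precision floats (no quad precision values or aggregates). Each argument occupies one 8-byte slot, and the slot index selects the register in whichever register file applies.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument kinds for this sketch (not dyncall types). */
typedef enum { S64_INT, S64_DOUBLE, S64_FLOAT } s64_kind;

/*
 * Writes the caller-side location of the argument in zero-based slot
 * `slot` into `buf` and returns `buf`.
 */
const char *sparc64_slot_location(int slot, s64_kind kind, char *buf, size_t len)
{
    assert(slot >= 0 && buf != NULL && len > 0);
    if (kind == S64_INT && slot < 6)
        snprintf(buf, len, "%%o%d", slot);         /* %o0-%o5 */
    else if (kind == S64_DOUBLE && slot < 16)
        snprintf(buf, len, "%%d%d", 2 * slot);     /* %d0,%d2,...,%d30 */
    else if (kind == S64_FLOAT && slot < 16)
        snprintf(buf, len, "%%f%d", 2 * slot + 1); /* odd %f registers */
    else
        snprintf(buf, len, "stack slot %d", slot); /* no register left */
    return buf;
}
```

This reproduces the example from the list above: a double passed as 3rd argument (slot 2) ends up in %d4, while %o2 stays unused.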
Stack directly after function prolog: