diff doc/manual/callconvs/callconv_arm32.tex @ 481:0fc22b5feac7

- arm related doc addition about aggregates
author Tassilo Philipp
date Wed, 02 Mar 2022 17:30:51 +0100
parents b47168dacba6
children d160046da104
--- a/doc/manual/callconvs/callconv_arm32.tex	Tue Mar 01 21:02:10 2022 +0100
+++ b/doc/manual/callconvs/callconv_arm32.tex	Wed Mar 02 17:30:51 2022 +0100
@@ -1,6 +1,6 @@
 %//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>,
+% Copyright (c) 2007-2022 Daniel Adler <dadler@uni-goettingen.de>,
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -17,9 +17,6 @@
 %
 %//////////////////////////////////////////////////////////////////////////////
 
-% ==================================================
-% ARM32
-% ==================================================
 \subsection{ARM32 Calling Conventions}
 
 \paragraph{Overview}
@@ -91,16 +88,18 @@
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
 \item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn't seem to be specified in the ATPCS)
-\item structures and unions are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words
-\item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc... (see {\bf return values})
+\item aggregates (struct, union) are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words (splitting across registers and stack is allowed)
 \item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
 \end{itemize}
 
 \paragraph{Return values}
+
 \begin{itemize}
 \item return values \textless=\ 32 bits use r0
 \item 64 bit return values use r0 and r1
-\item if return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
+\item aggregates (struct, union) \textless=\ 32 bits are returned like an integer (in r0)
+\item for aggregates (struct, union) \textgreater\ 32 bits, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
+\item for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
 \end{itemize}
 
 \paragraph{Stack layout}
@@ -180,17 +179,19 @@
 \item subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
-\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack), although this doesn't seem to be specified in the ATPCS)
-\item structures and unions are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words
-\item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc. (see {\bf return values})
+\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn't seem to be specified in the ATPCS)
+\item aggregates (struct, union) are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words (splitting across registers and stack is allowed)
 \item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
 \end{itemize}
 
 \paragraph{Return values}
+
 \begin{itemize}
 \item return values \textless=\ 32 bits use r0
 \item 64 bit return values use r0 and r1
-\item if return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
+\item aggregates (struct, union) \textless=\ 32 bits are returned like an integer (in r0)
+\item for aggregates (struct, union) \textgreater\ 32 bits, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
+\item for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param (meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
 \end{itemize}
 
 \paragraph{Stack layout}
@@ -379,18 +380,23 @@
 \item float and double vararg function parameters (no matter if in ellipsis part of function, or not) are passed like int or long long parameters, vfp registers aren't used
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words (for first 4 integer arguments) to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
-\item structures and unions are passed by value (after rounding up the size to the nearest multiple of 4), as a sequence of words
-\item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc. (see {\bf return values})
+\item aggregates (struct, union) with 1 to 4 identical floating-point members (either float or double) are passed field-by-field, except if passed as a vararg
+\item aggregates that could be passed via floating point registers are never split across those and the stack, so if not enough registers are available an aggregate is
+passed entirely via the stack (which, by the rule above, means that any still unused float registers are skipped for all subsequent args)
+\item all other aggregates (struct, union), after rounding up the size to the nearest multiple of 4, are passed as a sequence of 32 bit words, like integers (splitting across registers and stack is allowed)
 \item callee spills, caller reserves spill area space, though
 \end{itemize}
 
 \paragraph{Return values}
+
 \begin{itemize}
 \item non floating point return values \textless=\ 32 bits use r0
 \item non floating point 64-bit return values use r0 and r1
-\item single precision floating point return value uses s0
-\item double precision floating point return value uses d0
-\item if return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
+\item floating point return value uses s0 (for float) or d0 (for double), respectively
+\item aggregates (struct, union) with 1 to 4 identical floating-point members are returned in s0-s3 (for float) or d0-d3 (for double), respectively
+\item all other aggregates \textless=\ 32 bits are returned via r0
+\item for all other aggregates, the caller allocates space, passes pointer to it to the callee as a hidden first param
+(meaning in r0), and callee writes return value to this space; the ptr to the aggregate is returned in r0
 \end{itemize}
 
 \paragraph{Stack layout}