view doc/manual/callconvs/callconv_arm64.tex @ 499:fc614cb865c6

- doc and disasexample additions specific to non-trivial C++ aggregates as return values (incl. fixes to doc and additional LSB specific PPC32 section)
author Tassilo Philipp
date Mon, 04 Apr 2022 15:50:52 +0200
parents 0fc22b5feac7

%//////////////////////////////////////////////////////////////////////////////
%
% Copyright (c) 2014-2022 Daniel Adler <dadler@uni-goettingen.de>, 
%                         Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%
%//////////////////////////////////////////////////////////////////////////////

\subsection{ARM64 Calling Conventions}

\paragraph{Overview}

ARMv8 introduced the AArch64 calling convention. ARM64 chips can run in 64-bit or 32-bit mode, but a single process cannot mix both; interworking is only possible between processes (inter-process).\\
The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology didn't change from ARM32).\\
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture \cite{AAPCS64}.\\

\paragraph{\product{dyncall} support}

The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, as well as Apple's and Microsoft's conventions which are derived from it, for both calls and callbacks.

\subsubsection{AAPCS64 Calling Convention}

\paragraph{Registers and register usage}

ARM64 features thirty-one 64 bit general purpose registers, namely {\bf r0-r30},
which are referred to as either {\bf x0-x30} for 64bit access, or {\bf w0-w30}
for 32bit access (with upper bits either cleared or sign extended on load).\\
Also, there is {\bf sp/xzr/wzr}, a register with restricted use: it acts as the
stack pointer ({\bf sp}) in instructions dealing with the stack, and as a hardware
zero register ({\bf xzr/wzr}) in all other instructions. Then there is {\bf pc}, the
program counter. Additionally, there are thirty-two 128 bit registers {\bf v0-v31},
to be used as SIMD and floating point registers, referred to as {\bf q0-q31}, {\bf d0-d31}
and {\bf s0-s31} for 128, 64 and 32 bit access, respectively (in contrast to AArch32, those do not overlap multiple
narrower registers). The registers are used as follows:\\

\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name          & Brief description\\
\hline        
{\bf x0-x7}   & parameters, scratch, return value\\
{\bf x8}      & indirect result location pointer\\
{\bf x9-x15}  & scratch\\
{\bf x16}     & permanent in some cases, can have special function (IP0), see doc\\
{\bf x17}     & permanent in some cases, can have special function (IP1), see doc\\
{\bf x18}     & reserved as platform register, advised not to be used for handwritten, portable asm, see doc \\
{\bf x19-x28} & permanent\\
{\bf x29}     & permanent, frame pointer\\
{\bf x30}     & permanent, link register\\
{\bf sp}      & permanent, stack pointer\\
{\bf pc}      & program counter\\
{\bf v0-v7}   & scratch, float parameters, return value\\
{\bf v8-v15}  & lower 64 bits are permanent, upper 64 bits are scratch\\
{\bf v16-v31} & scratch\\
{\bf xzr}     & zero register, always zero\\
\end{tabular*}
\caption{Register usage on arm64}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 8 integer arguments are passed using x0-x7
\item first 8 floating point arguments are passed using d0-d7
\item subsequent parameters are pushed onto the stack
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy, in its prolog, the first 8 integer
and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
\item aggregates (struct, union) with 1 to 4 identical floating-point members (either float or double) are passed field-by-field (8-byte aligned if passed via stack), except if passed as a vararg
\item other aggregates (struct, union) \textgreater\ 16 bytes in size are passed indirectly, as a pointer to a copy (if needed)
\item {\it non-trivial} C++ aggregates (as defined by the language) of any size, are passed indirectly via a pointer to a copy of the aggregate
\item all other aggregates (struct, union), after rounding up the size to the nearest multiple of 8, are passed as a sequence of dwords, like integers
\item aggregates are never split across registers and stack, so if not enough registers are available, an aggregate is passed via the stack (for aggregates that
would've been passed as floating point values, any still unused float registers will be skipped for any subsequent arg)
\item stack pointer is required throughout to be 16-byte aligned (stack parameters themselves are aligned to at least 8 bytes)
\end{itemize}
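The aggregate rules above can be read as a small decision procedure. The following C sketch only restates the list, it is not taken from the \product{dyncall} sources; all type and function names in it are made up for illustration, and the caller is assumed to already know the aggregate's size and whether it consists of 1 to 4 identical floating-point members.

```c
#include <assert.h>
#include <stddef.h>

/* How an aggregate argument is passed (hypothetical names, for illustration). */
typedef enum {
    PASS_FP_REGS,    /* 1 to 4 identical float/double members, passed field-by-field */
    PASS_INDIRECT,   /* pointer to a copy of the aggregate */
    PASS_INT_DWORDS  /* size rounded up to a multiple of 8, passed as dwords */
} pass_kind;

/* Hypothetical description of an aggregate; a real implementation would
   derive these fields from the type's layout. */
typedef struct {
    size_t size;          /* size of the aggregate in bytes */
    int    fp_members;    /* 1..4 if all members are the same fp type, else 0 */
    int    is_nontrivial; /* non-trivial C++ aggregate */
    int    is_vararg;     /* passed as a vararg */
} aggr_info;

/* Applies the rules in the order they are listed above. */
static pass_kind classify_aggr_arg(const aggr_info *a)
{
    assert(a != NULL && a->fp_members >= 0 && a->fp_members <= 4);
    if (a->is_nontrivial)
        return PASS_INDIRECT;           /* any size */
    if (a->fp_members > 0 && !a->is_vararg)
        return PASS_FP_REGS;
    if (a->size > 16)
        return PASS_INDIRECT;
    return PASS_INT_DWORDS;
}

/* Number of dwords (8 bytes each) a PASS_INT_DWORDS aggregate occupies. */
static size_t aggr_dwords(size_t size)
{
    return (size + 7) / 8;
}
```

For example, a 12 byte struct of three ints is classified as a sequence of two dwords, while a 24 byte struct of integers is passed indirectly. The sketch does not model register exhaustion, where the whole aggregate moves to the stack.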

\paragraph{Return values}

\begin{itemize}
\item integer return values use x0
\item floating-point return values use d0
\item for {\it non-trivial} C++ aggregates, the caller allocates space, passes pointer to it to the callee via x8, and callee writes return value to this space; the ptr to the aggregate is returned in x0
\item aggregates (struct, union) that would be passed via registers if passed as a first param, are returned via those registers
\item for aggregates not returnable via registers (e.g. if regs exhausted, or \textgreater\ 16 bytes in size, ...), the caller allocates space, passes pointer to it to the callee through
x8, and callee writes return value to this space (note that this is not a hidden first param, as x8 is not used for passing params); the ptr to the aggregate is returned in x0
\end{itemize}
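As an illustration of the return value rules above, the following C sketch declares three aggregates and notes in comments where each one would be returned according to the list. The struct and function names are hypothetical, and the register assignments in the comments are what the rules imply for a non-inlined call on an AAPCS64 target; they are not verified compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* 8 bytes of integer members: returned in x0. */
struct small_ints { int32_t a, b; };

/* 2 identical floating-point members: returned in d0 and d1. */
struct two_doubles { double x, y; };

/* 24 bytes, so larger than 16: the caller allocates the space and passes
   its address in x8, the callee writes the result there. */
struct three_dwords { int64_t v[3]; };

static struct small_ints make_small(void)
{
    struct small_ints r = { 1, 2 };
    return r;
}

static struct two_doubles make_doubles(void)
{
    struct two_doubles r = { 0.5, 1.5 };
    return r;
}

static struct three_dwords make_dwords(void)
{
    struct three_dwords r = { { 1, 2, 3 } };
    return r;
}

/* Calls all three functions and sums up every returned member. */
static double sum_results(void)
{
    struct small_ints s = make_small();
    struct two_doubles d = make_doubles();
    struct three_dwords t = make_dwords();
    assert(sizeof t > 16); /* precondition for the indirect return above */
    return s.a + s.b + d.x + d.y + (double)(t.v[0] + t.v[1] + t.v[2]);
}
```

Disassembling such functions on the target platform is a simple way to check the comments against a given compiler.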

\paragraph{Stack layout}

% verified/amended: TP nov 2019 (see also doc/disas_examples/arm64.aapcs.disas)
Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
                                         & \vdots                 &                                      &                              \\
\hhline{~=~~}                                                                            
register save area                       & \hspace{4cm}           &                                      & \mrrbrace{5}{caller's frame} \\
\hhline{~-~~}                                                                            
local data                               &                        &                                      &                              \\
\hhline{~-~~}                                                                             
\mrlbrace{9}{parameter area}             & arg n-1                & \mrrbrace{3}{stack parameters}       &                              \\
                                         & \ldots                 &                                      &                              \\
                                         & arg 8                  &                                      &                              \\
\hhline{~=~~}                                     
                                         & x7                     & \mrrbrace{6}{spill area (if needed)} & \mrrbrace{9}{current frame}  \\
                                         & \ldots                 &                                      &                              \\
                                         & x? (first unnamed reg) &                                      &                              \\
                                         & q7                     &                                      &                              \\
                                         & \ldots                 &                                      &                              \\
                                         & q0                     &                                      &                              \\
\hhline{~-~~}                                                                             
register save area (with return address) &                        &                                      &                              \\ % fp will point here (to 1st arg)
\hhline{~-~~}                                                                             
local data                               &                        &                                      &                              \\
\hhline{~-~~}                                                                             
parameter area                           & \vdots                 &                                      &                              \\
\end{tabular}
\caption{Stack layout on arm64}
\end{figure}

\clearpage


\subsubsection{Apple's ARM64 Function Calling Convention}

\paragraph{Overview}

Apple's ARM64 calling convention is based on the AAPCS64 standard, but diverges from it in some ways.
Only the differences are listed here; for more details, take a look at Apple's official documentation \cite{AppleARM64}.

\begin{itemize}
\item arguments passed via stack use only the space they need, but are subject to type alignment requirements (which is 1 byte for char and bool, 2 for short, 4 for int and 8 for every other type)
\item caller is required to sign- or zero-extend arguments smaller than 32 bits to 32 bits
\item empty aggregates (allowed in C++, non-standard in C, although compiler extensions exist) as parameters:
\begin{itemize}
\item allowed to be ignored in C
\item allowed to be ignored in C++, if aggregate is trivial, otherwise it's treated as an aggregate with one byte field
\end{itemize}
\end{itemize}
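The first difference, stack arguments using only the space they need, can be sketched by computing stack offsets for a few arguments under both schemes. This is a simplified model with hypothetical function names: it assumes the plain AAPCS64 scheme gives every stack argument its own slot of at least 8 bytes, and that the Apple scheme uses each type's own size and alignment.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds offset up to the next multiple of align (a power of two). */
static size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Places one stack argument of the given size and alignment.
   Returns the argument's offset and advances *next past it.
   apple != 0: natural size and alignment (Apple's convention)
   apple == 0: slots of at least 8 bytes (assumed AAPCS64 behaviour) */
static size_t place_stack_arg(size_t *next, size_t size, size_t align, int apple)
{
    size_t slot_align = apple ? align : (align > 8 ? align : 8);
    size_t slot_size  = apple ? size  : align_up(size, 8);
    size_t off = align_up(*next, slot_align);
    *next = off + slot_size;
    return off;
}
```

With this model, a char, a short and an int passed via the stack land at offsets 0, 2 and 4 under Apple's convention (8 bytes in total), and at offsets 0, 8 and 16 with 8 byte slots (24 bytes in total).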


\subsubsection{Microsoft's ARM64 Function Calling Convention}

\paragraph{Overview}

Microsoft's ARM64 calling convention is based on the AAPCS64 standard, but diverges from it for variadic functions.
Only the differences are listed here; for more details, take a look at Microsoft's official documentation \cite{MicrosoftARM64}.

\begin{itemize}
\item variadic function calls do not use any SIMD or floating point registers (for fixed and variable args), meaning first 8 params are passed via x0-x7, the rest via the stack
\item a function that returns an aggregate indirectly via a pointer passed in x8 does not seem to be required to put that address in x0 on return (but it should be safe to do so)
\end{itemize}
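The effect of the variadic rule on argument placement can be sketched with a small allocator that hands out the next free location per argument. This is a simplified model restricted to scalar arguments, not code from \product{dyncall}; all names are hypothetical, and the counters merely stand for the next free integer register, floating point register and stack slot.

```c
#include <assert.h>
#include <stddef.h>

/* Where an argument ends up: x register, v register or stack slot. */
typedef enum { LOC_X, LOC_V, LOC_STACK } loc_bank;
typedef struct { loc_bank bank; int index; } arg_loc;

/* Next free integer register, fp register and stack slot. */
typedef struct { int next_x, next_v, next_stack; } alloc_state;

/* Assigns a location to the next scalar argument.
   is_float:    argument is a float or double
   ms_variadic: call targets a variadic function under Microsoft's
                convention, so no SIMD/fp registers are used at all */
static arg_loc next_arg(alloc_state *s, int is_float, int ms_variadic)
{
    arg_loc l;
    assert(s != NULL);
    if (is_float && !ms_variadic && s->next_v < 8) {
        l.bank = LOC_V;
        l.index = s->next_v++;
    } else if ((!is_float || ms_variadic) && s->next_x < 8) {
        l.bank = LOC_X;
        l.index = s->next_x++;
    } else {
        l.bank = LOC_STACK;
        l.index = s->next_stack++;
    }
    return l;
}
```

For a call with the arguments (int, double, double), the model yields x0, x1 and x2 for a variadic function under Microsoft's convention, but x0, v0 and v1 for a non-variadic one.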