view doc/manual/callconvs/callconv_arm64.tex @ 328:276eb8c87aa0

- review and fixes, cleanup, amendments to calling convention appendix of manual
author Tassilo Philipp
date Fri, 22 Nov 2019 23:11:56 +0100
parents 4a64b733dc76
children bac52ab8869f
line wrap: on
line source

%//////////////////////////////////////////////////////////////////////////////
%
% Copyright (c) 2014-2019 Daniel Adler <dadler@uni-goettingen.de>, 
%                         Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%
%//////////////////////////////////////////////////////////////////////////////

% ==================================================
% ARM64
% ==================================================
\subsection{ARM64 Calling Conventions}

\paragraph{Overview}

ARMv8 introduced the AArch64 calling convention. ARM64 chips can be run in 64 or 32bit mode, but not by the same process. Interworking is only intra-process.\\
The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology didn't change from ARM32).\\
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture \cite{AAPCS64}.\\

\paragraph{\product{dyncall} support}

The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, as well as Apple's convention derived from it, for calls and callbacks.

\subsubsection{AAPCS64 Calling Convention}

\paragraph{Registers and register usage}

ARM64 features thirty-one 64 bit general purpose registers, namely {\bf r0-r30},
which are referred to as either {\bf x0-x30} for 64bit access, or {\bf w0-w30}
for 32bit access (with upper bits either cleared or sign extended on load).\\
Also, there is {\bf sp/xzr/wzr}, a register with restricted use, used for the
stack pointer in instructions dealing with the stack ({\bf sp}) or a hardware
zero register for all other instructions {\bf xzr/wzr}, and {\bf pc}, the
program counter. Additionally, there are thirty-two 128 bit registers {\bf v0-v31},
to be used as SIMD and floating point registers, referred to as {\bf q0-q31}, {\bf d0-d31}
and {\bf s0-s31}, respectively, depending on their use:\\

\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name          & Brief description\\
\hline        
{\bf x0-x7}   & parameters, scratch, return value\\
{\bf x8}      & indirect result location pointer\\
{\bf x9-x15}  & scratch\\
{\bf x16}     & permanent in some cases, can have special function (IP0), see doc\\
{\bf x17}     & permanent in some cases, can have special function (IP1), see doc\\
{\bf x18}     & reserved as platform register, advised not to be used for handwritten, portable asm, see doc \\
{\bf x19-x28} & permanent\\
{\bf x29}     & permanent, frame pointer\\
{\bf x30}     & permanent, link register\\
{\bf sp}      & permanent, stack pointer\\
{\bf pc}      & program counter\\
\end{tabular*}
\caption{Register usage on arm64}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 8 integer arguments are passed using x0-x7
\item first 8 floating point arguments are passed using d0-d7
\item subsequent parameters are pushed onto the stack
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
\item structures and unions are passed by value, with the first four words of the parameters in r0-r3
\item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc... (see {\bf return values})
\item stack is required to be throughout eight-byte aligned
\end{itemize}

\paragraph{Return values}
\begin{itemize}
\item integer return values use x0
\item floating-point return values use d0
\item otherwise, the caller allocates space, passes pointer to it to the callee through x8, and callee writes return value to this space
\end{itemize}

\paragraph{Stack layout}

% verified/amended: TP nov 2019 (see also doc/disas_examples/arm64.aapcs.disas)
Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
                                         & \vdots                 &                                      &                              \\
\hhline{~=~~}                                                                            
register save area                       & \hspace{4cm}           &                                      & \mrrbrace{5}{caller's frame} \\
\hhline{~-~~}                                                                            
local data                               &                        &                                      &                              \\
\hhline{~-~~}                                                                             
\mrlbrace{9}{parameter area}             & arg n-1                & \mrrbrace{3}{stack parameters}       &                              \\
                                         & \ldots                 &                                      &                              \\
                                         & arg 8                  &                                      &                              \\
\hhline{~=~~}                                     
                                         & x7                     & \mrrbrace{6}{spill area (if needed)} & \mrrbrace{9}{current frame}  \\
                                         & \ldots                 &                                      &                              \\
                                         & x? (first unnamed reg) &                                      &                              \\
                                         & q7                     &                                      &                              \\
                                         & \ldots                 &                                      &                              \\
                                         & q0                     &                                      &                              \\
\hhline{~-~~}                                                                             
register save area (with return address) &                        &                                      &                              \\ % fp will point here (to 1st arg) @@@ verify
\hhline{~-~~}                                                                             
local data                               &                        &                                      &                              \\
\hhline{~-~~}                                                                             
parameter area                           & \vdots                 &                                      &                              \\
\end{tabular}
\caption{Stack layout on arm64}
\end{figure}

\newpage


\subsubsection{Apple's ARM64 Function Calling Conventions}

\paragraph{Overview}

Apple's ARM64 calling convention is based on the AAPCS64 standard, however, diverges in some ways.
Only the differences are listed here, for more details, take a look at Apple's official documentation \cite{AppleARM64}.

\begin{itemize}
\item arguments passed via stack use only the space they need, but are subject to the type alignment requirements (which is 1 byte for char and bool, 2 for short, 4 for int and 8 for every other type)
\item caller is required to sign and zero-extend arguments smaller than 32bits
\end{itemize}