%
% Copyright (c) 2014,2015 Daniel Adler <dadler@uni-goettingen.de>,
%                         Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%

% ==================================================
% ARM64
% ==================================================
\subsection{ARM64 Calling Convention}

\paragraph{Overview}

ARMv8 introduced the AArch64 calling convention. ARM64 chips can run in 64 bit or 32 bit mode, but a single process cannot mix both modes; interworking is only possible between processes.\\
The word size is defined to be 32 bits, a dword 64 bits. Note that this is for historical reasons (the terminology
didn't change from ARM32).\\
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture \cite{AAPCS64}.


\paragraph{\product{dyncall} support}

The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, for both calls and callbacks.

\subsubsection{AAPCS64 Calling Convention}

\paragraph{Registers and register usage}

ARM64 features thirty-one 64 bit general purpose registers, namely x0-x30. In addition, there is SP, a register with restricted use that serves as the stack pointer, and PC, dedicated as the program counter. Furthermore, there are thirty-two 128 bit registers v0-v31, used as SIMD and floating point registers, and referred to as q0-q31 (128 bit), d0-d31 (64 bit) or s0-s31 (32 bit), depending on the width they are accessed with:\\

\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name                & Brief description\\
\hline
{\bf x0-x7}         & parameters, scratch, return value\\
{\bf x8}            & indirect result location pointer\\
{\bf x9-x15}        & scratch\\
{\bf x16}           & permanent in some cases, can have special function (IP0), see doc\\
{\bf x17}           & permanent in some cases, can have special function (IP1), see doc\\
{\bf x18}           & reserved as platform register, should not be used in handwritten, portable asm, see doc\\
{\bf x19-x28}       & permanent\\
{\bf x29}           & permanent, frame pointer\\
{\bf x30}           & permanent, link register\\
{\bf SP}            & permanent, stack pointer\\
{\bf PC}            & program counter\\
\hline
\end{tabular}
\caption{Register usage on arm64}
\end{table}
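
The table above can be read as a lookup from general purpose register number to role. The following is a minimal C sketch of that lookup; the names \texttt{RegRole} and \texttt{x\_register\_role} are hypothetical and not part of \product{dyncall}, and the special cases noted for x16-x18 are collapsed into a single role each.

```c
#include <assert.h>

/* Hypothetical role names for this sketch, one per row group of the table. */
typedef enum {
    REG_PARAM_RETURN,     /* x0-x7:   parameters, scratch, return value */
    REG_INDIRECT_RESULT,  /* x8:      indirect result location pointer  */
    REG_SCRATCH,          /* x9-x15:  scratch                           */
    REG_IP,               /* x16-x17: IP0/IP1, permanent in some cases  */
    REG_PLATFORM,         /* x18:     reserved as platform register     */
    REG_PERMANENT,        /* x19-x28: permanent                         */
    REG_FRAME_POINTER,    /* x29:     permanent, frame pointer          */
    REG_LINK              /* x30:     permanent, link register          */
} RegRole;

/* Maps the number n of register xn (0..30) to its role in the table. */
static RegRole x_register_role(int n)
{
    assert(n >= 0 && n <= 30);
    if (n <= 7)  return REG_PARAM_RETURN;
    if (n == 8)  return REG_INDIRECT_RESULT;
    if (n <= 15) return REG_SCRATCH;
    if (n <= 17) return REG_IP;
    if (n == 18) return REG_PLATFORM;
    if (n <= 28) return REG_PERMANENT;
    if (n == 29) return REG_FRAME_POINTER;
    return REG_LINK;
}
```

For example, \texttt{x\_register\_role(19)} yields \texttt{REG\_PERMANENT}, meaning a callee using x19 has to restore it before returning.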

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 8 integer arguments are passed using x0-x7
\item first 8 floating point arguments are passed using d0-d7
\item subsequent parameters are pushed onto the stack
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy - in its prolog - the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed parameters require saving, though)
\item structures and unions of up to 16 bytes are passed by value in registers, if enough registers are left (general purpose registers, or floating point registers for aggregates made up of up to four floating point members of the same type); larger ones are copied to memory by the caller and passed as a pointer to that copy
\item if the return value is a structure that doesn't fit into registers, a pointer to the return value's space is passed in x8; the parameters are not shifted and still start at x0 (see {\bf return values})
\item stack is required to be sixteen-byte aligned throughout
\end{itemize}
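
The register and stack assignment for plain integer and floating point arguments can be sketched in C as follows. This is a simplified illustration and not \product{dyncall}'s implementation: all type and function names are hypothetical, structures and varargs are left out, and every stack argument is assumed to occupy one 8 byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical argument classes for this sketch (not dyncall types). */
typedef enum { ARG_INT, ARG_FLOAT } ArgClass;

/* Where an argument ends up: x0-x7, d0-d7, or the stack. */
typedef enum { LOC_X, LOC_D, LOC_STACK } LocKind;

typedef struct {
    LocKind kind;
    int     index;  /* register number for LOC_X/LOC_D, byte offset for LOC_STACK */
} ArgLoc;

/* Assigns a location to each of the n arguments, in declaration order.
   Assumption: each stack argument takes exactly one 8 byte slot. */
static void assign_args(const ArgClass *args, size_t n, ArgLoc *out)
{
    int next_x = 0, next_d = 0, stack_off = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        if (args[i] == ARG_INT && next_x < 8) {
            out[i].kind  = LOC_X;       /* first 8 integer arguments: x0-x7 */
            out[i].index = next_x++;
        } else if (args[i] == ARG_FLOAT && next_d < 8) {
            out[i].kind  = LOC_D;       /* first 8 floating point arguments: d0-d7 */
            out[i].index = next_d++;
        } else {
            out[i].kind  = LOC_STACK;   /* everything else goes to the stack */
            out[i].index = stack_off;
            stack_off   += 8;
        }
    }
}
```

With nine integer arguments followed by one floating point argument, the first eight integers land in x0-x7, the ninth at stack offset 0, and the floating point argument still goes to d0, since both register classes are counted independently.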

\paragraph{Return values}
\begin{itemize}
\item integer return values use x0
\item floating-point return values use d0
\item otherwise, the caller allocates space, passes a pointer to it to the callee through x8, and the callee writes the return value to this space
\end{itemize}
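
As an illustration of the three cases above, consider the following C functions. The comments describe where a compiler targeting AAPCS64 would be expected to place each result; the function and type names are made up for this example, and the 32 byte structure is assumed to be large enough to take the indirect route.

```c
#include <assert.h>

/* 32 bytes of payload: assumed too large to be returned in registers. */
struct Big { long long v[4]; };

/* Integer result: expected in x0. */
static long long ret_int(void) { return 42; }

/* Floating point result: expected in d0. */
static double ret_double(void) { return 1.5; }

/* Indirect result: the caller reserves sizeof(struct Big) bytes and
   passes their address in x8; the callee writes the result there. */
static struct Big ret_big(void)
{
    struct Big b = { { 1, 2, 3, 4 } };
    return b;
}
```

The C source looks the same in all three cases; the difference only shows up in the generated code, which is what a dynamic call library has to reproduce by hand.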

\paragraph{Stack layout}

Stack directly after function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
                                  & \vdots          &                                       &                               \\
\hhline{~=~~}
register save area                &                 &                                       & \mrrbrace{5}{caller's frame}  \\
\hhline{~-~~}
local data                        &                 &                                       &                               \\
\hhline{~-~~}
\mrlbrace{13}{parameter area}     & \ldots          & \mrrbrace{3}{stack parameters}        &                               \\
                                  & \ldots          &                                       &                               \\
                                  & \ldots          &                                       &                               \\
\hhline{~=~~}
                                  & x0              & \mrrbrace{10}{spill area (if needed)} & \mrrbrace{15}{current frame}  \\
                                  & x1              &                                       &                               \\
                                  & x2              &                                       &                               \\
                                  & \ldots          &                                       &                               \\
                                  & x7              &                                       &                               \\
                                  & d0              &                                       &                               \\
                                  & d1              &                                       &                               \\
                                  & d2              &                                       &                               \\
                                  & \ldots          &                                       &                               \\
                                  & d7              &                                       &                               \\
\hhline{~-~~}
register save area                &                 &                                       &                               \\
\hhline{~-~~}
local data                        &                 &                                       &                               \\
\hhline{~-~~}
link and frame register           & x30             &                                       &                               \\
                                  & x29             &                                       &                               \\
\hhline{~-~~}
parameter area                    & \vdots          &                                       &                               \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on arm64}
\end{figure}

\newpage


\subsubsection{Apple's ARM64 Function Calling Conventions}

\paragraph{Overview}

Apple's ARM64 calling convention is based on the AAPCS64 standard, but diverges from it in some ways.
Only the differences are listed here; for more details, take a look at Apple's official documentation \cite{AppleARM64}.

\begin{itemize}
\item arguments passed via the stack use only the space they need, but are subject to the type's alignment requirements (1 byte for char and bool, 2 for short, 4 for int and 8 for every other type)
\item caller is required to sign- or zero-extend arguments smaller than 32 bits
\end{itemize}
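
The effect of the first difference on the stack layout can be sketched in C by computing argument offsets under both rules. This is a simplified model with hypothetical helper names; the AAPCS64 variant assumes that every argument of up to 8 bytes occupies a full 8 byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds off up to the next multiple of align (align must be a power of two). */
static size_t align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/* Apple: each stack argument uses only its own size, placed at the next
   offset satisfying its type alignment. */
static void apple_stack_offsets(const size_t *sizes, const size_t *aligns,
                                size_t n, size_t *offs)
{
    size_t off = 0, i;
    for (i = 0; i < n; ++i) {
        off     = align_up(off, aligns[i]);
        offs[i] = off;
        off    += sizes[i];
    }
}

/* AAPCS64, for comparison (assumption: each argument's size is rounded
   up to a multiple of 8 bytes). */
static void aapcs64_stack_offsets(const size_t *sizes, size_t n, size_t *offs)
{
    size_t off = 0, i;
    for (i = 0; i < n; ++i) {
        offs[i] = off;
        off    += align_up(sizes[i], 8);
    }
}
```

For stack arguments of type char, short, int and long (sizes and alignments 1, 2, 4 and 8), the Apple rule gives offsets 0, 2, 4 and 8, whereas the 8 byte slot model gives 0, 8, 16 and 24.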