%
% Copyright (c) 2014,2015 Daniel Adler <dadler@uni-goettingen.de>,
% Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%

% ==================================================
% ARM64
% ==================================================
\subsection{ARM64 Calling Convention}

\paragraph{Overview}

ARMv8 introduced the AArch64 execution state and, with it, the AArch64 calling convention. ARM64 chips can run code in 64-bit or 32-bit mode, but not within the same process; interworking between the two modes is only possible across processes.\\
The word size is defined to be 32 bits, a dword 64 bits. Note that this is for historical reasons (the terminology did not change from ARM32).\\
For more details, take a look at the Procedure Call Standard for the ARM 64-bit Architecture \cite{AAPCS64}.\\

\paragraph{\product{dyncall} support}

The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, for both calls and callbacks.

\subsubsection{AAPCS64 Calling Convention}

\paragraph{Registers and register usage}

ARM64 features thirty-one 64-bit general purpose registers, namely x0-x30.
Also, there is SP, a register with restricted use, used for the stack pointer,
and PC, dedicated as program counter. Additionally, there are thirty-two 128-bit
registers v0-v31, to be used as SIMD and floating point registers, referred to
as q0-q31 (128-bit), d0-d31 (64-bit) and s0-s31 (32-bit), depending on the size
of the accessed value:\\

\begin{table}[h]
\begin{tabular*}{0.95\textwidth}{3 B}
Name & Brief description\\
\hline
{\bf x0-x7} & parameters, scratch, return value\\
{\bf x8} & indirect result location pointer\\
{\bf x9-x15} & scratch\\
{\bf x16} & scratch, intra-procedure-call temporary (IP0), may be clobbered by linker veneers and PLT stubs, see doc\\
{\bf x17} & scratch, intra-procedure-call temporary (IP1), may be clobbered by linker veneers and PLT stubs, see doc\\
{\bf x18} & platform register, reserved on some platforms; should not be used in handwritten, portable assembly, see doc\\
{\bf x19-x28} & permanent\\
{\bf x29} & permanent, frame pointer\\
{\bf x30} & permanent, link register\\
{\bf SP} & permanent, stack pointer\\
{\bf PC} & program counter\\
{\bf v0-v7} & parameters, scratch, return value\\
{\bf v8-v15} & permanent (only the lower 64 bits, i.e. d8-d15)\\
{\bf v16-v31} & scratch\\
\end{tabular*}
\caption{Register usage on arm64}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first 8 integer arguments are passed using x0-x7
\item first 8 floating point arguments are passed using v0-v7 (accessed as s0-s7 for single and d0-d7 for double precision values)
\item subsequent parameters are pushed onto the stack
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy, in its prolog, the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed parameters require saving, though)
\item structures and unions of up to 16 bytes are passed by value in up to two consecutive integer registers (or on the stack if not enough registers are left); homogeneous floating-point aggregates with up to four members are passed in consecutive floating-point registers; larger aggregates are copied to memory by the caller and passed by reference
\item if the return value is a structure returned via memory, a pointer to the return value's space is passed in x8, while the regular parameters still start at x0 (see {\bf return values})
\item the stack pointer is required to be 16-byte aligned whenever it is used to access memory, and in particular at public interfaces
\end{itemize}

\paragraph{Return values}
\begin{itemize}
\item integer return values use x0
\item floating-point return values use d0 (s0 for single precision values)
\item aggregates of up to 16 bytes are returned in x0 and x1 (homogeneous floating-point aggregates in v0-v3)
\item otherwise, the caller allocates space, passes a pointer to it to the callee through x8, and the callee writes the return value to this space
\end{itemize}

\paragraph{Stack layout}

Stack directly after the function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
& \vdots & & \\
\hhline{~=~~}
register save area & \hspace{4cm} & & \mrrbrace{5}{caller's frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
\mrlbrace{13}{parameter area} & \ldots & \mrrbrace{3}{stack parameters} & \\
& \ldots & & \\
& \ldots & & \\
\hhline{~=~~}
& x0 & \mrrbrace{10}{spill area (if needed)} & \mrrbrace{15}{current frame} \\
& x1 & & \\
& x2 & & \\
& \ldots & & \\
& x7 & & \\
& d0 & & \\
& d1 & & \\
& d2 & & \\
& \ldots & & \\
& d7 & & \\
\hhline{~-~~}
register save area & & & \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
link and frame register & x30 & & \\
& x29 & & \\
\hhline{~-~~}
parameter area & \vdots & & \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on arm64}
\end{figure}

\newpage


\subsubsection{Apple's ARM64 Function Calling Conventions}

\paragraph{Overview}

Apple's ARM64 calling convention is based on the AAPCS64 standard, but diverges from it in some ways.
Only the differences are listed here; for more details, take a look at Apple's official documentation \cite{AppleARM64}.

\begin{itemize}
\item arguments passed via the stack use only the space they need, but are subject to their type's natural alignment requirements (1 byte for char and bool, 2 for short, 4 for int and float, and 8 for long, pointers and double)
\item the caller is required to sign- or zero-extend (as appropriate for the type) arguments smaller than 32 bits to 32 bits
\end{itemize}