% dyncall: doc/manual/callconvs/callconv_arm32.tex, revision 45:e5cdf4b4d813
% (cslag, Sat, 19 Dec 2015 23:24:35 +0100): armhf callback fix for calls with
% >= 64 bytes of floating-point params where d7 is filled before all args are pushed
%
% Copyright (c) 2007,2010 Daniel Adler <dadler@uni-goettingen.de>,
% Tassilo Philipp <tphilipp@potion-studios.com>
%
% Permission to use, copy, modify, and distribute this software for any
% purpose with or without fee is hereby granted, provided that the above
% copyright notice and this permission notice appear in all copies.
%
% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
%

% ==================================================
% ARM32
% ==================================================
\subsection{ARM32 Calling Convention}

\paragraph{Overview}

The ARM32 family of processors is based on
the Advanced RISC Machines (ARM) processor architecture (32-bit RISC).
The word size is 32 bits (and the programming model is ILP32).\\
This family of microprocessors can run in two major modes:\\
\\
\begin{tabular}{2 B}
\hline
Mode & Description\\
\hline
{\bf ARM} & 32-bit instruction set\\
{\bf THUMB} & compressed instruction set using 16-bit wide instruction encodings\\
\hline
\end{tabular}
\\
\\
For more details, take a look at the ARM-THUMB Procedure Call Standard (ATPCS) \cite{ATPCS}, the Procedure Call Standard for the ARM Architecture (AAPCS) \cite{AAPCS}, as well as the Debian ARM EABI port wiki \cite{armeabi}.


\paragraph{\product{dyncall} support}

Currently, the \product{dyncall} library supports the ARM and THUMB modes of the ARM32 family (ATPCS \cite{ATPCS} and EABI \cite{armeabi}), excluding manually triggered ARM-THUMB interworking calls. Although the current implementation may well run on other ARM processor families, please note that only the ARMv4T family has been thoroughly tested at the time of writing. Please report whether the code runs on other ARM families, too.\\
It is important to note that \product{dyncall} supports the ARM architecture calling convention variant {\bf with floating point hardware disabled} (meaning that the FPA and the VFP (scalar mode) procedure call standards are not supported).
This processor family features some instruction set extensions accelerating DSP and multimedia applications, such as ARM Jazelle technology (direct Java bytecode execution, providing acceleration for some bytecodes while calling software code for others), which are not supported by the \product{dyncall} library.\\


\subsubsection{ATPCS ARM mode}


\paragraph{Registers and register usage}

In ARM mode, the ARM32 processor has sixteen 32-bit general-purpose registers, namely r0-r15:\\
\\
\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name & Brief description\\
\hline
{\bf r0} & parameter 0, scratch, return value\\
{\bf r1} & parameter 1, scratch, return value\\
{\bf r2-r3} & parameters 2 and 3, scratch\\
{\bf r4-r10} & permanent\\
{\bf r11} & frame pointer, permanent\\
{\bf r12} & scratch\\
{\bf r13} & stack pointer, permanent\\
{\bf r14} & link register, permanent\\
{\bf r15} & program counter (note: due to pipelining, r15 points 2 instructions ahead)\\
\hline
\end{tabular}
\caption{Register usage on arm32}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first four words are passed using r0-r3
\item subsequent parameters are pushed onto the stack (in right-to-left order, such that the stack pointer points to the first of the remaining parameters)
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy the first four words in its prolog to a reserved stack area adjacent to the other parameters on the stack
\item parameters \textless=\ 32 bits are passed as 32-bit words
\item 64-bit parameters are passed as two 32-bit parts, low word first; such a parameter may even be split between r3 and the stack (although this does not seem to be specified in the ATPCS)
\item structures and unions are passed by value, with the first four words of the parameters in r0-r3
\item if the return value is a structure, a pointer to the space for the return value is passed in r0, the first parameter in r1, etc. (see {\bf return values})
\item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors (which are part of the ARM32 family); to avoid problems, one should therefore always align the stack (tests have shown that GCC does care about the alignment when using the ellipsis)
\end{itemize}
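The word assignment described in the list above can be illustrated with a small C sketch. It only computes where each 32-bit argument word would end up (a core register, or a byte offset from the stack pointer at the call). The names \texttt{atpcs\_layout} and \texttt{word\_loc} are hypothetical, and the sketch assumes, as the list describes for the ATPCS, that an $n$-byte argument occupies $\lceil n/4 \rceil$ words with no alignment padding in between.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: where one 32-bit argument word ends up. */
typedef struct {
    int in_reg; /* 1: core register, 0: stack */
    int index;  /* register number (0-3), or byte offset from sp */
} word_loc;

/* ATPCS-style word assignment (sketch, hypothetical helper): an argument
   of n bytes occupies ceil(n/4) words, low word first. The first four
   words of the whole argument list go to r0-r3, the remaining ones to
   the stack at increasing offsets. No padding is inserted, so a 64-bit
   value or a structure may be split between r3 and the stack.
   Returns the number of words stored in out (at most max_out). */
size_t atpcs_layout(const size_t *arg_sizes, size_t nargs,
                    word_loc *out, size_t max_out)
{
    size_t word = 0; /* running index over all argument words */

    for (size_t i = 0; i < nargs; ++i) {
        assert(arg_sizes[i] > 0);
        size_t nwords = (arg_sizes[i] + 3) / 4;
        for (size_t w = 0; w < nwords && word < max_out; ++w, ++word) {
            if (word < 4) {
                out[word].in_reg = 1;
                out[word].index = (int)word;
            } else {
                out[word].in_reg = 0;
                out[word].index = (int)((word - 4) * 4);
            }
        }
    }
    return word;
}
```

For \texttt{f(int, int, int, long long, int)}, for instance, the low word of the \texttt{long long} lands in r3 and its high word at stack offset 0, followed by the last \texttt{int} at offset 4.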

\paragraph{Return values}
\begin{itemize}
\item return values \textless=\ 32 bits use r0
\item 64-bit return values use r0 and r1
\item if the return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
\end{itemize}
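Conceptually, returning a structure through a caller-provided buffer turns the function into one with an extra, hidden first parameter. The following portable C sketch shows that transformation at the source level only; it does not inspect any generated machine code. The names \texttt{triple}, \texttt{make\_triple} and \texttt{make\_triple\_lowered} are made up for illustration, and the structure is deliberately larger than 8 bytes so that it cannot be returned in r0/r1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte structure, too large to come back in r0/r1. */
typedef struct {
    int32_t a, b, c;
} triple;

/* Source-level view: the structure is returned by value. */
triple make_triple(int32_t x)
{
    triple t = { x, x + 1, x + 2 };
    return t;
}

/* Conceptual lowering of make_triple under the convention above: the
   caller reserves space for the result in its own frame and passes its
   address as a hidden first argument (r0); x moves to r1. */
void make_triple_lowered(triple *ret /* r0 */, int32_t x /* r1 */)
{
    assert(ret != NULL);
    ret->a = x;
    ret->b = x + 1;
    ret->c = x + 2;
}
```

A caller of the lowered form would write \texttt{triple t; make\_triple\_lowered(\&t, 5);}, which is roughly what a compiler does behind the scenes for \texttt{triple t = make\_triple(5);}.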

\paragraph{Stack layout}

Stack directly after the function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
& \vdots & & \\
\hhline{~=~~}
register save area & & & \mrrbrace{5}{caller's frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
\mrlbrace{7}{parameter area} & \ldots & \mrrbrace{3}{stack parameters} & \\
& \ldots & & \\
& \ldots & & \\
\hhline{~=~~}
& r3 & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame} \\
& r2 & & \\
& r1 & & \\
& r0 & & \\
\hhline{~-~~}
register save area (with return address) & & & \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
parameter area & \vdots & & \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on arm32}
\end{figure}
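The purpose of the spill area is to make all argument words contiguous in memory, so that a callee such as a varargs function can walk them with a single pointer. The following C sketch simulates this with an ordinary array instead of the real stack. The names \texttt{param\_area}, \texttt{spill\_prolog}, \texttt{nth\_word} and \texttt{nth\_dword} are hypothetical, and the simulation assumes the layout of the figure above: spill slots for r0-r3 directly below the stack parameters, and the low word of a 64-bit value first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_WORDS 16

/* Hypothetical simulated parameter area: words[0..3] are the spill slots
   for r0-r3, words[4..] the caller-pushed stack parameters above them. */
typedef struct {
    uint32_t words[MAX_WORDS];
    size_t count;
} param_area;

/* Prolog step of a varargs callee: store r0-r3 right below the stack
   parameters, so that all argument words become contiguous. */
void spill_prolog(param_area *pa, const uint32_t regs[4],
                  const uint32_t *stack_params, size_t nstack)
{
    assert(nstack <= MAX_WORDS - 4);
    memcpy(pa->words, regs, 4 * sizeof(uint32_t));
    if (nstack > 0)
        memcpy(pa->words + 4, stack_params, nstack * sizeof(uint32_t));
    pa->count = 4 + nstack;
}

/* After the spill, argument word n sits at byte offset 4*n from the
   first one, no matter whether it arrived in a register or on the stack. */
uint32_t nth_word(const param_area *pa, size_t n)
{
    assert(n < pa->count);
    return pa->words[n];
}

/* A 64-bit argument starting at word n, low word first; this also works
   when its two halves arrived in r3 and on the stack. */
uint64_t nth_dword(const param_area *pa, size_t n)
{
    return (uint64_t)nth_word(pa, n) | ((uint64_t)nth_word(pa, n + 1) << 32);
}
```

In a real callee the copy is a single store-multiple of r0-r3 into the reserved area, after which the argument pointer simply advances in 4-byte steps.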


\newpage

\subsubsection{ATPCS THUMB mode}


\paragraph{Status}

\begin{itemize}
\item The ATPCS THUMB mode is untested.
\item Ellipsis (varargs) calls may not work.
\item C++ this calls do not work.
\end{itemize}

\paragraph{Registers and register usage}

In THUMB mode, the ARM32 processor family supports eight 32-bit general-purpose registers r0-r7, as well as access to the high registers r8-r15:\\
\\
\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name & Brief description\\
\hline
{\bf r0} & parameter 0, scratch, return value\\
{\bf r1} & parameter 1, scratch, return value\\
{\bf r2,r3} & parameters 2 and 3, scratch\\
{\bf r4-r6} & permanent\\
{\bf r7} & frame pointer, permanent\\
{\bf r8-r11} & permanent\\
{\bf r12} & scratch\\
{\bf r13} & stack pointer, permanent\\
{\bf r14} & link register, permanent\\
{\bf r15} & program counter (note: due to pipelining, r15 points 2 instructions ahead)\\
\hline
\end{tabular}
\caption{Register usage on arm32 thumb mode}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first four words are passed using r0-r3
\item subsequent parameters are pushed onto the stack (in right-to-left order, such that the stack pointer points to the first of the remaining parameters)
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy the first four words in its prolog to a reserved stack area adjacent to the other parameters on the stack
\item parameters \textless=\ 32 bits are passed as 32-bit words
\item 64-bit parameters are passed as two 32-bit parts, low word first; such a parameter may even be split between r3 and the stack (although this does not seem to be specified in the ATPCS)
\item structures and unions are passed by value, with the first four words of the parameters in r0-r3
\item if the return value is a structure, a pointer to the space for the return value is passed in r0, the first parameter in r1, etc. (see {\bf return values})
\item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors (which are part of the ARM32 family); to avoid problems, one should therefore always align the stack (tests have shown that GCC does care about the alignment when using the ellipsis)
\end{itemize}

\paragraph{Return values}
\begin{itemize}
\item return values \textless=\ 32 bits use r0
\item 64-bit return values use r0 and r1
\item if the return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
\end{itemize}

\paragraph{Stack layout}

Stack directly after the function prolog:\\

\begin{figure}[h]
\begin{tabular}{5|3|1 1}
\hhline{~-~~}
& \vdots & & \\
\hhline{~=~~}
register save area & & & \mrrbrace{5}{caller's frame} \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
\mrlbrace{7}{parameter area} & \ldots & \mrrbrace{3}{stack parameters} & \\
& \ldots & & \\
& \ldots & & \\
\hhline{~=~~}
& r3 & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame} \\
& r2 & & \\
& r1 & & \\
& r0 & & \\
\hhline{~-~~}
register save area (with return address) & & & \\
\hhline{~-~~}
local data & & & \\
\hhline{~-~~}
parameter area & \vdots & & \\
\hhline{~-~~}
\end{tabular}
\caption{Stack layout on arm32 thumb mode}
\end{figure}



\newpage

\subsubsection{EABI (ARM and THUMB mode)}


The ARM EABI is very similar to the ABI outlined in the ARM-THUMB Procedure Call
Standard (ATPCS) \cite{ATPCS}; however, the EABI requires the stack to be
8-byte aligned at function entry, and 64-bit parameters to be 8-byte aligned
as well. The latter are placed on 8-byte boundaries on the stack, and in an
even-numbered register pair (r0/r1 or r2/r3) when passed via registers. To
achieve this alignment, a register might have to be skipped for parameters
passed via registers, or 4 bytes for parameters passed via the stack. Refer
to the Debian ARM EABI port wiki for more information \cite{armeabi}.
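The register-pair and stack alignment rules can be sketched in C as follows, loosely modelled on the AAPCS argument marshalling algorithm with its next core register number (NCRN) and next stacked argument address (NSAA). This is a simplification that only handles 4- and 8-byte scalar arguments; the names \texttt{eabi\_layout} and \texttt{arg\_loc} are hypothetical, and stack offsets are relative to the 8-byte aligned stack pointer at the call.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for one argument. */
typedef struct {
    int in_reg;    /* 1: core register(s), 0: stack */
    int reg;       /* first register (0-3), or -1 */
    size_t offset; /* byte offset from sp if on the stack */
} arg_loc;

/* EABI placement sketch for 4- and 8-byte scalar arguments only.
   8-byte values start at an even register (r0 or r2) and at an 8-byte
   aligned stack offset; once an argument has gone to the stack, no
   further arguments are assigned to core registers. */
void eabi_layout(const size_t *sizes, size_t n, arg_loc *out)
{
    int ncrn = 0;    /* next core register number */
    size_t nsaa = 0; /* next stacked argument offset */

    for (size_t i = 0; i < n; ++i) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        int words = sizes[i] == 8 ? 2 : 1;

        if (words == 2 && (ncrn & 1))
            ++ncrn; /* skip an odd register for register pairs */

        if (ncrn + words <= 4) {
            out[i].in_reg = 1;
            out[i].reg = ncrn;
            out[i].offset = 0;
            ncrn += words;
        } else {
            ncrn = 4; /* core registers are exhausted */
            if (words == 2)
                nsaa = (nsaa + 7) & ~(size_t)7; /* skip 4 bytes if needed */
            out[i].in_reg = 0;
            out[i].reg = -1;
            out[i].offset = nsaa;
            nsaa += sizes[i];
        }
    }
}
```

With \texttt{f(int, int, int, long long)}, for example, r3 stays unused and the \texttt{long long} goes to stack offset 0, whereas the ATPCS rules described earlier would split it between r3 and the stack.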


\paragraph{Status}

\begin{itemize}
\item The EABI THUMB mode is tested and works fine (unlike the ATPCS THUMB mode).
\item Ellipsis (varargs) calls do not work.
\item C++ this calls do not work.
\end{itemize}

\newpage

\subsubsection{ARM on Apple's iOS (Darwin) Platform}


iOS runs on the ARMv6 (iOS 2.0) and ARMv7 (iOS 3.0) architectures.
Code is typically compiled in Thumb mode.

\paragraph{Register usage}

\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name & Brief description\\
\hline
{\bf R0} & parameter 0, scratch, return value\\
{\bf R1} & parameter 1, scratch, return value\\
{\bf R2,R3} & parameters 2 and 3, scratch\\
{\bf R4-R6} & permanent\\
{\bf R7} & frame pointer, permanent\\
{\bf R8} & permanent\\
{\bf R9} & permanent (iOS 2.0), scratch (since iOS 3.0)\\
{\bf R10-R11}& permanent\\
{\bf R12} & scratch, intra-procedure-call scratch register (IP), used by the dynamic linker\\
{\bf R13} & stack pointer, permanent\\
{\bf R14} & link register, permanent\\
{\bf R15} & program counter (note: due to pipelining, r15 points 2 instructions ahead)\\
{\bf CPSR} & program status register\\
{\bf D0-D7} & scratch; aliases S0-S15, on ARMv7 also Q0-Q3; not accessible from Thumb mode on ARMv6\\
{\bf D8-D15} & permanent; aliases S16-S31, on ARMv7 also Q4-Q7; not accessible from Thumb mode on ARMv6\\
{\bf D16-D31}& only available on ARMv7; aliases Q8-Q15\\
{\bf FPSCR} & VFP status register\\
\hline
\end{tabular}
\caption{Register usage on ARM Apple iOS}
\end{table}

The ABI is based on the AAPCS, with the following important differences:

\begin{itemize}
\item R7 instead of R11 is used as frame pointer
\item R9 is scratch since iOS 3.0 (it was preserved before)
\end{itemize}


\subsubsection{ARM hard float (armhf)}


Most Debian-based Linux systems on ARMv7 (or ARMv6 with FPU) platforms use a calling convention referred to
as armhf, which passes floating-point arguments in the 16 single-precision (32-bit) registers s0-s15 (aliased as d0-d7) of the FPU, as provided by the VFPv3-D16 extension to the ARM architecture.
The instruction set used for armhf is Thumb-2. Refer to the Debian wiki for more information \cite{armhf}.

Code is little-endian; apart from that, the convention is similar to EABI (8-byte aligned stack, etc.).

\paragraph{Register usage}

\begin{table}[h]
\begin{tabular}{3 B}
\hline
Name & Brief description\\
\hline
{\bf R0} & parameter 0, scratch, non-floating-point return value\\
{\bf R1} & parameter 1, scratch, non-floating-point return value\\
{\bf R2,R3} & parameters 2 and 3, scratch\\
{\bf R4-R6} & permanent\\
{\bf R7} & frame pointer, permanent\\
{\bf R8-R11} & permanent\\
{\bf R12} & scratch, intra-procedure-call scratch register (IP), used by the dynamic linker\\
{\bf R13} & stack pointer, permanent\\
{\bf R14} & link register, permanent\\
{\bf R15} & program counter (note: due to pipelining, r15 points 2 instructions ahead)\\
{\bf CPSR} & program status register\\
{\bf S0} & floating-point argument, floating-point return value, single precision\\
{\bf D0} & floating-point argument, floating-point return value, double precision, aliases S0-S1\\
{\bf S1-S15} & floating-point arguments, single precision\\
{\bf D1-D7} & aliases S2-S15, floating-point arguments, double precision\\
{\bf FPSCR} & VFP status register\\
\hline
\end{tabular}
\caption{Register usage on armhf}
\end{table}

\paragraph{Parameter passing}

\begin{itemize}
\item stack parameter order: right-to-left
\item caller cleans up the stack
\item first four non-floating-point words are passed using r0-r3
\item first 16 single-precision or 8 double-precision floating-point arguments are passed via s0-s15 or d0-d7, respectively (since s and d registers are aliased, registers already in use are skipped)
\item subsequent parameters are pushed onto the stack (in right-to-left order, such that the stack pointer points to the first of the remaining parameters)
\item note that as soon as d7 is used, subsequent single-precision floating-point parameters are also pushed onto the stack, even if there are still free s registers
\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs), it has to copy the first four words in its prolog to a reserved stack area adjacent to the other parameters on the stack % @@@?check spilling of float args, also
\item parameters \textless=\ 32 bits are passed as 32-bit words
\item structures and unions are passed by value, with the first four words of the parameters in r0-r3 % @@@?check doc
\item if the return value is a structure, a pointer to the space for the return value is passed in r0, the first parameter in r1, etc. (see {\bf return values})
\end{itemize}
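The allocation of floating-point arguments to s0-s15 and d0-d7 can be sketched as follows. This is our reading of the AAPCS VFP variant, not \product{dyncall}'s implementation: floats and doubles take the lowest free register(s) of their size, so a float can back-fill a single register skipped by double alignment, and once a floating-point argument does not fit, all remaining VFP argument registers are treated as used. Only 32-bit integers, floats and doubles are handled; \texttt{armhf\_layout}, \texttt{take\_vfp} and the types are hypothetical names, and corner cases such as the d7 behaviour described above are worth checking against a real compiler.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARG_INT, ARG_FLOAT, ARG_DOUBLE } arg_kind;
typedef enum { LOC_CORE, LOC_VFP, LOC_STACK } loc_kind;

/* Hypothetical result type: r<reg> (LOC_CORE), s<reg> for floats or
   d<reg> for doubles (LOC_VFP), or a byte offset from sp (LOC_STACK). */
typedef struct {
    loc_kind kind;
    int reg;
    size_t offset;
} hf_loc;

/* Takes the lowest free slot of 'width' consecutive s registers
   (1 = float, 2 = double) from the mask; returns an s index for floats,
   a d index for doubles, or -1 if nothing fits. */
static int take_vfp(unsigned *s_free, int width)
{
    unsigned bits = (width == 2) ? 3u : 1u;
    for (int slot = 0; slot < 16 / width; ++slot) {
        unsigned m = bits << (slot * width);
        if ((*s_free & m) == m) {
            *s_free &= ~m;
            return slot;
        }
    }
    return -1;
}

void armhf_layout(const arg_kind *kinds, size_t n, hf_loc *out)
{
    unsigned s_free = 0xFFFFu; /* bit i set: s<i> still free */
    int ncrn = 0;              /* next core register */
    size_t nsaa = 0;           /* next stack offset */

    assert(kinds != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        int reg = -1;
        if (kinds[i] == ARG_INT) {
            if (ncrn < 4) {
                reg = ncrn++;
                out[i].kind = LOC_CORE;
            }
        } else {
            reg = take_vfp(&s_free, kinds[i] == ARG_DOUBLE ? 2 : 1);
            if (reg >= 0)
                out[i].kind = LOC_VFP;
            else
                s_free = 0; /* no further VFP argument registers */
        }
        out[i].reg = reg;
        out[i].offset = 0;
        if (reg < 0) {
            /* Stack: doubles are 8-byte aligned, the rest 4-byte. */
            size_t size = kinds[i] == ARG_DOUBLE ? 8 : 4;
            nsaa = (nsaa + size - 1) & ~(size - 1);
            out[i].kind = LOC_STACK;
            out[i].offset = nsaa;
            nsaa += size;
        }
    }
}
```

For \texttt{f(float, double, float, int)} the sketch yields s0, d1, s1 and r0; with nine doubles followed by a float, the ninth double and the float both end up on the stack.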

\paragraph{Return values}
\begin{itemize}
\item non-floating-point return values \textless=\ 32 bits use r0
\item non-floating-point 64-bit return values use r0 and r1
\item single-precision floating-point return values use s0
\item double-precision floating-point return values use d0
\item if the return value is a structure, the caller allocates space for the return value on the stack in its frame and passes a pointer to it in r0
\end{itemize}



\subsubsection{Architectures}

The ARM architecture family comprises several revisions with differing capabilities and
extensions (such as Thumb interworking, more vector registers, etc.).
The following table sums up the most important properties of the various
architecture revisions from a calling convention perspective.

% iPhone 3GS : ARM Cortex-A8
% Nintendo DS: ARM 7 and ARM 9
% ARM 7: ARMv4T
% ARM 9: ARMv4T, HTC Wizard
% Cortex-*: ARMv7, Raspberry Pi 2, ...

\begin{table}[h]
\begin{tabular}{lll}
Architecture & Platforms & Details \\
\hline
ARMv4 & & \\
\hline
ARMv4T & ARM 7, ARM 9, Neo FreeRunner (OpenMoko) & \\
\hline
ARMv5 & ARM 9E & BLX instruction available \\
\hline
ARMv6 & & no vector registers available in Thumb mode \\
\hline
ARMv7 & iPod touch, iPhone 3GS/4, Raspberry Pi 2 & VFP available throughout, armhf calling convention on some platforms \\
\hline
ARMv8 & iPhone 6 and higher & 64-bit support \\
\hline
\end{tabular}
\caption{Overview of ARM Architecture, Platforms and Details}
\end{table}

\newpage
