changeset 328:276eb8c87aa0

- review and fixes, cleanup, amendments to calling convention appendix of manual
author Tassilo Philipp
date Fri, 22 Nov 2019 23:11:56 +0100
parents c0390dc85a07
children 8b0fc583ce62
files doc/manual/callconvs/callconv_arm32.tex doc/manual/callconvs/callconv_arm64.tex doc/manual/callconvs/callconv_mips32.tex doc/manual/callconvs/callconv_mips64.tex doc/manual/callconvs/callconv_ppc32.tex doc/manual/callconvs/callconv_ppc64.tex doc/manual/callconvs/callconv_sparc32.tex doc/manual/callconvs/callconv_sparc64.tex doc/manual/callconvs/callconv_x64.tex doc/manual/callconvs/callconv_x86.tex
diffstat 10 files changed, 1695 insertions(+), 1576 deletions(-)
--- a/doc/manual/callconvs/callconv_arm32.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_arm32.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,5 +1,6 @@
+%//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2007,2010 Daniel Adler <dadler@uni-goettingen.de>,
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>,
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -14,11 +15,12 @@
 % ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 % OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 %
+%//////////////////////////////////////////////////////////////////////////////
 
 % ==================================================
 % ARM32
 % ==================================================
-\subsection{ARM32 Calling Convention}
+\subsection{ARM32 Calling Conventions}
 
 \paragraph{Overview}
 
@@ -35,13 +37,23 @@
 \end{tabular*}
 \\
 \\
-For more details, take a look at the ARM-THUMB Procedure Call Standard (ATPCS) \cite{ATPCS}, the Procedure Call Standard for the ARM Architecture (AAPCS) \cite{AAPCS}, as well as the Debian ARM EABI port wiki \cite{armeabi}.\\
-\\
+For more details, take a look at the ARM-THUMB Procedure Call Standard (ATPCS)
+\cite{ATPCS}, the Procedure Call Standard for the ARM Architecture (AAPCS)
+\cite{AAPCS}, as well as Debian's ARM EABI port \cite{armeabi} and hard-float
+\cite{armhf} wiki pages.\\ \\
+
 \paragraph{\product{dyncall} support}
 
-Currently, the \product{dyncall} library supports the ARM and THUMB mode of the ARM32 family (ATPCS \cite{ATPCS} and EABI \cite{armeabi}), excluding manually triggered ARM-THUMB interworking calls. Although it's quite possible that the current implementation runs on other ARM processor families as well, please note that only the ARMv4t family has been thoroughly tested at the time of writing. Please report if the code runs on other ARM families, too.\\
-It is important to note, that dyncall supports the ARM architecture calling convention variant {\bf with floating point hardware disabled} (meaning that the FPA and the VFP (scalar mode) procedure call standards are not supported).
-This processor family features some instruction sets accelerating DSP and multimedia application like the ARM Jazelle Technology (direct Java bytecode execution, providing acceleration for some bytecodes while calling software code for others), etc. that are not supported by the dyncall library.\\
+Currently, the \product{dyncall} library supports the ARM and THUMB mode of the
+ARM32 family (ATPCS \cite{ATPCS}, EABI \cite{armeabi}, the ARM hard-float
+(armhf) \cite{armeabi} variant, as well as Apple's calling convention based on
+the ATPCS), excluding manually triggered ARM-THUMB interworking calls.\\
+Also supported is armhf, a calling convention with register support to pass
+floating point numbers. FPA and the VFP (scalar mode) procedure call standards,
+as well as some instruction sets accelerating DSP and multimedia applications
+like the ARM Jazelle Technology (direct Java bytecode execution, providing
+acceleration for some bytecodes while calling software code for others), etc.,
+are not supported by the dyncall library.\\
 
 
 \subsubsection{ATPCS ARM mode}
@@ -52,18 +64,19 @@
 In ARM mode, the ARM32 processor has sixteen 32 bit general purpose registers, namely r0-r15:\\
 \\
 \begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name         & Brief description\\
+\begin{tabular*}{0.95\textwidth}{lll}
+Name        & Alias       & Brief description\\
 \hline
-{\bf r0}     & parameter 0, scratch, return value\\
-{\bf r1}     & parameter 1, scratch, return value\\
-{\bf r2-r3}  & parameters 2 and 3, scratch\\
-{\bf r4-r10} & permanent\\
-{\bf r11}    & frame pointer, permanent\\
-{\bf r12}    & scratch\\
-{\bf r13}    & stack pointer, permanent\\
-{\bf r14}    & link register, permanent\\
-{\bf r15}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
+{\bf r0}    & {\bf a1}    & parameter 0, scratch, return value\\
+{\bf r1}    & {\bf a2}    & parameter 1, scratch, return value\\
+{\bf r2,r3} & {\bf a3,a4} & parameters 2 and 3, scratch\\
+{\bf r4-r9} & {\bf v1-v6} & permanent\\
+{\bf r10}   & {\bf sl}    & permanent\\
+{\bf r11}   & {\bf fp}    & frame pointer, permanent\\
+{\bf r12}   & {\bf ip}    & scratch\\
+{\bf r13}   & {\bf sp}    & stack pointer, permanent\\
+{\bf r14}   & {\bf lr}    & link register, permanent\\
+{\bf r15}   & {\bf pc}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
 \end{tabular*}
 \caption{Register usage on arm32}
 \end{table}
@@ -77,7 +90,7 @@
 \item subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
-\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn't seem to be specified in the ATPCS), with the loword coming first
+\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn't seem to be specified in the ATPCS)
 \item structures and unions are passed by value, with the first four words of the parameters in r0-r3
 \item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc... (see {\bf return values})
 \item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
@@ -92,32 +105,31 @@
 
 \paragraph{Stack layout}
 
+% verified/amended: TP nov 2019 (see also doc/disas_examples/arm.atpcs_arm.disas)
 Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                         & \vdots       &                                      &                              \\
+                                         & \vdots               &                                      &                              \\
 \hhline{~=~~}
-register save area                       & \hspace{4cm} &                                      & \mrrbrace{5}{caller's frame} \\
+register save area                       & \hspace{4cm}         &                                      & \mrrbrace{5}{caller's frame} \\
 \hhline{~-~~}
-local data                               &              &                                      &                              \\
+local data                               &                      &                                      &                              \\
 \hhline{~-~~}
-\mrlbrace{7}{parameter area}             & \ldots       & \mrrbrace{3}{stack parameters}       &                              \\
-                                         & \ldots       &                                      &                              \\
-                                         & \ldots       &                                      &                              \\
+\mrlbrace{7}{parameter area}             & last arg             & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots               &                                      &                              \\
+                                         & 5th word of arg data &                                      &                              \\
 \hhline{~=~~}
-                                         & r3           & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
-                                         & r2           &                                      &                              \\
-                                         & r1           &                                      &                              \\
-                                         & r0           &                                      &                              \\
+                                         & r3                   & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
+                                         & r2                   &                                      &                              \\
+                                         & r1                   &                                      &                              \\
+                                         & r0                   &                                      &                              \\
 \hhline{~-~~}
-register save area (with return address) &              &                                      &                              \\
+register save area (with return address) &                      &                                      &                              \\ %fp points here to 1st word of this area: $\leftarrow$ fp
 \hhline{~-~~}
-local data                               &              &                                      &                              \\
+local data                               &                      &                                      &                              \\
 \hhline{~-~~}
-parameter area                           & \vdots       &                                      &                              \\
-\hhline{~-~~}
+parameter area                           & \vdots               &                                      &                              \\
 \end{tabular}
 \caption{Stack layout on arm32}
 \end{figure}
@@ -125,6 +137,7 @@
 
 \newpage
 
+
 \subsubsection{ATPCS THUMB mode}
 
 
@@ -141,19 +154,19 @@
 In THUMB mode, the ARM32 processor family supports eight 32 bit general purpose registers r0-r7 and access to high order registers r8-r15:\\
 \\
 \begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name         & Brief description\\
+\begin{tabular*}{0.95\textwidth}{lll}
+Name         & Alias       & Brief description\\
 \hline
-{\bf r0}     & parameter 0, scratch, return value\\
-{\bf r1}     & parameter 1, scratch, return value\\
-{\bf r2,r3}  & parameters 2 and 3, scratch\\
-{\bf r4-r6}  & permanent\\
-{\bf r7}     & frame pointer, permanent\\
-{\bf r8-r11} & permanent\\
-{\bf r12}    & scratch\\
-{\bf r13}    & stack pointer, permanent\\
-{\bf r14}    & link register, permanent\\
-{\bf r15}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
+{\bf r0}     & {\bf a1}    & parameter 0, scratch, return value\\
+{\bf r1}     & {\bf a2}    & parameter 1, scratch, return value\\
+{\bf r2,r3}  & {\bf a3,a4} & parameters 2 and 3, scratch\\
+{\bf r4-r6}  & {\bf v1-v3} & permanent\\
+{\bf r7}     & {\bf v4}    & frame pointer, permanent\\
+{\bf r8-r11} & {\bf v5-v8} & permanent\\
+{\bf r12}    & {\bf ip}    & scratch\\
+{\bf r13}    & {\bf sp}    & stack pointer, permanent\\
+{\bf r14}    & {\bf lr}    & link register, permanent\\
+{\bf r15}    & {\bf pc}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
 \end{tabular*}
 \caption{Register usage on arm32 thumb mode}
 \end{table}
@@ -167,7 +180,7 @@
 \item subsequent parameters are pushed onto the stack (in right to left order, such that the stack pointer points to the first of the remaining parameters)
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
-\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack), although this doesn't seem to be specified in the ATPCS), with the loword coming first
+\item 64 bit parameters are passed as two 32 bit parts (even partly via the register and partly via the stack, although this doesn't seem to be specified in the ATPCS)
 \item structures and unions are passed by value, with the first four words of the parameters in r0-r3
 \item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc. (see {\bf return values})
 \item keeping the stack eight-byte aligned can improve memory access performance and is required by LDRD and STRD on ARMv5TE processors which are part of the ARM32 family, so, in order to avoid problems one should always align the stack (tests have shown, that GCC does care about the alignment when using the ellipsis)
@@ -186,35 +199,33 @@
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                         & \vdots       &                                      &                              \\
-\hhline{~=~~}
-register save area                       & \hspace{4cm} &                                      & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-local data                               &              &                                      &                              \\
-\hhline{~-~~}
-\mrlbrace{7}{parameter area}             & \ldots       & \mrrbrace{3}{stack parameters}       &                              \\
-                                         & \ldots       &                                      &                              \\
-                                         & \ldots       &                                      &                              \\
-\hhline{~=~~}
-                                         & r3           & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
-                                         & r2           &                                      &                              \\
-                                         & r1           &                                      &                              \\
-                                         & r0           &                                      &                              \\
-\hhline{~-~~}
-register save area (with return address) &              &                                      &                              \\
-\hhline{~-~~}
-local data                               &              &                                      &                              \\
-\hhline{~-~~}
-parameter area                           & \vdots       &                                      &                              \\
-\hhline{~-~~}
+                                         & \vdots               &                                      &                              \\
+\hhline{~=~~}                                                  
+register save area                       & \hspace{4cm}         &                                      & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}                                                  
+local data                               &                      &                                      &                              \\
+\hhline{~-~~}                                                  
+\mrlbrace{7}{parameter area}             & last arg             & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots               &                                      &                              \\
+                                         & 5th word of arg data &                                      &                              \\
+\hhline{~=~~}                                                  
+                                         & r3                   & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
+                                         & r2                   &                                      &                              \\
+                                         & r1                   &                                      &                              \\
+                                         & r0                   &                                      &                              \\
+\hhline{~-~~}                                                  
+register save area (with return address) &                      &                                      &                              \\ %fp points here to 1st word of this area: $\leftarrow$ fp
+\hhline{~-~~}                                                  
+local data                               &                      &                                      &                              \\
+\hhline{~-~~}                                                  
+parameter area                           & \vdots               &                                      &                              \\
 \end{tabular}
 \caption{Stack layout on arm32 thumb mode}
 \end{figure}
 
 
+\newpage
 
-\newpage
 
 \subsubsection{EABI (ARM and THUMB mode)}
 
@@ -236,83 +247,119 @@
 \item C++ this calls do not work.
 \end{itemize}
 
+
 \newpage
 
-\subsubsection{ARM on Apple's iOS (Darwin) Platform}
+
+\subsubsection{ARM on Apple's iOS (Darwin) Platform (ARM and THUMB mode)}
 
 
-The iOS runs on ARMv6 (iOS 2.0) and ARMv7 (iOS 3.0) architectures.
-Typically code is compiled in Thumb mode.\\
+iOS runs on ARMv6 (iOS 2.0) and ARMv7 (iOS 3.0) architectures. Both ARM and THUMB mode are available;
+code is usually compiled in THUMB mode.\\
 \\
 \paragraph{Register usage}
 
 \begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name         & Brief description\\
+\begin{tabular*}{0.95\textwidth}{lll}
+Name         & Alias    & Brief description\\
 \hline
-{\bf R0}     & parameter 0, scratch, return value\\
-{\bf R1}     & parameter 1, scratch, return value\\
-{\bf R2,R3}  & parameters 2 and 3, scratch\\
-{\bf R4-R6}  & permanent\\
-{\bf R7}     & frame pointer, permanent\\
-{\bf R8}     & permanent\\
-{\bf R9}     & permanent(iOS 2.0) and scratch (since iOS 3.0)\\
-{\bf R10-R11}& permanent\\
-{\bf R12}    & scratch, intra-procedure scratch register (IP) used by dynamic linker\\
-{\bf R13}    & stack pointer, permanent\\
-{\bf R14}    & link register, permanent\\
-{\bf R15}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
-{\bf CPSR}   & Program status register\\
-{\bf D0-D7}  & scratch. aliases S0-S15, on ARMv7 also as Q0-Q3. Not accessible from Thumb mode on ARMv6.\\
-{\bf D8-D15} & permanent, aliases S16-S31, on ARMv7 also as Q4-A7. Not accesible from Thumb mode on ARMv6.\\
-{\bf D16-D31}& Only available in ARMv7, aliases Q8-Q15.\\
-{\bf FPSCR}  & VFP status register.\\
+{\bf r0}     &          & parameter 0, scratch, return value\\
+{\bf r1}     &          & parameter 1, scratch, return value\\
+{\bf r2,r3}  &          & parameters 2 and 3, scratch\\
+{\bf r4-r6}  &          & permanent\\
+{\bf r7}     &          & frame pointer, permanent\\
+{\bf r8}     &          & permanent\\
+{\bf r9}     &          & permanent (iOS 2.0) / scratch (since iOS 3.0)\\
+{\bf r10-r11}&          & permanent\\
+{\bf r12}    &          & scratch, intra-procedure scratch register (IP) used by dynamic linker\\
+{\bf r13}    & {\bf sp} & stack pointer, permanent\\
+{\bf r14}    & {\bf lr} & link register, permanent\\
+{\bf r15}    & {\bf pc} & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
+{\bf cpsr}   &          & program status register\\
+{\bf d0-d7}  &          & scratch, aliases s0-s15, on ARMv7 also as q0-q3; not accessible from Thumb mode on ARMv6\\
+{\bf d8-d15} &          & permanent, aliases s16-s31, on ARMv7 also as q4-q7; not accessible from Thumb mode on ARMv6\\
+{\bf d16-d31}&          & only available in ARMv7, aliases q8-q15\\
+{\bf fpscr}  &          & VFP status register\\
 \end{tabular*}
 \caption{Register usage on ARM Apple iOS}
 \end{table}
 
-The ABI is based on the AAPCS but with some important differences listed below:
+\paragraph{Parameter passing and Return values}
+
+The ABI is based on the AAPCS but with the following important differences:
 
 \begin{itemize}
-\item R7 instead of R11 is used as frame pointer
-\item R9 is scratch since iOS 3.0, was preserved before.
+\item in ARM mode, r7 is used as frame pointer instead of r11 (so both ARM and THUMB mode use the same convention)
+\item r9 does not need to be preserved on iOS 3.0 and greater
 \end{itemize}
 
 
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/arm.darwin_{arm,thumb}.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                         & \vdots               &                                      &                              \\
+\hhline{~=~~}                                                  
+register save area                       & \hspace{4cm}         &                                      & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}                                                  
+local data                               &                      &                                      &                              \\
+\hhline{~-~~}                                                  
+\mrlbrace{7}{parameter area}             & last arg             & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots               &                                      &                              \\
+                                         & 5th word of arg data @@@verify &                                      &                              \\
+\hhline{~=~~}                                                  
+                                         & r3                   & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
+                                         & r2                   &                                      &                              \\
+                                         & r1                   &                                      &                              \\
+                                         & r0                   &                                      &                              \\
+\hhline{~-~~}                                                  
+register save area (with return address) &                      &                                      &                              \\ %fp points here to 1st word of this area: $\leftarrow$ fp
+\hhline{~-~~}                                                  
+local data                               &                      &                                      &                              \\
+\hhline{~-~~}                                                  
+parameter area                           & \vdots               &                                      &                              \\
+\end{tabular}
+\caption{Stack layout on arm32}
+\end{figure}
+
+
+\newpage
+
+
 \subsubsection{ARM hard float (armhf)}
 
 
 Most debian-based Linux systems on ARMv7 (or ARMv6 with FPU) platforms use a calling convention referred to
 as armhf, using 16 32-bit floating point registers of the FPU of the VFPv3-D16 extension to the ARM architecture.
-The instruction set used for armhf is Thumb-2. Refer to the debian wiki for more information \cite{armhf}.
+Refer to the Debian wiki for more information \cite{armhf}. % The following is for ARM mode, find platform that uses thumb+hard-float @@@
 
 Code is little-endian, rest is similar to EABI with an 8-byte aligned stack, etc..\\
 \\
 \paragraph{Register usage}
 
 \begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name         & Brief description\\
-\hline
-{\bf R0}     & parameter 0, scratch, non floating point return value\\
-{\bf R1}     & parameter 1, scratch, non floating point return value\\
-{\bf R2,R3}  & parameters 2 and 3, scratch\\
-{\bf R4,R5}  & permanent\\
-{\bf R6}     & scratch\\
-{\bf R7}     & frame pointer, permanent\\
-{\bf R8}     & permanent\\
-{\bf R9,R10} & scratch\\
-{\bf R11}    & permanent\\
-{\bf R12}    & scratch, intra-procedure scratch register (IP) used by dynamic linker\\
-{\bf R13}    & stack pointer, permanent\\
-{\bf R14}    & link register, permanent\\
-{\bf R15}    & program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
-{\bf CPSR}   & Program status register\\
-{\bf S0}     & floating point argument, floating point return value, single precision\\
-{\bf D0}     & floating point argument, floating point return value, double precision, aliases S0-S1, \\
-{\bf S1-S15} & floating point arguments, single precision\\
-{\bf D1-D7}  & aliases S2-S15, floating point arguments, double precision\\
-{\bf FPSCR}  & VFP status register.\\
+\begin{tabular*}{0.95\textwidth}{lll}
+Name         & Alias       &  Brief description\\
+\hline          
+{\bf r0}     & {\bf a1}    &  parameter 0, scratch, non floating point return value\\
+{\bf r1}     & {\bf a2}    &  parameter 1, scratch, non floating point return value\\
+{\bf r2,r3}  & {\bf a3,a4} &  parameters 2 and 3, scratch\\
+{\bf r4-r9}  & {\bf v1-v6} &  permanent\\
+{\bf r10}    & {\bf sl}    &  permanent\\
+{\bf r11}    & {\bf fp}    &  frame pointer, permanent\\
+{\bf r12}    & {\bf ip}    &  scratch, intra-procedure scratch register (IP) used by dynamic linker\\
+{\bf r13}    & {\bf sp}    &  stack pointer, permanent\\
+{\bf r14}    & {\bf lr}    &  link register, permanent\\
+{\bf r15}    & {\bf pc}    &  program counter (note: due to pipeline, r15 points to 2 instructions ahead)\\
+{\bf cpsr}   &             &  program status register\\
+{\bf s0}     &             &  floating point argument, floating point return value, single precision\\
+{\bf d0}     &             &  floating point argument, floating point return value, double precision, aliases s0-s1\\
+{\bf s1-s15} &             &  floating point arguments, single precision\\
+{\bf d1-d7}  &             &  aliases s2-s15, floating point arguments, double precision\\
+{\bf fpscr}  &             &  VFP status register\\
 \end{tabular*}
 \caption{Register usage on armhf}
 \end{table}
@@ -330,7 +377,7 @@
 \item float and double vararg function parameters (no matter if in ellipsis part of function, or not) are passed like int or long long parameters, vfp registers aren't used
 \item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first four words (for first 4 integer arguments) to a reserved stack area adjacent to the other parameters on the stack
 \item parameters \textless=\ 32 bits are passed as 32 bit words
-\item structures and unions are passed by value, with the first four words of the parameters in r0-r3 @@@?check doc
+\item structures and unions are passed by value, with the first four words of the parameters in r0-r3
 \item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc. (see {\bf return values})
 \item callee spills, caller reserves spill area space, though
 \end{itemize}
@@ -346,29 +393,31 @@
 
 \paragraph{Stack layout}
 
+% verified/amended: TP nov 2019 (see also doc/disas_examples/arm.armhf.disas)
 Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                         & \vdots       &                                      &                              \\
+                                         & \vdots                     &                                      &                              \\
 \hhline{~=~~}
-register save area                       & \hspace{4cm} &                                      & \mrrbrace{6}{caller's frame} \\
+register save area                       & \hspace{4cm}               &                                      & \mrrbrace{5}{caller's frame} \\
 \hhline{~-~~}
-local data                               &              &                                      &                              \\
-\hhline{~-~~}
-\mrlbrace{4}{parameter area}             & r0-r3        & \mrrbrace{1}{spill area (if needed)} &                              \\
+local data                               &                            &                                      &                              \\
 \hhline{~-~~}
-                                         & \ldots       & \mrrbrace{3}{stack parameters}       &                              \\
-                                         & \ldots       &                                      &                              \\
-                                         & \ldots       &                                      &                              \\
+\mrlbrace{7}{parameter area}             & last arg                   & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots                     &                                      &                              \\
+                                         & first arg passed via stack &                                      &                              \\
 \hhline{~=~~}
-register save area (with return address) &              &                                      & \mrrbrace{3}{current frame}  \\
+                                         & r3                         & \mrrbrace{4}{spill area (if needed)} & \mrrbrace{7}{current frame}  \\
+                                         & r2                         &                                      &                              \\
+                                         & r1                         &                                      &                              \\
+                                         & r0                         &                                      &                              \\
 \hhline{~-~~}
-local data                               &              &                                      &                              \\
+register save area (with return address) &                            &                                      &                              \\ %fp points here to 1st word of this area: $\leftarrow$ fp
 \hhline{~-~~}
-parameter area                           & \vdots       &                                      &                              \\
+local data                               &                            &                                      &                              \\
 \hhline{~-~~}
+parameter area                           & \vdots                     &                                      &                              \\
 \end{tabular}
 \caption{Stack layout on arm32 armhf}
 \end{figure}
@@ -394,15 +443,16 @@
 \begin{tabular*}{0.95\textwidth}{lll}
 Arch   & Platforms & Details \\
 \hline
-ARMv4  & & \\
-ARMv4T & ARM 7, ARM 9, Neo FreeRunner (OpenMoko) & \\
-ARMv5  & ARM 9E & BLX instruction available \\
-ARMv6  & & No vector registers available in thumb \\
-ARMv7  & iPod touch, iPhone 3GS/4, Raspberry Pi 2 & VFP throughout available, armhf calling convention on some platforms \\
-ARMv8  & iPhone 6 and higher & 64bit support \\
+ARMv4  &                                          & \\
+ARMv4T & ARM 7, ARM 9, Neo FreeRunner (OpenMoko)  & \\
+ARMv5  & ARM 9E                                   & BLX instruction available \\
+ARMv6  &                                          & No vector registers available in thumb \\
+ARMv7  & iPod touch, iPhone 3GS/4, Raspberry Pi 2 & VFP, armhf convention on some platforms \\
+ARMv8  & iPhone 6 and higher                      & 64bit support \\
 \end{tabular*}
 \caption{Overview of ARM Architecture, Platforms and Details}
 \end{table}
 
+
 \newpage
 
--- a/doc/manual/callconvs/callconv_arm64.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_arm64.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,5 +1,6 @@
+%//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2014,2015 Daniel Adler <dadler@uni-goettingen.de>, 
+% Copyright (c) 2014-2019 Daniel Adler <dadler@uni-goettingen.de>, 
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -14,11 +15,12 @@
 % ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 % OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 %
+%//////////////////////////////////////////////////////////////////////////////
 
 % ==================================================
 % ARM64
 % ==================================================
-\subsection{ARM64 Calling Convention}
+\subsection{ARM64 Calling Conventions}
 
 \paragraph{Overview}
 
@@ -28,17 +30,21 @@
 
 \paragraph{\product{dyncall} support}
 
-The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, for calls and callbacks.
+The \product{dyncall} library supports the ARM 64-bit AArch64 PCS ABI, as well as Apple's convention derived from it, for calls and callbacks.
 
 \subsubsection{AAPCS64 Calling Convention}
 
 \paragraph{Registers and register usage}
 
-ARM64 features thirty-one 64 bit general purpose registers, namely x0-x30.
-Also, there is SP, a register with restricted use, used for the stack pointer,
-and PC dedicated as program counter. Additionally, there are thirty-two 128 bit
-registers v0-v31, to be used as SIMD and floating point registers, referred to
-as q0-q31, d0-d31 and s0-s31, respectively, depending on their use:\\
+ARM64 features thirty-one 64 bit general purpose registers, namely {\bf r0-r30},
+which are referred to as either {\bf x0-x30} for 64bit access, or {\bf w0-w30}
+for 32bit access (with upper bits either cleared or sign extended on load).\\
+Also, there is {\bf sp/xzr/wzr}, a register with restricted use, used as the
+stack pointer in instructions dealing with the stack ({\bf sp}) or as a hardware
+zero register in all other instructions ({\bf xzr/wzr}), and {\bf pc}, the
+program counter. Additionally, there are thirty-two 128 bit registers {\bf v0-v31},
+to be used as SIMD and floating point registers, referred to as {\bf q0-q31}, {\bf d0-d31}
+and {\bf s0-s31}, respectively, depending on their use:\\
 
 \begin{table}[h]
 \begin{tabular*}{0.95\textwidth}{3 B}
@@ -53,8 +59,8 @@
 {\bf x19-x28} & permanent\\
 {\bf x29}     & permanent, frame pointer\\
 {\bf x30}     & permanent, link register\\
-{\bf SP}      & permanent, stack pointer\\
-{\bf PC}      & program counter\\
+{\bf sp}      & permanent, stack pointer\\
+{\bf pc}      & program counter\\
 \end{tabular*}
 \caption{Register usage on arm64}
 \end{table}
@@ -67,7 +73,7 @@
 \item first 8 integer arguments are passed using x0-x7
 \item first 8 floating point arguments are passed using d0-d7
 \item subsequent parameters are pushed onto the stack
-\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed parameters require saving, though)
+\item if the callee takes the address of one of the parameters and uses it to address other parameters (e.g. varargs) it has to copy - in its prolog - the first 8 integer and 8 floating-point registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though)
 \item structures and unions are passed by value, with the first four words of the parameters in r0-r3
 \item if return value is a structure, a pointer pointing to the return value's space is passed in r0, the first parameter in r1, etc... (see {\bf return values})
 \item stack is required to be throughout eight-byte aligned
@@ -82,41 +88,33 @@
 
 \paragraph{Stack layout}
 
+% verified/amended: TP nov 2019 (see also doc/disas_examples/arm64.aapcs.disas)
 Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                   & \vdots       &                                       &                              \\
+                                         & \vdots                 &                                      &                              \\
 \hhline{~=~~}                                                                            
-register save area                 & \hspace{4cm} &                                       & \mrrbrace{5}{caller's frame} \\
+register save area                       & \hspace{4cm}           &                                      & \mrrbrace{5}{caller's frame} \\
 \hhline{~-~~}                                                                            
-local data                         &              &                                       &                              \\
+local data                               &                        &                                      &                              \\
 \hhline{~-~~}                                                                             
-\mrlbrace{13}{parameter area}      & \ldots       & \mrrbrace{3}{stack parameters}        &                              \\
-                                   & \ldots       &                                       &                              \\
-                                   & \ldots       &                                       &                              \\
+\mrlbrace{9}{parameter area}             & arg n-1                & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots                 &                                      &                              \\
+                                         & arg 8                  &                                      &                              \\
 \hhline{~=~~}                                     
-                                   & x0           & \mrrbrace{10}{spill area (if needed)} & \mrrbrace{15}{current frame} \\
-                                   & x1           &                                       &                              \\
-                                   & \ldots       &                                       &                              \\
-                                   & x2           &                                       &                              \\
-                                   & x7           &                                       &                              \\
-                                   & d0           &                                       &                              \\
-                                   & d1           &                                       &                              \\
-                                   & \ldots       &                                       &                              \\
-                                   & d2           &                                       &                              \\
-                                   & d7           &                                       &                              \\
+                                         & x7                     & \mrrbrace{6}{spill area (if needed)} & \mrrbrace{9}{current frame}  \\
+                                         & \ldots                 &                                      &                              \\
+                                         & x? (first unnamed reg) &                                      &                              \\
+                                         & q7                     &                                      &                              \\
+                                         & \ldots                 &                                      &                              \\
+                                         & q0                     &                                      &                              \\
 \hhline{~-~~}                                                                             
-register save area                 &              &                                       &                              \\
-\hhline{~-~~}                                                                             
-local data                         &              &                                       &                              \\
+register save area (with return address) &                        &                                      &                              \\ % fp will point here (to 1st arg) @@@ verify
 \hhline{~-~~}                                                                             
-link and frame register            & x30          &                                       &                              \\
-                                   & x29          &                                       &                              \\
+local data                               &                        &                                      &                              \\
 \hhline{~-~~}                                                                             
-parameter area                     & \vdots       &                                       &                              \\
-\hhline{~-~~}
+parameter area                           & \vdots                 &                                      &                              \\
 \end{tabular}
 \caption{Stack layout on arm64}
 \end{figure}
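
Note on the arm64 hunks above: the parameter passing rules (first 8 integer arguments in x0-x7, first 8 floating point arguments in d0-d7, everything else on the stack) can be sketched in a few lines of C. This is only an illustration under stated assumptions: ArgState and place_arg are hypothetical names and not part of dyncall, arguments are reduced to two kinds ('i' for integer/pointer, 'f' for float/double), and every stack argument is assumed to occupy one 8-byte slot, which ignores structs, 128 bit types and Apple's tighter stack packing.

```c
#include <stdio.h>

/* Hypothetical illustration only; none of these names are dyncall API. */
typedef struct {
    int next_int;   /* index of the next free x register (0..8) */
    int next_flt;   /* index of the next free d register (0..8) */
    int stack_off;  /* byte offset of the next free stack slot  */
} ArgState;

/* Writes the location of one argument into buf and advances the state.
 * kind is 'i' for an integer/pointer or 'f' for a float/double. */
void place_arg(ArgState *s, char kind, char *buf, size_t len)
{
    if (kind == 'i' && s->next_int < 8) {
        /* first 8 integer arguments go to x0-x7 */
        snprintf(buf, len, "x%d", s->next_int++);
    } else if (kind == 'f' && s->next_flt < 8) {
        /* first 8 floating point arguments go to d0-d7 */
        snprintf(buf, len, "d%d", s->next_flt++);
    } else {
        /* assumption: each remaining argument takes one 8-byte stack slot */
        snprintf(buf, len, "sp+%d", s->stack_off);
        s->stack_off += 8;
    }
}
```

Starting from a zeroed ArgState and calling place_arg once per argument, left to right, yields the register or stack slot for each argument. The two register files are counted independently, so a float argument following nine integers would still land in d0 in this sketch.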
--- a/doc/manual/callconvs/callconv_mips32.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_mips32.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,200 +1,208 @@
-%//////////////////////////////////////////////////////////////////////////////
-%
-% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
-%                         Tassilo Philipp <tphilipp@potion-studios.com>
-%
-% Permission to use, copy, modify, and distribute this software for any
-% purpose with or without fee is hereby granted, provided that the above
-% copyright notice and this permission notice appear in all copies.
-%
-% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-%
-%//////////////////////////////////////////////////////////////////////////////
-
-\subsection{MIPS32 Calling Convention}
-
-\paragraph{Overview}
-
-Multiple revisions of the MIPS Instruction set exist, namely MIPS I, MIPS II, MIPS III, MIPS IV, MIPS32 and MIPS64.
-Nowadays, MIPS32 and MIPS64 are the main ones used for 32-bit and 64-bit instruction sets, respectively.\\
-Given MIPS processors are often used for embedded devices, several add-on extensions exist for the MIPS family, for example: 
-
-\begin{description}
-\item [MIPS-3D] simple floating-point SIMD instructions dedicated to common 3D tasks.
-\item [MDMX] (MaDMaX) more extensive integer SIMD instruction set using 64 bit floating-point registers.
-\item [MIPS16e] adds compression to the instruction stream to make programs take up less room (allegedly a response to the THUMB instruction set of the ARM architecture).
-\item [MIPS MT] multithreading additions to the system similar to HyperThreading.
-\end{description}
-
-Unfortunately, there is actually no such thing as "The MIPS Calling Convention". Many possible conventions are used
-by many different environments such as \emph{O32}\cite{MIPSo32}, \emph{O64}\cite{MIPSo64}, \emph{N32}\cite{MIPSn32/n64}, \emph{N64}\cite{MIPSn32/n64}, \emph{EABI}\cite{MIPSeabi} and \emph{NUBI}\cite{MIPSnubi}.\\
-
-\paragraph{\product{dyncall} support}
-
-Currently, dyncall supports for MIPS 32-bit architectures the widely-used O32 calling convention (for all four combinations of big/little-endian, and soft/hard-float targets),
-as well as EABI (little-endian/hard-float, which is used on the Homebrew SDK for the Playstation Portable). \product{dyncall} currently does not support MIPS16e
-(contrary to the like-minded ARM-THUMB, which is supported). Both, calls and callbacks are supported.
-
-\subsubsection{MIPS EABI 32-bit Calling Convention}
-
-\paragraph{Register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{lll}
-Name                                   & Alias                & Brief description\\
-\hline
-{\bf \$0}                              & {\bf \$zero}         & hardware zero \\
-{\bf \$1}                              & {\bf \$at}           & assembler temporary \\
-{\bf \$2-\$3}                          & {\bf \$v0-\$v1}      & integer results \\
-{\bf \$4-\$11}                         & {\bf \$a0-\$a7}      & integer arguments, or double precision float arguments\\
-{\bf \$12-\$15,\$24}                   & {\bf \$t4-\$t7,\$t8} & integer temporaries \\
-{\bf \$25}                             & {\bf \$t9}           & integer temporary, holds address of called function for PIC calls (by convention) \\
-{\bf \$16-\$23}                        & {\bf \$s0-\$s7}      & preserved \\
-{\bf \$26,\$27}                        & {\bf \$kt0,\$kt1}    & reserved for kernel \\
-{\bf \$28}                             & {\bf \$gp}           & global pointer, preserve \\
-{\bf \$29}                             & {\bf \$sp}           & stack pointer, preserve \\
-{\bf \$30}                             & {\bf \$s8}           & frame pointer, preserve \\
-{\bf \$31}                             & {\bf \$ra}           & return address, preserve \\
-{\bf hi, lo}                           &                      & multiply/divide special registers \\
-{\bf \$f0,\$f2}                        &                      & float results \\
-{\bf \$f1,\$f3,\$f4-\$f11,\$f20-\$f23} &                      & float temporaries \\
-{\bf \$f12-\$f19}                      &                      & single precision float arguments \\
-\end{tabular*}
-\caption{Register usage on MIPS32 EABI calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item Stack grows down
-\item Stack parameter order: right-to-left
-\item Caller cleans up the stack
-\item first 8 integers (\textless=\ 32bit) are passed in registers \$a0-\$a7
-\item first 8 single precision floating point arguments are passed in registers \$f12-\$f19
-\item if either integer or float registers are used up, the stack is used
-\item 64-bit stack arguments are always aligned to 8 bytes
-\item 64-bit integers or double precision floats are passed on two general purpose registers starting at an even register number, skipping one odd register
-\item \$a0-\$a7 and \$f12-\$f19 are not required to be preserved
-\item results are returned in \$v0 (32-bit), \$v0 and \$v1 (64-bit), \$f0 or \$f0 and \$f2 (2 $\times$ 32 bit float e.g. complex)
-\end{itemize}
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                         & \vdots       &                                &                              \\
-\hhline{~=~~}                            
-register save area                       & \hspace{4cm} &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}                            
-local data                               &              &                                &                              \\
-\hhline{~-~~}                            
-\mrlbrace{3}{parameter area}             & \ldots       & \mrrbrace{3}{stack parameters} &                              \\
-                                         & \ldots       &                                &                              \\
-                                         & \ldots       &                                &                              \\
-\hhline{~=~~}
-register save area (with return address) &              &                                & \mrrbrace{5}{current frame}  \\
-\hhline{~-~~}
-local data                               &              &                                &                              \\
-\hhline{~-~~}
-parameter area                           &              &                                &                              \\
-\hhline{~-~~}
-                                         & \vdots       &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on mips32 eabi calling convention}
-\end{figure}
-
-\newpage
-
-\subsubsection{MIPS O32 32-bit Calling Convention}
-
-\paragraph{Register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{lll}
-Name                         & Alias                & Brief description\\
-\hline                                                             
-{\bf \$0}                    & {\bf \$zero}         & hardware zero \\
-{\bf \$1}                    & {\bf \$at}           & assembler temporary \\
-{\bf \$2-\$3}                & {\bf \$v0-\$v1}      & return value (only integer on hard-float targets), scratch \\
-{\bf \$4-\$7}                & {\bf \$a0-\$a3}      & first arguments (only integer on hard-float targets), scratch\\
-{\bf \$8-\$15,\$24}          & {\bf \$t0-\$t7,\$t8} & temporaries, scratch \\
-{\bf \$25}                   & {\bf \$t9}           & temporary, holds address of called function for PIC calls (by convention) \\
-{\bf \$16-\$23}              & {\bf \$s0-\$s7}      & preserved \\
-{\bf \$26,\$27}              & {\bf \$k0,\$k1}      & reserved for kernel \\
-{\bf \$28}                   & {\bf \$gp}           & global pointer, preserved by caller \\
-{\bf \$29}                   & {\bf \$sp}           & stack pointer, preserve \\
-{\bf \$30}                   & {\bf \$fp}           & frame pointer, preserve \\
-{\bf \$31}                   & {\bf \$ra}           & return address, preserve \\
-{\bf hi, lo}                 &                      & multiply/divide special registers \\
-{\bf \$f0-\$f3}              &                      & only on hard-float targets: float return value, scratch \\
-{\bf \$f4-\$f11,\$f16-\$f19} &                      & only on hard-float targets: float temporaries, scratch \\
-{\bf \$f12-\$f15}            &                      & only on hard-float targets: first floating point arguments, scratch \\
-{\bf \$f20-\$f31}            &                      & only on hard-float targets: preserved \\
-\end{tabular*}
-\caption{Register usage on MIPS O32 calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item Stack grows down
-\item Stack parameter order: right-to-left
-\item Caller cleans up the stack
-\item Caller is required to always leave a 16-byte spill area for \$a0-\$a3 at the end of {\bf its} frame, to be used and spilled to by the callee, if needed
-\item The different stack areas (local data, register save area, parameter area) are each aligned to 8 bytes
-\item generally, first four 32bit arguments are passed in registers \$a0-\$a3, respectively (only on hard-float targets: see below for exceptions if first arg is a float)
-\item subsequent parameters are passed vie the stack
-\item 64-bit params passed via registers are passed using either two registers (starting at an even register number, skipping an odd one if necessary), or via the stack using an 8-byte alignment
-\item only on hard-float targets: if the very first call argument is a float, up to 2 floats or doubles can be passed via \$f12 and \$f14, respectively, for first and second argument
-\item only on hard-float targets: if any arguments are passed via float registers, skip \$a0-\$a3 for subsequent arguments as if the values were passed via them
-\item only on hard-float targets: note that if the first argument is not a float, but the second, it'll get passed via the \$a? registers
-\item results are returned in \$v0 and \$v1, with \$v0 for all values \textless\ 64bit (only integer on hard-float targets)
-\item only on hard-float targets: floating point results are returned in \$f0 (32-bit float), or \$f0 and \$f3 (64bit float)
-\item single precision float parameters (32 bit) are right-justified in their 8-byte slot on the stack on big endian targets, as they aren't promoted @@@
-\end{itemize}
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                         & \vdots              &                                &                               \\
-\hhline{~=~~}
-register save area (and padding)         & \hspace{4cm}        &                                &                               \\
-\hhline{~-~~}
-local data (and padding)                 &                     &                                & \mrrbrace{10}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{8}{parameter area}             & padding (if needed) &                                &                               \\
-                                         & \ldots              & \mrrbrace{3}{stack parameters} &                               \\
-                                         & \ldots              &                                &                               \\
-                                         & \ldots              &                                &                               \\
-                                         & \$a3                & \mrrbrace{4}{spill area}       &                               \\
-                                         & \$a2                &                                &                               \\
-                                         & \$a1                &                                &                               \\
-                                         & \$a0                &                                &                               \\
-\hhline{~-~~}
-register save area (with return address) &                     &                                &                               \\
-\hhline{~=~~}
-local data                               &                     &                                & \mrrbrace{5}{current frame}   \\
-\hhline{~-~~}
-parameter area                           &                     &                                &                               \\
-                                         & \vdots              &                                &                               \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on MIPS O32 calling convention}
-\end{figure}
-
-\newpage
-
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>, 
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+\subsection{MIPS32 Calling Conventions}
+
+\paragraph{Overview}
+
+Multiple revisions of the MIPS Instruction set exist, namely MIPS I, MIPS II, MIPS III, MIPS IV, MIPS32 and MIPS64.
+Nowadays, MIPS32 and MIPS64 are the main ones used for 32-bit and 64-bit instruction sets, respectively.\\
+Given MIPS processors are often used for embedded devices, several add-on extensions exist for the MIPS family, for example: 
+
+\begin{description}
+\item [MIPS-3D] simple floating-point SIMD instructions dedicated to common 3D tasks.
+\item [MDMX] (MaDMaX) more extensive integer SIMD instruction set using 64 bit floating-point registers.
+\item [MIPS16e] adds compression to the instruction stream to make programs take up less room (allegedly a response to the THUMB instruction set of the ARM architecture).
+\item [MIPS MT] multithreading additions to the system similar to HyperThreading.
+\end{description}
+
+Unfortunately, there is actually no such thing as "The MIPS Calling Convention". Many possible conventions are used
+by many different environments such as \emph{O32}\cite{MIPSo32}, \emph{O64}\cite{MIPSo64}, \emph{N32}\cite{MIPSn32/n64}, \emph{N64}\cite{MIPSn32/n64}, \emph{EABI}\cite{MIPSeabi} and \emph{NUBI}\cite{MIPSnubi}.\\
+
+\paragraph{\product{dyncall} support}
+
+For MIPS 32-bit architectures, \product{dyncall} currently supports the widely-used O32 calling convention (for all four combinations of big/little-endian, and soft/hard-float targets),
+as well as EABI (little-endian/hard-float, which is used on the Homebrew SDK for the Playstation Portable). \product{dyncall} currently does not support MIPS16e
+(contrary to the like-minded ARM-THUMB, which is supported). Both calls and callbacks are supported.
+
+
+\newpage
+
+
+\subsubsection{MIPS EABI 32-bit Calling Convention}
+
+% This is about hardware floating point targets, there are also softfloat ones @@@
+
+\paragraph{Register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{lll}
+Name                                   & Alias                & Brief description\\
+\hline
+{\bf \$0}                              & {\bf \$zero}         & hardware zero, scratch \\
+{\bf \$1}                              & {\bf \$at}           & assembler temporary, scratch \\
+{\bf \$2-\$3}                          & {\bf \$v0-\$v1}      & integer results, scratch \\
+{\bf \$4-\$11}                         & {\bf \$a0-\$a7}      & integer arguments, or double precision float arguments, scratch \\
+{\bf \$12-\$15,\$24}                   & {\bf \$t4-\$t7,\$t8} & integer temporaries, scratch \\
+{\bf \$25}                             & {\bf \$t9}           & integer temporary, address of callee for PIC calls (by convention), scratch \\
+{\bf \$16-\$23}                        & {\bf \$s0-\$s7}      & preserve \\
+{\bf \$26,\$27}                        & {\bf \$kt0,\$kt1}    & reserved for kernel \\
+{\bf \$28}                             & {\bf \$gp}           & global pointer, preserve \\
+{\bf \$29}                             & {\bf \$sp}           & stack pointer, preserve \\
+{\bf \$30}                             & {\bf \$s8/\$fp}      & frame pointer (some assemblers name it \$fp), preserve \\
+{\bf \$31}                             & {\bf \$ra}           & return address, preserve \\
+{\bf hi, lo}                           &                      & multiply/divide special registers \\
+{\bf \$f0,\$f2}                        &                      & float results, scratch \\
+{\bf \$f1,\$f3,\$f4-\$f11,\$f20-\$f23} &                      & float temporaries, scratch \\
+{\bf \$f12-\$f19}                      &                      & single precision float arguments, scratch \\
+\end{tabular*}
+\caption{Register usage on MIPS32 EABI calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item Stack grows down
+\item Stack parameter order: right-to-left
+\item Caller cleans up the stack
+\item first 8 integers (\textless=\ 32bit) are passed in registers \$a0-\$a7
+\item first 8 single precision floating point arguments are passed in registers \$f12-\$f19
+\item 64-bit stack arguments are always aligned to 8 bytes
+\item 64-bit integers or double precision floats are passed in two general purpose registers starting at an even register number, skipping one odd register
+\item if either integer or float registers are used up, the stack is used
+\item if the callee takes the address of one of the parameters and uses it to address other unnamed parameters (e.g. varargs) it has to copy - in its prolog - the argument registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though) % @@@ seems to *ONLY* spill with varargs, never for any other reason
+\item float registers don't seem to ever need to be saved that way, because floats passed to an ellipsis function are promoted to doubles, which in turn are passed in \$a? register pairs, so only \$a0-\$a7 need to be spilled
+\item results are returned in \$v0 (32-bit), \$v0 and \$v1 (64-bit), \$f0 or \$f0 and \$f2 (2 $\times$ 32 bit float e.g. complex)
+\end{itemize}
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/mips.eabi.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                         & \vdots                     &                                      &                              \\
+\hhline{~=~~}                                                         
+register save area                       & \hspace{4cm}               &                                      & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}                                                         
+local data                               &                            &                                      &                              \\
+\hhline{~-~~}                                                         
+\mrlbrace{6}{parameter area}             & last arg                   & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots                     &                                      &                              \\
+                                         & first arg passed via stack &                                      &                              \\
+\hhline{~=~~}
+                                         & \$a7                       & \mrrbrace{3}{spill area (if needed)} & \mrrbrace{6}{current frame}  \\
+                                         & \ldots                     &                                      &                              \\
+                                         & \$a? (first unnamed reg)   &                                      &                              \\
+\hhline{~-~~}                                                                               
+register save area (with return address) &                            &                                      &                              \\
+\hhline{~-~~}                                                         
+local data                               &                            &                                      &                              \\
+\hhline{~-~~}                                                         
+parameter area                           & \vdots                     &                                      &                              \\
+\end{tabular}
+\caption{Stack layout on MIPS EABI 32-bit calling convention}
+\end{figure}
+
+
+\newpage
+
+
+\subsubsection{MIPS O32 32-bit Calling Convention}
+
+\paragraph{Register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{lll}
+Name                         & Alias                & Brief description\\
+\hline                                                             
+{\bf \$0}                    & {\bf \$zero}         & hardware zero \\
+{\bf \$1}                    & {\bf \$at}           & assembler temporary \\
+{\bf \$2-\$3}                & {\bf \$v0-\$v1}      & return value (only integer on hard-float targets), scratch \\
+{\bf \$4-\$7}                & {\bf \$a0-\$a3}      & first arguments (only integer on hard-float targets), scratch\\
+{\bf \$8-\$15,\$24}          & {\bf \$t0-\$t7,\$t8} & temporaries, scratch \\
+{\bf \$25}                   & {\bf \$t9}           & temporary, holds address of called function for PIC calls (by convention) \\
+{\bf \$16-\$23}              & {\bf \$s0-\$s7}      & preserved \\
+{\bf \$26,\$27}              & {\bf \$k0,\$k1}      & reserved for kernel \\
+{\bf \$28}                   & {\bf \$gp}           & global pointer, preserved by caller \\
+{\bf \$29}                   & {\bf \$sp}           & stack pointer, preserve \\
+{\bf \$30}                   & {\bf \$s8/\$fp}      & frame pointer (some assemblers name it \$fp), preserve \\
+{\bf \$31}                   & {\bf \$ra}           & return address, preserve \\
+{\bf hi, lo}                 &                      & multiply/divide special registers \\
+{\bf \$f0-\$f3}              &                      & only on hard-float targets: float return value, scratch \\
+{\bf \$f4-\$f11,\$f16-\$f19} &                      & only on hard-float targets: float temporaries, scratch \\
+{\bf \$f12-\$f15}            &                      & only on hard-float targets: first floating point arguments, scratch \\
+{\bf \$f20-\$f31}            &                      & only on hard-float targets: preserved \\
+\end{tabular*}
+\caption{Register usage on MIPS O32 calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item Stack grows down
+\item Stack parameter order: right-to-left
+\item Caller cleans up the stack
+\item Caller is required to always leave a 16-byte spill area for \$a0-\$a3 at the end of {\bf its} frame, to be used and spilled to by the callee, if needed
+\item The different stack areas (local data, register save area, parameter area) are each aligned to 8 bytes
+\item generally, first four 32bit arguments are passed in registers \$a0-\$a3, respectively (only on hard-float targets: see below for exceptions if first arg is a float)
+\item subsequent parameters are passed via the stack
+\item 64-bit params are passed either using two registers (starting at an even register number, skipping an odd one if necessary), or via the stack using an 8-byte alignment
+\item only on hard-float targets: if the very first call argument is a float, up to 2 floats or doubles can be passed via \$f12 and \$f14, respectively, for first and second argument
+\item only on hard-float targets: if any arguments are passed via float registers, skip \$a0-\$a3 for subsequent arguments as if the values were passed via them
+\item only on hard-float targets: note that if the first argument is not a float but the second one is, the second one gets passed via the \$a? registers
+\item results are returned in \$v0 and \$v1, with \$v0 for all values \textless\ 64bit (only integer on hard-float targets)
+\item only on hard-float targets: floating point results are returned in \$f0 (32-bit float), or \$f0 and \$f3 (64bit float)
+\item single precision float parameters (32 bit) are right-justified in their 8-byte slot on the stack on big endian targets, as they aren't promoted @@@
+\end{itemize}
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/mips.o32.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                         & \vdots                     &                                &                               \\
+\hhline{~=~~}                                                         
+register save area (with return address) & \hspace{4cm}               &                                & \mrrbrace{10}{caller's frame} \\
+\hhline{~-~~}                                                         
+local data (and padding)                 &                            &                                &                               \\
+\hhline{~-~~}                                                         
+\mrlbrace{8}{parameter area}             & padding (if needed)        &                                &                               \\
+                                         & last arg                   & \mrrbrace{3}{stack parameters} &                               \\
+                                         & \ldots                     &                                &                               \\
+                                         & first arg passed via stack &                                &                               \\
+                                         & \$a3                       & \mrrbrace{4}{spill area}       &                               \\
+                                         & \$a2                       &                                &                               \\
+                                         & \$a1                       &                                &                               \\
+                                         & \$a0                       &                                &                               \\
+\hhline{~=~~}                                                         
+register save area                       &                            &                                & \mrrbrace{3}{current frame}   \\
+\hhline{~-~~}                                                         
+local data                               &                            &                                &                               \\
+\hhline{~-~~}                                                         
+parameter area                           & \vdots                     &                                &                               \\
+\end{tabular}
+\caption{Stack layout on MIPS O32 calling convention}
+\end{figure}
+
+\newpage
+
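Reviewer's sketch (not part of the changeset): the O32 parameter passing rules listed in the hunk above can be modelled in a few lines of C. This is a simplified illustration for hard-float targets only, assuming non-variadic calls and no aggregates; the names `ArgKind` and `o32_assign` are hypothetical and are not dyncall API. Stack locations are given as offsets from the stack pointer at the time of the call, assuming the 16-byte spill area for \$a0-\$a3 sits at the low end of the caller's frame as shown in the stack layout figure.

```c
#include <stdio.h>

/* Hypothetical argument classes for this sketch (not dyncall types). */
typedef enum { ARG_I32, ARG_I64, ARG_F32, ARG_F64 } ArgKind;

/* Writes a short location string ("$a1", "$f12", "sp+16", ...) for each
 * argument into out[i], following the O32 hard-float rules listed above. */
static void o32_assign(const ArgKind *args, int n, char out[][16])
{
    int word = 0;        /* next free 4-byte slot in the argument area    */
    int fregs = 0;       /* float argument registers used ($f12, $f14)    */
    int only_floats = 1; /* true while every argument so far was a float  */

    for (int i = 0; i < n; ++i) {
        int is_fp = (args[i] == ARG_F32 || args[i] == ARG_F64);
        int size  = (args[i] == ARG_I64 || args[i] == ARG_F64) ? 2 : 1;

        /* 64-bit values start at an even slot (even register number,
         * or 8-byte aligned stack offset). */
        if (size == 2 && (word & 1))
            ++word;

        if (is_fp && only_floats && fregs < 2 && word < 4) {
            /* Leading floats/doubles go to $f12, then $f14; the matching
             * $a? slots are skipped by advancing 'word' below. */
            snprintf(out[i], 16, "$f%d", 12 + 2 * fregs);
            ++fregs;
        } else if (word < 4) {
            /* First four words map onto $a0-$a3 (a 64-bit value also
             * occupies the following register). */
            snprintf(out[i], 16, "$a%d", word);
        } else {
            /* Everything else is passed via the stack. */
            snprintf(out[i], 16, "sp+%d", word * 4);
        }

        if (!is_fp)
            only_floats = 0;
        word += size;
    }
}
```

For example, calling `o32_assign` for the argument list (float, int) places the float in \$f12 and the int in \$a1, while (int, float) places both in \$a0 and \$a1, matching the exceptions described in the list above.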
--- a/doc/manual/callconvs/callconv_mips64.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_mips64.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,121 +1,120 @@
-%//////////////////////////////////////////////////////////////////////////////
-%
-% Copyright (c) 2007-2016 Daniel Adler <dadler@uni-goettingen.de>,
-%                         Tassilo Philipp <tphilipp@potion-studios.com>
-%
-% Permission to use, copy, modify, and distribute this software for any
-% purpose with or without fee is hereby granted, provided that the above
-% copyright notice and this permission notice appear in all copies.
-%
-% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-%
-%//////////////////////////////////////////////////////////////////////////////
-
-\subsection{MIPS64 Calling Convention}
-
-\paragraph{Overview}
-
-There are two main ABIs in use for MIPS64 chips, \emph{N64}\cite{MIPSn32/n64} and \emph{N32}\cite{MIPSn32/n64}. Both are
-basically the same, except that N32 uses 32-bit pointers and long integers, instead of 64. All registers of a MIPS64 chip are considered
-to be 64-bit wide, even for the N32 calling convention.\\
-The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology didn't change from MIPS32).\\
-Other than that there are 64-bit versions of the other ABIs found for MIPS32, e.g. the EABI\cite{MIPSeabi} and O64\cite{MIPSo64}.
-
-\paragraph{\product{dyncall} support}
-
-For MIPS 64-bit machines, dyncall supports the N64 calling conventions for calls and callbacks (for all four combinations of big/little-endian, and soft/hard-float targets).
-The N32 calling convention might work - it used to, but hasn't been tested, recently.
-
-\subsubsection{MIPS N64 Calling Convention}
-
-\paragraph{Register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{lll}
-Name                                   & Alias                & Brief description\\
-\hline
-{\bf \$0}                              & {\bf \$zero}         & hardware zero \\
-{\bf \$1}                              & {\bf \$at}           & assembler temporary \\
-{\bf \$2-\$3}                          & {\bf \$v0-\$v1}      & return value (only integer on hard-float targets) \\
-{\bf \$4-\$11}                         & {\bf \$a0-\$a7}      & first arguments (only integer on hard-float targets) \\
-{\bf \$12-\$15,\$24}                   & {\bf \$t4-\$t7,\$t8} & temporaries, scratch \\
-{\bf \$25}                             & {\bf \$t9}           & temporary, holds the address of the called function for all PIC calls (by convention) \\
-{\bf \$16-\$23}                        & {\bf \$s0-\$s7}      & preserved \\
-{\bf \$26,\$27}                        & {\bf \$kt0,\$kt1}    & reserved for kernel \\
-{\bf \$28}                             & {\bf \$gp}           & global pointer, preserve \\
-{\bf \$29}                             & {\bf \$sp}           & stack pointer, preserve \\
-{\bf \$30}                             & {\bf \$s8}           & frame pointer, preserve \\
-{\bf \$31}                             & {\bf \$ra}           & return address, preserve \\
-{\bf hi, lo}                           &                      & multiply/divide special registers \\
-{\bf \$f0,\$f2}                        &                      & only on hard-float targets: float results \\
-{\bf \$f1,\$f3,\$f4-\$f11,\$f20-\$f23} &                      & only on hard-float targets: float temporaries \\
-{\bf \$f12-\$f19}                      &                      & only on hard-float targets: float arguments \\
-{\bf \$f24-\$f31}                      &                      & only on hard-float targets: preserved \\%@@@on N32, this changes
-\end{tabular*}
-\caption{Register usage on MIPS N64 calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item Stack grows down
-\item Stack parameter order: right-to-left
-\item Caller cleans up the stack
-\item generally, first 8 params \textgreater=\ 64-bit are passed via registers
-\item for hard-float targets: register arguments are passed via \$a0-\$a7 for integers and \$f12-\$f19 for floats - with mixed float and int parameters, some registers are left out (e.g. first parameter ends up in \$a0 or \$f12, second in \$a1 or \$f13, etc.)
-\item for soft-float targets: register arguments are passed via \$a0-\$a7
-\item subsequent arguments are pushed onto the stack
-\item all stack entries are 64-bit aligned
-\item all stack regions are 16-byte aligned
-\item results are returned in \$v0, and for a second one \$v1 is used
-\item only on hard-float targets: floating point results are returned in \$f0
-\item float arguments passed in the variable part of a vararg call are passed like integers
-\item quad precision float arguments are passed in even-odd register pairs, skipping one register if needed
-\item integer parameters \textless\ 64 bit are right-justified (meaning occupy higher-address bytes) in their 8-byte slot on the stack, requiring extra-care for big-endian targets
-\item single precision float parameters (32 bit) are left-justified in their 8-byte slot on the stack, but are right justified in fp-registers on big endian targets, as they aren't promoted @@@doc says "undecided", but openbsd/octeon(mipseb) has it as described here
-\end{itemize}
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-@@@ might be wrong
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                & \vdots       &                                &                              \\
-\hhline{~=~~}                            
-register save area              & \hspace{4cm} &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}                            
-local data                      &              &                                &                              \\
-\hhline{~-~~}                            
-\mrlbrace{3}{parameter area}    & \ldots       & \mrrbrace{3}{stack parameters} &                              \\
-                                & \ldots       &                                &                              \\
-                                & \ldots       &                                &                              \\
-\hhline{~=~~}
-register save area              & padding      &                                & \mrrbrace{7}{current frame}  \\
-                                & \$ra         &                                &                              \\
-                                & \$s8         &                                &                              \\
-                                & \$gp         &                                &                              \\
-\hhline{~-~~}
-local data                      &              &                                &                              \\
-\hhline{~-~~}
-parameter area                  &              &                                &                              \\
-\hhline{~-~~}
-                                & \vdots       &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on mips64 n64 calling convention}
-\end{figure}
-
-
-\subsubsection{MIPS N32 Calling Convention}
-
-@@@
-
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>,
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+\subsection{MIPS64 Calling Conventions}
+
+\paragraph{Overview}
+
+There are two main ABIs in use for MIPS64 chips, \emph{N64}\cite{MIPSn32/n64} and \emph{N32}\cite{MIPSn32/n64}. Both are
+basically the same, except that N32 uses 32-bit pointers and long integers, instead of 64. All registers of a MIPS64 chip are considered
+to be 64-bit wide, even for the N32 calling convention.\\
+The word size is defined to be 32 bits, a dword 64 bits. Note that this is due to historical reasons (terminology didn't change from MIPS32).\\
+Other than that there are corresponding 64-bit versions of other MIPS32 ABIs, e.g. the EABI\cite{MIPSeabi} and O64\cite{MIPSo64}.
+
+\paragraph{\product{dyncall} support}
+
+For MIPS 64-bit machines, dyncall supports the N64 calling conventions for calls and callbacks (for all four combinations of big/little-endian, and soft/hard-float targets).
+The N32 calling convention might work - it used to, but hasn't been tested recently.
+
+\subsubsection{MIPS N64 Calling Convention}
+
+\paragraph{Register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{lll}
+Name                                   & Alias                & Brief description\\
+\hline
+{\bf \$0}                              & {\bf \$zero}         & hardware zero \\
+{\bf \$1}                              & {\bf \$at}           & assembler temporary, scratch \\
+{\bf \$2-\$3}                          & {\bf \$v0-\$v1}      & return value (only integer on hard-float targets), scratch \\
+{\bf \$4-\$11}                         & {\bf \$a0-\$a7}      & first arguments (only integer on hard-float targets), scratch \\
+{\bf \$12-\$15,\$24}                   & {\bf \$t4-\$t7,\$t8} & temporaries, scratch \\
+{\bf \$25}                             & {\bf \$t9}           & temporary, address of callee for all PIC calls (by convention), scratch \\
+{\bf \$16-\$23}                        & {\bf \$s0-\$s7}      & preserve \\
+{\bf \$26,\$27}                        & {\bf \$kt0,\$kt1}    & reserved for kernel \\
+{\bf \$28}                             & {\bf \$gp}           & global pointer, preserve \\
+{\bf \$29}                             & {\bf \$sp}           & stack pointer, preserve \\
+{\bf \$30}                             & {\bf \$s8}           & frame pointer, preserve \\
+{\bf \$31}                             & {\bf \$ra}           & return address, preserve \\
+{\bf hi, lo}                           &                      & multiply/divide special registers \\
+{\bf \$f0,\$f2}                        &                      & only on hard-float targets: float results, scratch \\
+{\bf \$f1,\$f3,\$f4-\$f11,\$f20-\$f23} &                      & only on hard-float targets: float temporaries, scratch \\
+{\bf \$f12-\$f19}                      &                      & only on hard-float targets: float arguments, scratch \\
+{\bf \$f24-\$f31}                      &                      & only on hard-float targets: preserved \\%@@@on N32, this changes
+\end{tabular*}
+\caption{Register usage on MIPS N64 calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item Stack grows down
+\item Stack parameter order: right-to-left
+\item Caller cleans up the stack
+\item generally, first 8 params \textgreater=\ 64-bit are passed via registers
+\item for hard-float targets: register arguments are passed via \$a0-\$a7 for integers and \$f12-\$f19 for floats - with mixed float and int parameters, some registers are left out (e.g. first parameter ends up in \$a0 or \$f12, second in \$a1 or \$f13, etc.)
+\item for soft-float targets: register arguments are passed via \$a0-\$a7
+\item subsequent arguments are pushed onto the stack
+\item all stack entries are 64-bit aligned
+\item all stack regions are 16-byte aligned
+\item results are returned in \$v0, and for a second one \$v1 is used
+\item only on hard-float targets: floating point results are returned in \$f0
+\item if the callee takes the address of one of the parameters and uses it to address other unnamed parameters (e.g. varargs) it has to copy - in its prolog - the argument registers to a reserved stack area adjacent to the other parameters on the stack (only the unnamed integer parameters require saving, though) % @@@ seems to *ONLY* spill with varargs, never for any other reason
+\item float arguments passed in the variable part of a vararg call are passed like integers, meaning float registers don't ever need to be saved that way, so only \$a0-\$a7 need to be spilled
+\item quad precision float arguments are passed in even-odd register pairs, skipping one register if needed
+\item integer parameters \textless\ 64 bit are right-justified (meaning occupy higher-address bytes) in their 8-byte slot on the stack, requiring extra-care for big-endian targets
+\item single precision float parameters (32 bit) are left-justified in their 8-byte slot on the stack, but are right justified in fp-registers on big endian targets, as they aren't promoted @@@doc says "undecided", but openbsd/octeon(mipseb) has it as described here
+\end{itemize}
+% maybe note somewhere that "prolog-based" spilling is neat for dyncall, as we don't have to care
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/mips64.n64.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                         & \vdots                   &                                      &                              \\
+\hhline{~=~~}                            
+register save area                       & \hspace{4cm}             &                                      & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}                            
+local data                               &                          &                                      &                              \\
+\hhline{~-~~}                            
+\mrlbrace{6}{parameter area}             & arg n-1                  & \mrrbrace{3}{stack parameters}       &                              \\
+                                         & \ldots                   &                                      &                              \\
+                                         & arg 8                    &                                      &                              \\
+\hhline{~=~~}
+                                         & \$a7                     & \mrrbrace{3}{spill area (if needed)} & \mrrbrace{6}{current frame}  \\
+                                         & \ldots                   &                                      &                              \\
+                                         & \$a? (first unnamed reg) &                                      &                              \\
+\hhline{~-~~}                                                                             
+register save area (with return address) &                          &                                      &                              \\
+\hhline{~-~~}
+local data                               &                          &                                      &                              \\
+\hhline{~-~~}
+parameter area                           & \vdots                   &                                      &                              \\
+\end{tabular}
+\caption{Stack layout on MIPS N64 calling convention}
+\end{figure}
+
+
+\subsubsection{MIPS N32 Calling Convention}
+
+@@@
+
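Reviewer's sketch (not part of the changeset): the N64 register assignment described in the hunk above is positional, which a short C model makes explicit. It is a simplified illustration for hard-float targets, assuming non-variadic calls, no quad precision values and no aggregates; `N64Kind` and `n64_assign` are hypothetical names, not dyncall API. Stack locations are offsets from the stack pointer at the time of the call, assuming the first stack parameter (arg 8 in the layout figure) sits at offset 0 and every stack entry takes one 64-bit slot.

```c
#include <stdio.h>

/* Hypothetical argument classes for this sketch (not dyncall types). */
typedef enum { N64_INT, N64_FLOAT } N64Kind;

/* Writes a short location string ("$a0", "$f13", "sp+8", ...) for each
 * argument into out[i]. The register is chosen by argument position:
 * argument i uses either $a<i> or $f<12+i>, leaving the other unused. */
static void n64_assign(const N64Kind *args, int n, char out[][16])
{
    for (int i = 0; i < n; ++i) {
        if (i >= 8)
            /* arguments after the first eight go to 64-bit stack slots */
            snprintf(out[i], 16, "sp+%d", (i - 8) * 8);
        else if (args[i] == N64_FLOAT)
            snprintf(out[i], 16, "$f%d", 12 + i);
        else
            snprintf(out[i], 16, "$a%d", i);
    }
}
```

For the argument list (int, float, int) this yields \$a0, \$f13 and \$a2, which is the "some registers are left out" behaviour mentioned in the list above.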
--- a/doc/manual/callconvs/callconv_ppc32.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_ppc32.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,6 +1,6 @@
 %//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>, 
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -20,7 +20,7 @@
 % ==================================================
 % PowerPC 32
 % ==================================================
-\subsection{PowerPC (32bit) Calling Convention}
+\subsection{PowerPC (32bit) Calling Conventions}
 
 \paragraph{Overview}
 
@@ -35,7 +35,7 @@
 
 \paragraph{\product{dyncall} support}
 
-\product{Dyncall} and \product{dyncallback} are supported for PowerPC (32bit) Big Endian (MSB) on Darwin (tested on Apple Mac OS X) and Linux, however, fail for *BSD.
+\product{Dyncall} and \product{dyncallback} are supported for PowerPC (32bit) Big Endian (MSB), for Darwin's and System V's calling conventions.
 
 
 \subsubsection{Mac OS X/Darwin}
@@ -49,24 +49,22 @@
 {\bf gpr0}          & scratch\\
 {\bf gpr1}          & stack pointer\\
 {\bf gpr2}          & scratch\\
-{\bf gpr3,gpr4}     & return value, parameter 0 and 1 for integer or pointer\\
-{\bf gpr5-gpr10}    & parameter 2-7 for integer or pointer parameters\\
-{\bf gpr11}         & permanent\\
+{\bf gpr3,gpr4}     & return value, parameter 0 and 1 for integer or pointer, scratch\\
+{\bf gpr5-gpr10}    & parameter 2-7 for integer or pointer parameters, scratch\\
+{\bf gpr11}         & preserve\\
 {\bf gpr12}         & branch target for dynamic code generation\\
-{\bf gpr13-31}      & permanent\\
+{\bf gpr13-31}      & preserve\\
 {\bf fpr0}          & scratch\\
 {\bf fpr1}          & floating point return value, floating point parameter 0 (always double precision)\\
 {\bf fpr2-fpr13}    & floating point parameters 1-12 (always double precision)\\
-{\bf fpr14-fpr31}   & permanent\\
+{\bf fpr14-fpr31}   & preserve\\
 {\bf v0-v1}         & scratch\\
 {\bf v2-v13}        & vector parameters\\
 {\bf v14-v19}       & scratch\\
-{\bf v20-v31}       & permanent\\
-{\bf lr}            & scratch, link-register\\
-{\bf ctr}           & scratch, count-register\\
-{\bf cr0-cr1}       & scratch\\
-{\bf cr2-cr4}       & permanent\\
-{\bf cr5-cr7}       & scratch\\
+{\bf v20-v31}       & preserve\\
+{\bf lr}            & link-register, scratch\\
+{\bf ctr}           & count-register, scratch\\
+{\bf cr0-cr7}       & condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch)\\
 \end{tabular*}
 \caption{Register usage on Darwin PowerPC 32-Bit}
 \end{table}
@@ -74,10 +72,12 @@
 \paragraph{Parameter passing}
 
 \begin{itemize}
+\item stack grows down
 \item stack parameter order: right-to-left
 \item caller cleans up the stack
 \item the first 8 integer parameters are passed in registers gpr3-gpr10
 \item the first 12 floating point parameters are passed in registers fpr1-fpr13
+\item 64 bit arguments are passed as if they were two 32 bit arguments, without skipping registers for alignment (this means passing half via a register and half via the stack is allowed)
 \item if a float parameter is passed via a register, gpr registers are skipped for subsequent integer parameters (based on the size of
 the float - 1 register for single precision and 2 for double precision floating point values)
 \item the caller pushes subsequent parameters onto the stack
@@ -104,51 +104,53 @@
 \item for types \textgreater\ 64 bits, a secret first parameter with an address to the return value is passed
 \end{itemize}
 
-\pagebreak
 
 \paragraph{Stack layout}
 
-Stack frame is always 16-byte aligned. Stack directly after function prolog:\\
+% verified/amended: TP nov 2019 (see also doc/disas_examples/ppc.darwin.disas)
+Stack frame is always 16-byte aligned. Prolog opens frame with additional, fixed space for a linkage area, to hold a number of values (not all of them are required to be saved, though). Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
+                                  & \vdots                        &                                      &                               \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}                  &                                      & \mrrbrace{14}{caller's frame} \\
 \hhline{~-~~}
-                                  & \vdots              &                                      &                               \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}        &                                      & \mrrbrace{13}{caller's frame} \\
+local data                        &                               &                                      &                               \\
 \hhline{~-~~}
-\mrlbrace{6}{parameter area}      & \ldots              & \mrrbrace{3}{stack parameters}       &                               \\
-                                  & \ldots              &                                      &                               \\
-                                  & \ldots              &                                      &                               \\
-                                  & \ldots              & \mrrbrace{3}{spill area (as needed)} &                               \\
-                                  & \ldots              &                                      &                               \\
-                                  & gpr3 or fpr1        &                                      &                               \\
+\mrlbrace{6}{parameter area}      & last arg                      & \mrrbrace{3}{stack parameters}       &                               \\
+                                  & \ldots                        &                                      &                               \\
+                                  & 9th word of arg data          &                                      &                               \\
+                                  & gpr10                         & \mrrbrace{3}{spill area (as needed)} &                               \\
+                                  & \ldots                        &                                      &                               \\
+                                  & gpr3                          &                                      &                               \\
 \hhline{~-~~}
-\mrlbrace{6}{linkage area}        & reserved            &                                      &                               \\
-                                  & reserved            &                                      &                               \\
-                                  & reserved            &                                      &                               \\
-                                  & return address      &                                      &                               \\
-                                  & reserved for callee &                                      &                               \\
-                                  & saved by callee     &                                      &                               \\
+\mrlbrace{6}{linkage area}        & reserved                      &                                      &                               \\
+                                  & reserved                      &                                      &                               \\
+                                  & reserved                      &                                      &                               \\
+                                  & return address (callee saved) &                                      &                               \\
+                                  & condition reg (callee saved)  &                                      &                               \\
+                                  & parent stack frame pointer    &                                      &                               \\
 \hhline{~=~~}
-local data                        &                     &                                      & \mrrbrace{3}{current frame}   \\
+register save area                &                               &                                      & \mrrbrace{4}{current frame}   \\
+\hhline{~-~~}
+local data                        &                               &                                      &                               \\
 \hhline{~-~~}
-parameter area                    &                     &                                      &                               \\
+parameter area                    &                               &                                      &                               \\
 \hhline{~-~~}
-linkage area                      & \vdots              &                                      &                               \\
-\hhline{~-~~}
+linkage area                      & \vdots                        &                                      &                               \\
 \end{tabular}
 \caption{Stack layout on ppc32 Darwin}
 \end{figure}
 
+
+\newpage
+
+
 \subsubsection{System V PPC 32-bit}
 
 \paragraph{Status}
 
-\begin{itemize}
-\item C++ this calls do not work.
-\end{itemize}
-
 \paragraph{Registers and register usage}
 
 \begin{table}[h]
@@ -156,24 +158,24 @@
 Name              & Brief description\\
 \hline
 {\bf r0}          & scratch\\
-{\bf r1}          & stack pointer\\
+{\bf r1}          & stack pointer, preserve\\
 {\bf r2}          & system-reserved\\
-{\bf r3-r4}       & parameter passing and return value\\
-{\bf r5-r10}      & parameter passing\\
+{\bf r3-r4}       & parameter passing and return value, scratch\\
+{\bf r5-r10}      & parameter passing, scratch\\
 {\bf r11-r12}     & scratch\\
-{\bf r13}         & Small data area pointer register\\
-{\bf r14-r30}     & Local variables\\
-{\bf r31}         & Used for local variables or \emph{environment pointer}\\
+{\bf r13}         & small data area pointer register\\
+{\bf r14-r30}     & local variables, preserve\\
+{\bf r31}         & used for local variables or \emph{environment pointer}, preserve\\
 {\bf f0}          & scratch\\
-{\bf f1}          & parameter passing and return value\\
-{\bf f2-f8}       & parameter passing\\
+{\bf f1}          & parameter passing and return value, scratch\\
+{\bf f2-f8}       & parameter passing, scratch\\
 {\bf f9-13}       & scratch\\
-{\bf f14-f31}     & Local variables\\
-{\bf cr0-cr7}     & Conditional register fields, each 4-bit wide (cr0-cr1 and   cr5-cr7 are scratch)\\
-{\bf lr}          & Link register (scratch)\\
-{\bf ctr}         & Count register (scratch) \\
-{\bf xer}         & Fixed-point exception register (scratch)\\
-{\bf fpscr}       & Floating-point Status and Control Register\\
+{\bf f14-f31}     & local variables, preserve\\
+{\bf cr0-cr7}     & condition register fields, each 4 bits wide (cr0-cr1 and cr5-cr7 are scratch)\\
+{\bf lr}          & link register, scratch\\
+{\bf ctr}         & count register, scratch \\
+{\bf xer}         & fixed-point exception register, scratch\\
+{\bf fpscr}       & floating-point status and control register\\
 % {\bf v0-v1}         & scratch\\
 % {\bf v2-v13}        & vector parameters\\
 % {\bf v14-v19}       & scratch\\
@@ -195,10 +197,11 @@
 \item 8 floating-pointer registers (f1-f8) for float (promoted to double) and double types.
 \item Additional arguments are passed on the stack directly after the back-chain and saved return address (8 bytes structure) on the callers stack frame.
 \item 64-bit integer data types are passed in general-purpose registers as a whole in two
- 32-bit general purpose registers (an odd and an even e.g. r3 and r4), probably skipping an even integer register.
- or passed on the stack. They are never splitted into a register and stack part.
+ 32-bit general purpose registers (an odd and an even one, e.g. r3 and r4), skipping an even integer register if necessary,
+ or passed on the stack; they are never split into a register and stack part
 \item Ellipse calls set CR bit 6 
 \item integer parameters \textless\ 32 bit are right-justified (meaning occupy high-order bytes) in their 4-byte area, requiring extra-care for big-endian targets
+\item no spill area is used on the stack; iterating over varargs therefore requires a specific va\_list implementation
 \end{itemize}
 
 \paragraph{Return values}
@@ -208,36 +211,36 @@
 \item floating-point values are returned using register f1.
 \end{itemize}
 
-\pagebreak
 
 \paragraph{Stack layout}
 
+% verified/amended: TP nov 2019 (see also doc/disas_examples/ppc.sysv.disas)
 Stack frame is always 16-byte aligned. Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots                     &                                &                              \\
+                                  & \vdots                        &                                &                              \\
 \hhline{~=~~}
-local data                        & \hspace{4cm}               &                                & \mrrbrace{6}{caller's frame} \\
+register save area                & \hspace{4cm}                  &                                & \mrrbrace{7}{caller's frame} \\
 \hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots                     & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots                     &                                &                              \\
-                                  & \ldots                     &                                &                              \\
+local data                        &                               &                                &                              \\
 \hhline{~-~~}
-                                  & saved return address (for callee) &                                &                              \\
+\mrlbrace{3}{parameter area}      & last arg                      & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                        &                                &                              \\
+                                  & first arg passed via stack    &                                &                              \\
 \hhline{~-~~}
-                                  & parent stack frame pointer &                                &                              \\
+                                  & return address (callee saved) &                                &                              \\
+\hhline{~-~~}
+                                  & parent stack frame pointer    &                                &                              \\
 \hhline{~=~~}
-local data                        &                            &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                            &                                &                              \\
+register save area                &                               &                                & \mrrbrace{4}{current frame}  \\
 \hhline{~-~~}
-                                  & \vdots                     &                                &                              \\
+local data                        &                               &                                &                              \\
 \hhline{~-~~}
+parameter area                    &                               &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots                        &                                &                              \\
 \end{tabular}
-\\
-\\
-\\
 \caption{Stack layout on System V ABI for PowerPC 32-bit calling convention}
 \end{figure}
+
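The System V PPC 32-bit rule for 64-bit integers described in the hunk above (odd/even register pairs, never split between a register and the stack) can be modeled with a small C sketch. This is an illustration only, under the assumptions stated in the manual text; `assign_gpr` and its return convention are hypothetical and not part of dyncall or of any ABI header.

```c
#include <assert.h>

/* Illustrative model only; assign_gpr is a hypothetical helper, not dyncall code.
 * *next holds the number of the next free GPR and starts at 3 (r3).
 * Returns the first register used by the argument, or 0 if it goes on the stack. */
static int assign_gpr(int *next, int size_bytes)
{
    assert(size_bytes == 4 || size_bytes == 8);

    if (size_bytes == 8) {
        if (*next % 2 == 0)     /* pairs start at odd registers: r3, r5, r7, r9 */
            ++*next;            /* skip the even register */
        if (*next > 9)          /* no complete pair left: never split, use the stack */
            return 0;
        *next += 2;             /* the pair occupies two consecutive registers */
        return *next - 2;
    }

    if (*next > 10)             /* r3-r10 are used up */
        return 0;
    return (*next)++;
}
```

For a call such as `f(int, long long, int)`, starting with `next = 3`, this model places the arguments in r3, r5/r6 and r7, leaving r4 unused.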
--- a/doc/manual/callconvs/callconv_ppc64.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_ppc64.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,6 +1,6 @@
 %//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>, 
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -20,7 +20,7 @@
 % ==================================================
 % PowerPC 64
 % ==================================================
-\subsection{PowerPC (64bit) Calling Convention}
+\subsection{PowerPC (64bit) Calling Conventions}
 
 \paragraph{Overview}
 
--- a/doc/manual/callconvs/callconv_sparc32.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_sparc32.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,107 +1,111 @@
-%//////////////////////////////////////////////////////////////////////////////
-%
-% Copyright (c) 2012-2017 Daniel Adler <dadler@uni-goettingen.de>,
-%                         Tassilo Philipp <tphilipp@potion-studios.com>
-%
-% Permission to use, copy, modify, and distribute this software for any
-% purpose with or without fee is hereby granted, provided that the above
-% copyright notice and this permission notice appear in all copies.
-%
-% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-%
-%//////////////////////////////////////////////////////////////////////////////
-
-\subsection{SPARC Calling Convention}
-
-\paragraph{Overview}
-
-The SPARC family of processors is based on the SPARC instruction set architecture, which comes in basically tree revisions,
-V7, V8\cite{SPARCV8}\cite{SPARCSysV} and V9\cite{SPARCV9}\cite{SPARCV9SysV}. The former two are 32-bit whereas the latter refers to the 64-bit SPARC architecture (see next chapter).
-SPARC uses big endian byte order.\\
-
-\paragraph{\product{dyncall} support}
-
-\product{dyncall} fully supports the SPARC 32-bit instruction set (V7 and V8), for calls and callbacks.
-
-\subsubsection{SPARC (32-bit) Calling Convention}
-
-\paragraph{Register usage}
-
-\begin{itemize}
-\item 32 single floating point registers (f0-f31, usable as 8 quad precision q0,q4,q8,...,q28, 16 double precision d0,d2,d4,...,d30)
-\item 32 32-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
-\item calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
-\end{itemize}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{lll}
-Name                          & Alias                          & Brief description\\
-\hline
-{\bf \%g0}                    & \%r0                           & Read-only, hardwired to 0 \\
-{\bf \%g1-\%g7}               & \%r1-\%r7                      & Global \\
-{\bf \%o0,\%o1 and \%i0,\%i1} & \%r8,\%r9 and \%r24,\%r25      & Output and input argument registers, return value \\
-{\bf \%o2-\%o5 and \%i2-\%i5} & \%r10-\%r13 and \%r26-\%r29    & Output and input argument registers \\
-{\bf \%o6 and \%i6}           & \%r14 and \%r30, \%sp and \%fp & Stack and frame pointer \\
-{\bf \%o7 and \%i7}           & \%r15 and \%r31                & Return address (caller writes to o7, callee uses i7) \\
-{\bf \%l0-\%l7}               & \%r16-\%r23                    & preserve \\
-{\bf \%f0,\%f1}               &                                & Floating point return value \\
-{\bf \%f2-\%f31}              &                                & scratch \\
-\end{tabular*}
-\caption{Register usage on sparc calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-\begin{itemize}
-\item stack grows down
-\item stack parameter order: right-to-left
-\item caller cleans up the stack
-\item stack always aligned to 8 bytes
-\item first 6 integers and floats are passed independently in registers using \%o0-\%o5
-\item for every other argument the stack is used
-\item all arguments \textless=\ 32 bit are passed as 32 bit values
-\item 64 bit arguments are passed like two consecutive \textless=\ 32 bit values
-\item minimum stack size is 64 bytes, b/c stack pointer must always point at enough space to store all \%i* and \%l* registers, used when running out of register windows
-\item if needed, register spill area is adjacent to parameters
-\item results are expected by caller to be returned in \%o0/\%o1 (after reg window restore, meaning callee writes to \%i0/\%i1) for integers, \%f0/\%f1 for floats, and for structs/unions a pointer to them is used as a hidden stack parameter (see below)
-\end{itemize}
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                   & \vdots                      &                                &                               \\
-\hhline{~=~~}
-local data (and padding)           & \hspace{4cm}                &                                & \mrrbrace{9}{caller's frame}  \\
-\hhline{~-~~}
-\mrlbrace{7}{parameter area}       & argument x                  & \mrrbrace{3}{stack parameters} &                               \\
-                                   & \ldots                      &                                &                               \\
-                                   & argument 6                  &                                &                               \\
-                                   & input argument 5 spill      & \mrrbrace{3}{spill area}       &                               \\
-                                   & \ldots                      &                                &                               \\
-                                   & input argument 0 spill      &                                &                               \\
-                                   & struct/union return pointer &                                &                               \\
-\hhline{~-~~}
-register save area (\%i* and \%l*) &                             &                                &                               \\
-\hhline{~=~~}
-local data (and padding)           &                             &                                & \mrrbrace{3}{current frame}   \\
-\hhline{~-~~}
-parameter area                     &                             &                                &                               \\
-\hhline{~-~~}
-                                   & \vdots                      &                                &                               \\
-\hhline{~-~~}
-\end{tabular}
-\\
-\\
-\\
-\caption{Stack layout on sparc32 calling convention}
-\end{figure}
-
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2012-2019 Daniel Adler <dadler@uni-goettingen.de>,
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+\subsection{SPARC Calling Conventions}
+
+\paragraph{Overview}
+
+The SPARC family of processors is based on the SPARC instruction set architecture, which comes in basically three revisions,
+V7, V8\cite{SPARCV8}\cite{SPARCSysV} and V9\cite{SPARCV9}\cite{SPARCV9SysV}. The former two are 32-bit whereas the latter refers to the 64-bit SPARC architecture (see next chapter).
+SPARC uses big endian byte order.\\
+The word size is defined to be 32 bits.
+
+\paragraph{\product{dyncall} support}
+
+\product{dyncall} fully supports the SPARC 32-bit instruction set (V7 and V8), for calls and callbacks.
+
+\subsubsection{SPARC (32-bit) Calling Convention}
+
+\paragraph{Register usage}
+
+\begin{itemize}
+\item 32 single precision floating point registers (f0-f31, usable as 8 quad precision q0,q4,q8,...,q28, 16 double precision d0,d2,d4,...,d30)
+\item 32 32-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
+\item calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
+\end{itemize}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{lll}
+Name                          & Alias                          & Brief description\\
+\hline
+{\bf \%g0}                    & \%r0                           & Read-only, hardwired to 0 \\
+{\bf \%g1-\%g7}               & \%r1-\%r7                      & Global \\
+{\bf \%o0,\%o1 and \%i0,\%i1} & \%r8,\%r9 and \%r24,\%r25      & Output and input argument registers, return value \\
+{\bf \%o2-\%o5 and \%i2-\%i5} & \%r10-\%r13 and \%r26-\%r29    & Output and input argument registers \\
+{\bf \%o6 and \%i6}           & \%r14 and \%r30, \%sp and \%fp & Stack and frame pointer \\
+{\bf \%o7 and \%i7}           & \%r15 and \%r31                & Return address (caller writes to o7, callee uses i7) \\
+{\bf \%l0-\%l7}               & \%r16-\%r23                    & preserve \\
+{\bf \%f0,\%f1}               &                                & Floating point return value \\
+{\bf \%f2-\%f31}              &                                & scratch \\
+\end{tabular*}
+\caption{Register usage on sparc calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+\begin{itemize}
+\item stack grows down
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item stack always aligned to 8 bytes
+\item first 6 integers and floats are passed independently in registers using \%o0-\%o5
+\item for every other argument the stack is used
+\item all arguments \textless=\ 32 bit are passed as 32 bit values
+\item 64 bit arguments are passed like two consecutive \textless=\ 32 bit values (which allows for an argument to be split between the stack and \%i5)
+\item minimum stack size is 64 bytes, b/c stack pointer must always point at enough space to store all \%i* and \%l* registers, used when running out of register windows
+\item if needed, register spill area is adjacent to parameters
+\end{itemize}
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item results are expected by caller to be returned in \%o0/\%o1 (after reg window restore, meaning callee writes to \%i0/\%i1) for integers
+\item \%f0/\%f1 are used for floating point values
+\item structs/unions are returned in a space allocated by the caller, with a pointer to it passed as an {\bf additional}, hidden stack parameter (see below)
+\end{itemize}
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/sparc.sparc.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                   & \vdots                      &                                &                               \\
+\hhline{~=~~}
+local data (and padding)           & \hspace{4cm}                &                                & \mrrbrace{9}{caller's frame}  \\
+\hhline{~-~~}
+\mrlbrace{7}{parameter area}       & arg n-1                     & \mrrbrace{3}{stack parameters} &                               \\
+                                   & \ldots                      &                                &                               \\
+                                   & 7th word of arg data        &                                &                               \\
+                                   & \%i5                        & \mrrbrace{3}{spill area}       &                               \\
+                                   & \ldots                      &                                &                               \\
+                                   & \%i0                        &                                &                               \\
+                                   & struct/union return pointer &                                &                               \\
+\hhline{~-~~}
+register save area (\%i* and \%l*) &                             &                                &                               \\
+\hhline{~=~~}
+local data (and padding)           &                             &                                & \mrrbrace{3}{current frame}   \\
+\hhline{~-~~}
+parameter area                     &                             &                                &                               \\
+\hhline{~-~~}
+                                   & \vdots                      &                                &                               \\
+\end{tabular}
+\caption{Stack layout on sparc32 calling convention}
+\end{figure}
+
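The SPARC 32-bit parameter passing rules in the hunk above treat the arguments as a sequence of 32-bit words, the first six of which travel in `%o0`-`%o5` while the rest go on the stack. A minimal C sketch of that bookkeeping follows. The helper names are hypothetical and chosen for illustration only, and struct/union returns as well as floating point details are ignored.

```c
#include <assert.h>

enum { SPARC32_REG_WORDS = 6 };   /* words carried by %o0-%o5 */

/* Hypothetical helper: number of 32-bit words an argument occupies.
 * Everything up to 32 bits takes one word, 64-bit values take two. */
static int arg_words(int size_bytes)
{
    assert(size_bytes > 0 && size_bytes <= 8);
    return size_bytes <= 4 ? 1 : 2;
}

/* Hypothetical helper: how many words of an argument that starts at word
 * index first_word (0-based) are passed in registers; the remaining words
 * of that argument are passed on the stack. */
static int words_in_regs(int first_word, int size_bytes)
{
    int n = arg_words(size_bytes);
    int in_regs = SPARC32_REG_WORDS - first_word;

    if (in_regs < 0)
        in_regs = 0;              /* argument lies entirely on the stack */
    if (in_regs > n)
        in_regs = n;              /* argument lies entirely in registers */
    return in_regs;
}
```

In this model a 64-bit argument starting at word index 5 yields `words_in_regs(5, 8) == 1`, which is the split case mentioned in the list: one half in the last argument register, the other half on the stack.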
--- a/doc/manual/callconvs/callconv_sparc64.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_sparc64.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,112 +1,115 @@
-%//////////////////////////////////////////////////////////////////////////////
-%
-% Copyright (c) 2012-2017 Daniel Adler <dadler@uni-goettingen.de>,
-%                         Tassilo Philipp <tphilipp@potion-studios.com>
-%
-% Permission to use, copy, modify, and distribute this software for any
-% purpose with or without fee is hereby granted, provided that the above
-% copyright notice and this permission notice appear in all copies.
-%
-% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-%
-%//////////////////////////////////////////////////////////////////////////////
-
-\subsection{SPARC64 Calling Convention}
-
-\paragraph{Overview}
-
-The SPARC family of processors is based on the SPARC instruction set architecture, which comes in basically tree revisions,
-V7, V8\cite{SPARCV8}\cite{SPARCSysV} and V9\cite{SPARCV9}\cite{SPARCV9SysV}. The former two are 32-bit (see previous chapter) whereas the latter refers to the 64-bit SPARC architecture.
-SPARC uses big endian byte order, however, V9 supports also little endian byte order, but for data access only, not instruction access.\\
-\\
-There are two proposals, one from Sun and one from Hal, which disagree on how to handle some aspects of this calling convention.\\
-
-\paragraph{\product{dyncall} support}
-
-\product{dyncall} fully supports the SPARC 64-bit instruction set (V9), for calls and callbacks.
-
-\subsubsection{SPARC (64-bit) Calling Convention}
-
-\begin{itemize}
-\item 32 double precision floating point registers (d0,d2,d4,...,d62, usable as 16 quad precision ones q0,q4,q8,...g60, and also first half of them are usable as 32 single precision registers f0-f31)
-\item 32 64-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
-\item calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
-\item stack and frame pointer are offset by a BIAS of 2047 (see official doc for reasons)
-\end{itemize}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{lll}
-Name                          & Alias                          & Brief description\\
-\hline
-{\bf \%g0}                    & \%r0                           & Read-only, hardwired to 0 \\
-{\bf \%g1-\%g7}               & \%r1-\%r7                      & Global \\
-{\bf \%o0-\%o3 and \%i0-\%i3} & \%r8-\%r11 and \%r24-\%r27     & Output and input argument registers, return value \\
-{\bf \%o4,\%o5 and \%i4,\%i5} & \%r12,\%r13 and \%r28,\%r29    & Output and input argument registers \\
-{\bf \%o6 and \%i6}           & \%r14 and \%r30, \%sp and \%fp & Stack and frame pointer (NOTE, value is pointing to stack/frame minus a BIAS of 2047) \\
-{\bf \%o7 and \%i7}           & \%r15 and \%r31                & Return address (caller writes to o7, callee uses i7) \\
-{\bf \%l0-\%l7}               & \%r16-\%r23                    & preserve \\
-{\bf \%d0,\%d2,\%d4,\%d6}     &                                & Floating point arguments, return value \\
-{\bf \%d8,\%d10,...,\%d30}    &                                & Floating point arguments \\
-{\bf \%d32,\%d36,...,\%d62}   &                                & scratch (but, according do Hal, \%d16,...,\%d46 are preserved) \\
-\end{tabular*}
-\caption{Register usage on sparc64 calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-\begin{itemize}
-\item stack grows down
-\item stack parameter order: right-to-left
-\item caller cleans up the stack
-\item stack frame is always aligned to 16 bytes
-\item first 6 integers are passed in registers using \%o0-\%o5
-\item first 8 quad precision floating point args (or 16 double precision, or 32 single precision) are passed in floating point registers (\%q0,\%q4,...,\%q28 or \%d0,\%d2,...,\%d30 or \%f0-\%f32, respectively)
-\item for every other argument the stack is used
-\item single precision floating point args are passed in odd \%f* registers, and are "right aligned" in their 8-byte space on the stack
-\item for every argument passed, corresponding \%o*, \%f* register or stack space is skipped (e.g. passing a doube as 3rd call argument, \%d4 is used and \%o2 is skipped)
-\item all arguments \textless=\ 64 bit are passed as 64 bit values
-\item minimum stack size is 128 bytes, b/c stack pointer must always point at enough space to store all \%i* and \%l* registers, used when running out of register windows
-\item if needed, register spill area (for integer arguments passed via \%o0-\%o5) is adjacent to parameters
-\item results are expected by caller to be returned in \%o0-\%o3 (after reg window restore, meaning callee writes to \%i0-\%i3) for integers, \%d0,\%d2,\%d4,\%d6 for floats
-\item structs/unions up to 32b, the fields are returned via the respective registers mentioned in the previous bullet point
-\item for structs/unions \textgreater= 32b, the caller allocates the space and a pointer to it is passed as hidden first parameter to the function called (meaning in \%o0)
-\end{itemize}
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                   & \vdots                      &                                &                               \\
-\hhline{~=~~}
-local data (and padding)           & \hspace{4cm}                &                                & \mrrbrace{8}{caller's frame}  \\
-\hhline{~-~~}
-\mrlbrace{6}{parameter area}       & argument x                  & \mrrbrace{3}{stack parameters} &                               \\
-                                   & \ldots                      &                                &                               \\
-                                   & argument 6                  &                                &                               \\
-                                   & input argument 5 spill      & \mrrbrace{3}{spill area}       &                               \\
-                                   & \ldots                      &                                &                               \\
-                                   & input argument 0 spill      &                                &                               \\
-\hhline{~-~~}
-register save area (\%i* and \%l*) &                             &                                &                               \\
-\hhline{~=~~}
-local data (and padding)           &                             &                                & \mrrbrace{3}{current frame}   \\
-\hhline{~-~~}
-parameter area                     &                             &                                &                               \\
-\hhline{~-~~}
-                                   & \vdots                      &                                &                               \\
-\hhline{~-~~}
-\end{tabular}
-\\
-\\
-\\
-\caption{Stack layout on sparc64 calling convention}
-\end{figure}
-
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2012-2019 Daniel Adler <dadler@uni-goettingen.de>,
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+\subsection{SPARC64 Calling Conventions}
+
+\paragraph{Overview}
+
+The SPARC family of processors is based on the SPARC instruction set architecture, which comes in basically three revisions,
+V7, V8\cite{SPARCV8}\cite{SPARCSysV} and V9\cite{SPARCV9}\cite{SPARCV9SysV}. The former two are 32-bit (see previous chapter) whereas the latter refers to the 64-bit SPARC architecture.
+SPARC uses big endian byte order; V9 also supports little endian byte order, but only for data access, not for instruction access.\\
+\\
+There are two proposals, one from Sun and one from Hal, which disagree on how to handle some aspects of this calling convention.\\
+
+\paragraph{\product{dyncall} support}
+
+\product{dyncall} fully supports the SPARC 64-bit instruction set (V9), for calls and callbacks.
+
+\subsubsection{SPARC (64-bit) Calling Convention}
+
+\begin{itemize}
+\item 32 double precision floating point registers (d0,d2,d4,...,d62, usable as 16 quad precision ones q0,q4,q8,...,q60; the first half of them is also usable as 32 single precision registers f0-f31)
+\item 32 64-bit integer/pointer registers out of a bigger (vendor/model dependent) number that are accessible at a time (8 are global ones (g*), whereas the remaining 24 form a register window with 8 input (i*), 8 output (o*) and 8 local (l*) ones)
+\item calling a function shifts the register window, the old output registers become the new input registers (old local and input ones are not accessible anymore)
+\item stack and frame pointer are offset by a BIAS of 2047 (see official doc for reasons)
+\end{itemize}
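
The stack bias and the register window shift described above can be sketched in C. This is only an illustration, not part of \product{dyncall}: the function names are hypothetical, and the \%r numbering follows the aliases in the register table (\%o0-\%o7 are \%r8-\%r15, \%i0-\%i7 are \%r24-\%r31).

```c
#include <assert.h>
#include <stdint.h>

/* BIAS by which %sp and %fp are offset on SPARC V9 (value from the text). */
#define SPARC64_STACK_BIAS 2047u

/* Hypothetical helper: real memory address designated by a biased
   %sp or %fp register value. */
uint64_t sparc64_unbias(uint64_t reg_value)
{
    return reg_value + SPARC64_STACK_BIAS;
}

/* Hypothetical helper: %r alias number under which the callee sees the
   value the caller placed in %o<n>. The caller's %o<n> is %r(8+n); after
   the window shift it is the callee's %i<n>, which is %r(24+n). */
int sparc64_callee_alias_of_out(int n)
{
    assert(n >= 0 && n < 8);
    return 24 + n;
}
```

For example, a biased stack pointer value of 0x7feffff001 designates the 16-byte aligned address 0x7feffff800, and the caller's \%o6 (\%sp) shows up as the callee's \%r30 (\%fp).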
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{lll}
+Name                          & Alias                          & Brief description\\
+\hline
+{\bf \%g0}                    & \%r0                           & Read-only, hardwired to 0 \\
+{\bf \%g1-\%g7}               & \%r1-\%r7                      & Global \\
+{\bf \%o0-\%o3 and \%i0-\%i3} & \%r8-\%r11 and \%r24-\%r27     & Output and input argument registers, return value \\
+{\bf \%o4,\%o5 and \%i4,\%i5} & \%r12,\%r13 and \%r28,\%r29    & Output and input argument registers \\
+{\bf \%o6 and \%i6}           & \%r14 and \%r30, \%sp and \%fp & Stack and frame pointer (NOTE, offset with a BIAS of 2047) \\
+{\bf \%o7 and \%i7}           & \%r15 and \%r31                & Return address (caller writes to o7, callee uses i7) \\
+{\bf \%l0-\%l7}               & \%r16-\%r23                    & preserve \\
+{\bf \%d0,\%d2,\%d4,\%d6}     &                                & scratch, Floating point arguments, return value \\
+{\bf \%d8,\%d10,...,\%d14}    &                                & scratch, Floating point arguments \\
+{\bf \%d16,\%d18,...,\%d30}   &                                & scratch (preserve for Hal), Floating point arguments \\
+{\bf \%d32,\%d34,...,\%d62}   &                                & scratch (preserve for Hal) \\
+\end{tabular*}
+\caption{Register usage on sparc64 calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+\begin{itemize}
+\item stack grows down
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item stack frame is always aligned to 16 bytes
+\item first 6 integers are passed in registers using \%o0-\%o5
+\item first 8 quad precision floating point args (or 16 double precision, or 32 single precision) are passed in floating point registers (\%q0,\%q4,...,\%q28 or \%d0,\%d2,...,\%d30 or \%f0-\%f31, respectively)
+\item for every other argument the stack is used
+\item single precision floating point args are passed in odd \%f* registers, and are ``right aligned'' in their 8-byte space on the stack
+\item every argument occupies one slot: the \%o* and \%f* registers and the stack space corresponding to that slot but not used for the argument are skipped (e.g. when passing a double as 3rd call argument, \%d4 is used and \%o2 is skipped)
+\item all arguments \textless=\ 64 bit are passed as 64 bit values
+\item minimum stack size is 128 bytes, because the stack pointer must always point at enough space to store all \%i* and \%l* registers (16 registers of 8 bytes each); this space is used when running out of register windows
+\item if needed, the register spill area (both integer and float arguments are spilled in order) is adjacent to the stack parameters
+\end{itemize}
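
The slot-based assignment from the list above can be sketched as a small C function. This is a simplified illustration under stated assumptions, not the \product{dyncall} implementation: all type and function names are hypothetical, quad precision and aggregate arguments are left out, and the sketch assumes that the floating point register of slot $i$ is \%d$(2i)$ for doubles and \%f$(2i+1)$ for single precision values, for the first 16 slots.

```c
#include <assert.h>

/* Hypothetical argument kinds and location classes, for illustration only. */
typedef enum { ARG_INT, ARG_DOUBLE, ARG_FLOAT } arg_kind;
typedef enum { LOC_O_REG, LOC_D_REG, LOC_F_REG, LOC_STACK } loc_class;

typedef struct {
    loc_class cls;   /* which register file, or the stack             */
    int       index; /* register number, or slot number on the stack  */
} arg_loc;

/* Returns where the argument in the given 0-based slot is passed. */
arg_loc sparc64_arg_location(arg_kind kind, int slot)
{
    arg_loc loc;
    assert(slot >= 0);
    if (kind == ARG_INT && slot < 6) {
        loc.cls = LOC_O_REG;  loc.index = slot;          /* %o0-%o5          */
    } else if (kind == ARG_DOUBLE && slot < 16) {
        loc.cls = LOC_D_REG;  loc.index = 2 * slot;      /* %d0,%d2,...,%d30 */
    } else if (kind == ARG_FLOAT && slot < 16) {
        loc.cls = LOC_F_REG;  loc.index = 2 * slot + 1;  /* odd %f registers */
    } else {
        loc.cls = LOC_STACK;  loc.index = slot;          /* 8-byte stack slot */
    }
    return loc;
}
```

With this sketch, a double passed as 3rd argument (slot 2) ends up in \%d4, and since slot 2 is then taken, \%o2 stays unused, as in the example above.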
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item integer results are expected by the caller in \%o0-\%o3 (after the register window restore, meaning the callee writes to \%i0-\%i3)
+\item \%d0,\%d2,\%d4,\%d6 are used for floating point values
+\item the fields of structs/unions up to 32 bytes are returned via the respective registers mentioned in the previous bullet points
+\item structs/unions \textgreater\ 32 bytes are returned in a space allocated by the caller, with a pointer to it passed as first parameter to the function called (meaning in \%o0)
+\end{itemize}
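
The size rule for aggregate return values can be written as a one-line check. The sketch below uses a hypothetical function name and assumes that the limit is 32 bytes, and that an aggregate of exactly 32 bytes is still returned in registers (four 8-byte registers, \%o0-\%o3, hold exactly that much).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if a struct/union of `size` bytes is
   returned in registers, zero if the caller has to allocate space and
   pass a pointer to it in %o0. Assumes the limit is 32 bytes, inclusive. */
int sparc64_aggregate_returned_in_regs(size_t size)
{
    return size <= 32;
}
```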
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/sparc64.sparc64.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                   & \vdots                      &                                &                               \\
+\hhline{~=~~}
+local data (and padding)           & \hspace{4cm}                &                                & \mrrbrace{8}{caller's frame}  \\
+\hhline{~-~~}
+\mrlbrace{6}{parameter area}       & arg n-1                     & \mrrbrace{3}{stack parameters} &                               \\
+                                   & \ldots                      &                                &                               \\
+                                   & arg 6                       &                                &                               \\
+                                   & \%i5                        & \mrrbrace{3}{spill area}       &                               \\
+                                   & \ldots                      &                                &                               \\
+                                   & \%i0                        &                                &                               \\
+\hhline{~-~~}
+register save area (\%i* and \%l*) &                             &                                &                               \\
+\hhline{~=~~}
+local data (and padding)           &                             &                                & \mrrbrace{3}{current frame}   \\
+\hhline{~-~~}
+parameter area                     &                             &                                &                               \\
+\hhline{~-~~}
+                                   & \vdots                      &                                &                               \\
+\end{tabular}
+\caption{Stack layout on sparc64 calling convention}
+\end{figure}
+
--- a/doc/manual/callconvs/callconv_x64.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_x64.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,6 +1,6 @@
 %//////////////////////////////////////////////////////////////////////////////
 %
-% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>, 
 %                         Tassilo Philipp <tphilipp@potion-studios.com>
 %
 % Permission to use, copy, modify, and distribute this software for any
@@ -20,7 +20,7 @@
 % ==================================================
 % x64
 % ==================================================
-\subsection{x64 Calling Convention}
+\subsection{x64 Calling Conventions}
 
 
 \paragraph{Overview}
@@ -84,7 +84,7 @@
 \item stack parameter order: right-to-left
 \item caller cleans up the stack
 \item first 4 integer/pointer parameters are passed via rcx, rdx, r8, r9 (from left to right), others are pushed on stack (there is a
-preserve area for the first 4)
+spill area for the first 4)
 \item float and double parameters are passed via xmm0l-xmm3l
 \item first 4 parameters are passed via the correct register depending on the parameter type - with mixed float and int parameters,
 some registers are left out (e.g. first parameter ends up in rcx or xmm0, second in rdx or xmm1, etc.)
@@ -114,31 +114,35 @@
 
 \paragraph{Stack layout}
 
-Stack frame is always 16-byte aligned. Stack directly after function prolog:\\
+Stack frame is always 16-byte aligned.
+% verified/amended: TP nov 2019 (@@@ no doc/disas_examples/x64.win.disas, yet...@@@)
+Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
+                                  & \vdots         &                                &                               \\
 \hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{9}{caller's frame} \\
+register save area                & \hspace{4cm}   &                                & \mrrbrace{10}{caller's frame} \\
 \hhline{~-~~}
-\mrlbrace{7}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & r9 or xmm3     & \mrrbrace{4}{spill area}       &                              \\
-                                  & r8 or xmm2     &                                &                              \\
-                                  & rdx or xmm1    &                                &                              \\
-                                  & rcx or xmm0    &                                &                              \\
+local data                        &                &                                &                               \\
+\hhline{~-~~}                            
+\mrlbrace{7}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                               \\
+                                  & \ldots         &                                &                               \\
+                                  & arg 4          &                                &                               \\
+                                  & r9 or xmm3     & \mrrbrace{4}{spill area}       &                               \\
+                                  & r8 or xmm2     &                                &                               \\
+                                  & rdx or xmm1    &                                &                               \\
+                                  & rcx or xmm0    &                                &                               \\
 \hhline{~-~~}
-                                  & return address &                                &                              \\
+                                  & return address &                                &                               \\
 \hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
+register save area                &                &                                & \mrrbrace{4}{current frame}   \\
 \hhline{~-~~}
-parameter area                    &                &                                &                              \\
+local data                        &                &                                &                               \\
 \hhline{~-~~}
-                                  & \vdots         &                                &                              \\
+parameter area                    &                &                                &                               \\
 \hhline{~-~~}
+                                  & \vdots         &                                &                               \\
 \end{tabular}
 \caption{Stack layout on x64 Microsoft platform}
 \end{figure}
@@ -191,6 +195,7 @@
 exact but an upper bound on the number of used xmm registers)
 \item stack is always 16byte aligned - since return address is 64 bits in size, stacks with an odd number of parameters are
 already aligned
+\item no spill area is used on the stack, so iterating over varargs requires a specific va\_list implementation
 \end{itemize}
 
 
@@ -207,28 +212,31 @@
 
 \paragraph{Stack layout}
 
-Stack frame is always 16-byte aligned. Note that there is no spill area.
+Stack frame is always 16-byte aligned.
+% verified/amended: TP nov 2019 (see also doc/disas_examples/x64.sysv.disas)
 Stack directly after function prolog:\\
 
 \begin{figure}[h]
 \begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
+                             & \vdots         &                                &                              \\
 \hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
+register save area           & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data (with padding)    &                &                                &                              \\
 \hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
+\mrlbrace{3}{parameter area} & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+                             & \ldots         &                                &                              \\
+                             & arg 6          &                                &                              \\
 \hhline{~-~~}
-                                  & return address &                                &                              \\
+                             & return address &                                &                              \\
 \hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
+register save area           &                &                                & \mrrbrace{4}{current frame}  \\
 \hhline{~-~~}
-parameter area                    &                &                                &                              \\
+local data                   &                &                                &                              \\
 \hhline{~-~~}
-                                  & \vdots         &                                &                              \\
+parameter area               &                &                                &                              \\
 \hhline{~-~~}
+                             & \vdots         &                                &                              \\
 \end{tabular}
 \caption{Stack layout on x64 System V (Linux/*BSD)}
 \end{figure}
--- a/doc/manual/callconvs/callconv_x86.tex	Fri Nov 22 23:08:59 2019 +0100
+++ b/doc/manual/callconvs/callconv_x86.tex	Fri Nov 22 23:11:56 2019 +0100
@@ -1,762 +1,808 @@
-%//////////////////////////////////////////////////////////////////////////////
-%
-% Copyright (c) 2007,2009 Daniel Adler <dadler@uni-goettingen.de>, 
-%                         Tassilo Philipp <tphilipp@potion-studios.com>
-%
-% Permission to use, copy, modify, and distribute this software for any
-% purpose with or without fee is hereby granted, provided that the above
-% copyright notice and this permission notice appear in all copies.
-%
-% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
-% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
-% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
-% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
-% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
-% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
-% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-%
-%//////////////////////////////////////////////////////////////////////////////
-
-% ==================================================
-% x86 
-% ==================================================
-\subsection{x86 Calling Conventions}
-
-
-\paragraph{Overview}
-
-There are numerous different calling conventions on the x86 processor
-architecture, like cdecl \cite{x86cdecl}, MS fastcall \cite{x86Winfastcall}, GNU
-fastcall \cite{x86GNUfastcall}, Borland fastcall \cite{x86Borlandfastcall}, Watcom
-fastcall \cite{x86Watcomfastcall}, Win32 stdcall \cite{x86Winstdcall}, MS thiscall
-\cite{x86Winthiscall}, GNU thiscall \cite{x86GNUthiscall}, the pascal calling
-convention \cite{x86Pascal} and a cdecl-like version for Plan9 \cite{x86Plan9}
-(dubbed plan9call by us), etc.\\
-
-
-\paragraph{\product{dyncall} support}
-
-Currently cdecl, stdcall, fastcall (MS and GNU), thiscall (MS and GNU) and
-plan9call are supported.\\
-\\
-
-
-\subsubsection{cdecl}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch\\
-{\bf edx}     & scratch, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 cdecl calling convention}
-\end{table}
-
-
-\pagebreak
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item caller cleans up the stack
-\item all arguments are pushed onto the stack
-\end{itemize}
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers
-\item floating point types are returned via the st0 register (except on Minix, where they are returned as integers are)
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 cdecl calling convention}
-\end{figure}
-
-
-\pagebreak
-
-\subsubsection{MS fastcall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch, parameter 0\\
-{\bf edx}     & scratch, parameter 1, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 fastcall (MS) calling convention}
-\end{table}
-
-
-\pagebreak
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item called function cleans up the stack
-\item first two integers/pointers (\textless=\ 32bit) are passed via ecx and edx (even if preceded by other arguments)
-\item integer types 64 bits in size @@@ ? first in edx:eax ?
-\item if first argument is a 64 bit integer, it is passed via ecx and edx
-\item all other parameters are pushed onto the stack
-\end{itemize}
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@verify
-\item floating point types are returned via the st0 register@@@ really ?
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 fastcall (MS) calling convention}
-\end{figure}
-
-
-\pagebreak
-
-\subsubsection{GNU fastcall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch, parameter 0\\
-{\bf edx}     & scratch, parameter 1, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 fastcall (GNU) calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item called function cleans up the stack
-\item first two integers/pointers (\textless=\ 32bit) are passed via ecx and edx (even if preceded by other arguments)
-\item if first argument is a 64 bit integer, it is pushed on the stack and the two registers are skipped 
-\item all other parameters are pushed onto the stack
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register.
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers.
-\item floating point types are returned via the st0.
-\end{itemize}
-
-
-\pagebreak
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 fastcall (GNU) calling convention}
-\end{figure}
-
-
-\subsubsection{Borland fastcall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, parameter 0, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch, parameter 2\\
-{\bf edx}     & scratch, parameter 1, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 fastcall (Borland) calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: left-to-right
-\item called function cleans up the stack
-\item first three integers/pointers (\textless=\ 32bit) are passed via eax, ecx and edx (even if preceded by other arguments@@@?)
-\item integer types 64 bits in size @@@ ?
-\item all other parameters are pushed onto the stack
-\end{itemize}
-
-
-\pagebreak
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@ verify
-\item floating point types are returned via the st0 register@@@ really ?
-\end{itemize}
-
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 fastcall (Borland) calling convention}
-\end{figure}
-
-
-\subsubsection{Watcom fastcall}
-
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, parameter 0, return value@@@\\
-{\bf ebx}     & scratch when used for parameter, parameter 2\\
-{\bf ecx}     & scratch when used for parameter, parameter 3\\
-{\bf edx}     & scratch when used for parameter, parameter 1, return value@@@\\
-{\bf esi}     & scratch when used for return pointer @@@??\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 fastcall (Watcom) calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item called function cleans up the stack
-\item first four integers/pointers (\textless=\ 32bit) are passed via eax, edx, ebx and ecx (even if preceded by other arguments@@@?)
-\item integer types 64 bits in size @@@ ?
-\item all other parameters are pushed onto the stack
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register@@@verify, I thnik its esi?
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@ verify
-\item floating point types are returned via the st0 register@@@ really ?
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 fastcall (Watcom) calling convention}
-\end{figure}
-
-
-
-\subsubsection{win32 stdcall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch\\
-{\bf edx}     & scratch, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 stdcall calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item Stack parameter order: right-to-left
-\item Called function cleans up the stack
-\item All parameters are pushed onto the stack
-\item Stack is usually 4 byte aligned (GCC \textgreater=\ 3.x seems to use a 16byte alignement@@@)
-\item Function name is decorated by prepending an underscore character and appending a '@' character and the number of bytes of stack space required
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers
-\item floating point types are returned via the st0 register
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 stdcall calling convention}
-\end{figure}
-
-\subsubsection{MS thiscall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch, parameter 0\\
-{\bf edx}     & scratch, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 thiscall (MS) calling convention}
-\end{table}
-
-\newpage
-
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item called function cleans up the stack
-\item first parameter (this pointer) is passed via ecx
-\item all other parameters are pushed onto the stack
-\item Function name is decorated by prepending a '@' character and appending a '@' character and the number of bytes (decimal) of stack space required
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@verify
-\item floating point types are returned via the st0 register@@@ really ?
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 thiscall (MS) calling convention}
-\end{figure}
-
-
-
-\subsubsection{GNU thiscall}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch\\
-{\bf edx}     & scratch, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 thiscall (GNU) calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item caller cleans up the stack
-\item all parameters are pushed onto the stack
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@verify
-\item floating point types are returned via the st0 register@@@ really ?
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 thiscall (GNU) calling convention}
-\end{figure}
-
-
-
-\subsubsection{pascal}
-
-The best known uses of the pascal calling convention are the 16 bit OS/2 APIs, Microsoft Windows 3.x and Borland Delphi 1.x.
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & permanent\\
-{\bf ecx}     & scratch\\
-{\bf edx}     & scratch, return value\\
-{\bf esi}     & permanent\\
-{\bf edi}     & permanent\\
-{\bf ebp}     & permanent\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 pascal calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: left-to-right
-\item called function cleans up the stack
-\item all parameters are pushed onto the stack
-\end{itemize}
-
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits are returned via the eax and edx registers
-\item floating point types are returned via the st0 register
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\caption{Stack layout on x86 pascal calling convention}
-\end{figure}
-
-
-\newpage
-
-\subsubsection{plan9call}
-
-\paragraph{Registers and register usage}
-
-\begin{table}[h]
-\begin{tabular*}{0.95\textwidth}{3 B}
-Name          & Brief description\\
-\hline
-{\bf eax}     & scratch, return value\\
-{\bf ebx}     & scratch\\
-{\bf ecx}     & scratch\\
-{\bf edx}     & scratch\\
-{\bf esi}     & scratch\\
-{\bf edi}     & scratch\\
-{\bf ebp}     & scratch\\
-{\bf esp}     & stack pointer\\
-{\bf st0}     & scratch, floating point return value\\
-{\bf st1-st7} & scratch\\
-\end{tabular*}
-\caption{Register usage on x86 plan9call calling convention}
-\end{table}
-
-\paragraph{Parameter passing}
-
-\begin{itemize}
-\item stack parameter order: right-to-left
-\item caller cleans up the stack%@@@ doesn't belong to "parameter passing"
-\item all parameters are pushed onto the stack
-\end{itemize}
-
-\pagebreak
-
-\paragraph{Return values}
-
-\begin{itemize}
-\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
-\item integers \textgreater\ 32 bits or structures are returned by the caller allocating the space and
-passing a pointer to the callee as a new, implicit first parameter (this means, on the stack)
-\item floating point types are returned via the st0 register (called F0 in plan9 8a's terms)
-\end{itemize}
-
-
-\paragraph{Stack layout}
-
-Stack directly after function prolog:\\
-
-\begin{figure}[h]
-\begin{tabular}{5|3|1 1}
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~=~~}
-local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
-\hhline{~-~~}
-\mrlbrace{3}{parameter area}      & \ldots         & \mrrbrace{3}{stack parameters} &                              \\
-                                  & \ldots         &                                &                              \\
-                                  & \ldots         &                                &                              \\
-\hhline{~-~~}
-                                  & return address &                                &                              \\
-\hhline{~=~~}
-local data                        &                &                                & \mrrbrace{3}{current frame}  \\
-\hhline{~-~~}
-parameter area                    &                &                                &                              \\
-\hhline{~-~~}
-                                  & \vdots         &                                &                              \\
-\hhline{~-~~}
-\end{tabular}
-\\
-\\
-\\
-\caption{Stack layout on x86 plan9call calling convention}
-\end{figure}
+%//////////////////////////////////////////////////////////////////////////////
+%
+% Copyright (c) 2007-2019 Daniel Adler <dadler@uni-goettingen.de>, 
+%                         Tassilo Philipp <tphilipp@potion-studios.com>
+%
+% Permission to use, copy, modify, and distribute this software for any
+% purpose with or without fee is hereby granted, provided that the above
+% copyright notice and this permission notice appear in all copies.
+%
+% THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+% WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+% MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+% ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+% WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+% ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+% OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+%
+%//////////////////////////////////////////////////////////////////////////////
+
+% ==================================================
+% x86 
+% ==================================================
+\subsection{x86 Calling Conventions}
+
+
+\paragraph{Overview}
+
+On this processor, a word is defined to be 16 bits in size, a dword 32 bits
+and a qword 64 bits.\\
+
+There are numerous calling conventions on the x86 processor architecture,
+including cdecl \cite{x86cdecl}, MS fastcall \cite{x86Winfastcall}, GNU
+fastcall \cite{x86GNUfastcall}, Borland fastcall \cite{x86Borlandfastcall}, Watcom
+fastcall \cite{x86Watcomfastcall}, Win32 stdcall \cite{x86Winstdcall}, MS thiscall
+\cite{x86Winthiscall}, GNU thiscall \cite{x86GNUthiscall}, the pascal calling
+convention \cite{x86Pascal} and a cdecl-like version for Plan9 \cite{x86Plan9}
+(dubbed plan9call by us).\\
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{rccccc}
+                 & \# of regs & \# of regs  &               & cleanup & 64bit args \\
+Name             & for params & to preserve & push order    & by      & via regs?  \\
+\hline                           
+cdecl            & 0          & 4           & $\leftarrow$  & caller  & -          \\
+MS fastcall      & 2          & 4           & $\leftarrow$  & callee  & Y          \\
+GNU fastcall     & 2          & 4           & $\leftarrow$  & callee  & N          \\
+Borland fastcall & 3          & 4           & $\rightarrow$ & callee  & N          \\
+Watcom fastcall  & 4          & 2-6         & $\leftarrow$  & callee  & N          \\
+win32 stdcall    & 0          & 4           & $\leftarrow$  & callee  & -          \\
+MS thiscall      & 1          & 4           & $\leftarrow$  & callee  & N          \\
+GNU thiscall     & 0          & 4           & $\leftarrow$  & caller  & -          \\
+pascal           & 0          & 4           & $\rightarrow$ & callee  & -          \\
+plan9call        & 0          & 0           & $\leftarrow$  & caller  & -          \\
+\end{tabular*}
+\caption{short x86 calling convention comparison}
+\end{table}
+
+
+\paragraph{\product{dyncall} support}
+
+Currently cdecl, stdcall, fastcall (MS and GNU), thiscall (MS and GNU) and
+plan9call are supported.\\
+\\
+
+
+\newpage
+
+
+\subsubsection{cdecl}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch\\
+{\bf edx}     & scratch, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 cdecl calling convention}
+\end{table}
+
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item all arguments are pushed onto the stack
+\end{itemize}
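
The push order and cleanup rule above can be illustrated with a small model. This is only a sketch, not dyncall code: it assumes every argument occupies exactly one 4-byte stack slot, and the helper names are made up for illustration.

```python
def cdecl_stack_after_call(args):
    """Return the stack slots, lowest address first, right after 'call'.

    Arguments are pushed right-to-left, so arg 0 ends up directly above
    the return address. Assumption: one 4-byte slot per argument.
    """
    pushed = []                       # in push order (highest address first)
    for arg in reversed(args):        # right-to-left push order
        pushed.append(arg)
    pushed.append("return address")   # pushed by the call instruction
    return list(reversed(pushed))     # lowest address first


def cdecl_caller_cleanup_bytes(args, slot_size=4):
    """Bytes the caller removes from the stack once the callee has returned."""
    return slot_size * len(args)
```

For three arguments a, b and c the model places a directly above the return address, which matches the "arg 0" row in the stack layout figure of this subsection.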
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers
+\item return values \textgreater\ 64 bits (e.g. structures) are returned by the caller allocating the space and
+passing a pointer to the callee as a new, implicit first parameter (i.e., on the stack)
+\item floating point types are returned via the st0 register (except on Minix, where they are returned the same way as integers)
+\end{itemize}
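
As a rough summary of the return value rules above, the following sketch maps a return type to its location. The kind labels and the function name are assumptions made for this example. Cases the list does not spell out (for instance structures of 64 bits or less) are deliberately rejected rather than guessed, and the Minix exception is not modelled.

```python
def cdecl_return_location(kind, bits):
    """Where a cdecl callee leaves its return value, per the list above.

    kind: "int", "ptr", "float" or "struct" (labels chosen for this sketch).
    """
    if kind == "float":
        return "st0"                  # x87 register (Minix differs, not modelled)
    if bits > 64:
        # caller allocates the space and passes a pointer to it as an
        # implicit first parameter on the stack
        return "hidden pointer"
    if kind in ("int", "ptr") and bits <= 32:
        return "eax"
    if kind == "int" and bits <= 64:
        return "eax+edx"
    raise ValueError("case not covered by the list above")
```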
+
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/x86.cdecl.disas)
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots         &                                &                              \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots         &                                &                              \\
+                                  & arg 0          &                                &                              \\
+\hhline{~-~~}
+                                  & return address &                                &                              \\
+\hhline{~=~~}
+register save area                &                &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+parameter area                    &                &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots         &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 cdecl calling convention}
+\end{figure}
+
+
+\newpage
+
+
+\subsubsection{MS fastcall}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch, parameter 0\\
+{\bf edx}     & scratch, parameter 1, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 fastcall (MS) calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item called function cleans up the stack
+\item first two integers/pointers (\textless=\ 32bit) are passed via ecx and edx (even if preceded by other arguments)
+\item if first argument is a 64 bit integer, it is passed via ecx and edx
+\item all other parameters are pushed onto the stack
+\end{itemize}
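
The register assignment described above can be modelled as follows. This is a sketch that only encodes the rules exactly as listed here. The (kind, bits) argument description is invented for the example, and which half of a 64 bit value goes into which register is not stated above, so the model just reports "ecx+edx".

```python
def ms_fastcall_assign(args):
    """Map each argument to "ecx", "edx", "ecx+edx" or "stack".

    args: list of (kind, bits) with kind in {"int", "ptr", "float"};
    this representation is an assumption made for the sketch.
    """
    free = ["ecx", "edx"]
    placement = []
    for index, (kind, bits) in enumerate(args):
        if kind in ("int", "ptr") and bits <= 32 and free:
            # first two integers/pointers, even if preceded by other args
            placement.append(free.pop(0))
        elif index == 0 and kind == "int" and bits == 64:
            # a 64 bit integer as first argument takes both registers
            placement.append("ecx+edx")
            free.clear()
        else:
            placement.append("stack")
    return placement
```

Note that a leading float does not consume a register in this model, which is how "even if preceded by other arguments" is read here.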
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@verify
+\item floating point types are returned via the st0 register@@@ really ?
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}               &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                            &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & last arg                   & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & first arg passed via stack &                                &                              \\
+\hhline{~-~~}
+                                  & return address             &                                &                              \\
+\hhline{~=~~}
+register save area                &                            &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}
+local data                        &                            &                                &                              \\
+\hhline{~-~~}
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots                     &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 fastcall (MS) calling convention}
+\end{figure}
+
+
+\pagebreak
+
+\subsubsection{GNU fastcall}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch, parameter 0\\
+{\bf edx}     & scratch, parameter 1, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 fastcall (GNU) calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item called function cleans up the stack
+\item first two integers/pointers (\textless=\ 32bit) are passed via ecx and edx (even if preceded by other arguments)
+\item if first argument is a 64 bit integer, it is pushed on the stack and the two registers are skipped 
+\item all other parameters are pushed onto the stack
+\end{itemize}
+
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register.
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers.
+\item floating point types are returned via the st0 register.
+\end{itemize}
+
+
+\pagebreak
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}                                                 
+register save area                & \hspace{4cm}               &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}                                                 
+local data                        &                            &                                &                              \\
+\hhline{~-~~}                                                 
+\mrlbrace{3}{parameter area}      & last arg                   & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & first arg passed via stack &                                &                              \\
+\hhline{~-~~}
+                                  & return address             &                                &                              \\
+\hhline{~=~~}                                                  
+register save area                &                            &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}                                                  
+local data                        &                            &                                &                              \\
+\hhline{~-~~}                                                  
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}                                                  
+                                  & \vdots                     &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 fastcall (GNU) calling convention}
+\end{figure}
+
+
+\subsubsection{Borland fastcall}
+
+Also called {\bf register convention} by Borland.
+
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, parameter 0, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch, parameter 2\\
+{\bf edx}     & scratch, parameter 1, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 fastcall (Borland) calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: left-to-right
+\item called function cleans up the stack
+\item first three integers/pointers (\textless=\ 32bit), with the exception of method pointers, are passed via eax, edx and ecx (even if preceded by or interleaved with other arguments)
+\item all other parameters are pushed onto the stack
+\end{itemize}
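
A minimal model of the register assignment above, assuming the register order given in the register usage table of this subsection (eax for parameter 0, edx for parameter 1, ecx for parameter 2). The (kind, bits) tuples and the "method_ptr" label are hypothetical names used only in this sketch.

```python
def borland_fastcall_assign(args):
    """Map each argument to "eax", "edx", "ecx" or "stack".

    args: list of (kind, bits); only kinds "int" and "ptr" of at most
    32 bits are register candidates, everything else (including the
    hypothetical kind "method_ptr") goes to the stack.
    """
    free = ["eax", "edx", "ecx"]      # order taken from the register table
    placement = []
    for kind, bits in args:
        if kind in ("int", "ptr") and bits <= 32 and free:
            placement.append(free.pop(0))
        else:
            placement.append("stack")
    return placement
```

Arguments that end up on the stack are pushed left-to-right, so the first stack argument sits at the highest address, as shown in the stack layout figure of this subsection.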
+
+
+\pagebreak
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers
+\item floating point types are returned via the st0 register
+\item all others (e.g. all structs, return values \textgreater\ 64 bits, ...) are returned by the caller allocating the space and
+passing a pointer to the callee as a new, implicit first parameter
+\end{itemize}
+
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}                                                  
+register save area                & \hspace{4cm}               &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                            &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & first arg passed via stack & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & last arg                   &                                &                              \\
+\hhline{~-~~}                                                  
+                                  & return address             &                                &                              \\
+\hhline{~=~~}                                                  
+register save area                &                            &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}                                                  
+local data                        &                            &                                &                              \\
+\hhline{~-~~}                                                  
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}                                                  
+                                  & \vdots                     &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 fastcall (Borland) calling convention}
+\end{figure}
+
+
+\subsubsection{Watcom fastcall}
+
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, parameter 0, return value\\
+{\bf ebx}     & scratch when used for parameter, otherwise preserve, parameter 2\\
+{\bf ecx}     & scratch when used for parameter, otherwise preserve, parameter 3\\
+{\bf edx}     & scratch when used for parameter, otherwise preserve, parameter 1, return value\\
+{\bf esi}     & scratch when used for return pointer, otherwise preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 fastcall (Watcom) calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item called function cleans up the stack
+\item first four integers/pointers (\textless=\ 32bit) are passed via eax, edx, ebx and ecx (even if preceded by other arguments)
+\item arguments \textgreater\ 32 bits, as well as all subsequent arguments, are passed via the stack
+\item all other parameters are pushed onto the stack
+\item varargs are always passed via the stack
+\end{itemize}
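
The rules above can be sketched as a small model. It encodes only what the list states, under two assumptions made for this example: the argument description as (kind, bits) tuples, and reading the varargs rule as "every argument of a variadic function goes onto the stack".

```python
def watcom_fastcall_assign(args, variadic=False):
    """Map each argument to "eax", "edx", "ebx", "ecx" or "stack".

    args: list of (kind, bits) with kind in {"int", "ptr", "float"}.
    variadic: True models a function taking varargs (assumed to force
    every argument onto the stack).
    """
    free = ["eax", "edx", "ebx", "ecx"]
    placement = []
    stack_only = variadic
    for kind, bits in args:
        if bits > 32:
            # this argument and all subsequent ones use the stack
            stack_only = True
        if not stack_only and kind in ("int", "ptr") and free:
            placement.append(free.pop(0))
        else:
            placement.append("stack")
    return placement
```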
+
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers@@@ verify
+\item floating point types are returned via the st0 register@@@ really ?
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots                     &                                &                              \\
+\hhline{~=~~}                                                
+register save area                & \hspace{4cm}               &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                            &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & last arg                   & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots                     &                                &                              \\
+                                  & first arg passed via stack &                                &                              \\
+\hhline{~-~~}
+                                  & return address             &                                &                              \\
+\hhline{~=~~}                                                  
+register save area                &                            &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}                                                  
+local data                        &                            &                                &                              \\
+\hhline{~-~~}                                                  
+parameter area                    &                            &                                &                              \\
+\hhline{~-~~}                                                  
+                                  & \vdots                     &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 fastcall (Watcom) calling convention}
+\end{figure}
+
+
+
+\subsubsection{win32 stdcall}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch\\
+{\bf edx}     & scratch, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 stdcall calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item Stack parameter order: right-to-left
+\item Called function cleans up the stack
+\item All parameters are pushed onto the stack
+\item Stack is usually 4 byte aligned (GCC \textgreater=\ 3.x seems to use a 16 byte alignment)
+\item the direction flag is clear on entry and must be returned clear % mention it first, above @@@
+\end{itemize}
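+
+The rules above can be illustrated with a small model written in C. This is only a sketch under stated assumptions, not dyncall code and not what a compiler emits: the names \texttt{ModelStack}, \texttt{model\_init}, \texttt{model\_push}, \texttt{stdcall\_call} and \texttt{stdcall\_ret} are hypothetical, and every argument is assumed to occupy exactly one 4 byte stack slot.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a 32 bit stack that grows towards lower addresses.
   All names here are illustrative only. */
enum { MODEL_SLOTS = 64 };

typedef struct {
    uint32_t slot[MODEL_SLOTS];
    size_t   sp;               /* index of the current top-of-stack slot */
} ModelStack;

void model_init(ModelStack *s)
{
    s->sp = MODEL_SLOTS;       /* empty stack: sp is one past the last slot */
}

void model_push(ModelStack *s, uint32_t value)
{
    assert(s->sp > 0);         /* the model has a fixed capacity */
    s->slot[--s->sp] = value;
}

/* Caller side: push the arguments right-to-left, then the return address
   (the latter is what a 'call' instruction would do). */
void stdcall_call(ModelStack *s, const uint32_t *args, size_t n, uint32_t ret_addr)
{
    for (size_t i = n; i-- > 0; )
        model_push(s, args[i]);
    model_push(s, ret_addr);
}

/* Callee side: pop the return address and also remove the n argument
   slots, the way a 'ret 4*n' instruction would. */
uint32_t stdcall_ret(ModelStack *s, size_t n)
{
    assert(s->sp + 1 + n <= MODEL_SLOTS);
    uint32_t ret_addr = s->slot[s->sp];
    s->sp += 1 + n;
    return ret_addr;
}
```

+After \texttt{stdcall\_call}, the return address is on top of the stack and the first argument sits directly above it. After \texttt{stdcall\_ret}, the stack pointer is back where it was before the caller pushed anything, so the caller needs no cleanup code of its own.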
+
+% introduce mangling section? \item Function name is decorated by prepending an underscore character and appending a '@' character and the number of bytes of stack space required
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers
+\item floating point types are returned via the st0 register
+\end{itemize}
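+
+As a sketch of the second rule, the following C fragment splits a 64 bit integer result across the two registers. It assumes the usual x86 pairing, with eax holding the low half and edx the high half. The type \texttt{ReturnRegs} and both function names are hypothetical and exist only for this illustration.
+

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two return registers; illustrative only. */
typedef struct {
    uint32_t eax;   /* low 32 bits of the result  */
    uint32_t edx;   /* high 32 bits of the result */
} ReturnRegs;

/* Reassemble the 64 bit value on the caller's side. */
uint64_t join_return(ReturnRegs r)
{
    return ((uint64_t)r.edx << 32) | r.eax;
}

/* Split a 64 bit integer the way it is handed back in edx:eax. */
ReturnRegs split_return(uint64_t value)
{
    ReturnRegs r;
    r.eax = (uint32_t)(value & 0xffffffffu);
    r.edx = (uint32_t)(value >> 32);
    assert(join_return(r) == value);   /* round-trip sanity check */
    return r;
}
```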
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots         &                                &                              \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots         &                                &                              \\
+                                  & arg 0          &                                &                              \\
+\hhline{~-~~}
+                                  & return address &                                &                              \\
+\hhline{~=~~}
+register save area                &                &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+parameter area                    &                &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots         &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 stdcall calling convention}
+\end{figure}
+
+\subsubsection{MS thiscall}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch, parameter 0\\
+{\bf edx}     & scratch, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 thiscall (MS) calling convention}
+\end{table}
+
+\newpage
+
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item called function cleans up the stack
+\item first parameter (this pointer) is passed via ecx
+\item all other parameters are pushed onto the stack
+\end{itemize}
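+
+A minimal sketch of the resulting machine state, written as a C model. The type \texttt{ThiscallFrame} and the two functions are hypothetical. Pointers and arguments are assumed to be 4 bytes wide, and \texttt{args[0]} is the first explicit argument, which the stack layout figure labels arg 1 because the this pointer counts as parameter 0.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { FRAME_MAX_ARGS = 8 };

/* Hypothetical picture of the machine state right after the 'call'. */
typedef struct {
    uint32_t ecx;                        /* this pointer (parameter 0)     */
    uint32_t stack[1 + FRAME_MAX_ARGS];  /* stack[0] is the top of stack   */
    size_t   depth;                      /* number of used stack slots     */
} ThiscallFrame;

/* Build the state for a method call with n explicit arguments. */
ThiscallFrame thiscall_ms_frame(uint32_t this_ptr, const uint32_t *args,
                                size_t n, uint32_t ret_addr)
{
    ThiscallFrame f = {0};
    assert(n <= FRAME_MAX_ARGS);
    f.ecx = this_ptr;                    /* never goes onto the stack */
    f.stack[0] = ret_addr;
    /* Pushing right-to-left leaves the first explicit argument closest
       to the return address. */
    for (size_t i = 0; i < n; ++i)
        f.stack[1 + i] = args[i];
    f.depth = 1 + n;
    return f;
}

/* Bytes the callee removes besides the return address ('ret 4*n'). */
size_t thiscall_ms_callee_cleanup(size_t n)
{
    return 4 * n;
}
```

+Note that the this pointer does not contribute to the cleanup size, since it never occupies a stack slot.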
+
+% introduce mangling section? \item Function name is decorated by prepending a '@' character and appending a '@' character and the number of bytes (decimal) of stack space required
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers % @@@ verify
+\item floating point types are returned via the st0 register % @@@ really?
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots         &                                &                              \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots         &                                &                              \\
+                                  & arg 1          &                                &                              \\
+\hhline{~-~~}
+                                  & return address &                                &                              \\
+\hhline{~=~~}
+register save area                &                &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+parameter area                    &                &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots         &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 thiscall (MS) calling convention}
+\end{figure}
+
+
+
+\subsubsection{GNU thiscall}
+
+This is equivalent to the cdecl calling convention, with the first parameter being the this pointer.
+
+% \paragraph{Registers and register usage}
+% 
+% \begin{table}[h]
+% \begin{tabular*}{0.95\textwidth}{3 B}
+% Name          & Brief description\\
+% \hline
+% {\bf eax}     & scratch, return value\\
+% {\bf ebx}     & preserve\\
+% {\bf ecx}     & scratch\\
+% {\bf edx}     & scratch, return value\\
+% {\bf esi}     & preserve\\
+% {\bf edi}     & preserve\\
+% {\bf ebp}     & preserve\\
+% {\bf esp}     & stack pointer\\
+% {\bf st0}     & scratch, floating point return value\\
+% {\bf st1-st7} & scratch\\
+% \end{tabular*}
+% \caption{Register usage on x86 thiscall (GNU) calling convention}
+% \end{table}
+% 
+% \paragraph{Parameter passing}
+% 
+% \begin{itemize}
+% \item stack parameter order: right-to-left
+% \item caller cleans up the stack
+% \item all parameters are pushed onto the stack
+% \end{itemize}
+% 
+% 
+% \paragraph{Return values}
+% 
+% \begin{itemize}
+% \item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+% \item integers \textgreater\ 32 bits are returned via the eax and edx registers
+% \item floating point types are returned via the st0 register
+% \end{itemize}
+% 
+% 
+% \paragraph{Stack layout}
+% 
+% Stack directly after function prolog:\\
+% 
+% \begin{figure}[h]
+% \begin{tabular}{5|3|1 1}
+%                                   & \vdots         &                                &                              \\
+% \hhline{~=~~}
+% register save area                & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+% \hhline{~-~~}
+% local data                        &                &                                &                              \\
+% \hhline{~-~~}
+% \mrlbrace{3}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+%                                   & \ldots         &                                &                              \\
+%                                   & arg 0          &                                &                              \\
+% \hhline{~-~~}
+%                                   & return address &                                &                              \\
+% \hhline{~=~~}
+% register save area                &                &                                & \mrrbrace{4}{current frame}  \\
+% \hhline{~-~~}
+% local data                        &                &                                &                              \\
+% \hhline{~-~~}
+% parameter area                    &                &                                &                              \\
+% \hhline{~-~~}
+%                                   & \vdots         &                                &                              \\
+% \end{tabular}
+% \caption{Stack layout on x86 thiscall (GNU) calling convention}
+% \end{figure}
+
+
+
+\subsubsection{pascal}
+
+The best known uses of the pascal calling convention are the 16 bit OS/2 APIs, Microsoft Windows 3.x and Borland Delphi 1.x.
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & preserve\\
+{\bf ecx}     & scratch\\
+{\bf edx}     & scratch, return value\\
+{\bf esi}     & preserve\\
+{\bf edi}     & preserve\\
+{\bf ebp}     & preserve\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 pascal calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: left-to-right
+\item called function cleans up the stack
+\item all parameters are pushed onto the stack
+\end{itemize}
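+
+The practical effect of the left-to-right order is that the argument positions are mirrored compared to the right-to-left conventions. The following C sketch computes where argument $i$ of $n$ is found relative to esp on entry to the callee. Both function names are hypothetical, and every argument as well as the return address is assumed to take 4 bytes.
+

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte offset from esp, on entry to the callee, of
   argument i out of n for the pascal convention. */
size_t pascal_arg_offset(size_t i, size_t n)
{
    assert(i < n);
    /* Left-to-right pushing: the last argument is pushed last and thus
       sits directly above the 4 byte return address. */
    return 4 + 4 * (n - 1 - i);
}

/* Same question for a right-to-left convention such as stdcall. */
size_t rtl_arg_offset(size_t i, size_t n)
{
    assert(i < n);
    /* Right-to-left pushing: the first argument is pushed last. */
    return 4 + 4 * i;
}
```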
+
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits are returned via the eax and edx registers
+\item floating point types are returned via the st0 register
+\end{itemize}
+
+
+\paragraph{Stack layout}
+
+Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots         &                                &                              \\
+\hhline{~=~~}
+register save area                & \hspace{4cm}   &                                & \mrrbrace{6}{caller's frame} \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & arg 0          & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots         &                                &                              \\
+                                  & arg n-1        &                                &                              \\
+\hhline{~-~~}
+                                  & return address &                                &                              \\
+\hhline{~=~~}
+register save area                &                &                                & \mrrbrace{4}{current frame}  \\
+\hhline{~-~~}
+local data                        &                &                                &                              \\
+\hhline{~-~~}
+parameter area                    &                &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots         &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 pascal calling convention}
+\end{figure}
+
+
+\newpage
+
+\subsubsection{plan9call}
+
+\paragraph{Registers and register usage}
+
+\begin{table}[h]
+\begin{tabular*}{0.95\textwidth}{3 B}
+Name          & Brief description\\
+\hline
+{\bf eax}     & scratch, return value\\
+{\bf ebx}     & scratch\\
+{\bf ecx}     & scratch\\
+{\bf edx}     & scratch\\
+{\bf esi}     & scratch\\
+{\bf edi}     & scratch\\
+{\bf ebp}     & scratch\\
+{\bf esp}     & stack pointer\\
+{\bf st0}     & scratch, floating point return value\\
+{\bf st1-st7} & scratch\\
+\end{tabular*}
+\caption{Register usage on x86 plan9call calling convention}
+\end{table}
+
+\paragraph{Parameter passing}
+
+\begin{itemize}
+\item stack parameter order: right-to-left
+\item caller cleans up the stack
+\item all parameters are pushed onto the stack
+\end{itemize}
+
+\pagebreak
+
+\paragraph{Return values}
+
+\begin{itemize}
+\item return values of pointer or integral type (\textless=\ 32 bits) are returned via the eax register
+\item integers \textgreater\ 32 bits and structures are returned via caller-allocated space: the caller
+passes a pointer to that space to the callee as a new, implicit first parameter (that is, on the stack)
+\item floating point types are returned via the st0 register (called F0 in plan9 8a's terms)
+\end{itemize}
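+
+The implicit first parameter can be pictured by writing the transformation out by hand in C. This is a sketch of the idea only, not output of the Plan 9 compilers. The function names \texttt{mul\_wide} and \texttt{mul\_wide\_lowered} are hypothetical.
+

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What the programmer writes: a function returning a 64 bit integer. */
int64_t mul_wide(int32_t a, int32_t b)
{
    return (int64_t)a * b;
}

/* Hand-written equivalent of what is passed at machine level: the caller
   allocates the space for the result and passes its address as a new
   first parameter, ahead of the declared ones. */
void mul_wide_lowered(int64_t *result, int32_t a, int32_t b)
{
    assert(result != NULL);    /* the caller must provide the space */
    *result = (int64_t)a * b;
}
```

+A caller of the second form would declare a local \texttt{int64\_t}, pass its address, and read the result from that local after the call returns.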
+
+
+\paragraph{Stack layout}
+
+% verified/amended: TP nov 2019 (see also doc/disas_examples/x86.plan9call.disas)
+Note there is no register save area at all. Stack directly after function prolog:\\
+
+\begin{figure}[h]
+\begin{tabular}{5|3|1 1}
+                                  & \vdots         &                                &                              \\
+\hhline{~=~~}
+local data                        & \hspace{4cm}   &                                & \mrrbrace{5}{caller's frame} \\
+\hhline{~-~~}
+\mrlbrace{3}{parameter area}      & arg n-1        & \mrrbrace{3}{stack parameters} &                              \\
+                                  & \ldots         &                                &                              \\
+                                  & arg 0          &                                &                              \\
+\hhline{~-~~}
+                                  & return address &                                &                              \\
+\hhline{~=~~}
+local data                        &                &                                & \mrrbrace{3}{current frame}  \\
+\hhline{~-~~}
+parameter area                    &                &                                &                              \\
+\hhline{~-~~}
+                                  & \vdots         &                                &                              \\
+\end{tabular}
+\caption{Stack layout on x86 plan9call calling convention}
+\end{figure}
+