0
|
1 \documentclass[11pt]{article}
|
|
2 \usepackage[round]{natbib}
|
|
3 \usepackage{hyperref}
|
|
4 \usepackage{amsmath}
|
|
5 \usepackage{fancyvrb}
|
|
6 \usepackage{verbatim}
|
|
7 \usepackage{alltt,graphicx}
|
|
8 \usepackage{fullpage}
|
|
9 \bibliographystyle{abbrvnat}
|
|
10 \newcommand{\file}[1]{{`\normalfont\textsf{#1}'}}
|
|
11 \newcommand{\strong}[1]{\texorpdfstring%
|
|
12 {{\normalfont\fontseries{b}\selectfont #1}}%
|
|
13 {#1}}
|
|
14 \let\pkg=\strong
|
|
15 \makeatletter\newcommand\code{\bgroup\@codex}
|
|
16 \def\@codex#1{\texorpdfstring%
|
|
17 {{\normalfont\ttfamily\hyphenchar\font=-1 #1}}%
|
|
18 {#1}\egroup}\makeatother
|
|
19 \newenvironment{smallverbatim}{\small\verbatim}{\endverbatim}
|
|
20 \newenvironment{example}{\begin{alltt}}{\end{alltt}}
|
|
21 \newenvironment{smallexample}{\begin{alltt}\small}{\end{alltt}}
|
|
22
|
|
23 \begin{document}
|
|
24
|
|
25
|
|
26 \title{Foreign Library Interface}
|
|
27 %\VignetteIndexEntry{Foreign Library Interface}
|
|
28 \author{by Daniel Adler}
|
|
29 \maketitle
|
|
30 \abstract{
|
|
31 We present an improved Foreign Function Interface (FFI) for R to
|
|
32 call arbitrary native functions without the need for C wrapper code.
|
|
33 Further, we discuss a dynamic linkage
|
|
34 framework for binding standard C libraries to R across platforms using a
|
|
35 universal type information format.
|
|
36 The package \pkg{rdyncall} comprises the framework
|
|
37 and an initial repository of cross-platform bindings for standard libraries such as
|
|
38 (legacy and modern) \emph{OpenGL}, the family of \emph{SDL} libraries and \emph{Expat}.
|
|
39 The package enables system-level programming using the R language;
|
|
40 sample applications are given in the article.
|
|
41 We outline the underlying automation tool-chain that extracts
|
|
42 cross-platform bindings from C headers, making the
|
|
43 repository extendable and open for library developers.
|
|
44 }
|
|
45 \section{Introduction}
|
|
46
|
|
47 \begin{table*}
|
|
48 \centering
|
|
49 \label{tab:libs}
|
|
50 \begin{tabular}{l|l|c|c|c}
|
|
51 lib/dynport & description & functions & constants & aggregate types \\
|
|
52 \hline
|
|
53 \code{gl} & opengl & 337 & 3253 & - \\
|
|
54 \code{glu} & opengl utility & 59 & 154 & - \\
|
|
55 \code{r} & r library & 238 & 700 & 27 \\
|
|
56 \code{sdl} & audio/video/ui abstraction & 203 & 465 & 51 \\
|
|
57 \code{sdl\_image} & pixel format loaders & 29 & - & - \\
|
|
58 \code{sdl\_mixer} & music format loaders and playing & 63 & 12 & - \\
|
|
59 \code{sdl\_ttf} & font format loaders & 35 & 9 & - \\
|
|
60 \code{cuda} & gpu programming & 387 & 665 & 84 \\
|
|
61 \code{expat} & xml parsing framework & 65 & 70 & - \\
|
|
62 \code{glew} & gl extensions & 1465 & - & - \\
|
|
63 \code{gl3} & opengl 3 (strict) & 324 & 838 & 1 \\
|
|
64 \code{opencl} & gpu programming & 78 & 260 & 10 \\
|
|
65 \code{stdio} & standard i/o & 76 & 3 & - \\
|
|
66 \end{tabular}
|
|
67 \caption{Overview of available DynPorts for portable C libraries}
|
|
68 \end{table*}
|
|
69
|
|
70 We present an improved Foreign Function Interface (FFI) for R that
|
|
71 significantly reduces the amount of C wrapper code needed to interface with C.
|
|
72 We also introduce a \emph{dynamic} linkage that binds the C
|
|
73 interface of a pre-compiled library (\emph{as a whole}) to an
|
|
74 interpreted programming environment \citep{Oust97a} such as R, hence the name
|
|
75 \emph{Foreign Library Interface}. Table \ref{tab:libs} gives a list
|
|
76 of the C libraries currently supported across major R platforms.
|
|
77 For each library supported, abstract interface specifications are declared
|
|
78 in a compact platform-neutral text-based format stored in so-called
|
|
79 \emph{DynPort} files on a local repository.
|
|
80
|
|
81 %between high-level interpreted programming environments
|
|
82 %and native pre-compiled C libraries that uses a compact text-based
|
|
83 %interface and type information format that makes this method work across platforms.
|
|
84
|
|
85 R \citep{R:Ihaka+Gentleman:1996} was chosen as the first language
|
|
86 for a proof-of-concept implementation of this approach.
|
|
87 This article describes the \pkg{rdyncall} package which
|
|
88 implements a complete toolkit of low-level facilities that can be used as an
|
|
89 alternative FFI to interface with the C programming language.
|
|
90 Furthermore, it enables direct and quick access to
|
|
91 the common C libraries from R without compilation.
|
|
92
|
|
93 The project was motivated by the fact that
|
|
94 high-quality software solutions implemented in portable C
|
|
95 are often not available in interpreter-based languages such as R.
|
|
96 The pool of freely available C libraries is quite large and
|
|
97 represents an invaluable resource for software development.
|
|
98 For example, OpenGL \citep{Board05} is the most portable and standard interface to
|
|
99 accelerated graphics hardware for developing real-time graphics software.
|
|
100 The combination of OpenGL with the \emph{Simple DirectMedia Layer} (SDL) \citep{SDL}
|
|
101 core and extension libraries offers a foundation framework for
|
|
102 developing interactive multimedia applications that can run on a
|
|
103 multitude of platforms.
|
|
104 Other libraries such as the Expat XML Parser \citep{www:expat} provide a parser framework
|
|
105 for processing very large XML documents.
|
|
106 Even the C library of R contains high-quality statistical
|
|
107 functions that are useful in the context of other languages as well.
|
|
108
|
|
109 To make use of these libraries within high-level languages, \emph{language bindings}
|
|
110 to the library must be written as an extension to the language, a task that
|
|
111 requires deep familiarity with the internals of both the library and the interpreter.
|
|
112 Depending on the complexity of the library, the amount of work needed to wrap
|
|
113 the interface can be very large (Table \ref{tab:libs} gives the counts of
|
|
114 functions, constants and types that need to be wrapped).
|
|
115 Rather than having to write a separate binding for each \emph{library and language}
|
|
116 combination, we investigate a dynamic binding approach that
|
|
117 is adaptable to interpreters and works cross-platform without additional
|
|
118 compilation of wrapper layers.
|
|
119 Once the binding specification for a library has been written, that
|
|
120 library becomes automatically accessible to all interpreters that
|
|
121 implement a framework such as the one outlined here.
|
|
122 Extension techniques offered by the language interpreter, such as a
|
|
123 \emph{Foreign Function Interface} (FFI), are the fundamental technology
|
|
124 for bridging the dynamic interpreter with statically pre-compiled code.
|
|
125
|
|
126 In the case of R the built-in FFI function \code{.C} provides a fairly
|
|
127 basic call gate to C code with strong limitations; additional wrapper code has
|
|
128 to be written to interface with standard C libraries.
|
|
129 \pkg{rdyncall} contributes an improved FFI for R that offers a \emph{flexible}
|
|
130 and \emph{type-safe} interface with support for almost all C types without
|
|
131 requiring additional C wrappers.
|
|
132
|
|
133 Based on this FFI, the package contains a proof-of-concept implementation of a \emph{Foreign Library Interface} that enables
|
|
134 \emph{direct} and \emph{dynamic} interoperability with foreign C libraries
|
|
135 (including shared library code and the Application Programming Interface
|
|
136 specified in C headers) from within the R interpreter.
|
|
137 For each C library supported, abstract interface specifications are declared in a
|
|
138 compact platform-neutral text-based format stored in a so-called \emph{DynPort} file
|
|
139 located in a local repository within the package.
|
|
140 Table \ref{tab:libs} gives a sample list of available bindings that come with the package.
|
|
141
|
|
142 Users gain access to C libraries from R using the front-end function \code{dynport(}\emph{portname}\code{)},
|
|
143 which processes a \emph{DynPort} file to load the C library\footnote{Pre-compiled libraries need to be installed; OS-specific installation notes are given in the documentation of the package.},
|
|
144 and wrap the C interface as a newly attached R environment
|
|
145 \footnote{Note \pkg{rdyncall} version 0.7.4 and below uses R name space objects \citep{RNameSpace} as dynport containers. This has changed starting with version 0.7.5 due to restrictions for packages hosted on CRAN not to use internal functions. Since there is no public interface for the creation of name space objects currently in R, \pkg{rdyncall} uses ordinary environment objects for now.
|
|
146 This disables the use of the double colon operator (\code{::}) to refer to dynport objects; unloading is done using \code{detach(dynport:<PORTNAME>)}.}
|
|
147 that uses the same symbolic names of the C API.
|
|
148 R code that uses C interfaces via \emph{DynPort}s might look very similar to C user code.
|
|
149
|
|
150 This article motivates the topic with a comparison of the built-in and
|
|
151 contributed FFI by means of a simple use case. This leads to a detailed description of the improved FFI.
|
|
152 Then follows an overview of the package and a brief tour through the framework
|
|
153 with details on the handling of foreign C data types and wrapping R functions as callbacks.
|
|
154 Two sample applications are given using OpenGL, SDL and Expat.
|
|
155 The article ends with a brief description of the implementation based on C libraries from the \emph{DynCall} project \citep{dyncall}
|
|
156 and the tool-chain that was used to create the repository of \emph{DynPort} files.
|
|
157
|
|
158 \section{Foreign Function Interfaces}
|
|
159
|
|
160 FFIs provide the backbone of a language to interface with foreign code.
|
|
161 Depending on the design of this service,
|
|
162 it can largely unburden developers from writing additional wrapper code.
|
|
163 In this section, we compare the built-in FFI with the improved
|
|
164 FFI provided by \pkg{rdyncall} using a simple example that sketches
|
|
165 the different work flow paths for making an R binding to a function
|
|
166 from a foreign C library.
|
|
167
|
|
168 \subsection{FFI of base R}
|
|
169
|
|
170 Suppose that we wish to invoke the C function \code{sqrt} of the
|
|
171 C Standard Math library. The function is declared as follows in C:
|
|
172 \begin{verbatim}
|
|
173 double sqrt(double x);
|
|
174 \end{verbatim}
|
|
175
|
|
176 R offers a number of functions to call pre-compiled code from
|
|
177 within the R interpreter. While \code{.Call} and \code{.External}
|
|
178 are designed for interoperability with \emph{extension} code, \code{.C}
|
|
179 and \code{.Fortran} seem to offer the most low-level interoperability with
|
|
180 \emph{foreign} code.
|
|
181 However, \code{.C} also has very strict conversion rules and strong limitations
|
|
182 regarding argument and return-types:
|
|
183 \code{.C} passes R arguments as C pointers and
|
|
184 C return types are not supported, so only C \code{void} functions,
|
|
185 which are procedures, can be called.
|
|
186 Given these limitations, we are not able to invoke the foreign
|
|
187 \code{sqrt} function directly and need some intermediate wrapper code
|
|
188 written in C that obeys the rules of the \code{.C} interface:
|
|
189
|
|
190 \begin{smallverbatim}
|
|
191 #include <math.h>
|
|
192 void R_C_sqrt(double * ptr_to_x)
|
|
193 {
|
|
194 double x = ptr_to_x[0], ans;
|
|
195 ans = sqrt(x);
|
|
196 ptr_to_x[0] = ans;
|
|
197 }
|
|
198 \end{smallverbatim}
|
|
199
|
|
200
|
|
201 We assume that the wrapper code is deployed as a shared library
|
|
202 in a package named \emph{testsqrt} which links to the C math library
|
|
203 \footnote{We omit details such as registering C functions, which is
|
|
204 described in detail in the R manual `\emph{Writing R Extensions}' \citep{RExt}.}.
|
|
205 Then we load the \emph{testsqrt} package and call the C wrapper function directly
|
|
206 via \code{.C}.
|
|
207
|
|
208 \begin{example}
|
|
209 > library(testsqrt)
|
|
210 > .C("R_C_sqrt", 144, PACKAGE="testsqrt")
|
|
211 [[1]]
|
|
212 [1] 12
|
|
213 \end{example}
|
|
214
|
|
215 To make \code{sqrt} available as a public function, an additional
|
|
216 R wrapper layer is added, that does type-safety checks before
|
|
217 issuing the \code{.C} call.
|
|
218
|
|
219 \begin{smallverbatim}
|
|
220 sqrtViaC <- function(x)
|
|
221 {
|
|
222 x <- as.numeric(x) # type(x) should be C double.
|
|
223 # make sure length > 0:
|
|
224 length(x) <- max(1, length(x))
|
|
225 .C("R_C_sqrt", x, PACKAGE="testsqrt")[[1]]
|
|
226 }
|
|
227 \end{smallverbatim}
|
|
228
|
|
229 As an alternative, R also provides high-level C extension interfaces
|
|
230 such as \code{.Call} and \code{.External}, that give access to R internals
|
|
231 at the C level and make it possible to perform type-safety checks within C:
|
|
232
|
|
233 \begin{smallverbatim}
|
|
234 #include <R.h>
|
|
235 #include <Rinternals.h>
|
|
236 #include <math.h>
|
|
237 SEXP R_Call_sqrt(SEXP x)
|
|
238 {
|
|
239 SEXP ans = R_NilValue, tmp;
|
|
240 PROTECT( tmp = coerceVector(x, REALSXP) );
|
|
241 if (LENGTH(tmp) > 0) {
|
|
242 double y = REAL(tmp)[0], result;
|
|
243 result = sqrt(y);
|
|
244 ans = ScalarReal(result);
|
|
245 }
|
|
246 UNPROTECT(1);
|
|
247 return ans;
|
|
248 }
|
|
249 \end{smallverbatim}
|
|
250
|
|
251 Now the corresponding R wrapper shrinks into a simple delegate:
|
|
252
|
|
253 \begin{example}
|
|
254 > sqrtViaCall <- function(x)
|
|
255 + .Call("R_Call_sqrt", x, PACKAGE="testsqrt")
|
|
256 \end{example}
|
|
257
|
|
258 The third alternative, via \code{.External}, is omitted here;
|
|
259 it has a different argument passing scheme, but the C and R wrapper
|
|
260 implementations would look very similar.
|
|
261
|
|
262 We can conclude that, in realistic settings, the built-in FFI of R
|
|
263 almost always needs support by a wrapper layer written in C.
|
|
264 The ``foreign'' in FFI is in fact relegated to the C wrapper layer.
|
|
265
|
|
266 Moreover, the R FFI can be viewed as an \emph{extension} interface for
|
|
267 calling pre-compiled code written in a \emph{foreign} language within
|
|
268 the context of the R implementation, rather than a direct invocation
|
|
269 interface for code from a \emph{foreign} context such as an
|
|
270 ordinary C library.
|
|
271
|
|
272 \subsection{FFI of rdyncall}
|
|
273
|
|
274 \begin{table*}
|
|
275 \begin{center}
|
|
276 \begin{tabular}{ll|ll}
|
|
277 \hline \hline
|
|
278 Type& Sign. & Type & Sign. \\
|
|
279 \hline
|
|
280 \verb@void@ & \verb@v@ & \verb@bool@ & \verb@B@ \\
|
|
281 \verb@char@ & \verb@c@ & \verb@unsigned char@ & \verb@C@ \\
|
|
282 \verb@short@ & \verb@s@ & \verb@unsigned short@ & \verb@S@ \\
|
|
283 \verb@int@ & \verb@i@ & \verb@unsigned int@ & \verb@I@ \\
|
|
284 \verb@long@ & \verb@j@ & \verb@unsigned long@ & \verb@J@ \\
|
|
285 \verb@long long@ & \verb@l@ & \verb@unsigned long long@ & \verb@L@ \\
|
|
286 \verb@float@ & \verb@f@ & \verb@double@ & \verb@d@ \\
|
|
287 \verb@void*@ & \verb@p@ & \verb@struct@ \emph{name} \verb@*@ & \verb@*<@\emph{name}\verb@>@ \\
|
|
288 \emph{type}\verb@*@ & \verb@*@... & \verb@const char*@ & \verb@Z@ \\
|
|
289 \hline \hline
|
|
290 \end{tabular}
|
|
291 \end{center}
|
|
292 \caption{\label{tab:signature} C/C++ Types and Signatures}
|
|
293 \end{table*}
|
|
294
|
|
295 \pkg{rdyncall} provides an improved FFI for R
|
|
296 that is accessible via the function \code{.dyncall}.
|
|
297 In contrast to the built-in R FFI which uses a C wrapper layer,
|
|
298 the \code{sqrt} function is invoked dynamically and directly
|
|
299 by the interpreter at run-time.
|
|
300 Whereas the C math library was loaded implicitly via the
|
|
301 \emph{testsqrt} package, it now has to be loaded explicitly.
|
|
302
|
|
303 R offers functions to deal with shared libraries at run-time,
|
|
304 but the location has to be specified as an absolute pathname which
|
|
305 is platform-specific.
|
|
306 For now, let us assume that the example is done on
|
|
307 Mac OS X where the C math library is located
|
|
308 at \file{/usr/lib/libm.dylib}. A platform-portable solution
|
|
309 is discussed in the subsection \emph{Portable loading of shared libraries}.
|
|
310
|
|
311 \begin{example}
|
|
312 > libm <- dyn.load("/usr/lib/libm.dylib")
|
|
313 > sqrtAddr <- libm$sqrt$address
|
|
314 \end{example}
|
|
315
|
|
316 Next, we load the R package \pkg{rdyncall}:
|
|
317
|
|
318 \begin{example}
|
|
319 > library(rdyncall)
|
|
320 \end{example}
|
|
321
|
|
322 Finally, we invoke the foreign C function \code{sqrt} \emph{directly} via
|
|
323 \code{.dyncall}:
|
|
324
|
|
325 \begin{example}
|
|
326 > .dyncall(sqrtAddr, "d)d", 144)
|
|
327 [1] 12
|
|
328 \end{example}
|
|
329
|
|
330 Let us review the last call, as it pinpoints the core solution for a direct
|
|
331 invocation of foreign code within R:
|
|
332 The first argument specifies the address of the foreign code, given as an
|
|
333 external pointer.
|
|
334 The second argument is a \emph{call signature}
|
|
335 that specifies the argument- and return types of the target C function.
|
|
336 This string \verb@"d)d"@ specifies that the foreign function
|
|
337 expects a \code{double} scalar argument and returns a \code{double} scalar value
|
|
338 in correspondence to the C declaration of \code{sqrt}.
|
|
339 Arguments following the call signature are passed to the
|
|
340 foreign function using the call signature for type-safe conversion to C types.
|
|
341 In this case, we pass \code{144} as the first argument, converted to a C
|
|
342 \code{double}, and receive a C \code{double} value converted to an R \code{numeric}.
|
|
343
|
|
344 \subsection{Call Signatures}
|
|
345
|
|
346 The introduction of a type descriptor for foreign functions is a key
|
|
347 component that makes the FFI flexible and type-safe.
|
|
348 The format of the call signature has the following pattern:
|
|
349
|
|
350 \begin{center}
|
|
351 \emph{argument-types} \verb@')'@ \emph{return-type}
|
|
352 \end{center}
|
|
353
|
|
354 The signature can be derived from the C function declaration:
|
|
355 Argument types are specified first, in a left-to-right order, and are
|
|
356 terminated by the \verb@')'@ symbol followed by a single return type signature.
|
|
357
|
|
358 Almost all fundamental C types are supported and there is no practical
|
|
359 restriction on the number of arguments that can be passed in
|
|
360 a call.
|
|
361 Table \ref{tab:signature} gives an overview of supported C types and
|
|
362 the corresponding text encoding; Table \ref{tab:signature_examples}
|
|
363 provides some examples of C functions and call signatures.
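
To illustrate how a call signature can be taken apart, the following C sketch counts the argument types of a signature and locates the return-type part. It is an illustration under our own assumptions and not the parser used by \pkg{rdyncall}; the function name \code{count\_signature\_args} is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper, not part of rdyncall: counts the argument types in a
 * call signature such as "*d*ii)v" and stores a pointer to the return-type
 * part in *ret_type.  Returns -1 if the signature is malformed. */
int count_signature_args(const char *sig, const char **ret_type)
{
    int count = 0;
    const char *p = sig;

    while (*p != '\0' && *p != ')') {
        /* Any number of '*' prefixes belongs to the type that follows. */
        while (*p == '*')
            p++;
        if (*p == '<') {
            /* Typed pointer: skip the struct name up to the closing '>'. */
            while (*p != '\0' && *p != '>')
                p++;
            if (*p != '>')
                return -1;      /* unterminated "<name" */
        } else if (*p == '\0' || *p == ')') {
            return -1;          /* dangling '*' without a base type */
        }
        p++;                    /* consume the type character or '>' */
        count++;
    }
    if (*p != ')')
        return -1;              /* missing argument terminator */
    if (ret_type != NULL)
        *ret_type = p + 1;
    return count;
}
```

Applied to the signatures of Table \ref{tab:signature_examples}, the sketch reports three arguments for \verb@*d*ii)v@ and four for \verb@iiiI)*<SDL_Surface>@.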
|
|
364
|
|
365 \begin{table*}
|
|
366 \centering
|
|
367 \begin{tabular}{l|l}
|
|
368 C function declaration & dyncall type signature \\
|
|
369 \hline
|
|
370 \verb@void rsort_with_index(double*,int*,int n)@ & \verb@*d*ii)v@ \\
|
|
371 \verb@SDL_Surface * SDL_SetVideoMode(int,int,int,Uint32)@ & \verb@iiiI)*<SDL_Surface>@ \\
|
|
372 \verb@void glClearColor(GLclampf,GLclampf,GLclampf,GLclampf)@ & \verb@ffff)v@ \\
|
|
373 \end{tabular}
|
|
374 \caption{\label{tab:signature_examples}
|
|
375 Some examples of C functions and corresponding signatures}
|
|
376 \end{table*}
|
|
377
|
|
378 Now, let us define a public and type-safe R wrapper function that
|
|
379 hides the details of the foreign function call by passing the formal
|
|
380 argument placeholder ``\code{...}'' as third argument to \code{.dyncall}:
|
|
381
|
|
382 \begin{example}
|
|
383 > sqrtViaDyncall <- function(...)
|
|
384 + .dyncall(sqrtAddress, "d)d", ...)
|
|
385 \end{example}
|
|
386
|
|
387 Although there is no further guard code, this interface is type-safe and
|
|
388 the user can do no harm by inadvertently using a wrong set and/or type
|
|
389 of arguments due to the built-in type-checks.
|
|
390 Compared to the R wrapper code using \code{.C}, no explicit cast of the
|
|
391 arguments via \code{as.numeric} is required, because
|
|
392 automatic coercion rules for fundamental types are implemented as dictated
|
|
393 by the call signature. For example, \code{integer} R values are
|
|
394 implicitly cast to \code{double}:
|
|
395
|
|
396 \begin{smallverbatim}
|
|
397 > sqrtViaDyncall(144L)
|
|
398 [1] 12
|
|
399 \end{smallverbatim}
|
|
400
|
|
401 A certain level of type-safety is achieved here as well:
|
|
402 All arguments to be passed to C are first checked against the call signature.
|
|
403 If any incompatibility is detected, such as a wrong number of arguments,
|
|
404 empty atomic vectors or incompatible type mappings, the invocation is aborted
|
|
405 and an error is reported before risking an application crash:
|
|
406
|
|
407 \begin{smallverbatim}
|
|
408 > sqrtViaDyncall(1,2)
|
|
409 Error in .dyncall(sqrtAddress, "d)d", ...) :
|
|
410 Too many arguments for signature 'd)d'.
|
|
411 > sqrtViaDyncall()
|
|
412 Error in .dyncall(sqrtAddress, "d)d", ...) :
|
|
413 Not enough arguments
|
|
414 for function-call signature 'd)d'.
|
|
415 > sqrtViaDyncall(NULL)
|
|
416 Error in .dyncall(sqrtAddress, "d)d", ...) :
|
|
417 Argument type mismatch at position 1:
|
|
418 expected double convertible value
|
|
419 > sqrtViaDyncall("144")
|
|
420 Error in .dyncall(sqrtAddress, "d)d", ...) :
|
|
421 Argument type mismatch at position 1:
|
|
422 expected double convertible value
|
|
423 \end{smallverbatim}
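
The coercion and rejection behaviour shown above can be modelled in a few lines of C. The sketch below is a simplified model under our own assumptions: the \code{Value} type merely stands in for an R scalar, and \code{coerce\_to\_double} is a hypothetical name that does not appear in \pkg{rdyncall}.

```c
/* Hypothetical model of an R scalar; rdyncall works on real R objects. */
typedef enum { VAL_NULL, VAL_INTEGER, VAL_NUMERIC, VAL_STRING } ValueKind;

typedef struct {
    ValueKind kind;
    union {
        int         i;
        double      d;
        const char *s;
    } u;
} Value;

/* Converts `v` to a C double as requested by the signature character 'd'.
 * Returns 0 on success and -1 on a type mismatch, in which case *out is
 * left untouched. */
int coerce_to_double(const Value *v, double *out)
{
    switch (v->kind) {
    case VAL_INTEGER:
        *out = (double)v->u.i;  /* implicit widening, e.g. 144L -> 144.0 */
        return 0;
    case VAL_NUMERIC:
        *out = v->u.d;
        return 0;
    default:
        return -1;              /* NULL and strings are rejected */
    }
}
```

In this model the check happens before any foreign code is entered, so a mismatch can be reported as an ordinary error instead of risking a crash.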
|
|
424
|
|
425 In contrast to the R FFI, where the argument conversion is
|
|
426 dictated solely by the R argument type at call-time in a one-way fashion,
|
|
427 the introduction of an additional specification with a call signature gives
|
|
428 several advantages.
|
|
429
|
|
430 \begin{itemize}
|
|
431 \item Almost all possible C functions can be invoked by a single interface;
|
|
432 no additional C wrapper is required.
|
|
433 \item The built-in type-safety checks of passed arguments enhance stability
|
|
434 and reduce assertion code in R wrappers significantly.
|
|
435 \item A single call signature can work across platforms,
|
|
436 given that the C function type remains constant across platforms.
|
|
437 \item Given that our FFI is implemented in multiple languages,
|
|
438 call signatures represent a portable type description for C libraries.
|
|
439 \end{itemize}
|
|
440
|
|
441 \section{Package Overview}
|
|
442
|
|
443 Besides dynamic calling of foreign code, the package provides essential
|
|
444 facilities for interoperability between the R and C programming languages.
|
|
445 A high-level overview of components that make up the
|
|
446 package is given in Figure \ref{fig:pkg_overview}.
|
|
447
|
|
448 \begin{figure}[h]
|
|
449 \centering
|
|
450 \includegraphics[scale=0.44]{img_overview.pdf}
|
|
451 \caption{\label{fig:pkg_overview}
|
|
452 Package Overview}
|
|
453 \end{figure}
|
|
454
|
|
455 We already described the \code{.dyncall} FFI. What follows is a
|
|
456 brief description of portable loading of
|
|
457 shared libraries using \code{dynfind}, installation of wrappers via \code{dynbind},
|
|
458 handling of foreign data types via \code{new.struct} and wrapping of R functions as C callbacks via \code{new.callback}.
|
|
459 Finally the high-level \code{dynport} interface for accessing \emph{whole} C libraries is briefly discussed.
|
|
460 Low-level technical details of some components are described briefly in the
|
|
461 section \emph{Architecture}.
|
|
462
|
|
463 \subsection{Portable loading of shared libraries}
|
|
464
|
|
465 The \emph{portable} loading of shared libraries across platforms is not
|
|
466 trivial because file paths differ across operating systems (OSs).
|
|
467 Referring back to the previous example, to load a particular library
|
|
468 in a portable fashion, one would have to check the platform to
|
|
469 locate the C library.\footnote{Possible C math library names are \file{libm.so}, \file{libm.so.6} and \file{MSVCRT.DLL}
|
|
470 in locations such as \file{/lib}, \file{/usr/lib}, \file{/lib64}, \file{/lib/sparcv9}, \file{/usr/lib64}, \file{C:\textbackslash WINDOWS\textbackslash SYSTEM32}, etc.}
|
|
471
|
|
472 Although there is variation among the OSs, library file paths and
|
|
473 search patterns have common structures.
|
|
474 For example, among
|
|
475 all the different locations, prefixes and suffixes, there is a part within
|
|
476 a full library filename that can be taken as a \emph{short library name} or
|
|
477 label.
|
|
478
|
|
479 The function \code{dynfind} takes a list of short library names to
|
|
480 locate a library using common search heuristics.
|
|
481 For example, to load the Standard C Math library, one would either use
|
|
482 the Microsoft Visual C Run-Time library labeled \file{msvcrt} on Windows
|
|
483 or the C Math library labeled \file{m} or \file{m.so.6} otherwise.
|
|
484
|
|
485 \begin{example}
|
|
486 > mLib <- dynfind(c("msvcrt","m","m.so.6"))
|
|
487 \end{example}
|
|
488
|
|
489 \code{dynfind} also supports more exotic schemes, such as the Mac OS X Framework folders.
|
|
490 Depending on the library,
|
|
491 it is sometimes enough to have a single short filename, e.g., \code{"expat"} for
|
|
492 the \emph{Expat} library.
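
One plausible way to expand a short library name into a platform-specific file name is sketched below in C. This is an assumption-laden illustration of the naming conventions only (\file{lib}\emph{name}\file{.so}, \file{lib}\emph{name}\file{.dylib}, \emph{name}\file{.dll}); the search heuristics of \code{dynfind} cover more cases, and \code{TargetOS} and \code{make\_library\_filename} are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

typedef enum { OS_LINUX, OS_MACOSX, OS_WINDOWS } TargetOS;

/* Hypothetical helper: writes the conventional shared-library file name for
 * the short name `name` on `os` into `buf`.  Returns 0 on success and -1 if
 * the result does not fit into `size` bytes. */
int make_library_filename(char *buf, size_t size, const char *name, TargetOS os)
{
    const char *prefix = "lib";
    const char *suffix = ".so";
    int n;

    if (os == OS_MACOSX) {
        suffix = ".dylib";
    } else if (os == OS_WINDOWS) {
        prefix = "";
        suffix = ".dll";
    } else if (strstr(name, ".so") != NULL) {
        suffix = "";    /* label already carries a suffix, e.g. "m.so.6" */
    }
    n = snprintf(buf, size, "%s%s%s", prefix, name, suffix);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With this sketch the labels \file{m} and \file{m.so.6} expand to \file{libm.so} and \file{libm.so.6} on Linux, and \file{msvcrt} expands to \file{msvcrt.dll} on Windows; a complete search would additionally try each candidate in the directories known to the dynamic linker.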
|
|
493
|
|
494 Internally, the dynamic linker interface of the OS is used via
|
|
495 \code{.dynload} and symbols get resolved via \code{.dynsym}:
|
|
496
|
|
497 \begin{example}
|
|
498 > sqrtAddr <- .dynsym(mLib, "sqrt")
|
|
499 \end{example}
|
|
500
|
|
501 Although R already contains support for loading shared libraries
|
|
502 and resolving of symbols, several issues have led to a reimplementation
|
|
503 of this part:
|
|
504
|
|
505 \begin{itemize}
|
|
506 \item System paths are not considered when loading libraries via
|
|
507 \code{dyn.load} of the package \pkg{base} but this is one part of the
|
|
508 search heuristics.
|
|
509 \item Automatic life-cycle management for loading and unloading of libraries
|
|
510 is a desired goal. Unloading of libraries should be done automatically
|
|
511 via finalizer code when no symbols are used anymore. External pointers
|
|
512 resolved via \code{.dynsym} hold a reference to the loaded library.
|
|
513 When all external pointers are garbage collected, the library handle is
|
|
514 not referenced anymore and the finalizer can unload the library.
|
|
515 \end{itemize}
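
The life-cycle rule in the second item can be pictured as plain reference counting. The following C sketch is a simplified model under our own assumptions, not the implementation in \pkg{rdyncall}; \code{LibHandle}, \code{lib\_retain} and \code{lib\_release} are hypothetical names, and the actual unloading call is only indicated by a comment.

```c
/* Hypothetical model of a loaded library with a reference count. */
typedef struct {
    int refcount;   /* number of live symbol pointers       */
    int loaded;     /* 1 while the library is still mapped  */
} LibHandle;

/* Called whenever a symbol is resolved from the library. */
void lib_retain(LibHandle *lib)
{
    lib->refcount++;
}

/* Called from the finalizer of a symbol pointer; the last release unloads. */
void lib_release(LibHandle *lib)
{
    if (lib->refcount > 0 && --lib->refcount == 0) {
        /* A real implementation would invoke the unloading routine of the
         * operating system here. */
        lib->loaded = 0;
    }
}
```

The library stays loaded as long as at least one resolved symbol is reachable and is unloaded when the last one has been finalized.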
|
|
516
|
|
517 \subsection{Wrapping C libraries}
|
|
518
|
|
519 Functional R interfaces to foreign code can be defined with small
|
|
520 R wrapper functions, which effectively delegate to \code{.dyncall}.
|
|
521 Each function interface is parameterized by a target address and
|
|
522 a matching call signature.
|
|
523
|
|
524 Since APIs often consist of hundreds of functions (see Table \ref{tab:libs}),
|
|
525 \code{dynbind} can create and install a batch of function wrappers for a library
|
|
526 with a single call by using a \emph{library signature} that
|
|
527 consists of concatenated function names and signatures separated by semicolons.
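
The structure of a library signature can be made concrete with a small C sketch that splits it into pairs of function name and call signature. The code is an illustration under our own assumptions (no white space between entries, names and signatures shorter than 64 characters) and is not taken from \pkg{rdyncall}; \code{Binding} and \code{split\_library\_signature} are hypothetical names.

```c
#include <string.h>

/* One entry of a library signature: a function name and its call signature. */
typedef struct {
    char name[64];
    char signature[64];
} Binding;

/* Hypothetical helper, not rdyncall code: splits a library signature such as
 * "sqrt(d)d;sin(d)d;cos(d)d;" into at most `max` bindings and returns the
 * number of entries found, or -1 on malformed input. */
int split_library_signature(const char *libsig, Binding *out, int max)
{
    int count = 0;
    const char *p = libsig;

    while (*p != '\0') {
        const char *open = strchr(p, '(');
        const char *end  = strchr(p, ';');
        size_t name_len, sig_len;

        if (open == NULL || end == NULL || open > end || count >= max)
            return -1;
        name_len = (size_t)(open - p);
        sig_len  = (size_t)(end - open - 1);
        if (name_len == 0 || name_len >= sizeof out->name ||
            sig_len >= sizeof out->signature)
            return -1;
        memcpy(out[count].name, p, name_len);
        out[count].name[name_len] = '\0';
        memcpy(out[count].signature, open + 1, sig_len);
        out[count].signature[sig_len] = '\0';
        count++;
        p = end + 1;            /* continue after the ';' separator */
    }
    return count;
}
```

Each resulting pair carries exactly the information that one wrapper function needs: the symbol name to resolve and the call signature to pass on to \code{.dyncall}.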
|
|
528
|
|
529 For example, to install wrappers to the C functions
|
|
530 \code{sqrt}, \code{sin} and \code{cos} from the math library, one
|
|
531 could use:
|
|
532
|
|
533 \begin{example}
|
|
534 > dynbind( c("msvcrt","m","m.so.6"),
|
|
535 + "sqrt(d)d;sin(d)d;cos(d)d;" )
|
|
536 \end{example}
|
|
537
|
|
538 The function call has the side-effect that three R wrapper functions are
|
|
539 created and stored in an environment which defaults to the global environment.
|
|
540 Let us review the \code{sin} wrapper (on the 64-bit version of R running
|
|
541 on Mac OS X 10.6):
|
|
542 \begin{example}
|
|
543 > sin
|
|
544 function (...)
|
|
545 .dyncall.default(<pointer: 0x7fff81fd13f0>,
|
|
546 "d)d", ...)
|
|
547 \end{example}
|
|
548
|
|
549 The wrapper directly uses the address of the resolved \code{sin} symbol.
|
|
550 In addition, the wrapper uses \code{.dyncall.default}, which is a
|
|
551 concrete selector of a particular calling convention, as outlined below.
|
|
552
|
|
553 \subsection{Calling Conventions}
|
|
554
|
|
555 Calling conventions specify how arguments and return values are passed
|
|
556 across sub-routines and functions at machine level. This information
|
|
557 is vital for interfacing with the binary interface of C libraries.
|
|
558 The package has support for multiple calling conventions.
|
|
559 A non-default calling convention is selected via the named argument
|
|
560 \code{callmode} of \code{.dyncall}.
|
|
561 Most current OSs and platforms only have support for a single \code{"default"} calling convention
|
|
562 at run-time.
|
|
563
|
|
564 An important exception is the Microsoft Windows platform
|
|
565 on the 32-bit \emph{i386} processor architecture:
|
|
566 While the C calling convention selected by \code{"default"} on \emph{i386} is \code{"cdecl"},
|
|
567 system shared libraries from Microsoft such as \file{KERNEL32.DLL},
|
|
568 \file{USER32.DLL} and the OpenGL library \file{OPENGL32.DLL}
|
|
569 use the \code{"stdcall"} calling convention.
|
|
570 The \code{callmode} argument has an effect only on this platform,
|
|
571 where it selects the calling convention to be used;
|
|
572 all other platforms currently ignore this argument.
|
|
573
|
|
574 \subsection{Handling of C Types in R}
|
|
575
|
|
576 C APIs often make use of high-level C \verb@struct@
|
|
577 and \verb@union@ types for exchanging information.
|
|
578 Thus, to make interoperability work at that level, the package
|
|
579 addresses the handling of C type information.
|
|
580
|
|
581 Let us consider the following hypothetical example:
|
|
582 A user-interface library has a function to set the 2D coordinates
|
|
583 and dimension of a graphical output window. The coordinates are specified using a C
|
|
584 \code{struct Rect} data type and the C function receives a
|
|
585 pointer to that object:
|
|
586
|
|
587 \begin{smallverbatim}
|
|
588 void setWindowRect(struct Rect *pRect);
|
|
589 \end{smallverbatim}
|
|
590
|
|
591 The structure type is defined as follows:
|
|
592
|
|
593 \begin{smallverbatim}
|
|
594 struct Rect {
|
|
595 short x, y;
|
|
596 unsigned short w, h;
|
|
597 };
|
|
598 \end{smallverbatim}
|
|
599
|
|
600 Before we can issue a call, we have to allocate an object of that size and
|
|
601 initialize the fields with values encoded in C types, which are not
|
|
602 among the data types of R.
|
|
603 The framework provides helper functions and objects to deal with C data types
|
|
604 in R. Type information objects can be created with a description of the
|
|
605 C aggregate structure.
|
|
606 First, we create a type information object in R for the \code{struct Rect}
|
|
607 C data type via \code{parseStructInfos} using a \emph{structure type signature}.
|
|
608
|
|
609 \begin{smallverbatim}
|
|
610 > parseStructInfos("Rect{ssSS}x y w h;")
|
|
611 \end{smallverbatim}
|
|
612
|
|
613 After registration, an R object named \code{Rect} is installed, which
|
|
614 contains C type information that corresponds to \code{struct Rect}.
|
|
615 The format of a \emph{structure type signature} has the following
|
|
616 pattern:
|
|
617
|
|
618 \begin{center}
|
|
619 \emph{Struct-name} \verb@'{'@ \emph{Field-types} \verb@'}'@ \emph{Field-names} \verb@';'@
|
|
620 \end{center}
|
|
621
|
|
622 \emph{Field-types} use the same type signature encoding as that of
|
|
623 \emph{call signatures} for argument and return types (Table \ref{tab:signature}).
|
|
624 \emph{Field-names} consist of a list of white-space separated names,
|
|
625 labeling each field component.
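
As an illustration of this pattern, the following C sketch decomposes a structure type signature into its name, field types and field names. It is a minimal sketch under our own assumptions (single-character field types only, names separated by single spaces) and not the code behind \code{parseStructInfos}; \code{StructInfo} and \code{parse\_struct\_signature} are hypothetical names.

```c
#include <string.h>

#define MAX_FIELDS 16

typedef struct {
    char name[32];                  /* struct name, e.g. "Rect"           */
    int  nfields;
    char types[MAX_FIELDS];         /* one signature character per field  */
    char fields[MAX_FIELDS][32];    /* field names, e.g. "x", "y"         */
} StructInfo;

/* Hypothetical helper: parses "Name{types}field1 field2 ...;" into `info`.
 * Returns 0 on success and -1 if the signature is malformed. */
int parse_struct_signature(const char *sig, StructInfo *info)
{
    const char *open  = strchr(sig, '{');
    const char *close = open ? strchr(open, '}') : NULL;
    const char *p;
    size_t len;
    int i;

    if (open == NULL || close == NULL)
        return -1;
    len = (size_t)(open - sig);
    if (len == 0 || len >= sizeof info->name)
        return -1;
    memcpy(info->name, sig, len);
    info->name[len] = '\0';

    info->nfields = (int)(close - open - 1);
    if (info->nfields > MAX_FIELDS)
        return -1;
    memcpy(info->types, open + 1, (size_t)info->nfields);

    p = close + 1;
    for (i = 0; i < info->nfields; i++) {
        while (*p == ' ')
            p++;                    /* skip separating white space */
        len = strcspn(p, " ;");
        if (len == 0 || len >= sizeof info->fields[i])
            return -1;              /* fewer names than types */
        memcpy(info->fields[i], p, len);
        info->fields[i][len] = '\0';
        p += len;
    }
    while (*p == ' ')
        p++;
    return (*p == ';') ? 0 : -1;    /* the ';' terminator is required */
}
```

For \verb@"Rect{ssSS}x y w h;"@ the sketch yields the name \code{Rect}, the field types \code{s}, \code{s}, \code{S}, \code{S} and the field names \code{x}, \code{y}, \code{w}, \code{h}.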
|
|
626
|
|
627 An instance of a C type can be allocated via \code{new.struct}:
|
|
628
|
|
629 \begin{smallverbatim}
|
|
630 > r <- new.struct(Rect)
|
|
631 \end{smallverbatim}
|
|
632
|
|
633 Finally, the extraction (\verb@'$'@, \verb@'['@) and
|
|
634 replacement (\verb@'$<-'@, \verb@'[<-'@) operators can be used to access
|
|
635 structure fields symbolically. During value transfer between R and C,
|
|
636 automatic conversion of values with respect to the underlying C field
|
|
637 type takes place.
|
|
638
|
|
639 \begin{smallverbatim}
|
|
640 > r$x <- -10 ; r$y <- -20 ; r$w <- 40 ; r$h <- 30
|
|
641 \end{smallverbatim}
|
|
642
|
|
In this example, R \code{numeric} values are converted on the fly to \code{signed short} and
\code{unsigned short} integers (usually 16-bit values). When the object is printed at the prompt,
a detailed picture of the data object is given:
|
|
646
|
|
647 \begin{smallverbatim}
|
|
648 > r
|
|
649 struct Rect {
|
|
650 x: -10
|
|
651 y: -20
|
|
652 w: 40
|
|
653 h: 30
|
|
654 }
|
|
655 \end{smallverbatim}
|
|
656
|
|
At a low level, one can see that \code{r} is stored as an R \code{raw} vector object:
|
|
658
|
|
659 \begin{smallverbatim}
|
|
660 > r[]
|
|
661 [1] f6 ff ec ff 28 00 1e 00
|
|
662 attr(,"struct")
|
|
663 [1] "Rect"
|
|
664 \end{smallverbatim}
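
The byte sequence can be reproduced on the C side. The following sketch dumps
the memory image of a \code{struct Rect}; it assumes a 16-bit \code{short} and
a little-endian host, and the helper name \code{rect\_to\_hex} is hypothetical.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Rect {
    short x, y;
    unsigned short w, h;
};

/* Hypothetical helper: writes the in-memory bytes of *r as
 * space-separated hex pairs into out, which must provide room for
 * 3 * sizeof(struct Rect) bytes (including the terminating NUL). */
static void rect_to_hex(const struct Rect *r, char *out)
{
    static const char digits[] = "0123456789abcdef";
    unsigned char bytes[sizeof(struct Rect)];
    size_t i;

    assert(r != NULL && out != NULL);
    memcpy(bytes, r, sizeof bytes);
    for (i = 0; i < sizeof bytes; i++) {
        out[3 * i]     = digits[bytes[i] >> 4];
        out[3 * i + 1] = digits[bytes[i] & 15];
        /* separator between bytes, NUL after the last one */
        out[3 * i + 2] = (i + 1 < sizeof bytes) ? ' ' : '\0';
    }
}
\end{smallverbatim}

Under the stated assumptions, a \code{struct Rect} initialized with the values
-10, -20, 40 and 30 produces the same eight bytes as the R output above.
On a big-endian host the two bytes of each field would appear swapped.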
|
|
665
|
|
To complete the example, we issue a foreign function call to \code{setWindowRect}
via \code{.dyncall} and pass in the \code{r} object,
assuming the library is loaded and the symbol is resolved and
stored in an external pointer object named \code{setWindowRectAddr}:
|
|
670
|
|
671 \begin{smallverbatim}
|
|
672 > .dyncall( setWindowRectAddr, "*<Rect>)v", r)
|
|
673 \end{smallverbatim}
|
|
674
|
|
675 We make use of a typed pointer expression \code{'*<Rect>'}
|
|
676 instead of the untyped pointer signature \code{'p'}, which would
|
|
677 also work but does not prevent users from passing other objects
|
|
678 that do not reference a \code{struct Rect} data object.
|
|
679 Typed pointer expressions increase type-safety and use the
|
|
680 pattern \verb@'*<@\emph{Type-Name}\verb@>'@.
|
|
681 The invocation will be rejected if the argument passed in is not
|
|
682 of C type \code{Rect}. As \code{r} is tagged with an attribute
|
|
683 \code{struct} that refers to \code{Rect}, the call will be issued.
|
|
684
|
|
Typed pointers can also occur as return types. Once the
type information is available, returned objects can be manipulated
in the same symbolic manner as above.
|
|
688
|
|
689 C \verb@union@ types are supported as well but use the \code{parseUnionInfos}
|
|
690 function instead for registration and a slightly different signature format:
|
|
691
|
|
692 \begin{center}
|
|
693 \emph{Union-name} \verb@'|'@ \emph{Field-types} \verb@'}'@ \emph{Field-names} \verb@';'@
|
|
694 \end{center}
|
|
695
|
|
The underlying low-level C type read and write operations and conversions
from R data types are performed by the functions \code{.pack} and
\code{.unpack}. These can be used for various low-level operations as well,
such as dereferencing pointers to pointers.
|
|
700
|
|
701 R objects such as external pointers and atomic raw, integer and numeric
|
|
702 vectors can be used as aggregate C types via the attribute \code{struct}.
|
|
703 To \emph{cast} a type in the style of C, one can use \code{as.struct}.
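
The kind of typed read and write at a byte offset that such low-level
operations perform can be sketched in C as follows. This is an illustration
only, not the implementation of \code{.pack} and \code{.unpack}; the helper
names \code{put\_short} and \code{get\_short} are hypothetical.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: stores a C short at a byte offset of a raw
 * buffer. memcpy avoids alignment and aliasing problems. */
static void put_short(unsigned char *buf, size_t offset, short value)
{
    assert(buf != NULL);
    memcpy(buf + offset, &value, sizeof value);
}

/* Hypothetical helper: reads a C short back from a byte offset. */
static short get_short(const unsigned char *buf, size_t offset)
{
    short value;

    assert(buf != NULL);
    memcpy(&value, buf + offset, sizeof value);
    return value;
}
\end{smallverbatim}

With a 16-bit \code{short}, writing at the offsets 0 and 2 of an 8-byte buffer
corresponds to setting the fields \code{x} and \code{y} of the \code{Rect}
example.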
|
|
704
|
|
705 \subsection{Wrapping R functions as C callbacks}
|
|
706
|
|
707 Some C libraries, such as user-interface toolkits and I/O processing
|
|
708 frameworks, use \emph{callbacks} as part of their interface to enable
|
|
709 registration and activation of user-supplied event handlers.
|
|
710 A callback is a user-defined function that has a library-defined
|
|
function type. Callbacks are usually registered via a registration function
|
|
712 offered by the library interface and are activated later from within
|
|
713 a library run-time context.
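
In C terms, the pattern looks as follows. The sketch is a toy library invented
for illustration; the names \code{event\_handler}, \code{set\_event\_handler},
\code{dispatch\_event} and \code{sum\_events} are hypothetical and do not
belong to any library discussed in this article.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>

/* Library-defined function type that every handler must match. */
typedef void (*event_handler)(void *user_data, int event_code);

/* Toy library state: one registered handler and its user data. */
static event_handler registered_handler = NULL;
static void *registered_user_data = NULL;

/* Registration function offered by the library interface. */
static void set_event_handler(event_handler handler, void *user_data)
{
    registered_handler = handler;
    registered_user_data = user_data;
}

/* Called later from within the library's run-time context. */
static void dispatch_event(int event_code)
{
    if (registered_handler != NULL)
        registered_handler(registered_user_data, event_code);
}

/* User-defined callback: accumulates the event codes it receives. */
static void sum_events(void *user_data, int event_code)
{
    int *sum = user_data;

    assert(sum != NULL);
    *sum += event_code;
}
\end{smallverbatim}

A client registers \code{sum\_events} together with a pointer to an
\code{int} once; every later call of \code{dispatch\_event} then reaches the
user-defined function through the stored function pointer.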
|
|
714
|
|
715 \pkg{rdyncall} has support for wrapping ordinary R
|
|
716 functions as C callbacks via the function
|
|
717 \code{new.callback}. Callback wrappers are defined by a \emph{callback
|
|
718 signature} and the user-supplied R function to be wrapped. \emph{Callback signatures} look very
|
|
719 similar to \emph{call signatures} and should match the
|
|
720 functional type of the underlying C callback.
|
|
721 \code{new.callback} returns an external pointer that can
|
|
722 be used as a low-level function pointer for the registration as a C callback.
|
|
See Section \emph{Parsing XML using Expat} below for
an application of callbacks.
|
|
725
|
|
726 \subsection{Foreign Library Interface}
|
|
727
|
|
728 At the highest level, \pkg{rdyncall} provides the front-end function
|
|
\code{dynport} to dynamically set up an interface to a C Application
Programming Interface. This includes loading the corresponding
shared C library and resolving its symbols. During the binding process,
a new R environment (a name space \citep{RNameSpace} until version 0.7.4) is populated with thin R wrapper
objects that represent abstractions of C counterparts such as
functions, pointers to functions, type information objects for C struct and union
types, and symbolic constant equivalents of C enums and macro defines.
|
|
736 The mechanism aims to work across platforms, given that the corresponding
|
|
737 shared libraries of a \emph{DynPort} have been installed in a
|
|
738 system standard location on the host.
|
|
739
|
|
740 An initial repository of \emph{DynPorts} is available in the package
|
|
741 that provides bindings for several popular C APIs, see Table \ref{tab:libs}
|
|
742 for examples of available bindings.
|
|
743
|
|
744 \section{Sample Applications}
|
|
745
|
|
746 We give two examples with different application contexts that demonstrate
|
|
747 the direct usage of C APIs from within R through the \pkg{rdyncall} package.
|
|
748 The R interface to C libraries looks very
|
|
749 similar to the actual C API. For details on the usage of a particular
|
|
750 C library, the programming manuals and documentation of the libraries
|
|
751 should be consulted.
|
|
752
|
|
Before loading R bindings via \code{dynport}, the shared library should
have been installed on the system. Currently this has
to be done manually, and the installation method depends on the target OS (see the manual
page on `rdyncall-demos' for details).
|
|
757 While \emph{OpenGL} is most often pre-installed on typical desktop-systems,
|
|
758 \emph{SDL} and \emph{Expat} sometimes have to be installed explicitly.
|
|
759
|
|
760 \subsection{OpenGL Programming in R}
|
|
761
|
|
762
|
|
763 In the first example, we make use of the Simple DirectMedia Layer library (SDL)
|
|
\citep{SDL,Pendleton:2003:GPS,www:sdl-alternative} and
|
|
765 the Open Graphics Library (OpenGL) \citep{Board05} to implement
|
|
766 a portable multimedia application skeleton in R.
|
|
767
|
|
768 We first need to load bindings to SDL and OpenGL via dynports:
|
|
769
|
|
770 \begin{example}
|
|
771 > dynport(SDL)
|
|
772 > dynport(GL)
|
|
773 \end{example}
|
|
774
|
|
Now we initialize the SDL library, in particular the video subsystem, and
open a window surface with a dimension of $640 \times 480$ pixels in 32-bit color
depth that supports OpenGL rendering:
|
|
778
|
|
779 \begin{smallverbatim}
|
|
780 > SDL_Init(SDL_INIT_VIDEO)
|
|
781 > surface <- SDL_SetVideoMode(640,480,32,SDL_OPENGL)
|
|
782 \end{smallverbatim}
|
|
783
|
|
784 Next, we implement the application loop which updates the display repeatedly
|
|
785 and processes the event queue until a \emph{quit} request is
|
|
786 issued by the user via the window close button.
|
|
787
|
|
788 \begin{smallverbatim}
|
|
> mainloop <- function()
{
  ev <- new.struct(SDL_Event)
  quit <- FALSE
  while(!quit) {
    draw()
    while(SDL_PollEvent(ev)) {
      if (ev$type == SDL_QUIT) {
        quit <- TRUE
      }
    }
  }
}
|
|
802 \end{smallverbatim}
|
|
803
|
|
804 SDL event processing is implemented by collecting events that occur in a
|
|
805 queue.
|
|
806 Once per update frame, typical SDL applications poll the queue by
|
|
807 calling \code{SDL\_PollEvent} with a pointer to a user-allocated buffer
|
|
808 of C type \code{union SDL\_Event}.
|
|
809 Event records have a common type identifier which is set to \code{SDL\_QUIT}
|
|
810 when a quit event has occurred e.g. when users press a close button on a window.
|
|
811
|
|
812 Next, we implement our \code{draw} function making use of
|
|
813 the OpenGL 1.1 API. We clear the background with a blue color
|
|
814 and draw a light-green rectangle.
|
|
815
|
|
816 \begin{smallverbatim}
|
|
> draw <- function()
{
  glClearColor(0,0,1,0)
  glClear(GL_COLOR_BUFFER_BIT)
  glColor3f(0.5,1,0.5)
  glRectf(-0.5,-0.5,0.5,0.5)
  SDL_GL_SwapBuffers()
}
|
|
825 \end{smallverbatim}
|
|
826
|
|
827 Now we can run the application mainloop.
|
|
828
|
|
829 \begin{smallverbatim}
|
|
830 > mainloop()
|
|
831 \end{smallverbatim}
|
|
832
|
|
833 To stop the application, we hit the close button of the window.
|
|
A similar example is also available via \code{demo(SDL)}. Here the \code{draw} function
displays a rotating 3D cube, depicted in Figure \ref{fig:demo_SDL}.
|
|
836
|
|
837 \begin{figure}
|
|
838 \centering
|
|
839 \includegraphics[scale=0.35]{img_SDL.png}
|
|
840 \caption{\label{fig:demo_SDL}
|
|
841 \code{demo(SDL)}}
|
|
842 \end{figure}
|
|
843
|
|
\code{demo(randomfield)} gives a slightly more scientific application of OpenGL and R:
Random fields of size $512 \times 512$ are generated by blending 5000 texture-mapped 2D Gaussian kernels.
The \emph{frames per second} counter in the window title gives the number of matrices generated per second (see Figure \ref{fig:demo_randomfield}).
When the user clicks on the animation window, the current frame and matrix are passed to R and plotted.
While several dozen matrices are computed per second using OpenGL,
it takes several seconds to plot a single matrix in R using \code{image()}.
|
|
850
|
|
851 \begin{figure}
|
|
852 \centering
|
|
853 \includegraphics[scale=0.35]{img_randomfield.png}
|
|
854 \caption{\label{fig:demo_randomfield}
|
|
855 \code{demo(randomfield)}}
|
|
856 \end{figure}
|
|
857
|
|
858 \subsection{Parsing XML using Expat}
|
|
859
|
|
860 In the second example, we use the Expat XML Parser library \citep{www:expat}
|
|
861 \citep{Kim:2001:TSJ} to implement a stream-oriented XML parser suitable
|
|
862 for very large documents.
|
|
863
|
|
Because the library is very popular, it is likely to be
installed already on many OS distributions; otherwise it is
available from package repositories or can be built as a shared library
from source.
|
|
868
|
|
869 In Expat, custom XML parsers are implemented by defining
|
|
870 functions that are registered as callbacks to be invoked on
|
|
871 events that occur during parsing, such as the start and end of XML tags.
|
|
872 In our second example, we create a simple parser skeleton that
|
|
873 prints the start and end tag names.
|
|
874
|
|
875 First we load R bindings for Expat via \code{dynport}.
|
|
876
|
|
877 \begin{smallverbatim}
|
|
878 > dynport(expat)
|
|
879 \end{smallverbatim}
|
|
880
|
|
881 Next we create an abstract parser object via the C function
|
|
882 \code{XML\_ParserCreate} that receives one argument of type C string
|
|
883 to specify a desired character encoding that overrides the document
|
|
encoding declaration. We want to pass a null pointer (\code{NULL}) here.
In the \code{.dyncall} FFI, C null pointer values for pointer types are
expressed via the R \code{NULL} value:
|
|
887
|
|
888 \begin{smallverbatim}
|
|
889 > p <- XML_ParserCreate(NULL)
|
|
890 \end{smallverbatim}
|
|
891
|
|
892 The C interface for registration of start and end-tag event handler
|
|
893 callbacks is given below:
|
|
894
|
|
895 \begin{smallverbatim}
|
|
/* Language C, from file expat.h: */
typedef void (*XML_StartElementHandler)
  (void *userData, const XML_Char *name,
   const XML_Char **atts);
typedef void (*XML_EndElementHandler)
  (void *userData, const XML_Char *name);
void XML_SetElementHandler(XML_Parser parser,
  XML_StartElementHandler start,
  XML_EndElementHandler end);
|
|
905 \end{smallverbatim}
|
|
906
|
|
907 We implement the callbacks as R functions which print the event and
|
|
908 tag name. They are wrapped as C callback pointers via \code{new.callback}
|
|
909 using a matching \emph{callback signature}.
|
|
The second argument \code{name} of type C string in both callbacks, \code{XML\_StartElementHandler} and \code{XML\_EndElementHandler},
is of primary interest; this argument passes the XML tag name.
|
|
912 C strings are handled in a special way by the \code{.dyncall} FFI, because they
|
|
913 have to be copied as R \code{character} objects.
|
|
914 The special type signature \code{'Z'} is used to denote a
|
|
915 C string type.
|
|
916 The other arguments are simply denoted as untyped pointers using \code{'p'}:
|
|
917
|
|
918 \begin{smallverbatim}
|
|
> start <- new.callback("pZp)v",
    function(ignored1,tag,ignored2)
      cat("Start tag:", tag, "\n")
  )
> end <- new.callback("pZ)v",
    function(ignored,tag)
      cat("End tag:", tag, "\n")
  )
> XML_SetElementHandler(p, start, end)
|
|
928 \end{smallverbatim}
|
|
929
|
|
930 To test the parser, we create a sample document stored in a \code{character}
|
|
931 object named \code{text} and pass it to the parse function \code{XML\_Parse}:
|
|
932
|
|
933 \begin{smallverbatim}
|
|
934 > text <- "<hello> <world> </world> </hello>"
|
|
935 > XML_Parse( p, text, nchar(text), 1)
|
|
936 \end{smallverbatim}
|
|
937
|
|
938 The resulting output is given below:
|
|
939
|
|
940 \begin{smallverbatim}
|
|
941 Start tag: hello
|
|
942 Start tag: world
|
|
943 End tag: world
|
|
944 End tag: hello
|
|
945 \end{smallverbatim}
|
|
946
|
|
Expat supports processing of very large XML documents in a chunk-based manner by
calling \code{XML\_Parse} several times, where the last argument
indicates whether the chunk is the final one of the document.
|
|
950
|
|
951 \section{Architecture}
|
|
952
|
|
953 The core implementation of the FFI, callbacks and loading of
|
|
954 code are mainly based on the suite of libraries of the \emph{DynCall}
|
|
955 project \citep{dyncall}.
|
|
956
|
|
957 \subsection{Dynamic calls}
|
|
958
|
|
959 The FFI offered by \pkg{rdyncall} is based on the \pkg{dyncall}
|
|
960 library, which provides an abstraction for making arbitrary
|
|
961 machine-level calls with support for multiple calling conventions
|
|
and most C argument and return types.\footnote{\emph{Inline} structure types are currently not fully supported.}
|
|
963
|
|
964 For each processor architecture, the supported calling conventions
|
|
965 are abstracted in a \emph{Call Virtual Machine} (CallVM)
|
|
966 object. The \pkg{dyncall} library offers a universal C interface that can
|
|
967 be used from within scripting language interpreter contexts to build
|
|
968 up a machine-level call in a structured manner.
|
|
969
|
|
970 A CallVM comprises a state machine and a call kernel. The state machine
|
|
971 is implemented in C and keeps track of internal buffers for pre-loading argument
|
|
972 values that get arranged for specific storage locations, such as stack or
|
|
973 special register sets according to the processor architecture and the chosen
|
|
974 calling conventions.
|
|
975 The actual invocation of a foreign function call is conducted by
|
|
976 the Call Kernel - a small piece of code that is implemented in
|
|
977 Assembly and that provides a generic call facility for a particular
|
|
978 calling convention.
|
|
979 It prepares machine-level calls by copying data to registers and to the
|
|
980 call stack according to the relevant calling convention, and finally
|
|
981 executes the machine call to a target address.
|
|
982
|
|
983 From a scripting language interpreter perspective, the invocation of a
|
|
984 foreign function call through the CallVM is conducted in three consecutive
|
|
985 phases using the \pkg{dyncall} C API:
|
|
986
|
|
987 \begin{enumerate}
|
|
\item \emph{Setup Phase:} The desired calling convention has to be
chosen, which in most cases is just the \emph{default C} calling convention.
However, more specialized and platform-specific calling conventions are
available as well, in particular for 32-bit Windows.
\item \emph{Argument Loading Phase:} Arguments are passed in
\emph{left-to-right} order according to the C/C++
function or method type declaration. Argument values are stored in buffers
according to the processor architecture and selected calling convention.
|
|
996 \item \emph{Call and Return-Value Receive Phase:}
|
|
997 A return-type specific call function is chosen and the target address
|
|
998 of the foreign code is passed, which gets called via the Call Kernel.
|
|
999 \end{enumerate}
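
The three phases can be mimicked in plain C. The following toy is a sketch
only and is not the \pkg{dyncall} API: all names are hypothetical, it handles
at most two \code{long} arguments, and it replaces the Assembly call kernel
by casts to the real function type.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ARGS 2

/* Generic function pointer type used to carry the target address. */
typedef void (*toy_target)(void);

/* Toy stand-in for a CallVM state machine: buffers long arguments. */
struct toy_vm {
    long   args[TOY_MAX_ARGS];
    size_t count;
};

/* Setup phase: begin a new call with an empty argument buffer. */
static void toy_reset(struct toy_vm *vm)
{
    assert(vm != NULL);
    vm->count = 0;
}

/* Argument loading phase: append one argument, left to right.
 * Returns 0 on success, -1 if the buffer is full. */
static int toy_arg_long(struct toy_vm *vm, long value)
{
    assert(vm != NULL);
    if (vm->count >= TOY_MAX_ARGS)
        return -1;
    vm->args[vm->count++] = value;
    return 0;
}

/* Call phase for targets returning long: converts the target back to
 * its real function type, selected by the number of loaded
 * arguments, and calls it. */
static long toy_call_long(const struct toy_vm *vm, toy_target target)
{
    assert(vm != NULL && target != NULL);
    switch (vm->count) {
    case 0:
        return ((long (*)(void))target)();
    case 1:
        return ((long (*)(long))target)(vm->args[0]);
    default:
        return ((long (*)(long, long))target)(vm->args[0],
                                              vm->args[1]);
    }
}

/* Sample target; subtraction makes the argument order observable. */
static long subtract(long a, long b)
{
    return a - b;
}
\end{smallverbatim}

Resetting the toy, loading 10 and then 3, and calling \code{subtract} gives 7,
which shows that the arguments arrive in left-to-right order. A real CallVM
differs in that the call kernel places arbitrary argument lists into registers
and onto the stack without knowing the function type at compile time.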
|
|
1000
|
|
The architecture makes it straightforward to implement an FFI
|
|
1002 for a dynamic language interpreter using a text parser for call signatures
|
|
1003 to drive the conversion of arguments and results.
|
|
1004 Similar FFIs with a text-based interface have been implemented for other language
|
|
1005 interpreters such as Ruby, Python and Lua. See the DynCall source repository \citep{dyncall}.
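
A signature-driven front end of this kind starts by walking the signature
text. The following C sketch only counts the arguments and extracts the
return type character; a real FFI would convert and load one argument per
step instead. The helper \code{scan\_call\_signature} is hypothetical and
covers only the signature elements that occur in this article.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: walks a call signature such as "pZp)v".
 * Characters before ')' denote argument types, the character after
 * it starts the return type. Stores the argument count in *nargs and
 * returns the first return-type character, or '\0' if the signature
 * is malformed. A typed pointer such as "*<Rect>" counts as one
 * argument. */
static char scan_call_signature(const char *sig, size_t *nargs)
{
    size_t n = 0;

    assert(sig != NULL && nargs != NULL);
    while (*sig != '\0' && *sig != ')') {
        if (*sig == '*') {          /* pointer prefix, no argument yet */
            sig++;
            continue;
        }
        if (*sig == '<') {          /* skip over "<Type-Name>" */
            const char *close = strchr(sig, '>');
            if (close == NULL)
                return '\0';
            sig = close;
        }
        n++;
        sig++;
    }
    if (*sig != ')' || sig[1] == '\0')
        return '\0';
    *nargs = n;
    return sig[1];
}
\end{smallverbatim}

For the callback signatures used earlier, \verb@"pZp)v"@ and \verb@"pZ)v"@,
the sketch reports three and two arguments with return type \verb@'v'@; the
typed pointer signature \verb@"*<Rect>)v"@ counts as a single argument.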
|
|
1006
|
|
1007 Both the C interface of dyncall and the signature format use the abstract
|
|
1008 C/C++ type system and give no indication about the effective size of
|
|
a particular type. In experiments with several C APIs bound via \pkg{rdyncall},
the signatures turned out to work across platforms
as long as the fundamental type definitions of the C API do not change across platforms.
|
|
1012 In our tests and the presented examples, a wide range of
|
|
1013 C APIs have this property and type signatures are valid across
|
|
1014 platforms even when switching between 32- and 64-bit platforms.
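
That a signature names only the abstract type, while its size is decided by
the host compiler, can be made concrete with a small C sketch. The helper
names are hypothetical. The characters \verb@'s'@, \verb@'S'@, \verb@'p'@
and \verb@'Z'@ occur in this article; the mappings of \verb@'i'@ to
\code{int}, \verb@'j'@ to \code{long} and \verb@'d'@ to \code{double} are
assumed here.

\begin{smallverbatim}
#include <assert.h>
#include <stddef.h>

/* Size in bytes that the host compiler assigns to the C type behind
 * one signature character, or 0 for characters not covered here. */
static size_t host_size_of(char sigchar)
{
    switch (sigchar) {
    case 's': case 'S': return sizeof(short);
    case 'i': case 'I': return sizeof(int);
    case 'j': case 'J': return sizeof(long);
    case 'd':           return sizeof(double);
    case 'p': case 'Z': return sizeof(void *);
    default:            return 0;
    }
}

/* Sums the host sizes of all field types in a type list such as
 * "ssSS". Alignment padding is ignored, so the result is only a
 * lower bound for the size of the corresponding struct. Returns 0
 * if the list contains a character that is not covered. */
static size_t sum_field_sizes(const char *types)
{
    size_t total = 0;

    assert(types != NULL);
    for (; *types != '\0'; types++) {
        size_t size = host_size_of(*types);
        if (size == 0)
            return 0;
        total += size;
    }
    return total;
}
\end{smallverbatim}

The list \verb@"ssSS"@ sums to 8 bytes wherever \code{short} has 16 bits.
A list containing \verb@'j'@ gives different results on 64-bit Linux, where
\code{long} typically has 8 bytes, and on 64-bit Windows, where it has 4
bytes, although the signature text itself is identical on both.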
|
|
1015
|
|
1016 \subsection{Dynamic callbacks}
|
|
1017
|
|
1018 The \pkg{dyncallback} library provides a framework to implement
|
|
1019 dynamic callbacks for language interpreters to wrap scripting functions
|
|
1020 as C function pointers.
|
|
The framework offers a universal C interface for a callback handler that
|
|
1022 is implemented once for a particular interpreter.
|
|
1023 The handler receives callback calls from C and forwards the call,
|
|
1024 including conversion of arguments, to a scripting function.
|
|
1025
|
|
1026 Handlers need to access machine-level arguments whose location
|
|
1027 can be on the stack, or in registers,
|
|
1028 depending on the processor architecture and calling convention.
|
|
1029 For that reason, the handler interface receives an abstract argument
|
|
1030 iterator that gives structured access to the arguments for
|
|
1031 passing over to the high-level language.
|
|
Callbacks are created via an interface that pools a handler,
language context, scripting function reference,
callback type information and other user data into a
\emph{single} native C function pointer, such that even very
low-level C callbacks without a user-data argument can be
|
|
1037 addressed with the underlying technique. \footnote{This includes
|
|
1038 callbacks for sort routines of the Standard C library which lack user-data.}
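
The comparison callback of the Standard C library routine \code{qsort} shows
what a callback without user data looks like: it receives only the two
elements to compare. The helper names \code{compare\_short} and
\code{sort\_shorts} in this minimal C sketch are hypothetical.

\begin{smallverbatim}
#include <assert.h>
#include <stdlib.h>

/* Comparison callback for qsort: it receives only the two elements
 * to compare, with no user-data argument for extra context. */
static int compare_short(const void *a, const void *b)
{
    short x = *(const short *)a;
    short y = *(const short *)b;

    return (x > y) - (x < y);
}

/* Sorts n values in ascending order via the C library callback. */
static void sort_shorts(short *values, size_t n)
{
    assert(values != NULL);
    qsort(values, n, sizeof values[0], compare_short);
}
\end{smallverbatim}

Because \code{qsort} passes nothing but the two element pointers, any context
an interpreter needs to find the wrapped scripting function has to be reachable
from the function pointer itself.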
|
|
1039
|
|
1040 \subsection{Portability and Stability}
|
|
1041
|
|
The requirements for porting the \emph{DynCall} libraries to
a new processor and/or platform are high: the calling conventions of the target processor platform have to be studied in detail,
state machines have to be implemented in C, and a small amount of code has to be written in
Assembly, which may not even be portable across build tools on the same platform.
Nevertheless, \pkg{dyncall} (as of version 0.7) supports many processor architectures such as
Intel i386 (x86), AMD 64 (x64), PowerPC 32-bit, ARM (including Thumb extension), MIPS 32/64-bit and SPARC 32/64-bit,
including several platform-, processor- and compiler-specific calling conventions.
\pkg{dyncallback} also supports major processor architectures such as Intel i386 (x86), AMD 64 (x64) and ARM, and offers
partial support for PowerPC 32-bit (support for Mac OS X/Darwin).
Besides the processor architectures, the libraries have also been explicitly ported to and tested on
various operating systems such as Linux, Mac OS X, Windows, the BSD family, Solaris, Haiku, Minix and Plan9.
Support for embedded platforms such as Playstation Portable, Nintendo DS and iPhone OS is available as well.
|
|
1054
|
|
1055 \emph{DynCall} contains a suite of testing tools for quality assurance. Included are test-case generators written in
|
|
1056 Lua and Python. Extreme call and callback scenarios are tested here to ensure correct passing of arguments and results.
|
|
1057 Before a release, the libraries and tests are built for a large set of architectures on
|
|
\pkg{DynOS} \citep{dynos}, a batch-build system using full system emulators such as
\pkg{QEmu} \citep{qemu} and \pkg{GXEmul} \citep{gxemul} and various operating-system images
|
|
1060 to test release candidates and create pre-built binary releases of the library.
|
|
1061
|
|
1062 \subsection{Text-based Signature Interfaces}
|
|
1063
|
|
A common property of the service interfaces presented here is the use of
signature text formats. Signatures are used
as type descriptors for foreign function calls, callbacks and
aggregate data types.
The reasons that led to the use of signatures as a high-level user interface
to interact with such services are given next:
|
|
1070
|
|
1071 \begin{enumerate}
|
|
1072 \item Cross-language interface: Text format interfaces are available across
|
|
1073 high-level languages. Examples for cross-language text-based
|
|
1074 interfaces include regular expressions or \code{printf}-style formatted output
|
|
1075 descriptions.
|
|
1076
|
|
1077 \item Developer-friendly:
|
|
1078 The simplicity and compactness of the text-format enables developers
|
|
1079 to bridge with foreign code in interactive and rapid development
|
|
1080 sessions.
|
|
C type signatures can be derived by hand with minimum effort:
Fundamental types are encoded with a single character, and an
upper-case letter encodes the \code{unsigned} variant of an integer type.
|
|
1084
|
|
1085 \item Machine-neutral:
|
|
1086 In contrast to binary encoded type libraries, the data format is not affected
|
|
1087 by the endian model of the underlying platform.
|
|
1088
|
|
1089 \item Parser-friendly:
|
|
1090 The signature format can be used as driver code to perform foreign function
|
|
1091 calls. Implementations of parsers match the sequential
|
|
1092 design of \pkg{dyncall}'s CallVM and \pkg{dyncallback}'s argument iterator interface.
|
|
1093 \end{enumerate}
|
|
1094
|
|
1095 \subsection{Creation of DynPort files}
|
|
1096
|
|
1097 In this section we describe the tool-chain that creates the
|
|
1098 universal bindings called \emph{DynPort}. The process described
|
|
1099 here is applied once on a build machine, the generated output
|
|
1100 is used later at run-time across platforms to drive the
|
|
1101 dynamic linkage and binding procedure.
|
|
1102 \emph{DynPort} files can be created automatically from
|
|
1103 C header files using a tool-chain as depicted in
|
|
1104 Figure \ref{fig:gen_dynport}.
|
|
1105
|
|
1106 \begin{figure}
|
|
1107 \centering
|
|
1108 \includegraphics[scale=0.45]{img_gen_dynport.pdf}
|
|
1109 \caption{\label{fig:gen_dynport}
|
|
1110 Tool-chain to create \emph{DynPort} files from C headers}
|
|
1111 \end{figure}
|
|
1112
|
|
1113 The tool-chain comprises several freely available components that
|
|
1114 are briefly described next:
|
|
\pkg{GCC-XML} \citep{gccxml} is a modified version of the GCC compiler
that translates C sources to XML documents.
\pkg{xsltproc}, distributed as part of the \pkg{libxslt} library
\citep{libxslt}, is an XSLT processor that transforms XML documents to
XML, text or binary formats according to style sheets written in
the \emph{XSL Transformations} \citep{Clark:01:XTV} language.
|
|
1121
|
|
1122 To extract library binding specifications, a main C source file is created that
|
|
1123 consists of one or more \code{\#include} statements that
|
|
1124 reference library and/or system header files to process.
|
|
1125 The header files should have been previously installed on
|
|
1126 the build machine.
|
|
1127 In a preprocessing phase, the GNU C Macro Processor is used to process
|
|
1128 all \code{\#include} statements using standard system search paths
|
|
1129 to create a concatenated \emph{All-In-One} source file free of any
|
|
1130 \code{\#include} statements.
|
|
GCC-XML transforms C header declarations to XML.
An XSL style sheet implements the transformation of XML to
type signature formats using an XSLT processor.
C macro \code{\#define} statements are handled separately by a custom
C preprocessor implemented in C++ using the Boost.Wave library \citep{boostwave}.
|
|
1136 An optional filter stage is used to include only elements with
|
|
1137 a certain pattern such as a common prefix usually found in many
|
|
1138 libraries e.g. '\code{SDL\_}'.
|
|
1139 In a last step, the various fragments are assembled into a single
|
|
1140 text-file which represents the \emph{DynPort} file.
|
|
1141 The overall build process is managed by \emph{make} files and a repository of recipes
|
|
has been set up to extend support for additional
|
|
1143 dynports and libraries in a structured and coordinated way.
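
The optional filter stage mentioned above amounts to a prefix test on each
declaration name. A minimal C sketch of such a predicate follows; the helper
name \code{has\_prefix} is hypothetical and the actual tool-chain may
implement the filter differently.

\begin{smallverbatim}
#include <assert.h>
#include <string.h>

/* Hypothetical filter predicate: keeps a declaration name only if it
 * starts with the given library prefix, for example "SDL_". */
static int has_prefix(const char *name, const char *prefix)
{
    assert(name != NULL && prefix != NULL);
    return strncmp(name, prefix, strlen(prefix)) == 0;
}
\end{smallverbatim}

With the prefix \verb@SDL_@, the name \verb@SDL_Init@ passes the filter,
whereas \verb@glClear@ does not.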
|
|
1144
|
|
1145
|
|
1146 \section{Summary and Outlook}
|
|
1147
|
|
1148 This paper introduces the \pkg{rdyncall} package (Version 0.7.3 on CRAN as of this writing) that contributes an improved Foreign Function Interface for R.
|
|
The FFI facilitates \emph{direct} invocation of foreign functions \emph{without} the need to compile additional wrapper code in C.
|
|
1150 Based on the FFI, a dynamic cross-platform linkage framework to wrap and access \emph{whole} C interfaces of native libraries from R
|
|
1151 is discussed.
|
|
1152 Instead of \emph{compiling} bindings for every library-and-language combination,
|
|
1153 R bindings of a library are created dynamically at run-time in a data-driven manner via
|
|
1154 \emph{DynPort} files - a cross-platform universal type information format.
|
|
1155 C libraries are made accessible in R as though they were extension packages and
|
|
1156 the R interface looks very similar to that of C.
|
|
1157 This enables system-level programming in R and brings a new wave of possibilities for R developers
|
|
1158 such as using OpenGL directly in R across platforms as described in the example.
|
|
1159 An initial repository of \emph{DynPort}s for standard cross-platform portable
|
|
1160 C libraries comes with the package.
|
|
1161
|
|
1162 The implementation is based on libraries from the \emph{DynCall} project that implement non-trivial
|
|
1163 facilities such as an abstraction to machine-level function calls supporting
|
|
1164 multiple calling conventions and the handling of C callbacks from within scripting language interpreter environments.
|
|
1165 The libraries have been ported across major R platforms.
|
|
Work is in progress to support missing architectures in \pkg{dyncallback} such as PowerPC System V 32-bit, PowerPC 64-bit, and 32/64-bit MIPS and SPARC.
The handling of foreign aggregate data types, which is currently implemented in R and C,
is planned to be reimplemented in portable C as part of \emph{DynCall}, in cooperation with the developers of \emph{BridJ} \citep{bridj}.
|
|
1169 Currently, \emph{DynPort} files are written as R scripts with
|
|
1170 inline text chunks created from the \emph{DynPort} tool chain.
|
|
1171 For the Lua Programming Language \citep{SPE::IerusalimschyFF1996}, a similar framework named \pkg{luadyncall} is in
|
|
1172 development using a language-neutral format for \emph{DynPort} files.
|
|
1173 The need to install additional shared libraries still represents a hurdle for ordinary R users.
|
|
1174 We plan to find a common abstraction layer for installation systems, package managers and software distribution services
|
|
1175 across OS-distributions, and to integrate meta installation information into the \emph{DynPort} file format.
|
|
1176
|
|
The \emph{DynPort} facility in \pkg{rdyncall} constitutes an initial step in building up an infrastructure between
|
|
1178 scripting languages and C libraries.
|
|
1179 Analogous to the way in which R users enjoy quick access to the large pool of R software
|
|
1180 managed by CRAN, we envision an archive network in which C library developers can distribute
|
|
1181 their work across languages, and users get quick access to the pool of C libraries from within
|
|
1182 scripting languages via automatic installation of precompiled components and using
|
|
1183 universal type information for cross-platform and cross-language dynamic bindings.
|
|
1184
|
|
1185 \bibliography{FLI}
|
|
1186
|
|
1187 \end{document}
|