
Implementation of Haskell Binding Modules

William Rusnack edited this page Dec 2, 2022 · 28 revisions

A discussion of binding modules, the principles behind the tool, and related work can be found in a research paper located at https://citeseerx.ist.psu.edu/doc_view/pid/ac0483159935aacf838155a076100e120ee45b78. All features described in the paper, except enum define hooks, are implemented in the tool, but since the publication of the paper, the tool has been extended further. The library interface essentially consists of the new Haskell FFI Marshalling Library. More details about this library are provided in the next section.

The remainder of this section describes the hooks that are available in binding modules.

Import Hooks

{#import [qualified] modid#}

This is translated into the same syntactic form in Haskell, which implies that it may be followed by an explicit import list. Moreover, it implies that the module modid is also generated by c2hs and instructs the tool to read the file modid.chi.

If an explicit output file name is given (--output option), this name determines the basename for the .chi file of the currently translated module.

Currently, only pointer and enumeration hooks generate information that is stored in a .chi file and needs to be incorporated into any client module that makes use of these types. It is, however, regarded as good style to use import hooks for any module generated by c2hs.

Restriction

c2hs does not use qualified names. This can be a problem, for example, if two pointer hooks are defined to have the same unqualified Haskell name in two different modules, which are then imported by a third module. To partially work around this problem, it is guaranteed that the declaration of the textually later import hook dominates.

Context Hooks

{#context [lib = libname] [prefix = pfx] [add prefix = addpfx]#}

Context hooks define a set of global configuration options. Currently, there are three parameters, all of which are strings:

  • libname is a dynamic library that contains symbols needed by the present binding.

  • pfx is an identifier prefix that may be omitted in the lexemes of identifiers referring to C definitions in any binding hook. This is useful as C libraries often use a prefix, such as gtk_, as a form of poor man's namespace. Any underscore characters between a prefix and the main part of an identifier must also be dropped. Case is not relevant in a prefix. In case of a conflict between an abbreviation and an explicitly defined identifier, the explicit definition takes precedence.

  • addpfx is an identifier prefix that is applied to all generated Haskell identifiers for the purposes of disambiguation. An existing C prefix can be replaced with a different prefix by using both the prefix = and the add prefix = options.

All three parameters are optional. An example of a context hook is the following:

{#context prefix = "gtk" add prefix = "CGtk"#}

If a binding module contains a context binding hook, it must be the first hook in the module.
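To make the prefix rule concrete, here is a small model of how a name written in a hook might be matched against the C identifiers in scope. It is a sketch under our own assumptions: abbreviate and resolve are hypothetical helpers written for illustration, not functions exported by c2hs, and the tool's real lookup works on its parsed C declarations rather than on a list of strings.

```haskell
import Data.Char (toLower)

-- Hypothetical helper (not part of c2hs): the abbreviated form of a C
-- identifier under a context-hook prefix, or Nothing if the identifier
-- does not start with that prefix. The comparison ignores case, and any
-- underscores between the prefix and the rest of the name are dropped.
abbreviate :: String -> String -> Maybe String
abbreviate pfx ident
  | map toLower pfx == map toLower (take n ident)
  , let rest = dropWhile (== '_') (drop n ident)
  , not (null rest)
  = Just rest
  | otherwise = Nothing
  where
    n = length pfx

-- Hypothetical lookup of a name written in a hook against the C
-- identifiers in scope: an exact match wins over an abbreviation.
resolve :: String -> [String] -> String -> Maybe String
resolve pfx cIdents name
  | name `elem` cIdents = Just name
  | otherwise =
      lookup name [ (short, c) | c <- cIdents, Just short <- [abbreviate pfx c] ]
```

With prefix = "gtk", both gtk_widget_show and GTK_WINDOW_TOPLEVEL can be referred to without their prefix, while an identifier that exists verbatim is always taken as written.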

"Non-GNU" Hooks

{#nonGNU#}

A non-GNU hook forces certain GNU-specific preprocessor symbols (__GNUC__, __GNUC_MINOR__ and __GNUC_PATCHLEVEL__) to be undefined during the processing of C header files included in a CHS file.

Type Hooks

{#type ident#}

A type hook maps a C type to a Haskell type. As an example, consider

type GInt = {#type gint#}

The type must be a defined type; primitive types, such as int, are not admissible.

Sizeof Hooks

{#sizeof ident#}

A sizeof hook maps a C type to its size in bytes. As an example, consider

gIntSize :: Int
gIntSize  = {#sizeof gint#}

The type must be a defined type; primitive types, such as int, are not admissible. The size of a primitive type can always be obtained using Foreign.Storable.sizeOf.

Alignof Hooks

{#alignof ident#}

An alignof hook maps a C type to its alignment constraint in bytes, i.e. the number n such that values of type ident are required to be stored at addresses that are multiples of n. As an example, consider

gIntAlign :: Int
gIntAlign  = {#alignof gint#}

The type must be a defined type; primitive types, such as int, are not admissible. The alignment of a primitive type can always be obtained using Foreign.Storable.alignment.
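Since {#sizeof#} and {#alignof#} reject primitive types, the Storable methods mentioned above are the way to obtain those numbers. The sketch below shows them for two Foreign.C.Types types, plus a hypothetical alignUp helper that illustrates what an alignment of n means when placing a value at an offset. Most concrete values are platform-dependent; GHC does, however, define CInt as a 32-bit type, and a C char always has size 1.

```haskell
import Foreign.C.Types (CChar, CInt)
import Foreign.Storable (alignment, sizeOf)

-- sizeOf and alignment never evaluate their argument, so 'undefined'
-- with a type annotation is the usual way to name the type.
cCharSize :: Int
cCharSize = sizeOf (undefined :: CChar)

cIntSize, cIntAlign :: Int
cIntSize  = sizeOf (undefined :: CInt)
cIntAlign = alignment (undefined :: CInt)

-- Hypothetical helper: round an offset up to the next multiple of an
-- alignment a, as a C compiler does when placing a struct member.
alignUp :: Int -> Int -> Int
alignUp off a = (off + a - 1) `div` a * a
```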

Enumeration Hooks

{#enum cid [as hsid] [nocode] {alias1 , ... , aliasn}
  [omit (ident1 , ... , identn)]
  [with prefix = pfx] [add prefix = addpfx]
  [deriving (clid1 , ... , clidn)]#}

Rewrite the C enumeration called cid into a Haskell data type declaration, which is made an instance of Enum such that the ordinals match those of the enumeration values in C. This takes explicit enumeration values in the C definitions into account. Enumeration values that alias other values within the same enumeration are elided, to avoid overlapping clauses in the fromEnum definition in the generated Haskell code.
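As an illustration of the Enum instance described above, the following is roughly the shape of Haskell code one would expect for a C enumeration with explicit, non-contiguous values. The C declaration and the hook in the comments are assumptions made for this example, and the code c2hs actually emits may differ in detail (for instance in the wording of the toEnum error).

```haskell
-- Assumed C declaration, for illustration only:
--   enum color { RED = 1, GREEN = 2, BLUE = 4, COLOR_FIRST = 1 };
-- and an assumed hook {#enum color as Color {underscoreToCase}#}.
-- COLOR_FIRST aliases RED, so it gets no constructor of its own.
data Color = Red | Green | Blue
  deriving (Eq, Show)

-- The ordinals follow the explicit C values rather than 0, 1, 2.
instance Enum Color where
  fromEnum Red   = 1
  fromEnum Green = 2
  fromEnum Blue  = 4

  toEnum 1 = Red
  toEnum 2 = Green
  toEnum 4 = Blue
  toEnum n = error ("Color.toEnum: no constructor for " ++ show n)
```

Note that the Enum class's default succ is toEnum . (+ 1) . fromEnum, so succ Green asks for ordinal 3 and fails at run time whenever the C values have gaps; fromEnum and toEnum are the safe operations on such types.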

Anonymous C enumerations can be handled by allowing the identifier cid to take one of the C enumeration value names instead of an enumeration type name. If hsid is given, this is the name of the Haskell data type. The identifiers clid1 to clidn are added to the deriving clause of the Haskell type. If the nocode keyword is present, the data declaration for the Haskell enumerated type is omitted and just an Enum instance declaration is produced. The Haskell identifier hsid can be a single-quote delimited piece of Haskell code to allow for the presence of associated types, e.g. 'OuterType EnumType'.

By default, the names of the C enumeration are used for the constructors in Haskell. If alias1 is underscoreToCase, the original C names are capitalised and the use of underscores is rewritten to caps. If it is upcaseFirstLetter or downcaseFirstLetter, the first letter of the original C name changes case correspondingly. It is also possible to combine underscoreToCase with one of upcaseFirstLetter or downcaseFirstLetter. Moreover, alias1 to aliasn may be aliases of the form cid as hsid, which map individual C names to Haskell names. Instead of the global prefixes introduced by a context hook, a local prefix to be removed pfx and a local replacement prefix to be added addpfx can optionally be specified. If any C enumeration values should be omitted when generating the Haskell data type, they can be listed in an omit clause (this is useful for the common case where a sentinel value counting the total number of enumeration values is included as the last value in a C enumeration).

As an example, consider

{#enum WindowType {underscoreToCase} deriving (Eq)#}
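The underscoreToCase rule can be sketched as a plain string function. This is a hypothetical reimplementation based on the description above, not the tool's source; in particular, lower-casing the remaining letters of each underscore-separated part is an assumption (it is what turns an all-caps C name into a mixed-case constructor). Under this model, with the gtk prefix from a context hook removed first, GTK_WINDOW_TOPLEVEL would come out as WindowToplevel.

```haskell
import Data.Char (toLower, toUpper)

-- Hypothetical model of underscoreToCase: split at underscores,
-- upper-case the first letter of each part, lower-case the rest
-- (the lower-casing is an assumption), and concatenate the parts.
underscoreToCase :: String -> String
underscoreToCase = concatMap capitalise . splitOn '_'
  where
    capitalise []       = []
    capitalise (c : cs) = toUpper c : map toLower cs

-- Split a string at every occurrence of a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest
```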

enum define hooks

Many C libraries do not use enum types, but macro definitions to implement constants. c2hs provides enum define hooks to generate a Haskell data type from a collection of macro definitions.

{#enum define hsid {alias1 , ... , aliasn} [deriving (clid1 , ... , clidn)]#}

Create a Haskell data type hsid, with nullary constructors as given by the aliases alias1 through aliasn. Each alias has to be of the form macrodef as hsid, where hsid is the name of the nullary Haskell constructor and macrodef is the C macro to which that constructor should map. The deriving part is handled as in ordinary enum hooks.

Here's an example:

#define X 0
#define Y 1
{#enum define Axis {X as Axis0, Y as Axis1} deriving (Eq,Ord) #}

Const Hooks

{#const cid#}

A const hook is a convenient way to access the value of a C #defined constant without needing to define a Haskell enumeration data type. The hook {#const MYCONSTANT#} inserts the value of the C manifest constant MYCONSTANT inline into the generated Haskell code.

Call Hooks

{#call [pure] [unsafe] cid [as (hsid | ^)]#}

A call hook rewrites to a call to the C function cid and also ensures that the appropriate foreign import declaration is generated. The tags pure and unsafe specify that the external function is purely functional and cannot re-enter the Haskell runtime, respectively. If hsid is present, it is used as the identifier for the foreign declaration, which otherwise defaults to cid. If the symbol ^ is given instead of hsid, cid is converted from C's underscore notation to a capitalised (camel-case) identifier, and that identifier is used.

As an example, consider

cSin :: Double -> Double
cSin  = realToFrac . {#call pure sin as c_sin#} . realToFrac
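For comparison, a pure, unsafe call hook for C's sin corresponds roughly to the plain FFI declaration below: pure removes IO from the result type, and unsafe becomes the safety annotation of the foreign import. This is a hand-written sketch, not c2hs output; the names c_sin and cSin are chosen for illustration, and the realToFrac conversions are needed because the imported function works on CDouble.

```haskell
import Foreign.C.Types (CDouble)

-- 'pure': no IO in the result type; 'unsafe': the C function is
-- declared not to call back into the Haskell runtime.
foreign import ccall unsafe "math.h sin"
  c_sin :: CDouble -> CDouble

-- Wrapper with Haskell types; realToFrac converts Double <-> CDouble.
cSin :: Double -> Double
cSin = realToFrac . c_sin . realToFrac
```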

Function Hooks

{#fun [pure] [unsafe] (cid | variadic cid[ctype1, ..., ctypem] ) [as (hsid | ^)]
[ctxt =>] { parm1 , ... , parmn } -> parm #}

Function hooks are call hooks with parameter marshalling added. Thus, for non-variadic functions, the components of a function hook up to and including the as alias are the same as for call hooks. However, the as alias has a different meaning: it specifies the name of the generated Haskell function. The remaining components use literals enclosed between a backquote and a forward single quote (`...') to denote Haskell code fragments (or, more precisely, parts of the Haskell type signature for the bound function). The first one is the phrase ctxt preceding =>, which denotes the type context. This is followed by zero or more type and marshalling specifications parm1 to parmn for the function arguments and one parm for the function result. Each such specification parm has the form

[inmarsh [* | -]] [%]hsty[&] [outmarsh [*] [-]]

where hsty is a Haskell code fragment denoting a Haskell type. The optional information to the left and right of this type determines the marshalling of the corresponding Haskell value to and from C; they are called the in and out marshaller, respectively.

Each marshalling specification parm corresponds to one or two arguments of the C function, in the order in which they are given. A marshalling specification in which the symbol & follows the Haskell type corresponds to two C function arguments; otherwise, it corresponds only to one argument. The parm following the left arrow -> determines the marshalling of the result of the C function and may not contain the symbol &.

The *- output marshaller specification is for monadic actions that must be executed but whose results are discarded. This is very useful, e.g., for checking an error value and throwing an exception if needed. The optional % preceding the argument type indicates that the argument, which must be a C structure type, is passed bare, i.e. by value rather than as a pointer. The Haskell FFI cannot do this directly, so an extra layer of wrapper functions must be generated. These wrapper functions are written to a Module.chs.c file, which must be compiled and linked separately for the wrapping to work.

Both inmarsh and outmarsh are identifiers of Haskell marshalling functions. By default they are assumed to be pure functions; if they have to be executed in the IO monad, the function name needs to be followed by a star symbol *. Alternatively, the identifier may be followed by a minus sign -, in which case the Haskell type does not appear as an argument (in marshaller) or result (out marshaller) of the generated Haskell function. In other words, the argument types of the Haskell function are determined by the set of all marshalling specifications where the in marshaller is not followed by a minus sign. Conversely, the result tuple of the Haskell function is determined by the set of all marshalling specifications where the out marshaller is not followed by a minus sign. The order of function arguments and components in the result tuple is the same as the order in which the marshalling specifications are given, with the exception that the value of the result marshaller, if it is included at all, is always the first component of the result tuple.

For a set of commonly occurring Haskell and C type combinations, c2hs provides default marshallers if no explicit marshaller is given. The out marshaller for function arguments is by default void-. The defaults for the in marshallers for function arguments are as follows:

  • Bool and integral C type (including chars): fromBool

  • Integral Haskell and integral C type: fromIntegral

  • Floating Haskell and floating C type: realToFrac

  • String and char*: withCString*

  • String and char* with explicit length: withCStringLen*

  • Char and C char: castCharToCChar

  • Char and C unsigned char: castCharToCUChar

  • T and T*: with*

  • T and T* where T is an integral type: (with . fromIntegral)*

  • T and T* where T is a floating type: (with . realToFrac)*

  • Bool and T* where T is an integral type: (with . fromBool)*

  • enumerated types defined with enum hooks: fromIntegral . fromEnum

  • naked and newtype pointers defined with pointer hooks: id

  • foreign pointers defined with pointer hooks: withForeignPtr*

  • foreign newtype pointers defined with pointer hooks: withPointerType*, where PointerType is the Haskell type name defined by the pointer hook.

The defaults for the out marshaller of the result are the converse of the above; i.e., instead of the with functions, the corresponding peek functions are used. Moreover, when the Haskell type is (), the default marshaller is void-. (For foreign pointer hooks, the default out marshaller is newForeignPtr_ and for foreign newtype pointers, it is newForeignPtr_ >=> (return . PointerType).)
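To see what these defaults amount to, here is a hand-written equivalent of a function hook that binds C's strlen with a String argument and an Int result: the argument goes through withCString (the String/char* default) and the size_t result through fromIntegral. The Haskell name cStrlen is hypothetical, and this is a sketch of the marshalling steps rather than the code c2hs generates.

```haskell
import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CSize)

-- size_t strlen(const char *s);
foreign import ccall unsafe "string.h strlen"
  c_strlen :: CString -> IO CSize

-- In marshaller: withCString (String and char*, runs in IO).
-- Out marshaller: fromIntegral (integral C result to Int).
cStrlen :: String -> IO Int
cStrlen s = withCString s $ \cs -> fromIntegral <$> c_strlen cs
```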

As a special case, one of the input parameter specifications can be replaced with one of several notations involving a single + sign. In this case, the output parameter of the function should be of a pointer type defined using a pointer hook. In the code generated for the function hook, space for the output object is allocated using the mallocForeignPtrBytes function, the corresponding pointer is passed to the C function as a parameter, and the wrapped foreign pointer is returned from the resulting Haskell function. This exception supports the common use case where a C function fills in values in an allocated structure and the allocated structure is returned to the calling code. Without this facility, the user must write a separate allocation and marshalling step manually. There are three variants: a bare + sign uses C2HS's native structure size determination to work out how much space to allocate; the notation +S assumes that the type for which space is to be allocated has a Storable instance, and the Storable sizeOf method is used to determine how much space to allocate; finally, the notation +<integer> (e.g. +16) allows one to provide the number of bytes to be allocated as an explicit count.

As an example, consider

{#fun notebook_query_tab_label_packing as ^
  `(NotebookClass nb, WidgetClass cld)' =>
  {notebook `nb'                ,
   widget   `cld'               ,
   alloca-  `Bool'     peekBool*,
   alloca-  `Bool'     peekBool*,
   alloca-  `PackType' peekEnum*} -> `()'#}

which results in the Haskell type signature

notebookQueryTabLabelPacking :: (NotebookClass nb, WidgetClass cld)
			     => nb -> cld -> IO (Bool, Bool, PackType)

which binds the following C function:

void gtk_notebook_query_tab_label_packing (GtkNotebook *notebook,
					   GtkWidget   *child,
					   gboolean    *expand,
					   gboolean    *fill,
					   GtkPackType *pack_type);
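The same pattern (an alloca- argument that disappears from the Haskell signature and reappears in the result tuple via a peek) can be written out by hand for a small libc function. C's modf returns the fractional part and stores the integral part through a pointer; the sketch below, with the hypothetical name splitDouble, shows the steps a function hook with such an argument would perform, including placing the C result first in the tuple.

```haskell
import Foreign.C.Types (CDouble)
import Foreign.Marshal.Alloc (alloca)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek)

-- double modf(double x, double *iptr);
foreign import ccall unsafe "math.h modf"
  c_modf :: CDouble -> Ptr CDouble -> IO CDouble

-- First argument: plain in marshaller (realToFrac).
-- Second argument: 'alloca-' style, so it is not a Haskell argument;
-- its value is read back with peek and returned in the result tuple.
splitDouble :: Double -> IO (Double, Double)
splitDouble x = alloca $ \intPart -> do
  frac  <- c_modf (realToFrac x) intPart
  whole <- peek intPart
  -- The C function's own result comes first in the tuple.
  return (realToFrac frac, realToFrac whole)
```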

The variadic keyword is used to specify a binding to a variadic C function, i.e. one that uses a ... parameter and the stdarg.h va_arg mechanism to process arguments of varying types. A separate function hook (and hence a separate Haskell name) is required for each calling sequence used with the C function. The C types used for the variable arguments must be listed in square brackets following the name of the C function (note that only the types of the variable arguments need to be given here). For example:

{#fun variadic printf[const char *] as prints {`String', `String'} -> `()'#}

defines a Haskell function prints :: String -> String -> IO () that can be called as prints "Test: %s\n" "some text".
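The + notation for an output parameter, described earlier in this section, can be approximated by hand as follows, using C's time, which also stores the current time through its pointer argument. This mirrors the +S variant, since the allocation size comes from the Storable sizeOf method; currentTimeBox and readTimeBox are hypothetical names, CTime is not introduced by a pointer hook here, and real c2hs output will differ in detail.

```haskell
import Foreign.C.Types (CTime)
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peek, sizeOf)

-- time_t time(time_t *t); also stores the current time through t.
foreign import ccall unsafe "time.h time"
  c_time :: Ptr CTime -> IO CTime

-- Allocate space sized by Storable's sizeOf (the '+S' idea), let the C
-- function fill it in, and hand back the foreign pointer. The memory
-- is released by the garbage collector once it is unreachable.
currentTimeBox :: IO (ForeignPtr CTime)
currentTimeBox = do
  fp <- mallocForeignPtrBytes (sizeOf (undefined :: CTime))
  _  <- withForeignPtr fp c_time
  return fp

-- Read the value the C function wrote.
readTimeBox :: ForeignPtr CTime -> IO CTime
readTimeBox fp = withForeignPtr fp peek
```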

Get Hooks

{#get apath#}

A get hook supports accessing a member value of a C structure. The hook itself yields a function that, when given the address of a structure of the right type, performs the structure access. The member that is to be extracted is specified by the access path apath. Access paths are formed as follows (following a subset of the C expression syntax):

  • The root of any access path is a simple identifier, which denotes either a type name or struct tag. In order to disambiguate between type names and struct tags, the keyword struct may be inserted before the identifier: with the struct keyword, the tag namespace is searched before the type name namespace; without struct, the type name namespace is searched first.

  • An access path of the form *apath denotes dereferencing of the pointer yielded by accessing the access path apath.

  • An access path of the form apath.cid specifies that the value of the struct member called cid should be accessed.

  • Finally, an access path of the form apath->cid, as in C, specifies a combination of dereferencing and member selection.

For example, we may have

visualGetType              :: Visual -> IO VisualType
visualGetType (Visual vis)  = liftM cToEnum $ {#get Visual->type#} vis

Set Hooks

{#set apath#}

Set hooks are formed in the same way as get hooks, but yield a function that assigns a value to a member of a C structure. These functions expect a pointer to the structure as the first and the value to be assigned as the second argument. For example, we may have

{#set sockaddr_in.sin_family#} addr_in (cFromEnum AF_INET)
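In effect, a get or set hook reads or writes at a fixed byte offset from the structure pointer. The sketch below assumes a C declaration struct point { int x; int y; } and hard-codes the offset of y as 4, which is what common ABIs use; c2hs derives such offsets from the C header instead, so the literal 4, the phantom type Point and all helper names are assumptions of this example.

```haskell
import Foreign.C.Types (CInt)
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Phantom tag for the assumed C type: struct point { int x; int y; };
data Point

-- Assumed byte offset of the y member (derived from the header in c2hs).
yOffset :: Int
yOffset = 4

-- Get hook analogue: read the member at its offset.
getPointY :: Ptr Point -> IO CInt
getPointY p = peekByteOff p yOffset

-- Set hook analogue: structure pointer first, new value second.
setPointY :: Ptr Point -> CInt -> IO ()
setPointY p = pokeByteOff p yOffset

-- Scratch buffer large enough for the assumed struct (two ints).
withScratchPoint :: (Ptr Point -> IO a) -> IO a
withScratchPoint = allocaBytes 8
```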

Offsetof Hooks

{#offsetof apath#}

An offsetof hook calculates the byte offset of a field in a structure accessed by the given access path. For example,

{#offsetof struct_t->somefield#}

calculates the byte offset of field somefield within the structure identified by the name struct_t.

Pointer Hooks

{#pointer [*] cid [as hsid] [foreign [finalizer fcid [as fhsid] ] | stable] [newtype | -> hsid2] [nocode]#}

A pointer hook facilitates the mapping of C to Haskell pointer types. In particular, it enables the use of ForeignPtr and StablePtr types and defines type name translations for pointers to non-basic types. In general, such a hook establishes an association between the C type cid or *cid and the Haskell type hsid, where the latter defaults to cid if not explicitly given. The identifier cid will usually be a type name, but in the case of *cid may also be a struct, union, or enum tag. If both a type name and a tag of the same name are available, the type name takes precedence. Optionally, the pointer can be represented in Haskell by a ForeignPtr or StablePtr instead of a plain Ptr. For pointers of type ForeignPtr, a finalizer may be specified: a C function that takes a pointer of the appropriate type and deallocates the previously allocated object. If the newtype tag is given, the Haskell type hsid is defined as a newtype rather than a transparent type synonym. In the case of a newtype, the type argument to the Haskell pointer type will be hsid, which gives a cyclic definition, but here the type argument is really only used as a unique type tag. Without newtype, the default type argument is (), but another type can be specified after the symbol ->.

For example, we may have

{#pointer *GtkObject as Object newtype#}

This will generate a new type Object as follows:

newtype Object = Object (Ptr Object)

which enables exporting Object as an abstract type and facilitates type checking at call sites of imported functions using the encapsulated pointer. The latter is achieved by c2hs as follows. The tool remembers the association of the C type *GtkObject with the Haskell type Object, and so, it generates for the C function

void gtk_unref_object (GtkObject *obj);

the import declaration

foreign import ccall "gtk_unref_object" gtk_unref_object :: Object -> IO ()

This function can obviously only be applied to pointers of the right type and thus protects against the common mistake of confusing the order of pointer arguments in function calls.

However, as the Haskell FFI does not permit passing ForeignPtrs directly to foreign functions or returning them from foreign functions, the tool will use the type Ptr HsName in this case, where HsName is the Haskell name of the type. So, if we modify the above declaration to be

{#pointer *GtkObject as Object foreign newtype#}

the type Ptr Object will be used instead of a plain Object in import declarations; i.e., the previous import declaration will become

foreign import ccall "gtk_unref_object" gtk_unref_object :: Ptr Object -> IO ()

To simplify the required marshalling code for such pointers, the tool automatically generates a function

withObject :: Object -> (Ptr Object -> IO a) -> IO a
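A minimal sketch of the declarations involved in the foreign newtype case follows. The newtype is tagged with itself, as described above, and withObject simply unwraps it and defers to withForeignPtr. newDummyObject and isLive are hypothetical helpers added so the example runs without GTK; a real binding would wrap a pointer obtained from the C library, typically together with a finalizer.

```haskell
import Foreign.ForeignPtr (ForeignPtr, mallocForeignPtrBytes, withForeignPtr)
import Foreign.Ptr (Ptr, nullPtr)

-- The type argument of ForeignPtr is Object itself; it serves only
-- as a unique type tag.
newtype Object = Object (ForeignPtr Object)

-- Unwrap the newtype and give the action a plain Ptr for the FFI call.
withObject :: Object -> (Ptr Object -> IO a) -> IO a
withObject (Object fp) = withForeignPtr fp

-- Hypothetical stand-in for an object created by the C library.
newDummyObject :: IO Object
newDummyObject = Object <$> mallocForeignPtrBytes 16

-- Hypothetical usage: check that the wrapped pointer is non-null.
isLive :: Object -> IO Bool
isLive o = withObject o (return . (/= nullPtr))
```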

As an example that does not represent the pointer as an abstract type, consider the C type declaration:

typedef struct {int x, y;} *point;

We can represent it in Haskell as

data Point = Point {x :: Int, y :: Int}
{#pointer point as PointPtr -> Point#}

which will translate to

data Point = Point {x :: Int, y :: Int}
type PointPtr = Ptr Point

and establish a type association between point and PointPtr.

If the type after the -> is a parameterized type, it should be included in a backquote-forward quote pair, e.g.

data Hit2 a b = Hit2 a b
{#pointer *hit_double as HitEg -> `Hit2 Double [Int]'#}

If the keyword nocode is added to the end of a pointer hook, c2hs will not emit a type declaration. This is useful when a c2hs module wants to make use of an existing type declaration in a binding not generated by c2hs (i.e., where there are no .chi files).

Restriction

The name cid cannot be a basic C type (such as int), it must be a defined name.

Class Hooks

{#class [hsid1 =>] hsid2 hsid3#}

Class hooks facilitate the definition of a single-inheritance class hierarchy for external pointers, including up- and downcast functionality. This is meant for cases where the objects referred to by the external pointers are ordered in such a hierarchy in the external API; such structures are encountered in C libraries that provide an object-oriented interface. Each class hook rewrites to a class declaration and one or more instance declarations.

All classes in a hierarchy, except the root, will have a superclass identified by hsid1. The new class is given by hsid2 and the corresponding external pointer is identified by hsid3. Both the superclass and the pointer type must already have been defined by binding hooks that precede the class hook.

The pointers in a hierarchy must either all be foreign pointers or all be normal pointers. Stable pointers are not allowed. Pointers defined as newtypes and pointers defined by type synonyms may both be used in class declarations, and they may be mixed. In the case of synonyms, Haskell's usual restrictions regarding overlapping instance declarations apply.

The newly defined class has two members whose names are derived from the type name hsid3. The name of the first member is derived from hsid3 by converting the first character to lower case. This function casts from any superclass to the current class. The name of the second member is derived by prefixing hsid3 with from. It casts from the current class to any superclass. A class hook generates an instance for the pointer in the newly defined class as well as in all its superclasses.

As an example, consider

{#pointer *GtkObject newtype#}
{#class GtkObjectClass GtkObject#}

{#pointer *GtkWidget newtype#}
{#class GtkObjectClass => GtkWidgetClass GtkWidget#}

The second class hook generates an instance for GtkWidget for both the GtkWidgetClass as well as for the GtkObjectClass.
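The declarations produced by the two class hooks above can be approximated as follows, using the member-naming rule described earlier: gtkObject converts any instance of GtkObjectClass (such as a GtkWidget) to a GtkObject, and fromGtkObject converts back. This is a sketch, not the exact generated code; the conversions only change the phantom type of the pointer via castPtr and perform no runtime check, so converting back is only safe when the object really has the target type.

```haskell
import Foreign.Ptr (Ptr, castPtr, nullPtr)

-- From {#pointer *GtkObject newtype#} and {#pointer *GtkWidget newtype#}.
newtype GtkObject = GtkObject (Ptr GtkObject)
newtype GtkWidget = GtkWidget (Ptr GtkWidget)

-- From {#class GtkObjectClass GtkObject#}.
class GtkObjectClass p where
  gtkObject     :: p -> GtkObject
  fromGtkObject :: GtkObject -> p

instance GtkObjectClass GtkObject where
  gtkObject     = id
  fromGtkObject = id

-- From {#class GtkObjectClass => GtkWidgetClass GtkWidget#}.
class GtkObjectClass p => GtkWidgetClass p where
  gtkWidget     :: p -> GtkWidget
  fromGtkWidget :: GtkWidget -> p

instance GtkWidgetClass GtkWidget where
  gtkWidget     = id
  fromGtkWidget = id

-- The second hook also makes GtkWidget an instance of the superclass.
-- castPtr only changes the phantom type; nothing is checked at runtime.
instance GtkObjectClass GtkWidget where
  gtkObject (GtkWidget p)     = GtkObject (castPtr p)
  fromGtkObject (GtkObject p) = GtkWidget (castPtr p)
```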

Typedef and default marshaller hooks

A common requirement is to be able to make a link between a given C typedef and a corresponding Haskell type, for example, marshalling C size_t values to Haskell CSize values. In these cases, it's convenient for the user to be able to specify default input and output marshallers for these pairs of types to avoid having to give marshallers on individual function hooks. These capabilities are provided by the typedef and default hooks.

A typedef hook defines an association between a C typedef name and a Haskell type with default identity marshalling:

{#typedef cid hsid#}

For example:

{#typedef size_t CSize#}
{#fun foo {`Int'} -> `CSize'#}

This associates the C typedef size_t with the Haskell type CSize and permits marshalling of values into and out of functions -- the C function foo is defined as

size_t foo(int n);

Note that the default id marshalling produced by a typedef hook can lead to type errors in the generated Haskell code. For more complex marshalling needs, default hooks can be used to define default marshallers for different situations. A default hook is of the form:

{#default (in|out) hsty [ctype] marsh#}

and defines an in or out marshaller between a given Haskell type (type name provided in the "verbatim Haskell" syntax) and a C type (which can be either a simple typedef name or a pointer to a typedef), using a marshaller given as a Haskell function name, optionally followed by a * to mark that marshalling happens in the IO monad.

For example, the following definitions associate the Haskell CWchar type with the C wchar_t typedef name (from the standard wchar.h header) and define default in and out marshallers between Haskell strings and wchar_t pointers. These typedef and marshalling definitions allow the wcscmp and wcscat functions to be defined simply as functions taking and returning Haskell String arguments:

{#typedef wchar_t CWchar#}
{#default in `String' [wchar_t *] withCWString*#}
{#default out `String' [wchar_t *] peekCWString*#}
{#fun wcscmp {`String', `String'} -> `Int'#}
{#fun wcscat {`String', `String'} -> `String'#}

Note that it generally doesn't make much sense to have default hooks without a corresponding typedef hook.
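Spelled out by hand, the wcscmp binding above amounts to marshalling both String arguments with withCWString (the default in marshaller just declared) and converting the C int result to Int. The name wcscmpString is hypothetical, and the code is a sketch of the marshalling rather than c2hs output.

```haskell
import Foreign.C.String (CWString, withCWString)
import Foreign.C.Types (CInt)

-- int wcscmp(const wchar_t *s1, const wchar_t *s2);
foreign import ccall unsafe "wchar.h wcscmp"
  c_wcscmp :: CWString -> CWString -> IO CInt

-- Both String arguments use the withCWString in marshaller; the int
-- result is converted with fromIntegral.
wcscmpString :: String -> String -> IO Int
wcscmpString a b =
  withCWString a $ \ca ->
    withCWString b $ \cb ->
      fromIntegral <$> c_wcscmp ca cb
```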

CPP Directives and Inline C Code

A Haskell binding module may include arbitrary C pre-processor directives using the standard C syntax. The directives are used in two ways: firstly, they are included in the C header file generated by c2hs in exactly the same order in which they appear in the binding module. Secondly, all conditional directives are honoured by c2hs, in that all Haskell binding code in alternatives discarded by the C pre-processor is also discarded by c2hs. This latter feature is, for example, useful to maintain different bindings for multiple versions of the same C API in a single Haskell binding module.

In addition to C pre-processor directives, vanilla C code can be maintained in a Haskell binding module by bracketing this C code with the pseudo directives #c and #endc. Such inline C code is emitted into the C header generated by c2hs at exactly the same position relative to CPP directives as it occurs in the binding module. Pre-processor directives may encompass the #include directive, which can be used instead of specifying a C header file as an argument to c2hs. In particular, this enables the simultaneous use of multiple header files without the need to provide a custom header file that binds them together. If a header file lib.h is specified as an argument to c2hs, the tool will emit the directive #include "lib.h" into the generated C header before any other CPP directive or inline C code.

As an artificial example of these features consider the following code:

#define VERSION 2

#if (VERSION == 1)
foo :: CInt -> CInt
foo = {#call pure fooC#}
#else
foo :: CInt -> CInt -> CInt
foo = {#call pure fooC#}
#endif

#c
int fooC (int, int);
#endc

One of two versions of the Haskell function foo (with different arities) is selected depending on the value of the CPP macro VERSION, which in this example is defined in the same file. In realistic code, VERSION would be defined in the header file supplied with the C library that is made accessible from Haskell by a binding module. The above code fragment also includes one line of inline C code that declares a C prototype for fooC.

Current limitation of the implementation

Inline C code cannot currently contain any code blocks; i.e., only declarations as typically found in header files may be included.

Grammar Rules

The following grammar rules define the syntax of binding hooks:

hook     -> '{#' inner '#}'
inner    -> 'import' ['qualified'] ident
          | 'context' ctxt
          | 'type' ident
          | 'sizeof' ident
          | 'alignof' ident
          | 'enum' idalias trans ['with' prefix] [deriving]
          | 'call' ['pure'] ['unsafe'] idalias
          | 'fun' ['pure'] ['unsafe'] idalias parms
          | 'get' apath
          | 'set' apath
          | 'offsetof' apath
          | 'pointer' ['*'] idalias ptrkind
          | 'class' [ident '=>'] ident ident
          | 'typedef' ident ident
          | 'default' ('in'|'out') verbhs '[' ident [*] ']' ident [*]

ctxt     -> ['lib' '=' string] [prefix]
idalias  -> ident ['as' (ident | '^')]
prefix   -> 'prefix' '=' string
deriving -> 'deriving' '(' ident_1 ',' ... ',' ident_n ')'
parms    -> [verbhs '=>'] '{' parm_1 ',' ... ',' parm_n '}' '->' parm
parm     -> '+' | [ident_1 ['*' | '-']] ['%'] verbhs ['&'] [ident_2 ['*'] ['-']]
apath    -> ident
          | '*' apath
          | apath '.' ident
          | apath '->' ident
trans    -> '{' alias_1 ',' ... ',' alias_n '}' [omit]
omit     -> 'omit' '(' ident_1 ',' ... ',' ident_n ')'
alias    -> 'underscoreToCase' | 'upcaseFirstLetter' | 'downcaseFirstLetter'
          | ident 'as' ident
ptrkind  -> ['foreign' ['finalizer' idalias] | 'stable'] ['newtype' | '->' ident]

Identifiers (ident) follow the lexical syntax of Haskell. They may be enclosed in single quotes to disambiguate them from c2hs keywords.
