Jump to content

Generic programming

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Bluemoose (talk | contribs) at 15:20, 26 August 2006 (minor formatting). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Generic programming is a datatype-independent way of computer programming. This means that the same source code will be used regardless of the datatype the code will be instantiated with or passed as parameters[1]. Generic programming is implemented and supported differently within different languages. Generic programming facilities first appeared in the 1970s in such languages as CLU and Ada, and were subsequently adopted by many object-based and object-oriented languages, including BETA, C++, D, Eiffel, Java, and DEC's now defunct Trellis-Owl language.

As an example of the benefits of generic programming, when creating container of objects it is common to write specific implementations for each datatype contained, even if the code is virtually identical except for different datatypes. Instead, a possible declaration using generic programming could be to define a template class (using the C++ idiom):

template<typename T>
class List
{
/* class contents */
};
List<Animal> list_of_animals;
List<Cars> list_of_cars;

Above T represents the type to be instantiated. The list generated is then treated as a list of whichever type is specified. These "containers-of-type-T", commonly called generics, are a generic programming technique that allows the defining of a class that takes and contains different datatypes (not to be confused with polymorphism, which is the algorithmic usage of exchangable sub-classes) as long as certain contracts such as subtypes and signature are kept. Though the example above is the most common use of generic programming, and some languages implement only this aspect of it, generic programming as a concept is not limited to generics. Another applicaton is type-independent functions as in the Swap example below:

template<typename T>
void Swap(T & a, T & b) //"&" passes parameters by reference
{
T temp = b;
b = a;
a = temp;
}
string hello = "world!", world = "Hello, ";
Swap( world, hello );
cout << hello << world << endl; //Output is "Hello, world!"

The template construct of C++ used in the examples above is widely cited as the generic programming construct that popularized the notion among programmers and language designers and provides full support for all generic programming idioms. D also offers fully generic-capable templated based on the C++ precedent but with a simplified syntax. Java has provided generic programming facilities syntactically based on C++'s since the introduction of J2SE 5.0 and implements the generics, or "containers-of-type-T", subset of generic programming.

C# 2.0, Chrome 1.5 and Visual Basic .NET 2005 have constructs that take advantage of the support for generics present in the Microsoft .NET Framework since version 2.0. The ML family of programming languages encourage generic programming through parametric polymorphism and generic modules called functors. The type class mechanism of Haskell supports generic programming.

Dynamic typing, such as is featured in Objective-C, and, if necessary, judicious use of protocols circumvent the need for use of generic programming techniques, since there exists a general type to contain any object. Whilst Java does so also, the casting that needs to be done breaks the discipline of static typing, and generics are one way of achieving some of the benefits of dynamic typing with the advantages of having static typing.

Generics in Ada

Ada has had generics since it was first designed in 1977-1980. The standard library uses generics to provide many services. Ada 2005 adds a comprehensive generic container library to the standard library, which was inspired by C++'s standard template library.

A generic unit is a package or a subprogram that takes one or more generic formal parameters.

A generic formal parameter is a value, a variable, a constant, a type, a subprogram, or even an instance of another, designated, generic unit. For generic formal types, the syntax distinguishes between dicscrete, floating-point, fixed-point, access (pointer) types, etc. Some formal parameters can have default values.

To instantiate a generic unit, the programmer passes actual parameters for each formal. The generic instance then behaves just like any other unit. It is possible to instantiate generic units at run-time, for example inside a loop.

Ada example

The specification of a generic package:

generic
Max_Size : Natural; -- a generic formal value
type Element_Type is private; -- a generic formal type; accepts any nonlimited type
package Stacks is
type Size_Type is range 0 .. Max_Size;
type Stack is limited private;
procedure Create (S : out Stack;
Initial_Size : in Size_Type := Max_Size);
procedure Push (Into : in out Stack; Element : in Element_Type);
procedure Pop (From : in out Stack; Element : out Element_Type);
Overflow : exception;
Underflow : exception;
private
subtype Index_Type is Size_Type range 1 .. Max_Size;
type Vector is array (Index_Type range <>) of Element_Type;
type Stack (Allocated_Size : Size_Type := 0) is record
Top : Index_Type;
Storage : Vector (1 .. Allocated_Size);
end record;
end Stacks;

Instantiating the generic package:

type Bookmark_Type is new Natural;
-- records a location in the text document we are editing
package Bookmark_Stacks is new Stacks (Max_Size => 20,
Element_Type => Bookmark_Type);
-- Allows the user to jump between recorded locations in a document

Using an instance of a generic package:

type Document_Type is record
Contents : Ada.Strings.Unbounded.Unbounded_String;
Bookmarks : Bookmark_Stacks.Stack;
end record;
procedure Edit (Document_Name : in String) is
Document : Document_Type;
begin
-- Initialise the stack of bookmarks:
Bookmark_Stacks.Create (S => Document.Bookmarks, Initial_Size => 10);
-- Now, open the file Document_Name and read it in...
end Edit;

Advantages and limitations

The language syntax allows very precise specification of constraints on generic formal parameters. For example, it is possible to specify that a generic formal type will only accept a modular type as the actual. It is also possible to express constraints between generic formal parameters; for example:

generic
type Index_Type is (<>); -- must be a discrete type
type Element_Type is private; -- can be any nonlimited type
type Array_Type is array (Index_Type range <>) of Element_Type;

In this example, Array_Type is constrained by both Index_Type and Element_Type. When instantiating the unit, the programmer must pass an actual array type that satisfies these constraints.

The disadvantage of this fine-grained control is a complicated syntax, but, because all generic formal parameters are completely defined in the specification, the compiler can instantiate generics without looking at the body of the generic.

Unlike C++, Ada does not allow specialised generic instances, and requires that all generics be instantiated explicitly. These rules have several consequences:

  • the compiler can implement shared generics: the object code for a generic unit can be shared between all instances (unless the programmer requests inlining of subprograms, of course). As further consequences:
    • there is no possibility of code bloat (code bloat is common in C++ and requires special care, as explained below).
    • it is possible to instantiate generics at run time, as well as at compile time, since no new object code is required for a new instance.
    • actual objects corresponding to a generic formal object are always considered to be nonstatic inside the generic; see Generic formal objects in the Wikibook for details and consequences.
  • all instances of a generic being exactly the same, it is easier to review and understand programs written by others; there are no "special cases" to take into account.
  • all instantiations being explicit, there are no hidden instantiations that might make it difficult to understand the program.
  • Ada does not permit "template metaprogramming", because it does not allow specialisations.

Templates in C++

Templates are of great utility to programmers in C++, especially when combined with multiple inheritance and operator overloading. The C++ Standard Template Library (STL) provides many useful functions within a framework of connected templates.

As the templates in C++ are very expressive they may be used for things other than generic programming. One such use is called template metaprogramming, which is a way of pre-evaluating some of the code at compile-time rather than run-time. Further discussion here only relates to templates as a method of generic programming.

Technical overview

There are two kinds of templates: function templates and class templates. A function template behaves like an ordinary function, but that can accept arguments of many different possibly unrelated types. For example, the C++ Standard Template Library contains the function template max(x, y) which returns either x or y, whichever is larger. max() could be defined like this:

template <typename T>
T max(T x, T y)
{
if (x < y)
return y;
else
return x;
}

This template can be called just like a function:

cout << max(3, 7); // outputs 7

The compiler determines by examining the arguments that this is a call to max(int, int) and instantiates a version of the function where the type T is int.

This works whether the arguments x and y are integers, strings, or any other type for which it makes sense to say "x < y", or more specifically, for any type that has an operator< specified. There does not need to be any common inheritance for the set of types that can be used, and so it is actually a form of static duck typing. If a program defines a custom data type, all it needs to do is to use operator overloading to define the meaning of < for that type, thus allowing it to be used by the max() function. While this may seem a minor benefit in this isolated example, in the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining < allows a type to be used with the standard sort(), stable_sort(), and binary_search() algorithms; or put inside data structures such as sets, heaps, and associative arrays; and more.

C++ templates are completely type safe at compile time. As a demonstration, the standard type complex does not define the < operator, because there is no strict order on complex numbers. Therefore max(x, y) will fail with a compile error if x and y are complex values. Likewise, other templates that rely on < cannot be applied to complex data. Unfortunately, compilers historically generate somewhat esoteric and unhelpful error messages for this sort of error. Ensuring that a certain object adheres to a method protocol can alleviate this issue.

The second kind of template, a class template extends the same concept to classes. Class templates are often used to make generic containers. For example, the STL has a linked list container. To make a linked list of integers, one writes list<int>. A list of strings is denoted list<string>. A list has a set of standard functions associated with it, which work no matter what you put between the brackets.

Template specialization

A powerful feature of C++'s templates is template specialization. This allows alternative implementations to be provided based upon certain characteristics of the parameterized type that is being instantiated. Template specialization has two purposes, to allow certain forms of optimization, and to help reduce code bloat.

For example, consider a sort() template function. One of the primary activities that such a function does is to swap or exchange the values in two of the container's positions. If the values are large (in terms of the number of bytes it takes to store each of them), then it is often quicker to first build a separate list of pointers to the objects, sort those pointers, and then build the final sorted sequence. If the values are quite small though it is usually fastest to just swap the values in-place as needed. Furthermore if the parameterized type is already of some pointer-type, then there is no need to build a separate pointer array. Template specialization allows the template creator to write several different implementations and to specify the characteristics that the parameterized type(s) must have for each implementation to be used.

Advantages and disadvantages

Some uses of templates, such as the max() function, were previously filled by function-like preprocessor macros (a legacy of the C programming language). For example, here is a possible max() macro:

#define max(a,b) ((a) < (b) ? (b) : (a))

Both macros and templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.

However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe. Templates avoid some of the common errors found in code that makes heavy use of function-like macros. Perhaps most importantly, templates were designed to be applicable to much larger problems than macros.

There are three primary drawbacks to the use of templates: compiler support, poor error messages, and code bloat. Many compilers historically have very poor support for templates, so the use of templates can make code somewhat less portable. Support may also be poor when a C++ compiler is being used with a linker which is not C++-aware, or when attempting to use templates across shared library boundaries. Most modern compilers though now have fairly robust and standard template support, and the new C++ standard, C++0x, is expected to further addressing these issues.

Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates. This can make templates difficult to develop.

Finally, the use of a templates may cause the compiler to generate extra code (an instantiation of the template), so the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables. However, judicious use of template specialization can dramatically reduce such code bloat in some cases. The extra instantiations generated by templates can also cause debuggers to have difficulty working gracefully with templates. For example, setting a debug breakpoint within a template from a source file may either miss setting the breakpoint in the actual instantiation desired or may set a breakpoint in every place the template is instantiated.

Templates in D

The D programming language supports templates that are evolved from those in C++. Most C++ template idioms will carry over to D without alteration, but D adds some functionality that streamlines some common cases.

The most obvious differences are some syntax changes. D uses parenthesis ( ) instead of angle-brackets < > in a template definition. It also uses the !( ) construct (that is, parentheses preceded by an exclamation point) instead of angle-brackets in template instantiation. Therefore, a!(b) in D is the equivalent of a<b> in C++. These changes were made in the interest of making templates easier to parse, as using angle-brackets can lead to some ambiguous syntax.

Static-if

D provides a static if construct that checks conditions at compile-time. This is vaguely analogous to C++'s #if and #endif preprocessor macros. The major difference is that static if has access to all compile-time values, including template arguments. Therefore, many situations which require template specialization in C++ may be written inline in D. Recursive templates in D can look almost identical to their runtime equivalents. This is shown in the classic compile-time factorial template.

template factorial(int n) {
static if (n == 1)
const int factorial = 1;
else
const int factorial = n * factorial!(n-1);
}

Alias parameters

Templates in D may also accept alias parameters. Alias parameters are similar to C++'s typedef but can also substitute templates parameters. This is a superset of the functionality of template template arguments in C++, and will be added in the forthcoming C++0x specification. Alias parameters may be templates, functions, types, or any other compile-time symbol. This allows a coder to, for example, "inject" a function into the middle of a template function.

template wrapper(alias Fn) {
// Wraps a D function with an "extern(C)" interface.
extern(C) void wrapper() {
Fn();
}
}

This sort of template might be useful when interfacing D code with a C API. If a hypothetical C API wants a function pointer, you might use the template like this:

void foo() {
// ...
}
some_c_function(&wrapper!(foo));

Generics in Java

Generics were added to the Java programming language in 2004 as part of J2SE 5.0. Unlike C++ templates, generic Java code generates only one compiled version of a generic class. Generic Java classes can only use object types as type parameters — primitive types are not allowed. Thus a List<Integer> is legal, while a List<int> is not.

In Java, generics are checked at compile time for type correctness. The generic type information is then removed via a process called type erasure, generic type information is retained only for superclass. For example, List<Integer> will be converted into a non-generic (raw) List, which can contain arbitrary objects. However, due to the compile-time check, the resulting code is guaranteed to be type correct, as long the code generated no unchecked compiler warnings.

One side-effect of this process is that the generic type information is typically not known at runtime. Thus, at runtime, a List<Integer> and a List<String> refer to the same class, List. One way to mitigate this side effect is through the use of Java's Collections.checkedSet() method, which will decorate the declared Collection and check for improper use (i.e. insertion of an inappropriate type) of a typed Collection at runtime. This can be useful in situations when legacy code is interoperating with code that makes use of generics.

Like C++ and C#, Java allows nested generic types. Thus List<Map<Integer, String>> is a valid type.

Wildcards

Generic type parameters in Java are not limited to specific classes. Java allows the use of wildcards to specify bounds on what type of parameters a given generic object may have. For example, List<?> indicates a list which has an unknown object type. Methods which take such a list as an argument could take any type of list. Reading from the list will return objects of type Object, and writing non-null elements to the list is not allowed, since the parameter type is not known.

To specify the upper bound of a generic element, the extends keyword is used, which indicates that the generic type is a subclass (either extends the class, or implements the interface) of the bounding class. So List<? extends Number> means that the given list contains objects which extend the Number class; for example, the list could be List<Float> or List<Number>. Thus, reading an element from the list will return a Number, while writing non-null elements is once again not allowed, since it is not known what type of element the list holds.

To specify the lower bound of a generic element, the super keyword is used, which indicates that the generic type is a superclass of the bounding class. So List<? super Number> could be List<Number> or List<Object>. Reading from the list returns objects of type Object, while any element of type Number can be added to the list, since it is guaranteed to be a valid type to store in the list.

Limitations

A limitation of the Java implementation of generics makes it impossible to create an array of a generic type, since there is no way to determine what the array type should be. Thus if a method had a type parameter T, the programmer cannot create a new array of that type, such as via new Tsize;. (It is possible to work around this limitation using Java's reflection mechanisms: if an instance of class T is available, one can obtain from that object the Class object corresponding to T and use java.lang.reflect.Array.newInstance to create the array.) Another limitation of the Java implementation of generics is that it is impossible to create an array of a generic class with a type parameter type other than <?>. This is due to the way arrays are handled in the language, and is necessary to assure that all code that doesn't cause compile warnings without using explicit casts is guaranteed to be type safe.

Generic programming in Haskell

The Haskell programming language has parameterized types, parametric polymorphism and type classes, which together support a style of programming somewhat similar to what is possible with Java generics or common usage idioms of C++ templates. Use of these constructs in Haskell programs is ubiquitous and even difficult to avoid. Haskell also has some more unusual generic programming features, which have inspired programmers and language designers to strive for even more genericity and code re-use than polymorphism can provide.

Six of the predefined type classes in Haskell (including Eq, the types that can be compared for equality, and Show, the types whose values can be rendered as strings) have the special property of supporting derived instances. This means that a programmer defining a new type can state that this type is to be an instance of one of these special type classes, without providing implementations of the class methods as is usually necessary when declaring class instances. All the necessary methods will be "derived" -- that is, constructed automatically -- based on the structure of the type. For instance, the following declaration of a type of binary trees states that it is to be an instance of the classes Eq and Show:

data BinTree a = Leaf a | Node (BinTree a) a (Bintree a)
deriving (Eq, Show)

This results in an equality function (==) and a string representation function (show) being automatically defined for any type of the form BinTree T provided that T itself supports those operations.

The support for derived instances of Eq and Show makes their methods == and show generic in a qualitatively different way from parametrically polymorphic functions: these "functions" (more accurately, type-indexed families of functions) can be applied to values of many different types, and although they behave differently for every argument type, very little work is needed to add support for a new type. Ralf Hinze (2004) has shown that a similar effect can be achieved for user-defined type classes by certain programming techniques. Many other researchers have proposed approaches to this and other kinds of genericity in the context of Haskell and extensions to Haskell (discussed below).

Generic programming in C# and .NET

Generics in C# (and other .NET languages) were added as part of .NET Framework 2.0 in November 2005. Although similar to Java, .NET generics are implemented in the runtime and not as a conversion from generic types to non-generic types in the compiler. This design choice is leveraged to provide additional features. [2]

Features of .NET generics:

  • Because there is no type erasure and because generics are built into the CLR — and not solely on the compiler) — there is no performance hit from casting and performing dynamic checks. In addition, the programmer is able to access generic information via reflection. [3] [4]
    • The lack of type erasure allows the creation of an array of a generic type, which is not possible in Java.
  • .NET allows primitives as generic type parameters because primitives are also objects in .NET.
  • Like Java, .NET allows generic type parameters to be generic types themselves. That is, code like List<List<Dictionary<int, int>>> x = new List<List<Dictionary<int, int>>>(); works.
  • C# (and .NET in general) allows a wide range of generic type constraints using the where keyword [5] including restricting generic types to be value types, to be classes, to have constructors, and to inherit from interfaces; type constraints allow C# generics to become more latent like C++ templates. Type constraints allow code like person.GetFirstName [6]:
interface IPerson {
string GetFirstName();
string GetLastName();
}
class Speaker {
public void speakTo<T>(T person) where T : IPerson {
string name = person.GetFirstName();
this.say("Hello, " + name);
}
}

Without type constraints on T, person could only have the members of the base object type. Java supports a similar mechanism via using the extends keyword in generic type signatures. [7]

Generic programming features in other languages

Many functional programming languages support small-scale generic programming in the form of parameterized types and parametric polymorphism. In addition, Standard ML and OCaml provide functors, which are similar to class templates and to Ada's generic packages.

Recent and experimental developments

As of 2006, the development of language constructs and programming techniques to support a greater degrees of genericity and code reuse, particularly in functional programming languages, is an active research area. Many of the recent contributions are presented as extensions to Haskell and rely to some extent on that language's type class mechanism.

PolyP

PolyP was the first generic programming language extension to Haskell. In PolyP, generic functions are called polytypic. The language introduces a special construct in which such polytypic functions can be defined via structural induction over the structure of the pattern functor of a regular datatype. Regular datatypes in PolyP are a subset of Haskell datatypes. A regular datatype t must be of kind * → *, and if a is the formal type argument in the definition, then all recursive calls to t must have the form t a. These restrictions rule out higher kinded datatypes as well as nested datatypes, where the recursive calls are of a different form. The flatten function in PolyP is here provided as an example:

   flatten :: Regular d => d a -> [a]
   flatten = cata fl
   
   polytypic fl :: f a [a] -> [a]
     case f of
       g+h -> either fl fl
       g*h -> \(x,y) -> fl x ++ fl y
       () -> \x -> []
       Par -> \x -> [x]
       Rec -> \x -> x
       d@g -> concat . flatten . pmap fl
       Con t -> \x -> []
   
   cata :: Regular d => (FunctorOf d a b -> b) -> d a -> b

Generic Haskell

Generic Haskell is another extension to Haskell, developed at Utrecht University. The extensions it provides are:

  • Type-indexed values are defined as a value indexed over the various Haskell type constructors (unit, primitive types, sums, products, and user-defined type constructors). In addition, we can also specify the behaviour of a type-indexed values for a specific constructor using constructor cases, and reuse one generic definition in another using default cases.

The resulting type-indexed value can be specialised to any type.

  • Kind-indexed types are types indexed over kinds, defined by giving a case for both * and k → k'. Instances are obtained by applying the kind-indexed type to a kind.
  • Generic definitions can be used by applying them to a type or kind. This is called generic application. The result is a type or value, depending on which sort of generic definition is applied.
  • Generic abstraction enables generic definitions be defined by abstracting a type parameter (of a given kind).
  • Type-indexed types are types which are indexed over the type constructors. These can be used to give types to more involved generic values. The resulting type-indexed types can be specialised to any type.

As an example, the equality function in Generic Haskell:

   type Eq {[ * ]} t1 t2 = t1 -> t2 -> Bool
   type Eq {[ k -> l ]} t1 t2 = forall u1 u2. Eq {[ k ]} u1 u2 -> Eq {[ l ]} (t1 u1) (t2 u2)
   
   eq {| t :: k |} :: Eq {[ k ]} t t
   eq {| Unit |} _ _ = True
   eq {| :+: |} eqA eqB (Inl a1) (Inl a2) = eqA a1 a2
   eq {| :+: |} eqA eqB (Inr b1) (Inr b2) = eqB b1 b2
   eq {| :+: |} eqA eqB _ _ = False
   eq {| :*: |} eqA eqB (a1 :*: b1) (a2 :*: b2) = eqA a1 a2 && eqB b1 b2
   eq {| Int |} = (==)
   eq {| Char |} = (==)
   eq {| Bool |} = (==)

The "Scrap your boilerplate" approach

The Scrap your boilerplate approach is a lightweight generic programming approach for Haskell (Lämmel and Peyton Jones, 2003). The approach is supported in the GHC >= 6.0 implementation of Haskell. Using this approach, the programmer can write generic functions such as traversal schemes (e.g., everywhere and everything), as well as generic read, generic show and generic equality (i.e., gread, gshow, and geq). This approach is based on just a few primitives for type-safe cast and processing constructor applications.

See also