Generic programming
Generic programming is a datatype-independent way of computer programming. This means that the same source code will be used regardless of the datatype the code will be instantiated with or passed as parameters[1]. Generic programming is implemented and supported differently within different languages. Generic programming facilities first appeared in the 1970s in such languages as CLU and Ada, and were subsequently adopted by many object-based and object-oriented languages, including BETA, C++, D, Eiffel, Java, and DEC's now defunct Trellis-Owl language.
As an example of the benefits of generic programming, when creating container of objects it is common to write specific implementations for each datatype contained, even if the code is virtually identical except for different datatypes. Instead, a possible declaration using generic programming could be to define a template class (using the C++ idiom):
template<typename T> class List { /* class contents */ };
List<Animal> list_of_animals; List<Cars> list_of_cars;
Above T represents the type to be instantiated. The list generated is then treated as a list of whichever type is specified. These "containers-of-type-T", commonly called generics, are a generic programming technique that allows the defining of a class that takes and contains different datatypes (not to be confused with polymorphism, which is the algorithmic usage of exchangable sub-classes) as long as certain contracts such as subtypes and signature are kept. Though the example above is the most common use of generic programming, and some languages implement only this aspect of it, generic programming as a concept is not limited to generics. Another applicaton is type-independent functions as in the Swap
example below:
template<typename T> void Swap(T & a, T & b) //"&" passes parameters by reference { T temp = b; b = a; a = temp; }
string hello = "world!", world = "Hello, "; Swap( world, hello ); cout << hello << world << endl; //Output is "Hello, world!"
The template
construct of C++ used in the examples above is widely cited as the generic programming construct that popularized the notion among programmers and language designers and provides full support for all generic programming idioms. D also offers fully generic-capable templated based on the C++ precedent but with a simplified syntax. Java has provided generic programming facilities syntactically based on C++'s since the introduction of J2SE 5.0 and implements the generics, or "containers-of-type-T", subset of generic programming.
C# 2.0, Chrome 1.5 and Visual Basic .NET 2005 have constructs that take advantage of the support for generics present in the Microsoft .NET Framework since version 2.0. The ML family of programming languages encourage generic programming through parametric polymorphism and generic modules called functors. The type class mechanism of Haskell supports generic programming.
Dynamic typing, such as is featured in Objective-C, and, if necessary, judicious use of protocols circumvent the need for use of generic programming techniques, since there exists a general type to contain any object. Whilst Java does so also, the casting that needs to be done breaks the discipline of static typing, and generics are one way of achieving some of the benefits of dynamic typing with the advantages of having static typing.
Generics in Ada
Ada has had generics since it was first designed in 1977-1980. The standard library uses generics to provide many services. Ada 2005 adds a comprehensive generic container library to the standard library, which was inspired by C++'s standard template library.
A generic unit is a package or a subprogram that takes one or more generic formal parameters.
A generic formal parameter is a value, a variable, a constant, a type, a subprogram, or even an instance of another, designated, generic unit. For generic formal types, the syntax distinguishes between dicscrete, floating-point, fixed-point, access (pointer) types, etc. Some formal parameters can have default values.
To instantiate a generic unit, the programmer passes actual parameters for each formal. The generic instance then behaves just like any other unit. It is possible to instantiate generic units at run-time, for example inside a loop.
Ada example
The specification of a generic package:
generic Max_Size : Natural; -- a generic formal value type Element_Type is private; -- a generic formal type; accepts any nonlimited type package Stacks is type Size_Type is range 0 .. Max_Size; type Stack is limited private; procedure Create (S : out Stack; Initial_Size : in Size_Type := Max_Size); procedure Push (Into : in out Stack; Element : in Element_Type); procedure Pop (From : in out Stack; Element : out Element_Type); Overflow : exception; Underflow : exception; private subtype Index_Type is Size_Type range 1 .. Max_Size; type Vector is array (Index_Type range <>) of Element_Type; type Stack (Allocated_Size : Size_Type := 0) is record Top : Index_Type; Storage : Vector (1 .. Allocated_Size); end record; end Stacks;
Instantiating the generic package:
type Bookmark_Type is new Natural; -- records a location in the text document we are editing
package Bookmark_Stacks is new Stacks (Max_Size => 20, Element_Type => Bookmark_Type); -- Allows the user to jump between recorded locations in a document
Using an instance of a generic package:
type Document_Type is record Contents : Ada.Strings.Unbounded.Unbounded_String; Bookmarks : Bookmark_Stacks.Stack; end record;
procedure Edit (Document_Name : in String) is Document : Document_Type; begin -- Initialise the stack of bookmarks: Bookmark_Stacks.Create (S => Document.Bookmarks, Initial_Size => 10); -- Now, open the file Document_Name and read it in... end Edit;
Advantages and limitations
The language syntax allows very precise specification of constraints on generic formal parameters. For example, it is possible to specify that a generic formal type will only accept a modular type as the actual. It is also possible to express constraints between generic formal parameters; for example:
generic type Index_Type is (<>); -- must be a discrete type type Element_Type is private; -- can be any nonlimited type type Array_Type is array (Index_Type range <>) of Element_Type;
In this example, Array_Type is constrained by both Index_Type and Element_Type. When instantiating the unit, the programmer must pass an actual array type that satisfies these constraints.
The disadvantage of this fine-grained control is a complicated syntax, but, because all generic formal parameters are completely defined in the specification, the compiler can instantiate generics without looking at the body of the generic.
Unlike C++, Ada does not allow specialised generic instances, and requires that all generics be instantiated explicitly. These rules have several consequences:
- the compiler can implement shared generics: the object code for a generic unit can be shared between all instances (unless the programmer requests inlining of subprograms, of course). As further consequences:
- there is no possibility of code bloat (code bloat is common in C++ and requires special care, as explained below).
- it is possible to instantiate generics at run time, as well as at compile time, since no new object code is required for a new instance.
- actual objects corresponding to a generic formal object are always considered to be nonstatic inside the generic; see Generic formal objects in the Wikibook for details and consequences.
- all instances of a generic being exactly the same, it is easier to review and understand programs written by others; there are no "special cases" to take into account.
- all instantiations being explicit, there are no hidden instantiations that might make it difficult to understand the program.
- Ada does not permit "template metaprogramming", because it does not allow specialisations.
Templates in C++
Templates are of great utility to programmers in C++, especially when combined with multiple inheritance and operator overloading. The C++ Standard Template Library (STL) provides many useful functions within a framework of connected templates.
As the templates in C++ are very expressive they may be used for things other than generic programming. One such use is called template metaprogramming, which is a way of pre-evaluating some of the code at compile-time rather than run-time. Further discussion here only relates to templates as a method of generic programming.
Technical overview
There are two kinds of templates: function templates and class templates. A function template behaves like an ordinary function, but that can accept arguments of many different possibly unrelated types. For example, the C++ Standard Template Library contains the function template max(x, y)
which returns either x or y, whichever is larger. max()
could be defined like this:
template <typename T> T max(T x, T y) { if (x < y) return y; else return x; }
This template can be called just like a function:
cout << max(3, 7); // outputs 7
The compiler determines by examining the arguments that this is a call to max(int, int)
and instantiates a version of the function where the type T
is int
.
This works whether the arguments x
and y
are integers, strings, or any other type for which it makes sense to say "x < y
", or more specifically, for any type that has an operator<
specified. There does not need to be any common inheritance for the set of types that can be used, and so it is actually a form of static duck typing. If a program defines a custom data type, all it needs to do is to use operator overloading to define the meaning of <
for that type, thus allowing it to be used by the max()
function. While this may seem a minor benefit in this isolated example, in the context of a comprehensive library like the STL it allows the programmer to get extensive functionality for a new data type, just by defining a few operators for it. Merely defining <
allows a type to be used with the standard sort()
, stable_sort()
, and binary_search()
algorithms; or put inside data structures such as set
s, heaps, and associative arrays; and more.
C++ templates are completely type safe at compile time. As a demonstration, the standard type complex
does not define the <
operator, because there is no strict order on complex numbers. Therefore max(x, y)
will fail with a compile error if x and y are complex
values. Likewise, other templates that rely on <
cannot be applied to complex
data. Unfortunately, compilers historically generate somewhat esoteric and unhelpful error messages for this sort of error. Ensuring that a certain object adheres to a method protocol can alleviate this issue.
The second kind of template, a class template extends the same concept to classes. Class templates are often used to make generic containers. For example, the STL has a linked list container. To make a linked list of integers, one writes list<int>
. A list of strings is denoted list<string>
. A list
has a set of standard functions associated with it, which work no matter what you put between the brackets.
Template specialization
A powerful feature of C++'s templates is template specialization. This allows alternative implementations to be provided based upon certain characteristics of the parameterized type that is being instantiated. Template specialization has two purposes, to allow certain forms of optimization, and to help reduce code bloat.
For example, consider a sort()
template function. One of the primary activities that such a function does is to swap or exchange the values in two of the container's positions. If the values are large (in terms of the number of bytes it takes to store each of them), then it is often quicker to first build a separate list of pointers to the objects, sort those pointers, and then build the final sorted sequence. If the values are quite small though it is usually fastest to just swap the values in-place as needed. Furthermore if the parameterized type is already of some pointer-type, then there is no need to build a separate pointer array. Template specialization allows the template creator to write several different implementations and to specify the characteristics that the parameterized type(s) must have for each implementation to be used.
Advantages and disadvantages
Some uses of templates, such as the max()
function, were previously filled by function-like preprocessor macros (a legacy of the C programming language). For example, here is a possible max()
macro:
#define max(a,b) ((a) < (b) ? (b) : (a))
Both macros and templates are expanded at compile time. Macros are always expanded inline; templates can also be expanded as inline functions when the compiler deems it appropriate. Thus both function-like macros and function templates have no run-time overhead.
However, templates are generally considered an improvement over macros for these purposes. Templates are type-safe. Templates avoid some of the common errors found in code that makes heavy use of function-like macros. Perhaps most importantly, templates were designed to be applicable to much larger problems than macros.
There are three primary drawbacks to the use of templates: compiler support, poor error messages, and code bloat. Many compilers historically have very poor support for templates, so the use of templates can make code somewhat less portable. Support may also be poor when a C++ compiler is being used with a linker which is not C++-aware, or when attempting to use templates across shared library boundaries. Most modern compilers though now have fairly robust and standard template support, and the new C++ standard, C++0x, is expected to further addressing these issues.
Almost all compilers produce confusing, long, or sometimes unhelpful error messages when errors are detected in code that uses templates. This can make templates difficult to develop.
Finally, the use of a templates may cause the compiler to generate extra code (an instantiation of the template), so the indiscriminate use of templates can lead to code bloat, resulting in excessively large executables. However, judicious use of template specialization can dramatically reduce such code bloat in some cases. The extra instantiations generated by templates can also cause debuggers to have difficulty working gracefully with templates. For example, setting a debug breakpoint within a template from a source file may either miss setting the breakpoint in the actual instantiation desired or may set a breakpoint in every place the template is instantiated.
Templates in D
The D programming language supports templates that are evolved from those in C++. Most C++ template idioms will carry over to D without alteration, but D adds some functionality that streamlines some common cases.
The most obvious differences are some syntax changes. D uses parenthesis ( ) instead of angle-brackets < > in a template definition. It also uses the !( ) construct (that is, parentheses preceded by an exclamation point) instead of angle-brackets in template instantiation. Therefore, a!(b)
in D is the equivalent of a<b>
in C++. These changes were made in the interest of making templates easier to parse, as using angle-brackets can lead to some ambiguous syntax.
Static-if
D provides a static if
construct that checks conditions at compile-time. This is vaguely analogous to C++'s #if
and #endif
preprocessor macros. The major difference is that static if
has access to all compile-time values, including template arguments. Therefore, many situations which require template specialization in C++ may be written inline in D. Recursive templates in D can look almost identical to their runtime equivalents. This is shown in the classic compile-time factorial template.
template factorial(int n) {
static if (n == 1)
const int factorial = 1;
else
const int factorial = n * factorial!(n-1);
}
Alias parameters
Templates in D may also accept alias parameters. Alias parameters are similar to C++'s typedef
but can also substitute templates parameters. This is a superset of the functionality of template template arguments in C++, and will be added in the forthcoming C++0x specification. Alias parameters may be templates, functions, types, or any other compile-time symbol. This allows a coder to, for example, "inject" a function into the middle of a template function.
template wrapper(alias Fn) {
// Wraps a D function with an "extern(C)" interface.
extern(C) void wrapper() {
Fn();
}
}
This sort of template might be useful when interfacing D code with a C API. If a hypothetical C API wants a function pointer, you might use the template like this:
void foo() {
// ...
}
some_c_function(&wrapper!(foo));
Generics in Java
Generics were added to the Java programming language in 2004 as part of J2SE 5.0. Unlike C++ templates, generic Java code generates only one compiled version of a generic class. Generic Java classes can only use object types as type parameters — primitive types are not allowed. Thus a List<Integer>
is legal, while a List<int>
is not.
In Java, generics are checked at compile time for type correctness. The generic type information is then removed via a process called type erasure, generic type information is retained only for superclass. For example, List<Integer>
will be converted into a non-generic (raw) List
, which can contain arbitrary objects. However, due to the compile-time check, the resulting code is guaranteed to be type correct, as long the code generated no unchecked compiler warnings.
One side-effect of this process is that the generic type information is typically not known at runtime. Thus, at runtime, a List<Integer>
and a List<String>
refer to the same class, List
. One way to mitigate this side effect is through the use of Java's Collections.checkedSet()
method, which will decorate the declared Collection and check for improper use (i.e. insertion of an inappropriate type) of a typed Collection at runtime. This can be useful in situations when legacy code is interoperating with code that makes use of generics.
Like C++ and C#, Java allows nested generic types. Thus List<Map<Integer, String>>
is a valid type.
Wildcards
Generic type parameters in Java are not limited to specific classes. Java allows the use of wildcards to specify bounds on what type of parameters a given generic object may have. For example, List<?>
indicates a list which has an unknown object type. Methods which take such a list as an argument could take any type of list. Reading from the list will return objects of type Object
, and writing non-null elements to the list is not allowed, since the parameter type is not known.
To specify the upper bound of a generic element, the extends
keyword is used, which indicates that the generic type is a subclass (either extends the class, or implements the interface) of the bounding class. So List<? extends Number>
means that the given list contains objects which extend the Number
class; for example, the list could be List<Float>
or List<Number>
. Thus, reading an element from the list will return a Number, while writing non-null elements is once again not allowed, since it is not known what type of element the list holds.
To specify the lower bound of a generic element, the super
keyword is used, which indicates that the generic type is a superclass of the bounding class. So List<? super Number>
could be List<Number>
or List<Object>
. Reading from the list returns objects of type Object
, while any element of type Number
can be added to the list, since it is guaranteed to be a valid type to store in the list.
Limitations
A limitation of the Java implementation of generics makes it impossible to create an array of a generic type, since there is no way to determine what the array type should be. Thus if a method had a type parameter T, the programmer cannot create a new array of that type, such as via new Tsize;
. (It is possible to work around this limitation using Java's reflection mechanisms: if an instance of class T
is available, one can obtain from that object the Class
object corresponding to T
and use java.lang.reflect.Array.newInstance
to create the array.) Another limitation of the Java implementation of generics is that it is impossible to create an array of a generic class with a type parameter type other than <?>
. This is due to the way arrays are handled in the language, and is necessary to assure that all code that doesn't cause compile warnings without using explicit casts is guaranteed to be type safe.
Generic programming in Haskell
The Haskell programming language has parameterized types, parametric polymorphism and type classes, which together support a style of programming somewhat similar to what is possible with Java generics or common usage idioms of C++ templates. Use of these constructs in Haskell programs is ubiquitous and even difficult to avoid. Haskell also has some more unusual generic programming features, which have inspired programmers and language designers to strive for even more genericity and code re-use than polymorphism can provide.
Six of the predefined type classes in Haskell (including Eq
, the types that can be compared for equality, and Show
, the types whose values can be rendered as strings) have the special property of supporting derived instances. This means that a programmer defining a new type can state that this type is to be an instance of one of these special type classes, without providing implementations of the class methods as is usually necessary when declaring class instances. All the necessary methods will be "derived" -- that is, constructed automatically -- based on the structure of the type.
For instance, the following declaration of a type of binary trees states that it is to be an instance of the classes Eq
and Show
:
data BinTree a = Leaf a | Node (BinTree a) a (Bintree a) deriving (Eq, Show)
This results in an equality function (==
) and a string representation function (show
) being automatically defined for any type of the form BinTree T
provided that T
itself supports those operations.
The support for derived instances of Eq
and Show
makes their methods ==
and show
generic in a qualitatively different way from parametrically polymorphic functions: these "functions" (more accurately, type-indexed families of functions) can be applied to values of many different types, and although they behave differently for every argument type, very little work is needed to add support for a new type. Ralf Hinze (2004) has shown that a similar effect can be achieved for user-defined type classes by certain programming techniques. Many other researchers have proposed approaches to this and other kinds of genericity in the context of Haskell and extensions to Haskell (discussed below).
Generic programming in C# and .NET
![]() | This article may require copy editing for grammar, style, cohesion, tone, or spelling. |
Generics in C# (and other .NET languages) were added as part of .NET Framework 2.0 in November 2005. Although similar to Java, .NET generics are implemented in the runtime and not as a conversion from generic types to non-generic types in the compiler. This design choice is leveraged to provide additional features. [2]
Features of .NET generics:
- Because there is no type erasure and because generics are built into the CLR — and not solely on the compiler) — there is no performance hit from casting and performing dynamic checks. In addition, the programmer is able to access generic information via reflection. [3] [4]
- The lack of type erasure allows the creation of an array of a generic type, which is not possible in Java.
- .NET allows primitives as generic type parameters because primitives are also objects in .NET.
- Like Java, .NET allows generic type parameters to be generic types themselves. That is, code like
List<List<Dictionary<int, int>>> x = new List<List<Dictionary<int, int>>>();
works. - C# (and .NET in general) allows a wide range of generic type constraints using the
where
keyword [5] including restricting generic types to be value types, to be classes, to have constructors, and to inherit from interfaces; type constraints allow C# generics to become more latent like C++ templates. Type constraints allow code likeperson.GetFirstName
[6]:
interface IPerson { string GetFirstName(); string GetLastName(); }
class Speaker { public void speakTo<T>(T person) where T : IPerson { string name = person.GetFirstName(); this.say("Hello, " + name); } }
Without type constraints on T
, person
could only have the members of the base object type. Java supports a similar mechanism via using the extends
keyword in generic type signatures. [7]
Generic programming features in other languages
Many functional programming languages support small-scale generic programming in the form of parameterized types and parametric polymorphism. In addition, Standard ML and OCaml provide functors, which are similar to class templates and to Ada's generic packages.
Recent and experimental developments
![]() | This article contains promotional content. |
As of 2006, the development of language constructs and programming techniques to support a greater degrees of genericity and code reuse, particularly in functional programming languages, is an active research area. Many of the recent contributions are presented as extensions to Haskell and rely to some extent on that language's type class mechanism.
PolyP
PolyP was the first generic programming language extension to Haskell. In PolyP, generic functions are called polytypic. The language introduces a special construct in which such polytypic functions can be defined via structural induction over the structure of the pattern functor of a regular datatype. Regular datatypes in PolyP are a subset of Haskell datatypes. A regular datatype t must be of kind * → *, and if a is the formal type argument in the definition, then all recursive calls to t must have the form t a. These restrictions rule out higher kinded datatypes as well as nested datatypes, where the recursive calls are of a different form. The flatten function in PolyP is here provided as an example:
flatten :: Regular d => d a -> [a] flatten = cata fl polytypic fl :: f a [a] -> [a] case f of g+h -> either fl fl g*h -> \(x,y) -> fl x ++ fl y () -> \x -> [] Par -> \x -> [x] Rec -> \x -> x d@g -> concat . flatten . pmap fl Con t -> \x -> [] cata :: Regular d => (FunctorOf d a b -> b) -> d a -> b
Generic Haskell
Generic Haskell is another extension to Haskell, developed at Utrecht University. The extensions it provides are:
- Type-indexed values are defined as a value indexed over the various Haskell type constructors (unit, primitive types, sums, products, and user-defined type constructors). In addition, we can also specify the behaviour of a type-indexed values for a specific constructor using constructor cases, and reuse one generic definition in another using default cases.
The resulting type-indexed value can be specialised to any type.
- Kind-indexed types are types indexed over kinds, defined by giving a case for both * and k → k'. Instances are obtained by applying the kind-indexed type to a kind.
- Generic definitions can be used by applying them to a type or kind. This is called generic application. The result is a type or value, depending on which sort of generic definition is applied.
- Generic abstraction enables generic definitions be defined by abstracting a type parameter (of a given kind).
- Type-indexed types are types which are indexed over the type constructors. These can be used to give types to more involved generic values. The resulting type-indexed types can be specialised to any type.
As an example, the equality function in Generic Haskell:
type Eq {[ * ]} t1 t2 = t1 -> t2 -> Bool type Eq {[ k -> l ]} t1 t2 = forall u1 u2. Eq {[ k ]} u1 u2 -> Eq {[ l ]} (t1 u1) (t2 u2) eq {| t :: k |} :: Eq {[ k ]} t t eq {| Unit |} _ _ = True eq {| :+: |} eqA eqB (Inl a1) (Inl a2) = eqA a1 a2 eq {| :+: |} eqA eqB (Inr b1) (Inr b2) = eqB b1 b2 eq {| :+: |} eqA eqB _ _ = False eq {| :*: |} eqA eqB (a1 :*: b1) (a2 :*: b2) = eqA a1 a2 && eqB b1 b2 eq {| Int |} = (==) eq {| Char |} = (==) eq {| Bool |} = (==)
The "Scrap your boilerplate" approach
The Scrap your boilerplate approach is a lightweight generic programming approach for Haskell (Lämmel and Peyton Jones, 2003). The approach is supported in the GHC >= 6.0 implementation of Haskell. Using this approach, the programmer can write generic functions such as traversal schemes (e.g., everywhere
and everything
), as well as generic read, generic show and generic equality (i.e., gread
, gshow
, and geq
). This approach is based on just a few primitives for type-safe cast and processing constructor applications.
See also
External references and links
- Dæv Clarke, Johan Jeuring and Andres Löh, The Generic Haskell user's guide
- Gabriel Dos Reis and Jaakko Järvi (LCSD 2005), What is Generic Programming?
- Ralf Hinze. "Generics for the Masses." In Proceedings of the ACM SIGPLAN International Conference on Functional Programming (ICFP), 2004.
- Ralf Lämmel and Simon Peyton Jones. "Scrap Your Boilerplate: A Practical Design Pattern for Generic Programming." In Proceedings of the ACM SIGPLAN International Workshop on Types in Language Design and Implementation (TLDI'03), 2003. Also see the web page devoted to this research.
- Andres Löh. Exploring Generic Haskell. Ph.D. thesis, Utrecht University, 2004. ISBN 90-393-3765-9
- Bertrand Meyer. Genericity vs Inheritance. In OOPSLA (First ACM Conference on Object-Oriented Programming Systems, Languages and Applications), Portland (Oregon), September 29 - October 2, 1986, pages 391-405.
- Simon Peyton Jones, editor. The Haskell 98 Language Report. Revised 2002.
- Peter Sestoft. Java Precisely, Second Edition. MIT Press, 2005. ISBN 0-262-69325-9
- David Vandevoorde, C++ Templates: The Complete Guide, 2003 Addison-Wesley. ISBN 0-201-73484-2
- Papers by Alexander A. Stepanov (creator of the STL)
- Introducing Generics in the Microsoft CLR (MSDN Magazine)
- More on Generics in the Microsoft CLR (MSDN Magazine)
- Generic Haskell, a language for generic programming
- Generic Programming In Java
- Java Generics Programming Tutorial (PDF)
- Templates Revisited, by Walter Bright.