Jump to content

Programming language

From Wikipedia, the free encyclopedia
This is an old revision of this page, as edited by Pi Delport (talk | contribs) at 16:16, 12 May 2006 (Definitions of programming language: wording around HTML Turing completeness). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.
Computer code (HTML with JavaScript) in a tool that uses syntax highlighting (colors) to help the developer see the purpose of each piece of code.

A programming language is a stylized communication technique intended to be used for controlling the behaviour of a machine (often a computer). Like human languages programming languages have syntactic and semantic rules used to define meaning.

Thousands[1] of different programming languages have been created and new ones are created every year. (see list of programming languages). Few languages ever become sufficiently popular that they are used by more than a few people, but professional programmers are likely to use dozens of different languages during their career.

Definitions of programming language

There is no universally agreed definition for the term programming language. The following is a list of some of the methods that have been used to categorize a language as being a programming language.

  • What it is used for. For instance, a programming language is a language used to write programs.
  • Those involved in the interaction. For instance, a programming language differs from natural languages in that natural languages are used for interaction between people, while programming languages are used for communication from people to machines (this rules out languages used for computer to computer interaction).
  • The constructs it contains. For instance, a programming language contains constructs for defining and manipulating data structures, and for controlling the flow of execution.
  • Its expressive power. The theory of computation provides a classification of languages based on the range of computations expressible by them, with the most expressive language being one that is Turing complete (the language needs to contain at least a goto construct and a method of storing values, ie, variables). Any algorithm that can be implemented in a language that is Turing complete can also be implemented in another language that is Turing complete. Some examples of languages that are not Turing complete are HTML (although it can contain Turing complete languages such as PHP and Javascript) and SQL (SQL vendors invariably add language extensions that create a Turing complete language, e.g., PL/SQL).

Specification

The specification of a language can take several forms, including:

  • a description of the language's syntax and semantics (e.g., C language). The latter usually lists the results of performing an operation and restrictions on the kinds of operations that can be performed.
  • a description of the behavior of a translator (e.g., C++). The syntax and semantics of the language can be inferred from this.
  • the language is defined in terms of an executable specification (e.g., Prolog), often written in that language.

Features of programming language

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

These specifications usually include:

Those languages that are widely used – or have been used for a considerable period of time – have standardization bodies that meet regularly to create and publish formal definitions of the language and discuss the extension of existing definitions.

Type system

A type system defines how a programming language classifies values and variables into types, how it can manipulate those types and how they interact. The design and study of type systems is known as type theory.

Internally, all data in modern digital computers are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank accounts and measurements, so the low-level binary data are organized by programming languages into these high-level concepts as data types. There are also more abstract types whose purpose is just to warn the programmer about semantically meaningless statements.

Languages can be classified with respect to their type systems.

static vs. dynamic typing
This distinction refers to when values have a type associated with them. In a dynamically typed language types are an attribute of data, and thus a variable acquires a type by being assigned a particular value at runtime. In statically typed languages, variables are given a type by the programmer at compile time, and data assigned to the variable must match its type.
strong vs. weak typing
This distinction refers to the behavior of the language when values are operated on. In a strongly typed language, operations must match the types of the values operated on (addition must be performed on numbers, for example). In weakly typed languages operations might be made on values which are inappropriate.

Type inference is a mechanism whereby the type specifications can often be omitted completely, if it is possible for the compiler to infer the types of values from the contexts in which they are used – for example, if a variable is assigned the value 1, a type-inferring compiler does not need to be told explicitly that the variable is an integer. There are however many different uses for integers; it might e.g. make sense in a program to prevent inadvertent adding of a phone number to the number of apples in a box. Therefore some languages such as Ada allow defining different kinds of incompatible integers; this is called strong typing. Type-inferred languages can be more flexible to use, particularly when they also implement parametric polymorphism.

It is possible to perform type inference on programs written in a dynamically typed language, but it is entirely possible to write programs in these languages that make type inference infeasible. Sometimes dynamically typed languages are called latently typed.

Note that strong vs. weak is a continuum; Java is a strongly typed language relative to C, but is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.

Strong and static are orthogonal concepts. See the cross reference table below. But beware that some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed – in the latter usage, C would be called strongly typed, despite the fact that C doesn't catch that many type errors and that it's both trivial and common to defeat its type system (even accidentally).

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often restrict names to denoting run-time computed values directly, instead of naming memory locations where values may be stored, and in some cases refuse to allow the value denoted by a name to be modified at all. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.

Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.

Type system cross reference list

This is a comparison of the features of the type systems and type checking of multiple programming languages.

Brief definitions

  • A nominal type system means that the language decides whether types are compatible and/or equivalent based on explicit declarations and names.
  • A structural type system means that the language decides whether types are compatible and/or equivalent based on the definition and characteristics of the types.
  • Type checking determines whether and when types are verified. Static checking means that type errors are reported based on a program's text (source code). Dynamic checking means that type errors are reported based on a program's dynamic (run-time) behavior.
Language Type safety Type expression Type compatibility and equivalence Type checking
A+ strong dynamic
ActionScript 3.0 strong implicit with optional explicit typing static
ABC strong
ABAP strong nominal static
Ada strong[TS 1] explicit nominal static
Agda strong nominal static
Aldor weak partially implicit[2] static
Alef strong static
ALGOL 58 strong explicit static
ALGOL 60 strong explicit static
ALGOL 68 strong explicit structural static & tagged unions
ALGOL W strong static
Alice strong implicit with optional explicit static
Alma-0 static
AmbientTalk strong dynamic
AMOS BASIC static
AngelScript strong static
APL strong dynamic
AppleScript weak dynamic
Arc dynamic
Assembly ? ? ? ?
AutoHotkey typeless
AutoLISP dynamic
Ateji PX strong explicit nominal static
AWK weak implicit dynamic
B typeless
Ballerina strong structural static
Bash ? ? ? ?
BASIC strong explicit nominal static
BCPL typeless
BeanShell strong nominal dynamic
BLISS typeless
Boo strong implicit with optional explicit typing static with optional dynamic typing
Bro strong implicit with optional explicit typing nominal static
C weak explicit nominal static
C-- weak static
C++ (ISO/IEC 14882) strong explicit with optional implicit typing (by using auto in C++11) nominal static[TS 2]
C* weak explicit static
C# strong [3] implicit with optional explicit typing nominal static[TS 3]
C shell ? ? ? ?
Caml strong implicit with optional explicit typing static
Cecil dynamic with optional static typing
Clean strong implicit static
Ceylon strong static
Chapel implicit with optional explicit typing static
CHILL strong static
ChucK strong
Cilk weak explicit static
Claire strong implicit with optional explicit typing dynamic with optional static typing
Clean strong ?
Clojure strong implicit with optional explicit typing dynamic
CLU strong
COBOL strong explicit nominal static
Cobra strong explicit with optional implicit typing static with optional dynamic typing
CoffeeScript implicit dynamic
ColdFusion (CFML) strong implicit dynamic
COMAL strong
Common Lisp strong implicit with optional explicit typing structural for implicit typing, nominal for explicit typing dynamic, some static checking(depending on implementation)
Component Pascal strong static
Cool strong explicit static
CORAL strong static
Crystal implicit with optional explicit typing[4] structural static
Cuneiform explicit static
Curl strong nominal
Curry strong implicit with optional explicit typing static
Cython strong implicit with optional explicit typing nominal (extension types) and structural (Python) dynamic with optional static typing
D weak[TS 4] explicit nominal static
Dart strong[5] gradual typing nominal static with optional dynamic typing
Dylan strong dynamic
Eiffel strong nominal static
Elixir strong implicit dynamic
Erlang strong implicit dynamic
Euphoria strong explicit, implicit with objects nominal static, dynamic with objects
F# strong implicit nominal static
Forth typeless
Fortran strong explicit[TS 5] nominal static
Gambas strong explicit nominal
GLBasic strong explicit. Non-explicit declarations available through project options nominal static
Gleam strong implicit with optional explicit nominal static
Go[6] strong partially implicit (local type inference) structural static
Gosu strong partially implicit (local type inference) nominal (subclassing) and structural static
Groovy strong implicit with optional explicit typing dynamic with optional static typing
Harbour strong implicit with optional explicit typing dynamic
Haskell strong implicit with optional explicit typing nominal[7][8] static
Haxe strong implicit with optional explicit typing nominal (subclassing) and structural static with optional dynamic typing
Io strong implicit dynamic
icon strong implicit dynamic
ISLISP strong dynamic
J strong dynamic
Java strong[9] explicit nominal static
JavaScript weak implicit dynamic
Julia strong implicit with optional explicit typing[10] structural for implicit typing, nominal for explicit typing dynamic
Joy strong dynamic
Kotlin strong partially implicit (local type inference) nominal static
LabVIEW strong
Lua strong implicit dynamic
Maple strong dynamic
Mercury strong static
Mathematica strong dynamic
MATLAB M-code strong dynamic
Modula-2 weak[TS 4] explicit nominal static
Modula-3 weak[TS 4] explicit structural static
MUMPS (M) typeless
Neko dynamic
Nemerle strong implicit nominal static
NetLogo strong implicit dynamic
NetRexx strong implicit with optional explicit dynamic with optional static typing
newLisp implicit dynamic
NEWP strong static
Newspeak dynamic
NewtonScript dynamic
Nial dynamic
Nim strong partially implicit (type inference) static
Nickle strong
Nu dynamic
Oberon strong explicit nominal static and partially dynamic[TS 6]
Objective-C weak explicit nominal dynamic with optional static typing[11]
OCaml strong implicit with optional explicit typing nominal for records,[12] structural for objects[8][13] static
Object Pascal strong explicit nominal static
Opa strong implicit with optional explicit typing structural static
Oxygene weak implicit static
Oz-Mozart strong implicit structural dynamic
Pascal weak[TS 4] explicit nominal static
Perl 5 implicit dynamic
PHP weak implicit with optional explicit typing nominal dynamic
Plus strong explicit structural static, dynamic (optional)
Prolog dynamic
Pure dynamic
PureScript strong implicit with optional explicit typing nominal static
Python strong implicit (with optional explicit typing as of 3.5) nominal dynamic
R implicit dynamic
Raku partially implicit[TS 7] dynamic with optional static typing
REBOL strong implicit dynamic
Rexx typeless —, implicit wrt numbers static+dynamic wrt numbers
RPG weak static
Ruby strong implicit dynamic
Rust strong explicit with optional implicit typing[14] mostly nominal static
S dynamic
S-Lang strong implicit dynamic
Scala strong partially implicit (local type inference) nominal (subclassing) and structural static
Scheme strong implicit dynamic (latent)
Seed7 strong explicit nominal static
Simula strong static[TS 8]
Smalltalk strong implicit dynamic
Swift strong partially implicit (local type inference) nominal (subclassing) and structural static
Standard ML strong implicit with optional explicit typing structural static
Tcl dynamic
TypeScript strong optional structural static
Unicon strong implicit dynamic
Visual Basic strong implicit with optional explicit typing nominal static
Visual Basic (.NET) weak[TS 4] explicit static
Visual Prolog strong partially implicit nominal static
Wolfram Language strong dynamic
Windows PowerShell strong implicit dynamic
XL strong nominal static
Xojo strong explicit nominal static
XPath/XQuery strong partially implicit nominal dynamic with optional static typing
Language Type safety Type expression Type compatibility and equivalence Type checking

Notes

  1. ^ Unsafe operations are well isolated by a "Unchecked_" prefix.
  2. ^ with optional dynamic type casting (see dynamic cast)
  3. ^ with optional dynamic type (see dynamic member lookup)
  4. ^ a b c d e It is almost safe, unsafe features are not commonly used.
  5. ^ Optionally, typing can be explicitly implied by the first letter of the identifier (known as implicit typing within the Fortran community).
  6. ^ dynamic checking of type extensions i.e. inherited types
  7. ^ explicit for static types
  8. ^ optional for formal and virtual procedures

References

  1. ^ As of March 2006 The Encyclopedia of Computer Languages by Murdoch University, Australia lists 8276 computer languages.
  2. ^ Aldor User Guide (PDF). Aldor.org. 2002. pp. 40, 61. Retrieved 3 June 2021.
  3. ^ https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/types/
  4. ^ "Type Inference Crystal". Crystal Language Reference. Retrieved 3 June 2021.
  5. ^ "The Dart type system". dart.dev. Retrieved 2020-04-08.
  6. ^ The Go Programming Language Specification
  7. ^ Löh, Andres. "Why does Haskell not have records with structural typing?". Stack Overflow. Archived from the original on 2016-03-14. Retrieved 2020-04-13.
  8. ^ a b King, Alexis (2020-01-19). "No, dynamic type systems are not inherently more open". lexi-lambda.github.io. Archived from the original on 2020-03-01. Retrieved 2020-04-13.
  9. ^ Sheng Liang, Gilad Bracha. Dynamic class loading in the Java virtual machine. Volume 33, Issue 10 of ACM SIGPLAN Notices, October 1998.
  10. ^ "Types · the Julia Language". Archived from the original on 2018-07-24. Retrieved 2018-07-24.
  11. ^ Developer.apple.com Archived June 10, 2009, at the Wayback Machine
  12. ^ "Record · Reason". reasonml.github.io. Archived from the original on 2020-03-31. Retrieved 2020-04-13.
  13. ^ "Structural type system", Wikipedia, 2019-03-29, retrieved 2020-04-13
  14. ^ "rust-lang/rustc-dev-guide". GitHub. Retrieved 2020-04-08.


Data structures

Most languages also provide ways to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).

Object oriented languages allow the programmer to define data-types called "Objects" which have their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects allows the objects to operate as independent but interacting sub-programs: this interaction can be designed at coding time to model or simulate real-life interacting objects. This is a very useful, and intuitive, functionality. Languages such as Python and Ruby have developed as OO (Object oriented) languages. They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being accessible to non-professionals. It is commonly thought that object-orientation makes languages more intuitive, increasing the public availability and power of customized computer applications.

Instruction and control flow

Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure.

Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, loops, backtracking, functional composition).

Design philosophy

For the above-mentioned purposes, each language has been developed using a special design or philosophy. Some aspect or another is particularly stressed by the way the language uses data structures, or by which its special notation encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognizing and using the special vocabulary is reduced by help messages generated by the programming language implementation.

There are a few languages which offer a high degree of freedom in allowing self-modification in which a program re-writes parts of itself to handle new cases. Typically, only machine language, Prolog, PostScript, and the members of the Lisp family (Common Lisp, Scheme) provide this capability. In MUMPS language this technique is called dynamic recompilation; emulators and other virtual machines exploit this technique for greater performance.

Compilation and interpretation

There are, broadly, two approaches to execute a program written in a given language. These approaches are known as compilation, done by a program known as a compiler; and interpretation, done by an interpreter. Some programming language implementations support both interpretation and compilation.

An interpreter parses a computer program and executes it directly. One can imagine this as following the instructions of the program line-by-line. In contrast, a compiler translates the program into machine code – the native instructions understood by the computer's processor. The compiled program can then be run by itself.

Compiled programs usually run faster than interpreted ones, because the overhead of understanding and translating the programming language syntax has already been done. However, interpreters are frequently easier to write than compilers, and can more easily support interactive debugging of a program.

Many modern languages use a mixture of compilation and interpretation. For example, the "compiler" for a bytecode-based language translates the source code into a partially compiled intermediate format, which is later run by a fast interpreter called a virtual machine. Some "interpeters" actually use a just-in-time compiler, which compiles the code to machine language immediately before running it. These techniques are often combined. Like other aspects of programming languages, "compiled" and "interpreted" may be best understood as opposite ends of a spectrum, rather than the only two options.

History of programming languages

The development of programming languages follows closely the development of the physical and electronic processes used in today's computers.

Programming languages have been under development for years and will remain so for many years to come. They got their start with a list of steps to wire a computer to perform a task. These steps eventually found their way into software and began to acquire newer and better features. The first major languages were characterized by the simple fact that they were intended for one purpose and one purpose only, while the languages of today are differentiated by the way they are programmed in, as they can be used for almost any purpose. And perhaps the languages of tomorrow will be more natural with the invention of quantum and biological computers.

Charles Babbage is often credited with designing the first computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.

In the 1940s the first recognizably modern, electrically powered computers were created. Some military calculation needs were a driving force in early computer development, such as encryption, decryption, trajectory calculation and massive number crunching needed in the development of atomic bombs. At that time, computers were extremely large, slow and expensive: advances in electronic technology in the post-war years led to the construction of more practical electronic computers. At that time only Konrad Zuse imagined the use of a programming language (developed eventually as Plankalkül) like those of today for solving problems.

Subsequent breakthroughs in electronic technology (transistors, integrated circuits, and chips) drove the development of increasingly reliable and more usable computers. The first widely used high-level programming language was FORTRAN, developed during 1954–57 by an IBM team led by John W. Backus. It is still widely used for numerical work, with the latest international standard released in 2004. A Computer Languages History graphic shows a timeline from FORTRAN in 1954.

Shortly after, Lisp was introduced. Lisp was based on lambda calculus, and is far more regular in its syntax than most non-Lisp derived languages.

Dennis Ritchie developed the C programming language, initially for DEC PDP-11 in 1970.

During the 1970s, Xerox PARC developed Smalltalk, an object oriented language.

Based on the development of Smalltalk and other object oriented languages, Bjarne Stroustrup developed a programming language based on the syntax of C, called C++ in 1985.

Sun Microsystems released Java in 1995 which became very popular as an introductory programming language taught in universities. Microsoft presented the C# programming language in 2001 which is very similar to C++ and Java. There are many, many other languages (cf. List of programming languages).

Classifications of programming languages

Formal semantics

The rigorous definition of the meaning of programming languages is the subject of formal semantics.

See also

Notes

Template:Major programming languages small

Template:Link FA Template:Link FA Template:Link FA