Programming language

A programming language is a stylized communication technique intended to be used for controlling the behaviour of a machine (often a computer). Like human languages programming languages have syntactic and semantic rules used to define meaning.
Thousands[1] of different programming languages have been created and new ones are created every year. (see list of programming languages). Few languages ever become sufficiently popular that they are used by more than a few people, but professional programmers are likely to use dozens of different languages during their career.
Definitions of programming language
There is no universally agreed definition for the term programming language. The following is a list of some of the methods that have been used to categorize a language as being a programming language.
- What it is used for. For instance, a programming language is a language used to write programs.
- Those involved in the interaction. For instance, a programming language differs from natural languages in that natural languages are used for interaction between people, while programming languages are used for communication from people to machines (this rules out languages used for computer to computer interaction).
- The constructs it contains. For instance, a programming language contains constructs for defining and manipulating data structures, and for controlling the flow of execution.
- Its expressive power. The theory of computation provides a classification of languages based on the range of computations expressible by them, with the most expressive language being one that is Turing complete (the language needs to contain at least a goto construct and a method of storing values, ie, variables). Any algorithm that can be implemented in a language that is Turing complete can also be implemented in another language that is Turing complete. Some examples of languages that are not Turing complete are HTML (although it can contain Turing complete languages such as PHP and Javascript) and SQL (SQL vendors invariably add language extensions that create a Turing complete language, e.g., PL/SQL).
Specification
The specification of a language can take several forms, including:
- a description of the language's syntax and semantics (e.g., C language). The latter usually lists the results of performing an operation and restrictions on the kinds of operations that can be performed.
- a description of the behavior of a translator (e.g., C++). The syntax and semantics of the language can be inferred from this.
- the language is defined in terms of an executable specification (e.g., Prolog), often written in that language.
Features of programming language
Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.
These specifications usually include:
- Type system
- Data structures
- Instruction and control flow
- Design philosophy
- Compilation and interpretation
Those languages that are widely used – or have been used for a considerable period of time – have standardization bodies that meet regularly to create and publish formal definitions of the language and discuss the extension of existing definitions.
Type system
A type system defines how a programming language classifies values and variables into types, how it can manipulate those types and how they interact. The design and study of type systems is known as type theory.
Internally, all data in modern digital computers are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank accounts and measurements, so the low-level binary data are organized by programming languages into these high-level concepts as data types. There are also more abstract types whose purpose is just to warn the programmer about semantically meaningless statements.
Languages can be classified with respect to their type systems.
- static vs. dynamic typing
- This distinction refers to when values have a type associated with them. In a dynamically typed language types are an attribute of data, and thus a variable acquires a type by being assigned a particular value at runtime. In statically typed languages, variables are given a type by the programmer at compile time, and data assigned to the variable must match its type.
- strong vs. weak typing
- This distinction refers to the behavior of the language when values are operated on. In a strongly typed language, operations must match the types of the values operated on (addition must be performed on numbers, for example). In weakly typed languages operations might be made on values which are inappropriate.
Type inference is a mechanism whereby the type specifications can often be omitted completely, if it is possible for the compiler to infer the types of values from the contexts in which they are used – for example, if a variable is assigned the value 1, a type-inferring compiler does not need to be told explicitly that the variable is an integer. There are however many different uses for integers; it might e.g. make sense in a program to prevent inadvertent adding of a phone number to the number of apples in a box. Therefore some languages such as Ada allow defining different kinds of incompatible integers; this is called strong typing. Type-inferred languages can be more flexible to use, particularly when they also implement parametric polymorphism.
It is possible to perform type inference on programs written in a dynamically typed language, but it is entirely possible to write programs in these languages that make type inference infeasible. Sometimes dynamically typed languages are called latently typed.
Note that strong vs. weak is a continuum; Java is a strongly typed language relative to C, but is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.
Strong and static are orthogonal concepts. See the cross reference table below. But beware that some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed – in the latter usage, C would be called strongly typed, despite the fact that C doesn't catch that many type errors and that it's both trivial and common to defeat its type system (even accidentally).
Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often restrict names to denoting run-time computed values directly, instead of naming memory locations where values may be stored, and in some cases refuse to allow the value denoted by a name to be modified at all. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.
Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
Type system cross reference list
This is a comparison of the features of the type systems and type checking of multiple programming languages.
Brief definitions
- A nominal type system means that the language decides whether types are compatible and/or equivalent based on explicit declarations and names.
- A structural type system means that the language decides whether types are compatible and/or equivalent based on the definition and characteristics of the types.
- Type checking determines whether and when types are verified. Static checking means that type errors are reported based on a program's text (source code). Dynamic checking means that type errors are reported based on a program's dynamic (run-time) behavior.
Language | Type safety | Type expression | Type compatibility and equivalence | Type checking |
---|---|---|---|---|
A+ | strong | dynamic | ||
ActionScript 3.0 | strong | implicit with optional explicit typing | static | |
ABC | strong | |||
ABAP | strong | nominal | static | |
Ada | strong[TS 1] | explicit | nominal | static |
Agda | strong | nominal | static | |
Aldor | weak | partially implicit[2] | static | |
Alef | strong | static | ||
ALGOL 58 | strong | explicit | static | |
ALGOL 60 | strong | explicit | static | |
ALGOL 68 | strong | explicit | structural | static & tagged unions |
ALGOL W | strong | static | ||
Alice | strong | implicit with optional explicit | static | |
Alma-0 | static | |||
AmbientTalk | strong | dynamic | ||
AMOS BASIC | static | |||
AngelScript | strong | static | ||
APL | strong | dynamic | ||
AppleScript | weak | dynamic | ||
Arc | dynamic | |||
Assembly | ? | ? | ? | ? |
AutoHotkey | typeless | — | — | — |
AutoLISP | dynamic | |||
Ateji PX | strong | explicit | nominal | static |
AWK | weak | implicit | dynamic | |
B | typeless | |||
Ballerina | strong | structural | static | |
Bash | ? | ? | ? | ? |
BASIC | strong | explicit | nominal | static |
BCPL | typeless | |||
BeanShell | strong | nominal | dynamic | |
BLISS | typeless | — | — | — |
Boo | strong | implicit with optional explicit typing | static with optional dynamic typing | |
Bro | strong | implicit with optional explicit typing | nominal | static |
C | weak | explicit | nominal | static |
C-- | weak | static | ||
C++ (ISO/IEC 14882) | strong | explicit with optional implicit typing (by using auto in C++11) | nominal | static[TS 2] |
C* | weak | explicit | static | |
C# | strong [3] | implicit with optional explicit typing | nominal | static[TS 3] |
C shell | ? | ? | ? | ? |
Caml | strong | implicit with optional explicit typing | static | |
Cecil | dynamic with optional static typing | |||
Clean | strong | implicit | static | |
Ceylon | strong | static | ||
Chapel | implicit with optional explicit typing | static | ||
CHILL | strong | static | ||
ChucK | strong | |||
Cilk | weak | explicit | static | |
Claire | strong | implicit with optional explicit typing | dynamic with optional static typing | |
Clean | strong | ? | ||
Clojure | strong | implicit with optional explicit typing | dynamic | |
CLU | strong | |||
COBOL | strong | explicit | nominal | static |
Cobra | strong | explicit with optional implicit typing | static with optional dynamic typing | |
CoffeeScript | implicit | dynamic | ||
ColdFusion (CFML) | strong | implicit | dynamic | |
COMAL | strong | |||
Common Lisp | strong | implicit with optional explicit typing | structural for implicit typing, nominal for explicit typing | dynamic, some static checking(depending on implementation) |
Component Pascal | strong | static | ||
Cool | strong | explicit | static | |
CORAL | strong | static | ||
Crystal | implicit with optional explicit typing[4] | structural | static | |
Cuneiform | explicit | static | ||
Curl | strong | nominal | ||
Curry | strong | implicit with optional explicit typing | static | |
Cython | strong | implicit with optional explicit typing | nominal (extension types) and structural (Python) | dynamic with optional static typing |
D | weak[TS 4] | explicit | nominal | static |
Dart | strong[5] | gradual typing | nominal | static with optional dynamic typing |
Dylan | strong | dynamic | ||
Eiffel | strong | nominal | static | |
Elixir | strong | implicit | dynamic | |
Erlang | strong | implicit | dynamic | |
Euphoria | strong | explicit, implicit with objects | nominal | static, dynamic with objects |
F# | strong | implicit | nominal | static |
Forth | typeless | — | — | — |
Fortran | strong | explicit[TS 5] | nominal | static |
Gambas | strong | explicit | nominal | |
GLBasic | strong | explicit. Non-explicit declarations available through project options | nominal | static |
Gleam | strong | implicit with optional explicit | nominal | static |
Go[6] | strong | partially implicit (local type inference) | structural | static |
Gosu | strong | partially implicit (local type inference) | nominal (subclassing) and structural | static |
Groovy | strong | implicit with optional explicit typing | dynamic with optional static typing | |
Harbour | strong | implicit with optional explicit typing | dynamic | |
Haskell | strong | implicit with optional explicit typing | nominal[7][8] | static |
Haxe | strong | implicit with optional explicit typing | nominal (subclassing) and structural | static with optional dynamic typing |
Io | strong | implicit | dynamic | |
icon | strong | implicit | dynamic | |
ISLISP | strong | dynamic | ||
J | strong | dynamic | ||
Java | strong[9] | explicit | nominal | static |
JavaScript | weak | implicit | — | dynamic |
Julia | strong | implicit with optional explicit typing[10] | structural for implicit typing, nominal for explicit typing | dynamic |
Joy | strong | dynamic | ||
Kotlin | strong | partially implicit (local type inference) | nominal | static |
LabVIEW | strong | |||
Lua | strong | implicit | dynamic | |
Maple | strong | dynamic | ||
Mercury | strong | static | ||
Mathematica | strong | dynamic | ||
MATLAB M-code | strong | dynamic | ||
Modula-2 | weak[TS 4] | explicit | nominal | static |
Modula-3 | weak[TS 4] | explicit | structural | static |
MUMPS (M) | typeless | — | — | — |
Neko | dynamic | |||
Nemerle | strong | implicit | nominal | static |
NetLogo | strong | implicit | dynamic | |
NetRexx | strong | implicit with optional explicit | dynamic with optional static typing | |
newLisp | implicit | dynamic | ||
NEWP | strong | static | ||
Newspeak | dynamic | |||
NewtonScript | dynamic | |||
Nial | dynamic | |||
Nim | strong | partially implicit (type inference) | static | |
Nickle | strong | |||
Nu | dynamic | |||
Oberon | strong | explicit | nominal | static and partially dynamic[TS 6] |
Objective-C | weak | explicit | nominal | dynamic with optional static typing[11] |
OCaml | strong | implicit with optional explicit typing | nominal for records,[12] structural for objects[8][13] | static |
Object Pascal | strong | explicit | nominal | static |
Opa | strong | implicit with optional explicit typing | structural | static |
Oxygene | weak | implicit | static | |
Oz-Mozart | strong | implicit | structural | dynamic |
Pascal | weak[TS 4] | explicit | nominal | static |
Perl 5 | implicit | dynamic | ||
PHP | weak | implicit with optional explicit typing | nominal | dynamic |
Plus | strong | explicit | structural | static, dynamic (optional) |
Prolog | dynamic | |||
Pure | dynamic | |||
PureScript | strong | implicit with optional explicit typing | nominal | static |
Python | strong | implicit (with optional explicit typing as of 3.5) | nominal | dynamic |
R | implicit | dynamic | ||
Raku | partially implicit[TS 7] | dynamic with optional static typing | ||
REBOL | strong | implicit | dynamic | |
Rexx | typeless | —, implicit wrt numbers | — | static+dynamic wrt numbers |
RPG | weak | static | ||
Ruby | strong | implicit | — | dynamic |
Rust | strong | explicit with optional implicit typing[14] | mostly nominal | static |
S | dynamic | |||
S-Lang | strong | implicit | dynamic | |
Scala | strong | partially implicit (local type inference) | nominal (subclassing) and structural | static |
Scheme | strong | implicit | dynamic (latent) | |
Seed7 | strong | explicit | nominal | static |
Simula | strong | static[TS 8] | ||
Smalltalk | strong | implicit | dynamic | |
Swift | strong | partially implicit (local type inference) | nominal (subclassing) and structural | static |
Standard ML | strong | implicit with optional explicit typing | structural | static |
Tcl | dynamic | |||
TypeScript | strong | optional | structural | static |
Unicon | strong | implicit | dynamic | |
Visual Basic | strong | implicit with optional explicit typing | nominal | static |
Visual Basic (.NET) | weak[TS 4] | explicit | static | |
Visual Prolog | strong | partially implicit | nominal | static |
Wolfram Language | strong | dynamic | ||
Windows PowerShell | strong | implicit | dynamic | |
XL | strong | nominal | static | |
Xojo | strong | explicit | nominal | static |
XPath/XQuery | strong | partially implicit | nominal | dynamic with optional static typing |
Language | Type safety | Type expression | Type compatibility and equivalence | Type checking |
Notes
- ^ Unsafe operations are well isolated by a "Unchecked_" prefix.
- ^ with optional dynamic type casting (see dynamic cast)
- ^ with optional dynamic type (see dynamic member lookup)
- ^ a b c d e It is almost safe, unsafe features are not commonly used.
- ^ Optionally, typing can be explicitly implied by the first letter of the identifier (known as implicit typing within the Fortran community).
- ^ dynamic checking of type extensions i.e. inherited types
- ^ explicit for static types
- ^ optional for formal and virtual procedures
References
- ^ As of March 2006 The Encyclopedia of Computer Languages by Murdoch University, Australia lists 8276 computer languages.
- ^ Aldor User Guide (PDF). Aldor.org. 2002. pp. 40, 61. Retrieved 3 June 2021.
- ^ https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/types/
- ^ "Type Inference Crystal". Crystal Language Reference. Retrieved 3 June 2021.
- ^ "The Dart type system". dart.dev. Retrieved 2020-04-08.
- ^ The Go Programming Language Specification
- ^ Löh, Andres. "Why does Haskell not have records with structural typing?". Stack Overflow. Archived from the original on 2016-03-14. Retrieved 2020-04-13.
- ^ a b King, Alexis (2020-01-19). "No, dynamic type systems are not inherently more open". lexi-lambda.github.io. Archived from the original on 2020-03-01. Retrieved 2020-04-13.
- ^ Sheng Liang, Gilad Bracha. Dynamic class loading in the Java virtual machine. Volume 33, Issue 10 of ACM SIGPLAN Notices, October 1998.
- ^ "Types · the Julia Language". Archived from the original on 2018-07-24. Retrieved 2018-07-24.
- ^ Developer.apple.com Archived June 10, 2009, at the Wayback Machine
- ^ "Record · Reason". reasonml.github.io. Archived from the original on 2020-03-31. Retrieved 2020-04-13.
- ^ "Structural type system", Wikipedia, 2019-03-29, retrieved 2020-04-13
- ^ "rust-lang/rustc-dev-guide". GitHub. Retrieved 2020-04-08.
Data structures
Most languages also provide ways to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).
Object oriented languages allow the programmer to define data-types called "Objects" which have their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects allows the objects to operate as independent but interacting sub-programs: this interaction can be designed at coding time to model or simulate real-life interacting objects. This is a very useful, and intuitive, functionality. Languages such as Python and Ruby have developed as OO (Object oriented) languages. They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being accessible to non-professionals. It is commonly thought that object-orientation makes languages more intuitive, increasing the public availability and power of customized computer applications.
Instruction and control flow
Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure.
Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, loops, backtracking, functional composition).
Design philosophy
For the above-mentioned purposes, each language has been developed using a special design or philosophy. Some aspect or another is particularly stressed by the way the language uses data structures, or by which its special notation encourages certain ways of solving problems or expressing their structure.
Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognizing and using the special vocabulary is reduced by help messages generated by the programming language implementation.
There are a few languages which offer a high degree of freedom in allowing self-modification in which a program re-writes parts of itself to handle new cases. Typically, only machine language, Prolog, PostScript, and the members of the Lisp family (Common Lisp, Scheme) provide this capability. In MUMPS language this technique is called dynamic recompilation; emulators and other virtual machines exploit this technique for greater performance.
Compilation and interpretation
There are, broadly, two approaches to execute a program written in a given language. These approaches are known as compilation, done by a program known as a compiler; and interpretation, done by an interpreter. Some programming language implementations support both interpretation and compilation.
An interpreter parses a computer program and executes it directly. One can imagine this as following the instructions of the program line-by-line. In contrast, a compiler translates the program into machine code – the native instructions understood by the computer's processor. The compiled program can then be run by itself.
Compiled programs usually run faster than interpreted ones, because the overhead of understanding and translating the programming language syntax has already been done. However, interpreters are frequently easier to write than compilers, and can more easily support interactive debugging of a program.
Many modern languages use a mixture of compilation and interpretation. For example, the "compiler" for a bytecode-based language translates the source code into a partially compiled intermediate format, which is later run by a fast interpreter called a virtual machine. Some "interpeters" actually use a just-in-time compiler, which compiles the code to machine language immediately before running it. These techniques are often combined. Like other aspects of programming languages, "compiled" and "interpreted" may be best understood as opposite ends of a spectrum, rather than the only two options.
History of programming languages
The development of programming languages follows closely the development of the physical and electronic processes used in today's computers.
Programming languages have been under development for years and will remain so for many years to come. They got their start with a list of steps to wire a computer to perform a task. These steps eventually found their way into software and began to acquire newer and better features. The first major languages were characterized by the simple fact that they were intended for one purpose and one purpose only, while the languages of today are differentiated by the way they are programmed in, as they can be used for almost any purpose. And perhaps the languages of tomorrow will be more natural with the invention of quantum and biological computers.
Charles Babbage is often credited with designing the first computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.
In the 1940s the first recognizably modern, electrically powered computers were created. Some military calculation needs were a driving force in early computer development, such as encryption, decryption, trajectory calculation and massive number crunching needed in the development of atomic bombs. At that time, computers were extremely large, slow and expensive: advances in electronic technology in the post-war years led to the construction of more practical electronic computers. At that time only Konrad Zuse imagined the use of a programming language (developed eventually as Plankalkül) like those of today for solving problems.
Subsequent breakthroughs in electronic technology (transistors, integrated circuits, and chips) drove the development of increasingly reliable and more usable computers. The first widely used high-level programming language was FORTRAN, developed during 1954–57 by an IBM team led by John W. Backus. It is still widely used for numerical work, with the latest international standard released in 2004. A Computer Languages History graphic shows a timeline from FORTRAN in 1954.
Shortly after, Lisp was introduced. Lisp was based on lambda calculus, and is far more regular in its syntax than most non-Lisp derived languages.
Dennis Ritchie developed the C programming language, initially for DEC PDP-11 in 1970.
During the 1970s, Xerox PARC developed Smalltalk, an object oriented language.
Based on the development of Smalltalk and other object oriented languages, Bjarne Stroustrup developed a programming language based on the syntax of C, called C++ in 1985.
Sun Microsystems released Java in 1995 which became very popular as an introductory programming language taught in universities. Microsoft presented the C# programming language in 2001 which is very similar to C++ and Java. There are many, many other languages (cf. List of programming languages).
Classifications of programming languages
- Array
- Aspect-oriented
- Assembly
- Concatenative
- Concurrent
- Curly-bracket
- Data-structured
- Dataflow
- Declarative
- Domain-specific
- Dynamic
- Educational
- Esoteric
- Functional
- General-purpose
- Imperative
- Interface description
- Logic
- Multiparadigm
- Object-oriented
- Pattern directed invocation
- Procedural
- Quantum
- Reflective
- Scripting
- Synchronous
- Visual
- Lists of all programming languages
Formal semantics
The rigorous definition of the meaning of programming languages is the subject of formal semantics.
See also
- Computer language
- Compiler
- Interpreter
- Binding
- Hello world program, examples of a simple program in many different programming languages
- Software engineering and List of software engineering topics
- Lists of computer syntax patterns
- Reserved word
- Keyword (Computer)
- Programming language dialect
- Category:Programming languages
- Comparison of programming languages
Notes
External links
- Syntax Patterns for Various Languages
- 99 Bottles of Beer - a collection of implementations in many languages
- History of Programming Languages (HOPL)
- Open Directory - Computer Programming Languages
- Dictionary of Programming Languages
- Programming Language Comparison
- Which Programming Language is right for you?