PostgreSQL

This is an old revision of this page, as edited by 213.76.102.26 (talk) at 00:10, 28 May 2004 (Linked the words "proprietary" and "free software"). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

PostgreSQL is a free software object-relational database server (database management system), released under the flexible BSD-style license. It is an alternative to other open-source database systems (such as MySQL, Firebird, and MaxDB), as well as proprietary systems such as Oracle, IBM's DB2 and Microsoft SQL Server.

The official pronunciation of "PostgreSQL" is "Post-Gress-Q-L" (listen here (5.6k MP3)).

History

PostgreSQL is the ultimate result of a long evolution starting with the Ingres project at UC Berkeley. The project lead, Michael Stonebraker had left Berkeley to commercialize Ingres in 1982, but eventually returned to academia. After returning to Berkeley in 1985, Stonebraker started a post-Ingres project to address the problems with the relational database model that had become increasing clear during the early 1980s. Primary among these was the relational model's inability to understand "types", combination of simpler data that make up a single unit. Today we typically refer to these as objects.

The resulting project, named Postgres, aimed at introducing the minimum number of features need to add complete types support. These included the ability to define types, but also the ability to fully describe relationships – which up until this time had been widely used but maintained entirely by the user. In Postgres the database "understood" relationships, and could retrieve information in related tables in a natural way using rules.

Starting in 1986 the team released a number of papers describing the basis of the system, and by 1988 had a prototype version up and running. Version 1 was released to a small number of users in June 1989, followed by Version 2 with a re-written rules system in June 1990. 1991's Version 3 re-wrote the rules system again, but also added support for multiple storage managers, and an improved query engine. By 1993 there were a huge number of users and the project was being overwhelmed with requests for support and features. After releasing a Version 4 primarily as a cleanup, the project ended.

Although the Postgres project had officially ended, the BSD license under which Postgres was released enabled Open Source developers to obtain a copy and develop it further. In 1994 two UC Berkeley graduate students, Andrew Yu and Jolly Chen, added a SQL language interpreter to replace the earlier QUEL system Ingres had been based on, creating Postgres95. The code was subsequently released to the web to find its own way in the world. Postgres95 was an open source descendant of this original Berkeley code. In 1996 it was decided that the project should be renamed; in order to reflect the database's new SQL query language, Postgres95 was renamed to PostgreSQL. The first PostgreSQL release was version 6.0. The software has been subsequently maintained by a group of database developers and volunteers from around the world, coordinated via the Internet. Since 6.0, many subsequent releases have been made, and many improvements have been made to the system; as of this writing, the current release series is 7.4, with 7.5 expected in mid to late 2004.

Although the license allowed for the commercialization of Postgres, the Postgres code was not developed commercially with the same rapidity as Ingres, which is somewhat surprising considering the advantages the product offered. The main offshoot was created when Michael Stonebraker and Paula Hawthorn, an original Ingres team member who moved from Ingres, formed Illustra Information Technologies to commercialize Postgres.

Illustra's product was first introduced in 1991, where it was used in the Sequoia 2000 project late that year. By 1995 the product had added an ability to write plug-in modules they referred to as DataBlades. Unlike other plug-in technologies, with DataBlades external authors could write code to create new low-level datatypes, and tell the database how to store, index and manage it. For instance, one of the most popular DataBlades was used to create a time-series, a list of one particular variable over time, often with gaps. For instance, the price of a stock over time changes, but there are times, like weekends, where the data does not change and there is no entry. Traditional databases have difficultly handling this sort of task; while they can find a record for a particular date, finding the one that is "active" in one of the gaps is time consuming. With the Time Series DataBlade, this was fast and easy.

DataBlades were incredibly successful and started to generate considerable industry "buzz", eventually leading Informix to purchase the company outright in 1996. Industry insiders claimed that it would not be possible to merge the two products, but in fact this was fairly easy because both were based on the original Ingres code and concepts. Informix released their Illustra-based Universal Server in 1997, leaving them in an unchallenged position in terms of technical merit.

Description

A cursory examination of PostgreSQL might suggest that the system is very similar to other relational database systems. PostgreSQL uses the SQL language to run queries on data that is organized as a series of tables with foreign keys linking related data together. The primary advantage of PostgreSQL over some of its competitors is best described as programmability: PostgreSQL makes it much easier to build real-world applications using data taken from the database.

The relational model stores simple data types in "flat" tables, requiring the user to gather up related information using the SQL language. This contrasts with the way the data itself ends up being used, typically in a high-level language with rich data types where all of the related data is considered to be a complete unit of its own, typically referred to as a record or object depending on the language.

Converting information from the SQL world into the programming world is difficult because the two simply have very different models of the way data is organized. This problem is widely known as impedance mismatch in the industry, and mapping from one model to the other typically takes up about 40% of a project's time. A number of mapping solutions, typically referred to as object-relational mapping, are on the market, but they tend to be expensive and have problems of their own, notably performance.

In PostgreSQL many of these issues can be solved directly in the database. PostgreSQL allows the user to define new types based out of the normal SQL types, allowing the database itself to understand complex data. For instance, you can define an address to consist of several strings for things like street number, city and country. From that point on one can easily create tables containing all the fields needed to hold an address with a single line.

PostgreSQL also allows types to include inheritance, one of the major concepts in object-oriented programming. For instance, one could define a post_code type, and then create us_zip_code and canadian_postal_code based on them. Addresses could then be specialized for us_address and canadian_address, including specialized rules to validate the data in each case.

Another very useful feature is that PostgreSQL is capable of directly understanding the relationships that exist between tables. People in the real world typically have several addresses, which in the relational model is stored by placing the addresses in one table and the rest of the user information in another. The addresses are "related" to a particular user by storing some unique information, say the user's name, in the address table itself. In order to find all the addresses for "Bob Smith", the user writes a query that "joins" the data back together, by selecting a particular name from the users table and then searching for that name in the address table. Doing a search for all the users in New York is somewhat complex, requiring the database to find all the user names in the address table, then search the user table for those users. A typical search might look like this:

SELECT u.* FROM user u, address a WHERE a.city='New York' and a.user_name=u.user_name

In PostgreSQL the relationship between users and addresses can be explicitly defined. Once defined the address becomes a property of the user, so the search above can be simplified greatly to:

SELECT * FROM user WHERE address.city='New York'

No "join" is required, the database itself understands the user.address relationship. A related example shows the power of types, if one uses Postgres to do:

SELECT address FROM user

The results will be broken out automatically, returning only those addresses for users, not those for companies or other objects that might be using the address table.

Finally the programming of the database itself is greatly enhanced due to functions. Most SQL systems allow you to write a stored procedure, a block of SQL code that can be called in other SQL statements. However SQL itself is a poor programming language, and complex logic is very difficult to create. Worse, many of the most basic operations in a programming language, like branching and looping, are not supported in SQL itself. Instead each vendor has written their own extensions to the SQL language to add these features, which are not cross platform.

In PostgreSQL you can write the logic in most any language you choose, and the list of supported languages is growing with every release. The code is then inserted into the server as a function, a small wrapper that makes the code appear as if it were a stored procedure. In this way SQL code can call (for instance) C code and vice-versa, dramatically increasing simplicity and performance.

These advantages add up to making PostgreSQL easily the most advanced database system from a programming perspective, which is one reason for the success of Illustra. Using PostgreSQL can dramatically reduce overall programming time on many projects, with the advantages growing with project complexity.

Features

Some features of PostgreSQL rarely found in other relational databases include:

  • User-defined types
  • User-defined operators
  • Availability of multiple stored procedure languages, including C, SQL, Perl, Python, Tcl, Ruby, shell script, or the native PL/PgSQL
  • Support for geographic objects via PostGIS
  • Concurrency is managed via a Multi-Version Concurrency Control (MVCC) design, which ensures excellent performance even under heavy concurrent access
  • Inheritance of tables
  • Rules -- a way of implementing server-side logic that allows the application developer to modify the "query tree" of an incoming query
  • Expressional indexes -- an index that is created on a SQL expression, not necessarily a single column from a table.
  • Partial indexes -- an index that is only created on a portion of a table. This saves disk space and improves performance if only part of the table actually requires an index.

In addition, PostgreSQL supports almost all the constructs expected from an enterprise-level database, including:

  • Referential integrity constraints including foreign key constraints, column constraints, and row checks
  • Triggers
  • Views
  • Outer joins
  • Sub-selects
  • Transactions
  • Strong compliance with the SQL standard (SQL92, SQL1999)
  • Encrypted connections via SSL
  • Binary and textual large-object storage
  • Online backup

Criticisms

The set of features that an enterprise DBMS is expected to provide is very large, and there are some features that PostgreSQL does not yet provide. The following list of PostgreSQL deficiencies is therefore incomplete. For more information on functionality not yet available in PostgreSQL, refer to the online TODO list.

  • Lack of point-in-time recovery.
  • Lack of nested transactions.
  • Lack of a native port to Win32. PostgreSQL can run on Win32 using the Cygwin emulation package, but there are some known issues that make this configuration less than ideal. A native Win32 port is currently in development, and is expected to be included in the 7.5 release of PostgreSQL.
  • Lack of updateable views. A view can manually be made updateable by defining the appropriate rewrite rules for it, but the database is not currently able to derive the necessary rules itself.
  • Lack of materialized views.
  • Lack of good replication solution; some packages exist, however.
PostgreSQL Documentation
PostgreSQL