Portuguese alphabet
The official Portuguese alphabet consists of the letters of the Latin alphabet minus K, W, and Y: A, B, C, D, E, F, G, H, I, J, L, M, N, O, P, Q, R, S, T, U, V, X, Z.
Although not found in vernacular terms, the letters K, W, and Y are still used for proper names and Portuguese words derived from them. Portuguese also uses several digraphs and diacritics, described below.
Introduction
The history of the Portuguese script began in the 12th century, when scribes in the Western Iberian peninsula started using the local vernacular in documents, in place of Latin. The script evolved naturally until the close of the 19th century, the golden age of Portuguese literature. At about that time the national Academias de Letras ("Literary Academies") were created in Brazil and Portugal, and legally empowered to standardize orthography.
Today, Portuguese orthography is defined by national laws and international treaties, which are binding for most administrative and educational uses. The orthography underwent a major reform around 1940, when a large fraction of the words had their spelling radically simplified. A second reform around 1990 had a much smaller impact.
The general result of those reforms was to make Portuguese orthography, which until the 1940s had been determined chiefly by etymology, much closer to a phonetic writing system. However, its rules are still rather complex and non-algorithmic, and still somewhat based on etymology. Thus, spelling and pronunciation are still partly determined by tradition, on a word-by-word basis. In particular, many letters have two or more phonetic values ("X" has four), and many sounds can be written in more than one way.
Digraphs
Portuguese orthography uses several character combinations to represent additional phonemes:
- "CH": approximately as in English "shoe".
- "LH": as in English "million".
- "NH": as in French "champignon".
- "RR": trilled "r".
- "SS" (in all contexts): as in English "sun".
- "SC" (before "E" and "I"): the same as "SS".
- "QU" (before "E" and "I"): as in "kettle".
- "XC": as in "easy", "ask", "axis", or "essence", depending on the word.
- "ZZ": as in "Betsy".
The "ZZ" digraph is used in only one Portuguese word, pizza, and its derivatives. (Italian words generally had "ZZ" replaced by "SS", "Ç", or "Z" when borrowed into Portuguese; however the change was prevented in this single case due to collision with a preexisting obscene word.)
It must be noted that each of these digraphs is treated as two separate letters for the purpose of sorting or indexing (as opposed to Spanish, for example, where each digraph counts as a single special letter). In fact, the Portuguese hyphenation rules require a syllable break between the two letters of RR, SS, and XC: pro-ces-so, car-ro, ex-ce-to. Portuguese digraphs are broken into separate letters also for the purposes of crossword puzzles.
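The mandatory break inside RR, SS, and XC can be illustrated with a minimal Python sketch. The function name `mark_digraph_breaks` is hypothetical, and the sketch marks only the breaks inside these three digraphs; it does not attempt full syllabification (so it yields "proces-so", not "pro-ces-so").

```python
import re

# Digraphs whose two letters must fall on opposite sides of a syllable break.
SPLIT_DIGRAPHS = ("rr", "ss", "xc")
_PATTERN = re.compile("|".join(SPLIT_DIGRAPHS), re.IGNORECASE)


def mark_digraph_breaks(word: str) -> str:
    """Insert a hyphen between the two letters of every RR, SS, or XC."""
    return _PATTERN.sub(lambda m: m.group(0)[0] + "-" + m.group(0)[1], word)


for example in ("processo", "carro", "exceto"):
    print(example, "->", mark_digraph_breaks(example))
```

Because the digraphs are ordinary two-letter sequences, plain letter-by-letter string comparison already matches the sorting convention described above; no special collation key is needed for them.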
Diacritics
Portuguese also uses diacritics — acute, circumflex, tilde, grave, umlaut, and cedilla — on some letters:
- Á, É, Í, Ó, Ú
- Â, Ê, Ô
- Ã, Õ
- À
- Ü
- Ç
Acute and circumflex accents
The diacritics "acute accent" (acento agudo) and "circumflex accent" (acento circunflexo) are used primarily to indicate the stressed syllable of a word. The stress diacritic is either written or omitted according to detailed rules that depend primarily on the position of that syllable (first, second, or third from the end) and on the final letter of the word. The rules are such that the stress of an un-accented written word can (almost) always be deduced through them, even if the word was never heard before.
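As a rough illustration of how stress can be deduced from the written form, the sketch below takes a word that has already been split into syllables and returns the index of the stressed one. The default rule it encodes (unmarked words ending in a, e, o, optionally followed by s, or in am, em, ens are stressed on the second-to-last syllable; other unmarked words on the last) is a commonly taught simplification and an assumption here, not the full official rule set. The function name `stressed_index` is hypothetical.

```python
import re

STRESS_MARKS = set("áéíóúâêô")   # acute and circumflex mark the stressed syllable
TILDE_VOWELS = set("ãõ")         # tilde marks nasality, not stress
# Assumed simplification: unmarked words with these endings take penultimate stress.
PENULT_ENDINGS = re.compile(r"(?:[aeo]s?|[ae]m|ens)$")


def stressed_index(syllables: list[str]) -> int:
    """Return the 0-based index of the stressed syllable (simplified rules)."""
    if not syllables:
        raise ValueError("need at least one syllable")
    lowered = [s.lower() for s in syllables]
    # 1. A written acute or circumflex accent settles the question.
    for i, syl in enumerate(lowered):
        if STRESS_MARKS & set(syl):
            return i
    # 2. Assumption: with no stress mark, the last tilde-bearing syllable is stressed.
    for i in range(len(lowered) - 1, -1, -1):
        if TILDE_VOWELS & set(lowered[i]):
            return i
    # 3. Otherwise decide from the ending of the word.
    last = len(lowered) - 1
    if PENULT_ENDINGS.search("".join(lowered)):
        return max(last - 1, 0)
    return last


print(stressed_index(["ca", "sa"]))       # casa
print(stressed_index(["a", "ni", "mal"])) # animal
print(stressed_index(["ór", "gão"]))      # órgão
```

Syllabification itself is left to the caller, since splitting Portuguese words into syllables needs rules beyond what is described here.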
When the stress diacritic (acute or circumflex) is present, it also indicates the vowel's quality: namely, "Á", "É", and "Ó" have the so-called "open" sounds, whereas "Â", "Ê", "Ô" have the "closed" sounds. (When the vowels "A", "E", "O" carry no diacritics, their sound may be either open or closed, and this attribute cannot always be deduced from the printed word. Thus, for example, seco can be either an adjective ("dry") or a verb ("I dry"); the "E" is "closed" in the first case, and "open" in the second.) The unmarked vowels "I" and "U" have only one possible sound each, so they may take only the acute accent.
In a few written words, the acute accent is traditionally used even when the letter in question has the "closed" sound: também ("also"), porém ("however"), ninguém ("nobody"). It is also used to distinguish in print the members of certain homophonous word pairs: para ("for", "to") and pára ("it stops"), por ("by", "through") and pôr ("to put").
Tilde
The tilde (til) is used over the vowels "A" and "O" to indicate two additional "nasalized" vowel sounds, which are a characteristic feature of Portuguese among the Romance languages. Unlike the acute and circumflex accents, the tilde does not indicate stress, and indeed a few words carry both a tilde and a stress diacritic, e.g. ímã ("magnet") and órgão ("organ").
Historically, the nasalized vowel sounds derive from vowel + "N" groups in the parent Latin words, e.g. mão ("hand") from Latin mano. The tilde sign originates from the Medieval scribal convention of writing the (contracting) letter "N" over the preceding vowel.
Grave accent
The grave accent diacritic (acento grave) is presently used only over a word-initial "A", to indicate the presence of a contracted preposition a ("to", "for", etc.). This grave-marked contraction occurs with only a handful of words, chiefly the article a and the various forms of the pronoun aquele ("that"). Thus, a ("to") + a ("the") = à ("to the"); a + aquela = àquela ("to that"); and so on. In all these cases the "À" sounds exactly like "Á" in most dialects.
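The contraction can be sketched in Python as a simple lookup. The word list below (the articles a and as, plus the forms of aquele) is an assumption about which words are meant by "a handful"; the function name `contract_with_a` is hypothetical.

```python
# Assumed list of words that fuse with a preceding preposition "a".
CONTRACTING_WORDS = {"a", "as", "aquele", "aquela", "aqueles", "aquelas", "aquilo"}


def contract_with_a(next_word: str) -> str:
    """Join the preposition "a" with the following word, contracting if needed."""
    if next_word.lower() in CONTRACTING_WORDS:
        # The initial "a" of the following word absorbs the preposition.
        grave_a = "À" if next_word[0].isupper() else "à"
        return grave_a + next_word[1:]
    return "a " + next_word


print(contract_with_a("a"))       # à
print(contract_with_a("aquela"))  # àquela
print(contract_with_a("casa"))    # a casa
```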
Until about 1990, the grave accent also replaced the acute accent to indicate the secondary (stem) stress in adverbs formed with the suffix -mente, e.g. hábil ("deft") + -mente = hàbilmente ("deftly"). Circumflex accents on the stem were retained, e.g. sôfrego ("eager") + -mente = sôfregamente ("eagerly"). All the -mente adverbs are now written without any stress diacritic or vowel quality indication, e.g. habilmente, sofregamente.
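A minimal sketch of the current rule for -mente adverbs, assuming Unicode text: the stem loses any grave, acute, or circumflex accent, while tilde and cedilla are kept. The function name `modernize_mente_adverb` is hypothetical, and the sketch does not check that the input really is an adverb.

```python
import unicodedata

# Combining grave, acute, and circumflex accents (the stress/quality marks).
STRESS_COMBINING = {"\u0300", "\u0301", "\u0302"}


def modernize_mente_adverb(word: str) -> str:
    """Strip stress diacritics from the stem of a word ending in -mente."""
    if not word.lower().endswith("mente"):
        return word
    stem, suffix = word[:-5], word[-5:]
    # Decompose so each accent becomes a separate combining character.
    decomposed = unicodedata.normalize("NFD", stem)
    stripped = "".join(ch for ch in decomposed if ch not in STRESS_COMBINING)
    # Recompose so that kept marks (tilde, cedilla) form single characters again.
    return unicodedata.normalize("NFC", stripped) + suffix


print(modernize_mente_adverb("hàbilmente"))    # habilmente
print(modernize_mente_adverb("sôfregamente"))  # sofregamente
```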
Umlaut
The umlaut or diaeresis (trema) may be used only over the U in the combinations gue, gui, que and qui. These are pronounced [ge], [gi], [ke], [ki] when unmarked; with the umlaut — namely, güe, güi, qüe, qüi — the "U" is pronounced, yielding [gwe], [gwi], [kwe], and [kwi]; e.g. agüentar ("to bear") or freqüência ("frequency").
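The following Python sketch locates these combinations in a word and reports, for each one, whether the "U" is pronounced according to the written form. The function name `u_is_pronounced` is hypothetical; accented "E" and "I" are included in the pattern on the assumption that they behave like the plain vowels here (as in freqüência). It only reflects spellings in which the umlaut is written consistently.

```python
import re

# G or Q, then U (with or without umlaut), then E or I (plain or accented).
TRIGRAPH = re.compile(r"([gq])([uü])([eiéêí])", re.IGNORECASE)


def u_is_pronounced(word: str) -> list[tuple[str, bool]]:
    """Return (combination, True if the U carries an umlaut) for each match."""
    return [
        (m.group(0), m.group(2).lower() == "ü")
        for m in TRIGRAPH.finditer(word)
    ]


print(u_is_pronounced("agüentar"))
print(u_is_pronounced("guerra"))
print(u_is_pronounced("freqüência"))
```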
The umlaut is increasingly omitted in Portugal, in which case the correct pronunciation of those trigraphs must be learned word-by-word.
The umlaut does not indicate stress, and indeed a word may contain multiple umlauts — possibly with a tilde, as in argüição ("questioning"), and/or a stress diacritic, as in qüinqüelíngüe ("in five languages", conjectured to be the Portuguese word with most diacritics).
Cedilla
The cedilla (cedilha) is used only under the letter "C", only before "A", "O", or "U", and never at the beginning of a word: poça ("puddle"), moço ("lad"), açúcar ("sugar"). The combination "Ç" always sounds [s] as in "sun", even in contexts where the letter "S" would sound [z]. (Originally the cedilla was a small "Z" or "S" written under the "C".) The "Ç" and "SS" are therefore phonetically equivalent, and only tradition determines which of them is correct in a given word. Indeed, writing one for the other is perhaps the most common kind of spelling error made by native speakers. Incidentally, several homophonic pairs of words are distinguished only by the use of "Ç" or "SS" in writing: paço ("palace") and passo ("step"), ruço ("red-haired") and russo ("Russian"), seção ("section") and sessão ("session"), etc.
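The positional rules for "Ç" can be checked mechanically. The sketch below is a minimal Python illustration; the function name `cedilla_usage_ok` is hypothetical, and accented forms of "A", "O", and "U" are accepted after "Ç" on the assumption that they count as the same vowels (as in açúcar). It cannot tell whether "Ç" or "SS" is the correct spelling of a given word, since that depends on tradition.

```python
# Vowels allowed after "ç": a, o, u and their accented forms (assumed equivalent).
ALLOWED_AFTER_CEDILLA = set("aouáàâãóôõú")


def cedilla_usage_ok(word: str) -> bool:
    """Check that every "ç" is non-initial and followed by A, O, or U."""
    lowered = word.lower()
    for i, ch in enumerate(lowered):
        if ch != "ç":
            continue
        if i == 0:
            return False  # never at the beginning of a word
        if i + 1 >= len(lowered) or lowered[i + 1] not in ALLOWED_AFTER_CEDILLA:
            return False  # must be followed by A, O, or U
    return True


for example in ("poça", "moço", "açúcar", "Çambel"):
    print(example, cedilla_usage_ok(example))
```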
Apostrophe
Although not properly a letter of the alphabet, the apostrophe (') can be part of certain words, almost always to indicate the loss of a vowel in the contraction of a preposition with the next word: de + amigo = d'amigo.
Hyphen
The hyphen (-) is used to make compound words, especially animal names like papagaio-de-rabo-vermelho ("red-tailed parrot"). It is also extensively used to append weak pronouns to the verb, as in quero-o ("I want it"), or even to embed them inside the verb, as in levaria + te + os = levar-tos-ia ("I would take them to you"), where te + os contracts to tos.
Portuguese-language typewriters
Typewriters in Portuguese-speaking countries generally have a separate extra key for "Ç", and a dead key for each diacritic except the cedilla; so that "Á" is obtained by typing first the acute accent, then the letter "A".
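The dead-key mechanism has a close analogue in Unicode, where a base letter followed by a combining mark can be composed into a single accented character. The Python sketch below is only an analogy under that assumption; the `DEAD_KEYS` table and the function name `type_with_dead_key` are hypothetical and do not describe any particular typewriter or keyboard driver.

```python
import unicodedata

# Spacing accent typed first -> Unicode combining mark applied to the next letter.
DEAD_KEYS = {
    "´": "\u0301",  # acute
    "`": "\u0300",  # grave
    "^": "\u0302",  # circumflex
    "~": "\u0303",  # tilde
    "¨": "\u0308",  # umlaut (diaeresis)
}


def type_with_dead_key(dead_key: str, letter: str) -> str:
    """Combine a dead-key accent with the letter typed after it."""
    # NFC normalization merges letter + combining mark into one character
    # whenever a precomposed form exists.
    return unicodedata.normalize("NFC", letter + DEAD_KEYS[dead_key])


print(type_with_dead_key("´", "A"))  # Á
print(type_with_dead_key("~", "o"))  # õ
```

The cedilla has no entry in the table, mirroring the separate "Ç" key.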
Brazilian vs. Portuguese orthography
There are significant and pervasive differences between the spoken languages of Brazil and Portugal, as well as within each country. Indeed, much of the orthographic complexity of the language results from the struggle by the national spelling reform authorities to define a single written language for the whole Lusophonic community. In spite of those efforts, there remain numerous discrepancies between the spelling standards of Brazil and Portugal.
The main difference is a general switch from acute accents in Portugal (sinónimo) to circumflexes in Brazil (sinônimo), reflecting a switch in pronunciation, from "open" to "closed" vowels. Another important difference is that Brazilian spelling often omits a "P" or "C" that comes before another consonant other than "L" or "R", such as ótimo ("optimum", in Brazil) vs. óptimo (Portugal), or fato ("fact") vs. facto. Some of these spelling differences are reflected in the pronunciation of those words.
Status of K, W, Y
The letters "K", "W", and "Y" were heavily used in the Portuguese alphabet until the 1940s, when they were officially removed from the alphabet by a broad spelling reform agreement between Portugal and Brazil. The corresponding phonemes were to be written with C, U or V, and I, respectively.
In practice, however, those three letters are necessarily used for vernacular words which are derived from foreign names, such as keynesiano and newtoniano, which are listed even in the most authoritative Portuguese dictionaries. They are also mandatory for some metric units (watt, henry) and abbreviations thereof ("W", "km") which are legally mandated measurement units in Brazil and Portugal.
Spelling of proper names
In practice, if not by law, the letters "K", "W", and "Y" are also widely accepted for personal names, in all official records and documents. In Brazil, in fact, those three letters are quite popular in made-up first and middle names, such as Waldirci and Deyvide, or in the names of Japanese-Brazilians, such as Satiko and Yojiro. Family names have often retained their pre-1940 spellings; in particular, the final "Y" was retained in many names of native (chiefly Tupi-Guarani) origin, such as Guaracy.
However, the use of diacritics in personal or family names is generally restricted to the letter-diacritic combinations above, and often also by the applicable Portuguese spelling rules. So, for example, a Brazilian birth registrar may accept Niccoló, Schwartz, or Agüeiro; but he is likely to object to Niccolò, Nuñez, Molière, or Gödel, and possibly even to Çambel or Qadi.
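The restriction on letter-diacritic combinations can be sketched as a character check in Python. The allowed set below is built from the diacritics list given earlier plus the unaccented Latin letters (including K, W, and Y); allowing spaces, hyphens, and apostrophes is an added assumption, and the function name `unexpected_characters` is hypothetical. The sketch does not model the further objections based on spelling rules (as with Çambel or Qadi), nor any actual registry procedure.

```python
import string
import unicodedata

# Letter-diacritic combinations listed in the Diacritics section.
ALLOWED_ACCENTED = "ÁÉÍÓÚÂÊÔÃÕÀÜÇ"
ALLOWED = (
    set(string.ascii_letters)
    | set(ALLOWED_ACCENTED)
    | set(ALLOWED_ACCENTED.lower())
    | set(" -'")  # assumption: separators that may occur in names
)


def unexpected_characters(name: str) -> set[str]:
    """Return the characters of a name that fall outside the allowed set."""
    # Normalize first so that letter + combining mark is compared as one character.
    composed = unicodedata.normalize("NFC", name)
    return {ch for ch in composed if ch not in ALLOWED}


for name in ("Niccoló", "Agüeiro", "Niccolò", "Nuñez", "Gödel"):
    print(name, unexpected_characters(name) or "ok")
```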
Portugal is far more restrictive than Brazil in this regard, only accepting names of Roman, Jewish (Biblical) and Arabic origin, taken from a list fixed by law. However, in the wake of increased immigration (especially from Eastern European countries), a regime of exception has been instituted for immigrants. The main reason given by Portuguese authorities to justify these restrictions is that an unusual name may lead to discrimination in school by other children, a thesis that was backed by some psychological studies.