Misplaced Pages

User:K.lee/Archived local PL thingie: Difference between revisions

Article snapshot taken from Wikipedia with creative commons attribution-sharealike license. Give it a read and then ask your questions in the chat. We can research this topic together.
< User:K.lee Browse history interactively← Previous editNext edit →Content deleted Content addedVisualWikitext
Revision as of 22:42, 5 December 2005 edit82.131.14.16 (talk) []← Previous edit Revision as of 10:12, 5 March 2006 edit undoLambiam (talk | contribs)Autopatrolled, Extended confirmed users, Page movers63,480 edits Educational languages: added ABCNext edit →
Line 360: Line 360:


==== Educational languages ==== ==== Educational languages ====
], ], *] ], ], *], ]


<nowiki>*</nowiki> designed partially as an educational language <nowiki>*</nowiki> designed partially as an educational language

Revision as of 10:12, 5 March 2006

Note: This article is intended to someday replace programming language. It is in my namespace for the moment only because it's not yet complete enough to "go live". You may edit this article, if you wish, just as freely as any other. In fact, contributions from other users will most likely be improvements, and may also hasten the day that we can make this page live. See the talk page for the main article for discussion related to the article and this rewrite. k.lee


A programming language is a systematic method to describe computations, usually thought of as a translation of an algorithm into a specific context wherein variables are "bound" to some values from outside the system. In computer science proper, this is the only valid view of programs, according to Edgar Dykstra's The cruelty of really teaching computer science.

readable/literate vs. mathematically exact

Programming languages are most commonly used to describe an intended computation in a form that will both allow it to be performed by an electronic computer but still allow it to be read, understood and modified by a human. Though it is hotly contested how reliable this is - see write-once programming language. A computer program is also often thought of as a form of communication between two human beings, much as any other piece of text, and is subject to similar editorial process. This is the view of Donald Knuth's literate programming and Ward Cunningham creator of the Extreme Programming methods.

In either sense, programming languages are often regarded as the primary bridge between human thinking and machine computation: Programming languages exist because human natural languages are too ambiguous to be well-suited to describing computation in sufficient detail to be performed reliably and deterministically by an electronic computer - accordingly a more exact layer is required to explain the human-to-human communication to the machine. Much as equations, they are comprehensible (barely) to humans who wish to share knowledge about how to solve a problem - accordingly the notation used should reflect mathematics notation and domain expert knowledge of a problem. Most programming languages reflect an explicit tradeoff between these major goals: to be easily understood by the next programmer (to be literate), and to express solutions at least as clearly as do mathematical proofs (to be exact).

There have been attempts to create computer programming languages that can be spoken between human beings in ordinary conversation, with the computer as the bystander, processing implied instructions as a form of sociological byplay. These have failed. Accordingly, there is a continued need to trade off goals in the design or selection of a programming language for any particular task: some more literate, some more exact, depending on the constraints of a problem domain. For instance, the wordy COBOL is often seen as verbose by scientific programmers who find APL appealing, though COBOL programmers find it opaque. This may be because APL is generally used to translate equations already well understood in a scientific domain, while COBOL is used to catalogue exception handling in a very poorly studied business domain where most information comes from conversations between people about phenomena observed only anecdotally.

Regardless of how one views the purpose of programming or the languages, the way the knowledge is expressed is similar in all languages:

Programs

Computations expressed in programming languages are generally called programs, the act of constructing these programs is called programming or, colloquially, coding. The human-readable description of a program written in a programming language is called code. Code that is written in a high-level programming language is usually called source code. The task of transforming the source code into a sequence of operations performed by a computer can be compared to translation between human languages, and indeed the same word, translation, is sometimes used. The difference is that this translation can be automatically performed by the usage of programming languages. There are two basic families of translation, interpretation and compilation, designated according to whether the translation takes place during or before the process of performing the computation.

An interpreter is a program that causes the computer to perform (execute) a computation according to the semantics of the language while taking code in the source language instruction by instruction. A compiler is a program that takes an entire program in the source language and translates it into code in another, lower-level programming language without executing it. An interpreter for the lower-level language can then be used to execute the program. The lowest-level language is a sequence of instructions which can be directly interpreted by the computing machinery itself; this is known as machine language or machine code. Most compilers compile directly into this machine language, so the most common way of executing a piece of source code is to compile it into machine language and then having the hardware interpret it, although this is expected to change in the future as virtual-machine-driven programming languages evolve and gradually obsolete the older compilers.

As descriptions of computations expressed in a programming language are highly detailed and precise, they are frequently rather difficult for other humans to follow, and may even give the original programmer difficulty after some time has elapsed. The literate programming view emphasizes the code as merely the most specific layer of documentation, demanding that the documentation be correct first before any code is written.

Even among those who reject the literate programming view, it is considered good programming practice to add comments in some natural language to the source code. These comments are ignored by the computer, but are often critical to allowing another human to easily understand the code. Whether these comments are "part of the program" is a question perhaps best left to linguistics.

It is not entirely well-defined what methods of computations are regarded as being "programming languages". It is almost universally accepted that in order to be a programming language, it must at least be able to do basic arithmetics, but after this, opinions vary greatly. Some apply standards based on the realm of application of the language, while others apply more theoretical criteria including primitive recursion or Turing completeness. There are no languages that are not Turing-complete in use today: even Microsoft Word macros satisfy this minimal criteria.

Why programming languages?

Like other specialized languages, such as musical notation and mathematical formulae, programming languages facilitate the communication of a specific kind of knowledge---namely, computation, or the task of organizing and manipulating information. For example, a programming language might enable its user to express the following:

Programming languages differ from most other forms of human expression in that they force the author to write instructions with exceeding precision and completeness. In a natural language, or even mathematical notation, authors have considerable freedom to be vague and incomplete. For example, consider natural language:

  • Speakers can leave things out, because humans excel at "filling in the gaps" of a partial statement. If someone says: "Going to the store," the listener might use the context to fill in the missing words: "I am going to the store on the corner."
  • Speakers can make grammatical errors, because humans excel at compensating for minor errors in language. If someone says, "I am going to the store am the corner," the listener can usually tell that "am" is meant to be "on".

But because computers are not yet able to "fill in the gaps", and have only limited ability to "understand" human language, every part of every computation must now be expressed explicitly and exactly in a programming language. Each phrase in a program corresponds unambiguously to its literal meaning, and no more. If the author of a program states that the program should perform an incorrect step, the program's meaning will include that incorrect step. If the author omits a necessary step, the program's meaning will not include that step. Therefore, in order to write a "correct" program, the author must be correct in every detail.

In return for this exacting discipline, programming languages reward the user with a unique power: Many programming languages are executable by an electronic computer. In other words, tasks expressed in most programming languages can be performed autonomously by a computer, without human intervention. Therefore, programming languages have enormous practical utility — they enable the construction of programs that automatically perform tasks. The entire information technology industry is built around the construction and use of programs. A programming language implementation is a system that enables a computer to execute a program written in a programming language. Programming languages can be implemented by an interpreter, a compiler, or some combination of both.

However, the main purpose of a programming language is not always to be executable. In particular, most foundational languages — also called core languages, calculi (singular calculus), or specification languages — are mathematical constructs. The most prominent foundational language is Alonzo Church's lambda calculus. Although such languages can be executed, their principal aim is to allow computer scientists to study the fundamental properties of programming languages; often, language designers apply the lessons learned from formal calculi in the design of practical programming languages. For example, the lambda calculus is the basis of Lisp. See also type theory.

Since the dawn of programming languages, countless languages have been designed from scratch, altered to meet new needs, combined with other languages, and fallen into disuse. Although there have been attempts to make one "universal" computer language that serves all purposes, all of them have failed. The need for diverse computer languages arises from the diversity of contexts in which languages are used:

  • Programs range from tiny scripts written by individual hobbyists to huge systems written by hundreds of programmers.
  • Programmers range in expertise from novices (who need simplicity above all else) to experts (who may be comfortable with considerable complexity in the language).
  • Programs may need to extract the right amount of performance on platforms ranging from tiny microcontrollers to supercomputers.
  • Programs may be written at one time, to reflect some exacting constraints, and then not changed for generations, until these constraints change; This is often the case when programs reflect laws especially in the financial sector.
  • Finally, programmers may simply differ in their tastes or habits: they may be accustomed to discussing problems and expressing them in a particular language. Languages like COBOL and 'C' proved surprisingly persistent despite some deficiencies often observed. Some credit William H. Gates III's devotion to BASIC for keeping that language alive to this day.

Despite the need for different programming languages to solve different kinds of problems, one common trend in the development of programming languages has been to add more ability to solve problems on a higher level. The earliest programming languages were tied very closely to the underlying hardware of the computer. As new programming languages have developed, features have been added that let programmers express ideas that are more removed from simple translation into underlying hardware instructions. Because programmers are less tied to the needs of the computer, their programs can do more computing with less effort from the programmer. This lets them write programs in ways that are closer to the way they think about the problems they are solving. It also lets them write more programs in the same amount of time. And because the programs are written more like how they think about the problems, it is easier for a new programmer to understand a program that was written by somebody else.

The difference between programming languages and computer languages

Programming languages are a subset of computer languages. Generally, a "programming language" is designed to express computation, whereas the term "computer language" can denote any language used by a computer or its users. For example, a markup language is a computer language, but generally markup languages are designed to describe primarily data, not computation.

Elements of programming languages

A programming language's surface form — that is, how programs are represented to a reader---is its syntax. Programming languages vary widely in this surface form. Most programming languages are textual — that is, they consist of sequences of "words" and "punctuation marks", much like written natural languages. However, some exotic programming languages are partly or entirely graphical — such graphical programming languages (or "visual programming languages") use spatial relationships among stylized pictorial representations to communicate their meaning, much like musical notation and Venn diagrams.

Regardless of their surface form, all programming languages also specify a semantics, or "meaning", for each syntactic construct in the language. In textual programming languages, the syntactic constructs are words and phrases; in graphical languages, the "syntax" consists of spatial arrangements of pictures, text, or both. In either case, the essence of a programming language is the meaning that it assigns to the various constructs it provides. However the single most important determiner in a language is its view of binding:

Binding

As algorithms, computer programs all specify variables that are to be bound to some specific sequences of bits at some future point in time. If they didn't have this flexibility, they'd be useless. Binding theory can be seen as the study of when is the best time to make this decision. All programming languages set particular times when binding can occur, and permit variables to exist that are bound earlier or later in the process of writing, interpreting, compiling, linking, loading, running and using the program. The range of options allowed for any given type of variable will determine the power, reliability and comprehensibility of the code.

The section static and dynamic typing below details the most common tradeoff in programming language design, which is whether to treat a given sequence of bits as having a single known type at write-compile-and-link time, or whether that type can be changed as late as load-run-and-use time. If this was an all-or-nothing choice then it would not be necessary to explain binding before syntax, semantics, and data types. But many programming languages require more subtle distinctions:

  • very early programming languages required all non-numerical information to be written directly into the computer program as 'constants' only - binding them at write time
  • the first viable languages for common every day use, including COBOL and FORTRAN, provided static typing for non-numerical information but allowed more dynamic and flexible ways of altering the type of the variables they were designed to handle (text for COBOL, numbers for FORTRAN)
  • later languages distinguished between the 'compile-time' distinctions that let a compiler optimize the program to run on less processor time, and 'run-time' that let a programmer make choices about handling of bit sequences without the benefit of this optimization; the interpretation, link, load and use options were ignored
  • some languages such as LISP specialized in the construction of interpretable sequences, which could be 'EVALuated' within the program itself, invoking compiler functions at the programmer's discretion; these were very powerful but had some reliability problems as these reflexive capabilities were very often badly abused
  • by the 1980s, some languages, optimized for performance, made more subtle distinctions between compiling and linking (permitting a variable to be statically bound when it was discovered by the compiler that no instructions to keep its type flexible existed)
  • also in the 1980s constraint languages were emerging that required statements of assertions, invariants, exceptions and temporal logic statements - which developed later into the more complex alternating-time temporal logic systems.
  • in the 1990s, other languages such as Smalltalk, optimized for flexibility in a network environment, made more subtle distinctions between loading and running (permitting a variable to be bound dynamically until such time as a library was loaded, at which point it might well be possible to statically optimize the runtime code knowing the options required);
  • these and other languages designed specifically for use with CASE tools were much more disciplined about distinctions between compile, link, load and run time.
  • languages specialized for the Internet eventually emerged, including C# and Python, which allowed for dynamic lookup of updates to libraries, the signing and authentication of these, and user choices that could constrain the type usage

The syntax and semantics of programming languages is always constrained by its main binding choices: the periods of time over which the type might be varying, or the bits in the sequence constrained by the type might be varying. Within a given line of code, there may be implied reference to several time horizons. Some languages require this be explicitly stated, others allow subtle statements that can lead to confusion but which also allow for extreme program flexibility.

Syntax

The syntax of a language describes what sequences or arrangements of symbols is legal in the language. The syntax does not describe meaning; that is the job of semantics. Most commonly used languages are textual, so here we describe only the syntax of textual languages.

Programming language syntax is usually defined using a combination of regular expressions (for lexical structure) and Backus-Naur Form (for context-free structure). The use of these devices gives programming language syntax a precise definition founded in formal language theory. For example, a simplified grammar for Pure Lisp is given by the following:

expression ::= atom | list
atom   ::= number | symbol
number ::= +
symbol ::= .*
list   ::= '(' expression* ')'

This indicates that:

  • an expression is an atom or a list;
  • an atom is a number or a symbol;
  • a number is one or more digits;
  • a symbol is a letter followed by zero or more of any characters; and
  • a list is a pair of parentheses, with any number of expressions inside it.

The following are all expressions in this grammar:

12345
()
(a b c232 (1))

Not all syntactically correct programs are semantically correct — in other words, it is possible for a program to have the proper syntax, but mean nothing, or mean something incorrect. This is analogous to the fact that, in natural languages, a sentence may appear grammatically correct but have no meaning, or a false meaning. For example:

  • The following English sentence is grammatically well-formed but means nothing: "The grundlefarb timbles over the slothrop."
  • This English sentence is syntactically well-formed, and has a meaning, but its meaning is incorrect: "The grass on a baseball field is usually purple."

Therefore, it naturally arises that a programming language must have a semantics, or "a set of rules about meaning", as well as its syntax.

Semantics

Although programmers often notice the syntax of a programming language first, the essence of a programming language is its semantics. In fact, it is possible to have a single language with multiple syntaxes — the ambient calculus of Cardelli and Gordon, for example, has both a pictorial and a textual syntax, both exactly equivalent in meaning.

Data and types

Internally, all data in a modern digital computer are stored simply as on-off (binary) states. The data typically represent information in the real world such as names, bank accounts and measurements and so the low-level binary data are organised by programming languages into these high-level concepts.

The particular system by which data are organized in a program is the type system of the programming language; the design and study of type systems is known as type theory. There are two main systems of classification, static/dynamic and weak/strong.

Static and dynamic typing

Static typing is a system whereby all values are defined, at compile-time, as having a single fixed type. For example, in such languages 1 is typically defined as an integer; the value 1 cannot be passed to a function that expects a string, or given to a variable that is defined as holding dates.

Statically typed languages come in two flavors, namely manifestly typed and type-inferred. The former, which is more common, denotes a system whereby the type of every single value is explicitly declared by the programmer; the latter refers to languages where this requirement is relaxed, and, where types are not specified, the compiler works out for itself what type each value should have based on the contexts in which it is used. This makes programs easier to write and modify, but it can also make them harder to read, and it is not trivial to implement. Most mainstream statically-typed languages are currently manifestly typed; examples are C++ and Java. Examples of type-inferring languages are Haskell and ML.

The opposite of static typing is dynamic typing, also referred to as latent typing. In this system, types are assigned when the program is run, rather than when it is compiled; this has many of the benefits of type inferencing, since it is not necessary for the programmer to specify any types, and adds additional features such as the ability to store more than one type in a single variable; the canonical example is a list containing both strings and numbers. The disadvantage of this approach is that type errors are often not detected until a piece of code is actually executed, so it is theoretically possible for a user, trying something that was not tested by the programmer, to crash the program. Unit testing, which if properly implemented ensures that all code is tested, can solve this problem. Popular dynamically-typed languages include Lisp, JavaScript, Python, and Tcl.

Since it is possible to express programs in dynamically typed languages which are not (currently) accepted by any static type-system, advocates of dynamic typing claim their languages are more flexible; advocates of static typing argue that the cases in question are of dubious value in practice, and that the compile-time guarantees provided by static type-checking are of greater benefit. Whether either system is noticeably superior in practice is unclear; it is interesting to note that many statically-typed languages now provide some way of simulating dynamic typing (for example, Java can use object subclassing to create heterogeneous lists), while many dynamically-typed languages provide tools which use various forms of type inference to check the soundness of a program before it is compiled.

Weak and strong typing

A weakly-typed language is one in which it is possible to treat a value of one type as a value of another, for example by treating a string as a number; this can be useful in some circumstances, but it also requires considerable care to avoid disastrous results, and such languages are often referred to as unsafe (whether this is considered a positive or negative label depends on the user). Well-known weakly-typed languages include C, Perl, and most assembly languages.

A strongly-typed language prevents all such uses; attempting to do anything dangerous will raise an error, either at compile-time or runtime, instead of potentially crashing the program. This prevents careless programmers from writing incorrect code, but it also prevents the clever tricks that type coercions make possible, and sometimes decreases performance marginally as well. Strongly-typed languages are often referred to as safe, although that should not be mistaken for meaning that they make errors impossible. This class of language includes Ada, Python, and ML.

Sometimes weak is used as a synonym for dynamic, and strong for static. These usages are frowned on by computer scientists, as they confuse two aspects of languages which are demonstrably unrelated. For example, while ML is both statically and strongly typed, the C type system is static but weak; and while Tcl is both dynamically and weakly typed, Python's type system is dynamic but strong.

Types and data structures

Most languages provide ways to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, and files). Object oriented languages allow the programmer to define new data-types, "Objects", along with the functions to operate upon these new data-types, "Methods", by assembling complex structures along with behaviors specific to those newly defined data structures.

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values).

Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well. Functional languages often require variables to name run-time computed values directly instead of naming memory locations where values may be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values. Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.

Control

Template:Wikibooks Module Once data has been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure. Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, conditionals, loops, backtracking, and functional composition).

Naming, abstraction, and parameterization

The core of the idea of reference is that there must be a method of indirectly designating storage space. The most common method is through named variables. Depending on the language, further indirection may include references that are pointers to other storage space stored in such variables or groups of variables. Similar to this method of naming storage is the method of naming groups of instructions. Most programming language use macro calls, procedure calls or function calls as the statements that use these names. Using symbolic names in this way allows a program to achieve significant flexibility, as well as a high measure of reusability. Indirect references to available programs or predefined data divisions allow many application-oriented languages to integrate typical operations as if the programming language included them as higher level instructions.

Specifying language semantics

There are six ways in which programming language semantics are described; all languages use at least one, and some languages combine more than one:

The first three of these are grounded in mathematics, and have the advantage of being precise, compact, and unambiguous. Programming languages whose semantics are described using one of these methods can reap many benefits. For example:

  • Formal semantics enable mathematical proofs of program correctness;
  • Formal semantics facilitate the design of type systems, and proofs about the soundness of those type systems;
  • Formal semantics establish unambiguous and uniform standards for implementations of the language, making it more likely that programs written in those languages will be portable across implementations.

By contrast, natural language descriptions tend to be imprecise, verbose, and ambiguous. They do not lend themselves to proofs, either about individual programs or about the programming language's type system. On the other hand, it is relatively easy for inexperienced language designers to write a natural-language description of a programming language's semantics. Additionally, formulating a rigorous mathematical semantics of a large, complex, practical programming language is a daunting task even for experienced specialists, and the resulting specification can be difficult to understand except by a small priesthood of experts.

To these objections, advocates of formal semantics reply that if a language is so complicated that a formal semantics cannot be defined for it, then a natural language description is likely to fare no better. Natural language description can always be defined as a supplement to a formal semantics. Formal semantics advocates also point out that the imprecision of natural language as a vehicle for programming language semantics has caused problems in the real world: for example, the semantics of Java threads were specified in English, and it was later discovered that the specification did not provide adequate guidance for implementors. Writing truly portable multithreaded Java programs remains challenging to this day.

Regardless of the relative merits of formal and natural-language semantics, in practice most widely-used languages are specified using natural language description. This description usually takes the form of a reference manual for the language. The manuals for widely used languages usually run in the hundreds of pages. For example, the print version of The Java Language Specification, 3rd Ed. is 596 pages long.

By contrast, The Definition of Standard ML, Revised, which uses operational semantics to describe ML, is 114 pages long. The Revised Report on the Scheme (R5RS) uses denotational semantics to describe Scheme, and is 50 pages long. (These comparisons should be taken with the caveat that Scheme and ML are both arguably simpler languages than Java.)

The fourth means of specifying language semantics is with a reference implementation. In this approach, a single implementation of the programming language is designated as authoritative, and its behavior is held to define the proper behavior of a program written in this language. This approach has several attractive properties. First, it is precise, and requires no human interpretation: disputes as to the meaning of a program can be settled simply by executing the program on this implementation (provided that the implementation behaves deterministically for that program).

On the other hand, this approach also has several drawbacks. Chief among them is that it conflates limitations of the reference implementation with properties of the language. For example, if the reference implementation has a bug, then that bug must be considered to be an authoritative behavior. Another drawback is that programs written in this language are likely to rely on quirks in the reference implementation, hindering portability across different implementations.

Nevertheless, several languages effectively take the reference implementation approach. For example, the Perl interpreter is considered to define the authoritative behavior of Perl programs. In the case of Perl, nobody has ever produced an independent implementation of the language, and the Perl executable itself is highly portable, so some of the drawbacks of using a reference implementation to define the language semantics are moot.

The final way of specifying the meaning of a language is with a test suite. In this approach, the language designer writes a number of example programs in the language, and then describes how those programs ought to behave — perhaps by writing down their correct outputs. The programs, plus their outputs, are called the "test suite" of the language. Any correct language implementation must then produce exactly the correct outputs on the test suite programs.

This technique's chief advantage that it is easy to determine whether a language implementation passes a test suite. The user can simply execute all the programs in the test suite, and compare the outputs to the desired outputs. If the outputs are the same, the language implementation passes. If not, the implementation fails, and therefore must be incorrect. However, when used by itself, the test suite approach has major drawbacks as well. For example, users want to run their own programs, which are not part of the test suite; indeed, a language implementation that could only run the programs in its test suite would be largely useless. But a test suite does not, by itself, describe how the language implementation should behave on any program not in the test suite; determining that behavior requires some extrapolation on the implementor's part, and different implementors may disagree. In addition, it is difficult to use a test suite to test behavior that is intended or allowed to be nondeterministic.

Therefore, in common practice, test suites are used only in combination with one of the other language specification techniques, such as a natural language description or a reference implementation.

History of programming languages

Prehistory: foundations and assembly

The first foundational programming languages predate the modern computer entirely. The most important of these is the lambda calculus, invented in the 1930s by Alonzo Church of Princeton University. Alternatively, the instructions of the Turing machine, developed in the 1930s by Alan Turing (who studied with Church at Princeton), can be viewed as a "foundational assembly language". However, unlike the lambda calculus, Turing's code does not serve well as a foundation for higher-level languages — its principle use is in rigorous analyses of algorithmic complexity. It is well-suited to this task because every operation in a Turing machine takes constant time, whereas the lambda calculus's reduction — its most fundamental operation — can take linear time proportional to the size of the function body.

Like many "firsts" in computer history, the first non-foundational programming language is hard to identify. The designation depends on how much power and human-readability one requires of a form of communication before granting it the status of "programming language". Jacquard looms and Charles Babbage's Difference Engine both had simple, extremely limited languages for describing the actions that these machines should perform. One can even regard the punch holes on a player piano scroll as a limited domain-specific programming language, albeit one not fit for human consumption.

However, these machine languages were not Turing complete — that is, "programs" created in these languages would be fundamentally less powerful than a general-purpose computer, even if they were given infinite memory (see Church-Turing thesis).

In 1945, Konrad Zuse published details of his Planakul programming language. It was Turing complete, and had many of the properties of modern-day languages. However, it was not implemented in his time, despite Zuse being perhaps the first new technology start up entrepreneur, with several successful computers in general use in the post war years. His original contributions were isolated from other develoopments because of Germany being isolated during and for some time after the war.

Other Turing complete programming languages were the assembly languages of the earliest electronic computers. In these languages, each construct corresponds exactly to one instruction in a physical computer. For example, a machine might have an instruction that added the contents of two machine registers, and placed the result in a third register; such an instruction might be represented in assembly language using the following text:

ADD R1, R2, R3

It was soon discovered that programming in assembly language was extraordinarily tedious and error-prone. Individual machine instructions were too "low-level" to construct large programs — doing so was akin to constructing a building by gluing together individual grains of sand. As demand for more complex programs grew, programmers soon realized that higher-level programming languages were necessary.

The 1950s: the founding three, and Algol

In the 1950s, the first three modern programming languages were invented:

Descendants of all three are still in use today, although Lisp (the language name is no longer written with all capitals) has influenced later languages to a much greater extent than the other two.

The next major milestone in computer languages came between 1958 and 1960, when a committee composed jointly of American and European computer scientists held a series of meetings in Europe to design "a new language for algorithms". These meetings culminated in the Algol 60 Report, which described Algol 60, the "ALGOrithmic Language". This report consolidated many ideas then circulating in the languages community, and also featured two key innovations:

  • The use of Backus-Naur Form (BNF) for describing the language's syntax. Nearly all subsequent programming languages have used a variant of BNF to describe the context-free portion of their syntax.
  • The introduction of lexical scoping for names in arbitrarily nested scopes.

Algol 60 was never widely used in North America — partly for political reasons, stemming from IBM's introduction of PL/I, and partly due to the decision not to include I/O within the language definition. However, it represented a major step forward, and was influential in the education of a generation of computer scientists and the design of later languages, including Simula, Pascal, Scheme, and Modula. Additionally, most modern computer science textbooks use an Algol-like notation for pseudocode descriptions of algorithms.

1967-1978: establishing fundamental paradigms

The period from the late 1960s to the late 1970s brought a major flowering of programming languages, comparable to the Cambrian explosion of biology. Most of the major language paradigms now in use were invented in this period:

Each of these languages spawned an entire family of descendants, and most modern languages count at least one of them in their ancestry.

The 1960s and 1970s also saw considerable debate over the merits of "structured programming", which essentially meant programming without the use of GOTO. This debate was closely related to language design: some languages did not include GOTO, which forced structured programming on the programmer. Although the debate raged hotly at the time, nearly all programmers now agree that, even in languages that provide GOTO, it is bad style to use it except in rare circumstances. As a result, later generations of language designers have found the structured programming debate tedious and even bewildering: first, the answer seems obvious in retrospect; and second, there were far greater revolutions afoot in language design at the time.

The 1980s: consolidation, modules, performance

In contrast to the remarkable innovation of the previous decade, the 1980s were years of relative consolidation. C++ combined object-oriented and systems programming. The United States government standardized Ada, a systems programming language intended for use by defense contractors. In Japan and elsewhere, vast sums were spent investigating so-called "fifth generation" languages that incorporated logic programming constructs. The functional languages community moved to standardize ML and Lisp. Rather than inventing new paradigms, all of these movements elaborated upon the ideas invented in the previous decade.

However, one important new trend in language design was an increased focus on programming for large-scale systems through the use of modules, or large-scale organizational units of code. Modula, Ada, and ML all developed notable module systems in the 1980s. Module systems were often wedded to generic programming constructs---generics being, in essence, parameterized modules (see also parametric polymorphism).

Furthermore, although major new paradigms for programming languages did not appear, many researchers expanded on the ideas of prior languages and adapted them to new contexts. For example, the languages of the Argus and Emerald systems adapted object-oriented programming to distributed systems.

The 1980s also brought great advances in programming language implementation. The RISC movement in computer architecture hypothesized that hardware should be designed for compilers rather than for human assembly programmers. Aided by processor speed improvements that enabled increasingly aggressive compilation techniques, the RISC movement sparked greater interest in compilation technology for high-level languages.

Language technology continued along these lines well into the 1990s. However, the adoption of languages has always been driven by the adoption of new computer systems, and in the mid-1990s one of the most important new systems in computer history suddenly exploded in popularity.

The 1990s: the Internet age

The rapid growth of the Internet in the mid-1990's was the next major historic event in programming languages. By opening up a radically new platform for computer systems, the Internet created an opportunity for new languages to be adopted. In particular, the Java programming language rose to popularity because of its early integration with the Netscape Navigator web browser, and various scripting languages achieved widespread use in developing customized applications for web servers. Neither of these developments represented much fundamental novelty in language design—for example, the design of Java was a more conservative version of ideas explored many years earlier in the Smalltalk community—but the widespread adoption of languages that supported features like garbage collection and strong static typing was a major change in programming practice.

Current trends in programming languages

Programming language evolution continues apace, in both industry and research. Some notable current directions include:

  • Mechanisms for adding security and reliability verification to the language: extended static checking, information flow control, static thread safety.
  • Alternative mechanisms for modularity: mixins, delegates, aspects.
  • Component-oriented software development.
  • Increased emphasis on distribution and mobility.
  • Integration with databases, including XML and relational databases.
  • Open Source as a developmental philosophy for languages, including recent languages such as Python, Ruby, the Linux operating system, Squeak (a dialect of Smalltalk), etc..

Programming language implementation

Computers cannot directly execute programs expressed in a programming language. The internal representation used by computers for data and operations consists purely of a long sequence of "on" and "off" signals, or binary code.

TODO: FILL IN

Taxonomies of programming languages

It is difficult to design a single overarching classification scheme for programming languages. Unlike the Darwinian evolution of biological species, any given programming language does not usually have a single ancestor language. More commonly, languages arise by combining the elements of several predecessor languages with new ideas in circulation at the time. Ideas that originate in one language will diffuse throughout a family of related languages, and then leap suddenly across familial gaps to appear in an entirely different family.

The task is further complicated by the fact that languages can be classified along multiple axes. For example, Java is both an object-oriented language (because it encourages object-oriented organization) and a concurrent language (because it contains built-in constructs for running multiple threads in parallel). Python is an object-oriented scripting language.

This section presents two taxonomies of programming languages: a classification by programming paradigm and a classification by intended domain of use.

See also: Curly brace family

Classification by programming paradigm

Procedural languages

Algol, BASIC, C, Fortran, Modula, Pascal

ADT-based languages: CLU?

COBOL?

Object-oriented languages

Simula, Smalltalk, Self

Eiffel, Java, Objective C

Emerald, Argus

CLOS, Dylan, Self, Cecil, BETA

See also: Scandinavian school of object-oriented programming

Functional languages

Lisp, ML, Haskell, APL, Scheme, Erlang, Clean, Miranda PHP is also emerging as a OO language. Though it is generaly a server side scripting language.

Logic and constraint languages

Logical languages: Prolog formulates data and the program evaluation mechanism as a special form of mathematical logic known as Horn logic and a general proving mechanism called logical resolution.

also called declarative languages or purely declarative languages

Prolog, Concurrent Prolog, CLP family

Clips, Jess, OPS-5

Multi-paradigm languages

Procedural languages with an object system grafted on top: Ada, C++, Delphi, Fortran 2003, Modula-3, Oberon, Python

Functional/OO hybrids: Nice, OCaml

Languages with an embedded constraint/logic programming subset: Kaleidoscope

Machine code, assembly, and compiler intermediate forms

LEFT OFF HERE

Machine language is code that is directly executable on a hardware processor, and consists of a chunk of binary data. Most people do not classify machine language as a programming language, although

Its scope is architecture-dependent. It is typically formulated as numbers expressed in octal or hexadecimal. Each group of numbers is associated with particular fundamental operations of the hardware. The activation of specific wires and logic controls the computation of the computer.

Assemblers: Assemblers are almost always directly tied to a machine language. Assembler allows these machine instructions to be written in a form readable by humans. Assembler allows a program to use symbolic addresses which become absolute addresses calculated by the assembler. Most assemblers also allow for macros and symbolic constants as well.


Classification by intended domain of use

Foundational languages (core languages, calculi)

General purpose languages

Ada, C, C++, D, Fortran

Systems languages

Ada, C, C++, D, various forms of assembly language

Scripting or embedded languages

ACS, ASP, ActionScript, AppleScript, Awk, BeanShell (scripting for Java), bash, Brain, CobolScript, csh, Dylan, E, Escapade (server side scripting), Euphoria, Groovy, Guile, ICI, IRC script, JavaScript (ECMAScript), JCL, Jython, ksh, Lua, mIRC script, Miva, Mondrian MUMPS, ObjectRexx, Perl, PHP, Pike, Pliant, Python, QuakeC, REBOL, REXX, Ruby, Scheme, ScriptBasic, sh, Shorthand Language, Simkin, Tcl, TENEX C-Shell (tcsh) , UnrealScript, VBScript, Visual DialogScript, ZZT-oop

Domain-specific languages

An important class of domain-specific languages are the database manipulation languages. SQL

... provide powerful ways of searching and manipulating the relations that have been described as entity relationship tables which map one set of things into other sets.

XML languages XSLT,XPath

PaulFord 08:37, 11 Mar 2004 (UTC): Just wrote Domain-specific language before I looked at this rewrite page; perhaps it could be useful to you.

Hardware Description Languages which are used to model the logic in digital electronic circuits, such as VHDL and Verilog.

Educational languages

BASIC, LOGO, *Pascal, ABC

* designed partially as an educational language

Concurrent and Distributed languages

Ada, Concurrent Pascal (by Per Brinch Hansen), Occam, Pict, SR, Java, Erlang, Joule, CORBA

Related articles

External links

Template:Major programming languages small