Network visualization showing the most important influence for any single programming language. The structure of the tree corresponds to a classification of languages (e.g., language families). We use the method for network analysis proposed in Valverde and Solé (2015) "Punctuated Equilibrium in the Large-Scale Evolution of Programming Languages" Journal of the Royal Society Interface.
Graph rendering uses vis.js and the layout was precomputed with NetlabScript. This website was designed and implemented by Sergi Valverde (2015).
More information at: complex.upf.es/~sergi and @svalver.
This group of languages has been historically associated with Lisp. They display a common trait called the functional paradigm. Functional and logical languages belong to the larger class of declarative languages. Unlike the more common imperative paradigm (with Fortran or C as examples), the declarative approach does not describe the process to control computation. Instead, a program written in a declarative language is an abstract description of the task to be solved. Indeed, the design of Lisp was a radical departure from PLs based on Turing machines. The goal was to find easy ways to write, understand and maintain complex software. On the other hand, programmers find it more difficult to develop user interfaces and certain processes in declarative languages than in imperative languages. Functional languages, although emphasised more in academia than in commercial software development, have been quite influential in the development of AI applications and in modern, multi-paradigm languages.
Logo is an educational programming language designed in 1967 by Daniel G. Bobrow, Wally Feurzeig, Seymour Papert and Cynthia Solomon in order to teach concepts of Lisp programming. Logo is a dialect of Lisp and it is more commonly known for its graphics capabilities, which involve the movement of robotic turtles (early versions of Logo) and virtual turtles on the screen (recent versions). This subfamily of functional languages includes the Logo dialects StarLogo (designed by Mitchel Resnick, Eric Klopfer and others at the MIT Media Lab) and NetLogo (by Uri Wilensky of Northwestern University) as well as other education-related languages, e.g., Scratch (by Mitchel Resnick), Etoys (by Alan Kay) and AgentSheets (University of Colorado).
Forth (designed by Charles H. Moore in the 1970’s) and its dialects colorForth (also designed by Charles H. Moore in the 1990’s) and MUF (Piaw Na, 1990) have been influenced by mathematical notation and provide tools for data processing. Forth is modelled after a stack machine and reverse Polish notation for mathematical expressions (a shared feature with Lisp). Important languages derived from Forth include PostScript (a language for creating vector graphics) and RPL (used in Hewlett-Packard’s handheld calculators). The design of stack-based languages follows a minimalistic philosophy, and they are relatively syntax-free. This trait has influenced the design of a number of esoteric programming languages (e.g., False) that allow very compact implementations.
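The stack-machine model behind reverse Polish notation can be illustrated with a minimal sketch. It is written in Python rather than Forth, and the function name `eval_rpn` and the four-operator table are assumptions made for illustration only. Operands are pushed onto a stack, and each operator pops its two arguments and pushes the result, so no parentheses or precedence rules are needed.

```python
def eval_rpn(expression):
    """Evaluate a space-separated reverse Polish notation expression."""
    # Hypothetical operator table; a real Forth defines far more words.
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split():
        if token in ops:
            # An operator consumes the two topmost values on the stack.
            b = stack.pop()
            a = stack.pop()
            stack.append(ops[token](a, b))
        else:
            # Anything else is treated as a number and pushed.
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]


# (3 + 4) * 2 written in postfix form
print(eval_rpn("3 4 + 2 *"))
```

The whole evaluator is a single loop over tokens, which hints at why stack-based languages allow such compact implementations.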
This cladogram includes the Smalltalk language designed by Alan Kay, Dan Ingalls and Adele Goldberg and its dialects Squeak (designed by Alan Kay, Dan Ingalls and Adele Goldberg), Self (designed by David Ungar and Randall Smith) and NewtonScript (designed by Walter Smith at Apple and heavily influenced by Self). Smalltalk combined the conceptual framework of object-orientation (first described by the object-oriented Simula, see below) with the functional approach of Lisp. Smalltalk, like Ruby (a modern language influenced by Smalltalk), inherited two of its most powerful features (i.e., code blocks and closures) from Lisp. Closures were invented by Peter J. Landin in 1964 and adopted later by Scheme in 1975. Smalltalk has been an important influence in more recent developments of functional languages including Etoys (also designed by Alan Kay and classified in the Logo family, see above) and Lisaac (designed by Benoit Sonntag and influenced by Self).
This subtree includes the two main Lisp dialects, i.e., Scheme and Common Lisp. Scheme was designed by Guy L. Steele and Gerald Jay Sussman in 1975 and it is characterised by its minimalist design philosophy. The spread of many Lisp dialects in the 1970s led to many compatibility problems. Common Lisp was created by the ANSI X3J13 committee in 1984 in order to remedy this situation. The design of Common Lisp combines many features from early Lisp dialects (including Scheme) into a single, feature-rich language. Common Lisp has not been very popular (unlike Scheme), perhaps because of its complexity. Although Common Lisp features have been imported into other languages, it has no descendants in this subtree. Another standardisation effort was EuLisp (1990), from a European consortium of industrial and academic partners.
Ruby was designed and developed in the 1990’s by Yukihiro Matsumoto in Japan. Although Ruby is regarded largely as an imperative language, it is a multi-paradigm language that combines elements of Perl, Smalltalk, Python, Lisp, Dylan, and CLU. All these ancestors have functional traits. For example, one of the most powerful Ruby features is the concept of ‘block’, which is the equivalent of Lisp closures. Dialects of Ruby include Interactive Ruby Shell, Fancy (an object-oriented language influenced by Ruby and Smalltalk) and Reia (discontinued).
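The idea of a closure can be sketched briefly. The example is in Python rather than Ruby or Lisp, and the names `make_counter` and `increment` are hypothetical. A closure is a function that keeps access to the variables of the scope in which it was created, even after that scope has returned.

```python
def make_counter(start=0):
    """Return a function that remembers and updates its own count."""
    count = start

    def increment(step=1):
        # 'count' lives in the enclosing scope and survives between calls.
        nonlocal count
        count += step
        return count

    return increment


counter = make_counter(10)
counter()        # count is now 11
counter(5)       # count is now 16

# Each closure captures its own independent environment.
other = make_counter()
other()          # count is 1, unaffected by 'counter'
```

A Ruby block passed to a method behaves in a comparable way: it can read and modify local variables of the code that defined it.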
The tree rooted in Speedcoding (the ancestor of Fortran) represents the largest group of languages in our dataset. The imperative backbone admits a natural division into two coherent subtrees: many languages in the same subtree display common traits, i.e., structured programming (on the left) or object-oriented programming (on the right). This tree maps important innovations in the history of programming languages, which are associated with (1) the decoupling of software and hardware and (2) the transition from special-purpose languages to general-purpose, high-level languages.
An important subset of functional languages including ML, Clojure, OCaml, Haskell and F# appears here because the imperative ISWIM (an Algol-derived language including many functional elements) was a strong influence. For example, ML is a functional language with a syntax that closely follows the imperative paradigm.
The roots of the personal computing revolution can be traced to time-sharing technology and the associated language JOSS (acronym for JOHNNIAC Open Shop System). In the 1960s, computers were very expensive machines and they had to be shared among many users by means of time-sharing. JOSS, designed by Cliff Shaw at RAND in 1966, was one of the first interactive languages. JOSS influenced other time-sharing languages: TELCOMP (used in a commercial time-sharing service by BBN in 1965), MUMPS (acronym for Massachusetts General Hospital Utility Multi-Programming System, 1966), and FOCAL (a language very similar to JOSS and designed by Richard Merrill in 1968).
This subtree captures the initial explosion of PL diversity. During 1956 and 1957, a programming revolution began. This early period was characterised by an explosion of PL diversity: almost every new computer was accompanied by its own specific language or dialect of a previous language. However, the strong hardware-software coupling means that many of these designs are now extinct because they belong to obsolete computer architectures. Fortran (designed by John Backus in 1957) was the first successful attempt to liberate programmers from the clumsiness of specific hardware details.
Fortran increased programmers’ efficiency by focusing on high-level algorithmic issues. Very soon, many followers imitated (and adapted) Fortran innovations. For example, over 200 high-level languages were developed between 1952 and 1972. In 1958, an international consortium of computer experts recognised the need for unifying this artificial ‘Tower of Babel’ in a universal, standardised approach for communicating software among users and computers. This committee created the language IAL (or International Algebraic Language), which was later renamed Algol (for Algorithmic Language), in an effort to design a language that included key programming innovations. The efforts of this consortium led to several versions of elegant languages, each with more features than its predecessors, i.e., Algol-58 (1958), Algol-60 (1960) and Algol-68 (1968). Although the Algol family was never highly popular among programmers, its innovations were adopted by many imperative languages (this pattern is consistent with their large "offspring").
The innovations introduced by Algol made the complexity of programming manageable, e.g., structured programming allows programmers to decompose any complex task into a number of simpler sub-tasks, which can in turn be decomposed further. These innovations were further extended by the last member of the family released in 1968. However, the high degree of complexity associated with Algol-68 discouraged many programmers. Competition and extinction are common patterns in the history of programming languages: the high complexity of known languages gives way to new, simpler approaches. Programming languages without significant community support tend to experience less development and they have an increased risk of abandonment. For example, the language Pascal (developed by Niklaus Wirth) was a simplified design of Algol-60. Pascal and its dialects Modula/Modula-2 (also created in 1978-1985 by Niklaus Wirth) were very successful and widely used (in particular for teaching programming). Wirth also developed Algol-W, Oberon (an extension to Algol-60), Oberon-2 and Oberon-07. However, like its predecessor Algol, the language Pascal (and its offspring) was replaced by a stronger competitor: the C programming language. In 1972, the emergence of C and Unix (and the downfall of the Pascal family) signalled the transition from early imperative languages to modern imperative languages.
Computer technology has evolved to be easier for people to use. The evolution of modern languages has a strong social component and the success of programming languages is not always related to specific language innovations. Historical accidents, as in biology, have played an important role. An interesting example is the language C, one of the most successful languages in history. Large parts of the Unix operating system were rewritten in C during the 1970s. Although C was simultaneously efficient (the design of C was influenced by Fortran) and expressive (C inherited features of Algol-68), it was not the best design at the time. Much more important was the decision to develop a successful operating system (Unix) with a high-level programming language that was not tied to any specific hardware platform. This made Unix much easier to implement for different computers. Previous operating systems were developed with low-level languages and they were much more difficult to port to other platforms. Ease of portability increased the adoption and diffusion rates of Unix in communities of programmers, who in turn learned its programming environment, i.e., the C language. That is, language and operating system co-evolved and increased their spread and cultural diffusion. Today C is one of the most popular languages and many software developments take place in C or in one of its derivatives.
Although C is a general-purpose, imperative language, many of its descendants are multi-paradigm. The coexistence of several paradigms in the same language enables programmers to pick the most adequate technique in every situation. An example is the language C++ (initially named ‘C with classes’), which combines the imperative nature of its ancestor C with the object-oriented (OO) paradigm of Simula. Moreover, the OO paradigm is a common trait shared by imperative and functional languages (e.g., Smalltalk). The presence of analogous structures in different programming styles suggests that OO is an interesting example of convergent evolution in technology.
Array programming defines high-level operations that apply at once to an entire collection of values (e.g., matrix and vector operations). This enables programmers to express complex calculations in a more concise and elegant way than using the imperative paradigm. Canonical examples of array languages include APL and J. These languages have a natural application in scientific and engineering settings. Examples in this category include: S (a statistical programming language), SAC (numerical applications and efficient array processing), Mathematica and MATLAB. Many other languages have extended their functions with array operations. For example, Perl Data Language (1996) provides a number of array programming extensions to the Perl programming language. Another example is the Hartmann pipeline, a data flow language strongly influenced by APL, which extended pipeline programming (Unix) to allow complex data processing.
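The contrast between the two styles can be sketched as follows. The example is in Python, not in APL or J, and the `Vec` class and the `dot_loop` function are hypothetical helpers written only for this illustration. The imperative version walks over indices one element at a time, while the array-style version applies each operation to whole collections.

```python
class Vec:
    """A tiny vector type whose operators act on all elements at once."""

    def __init__(self, values):
        self.values = list(values)

    def _lift(self, other, op):
        # Apply 'op' elementwise against another Vec or against a scalar.
        if isinstance(other, Vec):
            if len(other.values) != len(self.values):
                raise ValueError("length mismatch")
            return Vec(op(a, b) for a, b in zip(self.values, other.values))
        return Vec(op(a, other) for a in self.values)

    def __add__(self, other):
        return self._lift(other, lambda a, b: a + b)

    def __mul__(self, other):
        return self._lift(other, lambda a, b: a * b)

    def sum(self):
        return sum(self.values)


def dot_loop(xs, ys):
    """Imperative dot product: explicit index and accumulator."""
    total = 0
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


x = Vec([1, 2, 3])
y = Vec([4, 5, 6])

# Array style: multiply whole vectors, then reduce.
array_result = (x * y).sum()
loop_result = dot_loop([1, 2, 3], [4, 5, 6])
```

Both computations give the same number, but the array-style expression states the calculation without any loop bookkeeping.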
These languages have been designed (or have specific traits) for business applications. Flow-matic (originally B-0 or Business Language version 0), developed by Grace Hopper, was the first data processing language designed for business. The focus of business languages is on readability by customers and they use English-like expressions instead of machine-oriented syntax. Flow-matic was an important influence in the design of COBOL (Common Business-Oriented Language, 1959), a standardised language widely used in business, finance and administration. The Gartner Group estimated that, in 1997, about 80 percent of business applications ran on COBOL, amounting to more than 200 billion lines of code, with 5 billion new lines of code created each year.
This huge success of COBOL contrasts with the small number of descendants in the cladogram. Perhaps this phenomenon can be explained by the isolation of the COBOL community from the computer science community. Like other languages designed by consortiums (e.g., PL/I and Algol), COBOL was designed by a group of professionals (i.e., the CODASYL consortium) in an effort to create a common, hardware-independent language that decreased the huge costs of data processing in the business environment. No academic computer scientist participated in the design of COBOL. Computer scientists at that time were much more interested in other domains. Moreover, they perceived COBOL as an example of poor, obsolete design.
External factors other than specific language characteristics (e.g., business orientation) determine the adoption of PLs, like their exploitation by other communities of practice. For example, the English-like syntax of business languages was inherited by the special-purpose language Tutor (also known as PLATO Author Language). Tutor was initially developed by the University of Illinois at Urbana-Champaign to facilitate computer assisted instruction (CAI) and later evolved into a general-purpose language because of the added degree of expressiveness.
Languages in this subtree can be associated with the paradigm of logic programming. This is an important subclass of declarative languages that has its roots in first-order logic. Logic programming does not express algorithms as a collection of procedures or functions but as logical statements. Conceptually, the declarative approach is very different from the imperative approach because the programmer specifies what, not how, she wants to achieve with the computer. Prolog (1972) was one of the first logic languages and remains one of the most popular logic languages to this day. Prolog descendants have been used in many different application domains: Visual Prolog (an object-oriented extension of Prolog), Mercury (a subset of Prolog aimed at developing real-world applications), CLACL (a logic language used in creative fields like music and design) and Datalog (a subset of Prolog used as a query language).
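The "what, not how" idea can be illustrated with a small sketch. It uses Python to imitate a Datalog-style program and is not Prolog itself. The facts, the names in them, and the `ancestors` function are all assumptions made for this example. The program states two logical rules (shown in the docstring), and a generic fixed-point loop derives every consequence of them.

```python
# Facts: parent(X, Y) means X is a parent of Y. Names are illustrative only.
parent = {("alice", "bob"), ("bob", "carol"), ("carol", "dave")}


def ancestors(parent_facts):
    """Derive all ancestor pairs from parent facts.

    The two rules being applied, in Datalog-like notation:
        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    """
    derived = set(parent_facts)  # first rule: every parent is an ancestor
    while True:
        # Second rule: join parent facts with the ancestors derived so far.
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in derived
            if y == y2
        }
        if new <= derived:
            # Nothing new can be derived: a fixed point has been reached.
            return derived
        derived |= new


result = ancestors(parent)
```

In a logic language the evaluation loop is supplied by the language implementation, so the programmer writes only the facts and the rules.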
These languages represent a very different class of languages that cannot be strictly classified as either imperative or functional. Instead of describing an algorithm or a function, definition languages describe data structures. HTML (HyperText Markup Language) is one of the most popular examples in this class because HTML is used to create web pages. IBM’s Generalized Markup Language (GML), which was developed during the early 1970s, is the root of this subtree. Using GML, the different parts (paragraphs, headers, tables, etc.) of text documents are ‘marked up’ using tags. GML is a (so-called) document markup language, i.e., a language that enables the formatting and layout of documents in a way similar to the handwritten annotations added to a manuscript by a copy editor. Unlike GML and LaTeX, SGML (Standard Generalized Markup Language, 1986) is a standardised meta-language that can be used to define the rules followed by an arbitrary markup language. SGML is the basis for two popular markup languages: HTML and XML (1998). HTML and XML are subsets of SGML with different goals. HTML tags are predefined by the HTML language. In SGML and XML, we can use arbitrary tags to describe the relationship between content and structure. This added flexibility means that XML could replace HTML in the future as the markup language for the Web. This trend is consistent with the asymmetry of this subtree: the XML branch appears to be more developed than the HTML branch.