Ericsson

Ericsson AB    
Home
 
· Home
· Product Info
· Licensing
· Consulting
· Training
· Documentation
· Publications
· Contact Info
· Licensees Area

· User Conferences
· Workshops
· Open Source

For comments or questions about this site, contact us.
     

Technical description of
the Erlang programming language

Last revised May 9, 2000

This page is a brief technical description of Erlang, a functional programming language. Erlang related tools, strategic discussions, methods and philosophical aspects are omitted. It focuses on what Erlang is, rather than why.

Contents

1.0 Virtual Machine
2.0 Programming paradigm
3.0 Erlang problem domain
4.0 Concise summary of properties
5.0 Program examples

1.0 Virtual Machine

More and more programs are communicating over the Internet or intranets with TCP/IP as the de facto standard protocol. This has raised the need to be able to execute the same program on many different hardware platforms. This has brought an old idea to new light - virtual machines.

Instead of compiling the source program to machine instructions for the target processor it is compiled to instructions for a virtual machine that does not exist in hardware. The generated object code is executed by a program called an emulator. This is a program that emulates a virtual processor. Examples of this technique are the old P-code Pascal, Smalltalk and the Java machine.

The immediate advantage with this solution is portability. You can transfer the object code to every machine on the network that has an Erlang emulator installed. This makes code updates easy. The drawback is reduced execution speed. A program that is executed by an emulator is slower than the same program executed directly by the target processor. But the performance is not as bad as it sounds. We compile our target language to machine instructions for an ideal machine specific for that language. One instruction for the virtual machine may result in the execution of tens to several hundreds of native instructions in the emulator-reducing the overhead for decoding virtual instructions.
 The speed of a program is no longer as simple as calculating the number of instructions executed. Compiling a high-level language directly to machine instructions for the target processor may result in a huge object code size. This will result in many cache misses. Small object code and a small emulator result in fewer cache misses.

The Erlang BEAM emulator has been proven to give acceptable speed for the kinds of application Erlang has been designed for. There is also continuous work going on for improving the speed of the emulator as well as research activities like compilation to C as well as "just in time" compilation to target machine language.

2.0 Programming paradigm

Functional languages have mathematical properties, which give good control over program execution. Lack of (most) side effects and referential transparency provides well defined behaviour. A pure function must return exactly the same value each time it is called with the same set of parameters. When developing concurrent systems the level of problem complexity increases, especially when distributed systems are developed. Reduced complexity and a better overview are required, and are achieved with this paradigm. Declarative programming concerns what to do, not how to do it.

3.0 Erlang problem domain

The language is general and does not lack any properties confining it to a small set of applications. Erlang/OTP provide a soft real-time kernel. Real time requirements of approximately 5-10 ms give an idea of where Erlang has its use. Heterogeneous environments, different hosts and/or sources to be integrated, resource allocation, database management, client/server applications are examples where Erlang is used.

4.0 Concise summary of properties

A brief technical introduction to the Erlang language is given. These properties are further explained in chapter 5, where program examples are given.

Declarative syntax

The declarative syntax makes programs short. Memory management or pointers are not explicitly controlled by the programmer. Erlang is a dynamically typed language, and data types are not declared by the programmer. Global variables do not exist, and values of variables can not be redefined (single assignment). Erlang is largely free from side effects.

Repeat by recursion

Recursive programs are nice for several reasons. They evaluate in a mathematically pure, well defined way. Correctness is more easily verified (and proven). They are short and are easy to understand by other programmers.

Selection by pattern matching

Choice in Erlang is provided by pattern matching. Erlang has no ordinary variables. If a variable X is matched to a value (4 or {a,b,c} for instance), it will succeed if X is either previously unbound, or previously bound to the same value. Once a variable has been assigned a value, the value can never be changed (single assignment). Tests of partial state are extremely easily made by the programmer. The art of structuring program code into alternative execution threads in a clear way is encouraged.

Concurrency

Concurrency is explicitly handled by the programmer. Erlang has asynchronous message passing with selective message reception. Parallel Erlang processes are dynamically created and monitored using built in functions.

Transparent distribution over a network of processors

Developing distributed programs is probably one of the hardest challenges for system developers. Erlang has facilities for communication between Erlang nodes (= the operating system process that runs the emulator). Communication between Erlang processes (with Erlang messages) it is transparent if they are located on the same Erlang node or not. Each Erlang process gets a unique process identifier that can be used as its address when sending messages. This process identifier can be used locally within an Erlang system as well as remotely to send messages from one Erlang process to another. Erlang/OTP also provides libraries for other external communication. There are for instance primitives for socket communication which make it straightforward to communicate remotely with other sockets (independently of their implementation), from an Erlang/OTP application.

Code loading

Code can be changed in running systems. Two compiled versions of the same Erlang module can reside simultaneously in the system. This allows code to be updated, corrected or completely changed in a controlled way without stopping the application.

Fault-tolerance

Despite developing systems with Erlang, programs with errors will still be written. Syntax errors are detected by the compiler. Logical errors resulting from an unclear or inaccurate implementation of a specification can only be detected by extensive compliancy tests. Other errors come to light as run-time errors. Erlang provides mechanisms for detection and containment of failures. These mechanisms can be used to design robust and fault-tolerant systems.

Real time behaviour

One of the requirements of the Erlang language was that it should be suitable for soft real-time applications where response times are in the order of milliseconds. Real-time behaviour is an implementation-dependent issue, but there are some criteria all implementations satisfy: any process which can be run will be run; no process will be allowed to block the machine for a long time; a process is allowed to run for a short time, called a time slice, before it is re-scheduled to allow another runnable process to be run; processes can be set to run at one of two different priorities. Another important feature for Erlang systems used for real time applications is memory management. Erlang hides all memory management from the user. Memory is "automatically" handled by a real-time garbage collector without blocking the system.

Object-Oriented programming with Erlang

Erlang is object-based rather that object-oriented. This means that Erlang provides data encapsulation and polymorphic functions. One benefit of not having an OO mechanism with classes and inheritance built in to the language is the ability to modify the OO system to suit the application using it. It is easy to write Erlang code in a style which provides a powerful OO system with features like complete encapsulation of objects, interactive modification of classes and inheritance.

5.0 Program examples

These examples should illustrate the properties mentioned in the previous sections and give a view of the language. For a complete description of the language, consult the book "Concurrent Programming in Erlang", ISBN 0-13-285792-8 Prentice Hall.

5.1 Declarative syntax

In Erlang data types are not declared by programmer. There are no pointers and memory is handled by the Erlang runtime system. A comparison between C- and an Erlang code that does exactly the same thing:

struct address {
  char *street;
  int number;
};
struct person {
  struct address *addr;
  char *telno;
};
char *check_malloc(size)
{
  char *s;
  if ((s = (char *) malloc(size)) == 0)
    exit(1);
  return(s);
}

struct address *make_address(street,num)
char *street;
{ 
  struct address *a;
  a = (struct address *)
   check_malloc(strlen(street) + 1);
    strcpy(a->street,street); a->number = num;
  return a;
}
struct person *make_person(addr,tel)
struct address *addr; char *tel;
{
  struct person *p;
  p =
  (struct person *)
   check_malloc(sizeof (struct person));
  p->telno =
   (char*) check_malloc(strlen(tel) + 1);
  p->addr = addr; strcpy(p->telno,tel);
  return p;
}
do_something()
{ 
  struct person *p;
  p = make_person(
   make_address("Big Street",23),"124679");
}

The Erlang way of declaring and allocating memory for that structure corresponding to the C code above would be:

X = {person,
     {address, "Big street", 23},
     {telno, {1,2,4,6,7,9}}}.

Since memory and pointers are handled by the Erlang system, space leaks or pointer errors are impossible. This is a strong argument for declarative programming. Erlang is equipped with a real-time garbage collector that will dynamically free "old" memory.

5.2 Repeat by recursion

-module(mathStuff).
-export([factorial/1, list_length/1]).

factorial(0) -> 1;                                    
factorial(N) when N > 0->
  N * factorial(N-1).

list_length(List) ->
  list_length(List,0).

list_length([El | Rest],N) ->
  list_length(Rest,N+1);
list_length([],N) -> N.

Loops or repetitive behaviour is implemented in Erlang with recursive function calls. In the example above the module mathStuff defines three functions of which two are exported. The first one is factorial/1 (1 denotes arity of the function) which will calculate the factorial of any positive integer given as argument. factorial(4) for example, will evaluate to 4 * factorial(3), which will evaluate to 3 * factorial(2) which will evaluate to 2 * factorial(1) which will evaluate to 1 * factorial(0) which will evaluate to 1 *1. Now we have got the first complete result which is 1, which is result of factorial(0), and we pass the result successively back again and can finally calculate 4*3*2*1*1 which is 24. For each loop the computer stack increased since we have unfinished calculations until the base clause was reached. But the factorial function can be written as a tail recursive function which will make the function evaluate differently (to same result of course). Tail recursion! is described below where second exported function is used as example.

The second exported function list_length/1, takes a list as argument and will calculate its length. This example differs from that above in that this illustrates a tail recursive function. With tail recursive programs Erlang use last-call optimization to let programs be evaluated in constant space (no growing stack). The function list_length/1 calls the function list_length/2 (another function!) with the list and the initiated integer value 0 used to denote the length of the list measured "so far". This parameter is called the accumulating result parameter. In this example all operations can be calculated immediately, and we do not need to wait for the result from new function calls (to same or other functions). So for each element in the list we increase the "length so far"-parameter before calling next (same) function. When the list is finally empty the length-value that we updated during the evaluation is returned. list_length([a,b,c]) will evaluate to list_length([a,b,c],0) which will evaluate to list_length([b,c],1) which will evaluate to list_length([c],2) which will evaluate to list_length([],3) which will evaluate to 3.

5.3 Selection by pattern matching

-module(geometry).
-export([area/1]).

area({square, Side}) ->
  Side * Side;
area({circle, Radius}) ->
  math:pi() * Radius * Radius;
area({triangle, A, B, C}) ->
  S = (A + B + C)/2,
  math:sqrt(S*(S-A)*(S-B)*(S-C));
area(Other) ->
  invalid_object.

The geometry module above has one function divided into four clauses. In this way a function can be written to act differently depending on the arguments passed. New clauses are easily added without modifying old code. This is advantageous for maintenance reasons, when we need to be able to keep control over code dependencies and relations. The last clause where the variable Other will match any passed parameter, will take care of the case when none of the previous clauses matched the pattern of the argument passed. In that case the return value of calling area/1 will be invalid_object. It may be preferred to omit the last clause making the function undefined for other calls than those matched by the three first clauses.

5.4 Concurrency

Concurrent (parallel) activities are controlled with a small set of built in functions. This example shows a program where two new processes are being spawned (created). The result of spawning a new process is a unique process identifier which is used as an address when sending messages to the process. Messages are the way processes communicate. The function start takes an integer as argument. The return value of calling the function start is started. When the two new processes are started a ping-pong action is triggered by the initiating process. The ping-pong messages are sent forth and back as many times as specified by the integer, before the processes terminate. Each time a process receives "start/ping/pong message" it is printed using the io:format/2 function.

-module(t).
-export([start/1,p/1]).
start(N) ->
  Pid1 = spawn(t,p,[N]),
  Pid2 = spawn(t,p,[N]),
  Pid1 ! {start,Pid2},
  started.

p(0) ->
  io:format("~w terminating~n",[self()]);
p(N) ->
  receive
    {start,Pid} ->
      io:format("~w got start~n",[self()]),
      Pid ! {self(),ping},
      p(N);
    {From,pong} ->
      io:format("~w got pong ~w~n",[self(),N]),
      From ! {self(),ping},
      p(N-1);
    {From,ping} ->
      io:format("~w got ping ~w~n",[self(),N]),
      From ! {self(),pong},
      p(N-1)
  end.

5.5 Built in distribution & Error handling

A well-known technique for programming distributed systems is the Remote Procedure Call (RPC). A function (task) is evaluated (performed) transparently on a remote node. To implement this in Erlang we could have a server with a registered process named rpc_server (for instance) running. The complete code for the server is:

-module(server).
-export([start/0,loop/0,reply/4]).

start() ->
  register(rpc_server,spawn(server,loop,[])).

loop() ->
  receive
    {From,{apply,Mod,Fun,Args}} ->
      spawn(server,reply,[From,Mod,Fun,Args]),
      loop();
    _ ->
      loop()
  end.

reply(To,Mod,Fun,Args) ->
  To ! {rpc_server,catch apply(Mod,Fun,Args)}.

The function start/1 spawns a new process to start executing the loop/0 function. The return value of spawning that process is a process identifier which is registered with the name rpc_server. The name rpc_server can then (as well as the pid) be used by other processes to send messages. The process identifier of a process is unique in a distributed Erlang system and can be used in the same way to address either local or remote processes. Registered process names are local for each Erlang node but can be addressed remotely by combining the process name with the name of the Erlang node as shown in the client code below. When the server receives an RPC-message a new process is spawned to take care of the request. This way the server process can immediately receive another request without blocking. The complete code for a client using RPC to server is;

-module(client).
-export([call/4]).

call(Node,Mod,Fun,Args) ->
  monitor_node(Node, true),
  {rpc_server,Node} !
    {self(),{apply,Mod,Fun,Args}}, 
  receive
    {rpc_server,Result} ->
      monitor_node(Node, false),
      Result;
    {nodedown, Node} ->
      monitor_node(Node, false),
      {error,nodedown}
  end.

The client:call/4 takes as arguments: nodename, module, function and arguments it wants evaluated at the server node. A message containing information about sender and task to be performed is sent to server. The client process waits for the server to send a message back containing the result. The node where the server is residing can be monitored in case it goes down before the reply has been sent, or if communication with it cannot be established. It would also be possible to link to the server process and receive an exit message if it terminated.

The code for the RPC above could easily be enhanced to hide where (which node) the server is running, dealing with broadcasts and load-balancing for instance. Erlang System /OTP is ported to several different platforms. A distributed Erlang application is developed as easily for a heterogeneous system as for a homogenous one. So for integrating different computer environments the work is already done where there are Erlang systems available. And since it is normally an easy task to port Erlang systems to new platforms it might be a very powerful tool to develop distributed systems.

Updated: 2000-05-09