MUMPS syntax
MUMPS is a high-performance transaction-processing key–value database with an integrated programming language.
MUMPS allows multiple commands to appear on a line, grouped into procedures (subroutines) in a fashion similar to most structured programming systems. Storing variables in the database (and on other machines on the network) is designed to be simple: it requires no libraries, and the same commands and operators work on data in persistent storage as on variables in RAM.
History
There were several revisions to the MUMPS language standard between 1975 and 1999, but the basic language structure has remained constant. MUMPS was used early on for multi-user and multi-tasking work. Today, a PC running MUMPS can behave much like a large minicomputer of former years. Early versions of MUMPS did not require large memory or disk capacities, and so were practical on smaller machines than some other systems required.
Whitespace
In MUMPS syntax, some spaces are significant; they are not merely whitespace. Spaces are used as explicit separators between different syntax elements. For example, a space (called ls in the formal MUMPS standard) separates a tag on a line from the commands that make up that line. Another example is the single space that separates a command from the arguments of that command. If the argument is empty, the command is considered to be "argumentless". This is therefore a context in which a pair of spaces has a different syntactic significance than a single space: one space separates the command from its (empty) argument, and the second space separates this command from the next command. However, extra spaces may always be added between commands for clarity, because in this context the second and subsequent spaces are not syntactically significant, up to the line length limit of the implementation.
The end-of-line characters are syntactically significant, as they mark the end of line scope for the IF, ELSE, and FOR commands. In contrast to other languages, carriage returns and linefeeds are not the same as white space; they are terminators of a line. Where some languages require semicolons at the end of commands, MUMPS uses the space or line terminator to end the command. While other languages have larger ways of grouping commands, such as statements and blocks, MUMPS does not have these, only the line scope.
Unlike Fortran and some other languages that had fixed-length lines, MUMPS lines have variable length up to the limit of the implementation. There is no explicit way to extend or continue a line.
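The spacing rules above can be illustrated with a minimal sketch. The routine name wsdemo and the variables x and i are made up for this illustration; the commands themselves (NEW, SET, IF, ELSE, FOR, WRITE, QUIT) are standard.

```mumps
wsdemo ; hypothetical routine name, used only for illustration
 new x,i
 set x=2
 if x=1 write "one",!        ; one space separates IF from its argument
 else  write "not one",!     ; ELSE is argumentless, so two spaces follow it
 for i=1:1:3 write i," "     ; FOR controls only the rest of this line
 write !
 quit
```

Because x is 2, the IF line stops at the failed test and the ELSE line writes "not one"; the FOR line then writes 1, 2 and 3 before its line scope ends. Writing ELSE with a single space would make MUMPS try to read the following WRITE as an argument to ELSE, which is a syntax error.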
Routines
A typical M procedure (a "routine" in MUMPS terminology) is analogous to a source file in C (in that the subroutines and functions relevant to a particular task or category are grouped together, for instance) and consists of lines of MUMPS code. Starting a line with a label instead of whitespace creates a tag which can be used as the target of a goto, procedure call or function call (functions return values, procedures do not). The label can be used from outside the parent routine's scope by adding the routine name, separated by a caret character: <label>^<routine> (e.g. SUBRTN^ABC).
A routine file might look like this (for a routine called 'sampleproc'):
sampleproc(z) ; a sample routine
 write "This is a sample procedure",!
 new a,b,c
dosets set a=10,b=20,c=30
 do subproc(b)
 if z set c=a+c+z
 quit c
subproc(y) set a=(a+y)*2 quit
In this case, labels have been attached to the first, fourth, and eighth lines, creating subroutines within the parent routine. The fifth line makes a subroutine call within the same routine, to a subroutine called 'subproc'. It is also possible for any other program to call that subroutine by fully specifying it, as in do subproc^sampleproc(argument). Even though the fourth line appears to be a continuation of the subroutine 'sampleproc()', it can still be called from other routines with do dosets^sampleproc, and execution will continue with the first part of sampleproc() ignored.
Even though sampleproc is defined as needing an argument, dosets is not, so you would not pass any arguments to dosets. MUMPS also allows the programmer to jump to an arbitrary line within a subroutine: do sampleproc+3^sampleproc is equivalent to do dosets^sampleproc. Of course, z would have to be defined before calling dosets, and likewise anyone calling subproc would have to have created 'a' already, as it is referenced but is not declared as an argument.
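As a rough sketch of how another routine might use the example, the following calls sampleproc as a function with the $$ prefix. The routine name caller and the variable result are assumptions made for this illustration, and the sketch assumes the routine above has been saved under the name sampleproc.

```mumps
caller ; hypothetical calling routine, not part of the original example
 new result
 set result=$$sampleproc^sampleproc(5)   ; function call with z=5
 write result,!
 quit
```

Tracing the example by hand with z=5: subproc sets a to (10+20)*2, or 60, and the IF line then sets c to 60+30+5, so the value returned by quit c should be 95. MUMPS evaluates binary operators strictly left to right, which is why the parentheses in subproc matter.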
Variables and datatypes
MUMPS does not require declaration of variables, and is untyped: all variables, including numbers, are effectively strings. Using variables in a numeric context (e.g., addition, subtraction) invokes a well-defined conversion in case the string is not a canonical number, such as "123 Main Street".
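The numeric conversion can be seen in a short sketch; the label numdemo is a made-up name. In a numeric context MUMPS uses the leading numeric portion of a string and treats a string with no leading number as zero.

```mumps
numdemo ; hypothetical routine name, used only for illustration
 write "123 Main Street"+1,!   ; leading "123" is used, so this writes 124
 write +"123 Main Street",!    ; unary plus forces numeric interpretation: 123
 write "Main Street"+1,!       ; no leading number, treated as 0, so this writes 1
 quit
```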
MUMPS has a large set of string manipulation operators, and its hierarchical variable management system extends to both RAM-based and disk-based variables. Disk resident (i.e., database) variables are automatically stored in hierarchical structures. Most implementations use caching, node indexes and name compression to reduce the time and space cost of disk references.
All variables are considered to be sparse arrays. In a MUMPS context, this means that there is no requirement for sequential nodes to exist: A(1), A(99) and A(100) may be used without defining, allocating space for, or using any space for nodes 2 through 98. Indeed, one can even use floating-point numbers and strings (A(1.2), A(3.3), A("foo"), etc.), where the subscript names have some meaning external to the program. The access function $ORDER(A(1.2)) returns the next defined key or subscript value, 3.3 in this example, so the program can readily manage the data. Subscripts are always returned (and usually stored) in sorted order.
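A common way to walk a sparse array is to start $ORDER from the empty string and repeat until it returns the empty string again. The sketch below uses the subscripts mentioned above; the routine name orddemo, the variable sub and the stored values are made up for illustration.

```mumps
orddemo ; hypothetical routine name, used only for illustration
 new A,sub
 set A(1.2)="x",A(3.3)="y",A("foo")="z"
 set sub=""
 ; argumentless FOR and QUIT are each followed by two spaces
 for  set sub=$order(A(sub)) quit:sub=""  write sub,!
 quit
```

Under the standard collation, numeric subscripts sort before string subscripts, so the loop writes 1.2, 3.3 and then foo, regardless of the order in which the nodes were set.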
Given their sorting and naming features, it is not uncommon for subscripts and variable names to be used as data stores themselves, independent of any data stored at their locations. This feature is often used for database indexes, e.g. SET ^INDEX(lastname,firstname,SSNumber)=RecordNum.
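As a minimal sketch of the index idea, the following stores two entries in an index global shaped like the one above and then lists the last names in sorted order. The routine name idxdemo and all names and numbers in it are invented sample data; running it would create and then delete a real ^INDEX global.

```mumps
idxdemo ; hypothetical routine name; all values are made-up sample data
 new last
 set ^INDEX("Dobbs","Bob","000-00-0001")=17
 set ^INDEX("Anderson","Ann","000-00-0002")=42
 set last=""
 for  set last=$order(^INDEX(last)) quit:last=""  write last,!
 kill ^INDEX   ; clean up the sample data
 quit
```

The loop reads only the first-level subscripts, so the last names come back in sorted order (Anderson, then Dobbs) without ever touching the record numbers stored at the leaves.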
Global database
The MUMPS term globals does not refer strictly to unscoped variables, as in the C tradition. MUMPS globals are variables which are automatically and transparently stored on disk and persist beyond program, routine, or process completion. Globals are used exactly like ordinary variables, but with the caret character prefixed to the variable name. Modifying the earlier example as follows:
SET ^A("first_name")="Bob"
SET ^A("last_name")="Dobbs"
results in creation of a new disk record, which is immediately inserted within the file structure of the disk. It is persistent, just as a file persists in most operating systems. Globals are stored in structured data files by MUMPS, and accessed only as MUMPS globals. MUMPS has a long history of cached, journaled, and balanced B-tree key/value disk storage, including transaction control for multiple file transaction 'commit' and 'roll-back' at the language/operating system level. Real-world databases can often grow unpredictably (such as having 20 patients with a last name of 'Anderson' before you get any with surnames starting with 'B'), but modern MUMPS implementations are designed to structure the database efficiently as it grows.
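Reading a global back uses the same syntax as reading a local variable. The sketch below assumes that ^A holds nothing else and that no node ^A("middle_name") was ever set; the routine name gbldemo is made up for illustration.

```mumps
gbldemo ; hypothetical routine name, used only for illustration
 set ^A("first_name")="Bob"
 set ^A("last_name")="Dobbs"
 write ^A("first_name")," ",^A("last_name"),!   ; reads the values back from disk
 write $data(^A("middle_name")),!               ; 0, assuming that node was never set
 kill ^A                                        ; removes ^A and all of its nodes
 quit
```

Because the data lives in the database, the two SET commands could just as well have run in an earlier process, or in a different routine, than the WRITE commands.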
For all of these reasons, one of the most common MUMPS applications is database management. MUMPS provides the classic ACID properties as part of any standard MUMPS implementation. FileMan is an example of a DBMS built with MUMPS. The InterSystems Caché implementation allows dual views of selected data structures—as MUMPS globals, or as SQL data—and has SQL built in (called M/SQL).
Since MUMPS's global variables are stored on disk, they are immediately visible to and modifiable by any other running program once they are created. RAM-based variables, called locals, are only visible inside the currently running process, and their value is lost when the process exits. The scope of local variables is determined by using the 'new' command to declare the variable. Declaration is optional: an undeclared variable is in scope for all routines running in the same process. A declared variable is accessible at the stack level at which it was declared, and remains accessible as long as that stack level exists. This means that a called routine has access to the variables available in its calling routine. Using the 'new' command, a routine can redeclare variables its caller might have created, and thus prevent itself from modifying them. It cannot prevent routines it calls from modifying its own variables, so good MUMPS programming practice is to have every routine 'new' the variables it uses.
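The effect of 'new' on a caller's variables can be sketched as follows. The routine name scope and the labels safe and unsafe are made up for this illustration.

```mumps
scope ; hypothetical routine name, used only for illustration
 new x
 set x=1
 do safe
 write "after safe: ",x,!     ; x is still 1
 do unsafe
 write "after unsafe: ",x,!   ; x is now 99
 quit
safe new x                    ; hides the caller's x until this subroutine quits
 set x=99
 quit
unsafe set x=99               ; no NEW, so this changes the caller's x
 quit
```

When safe quits, its own x is discarded and the caller's value is restored, whereas unsafe silently overwrites the caller's variable. This is the reason for the practice of having every routine 'new' the variables it uses.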
Multi-user, multi-tasking, multi-processor
MUMPS allowed multi-user operation at a time when memory was measured in kilobytes, processor time was scarce, and processors themselves were considerably slower than those found today. Many MUMPS implementations included full support for multi-tasking, multi-user, multi-machine programming even when the host operating system itself did not. For instance, a single PC running MUMPS under MS-DOS and equipped with multiple RS232 ports behaved as a large minicomputer serving multiple ASCII terminals, with proper data sharing and protection.
The following code demonstrates how to alter data on other computers on the network:
SET ^|"DENVER"|A("first_name")="Bob"
SET ^|"DENVER"|A("last_name")="Dobbs"
which gives A a value as before, but this time on the remote machine "DENVER".
Another use of MUMPS in more recent times has been to create object databases. The InterSystems Caché implementation, for instance, includes such features natively.
MUMPS can generate text in HTML or XML format as well, and can be called via the CGI interface to serve web pages directly from the database. It can also be used as a backend for web applications using AJAX background communication.