The Lexis

This chapter describes the lexical elements of Modula-2 source code, including white space, comments, source code directives, and tokens. Definitions of the tokens used in the concrete syntax in chapter ??? are given in appendix ???. The symbols and pervasive identifiers of Modula-2 are also defined in this chapter.

Source Code Structure

Modula-2 source code is a sequence of tokens and separators. Any number of separators may appear between two tokens, before the first token, and after the last token.

Tokens fall into three categories: word tokens, symbols (including operators), and constant literals.

Separators include white space, comments, and source code directives.

NOTE: A token is always the longest sequence of characters that satisfies one of the token definitions. Consequently, a word token must be separated from a following word token, numeric literal, string literal or character number literal. Furthermore, a numeric literal must be separated from a following numeric literal or character number literal.
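
The longest-match rule can be seen in a minimal Python sketch. The two patterns below are simplified, hypothetical stand-ins for the word token and whole number definitions (real numeric literals have more forms); this is an illustration, not the XDS scanner.

```python
import re

# Simplified, hypothetical token patterns: a word token (a letter followed by
# letters, digits or low lines) and a run of decimal digits.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
DIGITS = re.compile(r"[0-9]+")

def scan(text):
    """Split text into tokens, taking the longest match at each position."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1              # white space only separates tokens
            continue
        for pattern in (WORD, DIGITS):
            m = pattern.match(text, pos)
            if m:
                tokens.append(m.group())
                pos = m.end()     # greedy repetition gives the longest match
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return tokens

print(scan("IF x THEN"))   # ['IF', 'x', 'THEN']
print(scan("IFx THEN"))    # ['IFx', 'THEN']: no separator, one identifier
print(scan("12 34"))       # ['12', '34']
print(scan("1234"))        # ['1234']
```

Without a separator, "IFx" is read as a single identifier rather than the keyword IF followed by x, which is why adjacent word tokens must be separated.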

Word Tokens

Identifiers

An identifier is a list of alphanumeric and low line ("_") characters starting with a letter. The ISO Standard permits an implementation-defined set of national alphanumeric characters to be used in identifiers. XDS defines this set as empty.

Identifiers are used to denote predefined and user-defined elements of a program, such as constants, types, variables, procedures, and modules; they are case sensitive.
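
A minimal Python sketch of the identifier rule as stated above; the function name `is_identifier` is hypothetical. Only ASCII letters are accepted, reflecting the empty set of national alphanumeric characters in XDS.

```python
import re

# A letter followed by letters, digits or low lines. Only ASCII letters are
# accepted, matching XDS's empty set of national alphanumeric characters.
IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_identifier(text):
    """Return True if the whole of text is a single identifier."""
    return IDENTIFIER.fullmatch(text) is not None

print(is_identifier("Count_2"))   # True
print(is_identifier("_count"))    # False: must start with a letter
print(is_identifier("2count"))    # False: starts with a digit

# Identifiers are case sensitive, so a table keyed by spelling keeps these
# two names apart.
scope = {"Count": "a constant", "count": "a variable"}
print(len(scope))                 # 2
```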

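pervasive identifier=
"ABS" | "BITSET" | "BOOLEAN" | "CAP" |
"CARDINAL" | "CHAR" | "CHR" | "CMPLX" |
"COMPLEX" | "DEC" | "DISPOSE" | "EXCL" |
"FALSE" | "FLOAT" | "HALT" | "HIGH" |
"IM" | "INC" | "INCL" | "INT" |
"INTEGER" | "INTERRUPTIBLE" | "LENGTH" | "LFLOAT" |
"LONGCOMPLEX" | "LONGREAL" | "MAX" | "MIN" |
"NEW" | "NIL" | "ODD" | "ORD" |
"PROC" | "PROTECTION" | "RE" | "REAL" |
"SIZE" | "TRUE" | "TRUNC" | "UNINTERRUPTIBLE" |
"VAL" ;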

NOTE: Pervasive identifiers are not reserved words; if redeclared in a program, they will no longer have their predefined meaning in the scope of the redeclared identifier.

Keywords

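Keywords are identifiers reserved by the language for special use. Unlike pervasive identifiers, they cannot be redeclared.

keyword=
"AND" | "ARRAY" | "BEGIN" | "BY" |
"CASE" | "CONST" | "DEFINITION" | "DIV" |
"DO" | "ELSE" | "ELSIF" | "END" |
"EXCEPT" | "EXIT" | "EXPORT" | "FINALLY" |
"FOR" | "FORWARD" | "FROM" | "IF" |
"IMPLEMENTATION" | "IMPORT" | "IN" | "LOOP" |
"MOD" | "MODULE" | "NOT" | "OF" |
"OR" | "PACKEDSET" | "POINTER" | "PROCEDURE" |
"QUALIFIED" | "RECORD" | "REM" | "REPEAT" |
"RETRY" | "RETURN" | "SET" | "THEN" |
"TO" | "TYPE" | "UNTIL" | "VAR" |
"WHILE" | "WITH" ;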

NOTE: The keywords AND, DIV, IN, MOD, NOT, OR and REM are operator keywords; the rest are punctuation keywords.
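
A minimal Python sketch of how a scanner might classify a word token once it has been matched. The keyword set is assumed to be the ISO Modula-2 reserved-word list; the operator keywords are exactly those named in the note. The function `classify_word` is hypothetical.

```python
# Assumed ISO Modula-2 reserved words; keyword matching is case sensitive.
KEYWORDS = frozenset("""
AND ARRAY BEGIN BY CASE CONST DEFINITION DIV DO ELSE ELSIF END EXCEPT EXIT
EXPORT FINALLY FOR FORWARD FROM IF IMPLEMENTATION IMPORT IN LOOP MOD MODULE
NOT OF OR PACKEDSET POINTER PROCEDURE QUALIFIED RECORD REM REPEAT RETRY
RETURN SET THEN TO TYPE UNTIL VAR WHILE WITH
""".split())

# Operator keywords, as listed in the note above.
OPERATOR_KEYWORDS = frozenset({"AND", "DIV", "IN", "MOD", "NOT", "OR", "REM"})

def classify_word(word):
    """Classify a word token as an operator keyword, punctuation keyword or identifier."""
    if word in OPERATOR_KEYWORDS:
        return "operator keyword"
    if word in KEYWORDS:
        return "punctuation keyword"
    # Everything else, including pervasive identifiers such as CHR,
    # is an ordinary identifier that may be redeclared.
    return "identifier"

print(classify_word("DIV"))    # operator keyword
print(classify_word("WHILE"))  # punctuation keyword
print(classify_word("end"))    # identifier: keywords are upper case
print(classify_word("CHR"))    # identifier: pervasive, not reserved
```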

Symbols and Operators

symbol= required symbol | symbol with alternatives | operator ;

Required Symbols

":" "," ".." "=" "." ";" "(" ")"

Symbols with alternatives

"[" "(!"

"]" "!)"

"{" "(:"

"}" ":)"

"|" "!"

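Each alternative spelling can be treated as an exact synonym of its primary symbol. A minimal Python sketch, assuming the scanner has already produced tokens as strings; `normalize` is a hypothetical helper.

```python
# Alternative spellings mapped to the primary symbols they stand for.
ALTERNATIVES = {
    "(!": "[",
    "!)": "]",
    "(:": "{",
    ":)": "}",
    "!": "|",
}

def normalize(token):
    """Return the primary form of a symbol; other tokens are returned unchanged."""
    return ALTERNATIVES.get(token, token)

print([normalize(t) for t in ["a", "(!", "i", "!)"]])   # ['a', '[', 'i', ']']
```

Normalizing early lets the rest of a compiler deal with a single spelling of each symbol.
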
Operators

":=" "+" "-" "*" "/" "DIV" "MOD" "REM" "NOT" "~" "OR" "AND" "&"

"=" "<>" "#" "<" ">" "<=" ">=" "IN"

"^" "@"

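Several symbols share a prefix (":" and ":=", "<" and "<=", "." and "..", "(" and "(:"), so the longest-match rule decides between them. A minimal Python sketch covering the non-word symbols from the lists above; word operators such as DIV are word tokens and are left out, and `scan_symbols` is hypothetical.

```python
# Non-word symbols: required symbols, symbols with alternatives, operators.
SYMBOLS = {
    ":", ",", "..", "=", ".", ";", "(", ")",
    "[", "(!", "]", "!)", "{", "(:", "}", ":)", "|", "!",
    ":=", "+", "-", "*", "/", "~", "&",
    "<>", "#", "<", ">", "<=", ">=", "^", "@",
}
MAX_LEN = max(len(s) for s in SYMBOLS)

def scan_symbols(text):
    """Split a string of symbol characters (spaces allowed) by longest match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] == " ":
            pos += 1
            continue
        for length in range(MAX_LEN, 0, -1):   # try the longest candidate first
            candidate = text[pos:pos + length]
            if candidate in SYMBOLS:
                tokens.append(candidate)
                pos += len(candidate)
                break
        else:
            raise ValueError(f"no symbol at position {pos}")
    return tokens

print(scan_symbols(":="))       # [':=']
print(scan_symbols(": ="))      # [':', '=']
print(scan_symbols("(:..:)"))   # ['(:', '..', ':)']
```

A complete scanner would check for the comment opener "(*" and the directive opener "<*" before trying this table; with the table alone, "(*" would be split into "(" and "*".
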
Constant Literals

Whole Number Literals

There are three kinds of whole number literals: decimal, octal, and hexadecimal.

A decimal number is a list of digits (0-9).

An octal number is a list of octal digits (0-7), followed by a "B".

A hexadecimal number is a list of hexadecimal digits (0-9, A-F), followed by an "H". It must start with a decimal digit, so that it cannot be mistaken for an identifier.
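
A minimal Python sketch that computes the value of a whole number literal from the three forms described above; `whole_number_value` is hypothetical. As in the text, only upper-case A-F are accepted as hexadecimal digits.

```python
import re

def whole_number_value(literal):
    """Return the value of a decimal, octal (B) or hexadecimal (H) literal."""
    if re.fullmatch(r"[0-9]+", literal):
        return int(literal, 10)
    if re.fullmatch(r"[0-7]+B", literal):
        return int(literal[:-1], 8)
    if re.fullmatch(r"[0-9][0-9A-F]*H", literal):
        return int(literal[:-1], 16)
    raise ValueError(f"not a whole number literal: {literal!r}")

print(whole_number_value("255"))    # 255
print(whole_number_value("377B"))   # 255
print(whole_number_value("0FFH"))   # 255
```

"FFH" is rejected here because it starts with a letter; a scanner reads it as an identifier, hence the leading "0" in "0FFH".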

Real Literals

A real literal is a list of digits, followed by a ".", optionally followed by a list of digits and/or a scale factor. A scale factor is an "E", followed by an optional sign and a list of digits.
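
The definition maps directly onto a regular expression. A minimal Python sketch; `is_real_literal` is hypothetical.

```python
import re

# digits "." [digits] [scale factor], where scale factor = "E" [sign] digits
REAL = re.compile(r"[0-9]+\.[0-9]*(E[+-]?[0-9]+)?")

def is_real_literal(text):
    """Return True if the whole of text is a real literal."""
    return REAL.fullmatch(text) is not None

print(is_real_literal("1."))       # True: digits after "." are optional
print(is_real_literal("6.02E23"))  # True
print(is_real_literal(".5"))       # False: must start with a digit
print(is_real_literal("1E5"))      # False: the "." is required
```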

String Literals

There are two types of string literals: quoted strings and character number literals.

A quoted string is a sequence of characters enclosed in either single ("'") or double ('"') quotes. It cannot contain its own delimiting quote character, so a single quoted string cannot contain both single and double quotes. A quoted string cannot cross a line boundary.

A character number literal is a list of octal digits representing its value, followed by a "C".
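
A minimal Python sketch of both kinds of string literal; the helper names are hypothetical, and "\n" and "\r" are assumed to be the line separator characters.

```python
OCTAL_DIGITS = "01234567"

def char_number_value(literal):
    """Return the character denoted by a character number literal such as 101C."""
    digits, suffix = literal[:-1], literal[-1:]
    if suffix != "C" or not digits or any(d not in OCTAL_DIGITS for d in digits):
        raise ValueError(f"not a character number literal: {literal!r}")
    return chr(int(digits, 8))

def quoted_string_value(literal):
    """Return the characters between the quotes of a quoted string."""
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"not a quoted string: {literal!r}")
    body = literal[1:-1]
    # The delimiter cannot appear inside, and the string cannot cross a line.
    if literal[0] in body or "\n" in body or "\r" in body:
        raise ValueError(f"invalid quoted string: {literal!r}")
    return body

print(char_number_value("101C"))            # A  (octal 101 = 65)
print(quoted_string_value('"it\'s"'))       # it's
print(quoted_string_value("'say \"hi\"'"))  # say "hi"
```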

Separators

Separators are white space, comments, and source code directives. Separators may appear anywhere in the source code except inside a token.

White space includes the space character, line separators, and implementation-defined character sequences. XDS recognizes the tab character as white space.

A comment starts with "(*" and ends with "*)", and may contain arbitrary characters. Comments may be nested.

NOTE: Enclosing program text in comment delimiters may produce a different program from the one intended (and probably an incorrect one) if that text contains quoted strings with comment delimiters in them.
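
Both rules can be shown with a minimal Python sketch of a nested comment skipper; `skip_comment` is hypothetical. Because comments may contain arbitrary characters, quotes inside a comment get no special treatment, which is exactly what causes the problem described in the note.

```python
def skip_comment(text, pos):
    """Return the index just past the (possibly nested) comment at text[pos]."""
    if not text.startswith("(*", pos):
        raise ValueError(f"no comment at position {pos}")
    depth = 0
    i = pos
    while i < len(text):
        if text.startswith("(*", i):
            depth += 1            # an inner comment opens
            i += 2
        elif text.startswith("*)", i):
            depth -= 1            # the innermost open comment closes
            i += 2
            if depth == 0:
                return i
        else:
            i += 1                # any other character, quotes included
    raise ValueError("unterminated comment")

text = "(* outer (* inner *) still outer *) x"
print(repr(text[skip_comment(text, 0):]))            # ' x'

# Commenting out a statement whose string contains "*)":
commented = '(* s := "*)"; *)'
print(repr(commented[skip_comment(commented, 0):]))  # '"; *)'
```

In the second case the comment ends inside the string, leaving `"; *)` as program text, which begins an unterminated string.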

A source code directive starts with "<*" and ends with "*>". The ISO standard does not define the syntax of directives.