Section 2: Program Dooumentation, 280: Introduction. The computations involved in the running of the udo can be divided into several seotionsi In this implementation there is a strong notion of processing and inter~prooess storesr These alternate along the flow of control of the programi This is shown in Diagram l. Diagram 1: Udo Control FlDW§ Thus is can be seen that every input command creates no new information to be stored, the input goes through several processes and winds up being stored either in the output file or in the variable table for interrogations and assignments respectively. The input information to a process. the description of the process and the output information to be stored are given in the subsequent sections, though for more detailed descriptions of the mechanics of the processes involved, reference should be made to the pseudocode for these procedures [4}. 2.1: Preliminary Treatment. The first process is to read a single command from the entry from the input file, whether this be a stored file or the standard input file. In doing this, certain irrelevant chars are filtered out ~ tabs and spaces and certain instruction chars — semicolons and colons are acted upon and then filtered out. remaining commands inputted in the same entry are left in the input buffer or input file depending upon from where input is being sent. Senicolons mark the end of a single command within a line of several commands so this process terminates, colons result in subsequent input being taken from the next line. e.g. From the input X = : 7 7 7;y=656 only the single unspaced command ’x=777‘ is passed on, on first execution of this procedure, on second execution 'y=666’ is passed. Each char read is either registed as a passable symbol or an unpassable one, it is the passable chars that are transferred one at a time to the command storage array. After which this process terminates. 2.2: Lexical Analysis. 2.2.0: Introduction. This process lifts the symbols stored in the command storage array one at a time, performs a lexical analysis, converting the command into a representation more easily manipulated by the subsequent parsing process and then stores this representation in the token storage array. 2.2.1: What is Lexical Analysis ? To understand the concept of lexical analysis we need some preliminary definitions. Every input symbol can be regarded as a token or as part of a token. Every token is an ordered pair consisting of a lexical value ~ that is what type of symbol the token represents and a possibly redundant semantic value. Table 4 gives the lexical values and semantic definitions of the possible tokens catered for in this program. Table 4: Token Definitions. symbol lexical value semantic definition ? QM O = ASSIG O ( LPAREN 0 ) RPAREN O { LEE 0 } ass 0 [a~z] VAR [A~Z]( VEC [O~9]* NUM @ UNDEF 0 if IF 0 then THEN 0 else ELSE 0 /\ MAX 0 \/ MIN 0 “ TTPO 0 % MQD O MULT O DIV O PLUS 0 ~ if preceding token‘s lexical value is NUM, VAR? UNDEF, RBR or RPAREN; MINUS 0 otherwise: UNMTNUS < LT <= LE 2: EQ >= GE > GT & AND € ¥ +\X° OR NOT EOC ERR CDCDCDCZDCDCIDCDCZDCDCIDCD erg. Consider as an example the following input command E{3) = l23 + (a/\b) The lexical and semantic token response to this would be In addition to these tokens, which result from lexical analysis of the input, several other tokens are used by the program» These help in parsing [293] the input command. These are not derived by the lexical analysis procedure but as tokens, they are relevant to this section. Table 5 shows what these additional tokens are. Table 5: Parsewconstructed Tokensr lexical Value semantic definition purpose OPBASE O operator to sit at bottom of operator stack, that has low enough precedence as to prevent total reduction of expression stack” BUENDSYM O operator to be placed on top of operator stack, that has low enough precedence to force reduction of expression stack using up all operators on operator stack except the OPBASE operator» IFACTIQNS O This is a totally redundant token both for the user and the running of the programi It serves only to label a previously unlabeled node in the parse tree constructed by the parsing procedure, maintaining the aesthetics of the parse tree. For more details on these additional tokens see the section on parse trees {2.3.2]. 2.2.2: Construction of the State Table. The lexical analysis state table serves to show which stage of parsing a stream of characters, to detect a token, has been reached. The state table is the combination of several smaller state tables each of which parses only one token. e.g. Let us consider the finite automaton that is needed to parse the characters that constitute the else token. It needs to move towards a recognition state as it encounters the letters ‘e’, '1', “s” and ’e‘ consecutively. But if at any stage we encounter an unexpected symbol in the input then we must move into a rejection state and remain there whatever the subsequent input. The automaton we need has instructions as shown in Diagram 2. Diagram 2: Finite Automaton to Parse the Token ‘else’ . ,. § *; W ff} 5 % étv cu-9a,. as 4;”; W 1:: it‘ i «-"W ; . ‘”/y at “i k e; M fie *5 Vs‘ I _y\ 45' *4 3 vi 5 E F: gr» Q ‘'41 w t i “K 3 ,-W W‘ 9 -"‘“W W p 4,‘ __. A“; g J) 3,, 1:5 xxx ‘it 9 i ... cw eh,» is its up on“; » a ma .v‘~<»..,, {M éé ‘-\\ E 1 13 £0 3 \N ‘K ‘\ an ‘xx x R \. at . H W\\*« \ ‘Vt g flé; «N, KN K,‘ V‘ VVL . Vs,‘ «xx 3‘. aezifi. “ E ‘KIN.’ “V K’ ‘\“2< \ .».\" v(;‘\‘;~ \“ K‘ Ray XE «.. K‘ W“. ‘s. . ’fiV“*s R“-. Xx \.\ "V-l‘ ‘~\_ ~.. 2 (M. ‘g ‘. ‘xx 1 ’‘‘«*e.% g is’; r ‘K M 3 5: ' '—""3'f* "“ iii 5‘ ’t i J ‘: fa“ ‘iv ‘an fie-“-<'.+ 5» '9». ‘*1. This can also be represented in the table below. The table shows the new state entered depending on the current state and the next character read off the input store. Table 6: State Transition Table To Parse The Token ‘else’ Current Input Characters State e l s others l O ~l -2 O O O ~2 O -3 O O ~3 O O -4 0 m4 l O O O In this table. 0 is the start state, l is the rejection or recognise error token state, and 2 is the recognise ‘else’ state. Having devised mini~lexical analysers for each token. these are then combined making sure that each token recognition state is unique. In the final lexical analysis state transition table [4.3.4] negative states indicate partially parsed tokens, 0 represents the general error state and positive states represent successfully parsed tokens. The possible states range from ~6 to +32 and the possible chars range over all ascii characters. 2.3: Parsing 2.3.0: Introduction. Parsing is the last stage in the udc’s manipulation of the input file. It takes tokens one at a time out of the token storage array. performs a parse and either passes out the result of an interrogation, records an assignment in the variable table or alerts the user of an error in the input sequence. 2.3.1: What is a Parser ? Parsing is the combination of two things. The first of these is in some sense a boolean function of an input command, returning the syntactically directed translation of the input command if it is syntactically correct on the level of the BNF specification for a command [4.l.2] and above this level. checks that the execution of this command will not cause errors such as division by zero or infinite loops» The second part of parsing is the inclusion of an appropriate instruction set, for execution during the first check. erg. Consider a typical command entry ? { 23*a + b } This, to anyone familiar with the udc operator set, obviously constitutes a legal command» Thus it must be accepted as such by the parser. Apart from registering it as a legal command, the parser must also perform the following calculation; what is the value of the function defined by the expression '23*a + b‘ An expression is considered legal if a parse tree can be constructed for its 2.332: What is a Parse Tree ? A parse tree is the program’s internal representation of an expressionr An expression is represented as a n~ary graphr The value of n depends upon the application but for most it is 2. Each node on the graph represents a token, Those nodes that are leaves denote operands, those nodes that are internal represent operators. For most of the operators this binary tree representation is fine since most of the operators are binary, unary operators result in the binary tree being incomplete at that node. Ternary operators can be represented, but their representation is not elegant. Thus any token can be represented in a tree node that carries three pieces of information; the lexical value of the token held at that node, the semantic value of the token held at that node and thirdly, the token’s relationship to other tokens within the expression. This final piece of information is held in two fields of a node’s store, each of which points to sub-expressions that the token acts on. Thus each node carries four pieces of information, two of which ~ the lexical and semantic values will be the same as given in Table 4 above. These fields are labeled °lex’ and ‘sea’ respectively in the node's information store. The remaining pieces of information are denoted ’lch’ and ‘rch’ for the first and second subexpressions of the token, The particular form that these take are given in the table below where the tokens are divided into four main classes. Table 7: Token Representation in Tree Nodes; Token Type lch rch Operand ~ ~ Unary operator pointer to operand ~ Binary operator pointer to lst operand pointer to 2nd operand if~then-else _ pointer to condition pointer to IFACTIONS node These constructs are best explained by diagrams of the treesr Diagram 3a: Token Representation In Tree Nodes: Unary Operatorsr m»~o~«« yw Jgfiu» 3 5 « 3 6‘ mi 10% 2"”! i J aw} vasagk-. W5‘ 51 :"§ ‘Kr «. in to «-‘ ?‘ rik E" .....4»..., ‘sew-veg -My “N pa in of i " Rani’ 2,, i‘; E . .- Q? ‘;'®n? 1;!»-u§ (I, I: ,,., _, 9.‘?- “!—~ §».~"’ &»$..« '3‘ 32:‘, X 2; " .»"‘:;: ‘ »"""~7 1"",5"7” a""M,,_} «r",_',, 1.?“ ;?‘V",’~ er‘ * ‘a. "~’,.»°‘t:?...s r 1;. 5 . Diagram 3b: Token Representation In Tree Nodes: Binary Operators. 5 7 . 3 i ; a m g 3 aware- .% - z 5 W : s ,, a s. ;«-3. -' M‘ cs». rm: ,, we 54“ 3 ”"““ W" * ‘“-'<~w §‘it3£¢’W"z.é :‘C:.~ ‘V 1’ ?v’.7”l1}i«£:»~$~,‘3 ‘R: .z ..~ ,3... £1’ ‘’ ’”"“‘ § ’ ; v ..mw..=s,s.m.r,.s....,r.,..r...._, “gm M; ~ r" ._ ""“”“-" ~ '*'-—x::<\Asa=v S-isw-an-vs wum rs». -4 was m..z........ ,-“K X Z5 1 ‘M E: e um 3; A at-‘ 3;». r 3 ’ a}»w»=~'\ : m .., 1 . K /v . .33 ‘& 5 l"~"‘§"‘ *4. 3‘ Ear ‘ ‘,-ex‘ 4-»? J av. if 1% 3,, «#21 »»-v~»,,,=~' J w saw “$25 1 , f....’‘‘{-~- 3! {I 3“?- ‘ 2” w’‘~=»~ F 5 at iffi E E ‘ea * ”WJr”fii r ' ....—*' ' Wm. ‘ f «"-;“’ r’“‘°~ r" ' t = i tear 5' zsfg g: ‘E fig: r R: -- ., _ " ,_ _..* *3 ; M . 9'49 9"): . J’ WW"? ‘1',_,.€ ,.1.}u‘ fii,‘ 9? 5%: 5“: '_ _ I aw ii! ;f,._V _,» M K,“ ,\ j, 13‘: 5:4». §fi5f’3%'?’W is; ‘ W? %2\;a*§f;;4'i;ir "”‘ in 5”‘; =“” E“-v‘i~>t 5 g ;r{::; 5 iii’? as we ‘ 5 Diagram 3d: Token Representation In Tree Nodes: Operandsr {E 14 9: an if I g E 3,6 “P V :2. . % é ,s>‘?’V, E: g _. we-.<»...». . ’ -- 1”‘; A U6 5., Vi L!'§, V 3 .» "' r J 4 f I .. .. am.‘ <’"”‘'.’''‘“‘ 79%?’ if ,5 E gm’ fm, ..r~"":§ ,4‘ ,2‘; ’.r' ‘ ‘ ,3 e 5 ,/6 S E ff‘, E x .2 I r ' ; J) There are a few other tokens that must be catered for in an expression. These are parentheses and braces, Parentheses are not actually stored in the parsing tree but instead when they are encountered then the semantic value of the operator at the expression that the parentheses act on is used as a 1» parentheses flag; the semantic value of any operator is normally 0 when the flag is set it takes on the value l. Braces are not represented at all in the tree. A braced expression is evaluated [4s3aL2]o e.g. Consider the following expressions and their appropriate tree representations. Diagram 4a: Binary Tree Representation of —a. Diagram 4b: Binary Tree Renresentstion of ~3*b+e. W? 5 E » X f u E; \ :5 «. .£@ 33 ,5 Kl» of E f 2, fa WW" XL?» 5‘ £3’ if Diagram QC; Binary Tree Representation of if C220 then egb else E(%bI£a). ‘M W «eflyfiw Wm X“ e E f if r W § E {or tel are g 9 R f .« g ‘xx zs 2% K 5 E § 5% $3 Q 5% E 5 *2 E ,1 we %% These parse trees can be constructed by several methods. The most common of these are topdown parsing and bottom up parsing. The choice of parsing technique depends on the application or more particularly, the style of the BNF for the input. 2.3.3: Top Down Parsing Methods. The basic idea of top down parsing is to construct the parse tree starting at the root and work down attaching nodes derived from the input to the existing nodes of the tree until the input is exhausted and the parse tree complete. Symbols on the input stream are matched to the possibilities defined by the grammar. when only one rules defines the form of the input then as much as possible of the input is matched and the process continues. When more than one rule is found that matches the current part of the input then these are all taken up and all but one are subsequently discarded upon realisation of their inappropriateness. The sentence definitions involved in the top down parsing process of the udc are determined by the following rules. E represents an expression which will force a bottom up parse of subsequent input when the previous symbol has been passed. Table 8: Rules for the First Stage Parse. S 3:: Q?’ E { '2‘ E Table 9: Rules for the Top Down Parsing Procedure. S ::= E ')’ is(sEe)e ie{eEo:}a I ‘if’ E ‘then’ B ‘else’ E In both tables it can be seen that the first symbol in the input determines which rule is followed since no two rules have the same first symbols on their right hand sides. 2.3.4: Bottom Up Parsing Methods. The basic idea of bottom up parsing is the exact reverse of top down parsing. it consists of constructing the parse tree starting at the leaves and working up combining roots of subtrees to make larger trees until there is only one tree - the completed parse tree. A particular type of bottom up parsing is that designed for operator precedence grammars. In this an expression is considered as a hierarchy of subexpressions. The hierarchical position of each subexpression is determined by the precedence of the operator that binds the subexpressions at that level. This method is akin to saturating an expression with brackets until the order of evaluation is determined by the order of the brackets, operators and operands. rather than their relative precedences. though the precedences are used as rules by which the bracketing is determined. The rules for an operator precedence grammar are much simpler than any other type of parser since only two operations are possible. If the symbol on the input is an operand ( possibly derived from a top down parse [4.3.l6] ) then it placed on an expression stack, if the symbol is an operator then it is placed on the operator stack having first successively combined expressions on the expression stack with operators from the top of the operator stack if they are of higher precedence than the operator on the input. A bottom up parse is completed by the reduction of the expression stack successively using the top operator on the operator stack until there are no operators on the operator stack and only one expression on the expression stack ~ this being the completed parsed expression. 2.3.5; Hybridisation. Because of the inclusion in the udc command set of some operators and commands that are best parsed by top down methods and some that are best parsed by top down methods, this implementation, instead of bending the rules of top down parsing to allow for those features which are parsed more easily by bottom up methods or bending the rules of bottom up parsing to allow for those features which are parsed more easily by top down methods, uses both methods, flipping between the two depending on what the characteristics of the next token on input is. The parser classifies all tokens into classes given in the following table. Depending on which class an encountered token falls into, the program decides either to maintain the current parsing technique or switch to the other technique, calling it as a subroutine. The default parsing technique, that is the one that any expression is first parsed by once the nature of the command is determined, is the bottom up parser. Table l0: Parser Switching Token Types, Token class Elements Action TD Starters VEC, LPAREN, LBR, IF tart top down child parsing process. BU Enders RPAREN, EBB, THEE, Return to parent parsing process. EDC, ELSE BU Middlers NOT, TTPO, MOD, MULT, Stay in bottom up parser ~ no switch, DIV, PLUS, LT, LE, EQ, GT, MAX, MIN, GE, AND, OR, UNMINUS, MINUS, VAR, NUM, UNDEF Inappropriate ASSIG, QM, ERR Call error procedure. e.g. Consider the following expression and how it would be parsed. ?{ if a>O then 12 * (b+c) else A(O) } The parsing sequence for this command would involve the nesting of several parsing processes using both parsing procedures. The sequence in which these procedures would be called and returned from can be represented by the diagram below, Diagram 5: Parsing Calls For ‘?{ if a>O then 12 * (b+c) else A(O) }’. 2.3.6: Parsing Instructions. Other than the tree construction there are several other operations to be done. These together with explanations of what is involved in the process are given below. a) Evaluation [4.3.l2]. This is the reduction of a section of the parse tree to a single cell containing either the lexical value UNDEF for the undefined value or the lexical value NUM and the semantic value defined by the number. The way that an expression or subexpression is evaluated from the parse tree is by recursively evaluating the subtrees of this subtree. If a subtree is undefined then the parent is also undefined, and if both subtrees are defined numerically then the parent is defined as the combination of these numbers using the operator at the parent node. If a particular node is a variable then it is replaced by the expression tree that the variable represents as defined in the variable table. When the parse has finished and the root to an expression subtree is returned to the parse prooedure [4.3.l9] then either an interrogation or an assignment is made. b) Interrogation or Tree Traversal [4.3.lO}. This is the printing of the expression represented by a tree. There are three ways in which a tree could have been printed ~ postfix. infix, prefix. Each of these involves printing the symbol at a node and recursively doing the same on the left and right subtree should they exist. The difference between these three printing formats comes about from the order in which the tree is traversed. The orders that these actions are carried out in is given for each printing format in the table below. Since infix is the format in which the expression was entered. this was been chosen as the output format. Table ll: Different Printing Procedure Procedure Calls. Format Order of procedure calls Prefix: Prefix(left—subtree);Prefix(right~subtree);PrintNode Infix: Infix(left—subtree);PrintNode;Infix(right-subtree) Postfix: PrintNode;Postfix(left~subtree);Postfix(right~snbtree) e.g. Consider the expression '(x+y) ~ (x/\y)“2’ printed in these three formats. Diagram 6: Parse Tree for ’(x+y) ~ (x/\y)“2’. i H '- C 3/ 3 E rfiiifxis. x. re 75 9 2. yr E‘ - "‘ "as... «W *” ta. iii? 5 ‘L’ h ) ~04 A49. ~, _ , 3 u 9 , E y _ , Q A._w_‘_WW,:,A%i,_{,W_mAr ‘(ff E Mi ‘1§§b'I<~h$M?f4L~.xr*$:z¢r2rr2~¢~WX'4w2’.st32-mn:.'sWM _ ._..,,,,_W?_ys_‘ .. » r 3 3 jg" ,1; ,6 ; X, ,. ? ’ ’ & 1 K. .>« § ; 3%, N‘; g [3 ,5 I J! E E «)5 1 yes, E 1”’ ff’ E ,..% we E g : E ,,.g a (":3 ,/it E J i: E '2 ' " : 5 E ' ‘Aug: ~ ' A ‘,4’ -1 " f . ’ E ‘ii; "~g""/‘wt % 46 15 " £5 E “H F fig, E E J i_,»{ % , J #2:‘? E éi K j k g EX? g,__ \.,§ E 3% Ev § § Q B 4 , i. x‘ X: y, R g s ‘ : . ~ 2.’ . . ,.«ar~!"“f.;-M; ‘X r *1’ .a s * r 9” "37; E Q 7* ; as 3: § ‘Kw’ ff iv‘ 2? ga i £3 ff‘ % f 1: “ . -M. .% N. r _. mg five? fa‘ E ,$,,\7 gr éfé’ ( ‘Va . g .55 . at _ ‘Mt if. g ” e, ' e Prefix output: ~ + x y A /\ x y 2 Infix output: ( x + Postfix output: x y c) Assignment. When an assignment is madewthen a pointer to the derived expression tree is inserted at the appropriate place in the variable table. 2,4: Error Handling, There are three main classes of errors, these are listed below. a) Syntactic errors: these are caused by unknown or unexpected symbols, or commands with missing or surplus symbols. This error has no specific checking routine, though it uses Passsym [4,3,lS] and GetToken [4,3.l3] to cope with missing expected symbols and unknown symbols respectively. In the event of an error of this type the error [4,3,9] procedure is called, b) Illegal assignments; caused by circular definitions ~ that is an attempt to assign to a variable an expression dependent, directly or indirectly, on that same variable, The function SDV {4,3.ll] ( Self-Dependent Variable ) returns the boolean value true if a variable in assignment is self~dependent either because of the assigned expression or the expression assigned to some variable in the expression, The error [4,3,9] handling procedure is called, c) Division by zero; caused by the second operand to the division or modulus operator being zero, There are checks in the expression evaluation procedure Eval [4,3,l2] concerning the value of the second operand to these operators before any attempt to combine the two operands is made, Again the error handling procedure is called, The error handling procedure after outputting an error message, uses the ’goto‘ command to drop control from the current position, guickly pass out of all procedures that lead from the first call to the current level, and exit from the parser. Although this method means a loss in the modularity of the program, it does what is required quickly and effectively, 2,5: lnitialisation, When first started up, the udc prompts the user to start entering commands on the assumption that the computer will have executed all the initialisation procedures and be ready to deal with the first command well before the user has finished typing the first command, The initialisation process includes the setting up of the following tables, a) The lexical analysis state transition table, This is a table of the new states that the lexical analyser enters based on two pieces of information ~ the previous state and the current character on the input, b) The lexical token code table, Essentially this is a table of the final states of the lexical analyser and the associated lexical token, It facilitates the storage of tokens in a non~numeric coding, making the program much easier to understand, c) The operator precedence table. This is a table in which all of the unary and binary operators ~ that is the stack based operators are assigned a number, positioning their precedences relative to each other, d) The variable table, This is the table in which the values of each of the 26 simple variables and 26 vector variables with all their indexes are stored, All variables are initially set to the undefined value,