Operator precedence parsing is based on bottom-up shift-reduce parsing. As an expression is parsed tokens are shifted to a stack. At the appropriate time the stack is reduced by applying the operator to the top of the stack. This is best illustrated by example.
We define two stacks: opr
, for operators, and val
, for values. A ' $
' designates the end of input or end of stack. Initially the stacks are empty, and the input contains an expression to parse. Each value, as it is encountered, is shifted to theval
stack. When the first operator is encountered (step 2), we shift it to the opr
stack. The second value is shifted to theval
stack in step 3. In step 4, we have a ' *
' in opr
, and a ' +
' input symbol. Since we want to multiply before we add (giving multiplication precedence), we'll reduce the contents of the val
, applying the ' *
' operator to the top of the val
. After reducing, the ' +
' operator will be shifted to opr
in step 5.
This process continues until the input and opr
are empty. If all went well, we should be left with the answer in the val
. The following table summarizes the action taken as a function of input and the top of opr
:
When the input token is '+
', and '+
' is on opr
, we reduce before shifting the new '+
' to opr
. This causes the left-most operator to execute first, and implies that addition is left-associative. If we were to shift instead, then addition would be right-associative. When the input token is '*
', and '+
' is in opr
, we shift '*
' to opr
. Later, when reducing, '*
' is popped before '+
', giving precedence to multiplication. By appropriately specifying shift and reduce actions, we can control the associativity and precedence of operators in an expression. When we encounter an operator in the input stream, and an operator is already on the stack, take the following action:
- If the operators are different, shift to give higher precedence to the input operator.
- If the operators are the same, shift for right associativity.
Let's extend the previous example to include additional operators.
The above table incorporates the following operators:
- "
M
", for unary minus.
- "
^
", for exponentiation. 5 ^ 2
yields 25
.
- "
f
", for factorial. f(x)
returns the factorial of x
.
- "
p
", for permutations. p(n,r)
returns the number of permutations for n
objects taken r
at a time.
- "
c
", for combinations. c(n,r)
returns the number of combinations for n
objects taken r
at a time.
The following operator precedence is implied, starting at the highest precedence:
- unary minus
- exponentiation, right-associative
- multiplication/division, left-associative
- addition/subtraction, left-associative
The following errors are detected:
- error
e1
: missing right parenthesis
- error
e2
: missing operator
- error
e3
: unbalanced right parenthesis
- error
e4
: invalid function argument
Parentheses are often used to override precedence. This is easily accomodated in the operator precedence table using the following algorithm:
input action
'(' shift
')' while opr[tos] != '('
reduce
shift ')'
reduce '()'
On encountering a left parenthesis, we shift it to the opr
stack. When we input a right parenthesis, the stack is popped until the matching left parenthesis is found. Then we shift the right parenthesis, and reduce by popping both parentheses off the stack.
For function calls, the function reference is shifted to the stack. Each comma-separated argument is also shifted to the stack. On encountering a comma, the operator stack is reduced until the opening parenthesis of the function call is visible, leaving the function parameter on the value stack. Then the comma is shifted to the stack, and popped on the next reduction. When the closing parenthesis of the function call is encountered, it will be shifted to the stack and reduced. This will expose the function call for subsequent reduction.
Operator classes may be used to minimize the size of the precedence table. For example, a single generic token representing a function call may be pushed on the operator stack. An ordinal, defining which function is referenced, is pushed on the value stack. For this purpose, you may wish to implement each element on the value stack as a union to allow for different datatypes.
Error-checking is not bullet-proof. For example, omitting commas between function arguments will be accepted, and work properly. Even more bizarre, reverse-polish notation, where operators follow operands, is also acceptable. This phenomenon occurs because operators are applied to the top elements of the value stack, and the order that operators and values are pushed is not significant. Major errors, such as omitting an operator, will be found by the final reduction, as the stacks will not be empty. More rigorous error-checking may be done by defining a boolean follow matrix. The current and previous tokens are used as indices into the matrix to determine if one can follow the other.
The table is implemented in the next section, using ANSI-C. Note that there are some differences between my solution and Aho's:
- Aho specifies three precedence relations, whereas I've only specified two (shift and reduce). You can use three if you wish; I'm the last one to stifle creativity!
- Aho uses one stack for both values and operators. I've used separate stacks, as the implementation is typically easier.
// ==========================================================================================
// The following is a precedence based expression parser. It does not include variables,
// but it is true OOP. It may be broken into individual files by removing the appropriate
// comments and breaking where indicated.
//
// This code is NOT guaranteed to be free of defects, and no warranty of any kind is
// given with this code.
//
// Original code developed by David O'Neil, 1/2/00. This code may be freely modified
// and redistributed as you see fit. I do request that if you make any improvements
// to this code you send me a copy. I may be reached at david@daodservices.com
// Thanks, and enjoy!
//
// (Developed from Thomas Niemann's table based precedence parsing example available
// at epaperpress.com. The table has been eliminated, and a precedence enumeration
// technique has been implemented in it's place.)
// ==========================================================================================
//eliminate these #include and using statements if breaking this file apart
#include
#include
using std::vector;
using std::cin;
using std::cout;
/* ==========================================================================================
// The original Stack.h file follows
#ifndef Stack_h
#define Stack_h
#include
using std::vector; */
// Class to manipulate a float vector
class Stack {
private:
vector stack;
public:
void addToStack(float);
void add();
void subtract();
void multiply();
void divide();
void power();
float getAnswer();
};
//#endif //Stack.h - uncomment if separating
/* ==========================================================================================
// The original Objects.h file follows
#ifndef Objects_h
#define Objects_h
#include "Stack.h" */
//the order of this enumeration is critical for correct operation
enum precedence_ {PrecGet, PrecEnd, PrecParen, PrecAddSub, PrecMulDiv, PrecPow,
PrecVal, NoPrec };
// ==========================================================================================
// Base class for various objects (ie - Numbers, Delimiters ('+', '-', '*'...))
// ==========================================================================================
class Object {
private:
Stack *numStack;
protected:
Object(precedence_);
public:
//will make precedence private in future, but for simple test leave public
precedence_ precedence;
virtual void execute();
Object();
};
// ==========================================================================================
// Classes derived from the base Object class
// ==========================================================================================
// Get class derived from base class. This object is placed on the bottom of the stack
// so that the precedence checking algorithm will fetch the next object for the stack
// when the stack is reduced to only having this on it
class Get: public Object {
public:
Get();
};
// Number class derived from Base class
class Number : public Object {
private:
void execute();
float number;
Stack *numStack;
public:
Number(float, Stack*);
float getNumber();
};
// Plus derived from Base (handles the '+' sign)
class Plus : public Object {
private:
void execute();
Stack *numStack;
public:
Plus(Stack* stack);
};
// Minus derived from Base (handles the '-' sign)
class Minus : public Object {
private:
void execute();
Stack *numStack;
public:
Minus(Stack* stack);
};
// Minus derived from Base (handles the '*' sign)
class Multiply : public Object {
private:
void execute();
Stack *numStack;
public:
Multiply(Stack* stack);
};
// Divide derived from Base (handles the '/' sign)
class Divide : public Object {
private:
void execute();
Stack *numStack;
public:
Divide(Stack* stack);
};
// Power derived from Base (handles the '^' sign)
class Power : public Object {
private:
void execute();
Stack *numStack;
public:
Power(Stack* stack);
};
// LParen derived from Base (handles '(' )
class LParen : public Object {
private:
void execute();
public:
LParen();
bool isRightParen;
};
// RParen derived from Base (handles ')' )
class RParen : public Object {
private:
void execute();
public:
RParen();
bool isRightParen;
};
// End derived from Base (handles the string EOL '\0')
class End : public Object {
public:
End();
};
//#endif //Objects_h - uncomment if breaking this file up
/* ==========================================================================================
// The original Parser.h file follows
#ifndef Parser_h
#define Parser_h
#include "Objects.h"
#include
using std::vector; */
const int MaxStr = 80;
// Class to implement the parsing of the expression
class Parser {
private:
Stack floatStack;
char *str;
bool getObject();
vector