ANTLR with C# – adding actions to rules

In the last post, we saw how to install ANTLR in Visual Studio and write a simple grammar. We could run the parser and get it to check some input for errors, but so far nothing happens when a correct string is input. In this post, we’ll see how to add some actions for each rule in the grammar.

Although the grammar rules are written in ANTLR’s special syntax, actions are written in the language for which ANTLR produces its code, which in this case is C#. We’ll begin with a quick review of our grammar file so far:

grammar Calculator;

options {
    language=CSharp3;
}

@lexer::namespace{AntlrTest01}
@parser::namespace{AntlrTest01}

/*
 * Parser Rules
 */

public addSubExpr
  : multDivExpr (( '+' | '-' ) multDivExpr)*;

multDivExpr
  : INT (( '*' | '/' ) INT)*;

/*
 * Lexer Rules
 */

INT : '0'..'9'+;
WS :  (' '|'\t'|'\r'|'\n')+ {Skip();} ;

The grammar accepts a single line of input consisting of the four arithmetic operations acting on non-negative integers. No variables or parentheses are allowed. There are therefore four operations for which we need to add actions. There is actually one more action as well: since a single integer is valid input, we need the program to convert the string representation of an integer to C#’s int data type.

Code for an action is inserted directly into the rules at the point where we want it to be run. The easiest way to see what’s going on is to have a look at the modified grammar file with actions included. Here it is:

grammar Calculator;

options {
    language=CSharp3;
}

@lexer::namespace{AntlrTest01}
@parser::namespace{AntlrTest01}

@header {
using System;
}
/*
 * Parser Rules
 */

public addSubExpr returns [int value]
  : a = multDivExpr {$value = a;}
  ( '+' b = multDivExpr {$value += b;}
  | '-' b = multDivExpr {$value -= b;})*;

multDivExpr returns [int value]
  : a = INT {$value = Int32.Parse($a.text);}
  ( '*' b = INT {$value *= Int32.Parse($b.text);}
  | '/' b = INT {$value /= Int32.Parse($b.text);})*;

/*
 * Lexer Rules
 */

INT : '0'..'9'+;
WS :  (' '|'\t'|'\r'|'\n')+ {Skip();} ;

We’re going to have the parser return the result of the calculation so it can be used elsewhere in our main program.

First, notice the @header section that follows the two namespace declarations. Anything the generated C# code needs at the top of its file, such as using directives, goes in here. Since we’ll need C#’s System.Int32.Parse() method to convert from a string to an int, we put a ‘using System;’ directive in the header. We could, of course, write every call to Parse() with its fully qualified name, but as with regular C# code, the using directive saves a lot of typing.

Now have a look at the parser and lexer rules themselves. The lexer rules haven’t changed at all, since all the lexer does is extract tokens from the input and feed them to the parser, and the parser is where all the work is done on interpreting the input.

Next, look at the multDivExpr rule. This rule should calculate the product or quotient of a sequence of one or more integers and return the result. We’ve added a ‘returns [int value]’ phrase to this rule. This is ANTLR’s syntax for defining the return value from a rule. (The version of ANTLR used here actually allows more than one value to be returned, by encapsulating all return values in a new class, but we won’t need that here.) Parameter lists in ANTLR are enclosed in square brackets rather than parentheses as in C#. The ‘value’ parameter is a local variable whose scope is the entire rule in which it’s defined.

In the first line of the rule body (the one beginning ‘: a = INT’), we deal with the part of the rule which matches the first integer. We define a label, ‘a’, which acts as another local variable and refers to the INT token that was matched. Remember that at this stage, INT is a string of digits, and not an int. Thus we need to add an action that converts it to a C# int, and that’s what the code in braces on that line does. The $a.text is ANTLR syntax which extracts the matched string from ‘a’. When you need to access an ANTLR variable inside an action, you prefix it with $.
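
As a small aside on the conversion step, here is a standalone C# sketch of what Int32.Parse does with the kind of digit strings the INT token can hold. It is not part of the grammar, and the names ParseSketch, ToInt and Fits are made up for this illustration. It also shows one case the grammar allows but the action cannot handle: a run of digits too long to fit in an int.

```csharp
using System;

static class ParseSketch
{
    // Hypothetical helper: converts a digit string the same way the action does.
    public static int ToInt(string digits)
    {
        return Int32.Parse(digits);
    }

    // Hypothetical helper: reports whether the digit string fits in an int.
    public static bool Fits(string digits)
    {
        int ignored;
        return Int32.TryParse(digits, out ignored);
    }

    public static void Main()
    {
        Console.WriteLine(ToInt("42") + 1);      // 43
        Console.WriteLine(ToInt("007"));         // 7 (leading zeros are accepted)
        Console.WriteLine(Fits("99999999999"));  // False (too large for an int)
    }
}
```

For a string like the last one, Int32.Parse itself would throw an OverflowException, so a more defensive action might check the text before converting it.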

The next line of the rule, beginning with ‘*’, deals with multiplication. We save the second INT in the variable ‘b’, and multiply the current $value by the int extracted from ‘b’. The variable ‘a’ is still in scope at this point, so we could have written for this line:

  ( '*' b = INT {$value = Int32.Parse($a.text) * Int32.Parse($b.text);}

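However, using the current value of $value is easier. It is also the only version that stays correct when the (...)* loop matches more than once: for the input 2*3*4, the line above would overwrite the running result and give 2*4 = 8 instead of 24.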

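The last line of the rule, beginning with ‘/’, handles division in the same way. Since $value is an int, this is integer division, so 7/2 gives 3.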
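
To compare the two ways of writing the multiplication action, here is a rough C# simulation of what the actions in multDivExpr do. It is written by hand, not generated by ANTLR, and it takes the operands and operators as ready-made arrays instead of real tokens. The names MultDivSketch, Accumulate and UseFirstOperand are invented for this illustration, and the ‘/’ branch of UseFirstOperand is an assumption about how the alternative line would be extended to division.

```csharp
using System;

static class MultDivSketch
{
    // Mirrors the grammar's actions: start from the first operand, then
    // fold each later operand into the running value, left to right.
    public static int Accumulate(int[] operands, char[] ops)
    {
        int value = operands[0];
        for (int i = 0; i < ops.Length; i++)
        {
            if (ops[i] == '*') value *= operands[i + 1];
            else value /= operands[i + 1];  // integer division truncates
        }
        return value;
    }

    // Mirrors the alternative action, which recomputes from the first
    // operand ('a') every time around the loop.
    public static int UseFirstOperand(int[] operands, char[] ops)
    {
        int value = operands[0];
        for (int i = 0; i < ops.Length; i++)
        {
            if (ops[i] == '*') value = operands[0] * operands[i + 1];
            else value = operands[0] / operands[i + 1];
        }
        return value;
    }

    public static void Main()
    {
        int[] operands = { 2, 3, 4 };
        char[] ops = { '*', '*' };
        Console.WriteLine(Accumulate(operands, ops));                  // 24
        Console.WriteLine(UseFirstOperand(operands, ops));             // 8
        Console.WriteLine(Accumulate(new[] { 7, 2 }, new[] { '/' }));  // 3
    }
}
```

The two versions agree whenever there is a single operator, and differ as soon as a second one appears.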

The addSubExpr rule works in pretty much the same way, although there are a couple of points to note.

First, we’ve declared addSubExpr as ‘public’. By default, the code ANTLR produces for each rule is private, so you won’t be able to access it from outside the parser class. Since addSubExpr is the top level rule in our grammar, it’s the one we’ll need to call from outside to parse the input, so we need to make sure it’s public.

Also, note in the first line of the rule body that we save the result of the first multDivExpr in a local variable ‘a’, and we assign ‘a’ directly to $value without using a $ in front of the ‘a’ or accessing any of its properties. We can do this because multDivExpr was defined as returning an int, and an int is just what we need in the calculation. In more complex cases, we might use some non-primitive data type as the return type, in which case we’d need to access one of its fields.

The ‘+’ and ‘-’ lines of the rule use the same technique for performing the addition and subtraction.
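
One way to picture how the two rules cooperate is to write them out by hand as ordinary C# methods. The sketch below is not what ANTLR generates; it is a simplified stand-in, with invented names (TinyCalculator, ReadInt), that removes spaces up front instead of using a lexer and ignores any trailing text it does not understand. It shows why multiplication binds tighter than addition: AddSubExpr never sees an INT directly, it only sees the finished results of MultDivExpr.

```csharp
using System;

// Hand-written stand-in for the generated parser; not ANTLR output.
class TinyCalculator
{
    private readonly string text;
    private int pos;

    public TinyCalculator(string input)
    {
        // The grammar's WS rule skips whitespace; here spaces are simply removed.
        text = input.Replace(" ", "");
    }

    // addSubExpr : a=multDivExpr ( '+' b=multDivExpr | '-' b=multDivExpr )*
    public int AddSubExpr()
    {
        int value = MultDivExpr();
        while (pos < text.Length && (text[pos] == '+' || text[pos] == '-'))
        {
            char op = text[pos++];
            int b = MultDivExpr();
            if (op == '+') value += b; else value -= b;
        }
        return value;
    }

    // multDivExpr : a=INT ( '*' b=INT | '/' b=INT )*
    private int MultDivExpr()
    {
        int value = ReadInt();
        while (pos < text.Length && (text[pos] == '*' || text[pos] == '/'))
        {
            char op = text[pos++];
            int b = ReadInt();
            if (op == '*') value *= b; else value /= b;
        }
        return value;
    }

    // INT : '0'..'9'+
    private int ReadInt()
    {
        int start = pos;
        while (pos < text.Length && text[pos] >= '0' && text[pos] <= '9') pos++;
        if (start == pos)
            throw new FormatException("Expected an integer at position " + start);
        return Int32.Parse(text.Substring(start, pos - start));
    }

    public static void Main()
    {
        Console.WriteLine(new TinyCalculator("2 + 3 * 4").AddSubExpr());   // 14
        Console.WriteLine(new TinyCalculator("20 / 3 - 1").AddSubExpr());  // 5
    }
}
```

Each method keeps its own local value, just as each ANTLR rule has its own $value.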

The code that calls this modified version of the parser is

using System;
using System.IO;
using Antlr.Runtime;
using Antlr.Runtime.Misc;

namespace AntlrTest01
{
  class Program
  {
    public static void Main(string[] args)
    {
      // Read the expression from standard input.
      Stream inputStream = Console.OpenStandardInput();
      ANTLRInputStream input = new ANTLRInputStream(inputStream);
      // The lexer turns characters into tokens; the parser consumes the tokens.
      CalculatorLexer lexer = new CalculatorLexer(input);
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      CalculatorParser parser = new CalculatorParser(tokens);
      // The start rule now returns the computed value.
      int answer = parser.addSubExpr();
      Console.WriteLine("Answer = " + answer);
    }
  }
}

This is the same as in the previous post, except that this time we save the return value from the call to the parser’s addSubExpr() method and print it out as the answer.
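
For quick experiments it can be handier to parse a fixed string than to type at the console, where the input stream has to be ended (Ctrl+Z followed by Enter in a Windows console) before the parser can finish. The variation below is a sketch under assumptions: it relies on the ANTLRStringStream class from the Antlr.Runtime namespace, it only compiles alongside the CalculatorLexer and CalculatorParser classes generated from the grammar above, and the class name StringInputDemo is made up. It is meant to replace the Program class, not to sit next to it, since a project can have only one entry point by default.

```csharp
using System;
using Antlr.Runtime;

namespace AntlrTest01
{
  class StringInputDemo
  {
    public static void Main(string[] args)
    {
      // Wrap an in-memory string as the character stream for the lexer.
      ANTLRStringStream input = new ANTLRStringStream("2 + 3 * 4");
      CalculatorLexer lexer = new CalculatorLexer(input);
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      CalculatorParser parser = new CalculatorParser(tokens);
      // Same start rule as before; only the source of the text has changed.
      Console.WriteLine("Answer = " + parser.addSubExpr());
    }
  }
}
```

If the grammar behaves as described in this post, we would expect this to print an answer of 14, since 3 * 4 is evaluated inside multDivExpr before the addition takes place.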

In the next post, we’ll have a look at ANTLR’s other method of defining grammars: the abstract syntax tree, or AST.

Comments

  • Anonymous  On May 21, 2013 at 1:07 PM

    I have followed the above instructions exactly, but at the end VS 2012 reports that
    CalculatorLexer does not contain a constructor that takes 1 argument.
