Creating ANTLR Applications in TypeScript

By Georg Dangl in Web Development, Thursday, September 22, 2016

Posted in TypeScript

So much of software development revolves around reading and processing data. There are a whole lot of Domain Specific Languages (DSLs) out there, some very convenient to use with regular tools like XML and JSON, others a bit less common like ISO 10303-21 EXPRESS. You've probably had to read data in whatever format's been thrown at you at least once, and quite often Regex will do. Other times, the structure is simple enough that a parser can be written by hand. For everything else, there's ANTLR!

It's a great tool for generating parsers. I won't go into detail about what lexers, parsers or visitors are or how to write an ANTLR grammar (there are lots of good sources online and the ANTLR community is quite active). Instead, I'll document how I got ANTLR working with its JavaScript runtime so I can use it in an Angular 2 project. I've got a sample project available on GitHub so you can look in detail at everything that's not covered here.

To start, I've been using the antlr4-webpack-plugin, which lets you run the ANTLR code generation via webpack. You don't strictly need it, since you could just call the antlr.jar file with Java yourself and generate the JavaScript target, but webpack integrates nicely in a TypeScript project.
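If you'd rather skip the plugin, the direct call boils down to a handful of tool options. Here's a minimal sketch that only assembles the command line. Note that buildAntlrCommand is a made-up helper name, the jar path is a placeholder for wherever your ANTLR jar lives, and I'm assuming the standard ANTLR 4 tool options (-Dlanguage, -visitor, -no-listener, -o) are what you want for a visitor-only JavaScript target.

```typescript
// Hypothetical helper: assembles the arguments for calling the ANTLR tool directly.
// Nothing is executed here; the function only builds the command line.
function buildAntlrCommand(jarPath: string, grammar: string, outputDir: string): string[] {
    return [
        "java", "-jar", jarPath,
        "-Dlanguage=JavaScript", // generate the JavaScript target
        "-visitor",              // emit a visitor
        "-no-listener",          // skip the listener
        "-o", outputDir,         // where the generated files go
        grammar
    ];
}

// "antlr.jar" is a placeholder path, adjust it to your setup
const command = buildAntlrCommand("antlr.jar", "Calculator.g4", "GeneratedAntlr/");
console.log(command.join(" "));
```

Run the printed command in a shell, or hand the array to whatever process runner you prefer.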

Here's the webpack.config.antlr.js I'm using; adjust yours as necessary:

var antlr4Plugin = require('antlr4-webpack-plugin');

// This config will only run the antlr4Plugin to create the Lexer and Parser files in the ./GeneratedAntlr directory
// Based on https://github.com/corzani/generator-antlr4
module.exports = {
    output: {
        filename: "./GeneratedAntlr/bundle.js" // Shouldn't produce anything since there's no entry defined
    },
    plugins: [
        antlr4Plugin({
            grammar: 'Calculator.g4',
            options: {
                o: 'GeneratedAntlr/',
                grammarLevel: {
                    language: 'JavaScript'
                },
                flags: [
                    'visitor',
                    'no-listener'
                ]
            }
        })
    ]
};

Its sole purpose is to generate a lexer, a parser and finally a visitor (I'm not using a listener for this example) and put them in the GeneratedAntlr/ directory.

Implementing the JavaScript Visitor

Take that generated visitor file and move it out of the way so it doesn't get overwritten the next time the code is generated! You can then start writing your code in the visitor.

In my example, the Calculator.g4 grammar produces a top-level rule called calculator. This one is passed to the visitor which then, well, visits the expression tree. This "walking the tree" is basically a small avalanche of recursive function calls, each visiting a single expression (which may in turn consist of sub-expressions).
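To make that avalanche a bit more tangible, here's a minimal sketch of the idea without any ANTLR involved. The Expr type and the visit function are hand-rolled stand-ins I made up for illustration; the real parse tree contexts that ANTLR generates look different.

```typescript
// Hypothetical, hand-rolled tree shapes; not what ANTLR actually generates.
type Expr =
    | { kind: "number"; value: number }
    | { kind: "add"; left: Expr; right: Expr }
    | { kind: "mul"; left: Expr; right: Expr };

// Each call handles a single node and recurses into its sub-expressions.
function visit(node: Expr): number {
    switch (node.kind) {
        case "number":
            return node.value;
        case "add":
            return visit(node.left) + visit(node.right);
        case "mul":
            return visit(node.left) * visit(node.right);
    }
}

// Tree for 2 + 3 * 4
const tree: Expr = {
    kind: "add",
    left: { kind: "number", value: 2 },
    right: {
        kind: "mul",
        left: { kind: "number", value: 3 },
        right: { kind: "number", value: 4 }
    }
};

console.log(visit(tree)); // 14
```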

If you're going for the JavaScript target, you'll probably be minifying the result at some point before actual deployment, so it's worth looking at how the generated code picks the visitor method, since that has to keep working with mangled class names. Take this snippet from the generated parser:

CalculatorContext.prototype.accept = function(visitor) {
    if ( visitor instanceof CalculatorVisitor ) {
        return visitor.visitCalculator(this);
    }
};

This snippet shows us that when we get a CalculatorContext by calling var context = parser.calculator() (the entry rule of the parser), we can then call var result = context.accept(new CalculatorVisitor()) and the visitor's visitCalculator method is invoked. That's not strictly required for the entry rule, but there will be occasions where you need to decide at runtime which method to call. In the sample project, there's a parser rule expression defined which may be anything from an addition to a multiplication to calculating the cotangent. To decide whether to call visitAdd, visitMul or visitCot, your visitor's visitExpression method just calls accept and passes itself as a reference:

FormulaVisitor.prototype.visitExpression = function (context) {
    return context.accept(this);
};
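Here's a self-contained sketch of that dispatch, assuming hand-written stand-ins for the generated classes (SketchVisitor, NumberContext, AddContext and EvaluatingVisitor are all made-up names, not ANTLR output). Each context checks the visitor with instanceof and then calls the one method that matches its own type. Since instanceof compares prototype chains rather than names, mangled class names don't get in the way.

```typescript
// Hypothetical stand-in for the generated visitor base class.
abstract class SketchVisitor {
    abstract visitNumber(ctx: NumberContext): number;
    abstract visitAdd(ctx: AddContext): number;
}

interface ExpressionContext {
    accept(visitor: unknown): number | undefined;
}

// Hypothetical stand-ins for generated contexts: each one knows its own visit method.
class NumberContext implements ExpressionContext {
    constructor(public value: number) {}
    accept(visitor: unknown): number | undefined {
        if (visitor instanceof SketchVisitor) {
            return visitor.visitNumber(this);
        }
        return undefined;
    }
}

class AddContext implements ExpressionContext {
    constructor(public left: ExpressionContext, public right: ExpressionContext) {}
    accept(visitor: unknown): number | undefined {
        if (visitor instanceof SketchVisitor) {
            return visitor.visitAdd(this);
        }
        return undefined;
    }
}

class EvaluatingVisitor extends SketchVisitor {
    // The visitor doesn't inspect the node; the context picks the method.
    visitExpression(ctx: ExpressionContext): number {
        return ctx.accept(this) as number;
    }
    visitNumber(ctx: NumberContext): number {
        return ctx.value;
    }
    visitAdd(ctx: AddContext): number {
        return this.visitExpression(ctx.left) + this.visitExpression(ctx.right);
    }
}

// Tree for 1 + (2 + 3)
const sum = new AddContext(
    new NumberContext(1),
    new AddContext(new NumberContext(2), new NumberContext(3))
);
const evaluated = new EvaluatingVisitor().visitExpression(sum);
console.log(evaluated); // 6
```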

Where's the promised TypeScript?

So, yeah, if you've read this far thinking this post was about TypeScript and wondered where it is, well, stay with me!

It's just that since ANTLR4 doesn't have typings defined, there's not really a convenient way to integrate it into your TypeScript project. Although Burt Harris has put in some great effort (and it looks like he's just gonna migrate the whole thing himself =)), there's not yet a way to natively integrate it in TypeScript. Your visitor needs to be derived from the generated visitor for the parser's accept() method to work (it checks whether visitor instanceof CalculatorVisitor). This means that you'd have to create a TypeScript visitor that inherits from a JavaScript class, which probably works but isn't very intuitive... And that's why the visitor has to stay in JavaScript land!
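For the curious, this is roughly what such an inheritance could look like. It's only an illustration: GeneratedVisitor is a hand-written stand-in for an ES5-style constructor function, not real ANTLR output, and GeneratedVisitorShape, TypedVisitor and acceptsVisitor are made-up names. The cast needed to convince the compiler is a good hint at why this doesn't feel very intuitive.

```typescript
// Hypothetical stand-in for an ES5-style generated visitor:
// a constructor function with methods on its prototype.
function GeneratedVisitor(this: any) {
    return this;
}
GeneratedVisitor.prototype.visitCalculator = function (_ctx: unknown): unknown {
    return undefined;
};

// TypeScript doesn't see a plain function as a class, so describe its shape by hand...
interface GeneratedVisitorShape {
    visitCalculator(ctx: unknown): unknown;
}
// ...and cast it to a constructor type before extending it.
const GeneratedVisitorBase = GeneratedVisitor as unknown as new () => GeneratedVisitorShape;

class TypedVisitor extends GeneratedVisitorBase {
    visitCalculator(_ctx: unknown): number {
        return 42; // placeholder result for the sketch
    }
}

// Mirrors the kind of instanceof check the parser's accept() performs.
function acceptsVisitor(visitor: unknown): boolean {
    return visitor instanceof GeneratedVisitor;
}

console.log(acceptsVisitor(new TypedVisitor())); // true
```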

TypeScript, finally!

Look at the following code:

var antlr4 = require('antlr4');
var calculatorLexer = require('./GeneratedAntlr/CalculatorLexer');
var calculatorParser = require('./GeneratedAntlr/CalculatorParser');
var formulaVisitor = require('./FormulaVisitor.js');

export class Calculator {
    public static calculate(formula: string): number {
        var inputStream = new antlr4.InputStream(formula);
        var lexer = new calculatorLexer.CalculatorLexer(inputStream);
        var commonTokenStream = new antlr4.CommonTokenStream(lexer);
        var parser = new calculatorParser.CalculatorParser(commonTokenStream);
        var visitor = new formulaVisitor.FormulaVisitor();
        var parseTree = parser.calculator();
        var visitorResult = visitor.visitCalculator(parseTree);
        return visitorResult;
    }
}

Ok, not the greatest example of static typing ever, but the end result is TypeScript!

It should be self-explanatory if you're familiar with ANTLR. The require statements at the top are our way to import the ANTLR runtime and the generated code. Again, since there are neither typings nor an actual TypeScript library, there's not much more you can do other than fall back to JavaScript when calling the ANTLR runtime or the generated code directly.

However, you've now got a decent TypeScript class that integrates nicely within other TypeScript code.
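If you want a bit more static typing out of this, one option is to describe just the parts of the untyped code you actually touch. The following is a minimal sketch under that assumption: CalculatorParserLike, FormulaVisitorLike and evaluate are hypothetical names, and the stubs at the bottom only stand in for the real parser and visitor so the snippet runs on its own.

```typescript
// Hypothetical, hand-written shapes covering only what the calculation touches.
interface CalculatorParserLike {
    calculator(): unknown; // entry rule, returns the parse tree
}

interface FormulaVisitorLike {
    visitCalculator(tree: unknown): number;
}

// Depends only on the interfaces, so the untyped require() results could be
// cast once at the boundary (for example `as FormulaVisitorLike`).
function evaluate(parser: CalculatorParserLike, visitor: FormulaVisitorLike): number {
    return visitor.visitCalculator(parser.calculator());
}

// Stubs standing in for the real parser and visitor
const stubParser: CalculatorParserLike = { calculator: () => ({ text: "1+2" }) };
const stubVisitor: FormulaVisitorLike = { visitCalculator: () => 3 };

console.log(evaluate(stubParser, stubVisitor)); // 3
```

The casts are still unchecked, of course, so this only moves the "trust me" part to a single place instead of spreading it through the class.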

Does.Not.Compute

If you're actually using the TypeScript class already, there's a good chance that it doesn't behave nicely; more specifically, it probably breaks your build. That's because the FileStream class in the ANTLR4 JavaScript runtime has an (optional) import of fs, which is used in a Node.js environment to access the file system. It's not necessary on the client side, but it might break your build since it can't be resolved. That's why you tell webpack to ignore the fs dependency:

node: {
    fs: "empty"
}
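For context, here's where that fragment sits in a config object, next to what I understand to be the equivalent for webpack 5, which dropped the node.fs option in favor of resolve.fallback. The variable names are made up, and both objects are trimmed down to the relevant part, so merge them into your own config rather than copying them as-is.

```typescript
// webpack 4 and earlier: replace the fs module with an empty one
const legacyWebpackConfig = {
    node: {
        fs: "empty"
    }
};

// webpack 5 (assumption: you're on a version where node.fs is no longer accepted):
// tell the resolver that there's no fallback for fs
const webpack5Config = {
    resolve: {
        fallback: {
            fs: false
        }
    }
};
```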

Happy ANTLRing!

 

