Introduction
Since the first version of QSL grammar file, there have been many changes and the current grammar is completely different from the original. Unlike other projects, project related to language seems to be not easy to follow. In order to make it clear, i write an article dedicated to technique we use to solve QSL grammar.
Before we continue
It would be easier when you have experience with grammar of a language but i think regular expression is all we need to get into QSL grammar file. ANTLR library and ANTLR Works are also needed.
Sequence of processing QSL script
The lexical analyzer, or lexer, breaks up the input stream into tokens. The parser feeds off this token stream and tries to recognize the sentence structure. Next, we translate the sentences get from parser in to abstract structure tree (AST). Finally, we parse AST into data object. This article focuses on lexical level as it caused many problem need to solved (From my point of view, parser grammar of QSL is pretty clean and easy to understand)
Structure of QSL grammar file
Normally, there are individual file for lexer grammar and parser grammar but in QSL, we use a mixed grammar file. It means that all lexer rules and parser rules are defined in the same file. This is the original structure and decide to keep it since from my own experience, it is easier to work in ANTLR Works.
From the bottom of QSL lexer
The lexer has many mode and they work like switches so that we can direct the input stream to match the possible tokens. Besides, we have a flag that plays the most important role in the grammar : lineMode.
Logic of lineMode flag
This flag is true in case we scan each line of the input, searching for keyword or just a text line (question text, answer option text and sub question text). lineMode is false when the input cursor is inside of block (properties or expression).
Question text is not wrapped
Question text(include answer option text, sub question text) used to be wrapped with '##' when the the very first version of QSL came out. Later, in an update, we decided to remove '##'. This is a big change and effect the tactical of handling lexer.