Language Parsing Starter Pack

Micaela Estabillo

Purpose

This web-based tutorial includes activities that guide the learner through different stages involved in generating a parser. No previous knowledge of programming or CS theory is required.

Objective

At the end of this tutorial, the learner will have a basic understanding of how computers understand words. The final product will be a grammar definition, which will be supplied to a parser generator. The generated parser will be used to parse some basic strings of text in order to test whether the grammar works.

Requirements

This page is best viewed in a maximized browser window. An active internet connection is also needed to open external links.

cat

1. What is a parser?

A parser translates code so that it can be run by a computer. The parser interprets code and detects when language syntax rules are violated. A more detailed description of a parser can be found on Wikipedia.

Real-life example

Parsing occurs when we communicate. The most obvious sign of this is when we recognize grammatical errors according to rules of a dialect. For example, an English sentence needs the following to be considered complete:

  • singular or plural noun
  • a verb 'is' or 'are'
  • an adjective
  • a punctuation mark at the end

Choose and click on the sentences which are grammatically correct.

Apples are red.

Banana are yellow

Leaves are growing

Trees is tall.

Since the verb 'is' may only be used with singular nouns, and 'are' with plural nouns, grammatical errors occur when they are incorrectly used. Moreover, missing the ending punctuation is also an error!

2. How does a parser find syntax errors?

Similar to how the English language has spelling and grammar rules, parsers make sense of a program by knowing the language, its alphabet, and its grammar. Context-free grammars are built by defining different elements then combining them through substitution. The TutorialsPoint website gives an overview of how context-free grammars are generated and parsed.

The English language

The previous section's sample sentences can be summarized by this context-free grammar.

Let a CFG {N,T,P,S} be
N = {S}
T = {pluralnoun, singularnoun, pluralverb, singularverb, adjective}
Starting symbol = S
P = S → pluralnounAadjective | singularnounBadjective | ε
A = pluralverb | ε
B = singularverb | ε

This generates and accepts productions (sentences) that are grammatically correct, such as:

  • pluralnoun + pluralverb + adjective
  • singularnoun + singularverb + adjective

3. Finding and defining language patterns

Most humans can instantly classify words as either singular or plural. On the other hand, computers can only recognize elements of a language by looking at each character and deciding what it represents. For example, 3 digits in a row corresponds to a number and 1 or more characters together makes up a word. Regular expressions are sequences of characters that comprise a pattern. RegexOne walks you through the process of defining different patterns, in terms of RegEx.

Calculator exercise

For the rest of this tutorial, you will be creating a parser for inputs to the calculator on the right. Take note of the characters that get displayed on the screen:

  • Numbers 0 1 2 3 4 5 6 7 8 9
  • Operators + - * /

With your knowledge of RegEx, try to think of how you would represent numbers and operators.

Google Calculator

4. Express grammar in BNF form

After figuring out the grammar of the language, it needs to be written in a format that computers can read. The Backus-Naur notation is most commonly used to express the structure of context-free grammars. Examples of how to generate BNF grammars from a language’s accepted words are in this webpage.

Calculator BNF Challenge

To define a grammar, you need a RegEx AND the name of the pattern found. For example, if a number is found, return 'NUMBER' to identify it.

The text editor on the right has 2 columns for the regex and the type of Calculator's inputs (given). Using the first row as an example, fill in RegEx patterns for these in the first column:

  • Numbers 0 1 2 3 4 5 6 7 8 9
  • Operators + - * /

5. Precedence and operations

After classifying parts of the input, parsers need to decide which operators to execute first. You may remember the BEDMAS sequence from elementary school, where arithmetic operations are evaluated starting from brackets, exponents, division, multiplication, addition then subtraction. This video goes over the basic principles of operator precedence parsing.

Operator Precedence in BNF explained

The "left" keyword tells the parser to evaluate operations to the left of the operands first when they are found. + - and * / precedences are declared separately in that specific order to tell the parser to keep searching for div/mult operators to the left of plus/minus, and evaluate those first. Try to read through the BNF code for this and connect it to the explanation! What would happen if the order of declaration is reversed?

Calculator Expressions

Finally you need to tell the parser what to calculate when specific operators are found. The evaluation for + and NUMBER are given. Using that as a pattern, fill in how the parser should evaluate the - * / operations.

6. Generate a parser using Jison

The Jison webtool reads a BNF grammar, generates a parser, and lets users write sample programs written in the language.

Running and testing the calculator

The 2 previous activities focused on defining a calculator's acceptable expressions in BNF form.

To test if your grammar actually works, follow these steps:

  1. Click the text box on the right and copy its contents. The text contains all the code you have written during this tutorial
  2. Go to the Jison tool (link above), replace their code with your own grammar code, generate a parser
  3. Start parsing expressions! Try 1 + 2 and 1 * 2 + 3 for starters.

7. Debugging and fixing errors

It is a well-known fact that programming is about 30% writing and 70% debugging. Sometimes parsers don’t work when the grammar is not well-defined. This visual grammar debugger made by the creators of Jison can be used to verify that your BNF grammar properly describes your language, and if it has syntax errors.

If you're not sure where to start looking for errors, ask yourself these questions :

If your generated parser is not working as expected, your final challenge is to scroll through the pieces of BNF code you have written and find your bugs. Good luck! :)

typing cat

Congrats on making it this far!

I hope you learned something new today! Feel free to scroll back to any section of this activity to review what you did, or view the calculator parser solution.

Here are some useful links:

Calculator Grammar Parser Solution

You may copy and paste this into the Jison parser generator.