文章

用Go编写一个解释器 Write an interpreter in go

THIS POST IS WORK-IN-PROGRESS

I am using this book “Writing An Interpreter In Go” by Thorsten Ball. You can but this book at https://interpreterbook.com/. In the meantime, I am taking this free course at edx.org.

Test Driven Development

This book advocates a TDD approach. Some tests could be tedious to write, so I copy them from the book. And I wrote tests to reproduce bugs and verify implementation details. testing package provides very limited functionality. You have to write your assertions explicitly, like

1
2
3
4
peek, _ := lex.peekChar()
if peek != "=" {
  t.Fatalf("expected=%q, got=%q", "=", peek)
}

It would be nice to have

1
expect(peek).toEqual("=");

I will find a library to do that for me.

Lexer

Lexer and parser have a very common behavior. Both of them will peek one character/token then decide which concrete lexer/tokenizer to use. Some tokens may start with the same character. For example, let and latitude both starts with character l but they are keyword and identifier respectively. So when character l is encountered a lexer would try to use a sequence of “sub-lexers” according to some predefined priority. When one “sub-lexer” fails, it tries the next one. An example is given below.

1
2
3
4
5
6
7
8
9
10
11
word := lexer.eatWord()

t, err = lexer.tryKeyword(word)
if err == nil {
    return t, err
}

t, err = lexer.tryIdentifier(word)
if err == nil {
    return t, err
}

Parser

A parser parses tokens into abstract syntax trees. Parsing is one of the most well-understood branches of computer science and really smart people have already invested a lot of time into the problems of parsing.

I will implement a “Pratt parser” or “top down operator precedence parser”

from regex to NFA

simple ones

The following content is covered in section 04-03 of that edx course

For regex literal or epsilon moves

stateDiagram
direction LR
[*]-->[*]: regex literal or epsilon

For AB

stateDiagram
direction LR
[*]-->A: epsilon
A-->B: epsilon
B-->[*]: epsilon

For A + B. In regex (and in C# spec below), this means A|B.

stateDiagram
direction LR
[*]-->A: epsilon
[*]-->B: epsilon
A-->[*]: epsilon
B-->[*]: epsilon

For A*

flowchart LR
stt[start]--epsilon-->Astt
Astt[enter A]-.->Aend[exit A]
Aend--epsilon-->fin[end]
Aend--epsilon-->stt

Let’s try implementing C# spec for real literals

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
fragment Decimal_Digit
    : '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
    ;

fragment Decorated_Decimal_Digit
    : '_'* Decimal_Digit
    ;

Real_Literal
    : Decimal_Digit Decorated_Decimal_Digit* '.'
    | Decimal_Digit Decorated_Decimal_Digit* Exponent_Part? Real_Type_Suffix?
    | '.' Decimal_Digit Decorated_Decimal_Digit* Exponent_Part? Real_Type_Suffix?
    | Decimal_Digit Decorated_Decimal_Digit* Exponent_Part Real_Type_Suffix?
    | Decimal_Digit Decorated_Decimal_Digit* Real_Type_Suffix
    ;

fragment Exponent_Part
    : ('e' | 'E') Sign? Decimal_Digit Decorated_Decimal_Digit*
    ;

fragment Sign
    : '+' | '-'
    ;

fragment Real_Type_Suffix
    : 'F' | 'f' | 'D' | 'd' | 'M' | 'm'
    ;

error handling

Error occurs when current token is different from what the parser expects. For now, this parser simply adds descriptions of such mismatch to an array.

parse statements

let statement

  1. ensure there is a LET token
  2. try parse an identifier
  3. ensure there is an equal sign
  4. try parse an expression
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
func (parser *Parser) tryLetStatement() (ast.LetStatement, error) {
	var err error
	stmt := ast.LetStatement{}

	stmt.Token = parser.currentToken
	parser.eatToken()

	stmt.Name, err = parser.tryIdentExpr()
	if err != nil {
		return stmt, err
	}
	parser.eatToken()

	err = parser.tryAssignOp()
	if err != nil {
		return stmt, err
	}
	parser.eatToken()

	// parse an expression, not implemented yet.

	return stmt, nil
}

Steps above closely resemble a let statement literal.

return statement

  1. ensure theres is a RETURN token
  2. try parse an expression
1
2
3
4
5
6
7
8
9
10
func (parser *Parser) tryReturnStatement() (ast.ReturnStatement, error) {
	stmt := ast.ReturnStatement{}

	stmt.Token = parser.currentToken
	parser.eatToken()

	// parse an expression, not implemented yet.

	return stmt, nil
}

parse expressions

identifier expression

To parse this expression, simply eat one Token. Its value is that token’s literal.

integer literal expression

To parse this expression, simply eat one Token. Then call strconv.Atoi to get its numerical value.

1
2
3
4
type IntegerLiteral struct {
	Token token.Token
	Value int
}

prefix operator expression

prefix_expr := prefix_op expr

本文由作者按照 CC BY 4.0 进行授权