用Go编写一个解释器 Write an interpreter in go
THIS POST IS WORK-IN-PROGRESS
I am using this book “Writing An Interpreter In Go” by Thorsten Ball. You can but this book at https://interpreterbook.com/. In the meantime, I am taking this free course at edx.org.
Test Driven Development
This book advocates a TDD approach. Some tests could be tedious to write, so I copy them from the book. And I wrote tests to reproduce bugs and verify implementation details. testing
package provides very limited functionality. You have to write your assertions explicitly, like
1
2
3
4
peek, _ := lex.peekChar()
if peek != "=" {
t.Fatalf("expected=%q, got=%q", "=", peek)
}
It would be nice to have
1
expect(peek).toEqual("=");
I will find a library to do that for me.
Lexer
Lexer and parser have a very common behavior. Both of them will peek one character/token then decide which concrete lexer/tokenizer to use. Some tokens may start with the same character. For example, let
and latitude
both starts with character l
but they are keyword and identifier respectively. So when character l
is encountered a lexer would try to use a sequence of “sub-lexers” according to some predefined priority. When one “sub-lexer” fails, it tries the next one. An example is given below.
1
2
3
4
5
6
7
8
9
10
11
word := lexer.eatWord()
t, err = lexer.tryKeyword(word)
if err == nil {
return t, err
}
t, err = lexer.tryIdentifier(word)
if err == nil {
return t, err
}
Parser
A parser parses tokens into abstract syntax trees. Parsing is one of the most well-understood branches of computer science and really smart people have already invested a lot of time into the problems of parsing.
I will implement a “Pratt parser” or “top down operator precedence parser”
from regex to NFA
simple ones
The following content is covered in section 04-03 of that edx course
For regex literal or epsilon moves
stateDiagram
direction LR
[*]-->[*]: regex literal or epsilon
For AB
stateDiagram
direction LR
[*]-->A: epsilon
A-->B: epsilon
B-->[*]: epsilon
For A + B
. In regex (and in C# spec below), this means A|B
.
stateDiagram
direction LR
[*]-->A: epsilon
[*]-->B: epsilon
A-->[*]: epsilon
B-->[*]: epsilon
For A*
flowchart LR
stt[start]--epsilon-->Astt
Astt[enter A]-.->Aend[exit A]
Aend--epsilon-->fin[end]
Aend--epsilon-->stt
Let’s try implementing C# spec for real literals
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
fragment Decimal_Digit
: '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
;
fragment Decorated_Decimal_Digit
: '_'* Decimal_Digit
;
Real_Literal
: Decimal_Digit Decorated_Decimal_Digit* '.'
| Decimal_Digit Decorated_Decimal_Digit* Exponent_Part? Real_Type_Suffix?
| '.' Decimal_Digit Decorated_Decimal_Digit* Exponent_Part? Real_Type_Suffix?
| Decimal_Digit Decorated_Decimal_Digit* Exponent_Part Real_Type_Suffix?
| Decimal_Digit Decorated_Decimal_Digit* Real_Type_Suffix
;
fragment Exponent_Part
: ('e' | 'E') Sign? Decimal_Digit Decorated_Decimal_Digit*
;
fragment Sign
: '+' | '-'
;
fragment Real_Type_Suffix
: 'F' | 'f' | 'D' | 'd' | 'M' | 'm'
;
error handling
Error occurs when current token is different from what the parser expects. For now, this parser simply adds descriptions of such mismatch to an array.
parse statements
let statement
- ensure there is a LET token
- try parse an identifier
- ensure there is an equal sign
- try parse an expression
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
func (parser *Parser) tryLetStatement() (ast.LetStatement, error) {
var err error
stmt := ast.LetStatement{}
stmt.Token = parser.currentToken
parser.eatToken()
stmt.Name, err = parser.tryIdentExpr()
if err != nil {
return stmt, err
}
parser.eatToken()
err = parser.tryAssignOp()
if err != nil {
return stmt, err
}
parser.eatToken()
// parse an expression, not implemented yet.
return stmt, nil
}
Steps above closely resemble a let statement literal.
return statement
- ensure theres is a RETURN token
- try parse an expression
1
2
3
4
5
6
7
8
9
10
func (parser *Parser) tryReturnStatement() (ast.ReturnStatement, error) {
stmt := ast.ReturnStatement{}
stmt.Token = parser.currentToken
parser.eatToken()
// parse an expression, not implemented yet.
return stmt, nil
}
parse expressions
identifier expression
To parse this expression, simply eat one Token. Its value is that token’s literal.
integer literal expression
To parse this expression, simply eat one Token. Then call strconv.Atoi
to get its numerical value.
1
2
3
4
type IntegerLiteral struct {
Token token.Token
Value int
}
prefix operator expression
prefix_expr := prefix_op expr