Building an Interpreter in Go - Chapter 1: Understanding the Basics
Overview
Starting the journey of building an interpreter is like learning to be a translator between two worlds - the world of human-readable code and the world of machine instructions. This first chapter lays the foundation by explaining what an interpreter is, how it differs from a compiler, and the basic components we'll be building.
Key Concepts Learned
What is an Interpreter?
An interpreter is a program that directly executes instructions written in a programming language without requiring them to be compiled into machine language first. Think of it as a real-time translator who reads your code line by line and performs the actions immediately.
Components of an Interpreter
Lexer (Tokenizer)
Breaks down source code into tokens
Example:
let x = 5;
becomes [LET, IDENT("x"), ASSIGN, INT(5), SEMICOLON] (a runnable sketch follows this list)
Acts as the "eyes" of our interpreter
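To make that concrete, here is a minimal sketch of driving the finished lexer over that one line. It assumes the lexer package lives at interpreter/src/monkey/lexer, mirroring the token import path used later in this post; the full lexer code appears in the implementation section below.
package main

import (
    "fmt"

    "interpreter/src/monkey/lexer"
    "interpreter/src/monkey/token"
)

func main() {
    l := lexer.New("let x = 5;")
    // Pull tokens one at a time until the lexer reports end of input.
    for tok := l.NextToken(); tok.Type != token.EOF; tok = l.NextToken() {
        fmt.Printf("%-10s %q\n", tok.Type, tok.Literal)
    }
}
Each token's type and literal are printed on their own line, in the order shown above.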
Parser
Transforms tokens into an Abstract Syntax Tree (AST)
Understands the grammar of our language
Catches syntax errors
Abstract Syntax Tree (AST)
Tree representation of the code's structure
Each node represents an operation or value
Example (a rough Go sketch of these nodes follows this list):
Assignment
├── Identifier(x)
└── Value(5)
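As a Go sketch (my own illustration with hypothetical type names; the real node types arrive with the parser in a later chapter), the tree above could be represented like this:
package ast

// Hypothetical node types, only to show the shape of the tree above.
type Identifier struct {
    Name string
}

type IntegerLiteral struct {
    Value int64
}

type LetStatement struct {
    Name  *Identifier     // Identifier(x)
    Value *IntegerLiteral // Value(5)
}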
Code Implementation Highlights
1. Token Structure in Go
package token

type TokenType string

type Token struct {
    Type    TokenType
    Literal string
}

const (
    ILLEGAL = "ILLEGAL" // a character the lexer doesn't know about
    EOF     = "EOF"     // end of input

    // Identifiers and literals
    IDENT = "IDENT"
    INT   = "INT"

    // Operators
    ASSIGN = "="
    PLUS   = "+"

    // Delimiters
    COMMA     = ","
    SEMICOLON = ";"
    LPAREN    = "("
    RPAREN    = ")"
    LBRACE    = "{"
    RBRACE    = "}"

    // Keywords
    FUNCTION = "fn"
    LET      = "let"
)

// keywords maps reserved words to their token types; everything else
// is treated as a plain identifier.
var keywords = map[string]TokenType{
    "fn":  FUNCTION,
    "let": LET,
}

// LookupIdent returns the keyword token type if ident is a reserved
// word, and IDENT otherwise.
func LookupIdent(ident string) TokenType {
    if tok, ok := keywords[ident]; ok {
        return tok
    }
    return IDENT
}
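A quick illustration of how the keyword table behaves (assuming the import path used in the test below):
package main

import (
    "fmt"

    "interpreter/src/monkey/token"
)

func main() {
    fmt.Println(token.LookupIdent("let"))   // the LET token type ("let")
    fmt.Println(token.LookupIdent("fn"))    // the FUNCTION token type ("fn")
    fmt.Println(token.LookupIdent("count")) // IDENT, since "count" is not a keyword
}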
2. First Test Case
package lexer

import (
    "testing"

    "interpreter/src/monkey/token"
)

func TestNextToken(t *testing.T) {
    input := `
let five = 5;
let ten = 10;
let add = fn(x, y) {
x + y;
};
let result = add(five, ten);
`
    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{
        {token.LET, "let"},
        {token.IDENT, "five"},
        {token.ASSIGN, "="},
        {token.INT, "5"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "ten"},
        {token.ASSIGN, "="},
        {token.INT, "10"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "add"},
        {token.ASSIGN, "="},
        {token.FUNCTION, "fn"},
        {token.LPAREN, "("},
        {token.IDENT, "x"},
        {token.COMMA, ","},
        {token.IDENT, "y"},
        {token.RPAREN, ")"},
        {token.LBRACE, "{"},
        {token.IDENT, "x"},
        {token.PLUS, "+"},
        {token.IDENT, "y"},
        {token.SEMICOLON, ";"},
        {token.RBRACE, "}"},
        {token.SEMICOLON, ";"},
        {token.LET, "let"},
        {token.IDENT, "result"},
        {token.ASSIGN, "="},
        {token.IDENT, "add"},
        {token.LPAREN, "("},
        {token.IDENT, "five"},
        {token.COMMA, ","},
        {token.IDENT, "ten"},
        {token.RPAREN, ")"},
        {token.SEMICOLON, ";"},
        {token.EOF, ""},
    }

    l := New(input)
    for i, tt := range tests {
        tok := l.NextToken()
        if tok.Type != tt.expectedType {
            t.Fatalf("tests[%d] - tokentype wrong. expected=%q got=%q", i, tt.expectedType, tok.Type)
        }
        if tok.Literal != tt.expectedLiteral {
            t.Fatalf("tests[%d] - tokenliteral wrong. expected=%q got=%q", i, tt.expectedLiteral, tok.Literal)
        }
    }
}
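With the lexer below in place, running go test ./... from the module root (or go test -v inside the lexer package for verbose output) executes this table-driven test; the t.Fatalf calls stop at the first token that does not match.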
3. The Lexer
package lexer

import (
    "interpreter/src/monkey/token"
)

type Lexer struct {
    input        string
    position     int  // current position in input (points to the char in ch)
    readPosition int  // next reading position in input (after the current char)
    ch           byte // current char under examination
}
// New returns a Lexer primed with the first character of input.
func New(input string) *Lexer {
    l := &Lexer{input: input}
    l.readChar()
    return l
}
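// readChar advances the lexer by one character: position points at the
// character currently in ch, readPosition points at the next character to
// read, and ch == 0 marks the end of the input.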
func (l *Lexer) readChar() {
    if l.readPosition >= len(l.input) {
        l.ch = 0
    } else {
        l.ch = l.input[l.readPosition]
    }
    l.position = l.readPosition
    l.readPosition += 1
}
func (l *Lexer) NextToken() token.Token {
    var tok token.Token

    l.skipWhitespace() // skip spaces, tabs and newlines between tokens

    switch l.ch {
    case '=':
        tok = newToken(token.ASSIGN, l.ch)
    case '+':
        tok = newToken(token.PLUS, l.ch)
    case '(':
        tok = newToken(token.LPAREN, l.ch)
    case ')':
        tok = newToken(token.RPAREN, l.ch)
    case '{':
        tok = newToken(token.LBRACE, l.ch)
    case '}':
        tok = newToken(token.RBRACE, l.ch)
    case ',':
        tok = newToken(token.COMMA, l.ch)
    case ';':
        tok = newToken(token.SEMICOLON, l.ch)
    case 0:
        tok.Literal = ""
        tok.Type = token.EOF
    default:
        if isLetter(l.ch) {
            tok.Literal = l.readIdentifier()
            tok.Type = token.LookupIdent(tok.Literal)
            return tok // readIdentifier already advanced past the word
        } else if isDigit(l.ch) {
            tok.Type = token.INT
            tok.Literal = l.readNumber()
            return tok // readNumber already advanced past the digits
        } else {
            tok = newToken(token.ILLEGAL, l.ch)
        }
    }

    l.readChar()
    return tok
}
// skipWhitespace consumes spaces, tabs, newlines and carriage returns,
// since they only separate tokens and carry no meaning of their own.
func (l *Lexer) skipWhitespace() {
    for l.ch == ' ' || l.ch == '\t' || l.ch == '\n' || l.ch == '\r' {
        l.readChar()
    }
}
// readIdentifier reads a run of letters/underscores and returns it.
func (l *Lexer) readIdentifier() string {
    position := l.position
    for isLetter(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}

// readNumber reads a run of digits and returns it.
func (l *Lexer) readNumber() string {
    position := l.position
    for isDigit(l.ch) {
        l.readChar()
    }
    return l.input[position:l.position]
}

func isLetter(ch byte) bool {
    return 'a' <= ch && ch <= 'z' || 'A' <= ch && ch <= 'Z' || ch == '_'
}

func isDigit(ch byte) bool {
    return '0' <= ch && ch <= '9'
}

func newToken(tokenType token.TokenType, ch byte) token.Token {
    return token.Token{Type: tokenType, Literal: string(ch)}
}
Challenges Faced
Understanding Lexical Analysis
Challenge: Grasping how to break down source code into meaningful tokens
Solution: Created visual diagrams of the tokenization process
Learning: The importance of handling edge cases in lexical analysis (a sketch of such a test follows)
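For example, a small extra check in the same table-driven style, dropped into the same test file as TestNextToken (this one is my own sketch, not part of the repository), covers an underscore identifier and a character the lexer does not yet recognize:
func TestEdgeCases(t *testing.T) {
    // foo_bar exercises the underscore rule in isLetter; '?' is not
    // handled by any case in NextToken, so it comes back as ILLEGAL.
    input := `let foo_bar = 10;?`

    tests := []struct {
        expectedType    token.TokenType
        expectedLiteral string
    }{
        {token.LET, "let"},
        {token.IDENT, "foo_bar"},
        {token.ASSIGN, "="},
        {token.INT, "10"},
        {token.SEMICOLON, ";"},
        {token.ILLEGAL, "?"},
        {token.EOF, ""},
    }

    l := New(input)
    for i, tt := range tests {
        tok := l.NextToken()
        if tok.Type != tt.expectedType {
            t.Fatalf("tests[%d] - tokentype wrong. expected=%q got=%q", i, tt.expectedType, tok.Type)
        }
        if tok.Literal != tt.expectedLiteral {
            t.Fatalf("tests[%d] - tokenliteral wrong. expected=%q got=%q", i, tt.expectedLiteral, tok.Literal)
        }
    }
}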
Go's Type System
Challenge: Adapting to Go's strict typing after working with dynamic languages
Solution: Studied Go's type system documentation and examples
Learning: How Go's type system helps catch errors at compile time (small example below)
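One concrete illustration (again my own snippet): because TokenType is a distinct named type, a plain string variable cannot be passed where a TokenType is expected without an explicit conversion.
package main

import "interpreter/src/monkey/token"

func expect(t token.TokenType) {}

func main() {
    expect(token.LET) // fine: an untyped string constant converts implicitly

    name := "let" // name has type string
    // expect(name)              // compile error: string is not token.TokenType
    expect(token.TokenType(name)) // an explicit conversion is required
}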
Go Language Features Learned
Custom type definitions using the type keyword
Constants and iota (a short sketch follows this list)
Struct composition
Go's testing framework
String manipulation in Go
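iota does not appear in the lexer above; here is a tiny standalone sketch of what it does:
package main

import "fmt"

// Within a const block, iota auto-increments from 0 on each line.
const (
    North = iota // 0
    East         // 1
    South        // 2
    West         // 3
)

func main() {
    fmt.Println(North, East, South, West) // 0 1 2 3
}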
Project Progress
[✓] Understanding Interpreters vs Compilers
[✓] Basic Lexer Implementation
[✓] Token Structure
[✓] Handling of whitespace, keywords (let, fn), identifiers, integers, the = and + operators, and delimiters
[ ] Parser Implementation
[ ] AST Construction
Code Repository
GitHub Repository: https://github.com/AkshayContributes/interpreter
Branch: chapter_1
Pull Request: https://github.com/AkshayContributes/interpreter/pull/1
Resources
Writing an Interpreter in Go by Thorsten Ball
Next Steps
Implement more token types
Handle comments
Begin parser implementation
Study AST structures
Personal Notes
Time spent on this chapter: 6 hours
Initial confidence level: Beginner
Current confidence level: Intermediate
Key Insights
The lexer is simpler than I initially thought - it's just pattern matching
Test-Driven Development is incredibly helpful when building an interpreter
Go's strict typing helps catch potential issues early
Tips for Others
Draw out the token structure before coding
Write tests for edge cases first
Keep the lexer simple - resist the urge to make it too clever