How is it going, guys? Hope everyone’s been reading their articles lately because this one is going to be an important one. Today we are going to learn all about Golang ASTs, how they are implemented in Go, and how we can extract information from it. So let’s get started. I’ll be using the Atom Editor today, so you can do the same.
What are ASTs?
According to Wikipedia,
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code.
Perfect. So they are like binary trees, with child nodes branching out from the root node, and each node has ONLY two children. So our programming code, be it Go, C, Java, Python or any other language(Erlang, anyone?), it can be structurally divided and represented by a sequence of tokens. For example when we write
var a := 3;
it contains 5 tokens : var , a , : = , 3 and ;
These are the smallest building blocks of any programming language. There are five types of tokens.
- Keyword
- Identifier
- Operator
- Separator
- Literal
A binary expression tree is also quite important to the concept of ast. For example, if you have an expression:
x + ( 3 * y) = 10
Then a binaryExpr tree will stem out like this:
[=]
[+] 10
[*] x
y 3
Similarly, the tree spans out from the root, which is always the name of the package. Abstract syntax trees are data structures widely used in compilers to represent the structure of program code, and supports deeper program analysis.
Golang AST
Package ast declares the types used to represent syntax trees for Go packages. It is imported (along with other important packages) as:
import (
"fmt"
"go/ast"
"go/parser"
"go/token"
"log"
"os"
)
Then we start the main function, and within it the first thing we need to define is the token file set:
fs := token.NewFileSet()
Now, we create an instance of parser.parseFile() using fs as its fileset, and handle the errors(as we should):
f, err := parser.ParseFile(fs, "", "package main\nvar a int = 3", parser.AllErrors)
if err != nil {
log.Fatal(err)
}
ParseFile has 4 arguments:
- fileset, which we gave fs
- the filename, which is empty since we are typing it directly
- src interface{} – this is where we write a small .go program separated by \n
- mode – here we use parseAllErrors… but you can also choose comments or specific things. It can also be ignored with 0
Now that creates the tree, so we can use Printf to print the tree:
fmt.Printf("%#v",f)
which gives us: &ast.File{Doc:(*ast.CommentGroup)(nil), Package:1, Name:(*ast.Ident)(0xc00011e000), Decls:[]ast.Decl{(*ast.GenDecl)(0xc000108080)}, Scope:(*ast.Scope)(0xc00010a0a0), Imports:[]*ast.ImportSpec(nil), Unresolved:[]*ast.Ident{(*ast.Ident)(0xc00011e040)}, Comments:[]*ast.CommentGroup(nil)}
This is not very readable, so we can use a third party repo from github called spew. First install it using go get -u github.com/davecgh/go-spew/spew in terminal. Then:
import "github.com/davecgh/go-spew/spew"
func main(){
...
spew.Dump(f) //replace the Fprintf
}
and we get a nice structure:
(*ast.File)(0xc0000e0000)({
Doc: (*ast.CommentGroup)(<nil>),
Package: (token.Pos) 1,
Name: (*ast.Ident)(0xc0000ac120)(main),
Decls: ([]ast.Decl) (len=1 cap=1) {
(*ast.GenDecl)(0xc0000ba080)({
Doc: (*ast.CommentGroup)(<nil>),
TokPos: (token.Pos) 14,
Tok: (token.Token) var,
Lparen: (token.Pos) 0,
Specs: ([]ast.Spec) (len=1 cap=1) {
(*ast.ValueSpec)(0xc0000bc190)({
Doc: (*ast.CommentGroup)(<nil>),
Names: ([]*ast.Ident) (len=1 cap=1) {
(*ast.Ident)(0xc0000ac140)(x)
},
Type: (*ast.Ident)(0xc0000ac160)(int),
Values: ([]ast.Expr) (len=1 cap=1) {
(*ast.BasicLit)(0xc0000ac180)({
ValuePos: (token.Pos) 26,
Kind: (token.Token) INT,
Value: (string) (len=1) "9"
})
},
Comment: (*ast.CommentGroup)(<nil>)
})
},
Rparen: (token.Pos) 0
})
},
Scope: (*ast.Scope)(0xc0000962e0)(scope 0xc0000962e0 {
var x
}
),
Imports: ([]*ast.ImportSpec) <nil>,
Unresolved: ([]*ast.Ident) (len=1 cap=1) {
(*ast.Ident)(0xc0000ac160)(int)
},
Comments: ([]*ast.CommentGroup) <nil>
})
Now, we can make this better. So we must define a visitor that can traverse the tree nodes and extract information. The visitor is an interface in ast.Walk with a single method Visit. So we need to follow the documentation.
func main() {
fs := token.NewFileSet()
f, err := parser.ParseFile(fs, "", "package main\nvar x int = 9\nfunc main(){b:=2}", parser.AllErrors)
if err != nil {
log.Fatal(err)
}
var v visitor
ast.Walk(v, f)
//fmt.Printf("%#v",f)
}
Now that we added a visitor and called the ast.Walk, our main function is done. So now we define visitor outside:
type visitor int
func (v visitor) Visit(n ast.Node) ast.Visitor {
if n == nil {
return nil
}
fmt.Printf("%s%T\n", strings.Repeat("\t", int(v)), n)
fmt.Printf("%s", v)
return v + 1
}
What this will do is extract the node names, and strings.Repeat will add tabs in front of the lines. Then we have a beautiful output like this: