Writing a Decaf Syntax Parser

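Decaf is the language implemented in the SFU Compiler course. The previous post covered writing the Decaf lexer; this one covers the second stage of the Decaf compiler, syntax analysis. The parser checks the program's syntax and produces an abstract syntax tree (AST). The AST is a high-level representation of the program structure. Once we have it we no longer need the source code, since the AST can be regarded as an abstract equivalent of the source. Parsing rests on a good deal of theory. In earlier posts I briefly summarized my understanding of LR parsers (I have not yet written up SLR(1), LR(1), or LL(1)). Parser theory is very abstract and I only understand the basics, but what I do understand is enough to write a parser with Yacc. The two biggest problems I ran into while writing the Decaf parser were: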

  1. Shift/reduce conflicts
  2. Designing the data structures for the abstract syntax tree

Decaf language specification:

ASDL

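The Decaf abstract syntax tree is defined in the Zephyr Abstract Syntax Definition Language (ASDL).

ASDL is used to define the data structures that make up an abstract syntax tree.

The Decaf AST is defined as follows (Decaf.asdl):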

-- Decaf abstract syntax tree definition

-- The specification of the AST nodes is specified using the Zephyr
-- Abstract Syntax Definition Language (ASDL) [Wang97]

-- The abstract syntax tree (AST) is a high-level representation
-- of the program structure without the necessity of containing the
-- source code; it can be thought of as an abstract representation of
-- the source code.

-- Modifiers on the argument type specify the number of values
-- needed; '?' means it is optional, '*' means 0 or more (with commas),
-- no modifier means only one value for the argument and it is required.

-- For * print a singleton for one element, or multiple
-- elements separated by commas, or None for the zero element.

-- ASDL's four builtin types are identifier, int, string, object

module Decaf
{
prog = Program(extern* extern_list, package body)

extern = ExternFunction(identifier name, method_type return_type, extern_type* typelist)

decaf_type = IntType | BoolType

method_type = VoidType | decaf_type

extern_type = VarDef(StringType) | VarDef(decaf_type)

package = Package(identifier name, field_decl* field_list, method_decl* method_list)

field_decl = FieldDecl(identifier name, decaf_type type, field_size size)
| AssignGlobalVar(identifier name, decaf_type type, constant value)

field_size = Scalar | Array(int array_size)

method_decl = Method(identifier name, method_type return_type, typed_symbol* param_list, method_block block)

typed_symbol = VarDef(identifier name, decaf_type type)

method_block = MethodBlock(typed_symbol* var_decl_list, statement* statement_list)

block = Block(typed_symbol* var_decl_list, statement* statement_list)

statement = assign
| method_call
| IfStmt(expr condition, block if_block, block? else_block)
| WhileStmt(expr condition, block while_block)
| ForStmt(assign* pre_assign_list, expr condition, assign* loop_assign_list, block for_block)
| ReturnStmt(expr? return_value)
| BreakStmt
| ContinueStmt
| block

assign = AssignVar(identifier name, expr value)
| AssignArrayLoc(identifier name, expr index, expr value)

method_call = MethodCall(identifier name, method_arg* method_arg_list)

method_arg = StringConstant(string value)
| expr

expr = rvalue
| method_call
| constant
| BinaryExpr(binary_operator op, expr left_value, expr right_value)
| UnaryExpr(unary_operator op, expr value)

constant = NumberExpr(int value)
| BoolExpr(bool value)

rvalue = VariableExpr(identifier name)
| ArrayLocExpr(identifier name, expr index)

bool = True | False

binary_operator = Plus | Minus | Mult | Div | Leftshift | Rightshift | Mod | Lt | Gt | Leq | Geq | Eq | Neq | And | Or

unary_operator = UnaryMinus | Not
}
-- References
-- [Wang97] Daniel C. Wang, Andrew W. Appel, Jeff L. Korn, and Chris
-- S. Serra. The Zephyr Abstract Syntax Description Language. In
-- Proceedings of the Conference on Domain-Specific Languages, pp.
-- 213--227, 1997.

For example, Program and ExternFunction are both data structures in the Decaf AST. Their C++ definitions are:

// Program(extern* extern_list, package body)
class ProgramAST : public decafAST {
decafStmtList *ExternList;
PackageAST *PackageDef;
public:
ProgramAST(decafStmtList *externs, PackageAST *c) : ExternList(externs), PackageDef(c) {}
~ProgramAST() {
if (ExternList != NULL) { delete ExternList; }
if (PackageDef != NULL) { delete PackageDef; }
}
string str() { return string("Program") + "(" + getString(ExternList) + "," + getString(PackageDef) + ")"; }
};


// ExternFunction(identifier name, method_type return_type, extern_type* typelist)
class ExternFunctionAST : public decafAST {
string Name;
TypeAST * ReturnType;
VarDefAST * VarList;

public:
ExternFunctionAST(string name, TypeAST * returntype, VarDefAST * varlist): Name(name), ReturnType(returntype), VarList(varlist) {}
~ExternFunctionAST() {
if( VarList ) delete VarList;
if( ReturnType ) delete ReturnType;
}

string str() {
return string("ExternFunction") + "(" + Name + "," + getString(ReturnType) + "," + getString(VarList) + ")";
}
};

Here string str() is a virtual function defined in the decafAST base class and used to serialize the AST.
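To make the role of str() concrete, here is a minimal sketch of the same pattern outside the real code base. The names Node, toText, VariableExpr and ReturnStmt are hypothetical stand-ins (the post's own classes are decafAST, getString and so on), and std::unique_ptr replaces the manual delete calls purely to keep the example short.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature of the decafAST hierarchy; names are illustrative.
struct Node {
    virtual ~Node() = default;
    virtual std::string str() const = 0;  // every node serializes itself
};

// Null-safe wrapper in the spirit of getString(): a missing child prints
// as "None".
inline std::string toText(const Node *n) { return n ? n->str() : "None"; }

// Leaf node: VariableExpr(identifier name)
struct VariableExpr : Node {
    std::string name;
    explicit VariableExpr(std::string n) : name(std::move(n)) {
        assert(!name.empty());  // an identifier is never empty
    }
    std::string str() const override { return "VariableExpr(" + name + ")"; }
};

// Composite node: ReturnStmt(expr? return_value), the operand is optional.
struct ReturnStmt : Node {
    std::unique_ptr<Node> value;
    explicit ReturnStmt(std::unique_ptr<Node> v) : value(std::move(v)) {}
    std::string str() const override {
        return "ReturnStmt(" + toText(value.get()) + ")";
    }
};
```

Under these assumptions, a ReturnStmt built around VariableExpr("x") serializes to ReturnStmt(VariableExpr(x)), and one built with a null operand serializes to ReturnStmt(None).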

Given the following Decaf program:

extern func print_int(int,int) void;
package QuickSort {
var x int;
func main() void {
}
}

the serialized result is:

Program(ExternFunction(print_int,VoidType,VarDef(IntType,IntType)),Package(QuickSort,FieldDecl(x,IntType,Scalar),Method(main,VoidType,None,MethodBlock(None,None))))
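The None entries in this output come from empty lists: Decaf.asdl says a starred field prints as None when it has no elements and as comma-separated items otherwise. A rough sketch of that rule follows, with joinOrNone and node as hypothetical helper names that work on already-serialized strings rather than on AST nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Join already-serialized children with commas; an empty list prints as
// "None", following the convention stated in Decaf.asdl.
std::string joinOrNone(const std::vector<std::string> &items) {
    if (items.empty()) return "None";
    std::string out;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (i > 0) out += ",";
        out += items[i];
    }
    return out;
}

// Hypothetical helper: wrap a constructor name around its serialized fields.
std::string node(const std::string &name,
                 const std::vector<std::string> &fields) {
    assert(!name.empty());
    return name + "(" + joinOrNone(fields) + ")";
}
```

With these helpers, a method block whose two lists are both empty comes out as MethodBlock(None,None), the same shape as in the output above.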

Decaf grammar definition

Program = Externs package identifier "{" FieldDecls MethodDecls "}" .
Externs = { ExternDefn } .
ExternDefn = extern func identifier "(" [ { ExternType }+, ] ")" MethodType ";" .
FieldDecls = { FieldDecl } .
FieldDecl = var { identifier }+, Type ";" .
FieldDecl = var { identifier }+, ArrayType ";" .
FieldDecl = var identifier Type "=" Constant ";" .
MethodDecls = { MethodDecl } .
MethodDecl = func identifier "(" [ { identifier Type }+, ] ")" MethodType Block .
Block = "{" VarDecls Statements "}" .
VarDecls = { VarDecl } .
VarDecl = var { identifier }+, Type ";" .
Statements = { Statement } .
Statement = Block .
Statement = Assign ";" .
Assign = Lvalue "=" Expr .
Lvalue = identifier | identifier "[" Expr "]" .
Statement = MethodCall ";" .
MethodCall = identifier "(" [ { MethodArg }+, ] ")" .
MethodArg = Expr | string_lit .
Statement = if "(" Expr ")" Block [ else Block ] .
Statement = while "(" Expr ")" Block .
Statement = for "(" { Assign }+, ";" Expr ";" { Assign }+, ")" Block .
Statement = return [ "(" [ Expr ] ")" ] ";" .
Statement = break ";" .
Statement = continue ";" .
Expr = identifier .
Expr = MethodCall .
Expr = Constant .
UnaryOperator = ( UnaryNot | UnaryMinus ) .
UnaryNot = "!" .
UnaryMinus = "-" .
BinaryOperator = ( ArithmeticOperator | BooleanOperator ) .
ArithmeticOperator = ( "+" | "-" | "*" | "/" | "<<" | ">>" | "%" ) .
BooleanOperator = ( "==" | "!=" | "<" | "<=" | ">" | ">=" | "&&" | "||" ) .
Expr = Expr BinaryOperator Expr .
Expr = UnaryOperator Expr .
Expr = "(" Expr ")" .
Expr = identifier "[" Expr "]" .
ExternType = ( string | Type ) .
Type = ( int | bool ) .
MethodType = ( void | Type ) .
BoolConstant = ( true | false ) .
ArrayType = "[" int_lit "]" Type .
Constant = ( int_lit | char_lit | BoolConstant ) .

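decafast.cc is implemented from the Decaf.asdl definition.

decafast.y is implemented from the Decaf grammar definition.

Together, the two files make up a parser that turns Decaf source code into a syntax tree.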
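The grammar rule Expr = Expr BinaryOperator Expr is ambiguous by itself, which is one source of shift/reduce conflicts. In decafast.y the %left declarations tell Yacc how to resolve them, and Yacc applies them inside its generated LR tables. The sketch below is a different technique, hand-written precedence climbing, shown only to illustrate what those precedence levels and left associativity mean for the resulting tree. It assumes pre-split tokens, integer operands only, a subset of the operators, and that BinaryExpr serializes as BinaryExpr(op,left,right) per Decaf.asdl. MiniParser, precedenceOf, nameOf and parseTokens are made-up names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Binding power of a binary operator, lowest first, in the same order as
// the %left lines in decafast.y. 0 means "not a binary operator".
int precedenceOf(const std::string &op) {
    if (op == "||") return 1;
    if (op == "&&") return 2;
    if (op == "==" || op == "<") return 3;
    if (op == "+" || op == "-") return 4;
    if (op == "*" || op == "/") return 5;
    return 0;
}

// Operator spelling -> constructor name used in Decaf.asdl.
std::string nameOf(const std::string &op) {
    if (op == "||") return "Or";
    if (op == "&&") return "And";
    if (op == "==") return "Eq";
    if (op == "<") return "Lt";
    if (op == "+") return "Plus";
    if (op == "-") return "Minus";
    if (op == "*") return "Mult";
    return "Div";
}

struct MiniParser {
    std::vector<std::string> toks;
    std::size_t pos;

    explicit MiniParser(std::vector<std::string> t)
        : toks(std::move(t)), pos(0) {}

    // Operands are restricted to integer literals for brevity.
    std::string parsePrimary() {
        assert(pos < toks.size());
        return "NumberExpr(" + toks[pos++] + ")";
    }

    // Keep consuming operators that bind at least as tightly as minPrec.
    // Parsing the right operand with prec + 1 makes every operator
    // left-associative, which is what %left asks Yacc for.
    std::string parseExpr(int minPrec) {
        std::string left = parsePrimary();
        while (pos < toks.size()) {
            const std::string op = toks[pos];
            const int prec = precedenceOf(op);
            if (prec == 0 || prec < minPrec) break;
            ++pos;
            const std::string right = parseExpr(prec + 1);
            left = "BinaryExpr(" + nameOf(op) + "," + left + "," + right + ")";
        }
        return left;
    }
};

std::string parseTokens(const std::vector<std::string> &tokens) {
    MiniParser parser(tokens);
    return parser.parseExpr(1);
}
```

For the tokens 1 + 2 * 3 the multiplication ends up nested inside the addition, and for 8 - 3 - 2 the left subtraction is grouped first.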

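Error handling

A note on error handling: by default flex does not maintain the yylineno variable, so when parsing fails there is no way to report the exact line number. The fix is to enable %option yylineno in the lexer and print yylineno from yyerror: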

%{
extern int yylineno;
%}


%option yylineno

int yyerror(const char *s) {
cerr << yylineno << ":" << s << endl;
return 1;
}
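Conceptually, %option yylineno makes the generated scanner count the newline characters in the text it matches. The following sketch imitates that bookkeeping in plain C++. LineTracker is a hypothetical type, not part of flex, and the real scanner does this inside generated code.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the scanner's line counter.
struct LineTracker {
    int line = 1;  // flex also starts yylineno at 1

    // Call once per matched lexeme: every '\n' inside it advances the line.
    void consume(const std::string &lexeme) {
        for (char c : lexeme) {
            if (c == '\n') ++line;
        }
    }

    // Same "line:message" layout as the yyerror shown above.
    std::string format(const std::string &message) const {
        assert(line >= 1);
        return std::to_string(line) + ":" + message;
    }
};
```

Feeding it the lexemes of a file in order keeps line pointing at the line of the most recently matched token, which is the value an error message needs.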


Complete code

https://github.com/P4nda0s/compilers-class-hw/tree/master/decafast/answer

Lex source: decafast.lex

%{
#include "default-defs.h"
#include "decafast.tab.h"
#include <cstring>
#include <string>
#include <sstream>
#include <iostream>
using namespace std;
extern int yylineno;
extern int yytokenpos;
string & covert_newline(string & s){
string tmp = "";
for(size_t i = 0; i < s.size(); i++)
if(s[i] == '\n')
tmp += "\\n";
else
tmp += s[i];
s = tmp;
return s;
}

%}
%option yylineno
%%
/*
Pattern definitions for all tokens
*/
\{ { return T_LCB; }
\} { return T_RCB; }
bool { yylval.sval = new string("BoolType"); return T_BOOLTYPE; }
package { return T_PACKAGE; }
func { return T_FUNC; }
return { return T_RETURN; }
while { return T_WHILE; }
void { yylval.sval = new string("VoidType");return T_VOID; }
var { return T_VAR; }
string { yylval.sval = new string("StringType");return T_STRINGTYPE; }
true { return T_TRUE; }
null { return T_NULL; }
int { yylval.sval = new string("IntType");return T_INTTYPE; }
if { return T_IF; }
extern { return T_EXTERN; }
for { return T_FOR; }
break { return T_BREAK; }
continue { return T_CONTINUE; }
else { return T_ELSE; }
false { return T_FALSE; }
[a-zA-Z\_][a-zA-Z\_0-9]* { yylval.sval = new string(yytext); return T_ID; }
, { return T_COMMA; }
== { return T_EQ; }
>= { return T_GEQ; }
> { return T_GT; }
\<\< { return T_LEFTSHIFT; }
>> { return T_RIGHTSHIFT; }
\<= { return T_LEQ; }
\[ { return T_LSB; }
\] { return T_RSB; }
\< { return T_LT; }
\- { return T_MINUS; }
\+ { return T_PLUS; }
\% { return T_MOD; }
\* { return T_MULT; }
!= { return T_NEQ; }
! { return T_NOT; }
\|\| { return T_OR; }
; { return T_SEMICOLON; }
\"([^\n"\\]|\\(a|b|t|n|v|f|r|\\|\'|\"))*\" { yylval.sval = new string(yytext); return T_STRINGCONSTANT; }

([0-9]+(\.[0-9]+)?)|(0[xX][0-9A-Fa-f]+) { yylval.sval = new string(yytext); return T_INTCONSTANT; }
\'([^\n'\\]|\\(a|b|t|n|v|f|r|\\|\'|\"))\' { yylval.sval = new string(yytext); return T_CHARCONSTANT; }
"//".*"\n" { }
\( { return T_LPAREN; }
\) { return T_RPAREN; }

&& { return T_AND; }
= { return T_ASSIGN; }
\/ { return T_DIV; }
"." { return T_DOT; }
[\t\r\n\a\v\b ]+ { } /* ignore whitespace */
. { cerr << "Error: unexpected character in input" << endl; return -1; }
%%

int yyerror(const char *s) {
cerr << yylineno << ":" << s << endl;
return 1;
}
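One detail worth noting in this file is rule order. Flex prefers the longest match and, when two rules match the same text, the rule listed first, which is why the keyword rules sit above the T_ID pattern. Below is a small sketch of the same decision in plain C++, assuming the input is already a single complete word. classifyWord and the ERROR result are hypothetical, and only a few keywords are listed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>

// Classify one complete word: exact keyword match first, then the
// identifier pattern [a-zA-Z_][a-zA-Z_0-9]*.
std::string classifyWord(const std::string &lexeme) {
    static const std::unordered_map<std::string, std::string> keywords = {
        {"package", "T_PACKAGE"}, {"func", "T_FUNC"},   {"while", "T_WHILE"},
        {"if", "T_IF"},           {"else", "T_ELSE"},   {"return", "T_RETURN"},
    };
    assert(!lexeme.empty());

    const auto hit = keywords.find(lexeme);
    if (hit != keywords.end()) return hit->second;  // keyword wins the tie

    const auto isStart = [](unsigned char c) {
        return std::isalpha(c) != 0 || c == '_';
    };
    const auto isRest = [](unsigned char c) {
        return std::isalnum(c) != 0 || c == '_';
    };
    if (!isStart(static_cast<unsigned char>(lexeme[0]))) return "ERROR";
    for (std::size_t i = 1; i < lexeme.size(); ++i) {
        if (!isRest(static_cast<unsigned char>(lexeme[i]))) return "ERROR";
    }
    return "T_ID";
}
```

The word while is reported as a keyword, whereas whilex is an ordinary identifier because the identifier pattern matches the longer text.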

Yacc source: decafast.y

%{
#include <iostream>
#include <ostream>
#include <string>
#include <cstdlib>
#include <vector>
#include "default-defs.h"

int yylex(void);
int yyerror(const char *); /* must match the definition in decafast.lex */

// print AST
bool printAST = true;
#include "decafast.cc"
using namespace std;

string proceesCharLit(string charlit) {
// a|b|t|n|v|f|r
if (charlit[1] == '\\'){
switch (charlit[2]){
case 'n':
return to_string('\n');
case 'a':
return to_string('\a');
case 'b':
return to_string('\b');
case 't':
return to_string('\t');
case 'v':
return to_string('\v');
case 'f':
return to_string('\f');
case 'r':
return to_string('\r');
case '\\':
return to_string('\\');
case '\'':
return to_string('\'');
case '\"':
return to_string('\"');
}
}
return to_string(charlit[1]);
}
%}

%union{
class decafAST *ast;
std::string *sval;
std::vector<std::string> *svals;
}

%token T_PACKAGE
%token T_LCB
%token T_RCB
%token <sval> T_ID T_STRINGTYPE T_INTTYPE T_BOOLTYPE T_VOID T_INTCONSTANT T_STRINGCONSTANT T_CHARCONSTANT
%token T_N_TOKEN T_FUNC T_INT T_LPAREN T_RPAREN T_WHITESPACE T_WHITESPACE_N T_AND T_ASSIGN T_BREAK T_COMMA T_COMMENT T_CONTINUE T_DIV T_DOT T_ELSE T_EQ T_EXTERN T_FALSE T_FOR T_GEQ T_GT T_IF T_LEFTSHIFT T_LEQ T_LSB T_LT T_MINUS T_MOD T_MULT T_NEQ T_NOT T_NULL T_OR T_PLUS T_RIGHTSHIFT T_RSB T_SEMICOLON T_TRUE T_VAR T_WHILE T_RETURN
%type <ast> return_type extern_list decafpackage extern_stmt extern_var_list ExternType program field_list method_list MethodDecl FieldDecl normal_type constant_expr T_TRUE T_FALSE typed_var_defs Block Statement Statements MethodArg MethodArgList ForAssignList ElseBlock Expr Assign MethodCall VarDecl MethodBlock VarDecls
%type <svals> comma_id_list

%nonassoc T_ID
%nonassoc HIGHIER_T_ID

%left T_OR
%left T_AND
%left T_EQ T_NEQ T_LT T_LEQ T_GT T_GEQ
%left T_PLUS T_MINUS
%left T_MULT T_DIV T_MOD T_RIGHTSHIFT T_LEFTSHIFT
%left UMINUS
%left UNOT
%%

start: program

program: extern_list decafpackage
{
ProgramAST *prog = new ProgramAST((decafStmtList *)$1, (PackageAST *)$2);
if (printAST) {
cout << getString(prog) << endl;
}
delete prog;
}

extern_list: /* extern_list can be empty */
{ decafStmtList *slist = new decafStmtList(); $$ = slist; }
| extern_list extern_stmt { decafStmtList *slist = (decafStmtList *) $1; slist->push_back($2); $$ = slist; }

extern_stmt: T_EXTERN T_FUNC T_ID T_LPAREN extern_var_list T_RPAREN return_type T_SEMICOLON
{
VarDefAST * ast_var_def = new VarDefAST((decafStmtList *)$5);
ExternFunctionAST * ast_extern_function = new ExternFunctionAST(*$3, (TypeAST * )$7, ast_var_def);
delete $3;
$$ = ast_extern_function;
}


extern_var_list:
{ decafStmtList *slist = new decafStmtList(); $$ = slist;}
| extern_var_list T_COMMA ExternType { decafStmtList * slist = (decafStmtList *)$1; slist->push_back($3); $$ = slist; }
| ExternType { decafStmtList * slist = new decafStmtList(); slist->push_back($1); $$ = slist; }

ExternType: T_STRINGTYPE { $$ = new TypeAST($1); }
| T_INTTYPE { $$ = new TypeAST($1); }
| T_BOOLTYPE { $$ = new TypeAST($1); }

return_type: T_VOID { $$ = new TypeAST($1);}
| T_BOOLTYPE { $$ = new TypeAST($1); }
| T_INTTYPE { $$ = new TypeAST($1); }
| T_STRINGTYPE { $$ = new TypeAST($1); }

normal_type:
T_BOOLTYPE { $$ = new TypeAST($1); }
| T_INTTYPE { $$ = new TypeAST($1); }
| T_STRINGTYPE { $$ = new TypeAST($1); }

decafpackage: T_PACKAGE T_ID T_LCB field_list method_list T_RCB
{ $$ = new PackageAST(*$2, (decafStmtList *)$4, (decafStmtList *)$5); delete $2; }


field_list:
{ decafStmtList * slist = new decafStmtList(); $$ = slist; }
| field_list FieldDecl {
decafStmtList * slist = (decafStmtList *)$1;
decafStmtList * slist2 = (decafStmtList *)$2;
slist2->move_to(slist);
delete slist2;
// std::cout << getString(slist) << endl;
$$ = slist;
}

constant_expr: T_INTCONSTANT {NumberExpr * ne = new NumberExpr(*$1); delete $1; $$ = ne; }
| T_TRUE { BoolExpr * be = new BoolExpr(true); $$ = be;}
| T_FALSE { BoolExpr * be = new BoolExpr(false); $$ = be;}
| T_CHARCONSTANT {NumberExpr * ne = new NumberExpr(proceesCharLit(*$1)); delete $1; $$ = ne; }
| T_STRINGCONSTANT { StringConstantAST * sast = new StringConstantAST(*$1); $$ = sast; delete $1; }

FieldDecl:
T_VAR comma_id_list normal_type T_SEMICOLON
{
decafStmtList * slist = new decafStmtList();
for(std::vector<std::string>::iterator i = $2->begin(); i != $2->end() ; i++){
std::string var_name = *i;
VarSizeAST * sizedecl = new VarSizeAST(VAR_TYPE_SCALAR, "1");
FieldDeclAST * decl = new FieldDeclAST(var_name, (TypeAST *)$3, sizedecl);
slist->push_back( (decafAST *)decl );
}
delete $2;
$$ = slist;
}
| T_VAR comma_id_list T_LSB T_INTCONSTANT T_RSB normal_type T_SEMICOLON
{
decafStmtList * slist = new decafStmtList();
for(std::vector<std::string>::iterator i = $2->begin(); i != $2->end() ; i++){
std::string var_name = *i;
VarSizeAST * sizedecl = new VarSizeAST(VAR_TYPE_ARRAY, *$4);
FieldDeclAST * decl = new FieldDeclAST(var_name, (TypeAST *)$6, sizedecl);
slist->push_back( (decafAST *)decl );
}
delete $2;
delete $4;
$$ = slist;
}
| T_VAR comma_id_list normal_type T_ASSIGN constant_expr T_SEMICOLON
{
std::vector<std::string> * ids = $2;
if( ids->size() == 1){
decafStmtList * slist = new decafStmtList();
AssignGlobalVar * assign = new AssignGlobalVar(*ids->begin(), (TypeAST *)$3, (ValueExpr *) $5);
delete $2;
slist->push_back(assign);
$$ = slist;
}else{
printf("error\n");
//exit(0);
YYABORT;
}

}

VarDecl:
T_VAR comma_id_list normal_type T_SEMICOLON
{
decafStmtList * slist = new decafStmtList();
for(std::vector<std::string>::iterator i = $2->begin(); i != $2->end() ; i++){
std::string var_name = *i;
VarDefAST * decl = new VarDefAST(var_name, (TypeAST *)$3);
slist->push_back( (decafAST *)decl );
}
delete $2;
$$ = slist;
}
| T_VAR comma_id_list T_LSB T_INTCONSTANT T_RSB normal_type T_SEMICOLON
{
decafStmtList * slist = new decafStmtList();
for(std::vector<std::string>::iterator i = $2->begin(); i != $2->end() ; i++){
std::string var_name = *i;
VarDefAST * decl = new VarDefAST(var_name, (TypeAST *)$6);
slist->push_back( (decafAST *)decl );
}
delete $2;
delete $4;
$$ = slist;
}

VarDecls:
{ decafStmtList * slist = new decafStmtList(); $$ = slist; }
| VarDecls VarDecl {
decafStmtList * slist = (decafStmtList *)$1;
decafStmtList * slist2 = (decafStmtList *)$2;
slist2->move_to(slist);
delete slist2;
// std::cout << getString(slist) << endl;
$$ = slist;
}

comma_id_list: comma_id_list T_COMMA T_ID { std::vector<std::string> * ids = $1; ids->push_back(*$3); $$ = ids; delete $3; }
| T_ID { std::vector<std::string> * ids = new std::vector<std::string>(); ids->push_back(*$1); $$ = ids; delete $1;}

method_list: {decafStmtList * slist = new decafStmtList(); $$ = slist; }
| method_list MethodDecl { decafStmtList * slist = (decafStmtList *)$1; slist->push_back($2); $$ = $1;}

typed_var_defs:
{ decafStmtList* slist = new decafStmtList(); $$ = slist; }
| typed_var_defs T_COMMA T_ID normal_type { decafStmtList * slist = (decafStmtList *)$1; VarDefAST * def = new VarDefAST(*$3, ( TypeAST *)$4); slist->push_back(def); delete $3; $$ = $1;}
| T_ID normal_type { decafStmtList* slist = new decafStmtList(); VarDefAST * def = new VarDefAST(*$1, ( TypeAST *)$2); slist->push_back(def); delete $1; $$ = slist; }

MethodDecl: T_FUNC T_ID T_LPAREN typed_var_defs T_RPAREN return_type MethodBlock
{
MethodAST * method_decl = new MethodAST(*$2, (TypeAST *)$6, (decafStmtList *)$4, (MethodBlockAST *)$7);
delete $2;
$$ = method_decl;
}

MethodBlock: T_LCB VarDecls Statements T_RCB
{
MethodBlockAST * block = new MethodBlockAST((decafStmtList *)$2, (decafStmtList *)$3);
$$ = block;
}

Block: T_LCB VarDecls Statements T_RCB
{
BlockAST * block = new BlockAST((decafStmtList *)$2, (decafStmtList *)$3);
$$ = block;
}

Statements: {decafStmtList * slist = new decafStmtList(); $$ = slist;}
| Statements Statement { decafStmtList * slist = (decafStmtList *)$1; slist->push_back($2); $$ = slist; }

ElseBlock:
{ $$ = nullptr; }
| T_ELSE Block { $$ = $2; }

Statement: Block { $$ = $1; }
| Assign T_SEMICOLON { $$ = $1; }
| MethodCall T_SEMICOLON { $$ = $1; }
| T_IF T_LPAREN Expr T_RPAREN Block ElseBlock { IfStmtAST * ifast = new IfStmtAST((ExprAST *)$3, (BlockAST *)$5, (BlockAST *)$6 );$$ = ifast; }
| T_WHILE T_LPAREN Expr T_RPAREN Block { WhileStmtAST * whileast = new WhileStmtAST((ExprAST *)$3, (BlockAST *)$5); $$ = whileast; }
| T_FOR T_LPAREN ForAssignList T_SEMICOLON Expr T_SEMICOLON ForAssignList T_RPAREN Block { ForStmtAST * forast = new ForStmtAST((decafStmtList *)$3, (ExprAST *)$5, (decafStmtList *)$7, (BlockAST *)$9); $$ = forast; }
| T_RETURN T_LPAREN Expr T_RPAREN T_SEMICOLON { ReturnStmtAST * returnast = new ReturnStmtAST((ExprAST *)$3); $$ = returnast; }
| T_RETURN T_LPAREN T_RPAREN T_SEMICOLON {ReturnStmtAST * returnast = new ReturnStmtAST(nullptr); $$ = returnast; }
| T_RETURN T_SEMICOLON { ReturnStmtAST * returnast = new ReturnStmtAST(nullptr); $$ = returnast; }
| T_BREAK T_SEMICOLON { BreakStmtAST * breakast = new BreakStmtAST(); $$ = breakast; }
| T_CONTINUE T_SEMICOLON { ContinueStmtAST * cntast = new ContinueStmtAST(); $$ = cntast; }

Assign:
T_ID T_ASSIGN Expr
{
AssignVarAST * assign = new AssignVarAST(*$1, (ExprAST *)$3);
delete $1;
$$ = assign;
}
| T_ID T_LSB Expr T_RSB T_ASSIGN Expr
{
AssignArrayLocAST * arr_loc = new AssignArrayLocAST(*$1, (ExprAST *)$3, (ExprAST *)$6);
delete $1;
$$ = arr_loc;
}

MethodArg: Expr { $$ = $1;}

MethodArgList: { decafStmtList * slist = new decafStmtList(); $$ = slist; }
| MethodArgList T_COMMA MethodArg {decafStmtList * slist = (decafStmtList *)$1; slist->push_back($3); $$ = $1;}
| MethodArg { decafStmtList * slist = new decafStmtList(); $$ = slist; slist->push_back($1); }

MethodCall: T_ID T_LPAREN MethodArgList T_RPAREN
{
MethodCallAST * call = new MethodCallAST(*$1, (decafStmtList *)$3);
$$ = call;
delete $1;
}

Expr:
T_ID { VariableExprAST * var_exp = new VariableExprAST(*$1); delete $1; $$ = var_exp; }
| MethodCall { $$ = $1; }
| constant_expr { $$ = $1; }
| T_LPAREN Expr T_RPAREN { $$ = $2; }
| T_ID T_LSB Expr T_RSB { ArrayLocExprAST * arr = new ArrayLocExprAST(*$1, (ExprAST *)$3); $$ = arr; delete $1; }
| Expr T_PLUS Expr { BinaryExprAST * binexp = new BinaryExprAST("Plus", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_MINUS Expr {BinaryExprAST * binexp = new BinaryExprAST("Minus", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_MULT Expr { BinaryExprAST * binexp = new BinaryExprAST("Mult", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_DIV Expr { BinaryExprAST * binexp = new BinaryExprAST("Div", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_LEFTSHIFT Expr { BinaryExprAST * binexp = new BinaryExprAST("Leftshift", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_RIGHTSHIFT Expr { BinaryExprAST * binexp = new BinaryExprAST("Rightshift", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_MOD Expr { BinaryExprAST * binexp = new BinaryExprAST("Mod", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_LT Expr { BinaryExprAST * binexp = new BinaryExprAST("Lt", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_GT Expr { BinaryExprAST * binexp = new BinaryExprAST("Gt", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_LEQ Expr { BinaryExprAST * binexp = new BinaryExprAST("Leq", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp; }
| Expr T_GEQ Expr { BinaryExprAST * binexp = new BinaryExprAST("Geq", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp;}
| Expr T_EQ Expr { BinaryExprAST * binexp = new BinaryExprAST("Eq", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp;}
| Expr T_NEQ Expr { BinaryExprAST * binexp = new BinaryExprAST("Neq", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp;}
| Expr T_AND Expr { BinaryExprAST * binexp = new BinaryExprAST("And", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp;}
| Expr T_OR Expr { BinaryExprAST * binexp = new BinaryExprAST("Or", (ExprAST *)$1, (ExprAST *)$3); $$ = binexp;}
| T_MINUS Expr %prec UMINUS { UnaryExprAST * unary = new UnaryExprAST("UnaryMinus", (ExprAST *)$2); $$ = unary;}
| T_NOT Expr %prec UNOT { UnaryExprAST * unary = new UnaryExprAST("Not", (ExprAST *)$2); $$ = unary;}


ForAssignList: /* { Assign }+, */
{ decafStmtList * slist = new decafStmtList(); $$ = slist; }
| ForAssignList T_COMMA Assign { decafStmtList * slist = (decafStmtList *)$1; slist->push_back($3); $$ = slist; }
| Assign { decafStmtList * slist = new decafStmtList(); slist->push_back($1); $$ = slist; }


%%

int main() {
// parse the input and create the abstract syntax tree
int retval = yyparse();
return(retval >= 1 ? EXIT_FAILURE : EXIT_SUCCESS);
}
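The proceesCharLit helper in the prologue turns a character literal into the decimal text of its character code, relying on std::to_string promoting char to int. Here is a compact sketch of the same idea, with an explicit cast and a hypothetical name charLiteralCode, assuming the lexer has already validated the literal's shape.

```cpp
#include <cassert>
#include <string>

// Decode a quoted character literal such as 'a' or '\n' (quotes included)
// into the decimal text of its character code.
std::string charLiteralCode(const std::string &lit) {
    assert(lit.size() >= 3 && lit.front() == '\'' && lit.back() == '\'');
    char c = lit[1];
    if (c == '\\') {
        assert(lit.size() == 4);  // quote, backslash, escape char, quote
        switch (lit[2]) {
            case 'a': c = '\a'; break;
            case 'b': c = '\b'; break;
            case 't': c = '\t'; break;
            case 'n': c = '\n'; break;
            case 'v': c = '\v'; break;
            case 'f': c = '\f'; break;
            case 'r': c = '\r'; break;
            default:  c = lit[2]; break;  // \\ \' \" stand for themselves
        }
    }
    // std::to_string has no char overload, so the value is converted to
    // int; the cast makes that explicit.
    return std::to_string(static_cast<int>(c));
}
```

On an ASCII platform the literal 'a' yields 97 and the escaped newline literal yields 10.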


decafast.cc


#include "default-defs.h"
#include <cassert>
#include <list>
#include <ostream>
#include <iostream>
#include <sstream>
#include <string>
#ifndef YYTOKENTYPE
#include "decafast.tab.h"
#endif

using namespace std;

/// decafAST - Base class for all abstract syntax tree nodes.
class decafAST {
public:
virtual ~decafAST() {}
virtual string str() { return string(""); }
};

string getString(decafAST *d) {
if (d != NULL) {
return d->str();
} else {
return string("None");
}
}

template <class T>
string commaList(list<T> vec) {
string s("");
for (typename list<T>::iterator i = vec.begin(); i != vec.end(); i++) {
s = s + (s.empty() ? string("") : string(",")) + (*i)->str();
}
if (s.empty()) {
s = string("None");
}
return s;
}

/// decafStmtList - List of Decaf statements
class decafStmtList : public decafAST {
list<decafAST *> stmts;
public:
decafStmtList() {}
~decafStmtList() {
for (list<decafAST *>::iterator i = stmts.begin(); i != stmts.end(); i++) {
delete *i;
}
}
int size() { return stmts.size(); }
void push_front(decafAST *e) { stmts.push_front(e); }
void push_back(decafAST *e) { stmts.push_back(e); }
void move_to(decafStmtList * target){
assert( target != NULL );
for( list<decafAST *>::iterator i = stmts.begin(); i != stmts.end(); i++) {
target->push_back( *i );
}
stmts.clear();
}
string str() { return commaList<class decafAST *>(stmts); }
};

// package = Package(identifier name, field_decl* field_list, method_decl* method_list)
class PackageAST : public decafAST {
string Name;
decafStmtList *FieldDeclList;
decafStmtList *MethodDeclList;
public:
PackageAST(string name, decafStmtList *fieldlist, decafStmtList *methodlist)
: Name(name), FieldDeclList(fieldlist), MethodDeclList(methodlist) {}
~PackageAST() {
if (FieldDeclList != NULL) { delete FieldDeclList; }
if (MethodDeclList != NULL) { delete MethodDeclList; }
}
string str() {
return string("Package") + "(" + Name + "," + getString(FieldDeclList) + "," + getString(MethodDeclList) + ")";
}
};

/// ProgramAST - the decaf program
// prog = Program(extern* extern_list, package body)
class ProgramAST : public decafAST {
decafStmtList *ExternList;
PackageAST *PackageDef;
public:
ProgramAST(decafStmtList *externs, PackageAST *c) : ExternList(externs), PackageDef(c) {}
~ProgramAST() {
if (ExternList != NULL) { delete ExternList; }
if (PackageDef != NULL) { delete PackageDef; }
}
string str() { return string("Program") + "(" + getString(ExternList) + "," + getString(PackageDef) + ")"; }
};

class TypeAST : public decafAST {
string * Name;

public:
TypeAST(string * name): Name(name) {};
~TypeAST() { if(Name) delete Name; };
string str() {
return string(*Name);
}
};

class VarDefAST : public decafAST {
string Name;
decafStmtList * TypeList;

public:
VarDefAST(decafStmtList * typelist): TypeList(typelist) {}
VarDefAST(std::string name, TypeAST * type ) {
decafStmtList * typelist = new decafStmtList();
typelist->push_back(type);
TypeList = typelist;
Name = name;
};

~VarDefAST() {
if( TypeList ) delete TypeList;
}


string str(){
if (TypeList->size() == 0)
return string("None");
else if(TypeList->size() == 1 && Name != "")
return string("VarDef(") + Name + "," + getString(TypeList) + ")";
else
return string("VarDef(") + getString(TypeList) + ")";
}
};




// ExternFunction(identifier name, method_type return_type, extern_type* typelist)
class ExternFunctionAST : public decafAST {
string Name;
TypeAST * ReturnType;
VarDefAST * VarList;

public:
ExternFunctionAST(string name, TypeAST * returntype, VarDefAST * varlist): Name(name), ReturnType(returntype), VarList(varlist) {}
~ExternFunctionAST() {
if( VarList ) delete VarList;
if( ReturnType ) delete ReturnType;
}

string str() {
return string("ExternFunction") + "(" + Name + "," + ReturnType->str() + "," + getString(VarList) + ")";
}
};

// field_size = Scalar | Array(int array_size)
enum VAR_TYPE { VAR_TYPE_SCALAR, VAR_TYPE_ARRAY };
class VarSizeAST : public decafAST {
private:
VAR_TYPE VarType;
string Size;
public:
VarSizeAST(VAR_TYPE type, string size): VarType(type), Size(size) {};

string str() {
if( VarType == VAR_TYPE_SCALAR)
return string("Scalar");
else
return string("Array(") + Size + ")";
}
};

// FieldDecl(identifier name, decaf_type type, field_size size)
class FieldDeclAST : public decafAST {
string Name;
TypeAST * Type;
VarSizeAST * field_size;

public:
FieldDeclAST(string name, TypeAST * type, VarSizeAST * size): Name(name), Type(type), field_size(size) { };
~FieldDeclAST() { if (Type) delete Type; if(field_size) delete field_size;};

string str(){
return string("FieldDecl(") + Name + "," + getString(Type) + "," + getString(field_size) + ")";
}

};

class StringConstantAST: public decafAST {
string Value;
public:
StringConstantAST(string v): Value(v) {};
~StringConstantAST() {};

string str() {
return string("StringConstant(") + Value + ")";
}
};

class ValueExpr : public decafAST {};

// NumberExpr(int value)
class NumberExpr : public ValueExpr {
string Value;

public:
NumberExpr(string value): Value(value) { };
~NumberExpr() {};
string str() {
return string("NumberExpr(") + Value + ")";
}
};

// BoolExpr(bool value)
class BoolExpr : public ValueExpr {
bool Value;

public:
BoolExpr(bool value): Value(value) {};
~BoolExpr() {};

string str() {
return string("BoolExpr(") + ( Value ? "True":"False" ) + ")";
}
};


// AssignGlobalVar(identifier name, decaf_type type, constant value)
class AssignGlobalVar : public decafAST {
string Name;
TypeAST * Type;
ValueExpr * Value;

public:
AssignGlobalVar(string name, TypeAST * type,ValueExpr * value): Name(name), Type(type), Value(value) {} ;
~AssignGlobalVar() {
if ( Type ) delete Type;
if ( Value ) delete Value;
}

string str() {
return string("AssignGlobalVar(") + Name + "," + getString(Type) + "," + getString(Value) + ")";
}
};

// MethodBlock(typed_symbol* var_decl_list, statement* statement_list)
class MethodBlockAST: public decafAST {
decafStmtList * VarDeclList;
decafStmtList * StatementList;

public:
MethodBlockAST(decafStmtList* var_decl_list, decafStmtList * statement_list): VarDeclList(var_decl_list), StatementList(statement_list) {}
~MethodBlockAST(){
if( VarDeclList ) delete VarDeclList;
if( StatementList ) delete StatementList;
}

string str() {
return string("MethodBlock(") + getString(VarDeclList)+"," + getString(StatementList) + ")";
}
};

// Block(typed_symbol* var_decl_list, statement* statement_list)
class BlockAST: public decafAST {
    decafStmtList * VarDeclList;
    decafStmtList * StatementList;

public:
    BlockAST(decafStmtList * var_decl_list, decafStmtList * statement_list): VarDeclList(var_decl_list), StatementList(statement_list) {}
    ~BlockAST() {
        delete VarDeclList;
        delete StatementList;
    }

    string str() {
        return string("Block(") + getString(VarDeclList) + "," + getString(StatementList) + ")";
    }
};

// Method(identifier name, method_type return_type, typed_symbol* param_list, method_block block)
class MethodAST : public decafAST {
    string Name;
    TypeAST * ReturnType;
    decafStmtList * ParamList; // list of VarDefAST*
    MethodBlockAST * Block;

public:
    MethodAST(string name, TypeAST * return_type, decafStmtList * param_list, MethodBlockAST * block):
        Name(name), ReturnType(return_type), ParamList(param_list), Block(block) {}

    ~MethodAST() {
        delete ReturnType;
        delete ParamList;
        delete Block;
    }

    string str() {
        return string("Method(") + Name + "," + getString(ReturnType) + "," + getString(ParamList) + "," + getString(Block) + ")";
    }
};

// Base class for expression nodes
class ExprAST: public decafAST {};

// AssignArrayLoc(identifier name, expr index, expr value)
class AssignArrayLocAST : public decafAST {
    string Name;
    ExprAST * IndexExpr;
    ExprAST * Value;

public:
    AssignArrayLocAST(string name, ExprAST * index, ExprAST * value): Name(name), IndexExpr(index), Value(value) {}
    ~AssignArrayLocAST() {
        delete IndexExpr;
        delete Value;
    }

    string str() {
        return string("AssignArrayLoc(") + Name + "," + getString(IndexExpr) + "," + getString(Value) + ")";
    }
};

// AssignVar(identifier name, expr value)
class AssignVarAST: public decafAST {
    string Name;
    ExprAST * Expr;

public:
    AssignVarAST(string name, ExprAST * expr): Name(name), Expr(expr) {}
    ~AssignVarAST() {
        delete Expr;
    }

    string str() {
        return string("AssignVar(") + Name + "," + getString(Expr) + ")";
    }
};

// rvalue = VariableExpr(identifier name)
//        | ArrayLocExpr(identifier name, expr index)
// Both rvalue nodes derive from ExprAST so they can be passed wherever an ExprAST* is expected.
class VariableExprAST: public ExprAST {
    string Id;

public:
    VariableExprAST(string id): Id(id) {}
    ~VariableExprAST() {}

    string str() {
        return string("VariableExpr(") + Id + ")";
    }
};

class ArrayLocExprAST: public ExprAST {
    string Id;
    ExprAST * IndexExpr;

public:
    ArrayLocExprAST(string id, ExprAST * expr): Id(id), IndexExpr(expr) {}
    ~ArrayLocExprAST() {
        delete IndexExpr;
    }

    string str() {
        return string("ArrayLocExpr(") + Id + "," + getString(IndexExpr) + ")";
    }
};

// MethodCall(identifier name, method_arg* method_arg_list)
// Derives from ExprAST because a method call can also appear inside an expression.
class MethodCallAST: public ExprAST {
    string Name;
    decafStmtList * MethodArgList;

public:
    MethodCallAST(string name, decafStmtList * method_arg_list): Name(name), MethodArgList(method_arg_list) {}
    ~MethodCallAST() {
        delete MethodArgList;
    }

    string str() {
        return string("MethodCall(") + Name + "," + getString(MethodArgList) + ")";
    }
};

// BinaryExpr(binary_operator op, expr left_value, expr right_value)
// Derives from ExprAST so that binary expressions can nest as operands.
class BinaryExprAST: public ExprAST {
    // One of: Plus, Minus, Mult, Div, Leftshift, Rightshift, Mod,
    //         Lt, Gt, Leq, Geq, Eq, Neq, And, Or
    string Op;
    ExprAST * LeftExpr;
    ExprAST * RightExpr;

public:
    BinaryExprAST(string op, ExprAST * left, ExprAST * right): Op(op), LeftExpr(left), RightExpr(right) {}
    ~BinaryExprAST() {
        delete LeftExpr;
        delete RightExpr;
    }

    string str() {
        return string("BinaryExpr(") + Op + "," + getString(LeftExpr) + "," + getString(RightExpr) + ")";
    }
};

// UnaryExpr(unary_operator op, expr value)
// Derives from ExprAST so that unary expressions can nest as operands.
class UnaryExprAST: public ExprAST {
    string Op; // UnaryMinus | Not
    ExprAST * Expr;

public:
    UnaryExprAST(string op, ExprAST * expr): Op(op), Expr(expr) {}
    ~UnaryExprAST() {
        delete Expr;
    }

    string str() {
        return string("UnaryExpr(") + Op + "," + getString(Expr) + ")";
    }
};


// IfStmt(expr condition, block if_block, block? else_block)
class IfStmtAST: public decafAST {
    ExprAST * ConditionExpr;
    BlockAST * IfBlock;
    BlockAST * ElseBlock; // optional: NULL when there is no else branch

public:
    IfStmtAST(ExprAST * condition, BlockAST * ifblock, BlockAST * elseblock): ConditionExpr(condition), IfBlock(ifblock), ElseBlock(elseblock) {}
    ~IfStmtAST() {
        delete ConditionExpr;
        delete IfBlock;
        delete ElseBlock;
    }

    string str() {
        return string("IfStmt(") + getString(ConditionExpr) + "," + getString(IfBlock) + "," + getString(ElseBlock) + ")";
    }
};

// WhileStmt(expr condition, block while_block)
class WhileStmtAST: public decafAST {
    ExprAST * ConditionExpr;
    BlockAST * WhileBlock;

public:
    WhileStmtAST(ExprAST * cond, BlockAST * block): ConditionExpr(cond), WhileBlock(block) {}
    ~WhileStmtAST() {
        delete ConditionExpr;
        delete WhileBlock;
    }

    string str() {
        return string("WhileStmt(") + getString(ConditionExpr) + "," + getString(WhileBlock) + ")";
    }
};

// ForStmt(assign* pre_assign_list, expr condition, assign* loop_assign_list, block for_block)
class ForStmtAST: public decafAST {
    decafStmtList * PreAssignList;
    ExprAST * Condition;
    decafStmtList * LoopAssignList;
    BlockAST * ForBlock;

public:
    ForStmtAST(decafStmtList * pre_assign_list, ExprAST * condition, decafStmtList * loop_assign_list, BlockAST * for_block):
        PreAssignList(pre_assign_list), Condition(condition), LoopAssignList(loop_assign_list), ForBlock(for_block) {}
    ~ForStmtAST() {
        delete PreAssignList;
        delete Condition;
        delete LoopAssignList;
        delete ForBlock;
    }

    string str() {
        return string("ForStmt(") + getString(PreAssignList) + "," + getString(Condition) + "," + getString(LoopAssignList) + "," + getString(ForBlock) + ")";
    }
};

// ReturnStmt(expr? return_value)
class ReturnStmtAST: public decafAST {
    ExprAST * ReturnValue; // optional: NULL for a bare `return;`

public:
    ReturnStmtAST(ExprAST * return_value): ReturnValue(return_value) {}
    ~ReturnStmtAST() {
        delete ReturnValue;
    }

    string str() {
        return string("ReturnStmt(") + getString(ReturnValue) + ")";
    }
};

// BreakStmt
class BreakStmtAST: public decafAST {
public:
    string str() { return string("BreakStmt"); }
};

// ContinueStmt
class ContinueStmtAST: public decafAST {
public:
    string str() { return string("ContinueStmt"); }
};
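
To show how these node classes compose, here is a minimal, self-contained sketch rather than the code from this post. It assumes C++14 (for `std::make_unique`) and uses hypothetical stand-in names (`Node`, `Number`, `Variable`, `Binary`, `AssignVar`, `Return`, `demoAssign`, `demoEmptyReturn`) instead of the real `decafAST` classes. It also assumes that `getString` prints `None` for a missing node, as the ASDL comments suggest, and swaps the manual `delete` calls for `std::unique_ptr` ownership.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for decafAST: every node can print itself.
struct Node {
    virtual ~Node() = default;
    virtual std::string str() const = 0;
};

using NodePtr = std::unique_ptr<Node>;

// Assumed behaviour of getString: print the node, or "None" if it is absent.
std::string getString(const Node* n) { return n ? n->str() : "None"; }

struct Number : Node {
    std::string value;  // literal kept as a string
    explicit Number(std::string v) : value(std::move(v)) {}
    std::string str() const override { return "NumberExpr(" + value + ")"; }
};

struct Variable : Node {
    std::string name;
    explicit Variable(std::string n) : name(std::move(n)) {}
    std::string str() const override { return "VariableExpr(" + name + ")"; }
};

struct Binary : Node {
    std::string op;
    NodePtr left, right;
    Binary(std::string o, NodePtr l, NodePtr r)
        : op(std::move(o)), left(std::move(l)), right(std::move(r)) {
        assert(left && right);  // a binary expression needs both operands
    }
    std::string str() const override {
        return "BinaryExpr(" + op + "," + getString(left.get()) + "," +
               getString(right.get()) + ")";
    }
};

struct AssignVar : Node {
    std::string name;
    NodePtr value;
    AssignVar(std::string n, NodePtr v) : name(std::move(n)), value(std::move(v)) {}
    std::string str() const override {
        return "AssignVar(" + name + "," + getString(value.get()) + ")";
    }
};

struct Return : Node {
    NodePtr value;  // optional: empty for a bare `return;`
    explicit Return(NodePtr v = nullptr) : value(std::move(v)) {}
    std::string str() const override {
        return "ReturnStmt(" + getString(value.get()) + ")";
    }
};

// Builds the tree for `x = y + 1` and returns its printed form.
std::string demoAssign() {
    NodePtr sum = std::make_unique<Binary>(
        "Plus", std::make_unique<Variable>("y"), std::make_unique<Number>("1"));
    AssignVar stmt("x", std::move(sum));
    return stmt.str();
}

// Prints a return statement that has no value.
std::string demoEmptyReturn() { return Return().str(); }
```

In this sketch `demoAssign()` returns `AssignVar(x,BinaryExpr(Plus,VariableExpr(y),NumberExpr(1)))` and `demoEmptyReturn()` returns `ReturnStmt(None)`. With `std::unique_ptr` members the children are released automatically, so the hand-written destructors are no longer needed.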

Next step? On to LLVM.


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!