LLVM Practice

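The Decaf language uses LLVM as its backend to generate machine code. I am also very interested in the LLVM APIs themselves. This post records my process of learning LLVM, along with the related code snippets.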

Building LLVM

git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build
cd build
cmake -G "Unix Makefiles" ../llvm
make -j16

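After the build finishes, the LLVM binaries can be found in build/bin.

How do you run LLVM assembly code?

The llvm-as program compiles LLVM assembly into LLVM bitcode.

The llc program compiles LLVM bitcode into x86 assembly, producing a .s file.

Script for running LLVM assembly: run-llvm-code.sh

The decaf-stdlib.c file wraps the library functions of the Decaf language.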

#!/bin/sh
llvmconfig=/Users/pandaos/llvm-project/build/bin/llvm-config
bindir=$("$llvmconfig" --bindir)
b=$(basename -s .ll "$1")
"$bindir"/llvm-as "$1" # convert LLVM assembly to bitcode
"$bindir"/llc "$b.bc" # convert LLVM bitcode to x86 assembly
clang "$b.s" decaf-stdlib.c -o "$b"
./"$b"
rm -f "$b.bc" "$b.s" "$b"

Usage example: ./run-llvm-code.sh helloworld.ll

; Declare the string constant as a global constant.
; Run the following command to run this LLVM assembly program:
;   sh run-llvm-code.sh helloworld.ll
; Note: i8* and [13 x i8]* are typed-pointer syntax. LLVM 17 and later support
; only opaque pointers, where both are written as ptr.
@LC0 = internal constant [13 x i8] c"hello world\0A\00"
; note how the newline character is inserted into the string

; External declaration of the puts function
declare i32 @puts(i8*)

; Definition of main function
define i32 @main() {
  ; Convert [13 x i8]* to i8*
  ; this is because the function puts takes a char* which is an i8* in LLVM
  %cast = getelementptr [13 x i8], [13 x i8]* @LC0, i8 0, i8 0
  ; read up on getelementptr: http://llvm.org/docs/GetElementPtr.html

  ; Call puts function to write out the char* string to stdout.
  call i32 @puts(i8* %cast)
  ret i32 0
}

Embedding the LLVM libraries in a CMake project

CLion uses CMake as its build tool, and pulling LLVM into a CMake project is very convenient.

# LLVM settings.
set(LLVM_DIR /Users/pandaos/llvm-project/build/lib/cmake/llvm) # not needed when LLVM is installed in a default location
find_package(LLVM REQUIRED CONFIG)
message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
include_directories(${LLVM_INCLUDE_DIRS})
add_definitions(${LLVM_DEFINITIONS})
add_executable(llvm_test main.cpp) # llvm_test is the executable target name
# Find the libraries that correspond to the LLVM components we use
llvm_map_components_to_libnames(llvm_libs support core)
# Link against LLVM libraries
target_link_libraries(llvm_test ${llvm_libs})

LLVM C++ API

This part is mainly about how to use the LLVM API to generate intermediate code (IR).

Header files

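#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Verifier.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/raw_ostream.h" // llvm::outs() and llvm::errs()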

TheModule

// this global variable contains all the generated code
static llvm::Module *TheModule;
static llvm::LLVMContext TheContext;
// this is the helper object used to construct the LLVM intermediate code (IR)
static llvm::IRBuilder<> Builder(TheContext);

int main() {
    TheModule = new llvm::Module("test1", TheContext);
    // ......
    TheModule->print(llvm::errs(), nullptr); // standard error
    return 0;
}

TheModule holds all of the generated code. It can be printed like this:

TheModule->print(llvm::outs(), nullptr); // standard output
TheModule->print(llvm::errs(), nullptr); // standard error

LLVM Value

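In LLVM, constants, functions, instructions, and function arguments all derive from llvm::Value (types form a separate hierarchy rooted at llvm::Type and are not values). In yacc, data can be passed between actions as llvm::Value *. When a specific value is needed, it can be cast back to its concrete type, for example llvm::Function *.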

LLVM Types

Type     llvm::Type*              Explanation
void     Builder.getVoidTy()      just a void type
int      Builder.getInt32Ty()     assume 32 bit integers
bool     Builder.getInt1Ty()      a one bit integer
string   Builder.getInt8PtrTy()   pointer to array of bytes (int8)

LLVM Constant

llvm::Constant*

Int32: Builder.getInt32(0)

bool: Builder.getInt1(0)

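Allocating variables on the stack

In general, local variables are stored on the stack. CreateAlloca creates the variable at the current insertion point, not at the start of the block.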

llvm::AllocaInst *Alloca;
// unlike CreateEntryBlockAlloca the following will
// create the alloca instr at the current insertion point
// rather than at the start of the block
Alloca = Builder.CreateAlloca(llvm::IntegerType::get(TheContext, 32), nullptr, "variable_name");

Alloca->getType() returns its type.

Type checking:

const llvm::PointerType *ptrTy = rvalue->getType()->getPointerTo();
if (ptrTy == Alloca->getType()) {
    // ... types match: rvalue can be stored into Alloca
}

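Does this look odd? The reason is that the type of Alloca is a pointer to the actual type. For example, if I allocate an int variable on the stack, CreateAlloca returns an int *.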

Assignment

llvm::Value *val = Builder.CreateStore(rvalue, Alloca); // Alloca := rvalue

Arithmetic operations

OP            Function
+             Builder.CreateAdd
-             Builder.CreateSub
*             Builder.CreateMul
/             Builder.CreateSDiv
<<            Builder.CreateShl
>>            Builder.CreateLShr
%             Builder.CreateSRem
<             Builder.CreateICmpSLT
>             Builder.CreateICmpSGT
<=            Builder.CreateICmpSLE
>=            Builder.CreateICmpSGE
&&            Builder.CreateAnd
||            Builder.CreateOr
==            Builder.CreateICmpEQ
!=            Builder.CreateICmpNE
- (unary)     Builder.CreateNeg
! (unary)     Builder.CreateNot

Function definition

llvm::Type *returnTy;
// assign the correct Type to returnTy

std::vector<llvm::Type *> args;
// fill up the args vector with types

llvm::Function *func = llvm::Function::Create(
    llvm::FunctionType::get(returnTy, args, false),
    llvm::Function::ExternalLinkage,
    Name,
    TheModule
);

Creating a basic block

// Create a new basic block which contains a sequence of LLVM instructions
llvm::BasicBlock *BB = llvm::BasicBlock::Create(TheContext, "entry", func);
// insert "entry" into symbol table (not used in HW3 but useful in HW4)
// All subsequent calls to IRBuilder will place instructions in this location
Builder.SetInsertPoint(BB);

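Builder.SetInsertPoint is important: it sets the position at which the Builder inserts code.

Create a corresponding local variable for each parameter and add it to the symbol table, so that parameters can be accessed by name.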

for (auto &Arg : func->args()) {
    llvm::AllocaInst *Alloca = CreateEntryBlockAlloca(func, Arg.getName());
    // Store the initial value into the alloca.
    Builder.CreateStore(&Arg, Alloca);
    // Add to symbol table
    syms.enter_symtbl(Arg.getName(), Alloca);
}

Some helper functions

// gives you a link to the current basic block
llvm::BasicBlock *CurBB = Builder.GetInsertBlock();

// gives you a pointer to the function definition
llvm::Function *func = Builder.GetInsertBlock()->getParent();

// gives you the return type of the function
llvm::Type *retTy = func->getReturnType();

Inserting a return statement

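// sometimes the return statement is deep inside the method
// so it is useful to retrieve the function we are in without
// passing it down to all the AST nodes below the method declaration
llvm::Function *func = Builder.GetInsertBlock()->getParent();
// retVal stands for the llvm::Value* being returned;
// its type should match func->getReturnType()
Builder.CreateRet(retVal);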

Function call

llvm::Function *call;
// assign this to the pointer to the function to call,
// usually loaded from the symbol table

std::vector<llvm::Value *> args;
// args holds the values in the method call,
// e.g. foo(1) would have a vector of size one with value of 1 with type i32.

// a void call must not be given a result name
bool isVoid = call->getReturnType()->isVoidTy();
llvm::Value *val = Builder.CreateCall(
    call,
    args,
    isVoid ? "" : "calltmp"
);

i1 to i32 conversion

llvm::Value *promo = Builder.CreateZExt(*i, Builder.getInt32Ty(), "zexttmp");

Global strings

llvm::GlobalVariable *GS = Builder.CreateGlobalString(s, "globalstring");
llvm::Value *stringConst = Builder.CreateConstGEP2_32(GS->getValueType(), GS, 0, 0, "cast");

Static Single Assignment in LLVM

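For ordinary control flow built with CreateBr and CreateCondBr, you do not need to create Phi nodes yourself. Variables live in stack slots created by alloca and are accessed through load and store, so LLVM takes care of the rest (the mem2reg pass can later promote them to registers and insert the Phi nodes).

Logical operators have short-circuit semantics, for example c = fun1() || fun2(). In this case you have to create the Phi node yourself.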


Unless otherwise stated, all articles on this blog are licensed under CC BY-SA 4.0. Please credit the source when reposting!