数组溢出

为了简要概述 CBMC 的功能,我们从一个小例子开始:缓冲区溢出问题。 缓冲区是连续分配的内存块,由 C 中的数组或指针表示。用 C 编写的程序不提供对缓冲区的自动边界检查,这意味着程序可以意外或恶意地写入缓冲区。 以下示例是一个完全有效的 C 程序(在某种意义上,编译器编译它没有任何错误):

1
2
3
4
5
int main()
{
int buffer[10];
buffer[20] = 10;
}

但是,对分配的内存区域之外的地址的写访问可能导致意外行为。 特别是,可以利用这些错误来覆盖函数的返回地址,从而能够执行任意用户引起的代码。 CBMC 能够检测到此问题,并报告违反了缓冲区的“上限属性”。 CBMC 能够检查这些下限和上限,即使对于具有动态大小的阵列也是如此。

CBMC 安装

CBMC 支持 windows, linux, max os。在有具体的步骤,

mac os
MacOS 可以点此下载,安装之后,就可以使用 CBMC 了。

Screenshot 2018-07-23 at 23.58.05

CBMC 验证例子

假设我们已经正确安装了 CBMC,跟编译器一样,CBMC 接受 c 程序文件,CBMC 并不是二进制执行,而是符号执行(symbolic execution)。假设我们有以下文件:

1
2
3
4
5
6
// file1.c
int puts(const char *s) { }

int main(int argc, char **argv) {
puts(argv[2]);
}

程序中,main 函数访问了参数数组 argv,如果参数数量少于 3,那么 argv[2] 就越界了,我们现在运行 CBMC。

1
cbmc file1.c --show-properties --bounds-check --pointer-check

两个选项 --bounds-check--pointer-check 指示 CBMC 查找与指针和数组边界相关的错误。 CBMC 将打印所检查的性质列表。 请注意,它列出了一个标有“argv中的对象边界”的属性以及有故障的数组访问的位置。 CBMC在很大程度上决定了需要检查哪些性质,这是通过静态分析来实现的,该分析依赖于计算各种抽象域上的不动点。

我们得到以下结果:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
CBMC version 5.9 (cbmc-5.9) 64-bit x86_64 macos
Parsing file1.c
Converting
Type-checking file1
Generating GOTO Program
Adding CPROVER library (x86_64)
Removal of function pointers and virtual functions
Generic Property Instrumentation
Running with 8 object bits, 56 offset bits (default)
Property main.pointer_dereference.1:
file file1.c line 4 function main
dereference failure: pointer NULL in argv[(signed long int)2]
!(POINTER_OBJECT(argv) == POINTER_OBJECT(((char **)NULL)))

Property main.pointer_dereference.2:
file file1.c line 4 function main
dereference failure: pointer invalid in argv[(signed long int)2]
!INVALID-POINTER(argv)

Property main.pointer_dereference.3:
file file1.c line 4 function main
dereference failure: deallocated dynamic object in argv[(signed long int)2]
!(POINTER_OBJECT(argv) == POINTER_OBJECT(__CPROVER_deallocated))

Property main.pointer_dereference.4:
file file1.c line 4 function main
dereference failure: dead object in argv[(signed long int)2]
!(POINTER_OBJECT(argv) == POINTER_OBJECT(__CPROVER_dead_object))

Property main.pointer_dereference.5:
file file1.c line 4 function main
dereference failure: pointer outside dynamic object bounds in argv[(signed long int)2]
16l + POINTER_OFFSET(argv) >= 0l && __CPROVER_malloc_size >= 24ul + (unsigned long int)POINTER_OFFSET(argv) || !(POINTER_OBJECT(argv) == POINTER_OBJECT(__CPROVER_malloc_object))

Property main.pointer_dereference.6:
file file1.c line 4 function main
dereference failure: pointer outside object bounds in argv[(signed long int)2]
16l + POINTER_OFFSET(argv) >= 0l && OBJECT_SIZE(argv) >= 24ul + (unsigned long int)POINTER_OFFSET(argv) || DYNAMIC_OBJECT(argv)

事实上,这些性质不不一定跟程序的 bugs 对应,他们只是指明了潜在的错误缺陷,因为抽象解释可能是不精确的。所以,还需要进一步的分析。CBMC 提供了符号执行&模拟(symbolic execution&simulation),本质上是将程序转换为公式,这些公式程序的性质。我们用执行以下的命令:

1
cbmc file1.c --show-vcc --bounds-check --pointer-check

CBMC 执行符号执行,然后输出 验证条件(Verification Conditions)。VC 是一些逻辑公式,如果逻辑是有判定过程的,那么公式就可以求证是否是 valid 的。这些 VC 通常可以用一些 SAT 或者 SMT 求解。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
VERIFICATION CONDITIONS:

file file1.c line 4 function main
dereference failure: pointer outside object bounds in argv[(signed long int)2]
{-1} __CPROVER_dead_object#1 == NULL
{-2} __CPROVER_deallocated#1 == NULL
{-3} __CPROVER_malloc_is_new_array#1 == FALSE
{-4} __CPROVER_malloc_object#1 == NULL
{-5} __CPROVER_malloc_size#1 == 0ul
{-6} __CPROVER_memory_leak#1 == NULL
{-7} __CPROVER_next_thread_id#1 == 0ul
{-8} __CPROVER_pipe_count#1 == 0u
{-9} __CPROVER_rounding_mode!0#1 == 0
{-10} __CPROVER_thread_id!0#1 == 0ul
{-11} __CPROVER_threads_exited#1 == ARRAY_OF(FALSE)
{-12} argc'#0 >= 1
{-13} !(argc'#0 >= 268435457)
{-14} argv'#1 == argv'#0 WITH [argc'#0:=((char *)NULL)]
{-15} argc!0@1#1 == argc'#0
{-16} argv!0@1#1 == argv'
|--------------------------
{1} 8ul * (__CPROVER_size_t)(1 + argc'#0) >= 24ul

我们运行下面的命令,直接求解 VC。

1
cbmc file1.c --bounds-check --pointer-check

执行命令后,得到以下的结果:(CBMC 实际上将 VC 公式转换成 CNF 范式,然后传递给 SAT 求解器。)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
Generated 6 VCC(s), 1 remaining after simplification
Passing problem to propositional reduction
converting SSA
Running propositional reduction
Post-processing
Solving with MiniSAT 2.2.1 with simplifier
462 variables, 1031 clauses
SAT checker: instance is SATISFIABLE
Solving with MiniSAT 2.2.1 with simplifier
462 variables, 0 clauses
SAT checker inconsistent: instance is UNSATISFIABLE
Runtime decision procedure: 0.00586755s

** Results:
[main.pointer_dereference.1] dereference failure: pointer NULL in argv[(signed long int)2]: SUCCESS
[main.pointer_dereference.2] dereference failure: pointer invalid in argv[(signed long int)2]: SUCCESS
[main.pointer_dereference.3] dereference failure: deallocated dynamic object in argv[(signed long int)2]: SUCCESS
[main.pointer_dereference.4] dereference failure: dead object in argv[(signed long int)2]: SUCCESS
[main.pointer_dereference.5] dereference failure: pointer outside dynamic object bounds in argv[(signed long int)2]: SUCCESS
[main.pointer_dereference.6] dereference failure: pointer outside object bounds in argv[(signed long int)2]: FAILURE


** 1 of 6 failed (2 iterations)
VERIFICATION FAILED

可以看到,第 6 个 VC 不能求解。我们运行以下命令,CBMC 会给出性质不能满足的 prigram trace。

1
cbmc file1.c --bounds-check --pointer-check --trace

验证入口

上述例子中,程序验证的入口是 main 函数,实际上,CBMC 可以选择任意的函数入口。比如针对以下的程序(file2.c)

1
2
3
4
5
6
7
int array[10];
int sum() {
unsigned i, sum;
sum=0;
for(i=0; i<10; i++)
sum+=array[i];
}

我们在执行时,可以选择入口函数:

1
cbmc file2.c --function sum --bounds-check

验证循环

CBMC 是执行有界模型检查(Bounded Model Checking),所有循环必须具有有限的上限运行时限,以保证找到所有错误。 CBMC可以选择展开循环, 例如,考虑程序 binsearch.c

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
int binsearch(int x) {
int a[16];
signed low=0, high=16;

while (low < high) {
signed middle = low + ((high-low) >> 1);
if(a[middle] < x)
high = middle;
else if(a[middle] > x)
low = middle + 1;
else // a[middle]==x
return middle;
}
return -1;
}

这个程序中,循环是不会终止的,所有用 cbmc 检查时,我们必须给出循环次数。

1
cbmc binsearch.c --function binsearch --unwind 6 --bounds-check --unwinding-assertions

CBMC 检查选项

CBMC 目前提供了以下的检查选项:

  • –no-assertions ignore user assertions
  • –bounds-check add array bounds checks
  • –div-by-zero-check add division by zero checks
  • –pointer-check add pointer checks
  • –signed-overflow-check add arithmetic over- and underflow checks
  • –unsigned-overflow-check add arithmetic over- and underflow checks
  • –undefined-shift-check add range checks for shift distances
  • –nan-check add floating-point NaN checks
  • –uninitialized-check add checks for uninitialized locals (experimental)
  • –error-label label check that given label is unreachable

http://cprover.diffblue.com/cbmc-user-manual.html