Finding bugs is hard, reverse engineering is hard. Constraint solvers are the heart of many program analysis techniques, and can aid Fuzzing, and software verification.

This post contains a few hands-on experiments with Z3, a high performance theorem prover developed at Microsoft Research by Leonardo de Moura and Nikolaj Bjorner. With KLEE, a Symbolic Execution Engine built on top of the LLVM compiler infrastructure developed by Cristian Cadar, Daniel Dunbar, and Dawson Engler. And, angr, a binary analysis framework developed by the Computer Security Lab at UC Santa Barbara and their associated CTF team, Shellphish.

The motivation behind this post is basically to use it as reminder to myself. I couldn’t keep using google to search for the same things/references over and over. Eventually in the future I’ll post more realistic and advanced use case scenarios for the tools mentioned above, and a few others in this area that I’m currently working on (meaning, trying to learn). However, this post was already too long.

To illustrate the use of Z3 I will mainly use its Python API because is practical and easy to understand. SAT/SMT solvers have many applications. An important application of SMT solvers is symbolic execution for analysis and testing of programs (concolic testing). The goal? Finding security vulnerabilities. That’s exactly my main interest here.

A major problem you may find when you look at this subject is the fact that most of the information you’ll find is too theoretical. However, many security researchers have already been working on this field for a long time. Symbolic execution can definitely help with Reverse Engineering and software analysis as you’ll see if you follow this post.

The main reason this topic is getting in voga again, I believe, is mainly because computers are getting faster, and faster, and software is getting highly complex. Consequently, solving problems with SAT/SMT is more attractive now. If you want to go deep on the topic see the “references” section at the end of the article, you’ll find awesome work on this subject that goes back more than 10 years. Notably, besides academia, as far as I know, Richard ‘@richinseattle’ Johnson was one of the first to start applying these techniques to bug finding while he was at Sourcefire (now Cisco).

SAT/SMT Solvers

I wouldn’t be able to define SAT/SMT better than wikipedia. So here’s the wikipedia definition.

In computer science and mathematical logic, the satisfiability modulo theories (SMT) problem is a decision problem for logical formulas with respect to combinations of background theories expressed in classical first-order logic with equality. Examples of theories typically used in computer science are the theory of real numbers, the theory of integers, and the theories of various data structures such as lists, arrays, bit vectors and so on. SMT can be thought of as a form of the constraint satisfaction problem and thus a certain formalized approach to constraint programming.

An SMT instance is a generalization of a Boolean SAT instance. I’ll try to stay away from the theory and math behind SAT/SMT solvers. Not only because I’m not at the academia anymore (and actually I didn’t really study this subjects while I was there), but also because you’ll be bored. So basically I’ll show a few practical examples of how to use SAT/SMT solvers. I’ll mainly use Z3, but there are a lot more SMT solvers, check this wikipedia page for an eventually complete list.

Before diving into equations (SAT/SMT solvers are ‘solvers’ of huge equations), you must be familiar with computer science logic. This is a prerequisite, so I won’t be talking here about what’s a variable, what’s true (1) and false (0), what’s OR, XOR, AND, and so on. If you feel rusty and you need some refreshing about these concepts wikipedia might be a good starting point. You’ll also need to be familiar with other concepts like functions, connectives (&&, ||, !), asserts (an assertion is always assumed to be true, if an assertion evaluates to false the program must end), and a few others.

So, an SMT problem is a decision problem for formulas expressed using logic that return true or false based in theories of arithmetic, lists, bit vectors, arrays, etc. In other words, an SMT solver is a constraint solver. What is a constraint? A statement that specifies properties of a solution to be found. A good example is… Z3.

How solvers work? Magic? Well, kind of.

  • user specifies a formula that must be satisfied
  • solver attempts to find a solution that SATisfies the formula

If the solver can find a solution, the formula is said to be SATisfiable. If the solver cannot find a solution, the formula is said to be UNSATisfiable.

Z3

Quick Z3 Setup

To get started I recommend you to grab the code from the project’s GitHub official repository.

$ git clone https://github.com/Z3Prover/z3
$ cd z3
$ CXX=clang++ CC=clang python scripts/mk_make.py
$ cd build
$ make
$ sudo make install

After make Z3 Python scripts can already be executed in the build/python directory.

If you prefer to use a docker container just type:

$ docker pull fdiskyou/z3:0.1

Linear equation with 3 variables

As we learnt at primary school, real world problems can be represented as equations. So let’s just use one of those equations we learnt somewhere around the 7th or 8th school year.

We’ll use the following linear equation system with 3 variables, that I grabbed from http://tutorial.math.lamar.edu.

x - 2y + 3z = 7
2x + y + z = 4
-3x + 2y - 2z = -10

How can we solve this using Z3? Easy-peasy. If we use the Python bindings it’s actually easy to read the code and understand what’s going on.

#!/usr/bin/python

from z3 import *

s = Solver()

x = Int('x')
y = Int('y')
z = Int('z')

s.add(x - 2*y + 3*z == 7)
s.add(2*x + y + z == 4)
s.add(-3*x + 2*y - 2*z == -10)

print s.check()
print s.model()

The Solver() function solves a system of constraints. The function Int('x') creates an integer variable in Z3 named x. The example above uses 3 variables x, y, and z, and three constraints. In the example above, the expression x - 2y + 3z = 7 is a Z3 constraint. Z3Py like Python uses = for assignment and the operators <, <=, >, >=, == and != for comparison.

You can try to solve the equation above by hand. You’ll save a lot of time if you use Z3 though.

$ time python sample1.py
sat
[z = 1, x = 2, y = -1]

real    0m0.342s
user    0m0.265s
sys     0m0.036s

The s.check() returns sat, meaning Z3 was able to find the solution. If we change the equation in a way that it won’t have a solution, it will return unsat. Since the equation was SATisfiable (in other words, has a solution) we got a model (s.model()).

I’ve used Int because I only had integers in my equation. Obviously you can also have Real variables, Bool, and so on. Check the z3py tutorial and the Z3 guide for more details.

As I said above, I’m using Python because its practical and very popular among the ‘hacker’ community. However, there’s a standard language for SMT solvers called SMT-LIB. If I go back to the first example and re-write it, it will look like this:

(declare-const x Int)
(declare-const y Int)
(declare-const z Int)

(assert(=(+(- x (* 2 y)) (* 3 z)) 7))
(assert(=(+(+(* 2 x) y) z) 4))
(assert(=(-(-(* 2 y) (* 3 x)) (* 2 z)) -10))

(check-sat)
(get-model)

As you can see a lot more tricky, but still readable. We start by declaring constants, then we add constraints by adding assertions using Polish notation. Then we check the satisfiability (check-sat), and finally we get the interpretation that makes all formulas true (get-model).

If we save the code above in a file, we can run it as shown below.

$ z3 -smt2 sample1.smt
sat
(model
  (define-fun z () Int
    1)
  (define-fun x () Int
    2)
  (define-fun y () Int
    (- 1))
)

Z3 can also read commands from STDIN, just type instead:

$ z3 -smt2 -in

Before we move on, see below a small UNSATisfiable example (in Python again).

#!/usr/bin/python

from z3 import *

s = Solver()

a = Int('a')
b = Int('b')

s.add(a > 0)
s.add(a + 1 == b)
s.add(b < 0)

if s.check() == unsat:
  print("NO SOLUTION!")
else:
  print s.model()

If we run it, we won’t obviously get a solution.

$ python sample3.py
NO SOLUTION!

This was only for demonstration purposes. If you want to fix the formula to make it SATisfiable, go ahead.

The basic examples above were enough to get us started. However, if you want to play with more complex equations/examples I recommend you, again, to look at:

Puzzles

If SAT/SMT solvers can be applied to equations, any kind of puzzle that you can describe as an equation can also be solved with Z3. You’ll find many examples of puzzles solved with Z3. The best compilation that I’m aware of is in the Quick introduction into SAT/SMT solvers and symbolic execution by Dennis Yurichev. You definitely should look at this resource if you want to go deep on the subject.

Another good resource with plenty of solved puzzles can be found at z3-playground, maintained by Axel @0vercl0k Souchet.

The Zebra Puzzle

As we can read on wikipedia, The zebra puzzle is a well-known logic puzzle. Many versions of the puzzle exist, including a version published in Life International magazine on December 17, 1962.

This puzzle is often called Einstein’s Riddle and you’ll be able to find online some slight variations. Try to think how you can translate the problem into a data structure and then start adding your constraints. In my solution I used a matrix of integers.

You can find my solution below, and a different approach on Yurichev’s pdf. Axel Souchet used an approach very close to mine, even though a bit more elegant. Yurichev’s might be the easiest to understand. Try to solve the puzzle yourself and then compare the solutions, there are many ways to solve the puzzle and you’ll learn more that just looking at what others did. After all, this a good way to practice.

#!/usr/bin/env python
'''
My solution for: https://en.wikipedia.org/wiki/Zebra_Puzzle

The following version of the puzzle appeared in Life International in 1962:

0.  There are five houses.
1.  The Englishman lives in the red house.
2.  The Spaniard owns the dog.
3.  Coffee is drunk in the green house.
4.  The Ukrainian drinks tea.
5.  The green house is immediately to the right of the ivory house.
6.  The Old Gold smoker owns snails.
7.  Kools are smoked in the yellow house.
8.  Milk is drunk in the middle house.
9.  The Norwegian lives in the first house.
10. The man who smokes Chesterfields lives in the house next to the man with the fox.
11. Kools are smoked in the house next to the house where the horse is kept.
12. The Lucky Strike smoker drinks orange juice.
13. The Japanese smokes Parliaments.
14. The Norwegian lives next to the blue house.

Now, who drinks water? Who owns the zebra?

In the interest of clarity, it must be added that each of the five houses is painted a different color, and their inhabitants are of different national extractions, own different pets, drink different beverages and smoke different brands of American cigarets [sic]. One other thing: in statement 6, right means your right.
'''

from z3 import *

s = Solver()

zebra = {
         'color' : ('red', 'green', 'ivory', 'yellow', 'blue'),
         'nationality' : ('British', 'Spanish', 'Ukrainian', 'Norwegian', 'Japanese'),
         'pet' : ('dog', 'snail', 'fox', 'horse', 'zebra'),
         'beverage' : ('coffee', 'tea', 'milk', 'juice', 'water'),
         'cigarets' : ('Old Gold', 'Kools', 'Lucky Strike', 'Parliaments', 'Chesterfields')
   }

# assign an integer/index to zebra[name] from 0 to 5
color, nationality, pet, beverage, cigarets = range(5) 

# Create Z3 integer variables for matrix cells
cells = [ [ Int("%s_%d" % (j, i)) for j in zebra ] for i in range(5) ]

# debug
#print(cells)

# Add cell constraints
for y in range(5):
  for x in range(5):
    s.add(cells[x][y] >= 0, cells[x][y] <= 4)

# Add row constraints
for y in range(5):
  s.add(Distinct([cells[x][y] for x in range(5)]))


# feed the solver with the hints above

# The Englishman lives in the red house
s.add( Or( [ And(cells[i][nationality] == zebra['nationality'].index('British'), cells[i][color] == zebra['color'].index('red') ) for i in range(5) ] ) )

# The Spaniard owns the dog
s.add( Or( [ And(cells[i][nationality] == zebra['nationality'].index('Spanish'), cells[i][pet] == zebra['pet'].index('dog') ) for i in range(5) ] ) )

# Coffee is drunk in the green house
s.add( Or( [ And(cells[i][beverage] == zebra['beverage'].index('coffee'), cells[i][color] == zebra['color'].index('green') ) for i in range(5) ] ) )

# The Ukrainian drinks tea
s.add( Or( [ And(cells[i][nationality] == zebra['nationality'].index('Ukrainian'), cells[i][beverage] == zebra['beverage'].index('tea') ) for i in range(5) ] ) )

# The green house is immediately to the right of the ivory house
s.add( Or( [ And(cells[i][color] == zebra['color'].index('green'), cells[i - 1][color] == zebra['color'].index('ivory') ) for i in range(5) ] ) )

# The Old Gold smoker owns snails
s.add( Or( [ And(cells[i][cigarets] == zebra['cigarets'].index('Old Gold'), cells[i][pet] == zebra['pet'].index('snail') ) for i in range(5) ] ) ) 

# Kools are smoked in the yellow house
s.add( Or( [ And(cells[i][cigarets] == zebra['cigarets'].index('Kools'), cells[i][color] == zebra['color'].index('yellow') ) for i in range(5) ] ) )

# Milk is drunk in the middle house
s.add(cells[2][beverage] == zebra['beverage'].index('milk'))

# The Norwegian lives in the first house
s.add(cells[0][nationality] == zebra['nationality'].index('Norwegian'))

# The man who smokes Chesterfields lives in the house next to the man with the fox
s.add( Or(
  And(cells[0][cigarets] == zebra['cigarets'].index('Chesterfields'), cells[1][pet] == zebra['pet'].index('fox')),
  And(cells[1][cigarets] == zebra['cigarets'].index('Chesterfields'), Or(cells[0][pet] == zebra['pet'].index('fox'), cells[2][pet] == zebra['pet'].index('fox'))),
  And(cells[2][cigarets] == zebra['cigarets'].index('Chesterfields'), Or(cells[1][pet] == zebra['pet'].index('fox'), cells[3][pet] == zebra['pet'].index('fox'))),
  And(cells[3][cigarets] == zebra['cigarets'].index('Chesterfields'), Or(cells[2][pet] == zebra['pet'].index('fox'), cells[4][pet] == zebra['pet'].index('fox'))),
  And(cells[4][cigarets] == zebra['cigarets'].index('Chesterfields'), cells[3][pet] == zebra['pet'].index('fox')),
))

# Kools are smoked in the house next to the house where the horse is kept
s.add( Or(
  And(cells[0][cigarets] == zebra['cigarets'].index('Kools'), cells[1][pet] == zebra['pet'].index('horse')),
  And(cells[1][cigarets] == zebra['cigarets'].index('Kools'), Or(cells[0][pet] == zebra['pet'].index('horse'), cells[2][pet] == zebra['pet'].index('horse'))),
  And(cells[2][cigarets] == zebra['cigarets'].index('Kools'), Or(cells[1][pet] == zebra['pet'].index('horse'), cells[3][pet] == zebra['pet'].index('horse'))),
  And(cells[3][cigarets] == zebra['cigarets'].index('Kools'), Or(cells[2][pet] == zebra['pet'].index('horse'), cells[4][pet] == zebra['pet'].index('horse'))),
  And(cells[4][cigarets] == zebra['cigarets'].index('Kools'), cells[3][pet] == zebra['pet'].index('horse')),
))
 
# The Lucky Strike smoker drinks orange juice
s.add( Or( [ And(cells[i][cigarets] == zebra['cigarets'].index('Lucky Strike'), cells[i][beverage] == zebra['beverage'].index('juice') ) for i in range(5) ] ) )

# The Japanese smokes Parliaments
s.add( Or( [ And(cells[i][nationality] == zebra['nationality'].index('Japanese'), cells[i][cigarets] == zebra['cigarets'].index('Parliaments') ) for i in range(5) ] ) )

# The Norwegian lives next to the blue house
s.add( Or(
  And(cells[0][nationality] == zebra['nationality'].index('Norwegian'), cells[1][color] == zebra['color'].index('blue')),
  And(cells[1][nationality] == zebra['nationality'].index('Norwegian'), Or(cells[0][color] == zebra['color'].index('blue'), cells[2][color] == zebra['color'].index('blue'))),
  And(cells[2][nationality] == zebra['nationality'].index('Norwegian'), Or(cells[1][color] == zebra['color'].index('blue'), cells[3][color] == zebra['color'].index('blue'))),
  And(cells[3][nationality] == zebra['nationality'].index('Norwegian'), Or(cells[2][color] == zebra['color'].index('blue'), cells[4][color] == zebra['color'].index('blue'))),
  And(cells[4][nationality] == zebra['nationality'].index('Norwegian'), cells[3][color] == zebra['color'].index('blue'))
))

if s.check() == unsat:
  print('Failed to solve. Most likely some of the constraints are wrong.') 

m = s.model()
solution = [[m[case].as_long() for case in line] for line in cells]

for i in range(5):
  print('House: %s, Nationality: %s, Beverage: %s, Smoke: %s, Pet: %s' % (
    zebra['color'][solution[i][color]],
    zebra['nationality'][solution[i][nationality]],
    zebra['beverage'][solution[i][beverage]],
    zebra['cigarets'][solution[i][cigarets]],
    zebra['pet'][solution[i][pet]]
  ))

Compare the results to the ones at wikipedia even if you get a SAT result. Your constraints might be subtly “wrong” and the solution will be different.

$ time python zebra_puzzle.py
House: yellow, Nationality: Norwegian, Beverage: water, Smoke: Kools, Pet: fox
House: blue, Nationality: Ukrainian, Beverage: tea, Smoke: Chesterfields, Pet: horse
House: red, Nationality: British, Beverage: milk, Smoke: Old Gold, Pet: snail
House: ivory, Nationality: Spanish, Beverage: juice, Smoke: Lucky Strike, Pet: dog
House: green, Nationality: Japanese, Beverage: coffee, Smoke: Parliaments, Pet: zebra

real    0m0.469s
user    0m0.299s
sys     0m0.052s

Find the Serial Number

Below is just a very simple example of a, let’s say, “program” that validates a Serial Number. Nowadays, serial numbers validation routines are way more complex. But don’t get too surprised if you are still able to find software that doesn’t do any online validation and have all the logic implemented in the binary itself.

The idea here is just to show how Z3 can be used to generate a valid serial number given a known checking algorithm. This is not often the case, at most we might know what’s the format of the Serial Number that a program expects. For example, the program tells you the SN is in the form XXXXX-XXXXX. In theory, based on the information we have we can brute force the SN. However, usually the program doesn’t accept only one single combination of CHARs as a valid SN, but multiple valid combinations that must pass certain validation routines.

So what happens most of the times, including in CTFs (we’ll get there after this example), is we reverse the program, and we try to locate the code that’s responsible to validate our serial number. From there we try to figure what are the validations performed. In other words, what are the constraints. Most of the times the algorithm is complex and trying to solve it manualy is quite hard. That’s when an SMT solver can come in handy.

We’ll use a simple C program I wrote. The validation checks are very easy to understand, that’s the idea. After this example we’ll see an example from a CTF where we only have access to the binary itself so, we’ll also need to do the reversing part.

Here’s the code.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void usage(char *progname)
{
  printf("Usage: %s <serial number>\n", progname);
  printf("Serial Number is the in the form XXXXX-XXXXX\n");
  exit(0);
}

int is_digit(int d)
{
  if(d >= 48 && d <= 57)
    return 1;

  return 0;
}

int is_char(int d)
{
  if(d >= 65 && d <= 90)
    return 1;

  return 0;
}

int validate_sn(char *sn)
{
  // constraint 0: the whole SN has 11 chars
  if (strlen(sn) != 11)
    return 0;

  // constraint 1: index below must be equal to '-'
  if(sn[5] != 45)
    return 0;

  // constraint 2: the first char of first XXXXX sequence must be 1 and first char of second XXXXX sequence must be 5 
  if(sn[0] != 49)
    return 0;
  if(sn[6] != 53)
    return 0;

  // constraint 3: 2nd char is a digit, 7th char is digit
  if(is_digit(sn[1]) != 1)
    return 0;
  if(is_digit(sn[7]) != 1)
    return 0;

  // constraint 4: 3rd and 8th chars are uper case letters
  if(is_char(sn[2]) != 1)
    return 0; 
  if(is_char(sn[8]) != 1)
    return 0;

  // constraint 5: last 2 chars of every XXXXX are numbers
  if(is_digit(sn[3]) != 1)
    return 0;
  if(is_digit(sn[4]) != 1)
    return 0;
  if(is_digit(sn[9]) != 1)
    return 0;
  if(is_digit(sn[10]) != 1)
    return 0;

  // constraint 6: the last 2 chars of every XXXXX sequence must be the same
  if(sn[3] != sn[4])
    return 0;
  if(sn[9] != sn[10])
    return 0;

  return 1;
}

int main(int argc, char *argv[])
{
  char *serial = argv[1];

  if (argc < 2)
    usage(argv[0]);

  if (validate_sn(serial))
    printf("Thank You. Your license is now active.\n");
  else
    printf("Invalid Serial Number. Please try again.\n");

  return 0;
}

The constraints are clearly identified in the source code. So let’s see if we can “ask” Z3 to give us a valid serial number. See below.

#!/usr/bin/python
import sys
from z3 import *

s = Solver()

sn = IntVector('sn', 11)

def is_valid(x):
  return Or(And(x >= 0x41, x <= 0x5a), And(x >= 0x30, x <= 0x39))

for i in range(11):
  if (i != 5):
    s.add(is_valid(sn[i]))

# constraint 1
s.add(sn[5] == 0x2d)

# constraint 2
s.add(sn[0] == 0x31)
s.add(sn[6] == 0x35)

# constraint 3
s.add(sn[1] >= 0x30, sn[1] <= 0x39)
s.add(sn[7] >= 0x30, sn[7] <= 0x39)

# constraint 4
s.add(sn[2] >= 0x41, sn[2] <= 0x5a)
s.add(sn[8] >= 0x41, sn[8] <= 0x5a)

# constraint 5
s.add(sn[3] >= 0x30, sn[3] <= 0x39)
s.add(sn[4] >= 0x30, sn[4] <= 0x39)
s.add(sn[9] >= 0x30, sn[9] <= 0x39)
s.add(sn[10] >= 0x30, sn[10] <= 0x39)

# constraint 6
s.add(sn[3] == sn[4])
s.add(sn[9] == sn[10])

if s.check() == unsat:
  print "UNSAT"
  sys.exit()

m = s.model()

for x in m:
  print x, chr(m[x].as_long())

And if we run it.

$ time python sn.py | sort -V
sn__0 1
sn__1 0
sn__2 A
sn__3 0
sn__4 0
sn__5 -
sn__6 5
sn__7 0
sn__8 A
sn__9 0
sn__10 0

real    0m0.332s
user    0m0.283s
sys     0m0.032s

We see that it found indeed a valid SN number according to the constraints we have defined. If we go back, and we test our serial_validation binary with the serial number that Z3 found for us, we get:

$ gcc -o serial_validation serial_validation.c
$ ./serial_validation
Usage: ./serial_validation <serial number>
Serial Number is the in the form XXXXX-XXXXX
$ ./serial_validation 10A00-50A00
Thank You. Your license is now active.

This a very contrived example. Hopefully, you tried to solve it and you (didn’t) struggled a bit and were forced to learn at least a little more.

Before we talk about symbolic execution, let’s try to solve a CTF challenge where we also have to find a valid serial number. Well, find a flag. But, you know what I mean.

CTF challenge (get the flag)

We will try to solve a challenge from the Codegate CTF 2018 Preliminary Round, more precisely the “RedVelvet” challenge (Reversing category).

We start by looking at the binary and run it.

$ file RedVelvet
RedVelvet: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=84e7ef91c33878cf9eefc00a7a450895aa573494, not stripped
$ ./RedVelvet
Your flag : aaa

The *nix strings utility only showed the interesting ASCII strings below. If you want to see the whole output check here.

HAPPINESS:)
Your flag :
flag : {" %s "}

So I opened the binary in IDA and looked at the main function.

It calls printf with "Your flag : ", calls _fgets to read our input, and then calls a bunch of functions always with the “same” parameters. These functions (func1, func2, funcX, …, func15) are for sure validating the flag.

If I look inside func1 for example I can see it performs manipulations and comparisons based on the user input. If everything goes well the execution will continue, otherwise the execution will stop and the program exits.

This is when IDA Pro Hex-Rays’ decompiler comes in handy. I pressed F5 to see the pseudo-code.

Much better. If the checks pass, the string HAPPINESS:) is printed and the code flows to the next function. If I look at func2, the story is the same.

And the story goes on for all the other 13 functions. I also looked at the strings, even though as expected the solution is not there.

If I follow the highlighted string above, I will end in a basic block near the end of the main function which will print the flag if I have provided the correct input.

This looks perfect for Z3, or even better for a symbolic execution framework. Let’s stick with Z3 for now.

Well, we have everything we need to solve the challenge (I hope). So I’ll copy all the pseudo code from the functions (from func1, to func15) and translate it to Z3 constraints. I’ll start by simply replicating the functions in Python, which receives 2 arguments each (some will receive 3). I’ll look at the remaining work after that.

"""
int __fastcall func1(char a1, char a2)
{
  if ( a1 * 2 * (char)(a2 ^ a1) - a2 != 10858 )
    exit(1);
  if ( a1 <= 85 || a1 > 95 || a2 <= 96 || a2 > 111 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func1(a1, a2):
  s.add(a1 * 2 * (a2 ^ a1) - a2 == 10858)
  s.add(a1 > 85)
  s.add(a1 <= 95)
  s.add(a2 > 96 )
  s.add(a2 <= 111)

Hopefully, that’s it. Note that I’m reversing the logic because if the constraints above evaluate to true the program exits. And we’ll be doing the same for another 14 functions. Whoever told you Reversing is fun, lied to you. It’s actually quite boring, and persistence and patience are ‘the’ master skill.

"""
int __fastcall func2(char a1, char a2)
{
  if ( a1 % a2 != 7 )
    exit(1);
  if ( a2 <= 90 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func2(a1, a2):
  s.add(a1 % a2 == 7)
  s.add(a2 > 90)


"""
int __fastcall func3(char a1, char a2)
{
  if ( a1 / a2 + (char)(a2 ^ a1) != 21 || a1 > 99 || a2 > 119 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func3(a1, a2):
  s.add(a1 / a2 + (a2 ^ a1) == 21)
  s.add(a1 <= 99 )
  s.add(a2 <= 119)


"""
int __fastcall func4(char a1, char a2)
{
  signed __int64 v2; // rtt@1

  LODWORD(v2) = (char)(a2 ^ a1 ^ a2);
  HIDWORD(v2) = (unsigned __int64)(char)(a2 ^ a1 ^ a2) >> 32;
  if ( (unsigned int)(v2 % a2) + a1 != 137 || a1 <= 115 || a2 > 99 || a2 != 95 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func4(a1, a2):
  v2 = a1 << 32 | a1
  s.add((v2 % a2) + a1 == 137)
  s.add(a1 > 115)
  s.add(a2 <= 99)
  s.add(a2 == 95)


"""
int __fastcall func5(char a1, char a2)
{
  if ( ((a2 + a1) ^ (char)(a1 ^ a2 ^ a1)) != 225 || a1 <= 90 || a2 > 89 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func5(a1, a2):
  s.add((a1 + a2) ^ a2 == 225)
  s.add(a1 > 90 )
  s.add(a2 <= 89 )


"""
int __fastcall func6(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 85 || a2 <= 110 || a3 <= 115 || ((a2 + a3) ^ (a1 + a2)) != 44 || (a2 + a3) % a1 + a2 != 161 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func6(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add(a1 > 85)
  s.add(a2 > 110)
  s.add(a3 > 115)
  s.add(((a2 + a3) ^ (a1 + a2)) == 44)
  s.add(URem(a2 + a3, a1) + a2 == 161 ) # Use the function URem() for unsigned remainder, and SRem() for signed remainder. https://z3prover.github.io/api/html/classz3py_1_1_bit_vec_ref.html


"""
int __fastcall func7(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a1 > 119 || a2 <= 90 || a3 > 89 || ((a1 + a3) ^ (a2 + a3)) != 122 || (a1 + a3) % a2 + a3 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func7(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a1 <= 119)
  s.add(a2 > 90)
  s.add(a3 <= 89)
  s.add(((a1 + a3) ^ (a2 + a3)) == 122 )
  s.add(URem(a1 + a3,  a2) + a3 == 101)


"""
int __fastcall func8(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a3 > 114 || (a1 + a2) / a3 * a2 != 97 || (a3 ^ (a1 - a2)) * a2 != -10088 || a3 > 114 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func8(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add( a3 <= 114)
  s.add( UDiv(a1 + a2, a3) * a2 == 97 ) # Use the function UDiv() for unsigned division. https://z3prover.github.io/api/html/classz3py_1_1_bit_vec_ref.html
  s.add( a3 ^ (a1 - a2) == -8*13)


"""
int __fastcall func9(char a1, char a2, char a3)
{
  if ( a1 != a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a3 > 99 || a3 + a1 * (a3 - a2) - a1 != -1443 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func9(a1, a2, a3):
  s.add(a1 == a2)
  s.add(a2 >= a3)
  s.add(a3 <= 99)
  s.add(a3 + a1 * (a3 - a2) - a1 == -1443)


"""
int __fastcall func10(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 * (a1 + a3 + 1) - a3 != 15514 || a2 <= 90 || a2 > 99 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func10(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a2 * (a1 + a3 + 1) - a3 == 15514)
  s.add(a2 > 90)
  s.add(a2 <= 99)


"""
int __fastcall func11(char a1, char a2, char a3)
{
  if ( a2 < a1 )
    exit(1);
  if ( a1 < a3 )
    exit(1);
  if ( a2 <= 100 || a2 > 104 || a1 + (a2 ^ (a2 - a3)) - a3 != 70 || (a2 + a3) / a1 + a1 != 68 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func11(a1, a2, a3):
  s.add(a2 >= a1)
  s.add(a1 >= a3)
  s.add(a2 > 100)
  s.add(a2 <= 104)
  s.add(a1 + (a2 ^ (a2 - a3)) - a3 == 70)
  s.add(UDiv(a2 + a3, a1) + a1 == 68 )


"""
int __fastcall func12(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 59 || a3 > 44 || a1 + (a2 ^ (a3 + a2)) - a3 != 111 || (a2 ^ (a2 - a3)) + a2 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func12(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a2 <= 59)
  s.add(a3 <= 44)
  s.add(a1 + (a2 ^ (a3 + a2)) - a3 == 111)
  s.add((a2 ^ (a2 - a3)) + a2 == 101 )


"""
int __fastcall func13(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 40 || a2 <= 90 || a3 > 109 || a3 + (a2 ^ (a3 + a1)) - a1 != 269 || (a3 ^ (a2 - a1)) + a2 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func13(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add(a1 > 40 )
  s.add(a2 > 90 )
  s.add(a3 <= 109)
  s.add(a3 + (a2 ^ (a3 + a1)) - a1 == 269)
  s.add((a3 ^ (a2 - a1)) + a2 == 185)


"""
int __fastcall func14(char a1, char a2, char a3)
{
  if ( a1 < a3 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 99 || a3 <= 90 || a1 + (a2 ^ (a2 + a1)) - a3 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func14(a1, a2, a3):
  s.add(a1 >= a3)
  s.add(a2 >= a3)
  s.add(a2 <= 99)
  s.add(a3 > 90)
  s.add(a1 + (a2 ^ (a2 + a1)) - a3 == 185 )


"""
int __fastcall func15(char a1, char a2, char a3)
{
  if ( a2 < a3 )
    exit(1);
  if ( a2 < a1 )
    exit(1);
  if ( a3 <= 95 || a2 > 109 || ((a2 - a1) * a2 ^ a3) - a1 != 1214 || ((a3 - a2) * a3 ^ a1) + a2 != -1034 )
    exit(1);
  return puts("HAPPINESS:)");
}
"""
def func15(a1, a2, a3):
  s.add(a2 >= a3)
  s.add(a2 >= a1)
  s.add(a3 > 95)
  s.add( a2 <= 109)
  s.add(((a2 - a1) * a2 ^ a3) - a1 == 1214)
  s.add(((a3 - a2) * a3 ^ a1) + a2 == -1034)

Ok, that was the easiest part.

If we decompile the main function, we can see the following code.

int __cdecl main(int argc, const char **argv, const char **envp)
{
  char v3; // bl@0
  __int64 v4; // rax@1
  __int64 v5; // rcx@1
  __int64 v6; // rsi@2
  size_t v7; // rax@2
  char v9; // t1@8
  bool v10; // zf@8
  signed int i; // [sp+8h] [bp-1B8h]@2
  char v12; // [sp+50h] [bp-170h]@2
  char s; // [sp+C0h] [bp-100h]@1
  char v14; // [sp+C1h] [bp-FFh]@1
  char v15; // [sp+C2h] [bp-FEh]@1
  char v16; // [sp+C3h] [bp-FDh]@1
  char v17; // [sp+C4h] [bp-FCh]@1
  char v18; // [sp+C5h] [bp-FBh]@1
  char v19; // [sp+C6h] [bp-FAh]@1
  char v20; // [sp+C7h] [bp-F9h]@1
  char v21; // [sp+C8h] [bp-F8h]@1
  char v22; // [sp+C9h] [bp-F7h]@1
  char v23; // [sp+CAh] [bp-F6h]@1
  char v24; // [sp+CBh] [bp-F5h]@1
  char v25; // [sp+CCh] [bp-F4h]@1
  char v26; // [sp+CDh] [bp-F3h]@1
  char v27; // [sp+CEh] [bp-F2h]@2
  char v28; // [sp+CFh] [bp-F1h]@2
  char v29; // [sp+D0h] [bp-F0h]@2
  char v30; // [sp+D1h] [bp-EFh]@2
  char v31; // [sp+D2h] [bp-EEh]@2
  char v32; // [sp+D3h] [bp-EDh]@2
  char v33; // [sp+D4h] [bp-ECh]@2
  char v34; // [sp+D5h] [bp-EBh]@2
  char v35; // [sp+D6h] [bp-EAh]@2
  char v36; // [sp+D7h] [bp-E9h]@2
  char v37; // [sp+D8h] [bp-E8h]@2
  char v38; // [sp+D9h] [bp-E7h]@2
  char v39[32]; // [sp+E0h] [bp-E0h]@2
  char s1[80]; // [sp+100h] [bp-C0h]@3
  char s2[8]; // [sp+150h] [bp-70h]@1
  __int64 v42; // [sp+158h] [bp-68h]@1
  __int64 v43; // [sp+160h] [bp-60h]@1
  __int64 v44; // [sp+168h] [bp-58h]@1
  __int64 v45; // [sp+170h] [bp-50h]@1
  __int64 v46; // [sp+178h] [bp-48h]@1
  __int64 v47; // [sp+180h] [bp-40h]@1
  __int64 v48; // [sp+188h] [bp-38h]@1
  __int64 v49; // [sp+190h] [bp-30h]@1
  __int64 v50; // [sp+198h] [bp-28h]@1
  __int64 v51; // [sp+1A0h] [bp-20h]@1
  __int64 v52; // [sp+1A8h] [bp-18h]@1
  int v53; // [sp+1B0h] [bp-10h]@1
  __int64 v54; // [sp+1B8h] [bp-8h]@1

  v54 = *MK_FP(__FS__, 40LL);
  *(_QWORD *)s2 = 3905859155515433264LL;
  v42 = 3990529441497888818LL;
  v43 = 7017565014431380534LL;
  v44 = 3977633058323522358LL;
  v45 = 7364290724313116725LL;
  v46 = 3991705742141175652LL;
  v47 = 7363494672811320633LL;
  v48 = 3847534474596595814LL;
  v49 = 0LL;
  v50 = 0LL;
  v51 = 0LL;
  v52 = 0LL;
  v53 = 0;
  printf((const char *)&loc_4016CF + 1, argv, envp);
  fgets(&s, 27, edata);
  func1(s, v14);
  func2(v14, v15);
  func3(v15, v16);
  func4(v16, v17);
  func5(v17, v18);
  func6(v18, v19, v20);
  func7(v20, v21, v22);
  func8(v22, v23, v24);
  func9(v24, v25, v26);
  v4 = ptrace(0, 0LL, 1LL, 0LL);
  if ( v4 == -1 )
  {
    v9 = *(_BYTE *)v5;
    v10 = v3 + *(_BYTE *)(v5 + 111) == 0;
    *(_BYTE *)(v5 + 111) += v3;
    JUMPOUT(!v10, &unk_401746);
    v6C &= BYTE1(v4);
    JUMPOUT(*(_QWORD *)"ag : ");
  }
  func10(v26, v27, v28);
  func11(v28, v29, v30);
  func12(v30, v31, v32);
  func13(v32, v33, v34);
  func14(v34, v35, v36);
  v6 = (unsigned int)v37;
  func15(v36, v37, v38);
  SHA256_Init(&v12, v6);
  v7 = strlen(&s);
  SHA256_Update(&v12, &s, v7);
  SHA256_Final(v39, &v12);
  for ( i = 0; i <= 31; ++i )
    sprintf(&s1[2 * i], "%02x", (unsigned __int8)v39[i]);
  if ( strcmp(s1, s2) )
    exit(1);
  printf("flag : {\" %s \"}\n", &s);
  return 0;
}

If we look at the line where our input is read, we know that the flag is supposed to have at maximum 27 chars (including the final null terminator).

fgets(&s, 27, edata);

It is also important to look at which chars are passed to each function. We saw that some funcX functions receive 2 chars, other 3 chars. If we look at the pseudo code of the function main, we can also figure which chars are passed to each function.

So, anyway I need to define which chars I will accept as valid. I’ll consider the following chars only. If for some reason I can’t get a solution, I’ll expand it.

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_!@#$%^&*?:)(

I add the following to my Z3 python script.

def is_valid(x):
  return Or(And(x >= 65, x <= 90), And(x >= 97, x <= 122), And(x >= 48, x <= 57), And(x == 95), And(x == 33), And(x == 64), And(x == 35), And(x == 36), And(x == 37), And(x == 94), And(x == 38), And(x == 42), And(x == 63), And(x == 58), And(x == 40), And(x == 41))

x = [ BitVec('a%i' % i, 32) for i in range(26) ]

for i in range(26):
  s.add(is_valid(x[i]))

As you can see above I create a BitVec and I initialize it. With size 26 because we saw the input string would take 26 chars above (excluding the null char terminator of C strings).

Based on the logic I got from the main function pseudo code I add the following to my Z3 python script.

func1(x[0], x[1])
func2(x[1], x[2])
func3(x[2], x[3])
func4(x[3], x[4])
func5(x[4], x[5])
func6(x[5], x[6], x[7])
func7(x[7], x[8], x[9])
func8(x[9], x[10], x[11])
func9(x[11], x[12], x[13])
func10(x[13], x[14], x[15])
func11(x[15], x[16], x[17])
func12(x[17], x[18], x[19])
func13(x[19], x[20], x[21])
func14(x[21], x[22], x[23])
func15(x[23], x[24], x[25])

And I should be ready to go. I only need to print the “solution”.

if (s.check() == sat):
  solution = s.model()
  flag = ""
  for n in x:
      char = solution[n].as_long()
      flag += chr(char)
  print flag
else:
  print "UNSAT: go back and check your constraints"

If I run it.

$ time python red_velvet.py
What_You_Wanna_Be?:)_lc_la

real    0m0.878s
user    0m0.845s
sys     0m0.032s

Wow, apparently it works. It took me close to an hour to put this together but the execution is really fast. Let’s validate the solution.

$ ./RedVelvet
Your flag : What_You_Wanna_Be?:)_lc_la
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)

Hmm, we have 15 HAPPINESS:) strings printed but apparently the solution is not the one expected. Why? If we look at the pseudo code of the main function we’ll see that after all the validations a SHA256 hash is computed and the result apparently is different. Don’t panic. We know that solvers can give us more than one valid solution.

What we need to do is ask Z3 to give as another satisfying model. If we don’t want to find all possible solutions, and we are just feeling lazy we could do something like this.

while (s.check() == sat):
  solution = s.model()
  flag = ""
  for n in x:
      char = solution[n].as_long()
      flag += chr(char)
  print flag
  s.add(Or(solution != s.model()))
else:
  print "UNSAT: go back and check your constraints"

I’m not using any kind of finite sorts and this is not the proper way to do it. But I can run it and just get another possible solution.

$ python red_velvet.py
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
^C

However, what we should do is add a new constraint that blocks the model returned by Z3. The solution below still has some limitations but should work for us.

while s.check() == sat:
  solution = s.model()
  block = []
  for d in solution:
    c = d()
    block.append(c != solution[d])
    flag = ""
    for n in x:
      char = solution[n].as_long()
      flag += chr(char)
    print flag
  s.add(Or(block))

If we run it now.

$ time python red_velvet.py
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_lc_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lb_la

real    0m2.200s
user    0m2.131s
sys     0m0.060s

Ok, let’s sort this.

$ python red_velvet.py | sort -u
What_You_Wanna_Be?:)_la_la
What_You_Wanna_Be?:)_lb_la
What_You_Wanna_Be?:)_lc_la

Apparently we have 3 possible solutions (actually more if we expand the accepted charset). Let’s try them and see if we are luckier this time, and we don’t get stopped by the hash validation routine.

$ ./RedVelvet
Your flag : What_You_Wanna_Be?:)_la_la
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
flag : {" What_You_Wanna_Be?:)_la_la "}

Cool. So What_You_Wanna_Be?:)_la_la is the solution, and the solver gave us 3 possible solutions. Keep this in mind.

The final version of the script is below.

from z3 import *

def func1(a1, a2):
  s.add(a1 * 2 * (a2 ^ a1) - a2 == 10858)
  s.add(a1 > 85)
  s.add(a1 <= 95)
  s.add(a2 > 96 )
  s.add(a2 <= 111)

def func2(a1, a2):
  s.add(a1 % a2 == 7)
  s.add(a2 > 90)

def func3(a1, a2):
  s.add(a1 / a2 + (a2 ^ a1) == 21)
  s.add(a1 <= 99 )
  s.add(a2 <= 119)

def func4(a1, a2):
  v2 = a1 << 32 | a1
  s.add((v2 % a2) + a1 == 137)
  s.add(a1 > 115)
  s.add(a2 <= 99)
  s.add(a2 == 95)

def func5(a1, a2):
  s.add((a1 + a2) ^ a2 == 225)
  s.add(a1 > 90 )
  s.add(a2 <= 89 )

def func6(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add(a1 > 85)
  s.add(a2 > 110)
  s.add(a3 > 115)
  s.add(((a2 + a3) ^ (a1 + a2)) == 44)
  s.add(URem(a2 + a3, a1) + a2 == 161 ) # Use the function URem() for unsigned remainder, and SRem() for signed remainder. https://z3prover.github.io/api/html/classz3py_1_1_bit_vec_ref.html

def func7(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a1 <= 119)
  s.add(a2 > 90)
  s.add(a3 <= 89)
  s.add(((a1 + a3) ^ (a2 + a3)) == 122 )
  s.add(URem(a1 + a3,  a2) + a3 == 101)

def func8(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add( a3 <= 114)
  s.add( UDiv(a1 + a2, a3) * a2 == 97 ) # Use the function UDiv() for unsigned division. https://z3prover.github.io/api/html/classz3py_1_1_bit_vec_ref.html
  s.add( a3 ^ (a1 - a2) == -8*13)

def func9(a1, a2, a3):
  s.add(a1 == a2)
  s.add(a2 >= a3)
  s.add(a3 <= 99)
  s.add(a3 + a1 * (a3 - a2) - a1 == -1443)

def func10(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a2 * (a1 + a3 + 1) - a3 == 15514)
  s.add(a2 > 90)
  s.add(a2 <= 99)

def func11(a1, a2, a3):
  s.add(a2 >= a1)
  s.add(a1 >= a3)
  s.add(a2 > 100)
  s.add(a2 <= 104)
  s.add(a1 + (a2 ^ (a2 - a3)) - a3 == 70)
  s.add(UDiv(a2 + a3, a1) + a1 == 68 )

def func12(a1, a2, a3):
  s.add(a1 >= a2)
  s.add(a2 >= a3)
  s.add(a2 <= 59)
  s.add(a3 <= 44)
  s.add(a1 + (a2 ^ (a3 + a2)) - a3 == 111)
  s.add((a2 ^ (a2 - a3)) + a2 == 101 )

def func13(a1, a2, a3):
  s.add(a1 <= a2)
  s.add(a2 <= a3)
  s.add(a1 > 40 )
  s.add(a2 > 90 )
  s.add(a3 <= 109)
  s.add(a3 + (a2 ^ (a3 + a1)) - a1 == 269)
  s.add((a3 ^ (a2 - a1)) + a2 == 185)

def func14(a1, a2, a3):
  s.add(a1 >= a3)
  s.add(a2 >= a3)
  s.add(a2 <= 99)
  s.add(a3 > 90)
  s.add(a1 + (a2 ^ (a2 + a1)) - a3 == 185 )

def func15(a1, a2, a3):
  s.add(a2 >= a3)
  s.add(a2 >= a1)
  s.add(a3 > 95)
  s.add( a2 <= 109)
  s.add(((a2 - a1) * a2 ^ a3) - a1 == 1214)
  s.add(((a3 - a2) * a3 ^ a1) + a2 == -1034)

def is_valid(x):
  return Or(And(x >= 65, x <= 90), And(x >= 97, x <= 122), And(x >= 48, x <= 57), And(x == 95), And(x == 33), And(x == 64), And(x == 35), And(x == 36), And(x == 37), And(x == 94), And(x == 38), And(x == 42), And(x == 63), And(x == 58), And(x == 40), And(x == 41))

s = Solver()

x = [ BitVec('a%i' % i, 32) for i in range(26) ]

for i in range(26):
  s.add(is_valid(x[i]))

func1(x[0], x[1])
func2(x[1], x[2])
func3(x[2], x[3])
func4(x[3], x[4])
func5(x[4], x[5])
func6(x[5], x[6], x[7])
func7(x[7], x[8], x[9])
func8(x[9], x[10], x[11])
func9(x[11], x[12], x[13])
func10(x[13], x[14], x[15])
func11(x[15], x[16], x[17])
func12(x[17], x[18], x[19])
func13(x[19], x[20], x[21])
func14(x[21], x[22], x[23])
func15(x[23], x[24], x[25])

while s.check() == sat:
  solution = s.model()
  block = []
  for d in solution:
    c = d()
    block.append(c != solution[d])
    flag = ""
    for n in x:
      char = solution[n].as_long()
      flag += chr(char)
    print flag
  s.add(Or(block))

Our job was considerably easier than a real world situation because the binary was easy to reverse, since there was no other functionality in itself besides the flag validation functions. Anyway, Z3 is indeed a very good tool to keep around if you need to find what some binary is doing and you don’t feel like spending too much time in front of a debugger.

Symbolic Execution

Symbolic execution is a program analysis technique which provides the ability to automatically enumerate the feasible paths of a program. This technique has received more and more attention, as mentioned before, due to the increased performance in computers but also due to improvements to the constraint solvers on which the technique relies.

Instead of a program run with concrete input, symbolic execution runs the program with symbolic input. Each input will then be represented by a placeholder for an initially non contrained value. As the program runs, symbolic execution will keep track on how the variables depend on the symbolic input.

When a branch that depends on symbolic input, for example an if statement, is found a constraint solver is used to check which branches are feasible. So, this means that actually we can use, for example KLEE, to solve our linear equation and Zebra puzzle. This might not be a smart use of KLEE but will allow us to play a bit with it.

The symbolic execution process is well described by Daniel Liew in his recent published thesis on “Symbolic execution of verification languages and floating-point code”.

Instead of running the program on concrete input, where a particular input component might e.g. take the value 3, symbolic execution runs the program on symbolic input, where each input component is represented by a placeholder for an initially unconstrained value. As the program runs, symbolic execution keeps track of how program variables depend on the symbolic input. When a branch point (e.g. an if statement) dependent on symbolic input is encountered a constraint solver is used to check which branches are feasible. Each feasible path is then executed, making a copy (known as forking) of the current program state if necessary. On each feasible path (at least one must be feasible) the constraint that the branch imposed on the symbolic input is added to a per path set called the path condition (PC). The PC records all the symbolic branch conditions (branches that depend on symbolic input) that were traversed by the path. The PC has two purposes. First, it is used to check feasibility when encountering later branch points on that path. Second, the constraints in the PC can be solved and a satisfying assignment to the symbolic input can be found. This satisfying assignment can be used to replay execution of the program along the same path that (assuming that the program is deterministic) was symbolically executed and is effectively a test case. These automatically generated test cases can be useful to developers because they can be added to an existing test suite to increase code coverage or be used replay bugs.

In my simplistic words, Symbolic Execution, in short, is a program analysis technique that treats input data as symbolic variables (rather than concrete values). Or, quoting Richard @richinseattle Johnson, Symbolic execution lets us “execute” a series of instructions without using concrete values for variables. Instead of a numeric output, we get a formula for the output in terms of input variables that represents a potential range of values.

Or, as pointed by Axel Souchet.

  • Symbolic execution would be f(x) = x**2 + 10
  • Concrete execution would be f(3) = 19

So, symbolic execution creates constraints based on symbolic variables. If used in conjunction with a constraint solver (like Z3) to generate new inputs/test cases, we have Concolic Execution.

Symbolic Execution + Constraint Solving = Concolic Execution

Concolic execution is used to aid Fuzzing and software verification with the goal of maximizing code coverage.

If we look at a simple function, like the one below.

int foobar(int n)
{
  int i = n * 5;

  if(i == 25)
    bug();

  return 0;
}

By using “dumb” fuzzing what are our real changes of hitting the bug? Yes, 1 in 2^32 (4 billion attempts). Read this post for an interesting case of AFL dealing with something slightly complex than the one above.

I recommend you to look at another presentation from Richard @richinseattle Johnson to get good foundations on this subject. You can watch the video here.

Anyway, this post it’s not about fuzzing. So let’s move on and solve our linear equation again, this time with KLEE.

KLEE

KLEE can be a bit though to install, so just use the Klee’s Docker image.

$ docker pull klee/klee
$ docker run -ti --name=my_klee_container --ulimit='stack=-1:-1' klee/klee
klee@02ca010455db:~$

Linear equation with 3 variables

We’ll use the same linear equation system with 3 variables as before, that I grabbed from here.

x - 2y + 3z = 7
2x + y + z = 4
-3x + 2y - 2z = -10

To solve it with KLEE I simply created the following logic in a file called equation.c.

#include <assert.h>
#include "./klee_src/include/klee/klee.h"

int main(int argc, char* argv[]) 
{
  int x, y, z;

  klee_make_symbolic(&x, sizeof x, "x"); 
  klee_make_symbolic(&y, sizeof y, "y"); 
  klee_make_symbolic(&z, sizeof z, "z");

  if (x - 2 * y + 3 * z != 7) return 0;
  if (2 * x + y + z != 4) return 0;
  if (-3 * x + 2 * y - 2 * z != -10) return 0;

  klee_assert(0);
}

And ran it through KLEE as below.

$ clang -emit-llvm -g -c equation.c
$ klee equation.bc
KLEE: output directory is "/home/klee/klee-out-2"
KLEE: Using STP solver backend
KLEE: ERROR: /home/klee/equation.c:16: ASSERTION FAIL: 0
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 55
KLEE: done: completed paths = 4
KLEE: done: generated tests = 4
$ ls klee-last | grep err
test000003.assert.err
$ ktest-tool --write-ints klee-last/test000003.ktest
ktest file : 'klee-last/test000003.ktest'
args       : ['equation.bc']
num objects: 3
object    0: name: b'x'
object    0: size: 4
object    0: data: 2
object    1: name: b'y'
object    1: size: 4
object    1: data: -1
object    2: name: b'z'
object    2: size: 4
object    2: data: 1

This is indeed the solution to the equation system (x = 2, y = -1, z = 1).

This can be slightly optimized if we use klee_assume(), which tells KLEE to cut path if some constraint is not satisfied. So our equation could be written as below.

#include <assert.h>
#include "./klee_src/include/klee/klee.h"

int main(int argc, char* argv[]) 
{
  int x, y, z;

  klee_make_symbolic(&x, sizeof x, "x"); 
  klee_make_symbolic(&y, sizeof y, "y"); 
  klee_make_symbolic(&z, sizeof z, "z");

  klee_assume(x - 2 * y + 3 * z == 7);
  klee_assume(2 * x + y + z == 4);
  klee_assume(-3 * x + 2 * y - 2 * z == -10);

  klee_assert(0);
}
$ clang -emit-llvm -g -c equation2.c 
$ klee equation2.bc
KLEE: output directory is "/home/klee/klee-out-3"
KLEE: Using STP solver backend
KLEE: ERROR: /home/klee/equation2.c:16: ASSERTION FAIL: 0
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 49
KLEE: done: completed paths = 1
KLEE: done: generated tests = 1
$ ls klee-last | grep err
test000001.assert.err
$ ktest-tool --write-ints klee-last/test000001.ktest
ktest file : 'klee-last/test000001.ktest'
args       : ['equation2.bc']
num objects: 3
object    0: name: b'x'
object    0: size: 4
object    0: data: 2
object    1: name: b'y'
object    1: size: 4
object    1: data: -1
object    2: name: b'z'
object    2: size: 4
object    2: data: 1

I’m not going to solve the Zebra puzzle with KLEE, since we won’t really learn anything new compared to what we have seen until now. You can try it yourself, or if you want just look at Yurichev’s solution here (page 107 at the time of this writing).

CTF Challenge (get the flag)

Instead, we’ll see how to solve the RedVelvet CTF challenge again, this time with KLEE. In order to use KLEE you need access to the source code. In this case, since we don’t have access to the source code of the RedVelvet CTF challenge we’ll use IDA Pro Hex-Rays decompiler (we already saw how to do that) to get the code. Yes, we’ll need to fix the code, so we can compile it. Unfortunately RetDec doesn’t support 64bits so yes, we’ll have to stick with IDA.

Below are the main steps we need to complete in order to solve the challenge with KLEE.

  • Decompile the RedVelvet binary to source code with IDA Pro Hex-Rays decompiler
  • Include defs.h from the plugins folder of IDA Pro in our source file
  • Replace the user input with klee_make_symbolic(input, sizeof(input), "input");
  • Replace the flag printf with klee_assert(0);

The code we get from IDA Pro Hex-Rays decompiler won’t compile, I’ll skip the step of fixing the code and I’ll just give you the code ready to compile below. If you are curious, you can compare the changes yourself.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <openssl/sha.h>
#include "defs.h"

int  func1(char a1, char a2)
{
  if ( a1 * 2 * (char)(a2 ^ a1) - a2 != 10858 )
    exit(1);
  if ( a1 <= 85 || a1 > 95 || a2 <= 96 || a2 > 111 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func2(char a1, char a2)
{
  if ( a1 % a2 != 7 )
    exit(1);
  if ( a2 <= 90 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func3(char a1, char a2)
{
  if ( a1 / a2 + (char)(a2 ^ a1) != 21 || a1 > 99 || a2 > 119 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func4(char a1, char a2)
{
  signed __int64 v2; // rtt@1

  LODWORD(v2) = (char)(a2 ^ a1 ^ a2);
  HIDWORD(v2) = (unsigned __int64)(char)(a2 ^ a1 ^ a2) >> 32;
  if ( (unsigned int)(v2 % a2) + a1 != 137 || a1 <= 115 || a2 > 99 || a2 != 95 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func5(char a1, char a2)
{
  if ( ((a2 + a1) ^ (char)(a1 ^ a2 ^ a1)) != 225 || a1 <= 90 || a2 > 89 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func6(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 85 || a2 <= 110 || a3 <= 115 || ((a2 + a3) ^ (a1 + a2)) != 44 || (a2 + a3) % a1 + a2 != 161 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func7(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a1 > 119 || a2 <= 90 || a3 > 89 || ((a1 + a3) ^ (a2 + a3)) != 122 || (a1 + a3) % a2 + a3 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func8(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a3 > 114 || (a1 + a2) / a3 * a2 != 97 || (a3 ^ (a1 - a2)) * a2 != -10088 || a3 > 114 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func9(char a1, char a2, char a3)
{
  if ( a1 != a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a3 > 99 || a3 + a1 * (a3 - a2) - a1 != -1443 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func10(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 * (a1 + a3 + 1) - a3 != 15514 || a2 <= 90 || a2 > 99 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func11(char a1, char a2, char a3)
{
  if ( a2 < a1 )
    exit(1);
  if ( a1 < a3 )
    exit(1);
  if ( a2 <= 100 || a2 > 104 || a1 + (a2 ^ (a2 - a3)) - a3 != 70 || (a2 + a3) / a1 + a1 != 68 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func12(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 59 || a3 > 44 || a1 + (a2 ^ (a3 + a2)) - a3 != 111 || (a2 ^ (a2 - a3)) + a2 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func13(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 40 || a2 <= 90 || a3 > 109 || a3 + (a2 ^ (a3 + a1)) - a1 != 269 || (a3 ^ (a2 - a1)) + a2 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func14(char a1, char a2, char a3)
{
  if ( a1 < a3 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 99 || a3 <= 90 || a1 + (a2 ^ (a2 + a1)) - a3 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func15(char a1, char a2, char a3)
{
  if ( a2 < a3 )
    exit(1);
  if ( a2 < a1 )
    exit(1);
  if ( a3 <= 95 || a2 > 109 || ((a2 - a1) * a2 ^ a3) - a1 != 1214 || ((a3 - a2) * a3 ^ a1) + a2 != -1034 )
    exit(1);
  return puts("HAPPINESS:)");
}

int main(int argc, const char **argv, const char **envp)
{
  char buf[27];
  size_t v6; // rax@2
  signed int i; // [sp+8h] [bp-1B8h]@2
  SHA256_CTX v11; // [sp+50h] [bp-170h]@2
  unsigned char v38[32]; // [sp+E0h] [bp-E0h]@2

  char s1[80]; // [sp+100h] [bp-C0h]@3
  char s2[8]; // [sp+150h] [bp-70h]@1
  *(_QWORD *)s2 = 3905859155515433264LL;
  char v41[8];
  *(_QWORD *)v41 = 3990529441497888818LL;
  char v42[8];
  *(_QWORD *)v42 = 7017565014431380534LL;
  char v43[8];
  *(_QWORD *)v43 = 3977633058323522358LL;
  char v44[8];
  *(_QWORD *)v44 = 7364290724313116725LL;
  char v45[8];
   *(_QWORD *)v45 = 3991705742141175652LL;
  char v46[8];
   *(_QWORD *)v46 = 7363494672811320633LL;
  char v47[8];
   *(_QWORD *)v47 = 3847534474596595814LL;
  char v48[8];
  *(_QWORD *)v48 = 0LL;
  char v49[8];
  *(_QWORD *)v49 = 0LL;
  char v50[8];
  *(_QWORD *)v50 = 0LL;
  char v51[8];
  *(_QWORD *)v51 = 0LL;
  char v52[8];
  *(_QWORD *)v52 = 0;

  printf("Your flag : ");
  fgets(buf, 27, stdin);
  func1((unsigned int)buf[0], (unsigned int)buf[1]);
  func2((unsigned int)buf[1], (unsigned int)buf[2]);
  func3((unsigned int)buf[2], (unsigned int)buf[3]);
  func4((unsigned int)buf[3], (unsigned int)buf[4]);
  func5((unsigned int)buf[4], (unsigned int)buf[5]);
  func6((unsigned int)buf[5], (unsigned int)buf[6], (unsigned int)buf[7]);
  func7((unsigned int)buf[7], (unsigned int)buf[8], (unsigned int)buf[9]);
  func8((unsigned int)buf[9], (unsigned int)buf[10], (unsigned int)buf[11]);
  func9((unsigned int)buf[11], (unsigned int)buf[12], (unsigned int)buf[13]);
  func10((unsigned int)buf[13], (unsigned int)buf[14], (unsigned int)buf[15]);
  func11((unsigned int)buf[15], (unsigned int)buf[16], (unsigned int)buf[17]);
  func12((unsigned int)buf[17], (unsigned int)buf[18], (unsigned int)buf[19]);
  func13((unsigned int)buf[19], (unsigned int)buf[20], (unsigned int)buf[21]);
  func14((unsigned int)buf[21], (unsigned int)buf[22], (unsigned int)buf[23]);
  func15((unsigned int)buf[23], (unsigned int)buf[24], (unsigned int)buf[25]);
  SHA256_Init(&v11);
  v6 = strlen(buf);
  SHA256_Update(&v11, buf, v6);
  SHA256_Final(v38, &v11);
  for ( i = 0; i <= 31; ++i )
    sprintf(&s1[2 * i], "%02x", (unsigned __int8)v38[i]);
  if ( strcmp(s1, s2) )
    exit(1);
  printf("flag : {\" %s \"}\n", buf);
  return 0;
}

After our fixes, we compile it, and we run it to make sure it mimics exactly the original binary.

$ gcc -o red_velvet red_velvet.c -lssl -lcrypto
$ ./red_velvet
Your flag : lalala

Well, it runs so it must be ok. Let’s proceed with our changes. Connect to the KLEE docker container.

$ docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                    PORTS               NAMES
98643f6c5c8a        klee/klee           "/bin/bash"         24 hours ago        Exited (0) 22 hours ago                       my_first_klee_container
$ docker start -ai 98643f6c5c8a
$ 

Now, we need to change, again, our red_velvet.c with the required KLEE modifications. Below is a unified diff showing the changes I made.

$ diff -uN re-built/red_velvet.c red_velvet.klee.c
--- re-built/red_velvet.c       2018-05-19 13:36:49.593046595 +0100
+++ red_velvet.klee.c   2018-05-19 13:38:51.945568668 +0100
@@ -3,6 +3,8 @@
 #include <string.h>
 #include <openssl/sha.h>
 #include "defs.h"
+#include <assert.h>
+#include "./klee_src/include/klee/klee.h"

 int  func1(char a1, char a2)
 {
@@ -193,8 +195,12 @@
   char v52[8];
   *(_QWORD *)v52 = 0;

+#ifdef KLEE
+  klee_make_symbolic(buf, sizeof(buf), "buf");
+#else
   printf("Your flag : ");
   fgets(buf, 27, stdin);
+#endif
   func1((unsigned int)buf[0], (unsigned int)buf[1]);
   func2((unsigned int)buf[1], (unsigned int)buf[2]);
   func3((unsigned int)buf[2], (unsigned int)buf[3]);
@@ -218,6 +224,10 @@
     sprintf(&s1[2 * i], "%02x", (unsigned __int8)v38[i]);
   if ( strcmp(s1, s2) )
     exit(1);
+#ifdef KLEE
+  klee_assert(0);
+#else
   printf("flag : {\" %s \"}\n", buf);
+#endif
   return 0;
 }

I added my changes inside a #ifdef KLEE to make things clear. As we can see, the changes are pretty straight forward. Note that in order to build our example we need to install libssl-dev first. The password for the KLEE docker container is klee.

$ sudo apt-get install libssl-dev

Once we have it installed, we can proceed. Don’t forget to copy the defs.h file from IDA Pro.

$ clang -emit-llvm -g -c red_velvet.c -I/usr/include -lssl -lcrypto -DKLEE
clang: warning: -lssl: 'linker' input unused
clang: warning: -lcrypto: 'linker' input unused

This is a problem. It’s only possible to compile when emitting LLVM IR, not link. This means our openssl functions “won’t work” and we’ll miss the checksum validation. Since the code uses some standard C/C++ functions, we need to add --libc=uclibc switch, so KLEE will use its own implementation.

$ clang -emit-llvm -g -c red_velvet.c -I/usr/include
$ klee --libc=uclibc --posix-runtime red_velvet.bc
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-0"
KLEE: Using STP solver backend
KLEE: WARNING: undefined reference to function: SHA256_Final
KLEE: WARNING: undefined reference to function: SHA256_Init
KLEE: WARNING: undefined reference to function: SHA256_Update
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 62031344) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:22: divide by zero
KLEE: NOTE: now ignoring this error at this location
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:31: divide by zero
KLEE: NOTE: now ignoring this error at this location
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:42: divide by zero
KLEE: NOTE: now ignoring this error at this location
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
KLEE: WARNING ONCE: calling external: SHA256_Init(66061536) at /home/klee/red_velvet.c:224
KLEE: ERROR: /home/klee/red_velvet.c:223: failed external call: SHA256_Init
KLEE: NOTE: now ignoring this error at this location

KLEE: done: total instructions = 13518
KLEE: done: completed paths = 67
KLEE: done: generated tests = 67
$ ls klee-last | grep err
test000003.div.err
test000006.div.err
test000008.div.err
test000067.external.err
$ ktest-tool --write-ints klee-last/test000067.ktest 
ktest file : 'klee-last/test000067.ktest'
args       : ['red_velvet.bc']
num objects: 2
object    0: name: b'model_version'
object    0: size: 4
object    0: data: 1
object    1: name: b'buf'
object    1: size: 27
object    1: data: b'What_You_Wanna_Be?:)_lb_la\x00'

We know the solution above is valid. However, as you can see above there are 3 warnings.

KLEE: WARNING: undefined reference to function: SHA256_Final
KLEE: WARNING: undefined reference to function: SHA256_Init
KLEE: WARNING: undefined reference to function: SHA256_Update

And an error.

KLEE: ERROR: /home/klee/red_velvet.c:223: failed external call: SHA256_Init

And they shouldn’t be ignored. If one of these functions get’s called the program will terminate, as it happened. KLEE has several limitations, including floating point numbers, assembly code, threads, memory objects of variable size, sockets, and any external function should also be compiled and linked. The openssl external function calls limitation affect us.

What can we do? One option is to move those 3 functions from openssl into our red_velvet.c. I actually did it. However, if you look at the code you’ll see inlined assembly. Which means, it won’t work either (but at least you can see KLEE complaining about it as a reward). Another option is to build openssl into LLVM IR and then launch KLEE with the option -link-llvm-lib=<library file>. I’m aware of some other possible options. First two, use Gold linker with the LLVM plugin or Whole Program LLVM plus llvm-link.

I didn’t try any of those options though. Why? Well, if we think about it the way symbolic execution engines work, and the constraint solvers they use can’t accurately generate and solve the constraints that describe the complete process of a checksum algorithm. However, a constraint solver can produce inputs that will pass a difficult check such as a password, a magic number, or even a checksum.

In our example, the checksum is at the end of our code, so by that time KLEE should have already found a “solution” that should pass the checksum. Right? Well, no. We actually already know that this challenge has more than one possible solution. That is, more than one input that satisfies all the constraints (except the checksum itself). So these inputs won’t pass the program’s first validation (checksum) unless we are really lucky.

Anyway, didn’t I say that as soon as one of those 3 openssl functions are called the program will terminate? Yes, but you can use a KLEE option to make it call the external function despite the warnings we saw before. The option is -load. See below.

$ clang -emit-llvm -g -c red_velvet.c -DKLEE
$ klee --optimize -load=/usr/lib/x86_64-linux-gnu/libssl.so -load=/usr/lib/x86_64-linux-gnu/libcrypto.so --libc=uclibc --posix-runtime -allow-external-sym-calls -emit-all-errors red_velvet.bc
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-16"
KLEE: Using STP solver backend
KLEE: WARNING: undefined reference to function: SHA256_Final
KLEE: WARNING: undefined reference to function: SHA256_Init
KLEE: WARNING: undefined reference to function: SHA256_Update
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 43118368) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:21: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:30: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:41: divide by zero
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
KLEE: WARNING ONCE: calling external: SHA256_Init(59051440) at /home/klee/klee_build/klee-uclibc/libc/string/strlen.c:22
KLEE: ERROR: /home/klee/klee_build/klee-uclibc/libc/string/strlen.c:22: memory error: out of bound pointer
KLEE: WARNING ONCE: calling external: SHA256_Update(59051440, 58599440, 26) at /home/klee/red_velvet.c:224
KLEE: WARNING ONCE: calling external: SHA256_Final(59051152, 59051440) at /home/klee/red_velvet.c:225

KLEE: done: total instructions = 115768
KLEE: done: completed paths = 57
KLEE: done: generated tests = 57
$ ls klee-last | grep err
test000003.div.err
test000006.div.err
test000008.div.err
test000056.ptr.err
$ ktest-tool --write-ints klee-last/test000056.ktest 
ktest file : 'klee-last/test000056.ktest'
args       : ['red_velvet.bc']
num objects: 2
object    0: name: b'model_version'
object    0: size: 4
object    0: data: 1
object    1: name: b'buf'
object    1: size: 27
object    1: data: b'What_You_Wanna_Be?:)_l`_la\x01'

As we can see above the external functions were called, still the checksum didn’t succeed. However, it wasn’t only because the valid solution found wasn’t the expected one.

If I want I can actually modify the code by changing the function func14 as shown below to force the constraint solver to find the solution we want, that is, the solution that will be valid according to the checksum.

int  func14(char a1, char a2, char a3)
{
  /* check to make it accept only one solution */
  if(a2 != 97)
    exit(1);

Anyway, if you run it you will see that despite the solution being valid the KLEE assert is not reached. Why? Well the code for validating the checksum really sucks, if you print the strings being compared you’ll see that actually when the strcmp happens one of the strings is messed up. I fixed that too, see here. However, for some reasons even though after KLEE finds the desired solution the checksum is still wrong. See below.

$ clang -emit-llvm -g -c red_velvet.checksum.fixed.c -DKLEE
$ klee --optimize -load=/usr/lib/x86_64-linux-gnu/libssl.so -load=/usr/lib/x86_64-linux-gnu/libcrypto.so --libc=uclibc --posix-runtime -allow-external-sym-calls -emit-all-errors red_velvet.checksum.fixed.bc
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-6"
KLEE: Using STP solver backend
KLEE: WARNING: undefined reference to function: SHA256_Final
KLEE: WARNING: undefined reference to function: SHA256_Init
KLEE: WARNING: undefined reference to function: SHA256_Update
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 71898288) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.checksum.fixed.c:21: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.checksum.fixed.c:30: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.checksum.fixed.c:41: divide by zero
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
KLEE: WARNING ONCE: calling external: SHA256_Init(54017408) at /home/klee/red_velvet.checksum.fixed.c:202
KLEE: WARNING ONCE: calling external: SHA256_Update(54017408, 68083424, 26) at /home/klee/red_velvet.checksum.fixed.c:203
KLEE: WARNING ONCE: calling external: SHA256_Final(73422336, 54017408) at /home/klee/red_velvet.checksum.fixed.c:204
KLEE: WARNING ONCE: calling external: printf(64453968, 54018336) at /home/klee/red_velvet.checksum.fixed.c:207
s1= 659d36ca563ba4622daabb36a71dafaf6060cdcbf89bb12e75426198496d272c
s2= 0a435f46288bb5a764d13fca6c901d3750cee73fd7689ce79ef6dc0ff8f380e5
buf=
KLEE: ERROR: /home/klee/red_velvet.checksum.fixed.c:213: ASSERTION FAIL: 0

KLEE: done: total instructions = 115625
KLEE: done: completed paths = 57
KLEE: done: generated tests = 57
$ ls klee-last/*err
klee-last/test000003.div.err  klee-last/test000006.div.err  klee-last/test000008.div.err  klee-last/test000057.assert.err
$ ktest-tool --write-ints --trim-zeros klee-last/test000057.ktest
ktest file : 'klee-last/test000057.ktest'
args       : ['red_velvet.checksum.fixed.bc']
num objects: 2
object    0: name: b'model_version'
object    0: size: 4
object    0: data: 1
object    1: name: b'buf'
object    1: size: 27
object    1: data: b'What_You_Wanna_Be?:)_la_la\x00'

Why it is wrong I don’t know. Above I print both checksums and I used klee_assume( strcmp(s1, s2) != 0); to hit the assert and create our solution file. While the checksum for What_You_Wanna_Be?:)_la_la is correct, the other that comes from KLEE it’s not. I checked if it was actually the checksum of What_You_Wanna_Be?:)_la_la\x00 but it is not. Anyway, I spent too much time on this checksum already. If you know why this happens please let me know.

So, what to do? Easy. Delete all the checksum routines. This code is actually a very big hack and really buggy. So let’s remove the checksum routines.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "defs.h"
#include <assert.h>
#include "./klee_src/include/klee/klee.h"

int  func1(char a1, char a2)
{
  if ( a1 * 2 * (char)(a2 ^ a1) - a2 != 10858 )
    exit(1);
  if ( a1 <= 85 || a1 > 95 || a2 <= 96 || a2 > 111 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func2(char a1, char a2)
{
  if ( a1 % a2 != 7 )
    exit(1);
  if ( a2 <= 90 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func3(char a1, char a2)
{
  if ( a1 / a2 + (char)(a2 ^ a1) != 21 || a1 > 99 || a2 > 119 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func4(char a1, char a2)
{
  signed __int64 v2; // rtt@1

  LODWORD(v2) = (char)(a2 ^ a1 ^ a2);
  HIDWORD(v2) = (unsigned __int64)(char)(a2 ^ a1 ^ a2) >> 32;
  if ( (unsigned int)(v2 % a2) + a1 != 137 || a1 <= 115 || a2 > 99 || a2 != 95 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func5(char a1, char a2)
{
  if ( ((a2 + a1) ^ (char)(a1 ^ a2 ^ a1)) != 225 || a1 <= 90 || a2 > 89 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func6(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 85 || a2 <= 110 || a3 <= 115 || ((a2 + a3) ^ (a1 + a2)) != 44 || (a2 + a3) % a1 + a2 != 161 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func7(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a1 > 119 || a2 <= 90 || a3 > 89 || ((a1 + a3) ^ (a2 + a3)) != 122 || (a1 + a3) % a2 + a3 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func8(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a3 > 114 || (a1 + a2) / a3 * a2 != 97 || (a3 ^ (a1 - a2)) * a2 != -10088 || a3 > 114 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func9(char a1, char a2, char a3)
{
  if ( a1 != a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a3 > 99 || a3 + a1 * (a3 - a2) - a1 != -1443 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func10(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 * (a1 + a3 + 1) - a3 != 15514 || a2 <= 90 || a2 > 99 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func11(char a1, char a2, char a3)
{
  if ( a2 < a1 )
    exit(1);
  if ( a1 < a3 )
    exit(1);
  if ( a2 <= 100 || a2 > 104 || a1 + (a2 ^ (a2 - a3)) - a3 != 70 || (a2 + a3) / a1 + a1 != 68 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func12(char a1, char a2, char a3)
{
  if ( a1 < a2 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 59 || a3 > 44 || a1 + (a2 ^ (a3 + a2)) - a3 != 111 || (a2 ^ (a2 - a3)) + a2 != 101 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func13(char a1, char a2, char a3)
{
  if ( a1 > a2 )
    exit(1);
  if ( a2 > a3 )
    exit(1);
  if ( a1 <= 40 || a2 <= 90 || a3 > 109 || a3 + (a2 ^ (a3 + a1)) - a1 != 269 || (a3 ^ (a2 - a1)) + a2 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func14(char a1, char a2, char a3)
{
  if ( a1 < a3 )
    exit(1);
  if ( a2 < a3 )
    exit(1);
  if ( a2 > 99 || a3 <= 90 || a1 + (a2 ^ (a2 + a1)) - a3 != 185 )
    exit(1);
  return puts("HAPPINESS:)");
}

int  func15(char a1, char a2, char a3)
{
  if ( a2 < a3 )
    exit(1);
  if ( a2 < a1 )
    exit(1);
  if ( a3 <= 95 || a2 > 109 || ((a2 - a1) * a2 ^ a3) - a1 != 1214 || ((a3 - a2) * a3 ^ a1) + a2 != -1034 )
    exit(1);
  return puts("HAPPINESS:)");
}

int main(int argc, const char **argv, const char **envp)
{
  char buf[27];

#ifdef KLEE
  klee_make_symbolic(buf, sizeof(buf), "buf");
#else
  printf("Your flag : ");
  fgets(buf, 27, stdin);
#endif
  func1((unsigned int)buf[0], (unsigned int)buf[1]);
  func2((unsigned int)buf[1], (unsigned int)buf[2]);
  func3((unsigned int)buf[2], (unsigned int)buf[3]);
  func4((unsigned int)buf[3], (unsigned int)buf[4]);
  func5((unsigned int)buf[4], (unsigned int)buf[5]);
  func6((unsigned int)buf[5], (unsigned int)buf[6], (unsigned int)buf[7]);
  func7((unsigned int)buf[7], (unsigned int)buf[8], (unsigned int)buf[9]);
  func8((unsigned int)buf[9], (unsigned int)buf[10], (unsigned int)buf[11]);
  func9((unsigned int)buf[11], (unsigned int)buf[12], (unsigned int)buf[13]);
  func10((unsigned int)buf[13], (unsigned int)buf[14], (unsigned int)buf[15]);
  func11((unsigned int)buf[15], (unsigned int)buf[16], (unsigned int)buf[17]);
  func12((unsigned int)buf[17], (unsigned int)buf[18], (unsigned int)buf[19]);
  func13((unsigned int)buf[19], (unsigned int)buf[20], (unsigned int)buf[21]);
  func14((unsigned int)buf[21], (unsigned int)buf[22], (unsigned int)buf[23]);
  func15((unsigned int)buf[23], (unsigned int)buf[24], (unsigned int)buf[25]);

#ifdef KLEE
  klee_assert(0);
#else
  printf("flag : {\" %s \"}\n", buf);
#endif
  return 0;
}

If we run it with KLEE now…

$ clang -emit-llvm -g -c red_velvet.c -DKLEE
klee@98643f6c5c8a:~$ klee --optimize --libc=uclibc --posix-runtime -allow-external-sym-calls -emit-all-errors red_velvet.bc
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-37"
KLEE: Using STP solver backend
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 45360912) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:20: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:29: divide by zero
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:40: divide by zero
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
KLEE: ERROR: /home/klee/red_velvet.c:190: ASSERTION FAIL: 0

KLEE: done: total instructions = 5811
KLEE: done: completed paths = 56
KLEE: done: generated tests = 56
$ ls klee-last | grep err
test000003.div.err
test000006.div.err
test000008.div.err
test000056.assert.err
$ ktest-tool --write-ints klee-last/test000056.ktest
ktest file : 'klee-last/test000056.ktest'
args       : ['red_velvet.bc']
num objects: 2
object    0: name: b'model_version'
object    0: size: 4
object    0: data: 1
object    1: name: b'buf'
object    1: size: 27
object    1: data: b'What_You_Wanna_Be?:)_l`_la\x00'

Our KLEE assert is finally reached. If you list the files inside klee-last, you’ll see multiple ktest files which are basically the test cases generated by KLEE.

If we didn’t want to go over the hassle of removing the checksum code, we could also have simply made the modification below.

$ diff -uN red_velvet.0.c red_velvet.1.c
--- red_velvet.0.c      2018-05-17 18:50:17.989838225 +0000
+++ red_velvet.1.c      2018-05-17 18:51:06.711161145 +0000
@@ -222,8 +222,7 @@
   SHA256_Final(v38, &v11);
   for ( i = 0; i <= 31; ++i )
     sprintf(&s1[2 * i], "%02x", (unsigned __int8)v38[i]);
-  if ( strcmp(s1, s2) )
-    exit(1);
+  klee_assume( strcmp(s1, s2) == 0);
 #ifdef KLEE
   klee_assert(0);
 #else

This was a good example to face some of the limitations of KLEE.

Find the serial number

The basic serial number example will allow us to see a couple of more things while using KLEE. First, instead of using klee_make_symbolic() to make the user input symbolic we can just leave it and make the command-line argument symbolic with the option --sym-arg, in this case it will look for solutions that are under 11 chars.

For reference, in “real” programs the number of code paths can be extremely big if not infinite. This means KLEE will not terminate and it will run until we press Ctrl+C (SIGINT). We can use -max-time=<num seconds>, -max-forks=N (stop forking after N symbolic branches) and -max-memory=N (limit the memory consumption to N bytes) to have some control over KLEE.

The only change we did to the original source code sn.c file was the one below.

$ diff -uN sn.orig.c sn.c 
--- sn.orig.c 2018-05-18 19:01:26.578303833 +0000
+++ sn.c  2018-05-18 19:00:27.068119927 +0000
@@ -82,10 +82,13 @@
     usage(argv[0]);
 
   if (validate_sn(serial))
+#ifdef KLEE
+    klee_assert(0);
+#else
     printf("Thank You. Your license is now active.\n");
   else
     printf("Invalid Serial Number. Please try again.\n");
-
+#endif
   return 0;
 }

We ran it as below.

$ clang -emit-llvm -g -c sn.c -DKLEE
$ time klee -emit-all-errors --libc=uclibc --posix-runtime sn.bc --sym-arg 11
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-23"
KLEE: Using STP solver backend
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 48856800) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: calling __user_main with extra arguments.
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: ERROR: /home/klee/sn.c:86: ASSERTION FAIL: 0

KLEE: done: total instructions = 6158
KLEE: done: completed paths = 25
KLEE: done: generated tests = 25

real  0m0.732s
user  0m0.536s
sys 0m0.191s
$ ls klee-last/*err  
klee-last/test000024.assert.err
$ ktest-tool --write-ints --trim-zeros klee-last/test000024.ktest 
ktest file : 'klee-last/test000024.ktest'
args       : ['sn.bc', '--sym-arg', '11']
num objects: 2
object    0: name: b'arg0'
object    0: size: 12
object    0: data: b'10A00-50A00\x00'
object    1: name: b'model_version'
object    1: size: 4
object    1: data: 1

We now have 25 execution paths, we also have 25 test cases. We can replay these test cases, but we won’t look into that today. Refer to KLEE’s documentation if you want to know more.

We know there are more possible “solutions” than the one above. KLEE is deterministic, which means that if you run it again you’ll get the same result. There are a couple of “hacks” we can try to get different results, but I won’t follow that path. You can try the KLEE-DEV mailing list for options. One easy thing you can do is to use a different solver than the default one, STP. For example, we can use Z3. Actually, it’s not uncommon to use a multiple solvers to increase performance.

$ time klee -emit-all-errors --libc=uclibc --posix-runtime -solver-backend=z3 sn.bc --sym-arg 11
KLEE: NOTE: Using klee-uclibc : /home/klee/klee_build/klee/Release+Debug+Asserts/lib/klee-uclibc.bca
KLEE: NOTE: Using model: /home/klee/klee_build/klee/Release+Debug+Asserts/lib/libkleeRuntimePOSIX.bca
KLEE: output directory is "/home/klee/klee-out-24"
KLEE: Using Z3 solver backend
KLEE: WARNING ONCE: calling external: syscall(16, 0, 21505, 55623456) at /home/klee/klee_src/runtime/POSIX/fd.c:980
KLEE: WARNING ONCE: calling __user_main with extra arguments.
KLEE: WARNING ONCE: Alignment of memory from call "malloc" is not modelled. Using alignment of 8.
KLEE: ERROR: /home/klee/sn.c:86: ASSERTION FAIL: 0

KLEE: done: total instructions = 6158
KLEE: done: completed paths = 25
KLEE: done: generated tests = 25

real  0m0.612s
user  0m0.583s
sys 0m0.028s
$ ls klee-last/*err
klee-last/test000024.assert.err
$ ktest-tool --write-ints klee-last/test000024.ktest 
ktest file : 'klee-last/test000024.ktest'
args       : ['sn.bc', '--sym-arg', '11']
num objects: 2
object    0: name: b'arg0'
object    0: size: 12
object    0: data: b'12K22-52K222'
object    1: name: b'model_version'
object    1: size: 4
object    1: data: 1

Which generated a different valid SN number, in this case actually a bit faster than STP. You can find more about what solvers and solver options you can play with here.

angr

What is angr? angr is a framework for analyzing binaries, with the capability to perform dynamic symbolic execution (like KLEE, etc) and various static analyses on binaries. It is very popular among the CTF community, since Shellphish has used angr in many CTFs successfully. If you are already familiar with angr, probably you want to skip this part.

I find angr very interesting as it focuses on both static and dynamic symbolic (“concolic”) analysis, making it applicable to a variety of tasks. It is complex though, as any other tool in this area anyway. Their documentation is very good and I definitely recommend you to go through it if you want to know more than what we’ll see here.

We’ll look again at the RedVelvet binary from the CODEGATE CTF. We already reversed it, and we used IDA Pro Hex-Rays decompiler to look at its pseudo code. If we don’t read the pseudo code carefully, we might think at first that if we pass all the 15 validation functions we get our flag. And that was my first thought. I could try to print HAPPINESS:) 15 times…

We’ll use angr docker image. If you aren’t familiar with docker, below I get the angr image, I start a container, I copy our RedVelvet binary to the container, and I connect back to the container.

$ docker pull angr/angr
$ docker run -it angr/angr
(angr) angr@66fb14721ee2:~$ exit
logout
$ docker ps -a
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS                      PORTS               NAMES
66fb14721ee2        angr/angr           "/bin/sh -c 'su - ..."   35 minutes ago      Exited (0) 17 minutes ago                       fervent_northcutt
$ docker cp ~/CTFs/2018/CODEGATE/RedVelvet fervent_northcutt:/home/angr/
$ docker start -ai 66fb14721ee2
(angr) angr@66fb14721ee2:~$ ls
RedVelvet  angr-dev

So I looked at angr documentation and found something that might work and it’s almost a copy and paste. “The most important control interface in angr is the SimulationManager, which allows you to control symbolic execution over groups of states simultaneously, applying search strategies to explore a program’s state space.” And little down on the same page we have a crackme example.

First, we load the binary.

>>> proj = angr.Project('examples/CSCI-4968-MBE/challenges/crackme0x00a/crackme0x00a')

Next, we create a SimulationManager.

>>> simgr = proj.factory.simgr()

Now, we symbolically execute until we find a state that matches our condition (i.e., the win condition).

>>> simgr.explore(find=lambda s: "Congrats" in s.posix.dumps(1))
<SimulationManager with 1 active, 1 found>

Now, we can get the flag out of that state!

>>> s = simgr.found[0]
>>> print s.posix.dumps(1)
Enter password: Congrats!

>>> flag = s.posix.dumps(0)
>>> print(flag)
g00dJ0B!

Pretty simple, isn’t it? Well, it looks like.

I create the script below, which is basically a copy of the documentation.

import angr

proj = angr.Project("./RedVelvet")
simgr = proj.factory.simgr()
simgr.explore(find=lambda s: "HAPPINESS:)\n" * 15 in s.posix.dumps(1))
s = simgr.found[0]
print s.posix.dumps(1)
flag = s.posix.dumps(0)[:26]
print flag

Let’s try it.

$ time python red.angr.1.py
WARNING | 2018-05-19 09:34:37,083 | angr.procedures.definitions | unsupported syscall: ptrace
WARNING | 2018-05-19 09:34:43,210 | angr.engines.successors | Exit state has over 256 possible solutions. Likely unconstrained; skipping. <BV64 reg_28_4_64>

Hmm, after 10 minutes it was still running… so I hit Ctrl+C. We can see ptrace is unsupported and most likely it’s only creating overhead and increasing complexity. Looking at the code I don’t even know why it is used, probably just to annoy us. So let’s patch the binary to remove that call. Open the binary in IDA Pro, I assume you have Keypatch plugin installed. If you don’t, just follow the instruction on GitHub. Locate the ptrace call and fill it with nops.

Right click over the ptrace call.

Nothing special. You just need to adjust the JNZ instruction to follow the desired path. Simply change it from JNZ to JMP and that’s it. In the end it should look like this.

Finally, save your modified binary.

And, it’s always a good idea to check if it still works.

$ ./RedVelvet.noptrace
Your flag : What the hack
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)

It doesn’t crash. Should be good. Let’s try with this binary instead.

$ docker cp ~/CTFs/2018/CODEGATE/RedVelvet.noptrace fervent_northcutt:/home/angr/
$ docker start -ai 66fb14721ee2
(angr) angr@66fb14721ee2:~$ ls
RedVelvet  RedVelvet.noptrace  angr-dev  red.angr.1.py

I changed the script to load this binary and I run it again.

$ time python red.angr.1.py
Your flag : HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)
HAPPINESS:)

What_You_Wanna_Be?:)_lc_la

real    3m38.992s
user    3m37.447s
sys     0m1.503s

Nice, it worked. In terms of execution it was slower than Z3 or KLEE, but definitely a lot faster overall. I mean, writing this script was trivial. Took me more time looking at the documentation than typing the script.

While looking at angr’s documentation I also found the following while looking for symbolic execution. “At heart, angr is a symbolic execution engine. angr exposes a standard way to write and perform dynamic symbolic execution: the Surveyor class. A Surveyor is the engine that drives symbolic execution: it tracks what paths are active, identifies which paths to step forward and which paths to prune, and optimizes resource allocation.”

And, “angr.surveyors.Explorer is a Surveyor subclass that implements symbolic exploration. It can be told where to start, where to go, what to avoid, and what paths to stick to. It also tries to avoid getting stuck in loops.”

And, in the explorer section, we can read the following.

# This creates an Explorer that tries to find 0x4006ed (successful auth),
# while avoiding 0x4006fd (failed auth) or 0x4006aa (the authentication
# routine). In essense, we are looking for a backdoor.
>>> e = proj.surveyors.Explorer(find=(0x4006ed,), avoid=(0x4006aa,0x4006fd))

Looks interesting, let’s try to play with it. The RedVelvet binary is not loaded into random locations within virtual memory each time it is executed. See, No PIE. Even if it was, it wouldn’t be a problem because you could just use offsets plus the base address the angr always uses.

$ ./checksec -f ~/CTFs/2018/CODEGATE/RedVelvet
RELRO           STACK CANARY      NX            PIE             RPATH      RUNPATH      FORTIFY Fortified Fortifiable  FILE
Partial RELRO   Canary found      NX enabled    No PIE          No RPATH   No RUNPATH   Yes     0               6       /home/rui/CTFs/2018/CODEGATE/RedVelvet

So it seems we can give this a try. We’ll look at the binary in IDA Pro again. According to angr documentation we need to give it the address we are trying to find. We’ll be looking at the already patched RedVelvet binary to avoid the ptrace call.

So if we reach the address above (0x401537), after all the 15 validation function calls, we should be fine. We also need the address we want to avoid. That should be address of the exit calls inside each validation function.

If we enter func1().

And we follow the exit call.

We see that the address we want to avoid is 0x4007d0. So here’s the script.

import angr

proj = angr.Project('RedVelvet.noptrace')

e = proj.surveyors.Explorer(find=0x401537, avoid=0x4007D0)
e.run()
flag = e.found[0].state.posix.dumps(0)
print flag[:26]

Let’s try it.

$ time python red.angr.2.py
What_You_Wanna_Be?:)_lc_la

real    3m40.224s
user    3m38.746s
sys     0m1.419s

It worked, again. More or less the same time as before. We know this is not the correct solution though. Sometimes there are multiple solutions when dealing with constraint solvers as we saw before. It’s common they will find solutions that we aren’t even aware. In such cases we can modify the constraints to exclude the unwanted solution.

Final notes

Symbolic execution is slow and costly. However, it is undoubtedly powerful. Symbolic execution uses a constraint solver to create inputs/test cases which will exercise a given path in the binary. The main problem with symbolic execution, even though we didn’t observe it in this post, is that an approach based only on symbolic execution will succumb to path explosion. As we can easily guess, the number of paths in a real world big binary exponentially increases with each branch.

If you are playing CTFs, getting familiar with angr will definitely help you solve the type of challenges we saw faster. If you are looking at applying symbolic execution to step up your fuzzing/software analysis, I would say have a look at KLEE. Both have advantages and disadvantages depending on the problem you are facing. Both have an eventually long learning curve. KLEE can be a pain to install and it’s old LLVM version dependency holds it back a bit. If you are playing CTFs, when using KLEE dealing with symbols and external library calls can be a pain as we saw. Talking about limitations, there’s a nice gist here from moyix with some KLEE limitations too, plus some workarounds. Worth reading.

To increase code coverage and bug count “everyone” is using a combination of tools and techniques in this area, so it’s worth knowing the basics. There are many (big) projects out there combining Z3, KLEE, angr, AFL, etc, with impressive results. See the references for more information.

I don’t play CTFs very often, my interest in symbolic execution lies in how to apply it to code coverage and fuzzing. At least understand it so, I can make a better use of it. This post documents the first steps if you want to get into this area. All the code snippets used are available at my GitHub. Next steps will be looking at Triton and Manticore plus all the impressive set of tools that Trail of Bits has in this area. I already did a couple of things with these, but this post is way too long already…

Credits

Thanks to Luis @countuponsec Rocha for proofreading this post.

References (in no particular order)

Video