home ::
syllabus ::
timetable ::
groups ::
moodle ::
chat ::
© 2020
- Also known as functional testing
- Can't look inside the code
- e.g. Reflect on input space to find interesting regions
- e..g Look for clusters in the input or output space
- e..g if age runs 0...120
- try less-than-min, in-range, more-than-max
- e.g. 1, 60, 150
- e.g. all pairs testing
- Let the inputs be a vector showing choices per input
- Find inouts that never use the same pair of values, twice.
- e.g. three binary inputs, one "days of week"
and something with a range of 10 inputs
- e.g. (happy:boolean rich:boolean healthy:boolean restDay:dayOfWeek exercise:1..10))
- ranges: (2 2 2 7 10)
- so fi wever you use happy=true and restDay=sunday then you can never test that pair again
- e.g. (t * * sunday *) can only appear once
- when processed by an
all-pairs
generator
(ipo '(2 2 2 7 10))(see below) - BTW, all pairs is an amazing heuristic for exploring a large space
- Let the inputs be a vector showing choices per input
- e..g if age runs 0...120
((2 2 1 1 1) ; e.g. (true true true and first value of rest)
(2 1 2 2 2) (1 2 2 3 3) (1 1 1 4 4)
(2 2 2 7 5) (2 2 2 6 6) (2 2 2 5 7)
(2 2 2 4 8) (1 1 2 1 9) (1 1 1 7 10)
(1 1 1 6 5) (1 1 1 5 6) (2 1 1 3 3)
(1 2 1 2 2) (2 2 1 7 9) (1 1 1 7 8)
(1 1 1 7 7) (0 0 0 7 6) ; note "0" means "don't care"
(2 2 2 7 4) (0 0 0 7 3) (0 0 0 7 2)
(1 1 2 7 1) (2 2 2 6 10) (0 0 0 6 9)
(0 0 0 6 8) (0 0 0 6 7) (0 0 0 6 4)
(0 0 0 6 3) (0 0 0 6 2) (0 0 0 6 1)
(0 0 0 5 10) (0 0 0 5 9) (0 0 0 5 8)
(0 0 0 5 5) (0 0 0 5 4) (0 0 0 5 3)
(0 0 0 5 2) (0 0 0 5 1) (0 0 0 4 10)
(0 0 0 4 9) (0 0 0 4 7) (0 0 0 4 6)
(0 0 0 4 5) (0 0 0 4 3) (0 0 0 4 2)
(0 0 0 4 1) (0 0 0 3 10) (0 0 0 3 9)
(0 0 0 3 8) (0 0 0 3 7) (0 0 0 3 6)
(0 0 0 3 5) (0 0 0 3 4) (0 0 0 3 2)
(0 0 0 3 1) (0 0 0 2 10) (0 0 0 2 9)
(0 0 0 2 8) (0 0 0 2 7) (0 0 0 2 6)
(0 0 0 2 5) (0 0 0 2 4) (0 0 0 2 3)
(0 0 0 2 1) (0 0 0 1 10) (0 0 0 1 8)
(0 0 0 1 7) (0 0 0 1 6) (0 0 0 1 5)
(0 0 0 1 4) (0 0 0 1 3) (0 0 0 1 2))- One trick in black box testing
- Read the doc
- Doodle a model showing expectations
- Generate tests over that doodle
- Coverage criteria (for finite-state machines)
- Test coverage: cover every path: no feasible due to infinite number of paths (cycles)
- State coverage: every node coverage (minimal testing criterion)
- Transition coverage: every edge covered
- E.g. here are five tests covering every edge
- Is this still "just" functional testing.
Dumb Fuzzing :
- Fuzz testing was originally developed by Barton Miller at the University of Wisconsin in 1989.
- throw random cr*p at a program till it crashed
- brute force mutation
- History:
-
1981: Duran and Ntafos investigated the effectiveness of testing a program with random inputs.
- Previously: random testing had widely perceived to be worst means of testing
- Their analysis: random proves are a cost-effective alternative to more systematic testing
-
1983: Steve Capps developed "The Monkey", a tool that would generate random inputs for classic Mac OS applications, such as MacPaint.
- The figurative "monkey" refers to the infinite monkey theorem which states that a monkey hitting keys at random on a typewriter keyboard for an infinite amount of time will eventually type out the entire works of Shakespeare. In the case of testing, the monkey would write the particular sequence of inputs that will trigger a crash.
-
1988: Barton Miller: when he was logged to a modem during a storm, there was a lot of line noise generating junk characters and those characters caused programs to crash
- New term: _fuzzing = automatically generate random files and command-line parameters for the utility.
- The project was designed to test the reliability of Unix programs by executing a large number of random inputs in quick succession until they crashed.
-
Recently, numerous examples where fuzzing found bugs other approaches missed
-
- Simple fuzzing: advantages:
- Very cheap to generate a test
- Exploit CPUs to explore a broad range of options
- Useful for dodging incorrect preconceptions of what the program should do
- Good for detecting events that lead to buffer overflow, DOS (denial of service), cross-site scripting and SQL injection
- Simple fuzzing: disadvantages:
- It cannot provide a complete picture of the overall security, quality or effectiveness of a program
- Spends much time generating impossible inputs or very unlikely events
Smarter fuzzing:
- Express input as a grammar
- Generate from tree
- Generational fuzzing
US_PHONE_GRAMMAR = {
"<start>": ["<phone-number>"],
"<phone-number>": ["(<area>)<exchange>-<line>"],
"<area>": ["<lead-digit><digit><digit>"],
"<exchange>": ["<lead-digit><digit><digit>"],
"<line>": ["<digit><digit><digit><digit>"],
"<lead-digit>": ["2", "3", "4", "5", "6", "7", "8", "9"],
"<digit>": ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
}- Example
[simple_grammar_fuzzer(US_PHONE_GRAMMAR) for i in range(5)]
['(692)449-5179',
'(519)230-7422',
'(613)761-0853',
'(979)881-3858',
'(810)914-5475']
Mutational fuzzing:
- Take a known valid input
- Mutate it
def mutate(s):
"""Return s with a random mutation applied"""
mutators = [
delete_random_character,
insert_random_character,
flip_random_character
]
mutator = random.choice(mutators)
# print(mutator)
return mutator(s)
for i in range(10):
print(repr(mutate("A quick brown fox")))
'A qzuick brown fox'
' quick brown fox'
'A quick Brown fox'
'A qMuick brown fox'
'A qu_ick brown fox'
'A quick bXrown fox'
'A quick brown fx'
'A quick!brown fox'
'A! quick brown fox'
'A quick brownfox'Coverage fuzzing
- Track parts of the grammar seen so far
- Fuzz to some new place.
Mining examples to weight crammers:
- Take a library of good examples
- Weight sub-trees on (e.g.) Probability
- Stochastic recursive descent:
- Stochastically select sub-trees according to their weights
- If weight = random then generational fuzzing
- If select to prefer min weights, then coverage fuzzing
- Recurs into sub tree.
- Stochastically select sub-trees according to their weights
Smarter smarter fuzzing = genetic programming
- We we dynamically adjust weights to prefer certain goals
- then "testing" becomes AI
- then "testing" becomes "mitigation" or "optimization"
How to test with an oracle for the specifics of the domain?
Metamorphic relations (MRs) are necessary properties of the intended functionality of the software
-
high-level statements that should be true across all inputs
-
e.g. conjunctions do not lead to more output
- RESULT1= "all males"
- RESULT2="bald males"
- RESULT2 should not be larger than RESULT1
-
e.g. When testing a booking website, a web search for RESULT1= accommodation in Sydney, Australia, returns 1,671 results
- RESULT2= Filter the price range or star rating and apply the search again;
- RESULT2 should be a subset of RESULT1
A wonderful metamorphic testing result: Z. Q. Zhou, T. H. Tse and M. Witheridge, Metamorphic Robustness Testing: Exposing Hidden Defects in Citation Statistics and Journal Impact Factors in IEEE Transactions on Software Engineering, doi: 10.1109/TSE.2019.2915065.
White box: we can open up the code and look inside:
- Coverage criteria (for code)
- Functions (all functions called once);
- A very weak test
- Statement coverage
- Supported by many tools
- du coverage:
- find all paths between where a variable is defined and used.
- Branch coverage:
- has every condition in the program be explored;
- Functions (all functions called once);
- Warning: you can succeed on all the above, and the code still crashes.
Symbolic execution:
- Find the abstract syntax tree of the code
- e.g. python3's
ctreepackage
- e.g. python3's
import ctree
def f(a):
for x in range(10):
a[x] += x
tree1 = ctree.get_ast(f)
ctree.ipython_show_ast(tree1)Applications of symbolic execution:
- Walk the tree to collect the constraints to build the tests.
- Can lead to spectacular reductions to black box testing
- e.g. BigTest: White-Box Testing of Big Data Analytics [ESEC/FSE 2019]
- Scripts processing gigabytes of data sets
- Is this a hard testing problem?
- Q: Do the tests have to handle all the possible combinations in the data?
- A: No: they only need to cover all the branches of the code
Express english requirements as checkable logic, then use logic to reason about it
Other examples:
-
Product lines:
-
Temporal logic (add operators for until ("U") and always "[]" eventually "<>"
- always ∀, there exists at least one ∃
- e.g. elevator door stays open between X and Y
(From One-Click Formal Methods:
- FORMAL METHODS: mathematically based approaches for specifying, building, and reasoning about software.
- Despite 50 years of research and development, formal methods have had only limited impact in industry.
- Some in such domains as microprocessor design and aerospace.
Why not widely used?
- The modeling cost: Analysts must create a systems model (what is the system) and a properties model (what is meant to do). Properties model usually much smaller than systems model.
- The execution cost: Rigorous analysis of formal properties needs a full search of systems model.
- The personnel cost: Analysts skilled in formal methods must be recruited or trained. Such analysts are generally hard to find and retain.
- The development brake: The above costs can be so high that the requirements must be frozen for some time while we perform the formal analysis. Hence, one of the costs of formal analysis is that it can slow the process of requirements evolution.
Recent experience at Amazon:
- More and more, web-based systems are configured in sufficient detail
- Such that processes can be bounced around from node to node on the cloud (to make best use of spare resources)
- Application program interfaces (APIs) of cloud services are computer-readable contracts that establish and govern how the system behaves.
- Most importantly, since those models are utilized by a large user community,
- now economically feasible to build the tools needed to verify them
- Most importantly, since those models are utilized by a large user community,
- Which means that we have enough information to auto-configure our formal methods
- and the size of the potential user community and the business value now justifies the cost of formal methods.

.gif)








