Property-Based Testing: Let Your Computer Find Bugs You Can't Imagine
The Bug That Changed My Testing Philosophy
Picture this: You've written a function to parse timestamps, tested it with dozens of examples, and it's been running in production for months. Then one day, it crashes on "2020-02-29T23:59:60". A leap second on a leap day—a combination you never thought to test.
This is where property-based testing shines. Instead of trying to imagine every possible edge case, you describe the properties your code should satisfy, and let the computer generate thousands of test cases, including the weird ones you'd never think of.
What Makes Property-Based Testing Different?
Traditional unit testing is example-based: you, the developer, provide a few specific inputs and assert that they produce specific outputs. Property-based testing flips this on its head: you define the general properties or "rules" your code must obey, and a framework generates hundreds or thousands of examples to try and prove you wrong.
For a sort function, example-based tests look like this:
```python
def test_sort_examples():
    assert sort([3, 1, 2]) == [1, 2, 3]
    assert sort([]) == []
    assert sort([1]) == [1]
    assert sort([2, 2, 1]) == [1, 2, 2]
```
Property-based tests describe general truths about your code:
```python
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sort_properties(lst):
    sorted_list = sort(lst)

    # Property 1: Output length equals input length
    assert len(sorted_list) == len(lst)

    # Property 2: Output is ordered
    for i in range(len(sorted_list) - 1):
        assert sorted_list[i] <= sorted_list[i + 1]

    # Property 3: Output contains same elements as input
    assert sorted(lst) == sorted_list
```
The key insight: You don't specify what to test, you specify how to test. The framework generates the what.
How Example-Based Testing Works:
You manually write specific test cases with known inputs and expected outputs. For an uppercasing function, that might be: "hello" → "HELLO", "World" → "WORLD", "123" → "123", and "" → "".
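Those pairs, written out as a concrete test (using Python's built-in str.upper as the function under test):

```python
def test_upper_examples():
    # Each case pins one known input to one known output
    assert "hello".upper() == "HELLO"
    assert "World".upper() == "WORLD"
    assert "123".upper() == "123"   # digits are unchanged
    assert "".upper() == ""         # empty string edge case
```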
Getting Started with Hypothesis
Let's build intuition with a simple example: a function that reverses strings.
```python
def reverse_string(s: str) -> str:
    """Reverse a string."""
    return s[::-1]

# Traditional test
def test_reverse_examples():
    assert reverse_string("hello") == "olleh"
    assert reverse_string("") == ""
    assert reverse_string("a") == "a"

# Property-based test
from hypothesis import given
from hypothesis import strategies as st

@given(st.text())
def test_reverse_properties(s):
    reversed_s = reverse_string(s)

    # Property: Reversing twice gives original
    assert reverse_string(reversed_s) == s

    # Property: Length is preserved
    assert len(reversed_s) == len(s)

    # Property: First char becomes last (if non-empty)
    if s:
        assert reversed_s[-1] == s[0]
        assert reversed_s[0] == s[-1]
```
When you run this test, Hypothesis will generate hundreds of strings: empty strings, single characters, Unicode snowmen (☃), null bytes, extremely long strings, and more.
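By default Hypothesis runs 100 examples per test; its settings decorator lets you turn that up when you want a more thorough search. A minimal sketch:

```python
from hypothesis import given, settings, strategies as st

@settings(max_examples=500)  # run 500 generated examples instead of the default 100
@given(st.text())
def test_reverse_involution(s):
    # Reversing twice must return the original string
    assert s[::-1][::-1] == s
```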
How Property Testing Explores the Input Space
Consider testing the property that isInsideTriangle(x, y) correctly classifies points. Property-based testing doesn't know the triangle's boundaries beforehand: it discovers them by sampling points across the input space and shrinking any failures down to minimal cases near the edges.
Real-World Properties to Test
1. Invariants
Best for: Enforcing universal rules about your data structures or system state. For example, ensuring a cache never exceeds its capacity, or a user's balance never drops below zero in a banking application.
Properties that remain true regardless of the operation:
```python
@given(st.dictionaries(st.text(), st.integers()))
def test_cache_size_invariant(initial_data):
    cache = LRUCache(capacity=100)

    for key, value in initial_data.items():
        cache.put(key, value)
        # Invariant: size never exceeds capacity
        assert len(cache) <= 100
```
2. Round-trip Properties
Best for: Verifying that data is not lost or corrupted during serialization/deserialization, compression/decompression, or any other pair of inverse operations. This is critical for data integrity in file storage, network communication, and database interactions.
Operations that can be reversed:
```python
import json
import zlib

@given(st.text())
def test_json_roundtrip(data):
    # Skip strings that can't be JSON encoded
    try:
        json_str = json.dumps(data)
        assert json.loads(json_str) == data
    except (UnicodeDecodeError, UnicodeEncodeError):
        # Some strings can't be JSON encoded
        pass

@given(st.binary())
def test_compression_roundtrip(data):
    compressed = zlib.compress(data)
    decompressed = zlib.decompress(compressed)
    assert decompressed == data
```
3. Metamorphic Relations
Best for: Testing functions where the exact output is hard to predict, but the relationship between different inputs and outputs is well-defined. This is common in scientific computing, machine learning (e.g., "does adding a positive value to all inputs increase the average?"), or complex business logic.
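The averaging example mentioned above can be sketched as a metamorphic test. This is an illustrative property (not from the article's own listings): adding a constant delta to every input should shift the mean by exactly that delta, even though we never state what the mean itself must be.

```python
from statistics import mean
from hypothesis import given, strategies as st

@given(
    st.lists(st.integers(min_value=-10**6, max_value=10**6), min_size=1),
    st.integers(min_value=1, max_value=10**6),
)
def test_mean_shifts_with_inputs(values, delta):
    # Metamorphic relation: shifting every input by delta shifts the
    # mean by delta (compare approximately to allow float rounding)
    assert abs(mean(v + delta for v in values) - (mean(values) + delta)) < 1e-6
```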
4. Comparison with a Reference Implementation
Best for: When you're refactoring a complex algorithm or replacing a slow, simple implementation with a highly optimized one. You can use the old, trusted code as an "oracle" to verify that the new version behaves identically.
When you have a trusted reference implementation:
```python
@given(st.lists(st.integers()))
def test_custom_sort_matches_builtin(lst):
    custom_sorted = my_custom_sort(lst.copy())
    builtin_sorted = sorted(lst)
    assert custom_sorted == builtin_sorted
```
Hypothesis Strategies: Generating Complex Data
Hypothesis provides powerful strategies for generating test data:
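A small sampler of what that looks like in practice. The strategy names below (st.integers, st.text, st.lists, st.sampled_from, st.builds) are real Hypothesis APIs; the User dataclass is a made-up example type:

```python
from dataclasses import dataclass, field
from hypothesis import given, strategies as st

@dataclass
class User:  # hypothetical example type
    name: str
    age: int
    tags: list

# Primitive and container strategies with constraints
ages = st.integers(min_value=0, max_value=120)
names = st.text(min_size=1, max_size=20)
tag_lists = st.lists(st.sampled_from(["admin", "beta", "trial"]), unique=True)

# st.builds composes strategies into arbitrary objects
users = st.builds(User, name=names, age=ages, tags=tag_lists)

@given(users)
def test_user_ages_in_range(user):
    # Whatever property you care about; here, ages stay in range
    assert 0 <= user.age <= 120
```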
Consider a median function that averages the two middle values of an even-length list. For the input [0, 1] the median should be 0.5, but a version that uses integer division returns 0. This bug is particularly insidious because it only appears with even-length lists where the two middle values have an odd sum; traditional example-based tests often miss it.
Shrinking: Finding Minimal Failing Examples
One of Hypothesis's killer features is shrinking. When it finds a failing example, it automatically simplifies it to find the minimal case that still fails.
For example, suppose the property is that buggySort(list) preserves all elements, but the implementation filters out negative numbers. Hypothesis might first find a failing case like [42, -17, 0, 23, -5, 99, -1, 7, -33, 15], then simplify it step by step until only a minimal failing input (a single negative number) remains.
Shrinking also works on subtler bugs. Consider a remove_duplicates function that accidentally returns an unordered set:
```python
def remove_duplicates(items):
    """Remove duplicates while preserving order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    # Bug: returning the set of seen items, which is unordered
    return seen

@given(st.lists(st.integers()))
def test_remove_duplicates_properties(items):
    result = remove_duplicates(items)

    # Property 1: All items in the result are unique
    assert len(result) == len(set(result))

    # Property 2: The result contains only items from the original list
    assert set(result).issubset(set(items))

    # Property 3 (the one that fails): Order is preserved
    # We can build the expected list and compare
    expected = []
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            expected.append(item)

    # This assertion will fail because `result` is an unordered set
    assert list(result) == expected
```
Hypothesis might initially find a failure with [47, -23, 0, 47, 12, -23, 99, 47], but it will shrink this to a minimal failing case such as [1, 0], where the set's iteration order ([0, 1]) differs from the insertion order.
Property-based testing isn't just another testing tool—it's a different way of thinking about correctness. Instead of asking "does my code work for these examples?", you ask "what should always be true about my code?"
This shift in perspective helps you:
• Find bugs you didn't know existed
• Understand your code's behavior more deeply
• Build more robust systems
• Sleep better at night
Start small. Pick one pure function in your codebase and write a property-based test for it. Let Hypothesis show you the edge cases you've been missing. Once you see it catch its first real bug, you'll be hooked.
Remember: The goal isn't to replace all your example-based tests. It's to add another powerful tool to your testing arsenal—one that helps you think differently about what it means for code to be correct.