Use this skill when the user has a list of behavioral invariants for a function/module/feature and wants property-based tests generated from them. Triggers on phrases like "generate property tests", "make hypothesis tests for X", "property-test these invariants", "fuzz-test this function", or when the feature-development skill's Stage 1 spec audit identifies invariants worth exploring with Hypothesis. ALSO use proactively if you read a spec_audit.md that lists invariants without corresponding...
Scanned 5/28/2026
Install via CLI
openskills install neuralforge-labs/tlmforge---
name: property-test-generator
description: |
Use this skill when the user has a list of behavioral invariants for a function/module/feature
and wants property-based tests generated from them. Triggers on phrases like
"generate property tests", "make hypothesis tests for X", "property-test these invariants",
"fuzz-test this function", or when the feature-development skill's Stage 1 spec audit
identifies invariants worth exploring with Hypothesis. ALSO use proactively if you read a
spec_audit.md that lists invariants without corresponding property tests — generating them
now (during spec audit) is cheaper than discovering edge cases after impl.
Output: a Python test file using `hypothesis` (default) or `dart_check` for Flutter, with
one `@given(...)` test per invariant, calibrated input strategies, and shrink-friendly
assertions. The skill does NOT run the tests — it generates them, the user runs them.
---
# Property test generator
This skill turns English invariants into mechanical input-space exploration. Property-based
tests catch the bugs the reviewer trio misses BECAUSE they explore inputs you didn't think
to enumerate. Especially valuable for memory/encryption/auth/PII code paths where input
spaces are large and the failure mode is "weird input → silent corruption."
## When to use
**Trigger conditions:**
1. User explicitly asks: "generate property tests", "make Hypothesis tests for X", "fuzz-test this".
2. A `spec_audit.md` lists invariants in a section (typically labeled "Invariants" or
"Properties this must satisfy"). Generating tests at spec-audit time is the cheapest place
to do it — invariants that turn out un-testable are signal that the spec is fuzzy.
3. The user is modifying a function in a security-sensitive area (auth, encryption, PII,
payments) — propose property tests proactively even if not asked.
**When NOT to use:**
- Pure UI / integration tests (property tests don't fit click-flow surfaces well — Playwright is the right tool).
- Unit tests for trivial functions where example-based tests are sufficient and clearer.
- If the user just wants 1-2 tests for a happy path (overhead of `@given` + strategy auth doesn't earn its keep for tiny surfaces).
## Recipe
### Step 1 — extract invariants
Read the user-provided spec / spec_audit.md / invariant list. Each invariant should fit the
shape: "**For all <input X> satisfying <precondition>, <function Y> produces output that
satisfies <postcondition>.**"
Examples:
- "For any valid user, `modify_memory(user_id, memory_id, content)` always leaves a `referenced_memory_id` field on the result."
- "For any string input, `slugify(text)` returns a string of length ≤ 80."
- "For any positive integer N, `paginate(items, N)` returns at most N items per page."
If the invariant doesn't fit the for-all shape, push back: "this looks like a specific test
case, not a property. Should we make it `@pytest.mark.parametrize` or refine the property?"
### Step 2 — calibrate input strategies
Each invariant's input gets a Hypothesis strategy that reflects realistic input space:
| Real-world input | Hypothesis strategy |
|---|---|
| User ID (UUID) | `st.uuids()` or `st.from_regex(uuid_re)` |
| Email | `st.emails()` |
| Free text | `st.text(min_size=0, max_size=1000)` |
| Integer in range | `st.integers(min_value=1, max_value=10000)` |
| Memory content | `st.text(alphabet=st.characters(blacklist_categories=("Cs",)))` (no surrogates) |
| List of items | `st.lists(item_strategy, max_size=100)` |
| Custom dataclass | `st.builds(MyClass, field1=..., field2=...)` |
**Default strategy hygiene:**
- Cap `max_size` for collections — unbounded sizes burn CI time on shrink.
- Exclude surrogates and control chars unless the invariant specifically targets them.
- Use `st.from_regex()` only when the format is well-defined; prefer typed strategies otherwise.
- Add `@settings(max_examples=200)` for fast loops; bump to 1000 for security-critical paths.
### Step 3 — emit the test file
Format: one Python file per feature, multiple `@given` tests inside.
```python
"""Property-based tests for <feature>. Generated by property-test-generator skill.
Each test corresponds to an invariant in specs/<feature>/spec_audit.md.
"""
from hypothesis import given, settings, strategies as st
from <module> import <function_under_test>
@given(user_id=st.uuids(), memory_id=st.text(min_size=1, max_size=64), content=st.text(max_size=1000))
@settings(max_examples=200)
def test_modify_memory_preserves_referenced_memory_id(user_id, memory_id, content):
"""Invariant: for any valid user, modify_memory always leaves referenced_memory_id."""
# Precondition: a memory exists for user_id+memory_id
create_memory(str(user_id), memory_id, "initial content")
# Property: after modify, the result has referenced_memory_id
result = modify_memory(str(user_id), memory_id, content)
assert "referenced_memory_id" in result, (
f"Invariant violated: modify_memory({user_id=}, {memory_id=}, {content=!r}) "
f"returned result without referenced_memory_id: {result!r}"
)
```
**Assertion hygiene:**
- Always include the inputs in the assertion message (Hypothesis shrinks; the message is
what the user sees on failure).
- Use `assert <condition>, <message>` — never bare `assert <condition>` for invariants.
- Cluster related properties into one test only if they share setup; otherwise one property
per `@given`.
### Step 4 — wire into the feature
Save the generated file to `<project>/<tests-dir>/property/test_<feature>_invariants.py`
(create the `property/` subdirectory if absent). Update the project's pytest config if needed
to discover the new directory.
If the user is using the `feature-development` skill, append the path to
`specs/<feature>/STATUS.md` under "Tests" so the audit trail captures it.
## Edge cases this skill handles correctly
1. **Invariants that aren't actually properties.** Push back, don't generate a `@given` for "test that adding a memory returns 200" — that's a unit test, not a property.
2. **Invariants the user can't articulate.** Ask: "what makes this true *for any* valid input? What's the precondition? What's the postcondition?" If they can't answer, the invariant is too vague to property-test.
3. **Strategies that are too broad.** If `st.text()` finds the function crashes on a single emoji, the right move is usually to constrain the input contract (or fix the function) — not to narrow the strategy. Document the choice.
4. **Stateful properties** (e.g., "subsequent calls return increasing IDs"). Use `RuleBasedStateMachine` from `hypothesis.stateful` rather than `@given` for these.
## Calibration examples
### CRITICAL example — property catches a real bug
```
Invariant: for any string s, slugify(s) returns ASCII-only.
Generated test:
@given(text=st.text(alphabet=st.characters(blacklist_categories=("Cs",))))
@settings(max_examples=200)
def test_slugify_returns_ascii_only(text):
result = slugify(text)
assert all(ord(c) < 128 for c in result), \
f"Non-ASCII in slugify({text!r}): {result!r}"
Hypothesis finds: input="①" → slugify returns "①" (no NFKD normalization).
This is a real bug a benign-test never finds.
```
### MEDIUM example — property is correct but not the right tool
```
Invariant: page count = ceil(total_items / page_size).
Property test:
@given(total=st.integers(min_value=0, max_value=10000),
page_size=st.integers(min_value=1, max_value=1000))
def test_page_count_formula(total, page_size):
assert paginate(total, page_size).count == math.ceil(total / page_size)
Honest assessment: this works but a 4-line example-based test covers it equally well.
Property tests earn their keep when input space is large and weird; pagination math
is small and well-defined. Recommend example-based here.
```
### DO NOT FLAG example
```
Invariant: send_email(to, subject, body) always sends an email.
Why this isn't a property: side-effect-only, no observable post-condition. Property
tests need observable invariants. Stub the email service and assert the stub was
called — that's a regular test, not a property test.
```
## Output checklist
Before declaring the generated file done:
- [ ] One `@given(...)` per invariant in the spec
- [ ] Every `@given` has explicit `@settings(max_examples=...)` (default 200; 1000 for security)
- [ ] Every input strategy has a `max_size` or `max_value` (no unbounded)
- [ ] Every assertion includes the inputs in the failure message
- [ ] Test file imports the function under test (don't fake the module)
- [ ] Test file has a docstring naming the source spec
- [ ] If `feature-development` skill is in play, the path is referenced from `specs/<feature>/STATUS.md`
No comments yet. Be the first to comment!