Property Test Generator

Name: Property Test Generator
Author: neuralforge-labs
Added 5/28/2026
Tags: development, python, go, security
Works with

cli
Security Analysis

A (100/100)
Scanned 5/28/2026
Install via CLI
$openskills install neuralforge-labs/tlmforge
Files
SKILL.md
---
name: property-test-generator
description: |
  Use this skill when the user has a list of behavioral invariants for a function/module/feature
  and wants property-based tests generated from them. Triggers on phrases like
  "generate property tests", "make hypothesis tests for X", "property-test these invariants",
  "fuzz-test this function", or when the feature-development skill's Stage 1 spec audit
  identifies invariants worth exploring with Hypothesis. ALSO use proactively if you read a
  spec_audit.md that lists invariants without corresponding property tests — generating them
  now (during spec audit) is cheaper than discovering edge cases after impl.

  Output: a Python test file using `hypothesis` (default) or `dart_check` for Flutter, with
  one `@given(...)` test per invariant, calibrated input strategies, and shrink-friendly
  assertions. The skill does NOT run the tests; it generates them and the user runs them.
---

# Property test generator

This skill turns English invariants into mechanical input-space exploration. Property-based
tests catch bugs the reviewer trio misses because the tests explore inputs you didn't think
to enumerate. They are especially valuable for memory/encryption/auth/PII code paths, where
input spaces are large and the failure mode is "weird input → silent corruption."

## When to use

**Trigger conditions:**
1. User explicitly asks: "generate property tests", "make Hypothesis tests for X", "fuzz-test this".
2. A `spec_audit.md` lists invariants in a section (typically labeled "Invariants" or
   "Properties this must satisfy"). Generating tests at spec-audit time is the cheapest place
   to do it — invariants that turn out un-testable are signal that the spec is fuzzy.
3. The user is modifying a function in a security-sensitive area (auth, encryption, PII,
   payments) — propose property tests proactively even if not asked.

**When NOT to use:**
- Pure UI / integration tests (property tests don't fit click-flow surfaces well — Playwright is the right tool).
- Unit tests for trivial functions where example-based tests are sufficient and clearer.
- If the user just wants 1-2 tests for a happy path (the overhead of `@given` + strategy authoring doesn't earn its keep for tiny surfaces).

## Recipe

### Step 1 — extract invariants

Read the user-provided spec / spec_audit.md / invariant list. Each invariant should fit the
shape: "**For all <input X> satisfying <precondition>, <function Y> produces output that
satisfies <postcondition>.**"

Examples:
- "For any valid user, `modify_memory(user_id, memory_id, content)` always leaves a `referenced_memory_id` field on the result."
- "For any string input, `slugify(text)` returns a string of length ≤ 80."
- "For any positive integer N, `paginate(items, N)` returns at most N items per page."

If the invariant doesn't fit the for-all shape, push back: "this looks like a specific test
case, not a property. Should we make it `@pytest.mark.parametrize` or refine the property?"
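
As a rough illustration of how the for-all shape maps onto a test, the pagination invariant above might be written as follows. The `paginate` implementation is a hypothetical stand-in so the sketch runs on its own; in a real project the function under test would be imported instead.

```python
from hypothesis import given, settings, strategies as st


def paginate(items, page_size):
    """Hypothetical stand-in: split items into consecutive pages."""
    return [items[i:i + page_size] for i in range(0, len(items), page_size)]


# "For all <input X> satisfying <precondition>" becomes the strategies:
# any bounded list of integers, and any positive page size.
@given(
    items=st.lists(st.integers(min_value=0, max_value=1000), max_size=100),
    page_size=st.integers(min_value=1, max_value=50),
)
@settings(max_examples=200)
def test_paginate_pages_never_exceed_page_size(items, page_size):
    pages = paginate(items, page_size)
    # "<function Y> produces output that satisfies <postcondition>"
    # becomes the assertion.
    assert all(len(page) <= page_size for page in pages), (
        f"Invariant violated: paginate({items=}, {page_size=}) returned {pages!r}"
    )
```

If an invariant cannot be split into strategies (the for-all part) and an assertion (the postcondition) in this way, that is usually the sign it is a specific test case rather than a property.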

### Step 2 — calibrate input strategies

Each invariant's input gets a Hypothesis strategy that reflects realistic input space:

| Real-world input | Hypothesis strategy |
|---|---|
| User ID (UUID) | `st.uuids()` or `st.from_regex(uuid_re)` |
| Email | `st.emails()` |
| Free text | `st.text(min_size=0, max_size=1000)` |
| Integer in range | `st.integers(min_value=1, max_value=10000)` |
| Memory content | `st.text(alphabet=st.characters(exclude_categories=("Cs",)), max_size=1000)` (no surrogates; older Hypothesis releases spell this `blacklist_categories`) |
| List of items | `st.lists(item_strategy, max_size=100)` |
| Custom dataclass | `st.builds(MyClass, field1=..., field2=...)` |

**Default strategy hygiene:**
- Cap `max_size` for collections — unbounded sizes burn CI time on shrink.
- Exclude surrogates and control chars unless the invariant specifically targets them.
- Use `st.from_regex()` only when the format is well-defined; prefer typed strategies otherwise.
- Add `@settings(max_examples=200)` for fast loops; bump to 1000 for security-critical paths.
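
A minimal sketch of these hygiene rules applied together. The `Memory` dataclass and its fields are hypothetical, introduced only to show `st.builds()` with bounded component strategies; the test checks the strategy itself rather than any real function.

```python
from dataclasses import dataclass

from hypothesis import given, settings, strategies as st


@dataclass
class Memory:
    """Hypothetical record type, used only to demonstrate st.builds()."""
    user_id: str
    content: str
    tags: list


# Bounded free text; st.text() already excludes surrogates by default.
content_strategy = st.text(max_size=1000)

memory_strategy = st.builds(
    Memory,
    user_id=st.uuids().map(str),
    content=content_strategy,
    # Both the list and its elements are capped to keep shrinking cheap.
    tags=st.lists(st.text(min_size=1, max_size=20), max_size=10),
)


@given(memory=memory_strategy)
@settings(max_examples=200)
def test_generated_memories_respect_bounds(memory):
    # Sanity-check the strategy: every generated value stays inside its caps.
    assert len(memory.content) <= 1000, f"content too long: {memory!r}"
    assert len(memory.tags) <= 10, f"too many tags: {memory!r}"
    assert len(memory.user_id) == 36, f"unexpected UUID form: {memory!r}"
```

Defining strategies as module-level names like `memory_strategy` lets several `@given` tests in the same file share one calibrated input space.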

### Step 3 — emit the test file

Format: one Python file per feature, multiple `@given` tests inside.

```python
"""Property-based tests for <feature>. Generated by property-test-generator skill.

Each test corresponds to an invariant in specs/<feature>/spec_audit.md.
"""
from hypothesis import given, settings, strategies as st

from <module> import create_memory, modify_memory  # replace <module> with the real import path


@given(user_id=st.uuids(), memory_id=st.text(min_size=1, max_size=64), content=st.text(max_size=1000))
@settings(max_examples=200)
def test_modify_memory_preserves_referenced_memory_id(user_id, memory_id, content):
    """Invariant: for any valid user, modify_memory always leaves referenced_memory_id."""
    # Precondition: a memory exists for user_id+memory_id
    create_memory(str(user_id), memory_id, "initial content")
    # Property: after modify, the result has referenced_memory_id
    result = modify_memory(str(user_id), memory_id, content)
    assert "referenced_memory_id" in result, (
        f"Invariant violated: modify_memory({user_id=}, {memory_id=}, {content=!r}) "
        f"returned result without referenced_memory_id: {result!r}"
    )
```

**Assertion hygiene:**
- Always include the inputs in the assertion message (Hypothesis shrinks; the message is
  what the user sees on failure).
- Use `assert <condition>, <message>` — never bare `assert <condition>` for invariants.
- Cluster related properties into one test only if they share setup; otherwise one property
  per `@given`.

### Step 4 — wire into the feature

Save the generated file to `<project>/<tests-dir>/property/test_<feature>_invariants.py`
(create the `property/` subdirectory if absent). Update the project's pytest config if needed
to discover the new directory.

If the user is using the `feature-development` skill, append the path to
`specs/<feature>/STATUS.md` under "Tests" so the audit trail captures it.
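
One way the target path could be computed and the `property/` subdirectory created is sketched below. The helper name `property_test_path` and its parameters are assumptions made for illustration, not part of the skill.

```python
from pathlib import Path


def property_test_path(project_root, tests_dir, feature):
    """Return the target path for a feature's generated property tests."""
    property_dir = Path(project_root) / tests_dir / "property"
    # Create the property/ subdirectory (and any missing parents) if absent.
    property_dir.mkdir(parents=True, exist_ok=True)
    return property_dir / f"test_{feature}_invariants.py"
```

For example, `property_test_path(".", "tests", "memory")` yields `tests/property/test_memory_invariants.py`. If the project's pytest configuration sets `testpaths`, the new directory needs to sit under one of the listed paths to be collected by a bare `pytest` run.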

## Edge cases this skill handles correctly

1. **Invariants that aren't actually properties.** Push back, don't generate a `@given` for "test that adding a memory returns 200" — that's a unit test, not a property.
2. **Invariants the user can't articulate.** Ask: "what makes this true *for any* valid input? What's the precondition? What's the postcondition?" If they can't answer, the invariant is too vague to property-test.
3. **Strategies that are too broad.** If `st.text()` finds the function crashes on a single emoji, the right move is usually to constrain the input contract (or fix the function) — not to narrow the strategy. Document the choice.
4. **Stateful properties** (e.g., "subsequent calls return increasing IDs"). Use `RuleBasedStateMachine` from `hypothesis.stateful` rather than `@given` for these.
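
For the stateful case, a minimal sketch using `RuleBasedStateMachine` might look like this. `IdAllocator` is a hypothetical system under test, included only so the example is self-contained.

```python
from hypothesis import settings
from hypothesis.stateful import RuleBasedStateMachine, rule


class IdAllocator:
    """Hypothetical system under test: hands out increasing integer IDs."""

    def __init__(self):
        self._next = 1

    def allocate(self):
        allocated = self._next
        self._next += 1
        return allocated


class IdAllocatorMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.allocator = IdAllocator()
        self.last_id = None

    @rule()
    def allocate(self):
        new_id = self.allocator.allocate()
        # Property: each ID is strictly greater than the one before it.
        assert self.last_id is None or new_id > self.last_id, (
            f"IDs not increasing: {self.last_id=} then {new_id=}"
        )
        self.last_id = new_id


# pytest collects this generated unittest.TestCase like any other test class.
TestIdAllocator = IdAllocatorMachine.TestCase
TestIdAllocator.settings = settings(max_examples=50, stateful_step_count=20)
```

Hypothesis runs the rule repeatedly in generated sequences, so the ordering property is checked across many call histories rather than a single fixed one.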

## Calibration examples

### CRITICAL example — property catches a real bug
```
Invariant: for any string s, slugify(s) returns ASCII-only.

Generated test:
@given(text=st.text(alphabet=st.characters(exclude_categories=("Cs",)), max_size=1000))
@settings(max_examples=200)
def test_slugify_returns_ascii_only(text):
    result = slugify(text)
    assert all(ord(c) < 128 for c in result), \
        f"Non-ASCII in slugify({text!r}): {result!r}"

Hypothesis finds: input="①" → slugify returns "①" (no NFKD normalization).
This is a real bug that example-based tests with benign inputs would likely never find.
```
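
For context, one plausible shape of the fix is sketched below. This `slugify` is a hypothetical implementation written for illustration, not the function from any particular codebase; it applies NFKD normalization before dropping whatever is still non-ASCII.

```python
import re
import unicodedata


def slugify(text):
    """Hypothetical ASCII-only slugify, shown only to illustrate the fix."""
    # NFKD splits compatibility characters into base forms, e.g. "①" -> "1".
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop whatever still has no ASCII representation (combining marks, CJK, emoji).
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse runs of non-alphanumerics into single hyphens.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
```

With this version the shrunk counterexample `"①"` becomes `"1"`, and the ASCII-only property holds by construction because the output is built from an ASCII-encoded string.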

### MEDIUM example — property is correct but not the right tool
```
Invariant: page count = ceil(total_items / page_size).

Property test:
@given(total=st.integers(min_value=0, max_value=10000),
       page_size=st.integers(min_value=1, max_value=1000))
def test_page_count_formula(total, page_size):
    assert paginate(total, page_size).count == math.ceil(total / page_size)

Honest assessment: this works but a 4-line example-based test covers it equally well.
Property tests earn their keep when input space is large and weird; pagination math
is small and well-defined. Recommend example-based here.
```
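
The example-based alternative recommended here could look like the following sketch. `page_count` is a hypothetical stand-in for the calculation under test, and the four cases are chosen to cover the boundaries by hand.

```python
def page_count(total_items, page_size):
    """Hypothetical stand-in for the page-count calculation under test."""
    # Integer ceiling division; avoids float rounding for very large totals.
    return -(-total_items // page_size)


def test_page_count_examples():
    # Empty, partial page, exact fit, and one item past a page boundary.
    cases = [(0, 10, 0), (1, 10, 1), (10, 10, 1), (11, 10, 2)]
    for total, page_size, expected in cases:
        assert page_count(total, page_size) == expected, (
            f"page_count({total}, {page_size}) != {expected}"
        )
```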

### DO NOT GENERATE example
```
Invariant: send_email(to, subject, body) always sends an email.

Why this isn't a property: side-effect-only, no observable post-condition. Property
tests need observable invariants. Stub the email service and assert the stub was
called — that's a regular test, not a property test.
```
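
The regular test described above might be sketched like this with `unittest.mock`. The `notify_user` function and the injected `email_service` object are hypothetical, standing in for whatever code calls `send_email` in the real project.

```python
from unittest.mock import Mock


def notify_user(email_service, to, subject, body):
    """Hypothetical caller that delegates delivery to an injected service."""
    email_service.send_email(to, subject, body)


def test_notify_user_sends_one_email():
    email_service = Mock()  # stub: records calls, sends nothing
    notify_user(email_service, "a@example.com", "Hi", "Body")
    # The observable post-condition is the call recorded on the stub.
    email_service.send_email.assert_called_once_with("a@example.com", "Hi", "Body")
```

A single fixed input is enough here because the assertion is about the interaction, not about how the output varies across the input space.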

## Output checklist

Before declaring the generated file done:

- [ ] One `@given(...)` per invariant in the spec
- [ ] Every `@given` has explicit `@settings(max_examples=...)` (default 200; 1000 for security)
- [ ] Every input strategy has a `max_size` or `max_value` (no unbounded strategies)
- [ ] Every assertion includes the inputs in the failure message
- [ ] Test file imports the function under test (don't fake the module)
- [ ] Test file has a docstring naming the source spec
- [ ] If `feature-development` skill is in play, the path is referenced from `specs/<feature>/STATUS.md`