---
name: backtest-lookahead-bias
compatibility: opencode
completeness: 95
content-types:
- code
- guidance
- config
- do-dont
description: 'Preventing lookahead bias in backtesting through strict causality enforcement, time-based validation, and comprehensive detection frameworks.'
license: MIT
maturity: stable
metadata:
domain: trading
output-format: code
related-skills: backtest-position-exits, backtest-sharpe-ratio, backtest-walk-forward,
paper-performance-attribution fundamentals-trading-plan
role: implementation
scope: implementation
triggers: backtest lookahead bias, backtest-lookahead-bias, backtesting, preventing,
strict, unit tests, testing, test automation
archetypes:
- tactical
anti_triggers:
- brainstorming
- vague ideation
- no risk management
response_profile:
verbosity: low
directive_strength: high
abstraction_level: operational
version: "1.0.0"
---
**Role:** Backtest Quality Engineer
**Philosophy:** No-Future-Data Policy - backtests must be strictly causal with no access to future data during signal generation. Every calculation must only use information available at or before the decision time.
## Key Principles
1. **Strict Causality**: Every signal and indicator must be computed using only data available up to the current timestamp. Future data points cannot influence current decisions.
2. **Time-Based Validation**: Implement automated checks that verify data alignment by timestamp and ensure no forward-looking references exist in signal generation logic.
3. **Calculation Delay Detection**: Use rolling-window analysis and lag verification to identify and flag indicators whose values depend on future data.
4. **Walk-Forward Testing**: Implement walk-forward optimization that simulates real trading by retraining models on historical data and testing on unseen future data.
5. **Index Component Bias Prevention**: Ensure backtests account for delisted components and avoid survivorship bias by including historical component data and weights.
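
As a minimal sketch of the strict-causality principle, the check below bumps the most recent price and tests whether an indicator's latest value moves. Under the convention used in this skill (the signal at time t may only use bars up to t-1), a lagged indicator should not react. The names `lagged_sma`, `unlagged_sma`, and `reacts_to_latest_bar` are illustrative assumptions, not part of any library.

```python
import numpy as np
import pandas as pd


def lagged_sma(prices: pd.Series, window: int = 5) -> pd.Series:
    """SMA whose value at t uses bars up to t-1 only."""
    return prices.rolling(window).mean().shift(1)


def unlagged_sma(prices: pd.Series, window: int = 5) -> pd.Series:
    """SMA whose value at t includes the bar at t itself."""
    return prices.rolling(window).mean()


def reacts_to_latest_bar(indicator, prices: pd.Series) -> bool:
    """Return True if changing the last price changes the last indicator value."""
    bumped = prices.copy()
    bumped.iloc[-1] += 10.0
    return not np.isclose(indicator(prices).iloc[-1], indicator(bumped).iloc[-1])


# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=30, freq="D")
prices = pd.Series(100 + rng.standard_normal(30).cumsum(), index=dates)

lagged_reacts = reacts_to_latest_bar(lagged_sma, prices)
unlagged_reacts = reacts_to_latest_bar(unlagged_sma, prices)
print(f"lagged reacts: {lagged_reacts}, unlagged reacts: {unlagged_reacts}")
```

The same perturbation idea can be turned into a unit test for any indicator in the pipeline.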
## Implementation Guidelines
### Structure
- Core logic: `backtesting/lookahead_detector.py`
- Helper functions: `backtesting/utils/time_alignment.py`
- Tests: `tests/backtesting/test_lookahead_bias.py`
### Patterns to Follow
- Use pandas `shift()` operations explicitly to demonstrate lag awareness
- Implement data validation at entry points before any signal generation
- Maintain timestamp-indexed DataFrames throughout the pipeline
- Include comprehensive logging of all bias detection events
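
One possible shape for the entry-point validation and logging described above is sketched here. The function name `validate_time_index` and the logger name are assumptions made for illustration; only standard `pandas` index properties and the standard `logging` module are used.

```python
import logging

import pandas as pd

logger = logging.getLogger("backtesting.time_alignment")  # hypothetical logger name


def validate_time_index(df: pd.DataFrame) -> list:
    """Return a list of problems found in the frame's timestamp index."""
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        problems.append("index is not a DatetimeIndex")
    else:
        if not df.index.is_monotonic_increasing:
            problems.append("timestamps are not sorted ascending")
        if df.index.has_duplicates:
            problems.append("duplicate timestamps present")
    # Log every finding so bias detection events leave an audit trail
    for problem in problems:
        logger.warning("time alignment check failed: %s", problem)
    return problems


good = pd.DataFrame(
    {"close": [1.0, 2.0, 3.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
shuffled = good.iloc[[2, 0, 1]]
print(validate_time_index(good), validate_time_index(shuffled))
```

Calling a check like this before any signal generation keeps unsorted or duplicated timestamps from silently misaligning later `shift()` operations.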
## Adherence Checklist
Before completing your task, verify:
- [ ] All indicator calculations use `shift(1)` or equivalent for proper lag
- [ ] Walk-forward analysis fits parameters on in-sample data only and evaluates them on out-of-sample data
- [ ] Time-based validation confirms no future data leakage
- [ ] Survivorship bias checks include delisted components in historical data
- [ ] All backtest reports include lookahead bias detection results
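
For the survivorship item in the checklist, a point-in-time universe lookup might look like the sketch below. The `membership` table, its column names, and the tickers are hypothetical; a real constituent history would come from your data vendor.

```python
import pandas as pd

# Hypothetical constituent history: one row per membership spell.
# `removed` is NaT while the ticker is still in the index.
membership = pd.DataFrame(
    {
        "ticker": ["AAA", "BBB", "CCC"],
        "added": pd.to_datetime(["2019-01-01", "2019-01-01", "2021-06-01"]),
        "removed": pd.to_datetime(["2020-03-15", None, None]),
    }
)


def universe_on(date: str, history: pd.DataFrame) -> list:
    """Tickers that were index members on `date`, including ones removed later."""
    ts = pd.Timestamp(date)
    still_member = history["removed"].isna() | (history["removed"] > ts)
    active = (history["added"] <= ts) & still_member
    return sorted(history.loc[active, "ticker"])


print(universe_on("2020-01-01", membership))
print(universe_on("2022-01-01", membership))
```

Building the tradable universe this way keeps a later-removed ticker in the backtest for the dates when it was actually a member, instead of starting from today's constituent list.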
## Code Examples
### Correct Backtesting Implementation
```python
# Signal generation with lookahead bias prevention
from dataclasses import dataclass
from typing import List, Dict, Optional, Tuple
from enum import Enum
import numpy as np
import pandas as pd
from datetime import datetime
import warnings
class SignalType(Enum):
"""Types of trading signals."""
MOVING_AVERAGE_CROSS = "moving_avg_cross"
RSI = "rsi"
BOLLINGER_BANDS = "bollinger_bands"
MACD = "macd"
@dataclass
class SignalResult:
"""Result of signal generation."""
timestamp: datetime
signal_type: SignalType
signal_value: float
is_valid: bool
lookahead_bias_detected: bool = False
def to_dict(self) -> Dict:
"""Convert to dictionary."""
return {
"timestamp": self.timestamp.isoformat() if hasattr(self.timestamp, 'isoformat') else str(self.timestamp),
"signal_type": self.signal_type.value,
"signal_value": self.signal_value,
"is_valid": self.is_valid,
"lookahead_bias_detected": self.lookahead_bias_detected
}
class LookaheadSafeIndicator:
"""
Base class for indicators that ensures no lookahead bias.
All indicators must compute values using only available data.
"""
def __init__(self, window_size: int = 20):
self.window_size = window_size
self.history: List[SignalResult] = []
def compute_signal(self, data: pd.Series) -> pd.Series:
"""
Compute indicator signal with proper lag.
Args:
data: Price series indexed by timestamp
Returns:
Signal series with same index as data
"""
if len(data) < self.window_size:
return pd.Series([np.nan] * len(data), index=data.index)
# Correct: Use shift(1) to avoid lookahead bias
# The signal for time t is computed using data up to time t-1
signal = self._compute_raw_signal(data)
return signal.shift(1)
def _compute_raw_signal(self, data: pd.Series) -> pd.Series:
"""Subclasses implement actual signal computation."""
raise NotImplementedError
def generate_signal(
self,
data: pd.Series,
signal_type: SignalType
) -> SignalResult:
"""Generate single signal result with bias check."""
signal = self.compute_signal(data)
if len(signal) == 0:
return SignalResult(
timestamp=datetime.now(),
signal_type=signal_type,
signal_value=np.nan,
is_valid=False
)
latest_value = signal.iloc[-1]
is_valid = not pd.isna(latest_value)
result = SignalResult(
timestamp=data.index[-1] if len(data) > 0 else datetime.now(),
signal_type=signal_type,
signal_value=latest_value if is_valid else np.nan,
is_valid=is_valid
)
self.history.append(result)
return result
class MovingAverageCrossover(LookaheadSafeIndicator):
"""
Moving average crossover strategy with lookahead bias prevention.
"""
def __init__(self, fast_window: int = 10, slow_window: int = 30):
super().__init__(slow_window)
self.fast_window = fast_window
self.slow_window = slow_window
def _compute_raw_signal(self, data: pd.Series) -> pd.Series:
"""Compute moving average crossover signal."""
fast_ma = data.rolling(window=self.fast_window).mean()
slow_ma = data.rolling(window=self.slow_window).mean()
return fast_ma - slow_ma
    def generate_signal(self, data: pd.Series) -> SignalResult:
        """Generate crossover signal."""
        signal = self.compute_signal(data)
        if len(signal) < 2:
            return SignalResult(
                timestamp=datetime.now(),
                signal_type=SignalType.MOVING_AVERAGE_CROSS,
                signal_value=np.nan,
                is_valid=False
            )
        current = signal.iloc[-1]
        previous = signal.iloc[-2]
        # A crossover needs two consecutive non-NaN readings; during the
        # warm-up period the result is marked invalid rather than "no crossover"
        is_valid = bool(pd.notna(current) and pd.notna(previous))
        signal_value = 0.0  # No crossover
        if is_valid:
            if previous <= 0 and current > 0:
                signal_value = 1.0  # Bullish crossover
            elif previous >= 0 and current < 0:
                signal_value = -1.0  # Bearish crossover
        result = SignalResult(
            timestamp=data.index[-1],
            signal_type=SignalType.MOVING_AVERAGE_CROSS,
            signal_value=signal_value,
            is_valid=is_valid
        )
        self.history.append(result)
        return result
class RSICalculator(LookaheadSafeIndicator):
"""RSI calculator with proper lag handling."""
def __init__(self, window: int = 14):
super().__init__(window)
self.window = window
def _compute_raw_signal(self, data: pd.Series) -> pd.Series:
"""Compute RSI with lookahead bias prevention."""
if len(data) < self.window + 1:
return pd.Series([np.nan] * len(data), index=data.index)
# Calculate price changes
delta = data.diff()
# Separate gains and losses
gain = (delta.where(delta > 0, 0)).rolling(window=self.window).mean()
loss = (-delta.where(delta < 0, 0)).rolling(window=self.window).mean()
        # Calculate RSI
rs = gain / loss
rsi = 100 - (100 / (1 + rs))
return rsi
def generate_signal(self, data: pd.Series, overbought: float = 70, oversold: float = 30) -> SignalResult:
"""Generate RSI signal."""
rsi = self.compute_signal(data)
if len(rsi) == 0 or pd.isna(rsi.iloc[-1]):
return SignalResult(
timestamp=datetime.now(),
signal_type=SignalType.RSI,
signal_value=np.nan,
is_valid=False
)
latest_rsi = rsi.iloc[-1]
signal_value = 0.0
if latest_rsi > overbought:
signal_value = -1.0 # Overbought - potential sell
elif latest_rsi < oversold:
signal_value = 1.0 # Oversold - potential buy
result = SignalResult(
timestamp=data.index[-1],
signal_type=SignalType.RSI,
signal_value=signal_value,
is_valid=True
)
self.history.append(result)
return result
# Example usage and testing
if __name__ == "__main__":
# Generate sample price data
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=100, freq="D")
prices = pd.Series(
100 + np.random.randn(100).cumsum(),
index=dates,
name="price"
)
# Test moving average crossover
print("Testing Moving Average Crossover:")
ma_crossover = MovingAverageCrossover(fast_window=5, slow_window=20)
# Incremental processing (mimics real-time trading)
signals = []
for i in range(30, len(prices)):
historical_data = prices.iloc[:i+1]
result = ma_crossover.generate_signal(historical_data)
if result.is_valid:
signals.append(result.to_dict())
print(f"Generated {len(signals)} valid signals")
# Test RSI calculator
print("\nTesting RSI Calculator:")
rsi_calc = RSICalculator(window=14)
rsi_result = rsi_calc.generate_signal(prices)
print(f"Latest RSI Signal: {rsi_result.to_dict()}")
```
### Common Lookahead Bias Patterns and How to Detect Them
```python
# Common lookahead bias patterns and detection methods
from dataclasses import dataclass
from typing import List, Dict, Optional, Tuple, Set
import numpy as np
import pandas as pd
from datetime import datetime
from enum import Enum
class BiasType(Enum):
"""Types of lookahead bias."""
CALCULATION_BIAS = "calculation_bias"
DATA_BIAS = "data_bias"
INDEX_BIAS = "index_bias"
SIMULATION_BIAS = "simulation_bias"
@dataclass
class BiasReport:
"""Report of detected lookahead bias."""
bias_type: BiasType
severity: str # "critical", "high", "medium", "low"
description: str
affected_columns: List[str]
timestamp: datetime
recommendation: str
def to_dict(self) -> Dict:
"""Convert to dictionary."""
return {
"bias_type": self.bias_type.value,
"severity": self.severity,
"description": self.description,
"affected_columns": self.affected_columns,
"timestamp": self.timestamp.isoformat() if hasattr(self.timestamp, 'isoformat') else str(self.timestamp),
"recommendation": self.recommendation
}
class LookaheadBiasDetector:
"""
Comprehensive lookahead bias detection system.
Identifies and classifies different types of bias in backtesting code.
"""
def __init__(self):
self.reports: List[BiasReport] = []
self.known_bias_patterns: Dict[str, str] = {
"future_mean": "Using rolling mean with center=True or without proper shift",
"future_std": "Standard deviation calculated on full dataset",
"future_max": "Maximum calculated with future data access",
"future_min": "Minimum calculated with future data access",
"cumsum": "Cumulative sum without proper lag application",
"cumprod": "Cumulative product without lag",
"shift_negative": "Using shift(-1) which looks into future",
"bfill": "Backward fill introducing future data",
"interpolate": "Interpolation using future points",
}
def detect_calculation_bias(self, df: pd.DataFrame) -> List[BiasReport]:
"""
Detect calculation bias in DataFrame operations.
Args:
df: DataFrame to analyze
Returns:
List of detected calculation biases
"""
reports = []
        # Heuristic only: rolling(center=True) or shift(-k) cannot be proven
        # from a column's values alone, so this check flags suspiciously
        # smooth monotonic columns for manual review
        for col in df.columns:
            if df[col].dtype in [np.float64, np.float32, np.int64, np.int32]:
                # Only monotonic (strictly trending) columns are examined
                if df[col].is_monotonic_increasing or df[col].is_monotonic_decreasing:
                    # Monotonic columns with smooth changes may indicate future bias
first_half_mean = df[col].iloc[:len(df)//2].mean()
second_half_mean = df[col].iloc[len(df)//2:].mean()
if abs(second_half_mean - first_half_mean) < abs(df[col].std()):
reports.append(BiasReport(
bias_type=BiasType.CALCULATION_BIAS,
severity="medium",
description=f"Column {col} shows potential smoothing bias. "
f"Consider if calculations use future data.",
affected_columns=[col],
timestamp=datetime.now(),
recommendation=f"Verify {col} doesn't use future data in calculation. "
f"Ensure proper lag with shift(1) if needed."
))
return reports
def detect_data_bias(self, df: pd.DataFrame, index_df: pd.DataFrame) -> List[BiasReport]:
"""
Detect data bias from index alignment issues.
Args:
df: Data DataFrame
index_df: Index DataFrame to check alignment
Returns:
List of detected data biases
"""
reports = []
# Check for columns that don't exist in historical index
for col in df.columns:
if col not in index_df.columns:
reports.append(BiasReport(
bias_type=BiasType.DATA_BIAS,
severity="high",
description=f"Column '{col}' not present in historical index data. "
f"May indicate survivorship bias.",
affected_columns=[col],
timestamp=datetime.now(),
recommendation="Include delisted components in historical data. "
"Use full index history with weight adjustments."
))
return reports
def detect_index_bias(self, portfolio_df: pd.DataFrame, index_df: pd.DataFrame) -> List[BiasReport]:
"""
        Detect index component bias in portfolio backtesting.
Args:
portfolio_df: Portfolio performance DataFrame
index_df: Index composition DataFrame
Returns:
List of detected index biases
"""
reports = []
# Check if index composition changes over time
if 'weight' in index_df.columns and 'component' in index_df.columns:
# Look for weight changes that might indicate bias
for component in index_df['component'].unique():
component_data = index_df[index_df['component'] == component]
# Check for weight dropping to zero (delisting)
if component_data['weight'].iloc[-1] == 0 and component_data['weight'].iloc[0] > 0:
# This component was likely dropped - ensure backtest handled correctly
reports.append(BiasReport(
bias_type=BiasType.INDEX_BIAS,
severity="high",
description=f"Component '{component}' shows delisting pattern. "
f"Ensure backtest accounts for delisted components.",
affected_columns=['weight'],
timestamp=datetime.now(),
recommendation="Include delisted component in backtest until delisting date. "
"Use total return with weight adjustment."
))
return reports
def detect_simulation_bias(self, backtest_results: pd.DataFrame) -> List[BiasReport]:
"""
Detect simulation bias in backtest results.
Args:
backtest_results: Backtest performance DataFrame
Returns:
List of detected simulation biases
"""
reports = []
# Check for unrealistically smooth equity curves
if 'equity' in backtest_results.columns:
equity = backtest_results['equity']
            returns = equity.pct_change().dropna()
            # Compare the Sharpe ratio of the first and second halves of the
            # sample; this is a simple split, not a full walk-forward test
            half_point = len(returns) // 2
            first_half = returns.iloc[:half_point]
            second_half = returns.iloc[half_point:]
            # A Sharpe ratio is only defined when each half has non-zero volatility
            if len(first_half) > 1 and len(second_half) > 1 and first_half.std() > 0 and second_half.std() > 0:
                in_sample_sharpe = first_half.mean() / first_half.std()
                out_of_sample_sharpe = second_half.mean() / second_half.std()
                # If out-of-sample Sharpe is dramatically lower, possible lookahead bias
                if abs(out_of_sample_sharpe) < abs(in_sample_sharpe) * 0.5:
reports.append(BiasReport(
bias_type=BiasType.SIMULATION_BIAS,
severity="high",
description="Out-of-sample Sharpe ratio significantly lower than in-sample. "
"Possible lookahead bias in strategy optimization.",
affected_columns=['equity'],
timestamp=datetime.now(),
recommendation="Implement walk-forward optimization. "
"Re-estimate parameters on rolling training set."
))
return reports
def run_all_checks(self, data: Dict) -> List[BiasReport]:
"""
Run all bias detection checks.
Args:
data: Dictionary containing:
- 'df': main DataFrame
- 'index_df': index composition
- 'portfolio_df': portfolio results
- 'backtest_results': backtest performance
Returns:
List of all bias reports
"""
all_reports = []
df = data.get('df')
index_df = data.get('index_df')
portfolio_df = data.get('portfolio_df')
backtest_results = data.get('backtest_results')
        if df is not None:
            all_reports.extend(self.detect_calculation_bias(df))
            # Comparing against an empty frame would flag every column,
            # so the data-bias check only runs when index data is supplied
            if index_df is not None:
                all_reports.extend(self.detect_data_bias(df, index_df))
        if index_df is not None:
            all_reports.extend(self.detect_index_bias(portfolio_df if portfolio_df is not None else pd.DataFrame(), index_df))
if backtest_results is not None:
all_reports.extend(self.detect_simulation_bias(backtest_results))
self.reports.extend(all_reports)
return all_reports
def get_summary(self) -> Dict:
"""Get summary of all detected biases."""
severity_counts = {"critical": 0, "high": 0, "medium": 0, "low": 0}
bias_type_counts = {}
for report in self.reports:
severity_counts[report.severity] = severity_counts.get(report.severity, 0) + 1
bias_type_counts[report.bias_type.value] = bias_type_counts.get(report.bias_type.value, 0) + 1
return {
"total_biases": len(self.reports),
"by_severity": severity_counts,
"by_type": bias_type_counts
}
# Example usage
if __name__ == "__main__":
# Create sample data with potential biases
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=100, freq="D")
# Normal data
prices = pd.DataFrame({
'close': 100 + np.random.randn(100).cumsum(),
'volume': np.random.randint(1000000, 5000000, 100)
}, index=dates)
# Create biased data (centered mean - lookahead bias)
prices['biased_mean'] = prices['close'].rolling(window=20, center=True).mean()
# Index data
index_data = pd.DataFrame({
'component': ['AAPL', 'GOOGL', 'MSFT', 'AMZN'],
'weight': [0.3, 0.25, 0.25, 0.2],
'date': dates[-1]
})
# Backtest results
backtest = pd.DataFrame({
'equity': 100 * (1 + np.random.randn(100).cumsum() * 0.001).clip(lower=0),
'date': dates
})
# Run detection
detector = LookaheadBiasDetector()
reports = detector.run_all_checks({
'df': prices,
'index_df': index_data,
'backtest_results': backtest
})
print("Bias Detection Results:")
for report in reports:
print(f"\n{report.bias_type.value} ({report.severity}):")
print(f" Description: {report.description}")
print(f" Recommendation: {report.recommendation}")
print(f"\nSummary: {detector.get_summary()}")
```
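
Value-based heuristics such as the ones above cannot prove that a column was built with `center=True`, `shift(-1)`, or `bfill`. A complementary approach, sketched here under the assumption that the feature can be recomputed from raw prices, is a prefix-invariance check: recompute the feature on truncated history and confirm that past values do not change once later bars are added. The function name `uses_future_data` is hypothetical.

```python
import numpy as np
import pandas as pd


def uses_future_data(feature_fn, prices: pd.Series) -> bool:
    """Return True if any past feature value changes when later bars are added."""
    full = feature_fn(prices)
    for cut in range(len(prices) // 2, len(prices)):
        # Feature values as they would have been computed with `cut` bars of history
        prefix = feature_fn(prices.iloc[:cut])
        same = np.allclose(
            prefix.to_numpy(dtype=float),
            full.iloc[:cut].to_numpy(dtype=float),
            equal_nan=True,
        )
        if not same:
            return True
    return False


# Synthetic random-walk prices, for illustration only
rng = np.random.default_rng(1)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
prices = pd.Series(100 + rng.standard_normal(120).cumsum(), index=dates)


def causal(s):
    return s.rolling(20).mean().shift(1)


def centered(s):
    return s.rolling(20, center=True).mean()


def peeking(s):
    return s.shift(-1)


for name, fn in [("causal", causal), ("centered", centered), ("peeking", peeking)]:
    print(name, uses_future_data(fn, prices))
```

The check is quadratic in the number of bars in the worst case, so on long histories it may be preferable to sample a handful of cut points rather than test every one.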
### Lookahead Bias Detection Framework
```python
# Comprehensive lookahead bias detection framework
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Callable, Tuple
from enum import Enum
import numpy as np
import pandas as pd
from datetime import datetime
from abc import ABC, abstractmethod
class Severity(Enum):
"""Detection severity levels."""
CRITICAL = "critical"
HIGH = "high"
MEDIUM = "medium"
LOW = "low"
@dataclass
class DetectionResult:
"""Result of a single detection check."""
test_name: str
passed: bool
severity: Severity
message: str
details: Dict = field(default_factory=dict)
timestamp: datetime = field(default_factory=datetime.now)
def to_dict(self) -> Dict:
"""Convert to dictionary."""
return {
"test_name": self.test_name,
"passed": self.passed,
"severity": self.severity.value,
"message": self.message,
"details": self.details,
"timestamp": self.timestamp.isoformat()
}
@dataclass
class DetectionSession:
"""Complete detection session results."""
start_time: datetime
end_time: Optional[datetime] = None
tests_passed: int = 0
tests_failed: int = 0
results: List[DetectionResult] = field(default_factory=list)
def to_dict(self) -> Dict:
"""Convert to dictionary."""
return {
"start_time": self.start_time.isoformat(),
"end_time": self.end_time.isoformat() if self.end_time else None,
"tests_passed": self.tests_passed,
"tests_failed": self.tests_failed,
"success_rate": self.tests_passed / (self.tests_passed + self.tests_failed) if (self.tests_passed + self.tests_failed) > 0 else 0,
"results": [r.to_dict() for r in self.results]
}
class BiasDetectionTest(ABC):
"""Abstract base class for bias detection tests."""
@abstractmethod
def run(self, data: pd.DataFrame) -> DetectionResult:
"""Run the detection test."""
pass
@property
@abstractmethod
def test_name(self) -> str:
"""Name of the test."""
pass
@property
@abstractmethod
def severity(self) -> Severity:
"""Severity level for this test."""
pass
class TimeAlignmentTest(BiasDetectionTest):
"""Test that data is properly aligned by time."""
@property
def test_name(self) -> str:
return "time_alignment"
@property
def severity(self) -> Severity:
return Severity.HIGH
def run(self, data: pd.DataFrame) -> DetectionResult:
"""Verify data is sorted by time and has no gaps."""
if not isinstance(data.index, pd.DatetimeIndex):
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="Data index is not datetime. Skipping time alignment check.",
details={"index_type": type(data.index).__name__}
)
# Check if sorted
is_sorted = data.index.is_monotonic_increasing
if not is_sorted:
return DetectionResult(
test_name=self.test_name,
passed=False,
severity=self.severity,
message="Data is not sorted by time index.",
details={"sample_dates": [str(d) for d in data.index[:5].tolist()]}
)
# Check for gaps
if len(data) > 1:
time_diffs = data.index.to_series().diff().dropna()
# Find unusual gaps (more than 2x expected frequency)
expected_freq = time_diffs.median()
large_gaps = (time_diffs > 2 * expected_freq).sum()
if large_gaps > 0:
return DetectionResult(
test_name=self.test_name,
passed=False,
severity=Severity.MEDIUM,
message=f"Found {large_gaps} time gaps larger than expected.",
details={"expected_frequency": str(expected_freq), "large_gaps": int(large_gaps)}
)
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="Time alignment check passed.",
details={"data_points": len(data), "date_range": f"{data.index[0]} to {data.index[-1]}"}
)
class LagVerificationTest(BiasDetectionTest):
"""Test that lagged operations use proper shift."""
@property
def test_name(self) -> str:
return "lag_verification"
@property
def severity(self) -> Severity:
return Severity.MEDIUM
def run(self, data: pd.DataFrame) -> DetectionResult:
"""Verify column values are properly lagged."""
results = {}
# Check each numeric column
for col in data.select_dtypes(include=[np.number]).columns:
series = data[col]
# Check if column has negative shift pattern (lookahead)
if len(series) > 1:
# Simple heuristic: if values are strongly correlated with future values
# that's suspicious for indicators
autocorr_1 = series.autocorr(lag=1)
autocorr_2 = series.autocorr(lag=2)
# If lag-2 is stronger than lag-1, may indicate lookahead
if abs(autocorr_2) > abs(autocorr_1) * 1.5 and abs(autocorr_2) > 0.5:
results[col] = {
"autocorr_1": float(autocorr_1),
"autocorr_2": float(autocorr_2),
"concern": "Higher than expected lag-2 autocorrelation"
}
if results:
return DetectionResult(
test_name=self.test_name,
passed=False,
severity=Severity.HIGH,
message="Potential lookahead bias detected in lagged operations.",
details=results
)
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="Lag verification passed.",
details={"columns_analyzed": len(data.select_dtypes(include=[np.number]).columns)}
)
class FutureDataLeakTest(BiasDetectionTest):
"""Test for data that appears to use future information."""
@property
def test_name(self) -> str:
return "future_data_leak"
@property
def severity(self) -> Severity:
return Severity.CRITICAL
    def run(self, data: pd.DataFrame) -> DetectionResult:
        """Check for potential future data leakage."""
        suspicious_patterns: Dict[str, Dict] = {}
        for col in data.columns:
            series = data[col]
            if len(series) < 2:
                continue
            findings: Dict = {}
            # Leading NaNs are the normal warm-up of a causal rolling window,
            # so they are not flagged. Trailing NaNs are the suspicious case:
            # shift(-k) and rolling(center=True) leave the last rows empty
            # because the future values they need do not exist.
            valid_positions = np.flatnonzero(series.notna().to_numpy())
            if len(valid_positions) > 0:
                trailing_nan = len(series) - 1 - int(valid_positions[-1])
                if trailing_nan > 0:
                    findings["nan_at_end"] = trailing_nan
                    findings["total_length"] = len(series)
            # Long runs of repeated values suggest filled data. Values alone
            # cannot distinguish ffill (causal) from bfill (lookahead), so
            # the column is flagged for manual review.
            same_as_next = int((series == series.shift(-1)).sum())
            if same_as_next / len(series) > 0.8:
                findings["same_as_next"] = same_as_next
                findings["ratio"] = float(same_as_next / len(series))
                findings["pattern"] = "Many values equal to next (filled data; confirm ffill, not bfill)"
            if findings:
                suspicious_patterns[col] = findings
        if suspicious_patterns:
            return DetectionResult(
                test_name=self.test_name,
                passed=False,
                severity=Severity.CRITICAL,
                message="Potential future data leakage detected.",
                details=suspicious_patterns
            )
        return DetectionResult(
            test_name=self.test_name,
            passed=True,
            severity=self.severity,
            message="No future data leakage detected.",
            details={"columns_checked": len(data.columns)}
        )
class WalkForwardConsistencyTest(BiasDetectionTest):
"""Test consistency between training and testing periods."""
@property
def test_name(self) -> str:
return "walk_forward_consistency"
@property
def severity(self) -> Severity:
return Severity.MEDIUM
def run(self, data: pd.DataFrame) -> DetectionResult:
"""Check walk-forward consistency for equity or performance data."""
if 'equity' not in data.columns and 'pnl' not in data.columns:
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="No equity or PnL column found. Skipping walk-forward check.",
details={"available_columns": list(data.columns)}
)
metric_col = 'equity' if 'equity' in data.columns else 'pnl'
series = data[metric_col].dropna()
if len(series) < 20:
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="Insufficient data for walk-forward analysis.",
details={"data_points": len(series)}
)
# Split into training and testing periods
mid_point = len(series) // 2
in_sample = series.iloc[:mid_point]
out_of_sample = series.iloc[mid_point:]
# Calculate statistics
is_return = (in_sample.iloc[-1] - in_sample.iloc[0]) / in_sample.iloc[0] if in_sample.iloc[0] != 0 else 0
oos_return = (out_of_sample.iloc[-1] - out_of_sample.iloc[0]) / out_of_sample.iloc[0] if out_of_sample.iloc[0] != 0 else 0
# Calculate volatility
is_returns = in_sample.pct_change().dropna()
oos_returns = out_of_sample.pct_change().dropna()
is_vol = is_returns.std() * np.sqrt(252) if len(is_returns) > 1 else 0
oos_vol = oos_returns.std() * np.sqrt(252) if len(oos_returns) > 1 else 0
# Check for significant performance degradation
if abs(oos_return) < abs(is_return) * 0.5 and abs(is_return) > 0.1:
return DetectionResult(
test_name=self.test_name,
passed=False,
severity=Severity.MEDIUM,
message="Significant performance degradation in out-of-sample period.",
details={
"in_sample_return": float(is_return),
"out_of_sample_return": float(oos_return),
"in_sample_volatility": float(is_vol),
"out_of_sample_volatility": float(oos_vol)
}
)
return DetectionResult(
test_name=self.test_name,
passed=True,
severity=self.severity,
message="Walk-forward consistency check passed.",
details={
"in_sample_return": float(is_return),
"out_of_sample_return": float(oos_return),
"performance_ratio": float(abs(oos_return) / abs(is_return)) if abs(is_return) > 0 else 0
}
)
class LookaheadBiasDetector:
"""
Comprehensive lookahead bias detection framework.
Runs multiple tests to identify various types of lookahead bias.
"""
def __init__(self, tests: Optional[List[BiasDetectionTest]] = None):
"""Initialize detector with test suite."""
default_tests = [
TimeAlignmentTest(),
LagVerificationTest(),
FutureDataLeakTest(),
WalkForwardConsistencyTest()
]
self.tests = tests if tests is not None else default_tests
self.session: Optional[DetectionSession] = None
def run_session(self, data: pd.DataFrame) -> DetectionSession:
"""
Run complete detection session on data.
Args:
data: DataFrame to analyze
Returns:
DetectionSession with all results
"""
self.session = DetectionSession(start_time=datetime.now())
for test in self.tests:
try:
result = test.run(data)
self.session.results.append(result)
if result.passed:
self.session.tests_passed += 1
else:
self.session.tests_failed += 1
except Exception as e:
# Test failed to run
self.session.results.append(DetectionResult(
test_name=test.test_name,
passed=False,
severity=Severity.CRITICAL,
message=f"Test execution failed: {str(e)}",
details={"error": str(e)}
))
self.session.tests_failed += 1
self.session.end_time = datetime.now()
return self.session
def get_summary(self, session: Optional[DetectionSession] = None) -> Dict:
"""Get summary of detection session."""
session = session or self.session
if session is None:
return {"error": "No session available"}
return session.to_dict()
def get_failures(self, session: Optional[DetectionSession] = None) -> List[Dict]:
"""Get list of failed tests."""
session = session or self.session
if session is None:
return []
return [r.to_dict() for r in session.results if not r.passed]
# Example usage
if __name__ == "__main__":
# Generate sample data
np.random.seed(42)
dates = pd.date_range(start="2024-01-01", periods=100, freq="D")
# Create normal price data
prices = pd.DataFrame({
'close': 100 + np.random.randn(100).cumsum(),
'volume': np.random.randint(1000000, 5000000, 100)
}, index=dates)
# Add proper lagged indicators
prices['sma_20'] = prices['close'].rolling(window=20).mean().shift(1)
prices['ema_10'] = prices['close'].ewm(span=10, adjust=False).mean().shift(1)
# Create equity curve
equity = pd.DataFrame({
'equity': 100 * (1 + np.random.randn(100) * 0.01).cumprod().clip(lower=0),
'date': dates
})
equity.set_index('date', inplace=True)
# Run detection
detector = LookaheadBiasDetector()
print("=" * 60)
print("Lookahead Bias Detection Session")
print("=" * 60)
session = detector.run_session(prices)
summary = detector.get_summary(session)
print(f"\nSummary:")
print(f" Tests Passed: {summary['tests_passed']}")
print(f" Tests Failed: {summary['tests_failed']}")
print(f" Success Rate: {summary['success_rate']:.1%}")
failures = detector.get_failures(session)
if failures:
print(f"\nFailed Tests ({len(failures)}):")
for failure in failures:
print(f"\n [{failure['severity'].upper()}] {failure['test_name']}")
print(f" {failure['message']}")
else:
print("\n✓ All tests passed!")
print("\n" + "=" * 60)
print("Individual Test Results:")
print("=" * 60)
for result in session.results:
status = "✓ PASS" if result.passed else "✗ FAIL"
print(f"\n{status}: {result.test_name}")
print(f" Severity: {result.severity.value}")
print(f" Message: {result.message}")
if result.details:
print(f" Details: {result.details}")
```
## Common Mistakes to Avoid
### ❌ Mistake 1: Using Future Data in Entry Signals (Lookahead Bias)
```python
import numpy as np

# Illustrative inputs (assumed for this example)
prices = 100 + np.cumsum(np.random.default_rng(0).standard_normal(250))
entry_threshold = 105.0
ENTRY_SIGNAL, NO_SIGNAL = 1, 0

# BAD: Entry signal uses future price data
def calculate_entry_signal(today_index):
    """BUG: Uses future data to make TODAY's decision"""
    # Looking FORWARD 10 bars to decide entry
    future_prices = prices[today_index:today_index + 10]
    future_high = max(future_prices)
    # If future price goes up, we say "yes, enter today"
    # But we can't know the future on the actual day!
    if future_high > entry_threshold:
        return ENTRY_SIGNAL
    return NO_SIGNAL

# GOOD: Entry signal only uses historical data
def calculate_entry_signal(today_index):
    """Use only data up to and including TODAY"""
    # Only use historical data (up to today_index)
    historical_prices = prices[:today_index + 1]
    historical_ma = historical_prices[-20:].mean()  # 20-bar moving average
    # Decision based only on data available TODAY
    if prices[today_index] > historical_ma:
        return ENTRY_SIGNAL  # Valid signal
    return NO_SIGNAL
```
**Why BAD is wrong:** You're peeking into the future. Real trading can't do this.
**Why GOOD works:** Only uses data available at decision time.
### ❌ Mistake 2: Peeking at Price Extremes in Stop Loss Testing
```python
# BAD: Stop loss assumes a perfect fill at the stop level
def simulate_bar_trade(entry_price, stop_loss, bar):
    """BUG: Assumes the stop always fills at exactly the stop price"""
    # bar.low is only known once the bar has closed
    if bar.low < stop_loss:
        fill_price = stop_loss  # Perfect fill, even if the bar gapped through the stop?!
        return EXIT_AT_PRICE(fill_price)
    return NO_EXIT

# GOOD: Realistic stop loss execution assumptions
def simulate_bar_trade(entry_price, stop_loss, bar):
    """Simulate realistic stop loss execution"""
    # If price goes BELOW stop during bar
    if bar.low < stop_loss:
        if bar.open < stop_loss:
            # Bar gapped below the stop: the order fills at the open
            realistic_fill = bar.open
        else:
            # In reality, you get the first price below your stop,
            # not the exact stop level (due to slippage)
            realistic_fill = max(
                stop_loss * 0.98,  # Assume 2% slippage
                bar.low            # But never below the bar's traded range
            )
        return EXIT_AT_PRICE(realistic_fill)
    return NO_EXIT
```
**Why BAD is wrong:** It assumes every stop fills at exactly the stop price. Real stops slip, and a bar that gaps below the stop fills at the open, so the backtest is too optimistic.
**Why GOOD works:** It fills gapped bars at the open and applies slippage otherwise, without ever filling below the bar's low.
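The fill logic for a long position's stop can be written as a small self-contained function. This is a sketch under stated assumptions: the `Bar` dataclass and `stop_fill_price` are hypothetical names, the 2% default slippage mirrors the assumption in the example above, and a stop is treated as triggered when the bar trades at or below the stop level.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bar:
    open: float
    high: float
    low: float
    close: float


def stop_fill_price(bar: Bar, stop_loss: float, slippage: float = 0.02) -> Optional[float]:
    """Fill price for a long position's sell stop, or None if the stop is not hit."""
    if bar.low > stop_loss:
        return None  # price never traded down to the stop
    if bar.open <= stop_loss:
        return bar.open  # gapped through the stop: fill at the open
    # Triggered intrabar: fill below the stop by the slippage fraction,
    # but never below the lowest price that actually traded.
    return max(stop_loss * (1.0 - slippage), bar.low)


not_hit = stop_fill_price(Bar(100.0, 101.0, 99.5, 100.5), stop_loss=98.0)
gapped = stop_fill_price(Bar(97.0, 98.0, 96.0, 97.5), stop_loss=98.0)
slipped = stop_fill_price(Bar(100.0, 101.0, 95.0, 99.0), stop_loss=98.0)
clamped = stop_fill_price(Bar(100.0, 101.0, 97.0, 99.0), stop_loss=98.0)
print(not_hit, gapped, slipped, clamped)
```

Clamping to `bar.low` keeps the simulated fill inside the range that actually traded, so the slippage assumption can never invent a price the market did not print.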
### ❌ Mistake 3: Optimization on Same Data You Test On
```python
# BAD: Optimize and test on identical dataset
def backtest_with_optimization(all_data):
    """BUG: Overfitting - parameters optimized on test set"""
    best_sharpe = float("-inf")
    best_params = None
    # Search entire parameter space on ALL data
    for sma_length in range(10, 200):
        for rsi_period in range(5, 30):
            # Backtest on SAME data you're optimizing
            results = run_backtest(all_data, sma_length, rsi_period)
            if results.sharpe > best_sharpe:
                best_sharpe = results.sharpe
                best_params = (sma_length, rsi_period)
    return best_params  # These are overfit to historical data!

# GOOD: Walk-forward validation separates training and testing
def backtest_with_walk_forward_validation(all_data, train_size=252, test_size=63):
    """Train on one period, test on next - never mixing"""
    all_results = []
    # "+ 1" keeps the final window when the data ends exactly on a window boundary
    for i in range(0, len(all_data) - train_size - test_size + 1, test_size):
        # Split into training and testing periods
        train_data = all_data[i:i+train_size]
        test_data = all_data[i+train_size:i+train_size+test_size]
        # Step 1: Optimize parameters ONLY on training data
        # (optimize_parameters is assumed to return a dict of keyword arguments)
        best_params = optimize_parameters(train_data)
        # Step 2: Evaluate ONLY on unseen test data
        test_results = run_backtest(test_data, **best_params)
        all_results.append(test_results)
    return all_results  # Out-of-sample results only
```
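**Why BAD fails:** The parameters are chosen to maximize Sharpe on the same data used to report it, so the reported Sharpe is inflated and tends to degrade on new data.
**Why GOOD works:** Each test window is evaluated with parameters fit only on the data that precedes it.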
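The window arithmetic is where walk-forward code most often goes wrong, so it can help to isolate it in a generator and test it on its own. The sketch below assumes rolling (not expanding) training windows and a step equal to the test size, matching the example above; `walk_forward_windows` is a hypothetical helper name.

```python
from typing import Iterator, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int = 252, test_size: int = 63
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs; each test range follows its train range."""
    start = 0
    # "<=" keeps the last window when the data ends exactly on a window boundary.
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # step forward by one test window


windows = list(walk_forward_windows(n_bars=500))
for train, test in windows:
    print(f"train {train.start}-{train.stop - 1}, test {test.start}-{test.stop - 1}")
```

Because each step advances by `test_size`, the test ranges tile the out-of-sample period without overlap, and no bar is ever scored by parameters that were fit on it.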
---
## Constraints
### MUST DO
- Implement walk-forward validation: optimize on a training window, validate on a subsequent out-of-sample window
- Include realistic transaction costs (commissions, slippage, market impact) in all backtest calculations
- Use point-in-time or tick-level data when available; when only OHLCV bars exist, never let strategy logic assume a particular intra-bar price path
- Track and report key metrics: Sharpe ratio, max drawdown, win rate, profit factor, average trade duration, and Calmar ratio
- Implement survivorship-bias-free testing using a point-in-time universe list that includes delisted symbols
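Several of the metrics named in the list above can be sketched with the standard library alone. The assumptions here are mine, not the skill's: returns are simple per-period returns, the risk-free rate is zero, there are 252 periods per year, and all function names are illustrative.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], periods_per_year: int = 252) -> float:
    """Annualized Sharpe ratio with a zero risk-free rate (needs >= 2 returns)."""
    sd = stdev(returns)
    if sd == 0:
        return 0.0  # undefined for constant returns; reported as 0 by convention here
    return mean(returns) / sd * math.sqrt(periods_per_year)


def max_drawdown(returns: Sequence[float]) -> float:
    """Largest peak-to-trough loss of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def profit_factor(trade_pnls: Sequence[float]) -> float:
    """Gross profit divided by gross loss across closed trades."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return math.inf if losses == 0 else gains / losses


def win_rate(trade_pnls: Sequence[float]) -> float:
    """Fraction of closed trades with a positive P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


period_returns = [0.10, -0.50, 0.20]      # illustrative data
trades = [100.0, -50.0, 30.0, -20.0]      # illustrative data
print(max_drawdown(period_returns), profit_factor(trades), win_rate(trades))
```

Reporting these together matters because they disagree in informative ways: the sample trades win only half the time yet still have a profit factor above 1.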
### MUST NOT DO
- Do not optimize strategy parameters on the same data used for evaluation — always use out-of-sample or walk-forward testing
- Avoid assuming infinite liquidity in backtests; model order book constraints and partial fills for large positions
- Never let future information (survivorship bias, look-ahead) leak into backtest signals; index data so that each signal sees only observations at or before its own timestamp
- Do not report only win rate — always include risk-adjusted metrics alongside raw return statistics
- Avoid curve-fitting to historical data; cap the number of optimized parameters and validate with Monte Carlo permutation tests
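One simple form of the Monte Carlo permutation test mentioned in the last item shuffles the strategy's positions against the return series and asks how often a random alignment does at least as well as the real one. The sketch below is one variant among several, under assumptions I am supplying: positions and returns are already aligned per bar, the score is total P&L, and `permutation_p_value` is a hypothetical name. Shuffling also destroys any autocorrelation in the positions, which this sketch does not correct for.

```python
import random
from typing import Sequence


def permutation_p_value(
    positions: Sequence[float],
    returns: Sequence[float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Share of random position orderings that score at least as well as the real one."""
    rng = random.Random(seed)  # seeded so the result is reproducible

    def score(pos: Sequence[float]) -> float:
        return sum(p * r for p, r in zip(pos, returns))

    observed = score(positions)
    shuffled = list(positions)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if score(shuffled) >= observed:
            at_least_as_good += 1
    # The +1 terms count the observed ordering itself, so p is never exactly 0.
    return (at_least_as_good + 1) / (n_permutations + 1)


returns = [0.01, -0.01] * 10          # illustrative data
timed_positions = [1, 0] * 10         # long only on the up bars
always_long = [1] * 20                # no timing at all
p_timed = permutation_p_value(timed_positions, returns)
p_flat = permutation_p_value(always_long, returns)
print(p_timed, p_flat)
```

A small p-value says the timing is unlikely to be luck under this null. It says nothing about whether the positions were computed causally, so it complements the lookahead checks and does not replace them.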
## Live References
> Authoritative documentation links for this skill's domain. The model follows markdown links at load time to resolve external references and inline content.
- [Backtesting Pitfalls Guide](https://docs.quantconnect.com/tutorials/backtesting-pitfalls)
- [Survivorship Bias in Backtests](https://www.investopedia.com/terms/s/survivorship-bias.asp)
- [Data Snooping and P-Hacking](https://en.wikipedia.org/wiki/Data_dredging)
- [Proper Backtesting Data Handling](https://docs.quantconnect.com/tutorials/data-sources-and-format)
- [Avoiding Look-Ahead Bias in ML Models](https://machinelearningmastery.com/difference-between-a-test-set-and-validation-set/)