Skill

Name: Skill
Author: ZhenyuanPAN822
Convert a two-person chat history into a relationship K-line chart. When invoked, walk the user through providing a chat file, run the local preprocessing pipeline, score every "turn" (consecutive same-sender block) with v3.1 semantic deltas, expand back to message-level scored.jsonl, and hand off to the local web frontend for visualization.
72 stars
0 votes
0 copies
0 views
Added 5/26/2026
data-aipythongobashapifrontend
Works with

cliapi
Install via CLI
$openskills install ZhenyuanPAN822/relationship-candlestick-lab
Files
SKILL.md
---
name: rcl-score
description: |
  Convert a two-person chat history into a relationship K-line chart.
  When invoked, walk the user through providing a chat file, run the local
  preprocessing pipeline, score every "turn" (consecutive same-sender block)
  with v3.1 semantic deltas, expand back to message-level scored.jsonl, and
  hand off to the local web frontend for visualization.
---

# Relationship Candlestick Lab · Scoring Skill (v3.1)

This skill has TWO operating modes. Detect which one you're in by looking
at the first user message:

- **Entry Mode** — user just typed `/rcl-score` (or asked you to "score
  my chat / 画 K 线") with no input file yet. Run the **Entry Protocol**
  below.
- **Batch Scoring Mode** — your user message starts with "Score each TURN
  below" or contains a `=== TURNS ===` block (this is how the API pipeline
  invokes you). Skip the Entry Protocol and jump directly to the
  **Scoring Rules** section, output JSONL only.

---

## ⓪ Entry Protocol（仅当用户直接调用 skill 时）

### 对用户说的话（仅这一段对外输出，Step 1–4 不要复述给用户）

回复用户（中文，简洁，5–8 行以内）：

> 我会把你的聊天记录画成 K 线图，每根 K 线代表一段时间的关系强度变化。
>
> **请准备一个聊天导出文件**（任选其一）：
> - 微信导出 CSV（推荐，pywxdump / Memotrace 都可以）
> - 或 JSON / 纯文本（每行 `YYYY-MM-DD HH:MM[:SS] sender: message`）
>
> **把文件的绝对路径粘贴给我。**
>
> ⚠️ **模型 / effort 推荐**：
> - **Claude**：`Sonnet 4.6` + effort `low`
> - **GPT 系列**：`GPT-5 / 5.4 / 5.5` + effort `low`
>
> ⏱ **耗时预估**：每 1000 条消息约 **7 分钟**（取决于模型 / effort）。
>
> 全程在本地处理，聊天数据不上传云端。

等用户回复文件路径再继续。

### Step 1–4（你内部按顺序执行，不需要把命令贴给用户）

**Step 1 — CSV → messages.jsonl**

`<job_name>` 用源文件名（去后缀）。

```bash
python scripts/wechat_to_standard.py \
    --input "<用户给的路径>" \
    --output "output/_jobs/<job_name>/messages_standard.csv" \
    --me me --them other
```

然后 Python 把标准 CSV 转 messages.jsonl（每行带 `i` 索引）。
如果用户给的是 JSON/TXT，改用 `python -m relationship_candlestick.cli prepare ...`。

**Step 2 — 预处理：剔单字 + 聚合 turns**

```bash
python scripts/preprocess_turns.py \
    --input  output/_jobs/<job_name>/messages.jsonl \
    --out-dir output/_jobs/<job_name>/ \
    --gap-min 10
```

**Step 3 — 评分：你（LLM）按下面的 Scoring Rules 给每个 turn 打分**

读取 `output/_jobs/<job_name>/turns.jsonl`，每次取 30–60 个 turn 一批
（effort low → 60 / medium → 40 / high → 20），按 **Scoring Rules** 输出
JSONL，append 到 `output/_jobs/<job_name>/turns_scored.jsonl`。

**🚫 这一步绝对不要做的事（重要）：**

- **不要给用户出"A / B / C"菜单让他选打分方式。** Skill 触发后就只有一条路：你（IDE 里的 LLM）从头到尾把 turns 全部打完。没有分叉。
- **不要把 `scripts/score_turns_api.py` 作为选项推荐给用户。** 那个脚本是给"不走 skill"的 API 模式用户的，与 skill 用户**无关**。即使 repo 里能看到它，也当它不存在。
- **不要因为 turn 数多就停下来问"要不要继续"。** 500、1000、2000 turns 都按 effort 自动分批往下打，不要中途请示。如果担心耗时，启动前已经在 Entry Protocol 里告知"每 1000 条 ~7 分钟"——用户已经知道。
- **不要因为"会吃当前会话上下文"就推荐用户换流程。** 这就是 skill 的设计——上下文够用，旧批次的 JSONL 输出可以丢，框架靠 `i` 重新拼。
- **不要在打分中途总结"已经打了 X 批，剩下 Y 批，要不要继续"。** 闷头打完再说话。

**正确姿势：** 算好总批数 → 一批一批打 → append 到 turns_scored.jsonl → 全部完成后才进入 Step 4。整个 Step 3 期间不向用户输出任何对话——只调工具、只产 JSONL。

**Step 4 — 反扩展回 i 级**

```bash
python scripts/expand_turns.py \
    --turns        output/_jobs/<job_name>/turns.jsonl \
    --turns-scored output/_jobs/<job_name>/turns_scored.jsonl \
    --auto         output/_jobs/<job_name>/auto_scored.jsonl \
    --messages     output/_jobs/<job_name>/messages.jsonl \
    --out          output/_jobs/<job_name>/scored.jsonl
```

### 完成后对用户说的话（仅这一段对外输出）

> ✅ 评分完成！scored.jsonl 路径：
>
> `<绝对路径，例如 E:/.../output/_jobs/myjob/scored.jsonl>`
>
> **下一步**：打开 K 线网页（如果服务还没起，请运行 `python -m relationship_candlestick.cli serve`）
>
> 1. 浏览器访问 http://127.0.0.1:7000
> 2. 选 **"已有 scored.jsonl"** 流程
> 3. 把上面路径粘贴到 **scored.jsonl 绝对路径** 输入框
> 4. 点 **"▶ 加载并显示 K 线"**

完成 Entry Protocol 后退出，**不要继续按规则给消息打分** — 评分由后续的
Batch Scoring Mode 负责。

> **关于 repo 里的 `scripts/score_turns_api.py`（给执行 skill 的 LLM 看）：**
> 那个脚本属于**另一条独立流程**——是给那些**不调用 skill**、直接通过前端
> ingest CSV、由 server 自动并发外部 API 的用户用的。**他们根本不会触发
> 这个 skill 文件。** 反过来：**用户既然触发了 skill，就意味着他选择了"由 IDE
> 里的 LLM（你）亲自打分"这条路。** 不要把那个脚本拿出来给用户当备选项，
> 也不要因为 turns 多就建议用户去跑那个脚本——那等于你拒绝执行被分配的任务。

---

# Scoring Rules · v3.1（适用于 Batch Scoring Mode）

You are the **semantic scorer** of a relationship-K-line system. Your job is
to read messages **in order, in context** and emit two **relative deltas** per
message — never absolute scores. The framework does all arithmetic, recurrence,
and time decay.

The whole point of using Claude here is **contextual judgment**. Sarcasm,
callbacks, awkward silences, and inside jokes are exactly what you must read.

---

## 🚨 Most important principle: every message moves the needle

**No two consecutive messages are exactly the same temperature.** Even when
the topic and mood feel "identical", real conversations have constant
micro-variation:

- A reply is slightly warmer or cooler than the message it answers
- A continuation message is slightly weaker than the original (loss of momentum)
- An emoji-only reply is slightly lighter than a text reply
- A "嗯" after substance is a small cooling
- A "哈哈" after partner's joke is a small acknowledgment lift

**Default to small nonzero deltas (±0.2 ~ ±0.5), not 0.**

`0, 0` is a strong claim that means "this message contributes literally nothing —
identical temperature to prior AND to atmosphere". This should be **rare**,
reserved for cases like:
- A message inside an opaque sub-thread (file path, link, phone number)
- A literal repeat ("嗯" "嗯" "嗯" — even then the third is -0.2, not 0)

If you find yourself outputting `0, 0` for more than ~15% of messages,
**you are under-scoring**. Real chats have constant ebb and flow.

---

## Core principle: relative, not absolute

You **do not** score "this message has affection 5". There's no objective
anchor for that.

You **do** score "this message is +1 warmer than the prior message" and
"this message is +0.5 vs the recent atmosphere". Both are relative
comparisons you can actually make confidently.

Two reference frames:

- `delta_vs_prior` — change vs the **immediately previous** message
- `delta_vs_atmosphere` — change vs the **mean of recent messages**

The framework blends both: `delta_blend = 0.5 * vs_prior + 0.5 * vs_atmosphere`.

---

## Input

Per API call you receive:

```json
{
  "previous_relationship_index": 67.4,
  "atmosphere": {
    "recent_avg_index":  65.0,
    "recent_avg_delta":  0.3,
    "window_size":       20
  },
  "context_already_scored": [
    {"i":..., "ts":..., "sender":..., "text":...,
     "delta_vs_prior":..., "delta_vs_atmosphere":..., "primary_dim":..., "idx":...},
    ...
  ],
  "new_messages_to_score": [ {"i":..., "ts":..., "sender":..., "text":...}, ... ]
}
```

Time gaps matter: the framework decays the index based on real time between
messages, so don't manually penalize "long silence" — score the **content** only.

---

## Output (one line per input message, same order)

```json
{
  "i": 42,
  "delta_vs_prior":      +1.5,
  "delta_vs_atmosphere": +0.8,
  "primary_dim": "affection",
  "tags": ["intimacy","care"],
  "rationale": "比上条暖一点；vs 整体氛围也是小升"
}
```

`primary_dim` and `tags` are for display/explanation — they do NOT enter the math.

---

## Delta scale (with explicit micro-fluctuation guidance)

| Magnitude | Meaning | Frequency in real chat |
|---|---|---|
| `±0.2 ~ ±0.5` | **MICRO-FLUCTUATION** — natural ebb/flow | **MOST messages live here** |
| `±0.5 ~ ±1.5` | Subtle but clear shift | ~25% |
| `±2 ~ ±4` | Clearly noticeable change | ~10% |
| `±5 ~ ±8` | Big move (probe / invitation / repair / conflict) | ~3% |
| `±9 ~ ±15` | Rare landmark events (告白 / 分手) | < 1% |
| `0, 0` | Literally no change — use sparingly | **< 15%** |

**Distribution check**: In a healthy 100-message scoring batch, you should
have roughly:
- ~15 messages with `0, 0`
- ~60 messages with `±0.2 ~ ±0.5` (micro-fluctuations)
- ~20 messages with `±0.5 ~ ±2`
- ~5 messages with `±2+`

If your output is mostly 0,0, you're flattening reality.

---

## Continuation heuristics (give you concrete starting points)

For messages that don't represent a clear "event", use these defaults
then adjust based on context:

| Pattern | vs_prior default | vs_atmosphere default |
|---|---|---|
| Filler "嗯/哦/好/okok/豪德/对/yes" after substance | **-0.3 ~ -0.5** | depends on atmosphere |
| Emoji-only reply after text reply | **-0.3 ~ -0.4** | depends |
| "哈哈" / "笑死" after partner's joke | **+0.3 ~ +0.5** | usually +0.2 |
| "哈哈" after self joke | 0 ~ +0.2 | 0 |
| Continuation in same topic, similar tone | **±0.2 ~ ±0.4** | ±0.2 |
| Topic switch (no emotional content) | **-0.5 ~ -1** | depends |
| Logistical question | **-0.2 ~ -0.5** vs prior emotional msg | depends |
| First reply to a flirt | depends on warmth: -1 (cold) to +2 (matched) | bigger if breaks atmosphere |
| Pure timestamp/link/file | -0.3 (drains energy) | -0.2 |

These are STARTING points. Read context, then adjust.

### vs_atmosphere quick guide

| This message | atmosphere has been... | vs_atmosphere |
|---|---|---|
| Logistic | hot/intimate (idx ~80+) | **-0.5 ~ -1.5** (below baseline) |
| Logistic | cool (idx ~50) | 0 ~ -0.3 |
| Flirty | hot baseline | 0 ~ +0.5 (just maintains) |
| Flirty | cool baseline | **+2 ~ +5** (breaks up) |
| Cold reply | hot baseline | **-2 ~ -5** (breaks down) |
| Cold reply | already cool | -0.3 ~ -1 |

---

## The 10 dimensions (semantic dictionary, only used for `primary_dim` tag)

### Core 7

**1. `affection`** — 情感温度
- 推涨：宝宝/想你/喜欢你/共笑哈哈哈/暖意问候/晚安宝贝
- 推跌：单字"哦/嗯/随便"/呵呵/无所谓/已读不回式短拒

**2. `engagement`** — 互动投入
- 推涨：长回复/主动延展/秒回/连发深入分享
- 推跌：敷衍短回/转移话题/拖延回/把话终结

**3. `care`** — 关心
- 推涨：关心身体/记得对方细节/迁就对方处境
- 推跌：漠视处境/嘲讽对方状态/求助被无视

**4. `conflict`** — 冲突轴（双向）
- 推涨（修复）：道歉/示弱/让步/解释/承认错误
- 推跌（攻击）：阴阳/翻旧账/最后通牒/冷战/人身攻击

**5. `tension`** — 暧昧张力
- 推涨：试探/吃醋/打闹/暧昧推进
- 推跌：明确划界/拒推进/把对方放友区

**6. `investment`** — 关系投入
- 推涨：主动约/锁定时间地点/送礼/见面/帮跑腿
- 推跌：拒邀约/取消计划/找借口

**7. `awkwardness`** — 默契度（反向：正分=默契）
- 推涨：接梗/共笑/流畅来回
- 推跌：尬聊/答非所问/突兀转场

### 新增 3 个

**8. `future_orientation`** — 未来导向
- 推涨：「等寒假回来」/「考完整起来」/「以后」/共同长期计划
- 推跌：明确切短「先到这吧」/拒绝谈未来

**9. `vulnerability`** — 暴露脆弱
- 推涨：分享秘密/感情史/弱点/家事/收入/恐惧
- 推跌：回避深入/筑墙/拒绝被了解

**10. `shared_identity`** — 内圈语言
- 推涨：用昵称/用内梗/「我们」/复用旧梗（首次或刷新时）
- 推跌：刻意外圈称呼/「同学请问你是哪位」式陌生化
- 注意：长期高频后会基线化——第 50 次「宝宝」是 0~+0.2，不是新增

---

## Worked examples (showing micro-fluctuations)

### A. 密集 flirt 段（atmosphere ~85）
| 上一条 | 这一条 | vs_prior | vs_atmo | 解释 |
|---|---|---|---|---|
| "想你了宝宝" (+3) | "我也想你" | +0.5 | 0 | 接住 +0.5（互文加分）|
| "我也想你" | "嘿嘿" | -0.4 | -0.2 | emoji 化轻 |
| "嘿嘿" | "干嘛呢" | -0.5 | -1 | 转日常，弱于氛围 |
| "干嘛呢" | "刚下课" | -0.3 | -1 | 平淡延续 |
| "刚下课" | "好巧 我也下课了" | +0.3 | -0.5 | 接住但比 flirt 弱 |
| "好巧 我也下课了" | "[旺柴]" | -0.4 | -0.7 | emoji 弱化 |

→ 这段共 6 条消息全部 nonzero，index 自然有上下波动 → 出现影线

### B. 冷战段（atmosphere ~40）
| 上一条 | 这一条 | vs_prior | vs_atmo | 解释 |
|---|---|---|---|---|
| "..." (-1) | "嗯" | +0.2 | -0.3 | 终于回了，轻微回暖 |
| "嗯" | "在吗" | +0.5 | +0.5 | 主动开话 |
| "在吗" | "嗯" | -0.5 | -0.3 | 又冷下去 |
| "嗯" | "对不起 是我不好" | +5 | +6 | 大修复 |
| "对不起 是我不好" | "..." | -2 | -1 | 没接住，短拒 |
| "..." | "你怎么这样" | -1 | -1 | 持续冷 |

→ 即使整段冷，仍有 micro-fluctuation

### C. 单条事件
| 场景 | vs_prior | vs_atmo |
|---|---|---|
| atmosphere 80，突然 "我们就是朋友" | -8 | -10 |
| atmosphere 30，突然 "你会不会喜欢我这种人" | +5 | +8 |
| atmosphere 50，普通"刚下课" | -0.3 | 0 |
| 12/26 后 4 天没聊，重启首条 "在干嘛" | +0.5 | -1 (氛围已凉) |

---

## Hard rules

1. **Stay in input order. Output `i` exactly as given.**
2. **Both deltas are signed real numbers (1 decimal).** Most are NONZERO small (±0.2~0.5).
3. **`0, 0` is rare** — < 15% of messages. If you exceed that, you're flattening.
4. **Don't compute index, OHLC, shadows, volume.** Framework's job.
5. **Don't predict outcomes** ("she likes you", "this will work").
6. **Don't echo timestamp/sender/message** — framework re-joins by `i`.
7. **One JSON object per line, no markdown fences.**
8. **No upper/lower bound on deltas** — index can break 100 or go below 0.
9. **Lean toward small nonzero (±0.3) when uncertain** — NOT toward 0.
10. **Self-check distribution** — re-scan your output before sending; if mostly 0,0, redo.

---

## What the framework does after you

```
gap_hours   = real time between this msg and prior msg
decay       = 1 - exp(-gap_hours / 72)              # 72h half-life-ish
delta_blend = 0.5 * delta_vs_prior + 0.5 * delta_vs_atmosphere

index_t = prev_index * (1 - decay) + 50 * decay + delta_blend
        # NO CLAMP. Can go above 100 or below 0.
```

Time decay automatically pulls long-silent index back toward 50 baseline
(no need for you to model that). Your only job is the two relative deltas
plus dim/tags/rationale for display.

The candlestick chart's shadows and bodies form ONLY when there's within-period
variance in your scores. **If you give all 0,0, the chart will be a flat line.
That's why micro-fluctuations matter.**
Skill

Works with

Attribution

Comments (0)