Find academic papers and read them — searches real scholarly databases for prior work on a topic, then resolves and deep-reads the open-access PDF of any paper you pick. Use this skill WHENEVER the user wants to discover literature or read a study: "find recent papers on X", "what does the research say about Y", "find sources for my thesis on Z", "search for prior work on ...", or "summarise / read / extract the findings from this paper / this arXiv id / this DOI / this PDF". Trigger even whe...
Scanned 6/13/2026
Install via CLI
openskills install academicatstool-netizen/Cat_paper_search---
name: paper-search
description: >-
Find academic papers and read them — searches real scholarly databases for
prior work on a topic, then resolves and deep-reads the open-access PDF of any
paper you pick. Use this skill WHENEVER the user wants to discover literature
or read a study: "find recent papers on X", "what does the research say about
Y", "find sources for my thesis on Z", "search for prior work on ...", or
"summarise / read / extract the findings from this paper / this arXiv id /
this DOI / this PDF". Trigger even when the tool isn't named. Results are real
papers from public APIs (OpenAlex, Crossref, arXiv, Semantic Scholar, Europe
PMC) and reading reports use only the actual extracted PDF text — nothing is
fabricated.
---
# Paper Search
Two chained capabilities: **find papers**, then **deep-read** any one of them.
Both are grounded in real data — search hits come from public scholarly APIs,
and reading reports are built only from the extracted PDF text.
```
search a topic ──▶ ranked real papers ──▶ deep-read the open-access ones
```
> **Running the scripts:** run them **by their full path from your current
> working directory — do NOT `cd` into the skill folder.** That way `--save`
> writes `search-results.md` into the user's workspace, where Claude can open it
> as a clickable preview. (Replace `scripts/…` in the examples below with the
> skill's real path, e.g. `~/.claude/skills/paper-search/scripts/search_papers.py`.)
> If `python` isn't found, use `python3` (needs Python 3.8+).
## Capability 1 — Search
**When:** the user wants to find papers / prior work / sources on a topic, or
asks what the research says about something.
1. **Default = run immediately. Only ask if the topic is missing.** Don't gate
the search behind a menu. If the message already contains a **topic** (e.g.
*"查论文 玩家共鸣机制 200"* or *"find papers on X"*), search **right away** —
take any params they included (count, years, sort, open-access) and sensible
defaults for the rest (20 results, all years, best match). **Do not show the
setup menu and do not wait.** After the results, add ONE optional line so they
can still refine: *"Showing 200, all years, best match — say e.g. 'since 2021,
open access only' to refine."*
Show the setup menu **only when you genuinely don't have a topic to search**
(or it's too vague to query). Then, and only then:
> 🔍 **Search setup** — tell me the topic (tweak the rest if you like):
> 1. **Topic** — `(what should I search?)`
> 2. **How many papers** — `20` *(e.g. 10 / 20 / 40)*
> 3. **Years** — `any` · 4. **Open-access only** — `no` · 5. **Sort** — `best match`
>
> Just give me a topic, or adjust any line.
Map choices to flags: count → `--limit`, years → `--from-year` / `--to-year`,
open-access → `--open-access-only`, sort → your re-rank preference (step 5).
2. For a focused query, search directly. For a broad/exploratory one, first
expand it into directions and search the best 2–4 query strings (see
`references/search.md`, Stages 1–3).
3. Run the retriever (5 keyless APIs in parallel — OpenAlex, Crossref, arXiv,
Semantic Scholar, Europe PMC — deduped and rule-scored):
```bash
python scripts/search_papers.py "your query" --limit 40 --compact --save
# --save : ALWAYS include this — writes the FULL list (every card, with
# abstracts + links) to search-results.md. This is the complete
# record; it can't be truncated by what you show in chat.
# --compact : numbered one-line-per-paper list — USE THIS when the count is
# large (~25+) so the whole list is scannable and fully shown
# --markdown : richer cards (title + abstract + links) — good for smaller
# sets or when the user wants detail
# --brief : plain table, for your own quick skim
# (omit all) : full JSON records with abstracts, for re-ranking / writing
# filters: --from-year 2021 --to-year 2026 --open-access-only
# sources: --sources openalex,crossref,arxiv,s2,europepmc
```
4. **Present the full numbered list, and hand over the saved file.** Always run
with **`--save`** — the script writes every result to `search-results.md`
(printed path on stderr). That file is the guaranteed-complete record, so even
if the chat view ends up partial, nothing is lost. In chat, show the script's
stdout **as-is** (use **`--compact`** for large counts → a clean 1…N list they
can scan, or **`--markdown`** cards for smaller/detailed sets) — the script,
not the model, produces the layout, so it's identical on any model. Then point
the user to the file **as a clickable Markdown link with the relative path** so
it opens in Claude's preview pane — write it exactly as
`📄 Full list (all N, with abstracts): [search-results.md](search-results.md)`.
(Use the relative path, not an absolute one — absolute paths render gray/
non-clickable. This works because you ran the script from the workspace, so the
file is right there.) A `--markdown` card looks like:
> ### 1. [Paper title](https://doi.org/…) ← title links to the original
> Authors · Year · *Venue* · cited by N · via Source · 🟢 Open Access
> > abstract snippet…
> [📄 Open paper](url) · [⬇ PDF](pdf_url) · [🔗 DOI](https://doi.org/…)
Every card carries **clickable links that jump straight to the original paper,
its open-access PDF, and its DOI** — keep them intact.
**Hard output rules (do not override these for "helpfulness"):**
- **Show ALL N cards.** Present the full list, from item 1 through the
`— end of N results —` line. If the user asked for 200 and the databases
returned 169, list all 169. Never stop early, never show only "highlights"
or "the most representative", never collapse the rest into "… and more". The
saved `search-results.md` is your backstop — it always holds the complete set.
- **One flat numbered list, 1…N.** Do NOT reorganise into themes, categories,
sections, or sub-numbered buckets. One continuous `1, 2, 3 … N` sequence —
that number is how the user refers back ("deep-read #3", "summarise #7, #12").
- **Don't shorten cards.** Keep each item's metadata + links. You may add at
most a one-line "why" under one.
- A long reply is expected when the count is large. You may add ONE line
*after* the full list offering to narrow/filter/deep-read — never *instead*.
5. **Re-rank by fit FIRST, then number 1…N.** This is required, not optional —
the script's `rule_score` is only a keyword prior, so its raw order floats
keyword-soup to the top (wrong domain, wrong sense of an ambiguous word — e.g.
"players" matching *football players* or game-theory *players* when the user
meant video-game players). Before presenting: judge each result's fit to the
user's ACTUAL question (see `references/search.md`), **lead with the genuinely
on-target papers, and push clearly off-target ones to the bottom — or tag them
`⚠ likely off-target`**. Then renumber the full set 1…N. Keep every paper
(the file has them all); only DROP papers if the user explicitly asked to
filter. For a very large list where full reordering is impractical, at least
lead with the strongest hits and flag the obvious off-targets. Papers with
a `pdf_url` are open-access and can be deep-read by number next.
The script auto-retries Semantic Scholar on rate-limit (HTTP 429), which clears
most skips. If `s2` still gets skipped a lot, set a free key —
`export S2_API_KEY=...` (from semanticscholar.org/product/api) — and it stops
429-ing. Any skipped source is reported on stderr and the rest carry the search.
Full pipeline in `references/search.md`.
## Capability 2 — Deep Read
**When:** the user wants to actually read, summarise, or extract findings from a
specific paper (by title, DOI, arXiv id, or URL) — including one they just found
in a search.
1. Resolve and extract the PDF (tries direct URL → arXiv-by-DOI → Unpaywall →
OpenAlex → Semantic Scholar):
```bash
python scripts/fetch_pdf.py --doi 10.1145/3313831.3376234 --text-only
python scripts/fetch_pdf.py --arxiv 1706.03762 --text-only
python scripts/fetch_pdf.py --pdf-url https://.../paper.pdf
python scripts/fetch_pdf.py --title "Attention is all you need"
# --text-only prints just the extracted text; drop it for full JSON
# (per-page array + resolved URL + which sources were tried)
# reads the first ~12 pages by default; for a long paper's results section
# raise it, e.g. --max-pages 20 --max-chars 60000 (--email for Unpaywall)
```
2. If `resolved_pdf_url` is null, the paper is paywalled with no free copy — say
so plainly and offer to try another identifier or work from the abstract.
**Never fabricate the contents.**
3. Produce the evidence-aware reading report using the prompt in
`references/deep_read.md`, from the extracted text **only**. If extraction was
`truncated`, disclose that the report covers just the analysed excerpt. You
can also extract typed claims to hand off to a writing task.
## Setup
```bash
python -m pip install -r scripts/requirements.txt # only pypdf, for deep read
```
Search needs nothing beyond the Python 3.8+ standard library; all sources are
keyless. If `python` isn't found, use `python3`. Set `UNPAYWALL_EMAIL=you@domain`
(or pass `--email`) to be polite to Unpaywall and slightly improve OA hit-rate.
## Honesty contract
Search results are real papers, not invented. Deep-read reports use only the
extracted PDF text; gaps are disclosed, not filled with guesses. When a PDF is
paywalled or an abstract is too thin to support a claim, say so rather than
fabricating — that candor is the point.
No comments yet. Be the first to comment!