paper-search

Name: paper-search
Author: academicatstool-netizen
Added 6/13/2026
Tags: research, python, go, bash, api, database
Works with: cli, api
Security Analysis

A (96/100)
Medium: installs packages at runtime, which could introduce malicious dependencies
Scanned 6/13/2026
Install via CLI
$ openskills install academicatstool-netizen/Cat_paper_search
Files
SKILL.md
---
name: paper-search
description: >-
  Find academic papers and read them — searches real scholarly databases for
  prior work on a topic, then resolves and deep-reads the open-access PDF of any
  paper you pick. Use this skill WHENEVER the user wants to discover literature
  or read a study: "find recent papers on X", "what does the research say about
  Y", "find sources for my thesis on Z", "search for prior work on ...", or
  "summarise / read / extract the findings from this paper / this arXiv id /
  this DOI / this PDF". Trigger even when the tool isn't named. Results are real
  papers from public APIs (OpenAlex, Crossref, arXiv, Semantic Scholar, Europe
  PMC) and reading reports use only the actual extracted PDF text — nothing is
  fabricated.
---

# Paper Search

Two chained capabilities: **find papers**, then **deep-read** any one of them.
Both are grounded in real data — search hits come from public scholarly APIs,
and reading reports are built only from the extracted PDF text.

```
  search a topic ──▶ ranked real papers ──▶ deep-read the open-access ones
```

> **Running the scripts:** run them **by their full path from your current
> working directory — do NOT `cd` into the skill folder.** That way `--save`
> writes `search-results.md` into the user's workspace, where Claude can open it
> as a clickable preview. (Replace `scripts/…` in the examples below with the
> skill's real path, e.g. `~/.claude/skills/paper-search/scripts/search_papers.py`.)
> If `python` isn't found, use `python3` (needs Python 3.8+).

## Capability 1 — Search

**When:** the user wants to find papers / prior work / sources on a topic, or
asks what the research says about something.

1. **Default = run immediately. Only ask if the topic is missing.** Don't gate
   the search behind a menu. If the message already contains a **topic** (e.g.
   *"查论文 玩家共鸣机制 200"*, i.e. "find papers on player resonance mechanisms,
   200 results", or *"find papers on X"*), search **right away**: take any
   parameters the user included (count, years, sort, open-access) and use
   sensible defaults for the rest (20 results, all years, best match). **Do not
   show the setup menu and do not wait.** After the results, add ONE optional
   line stating the settings actually used so the user can still refine. For the
   200-result request above: *"Showing 200, all years, best match. Say e.g.
   'since 2021, open access only' to refine."*

   Show the setup menu **only when you genuinely don't have a topic to search**
   (or it's too vague to query). Then, and only then:

   > 🔍 **Search setup** — tell me the topic (tweak the rest if you like):
   > 1. **Topic** — `(what should I search?)`
   > 2. **How many papers** — `20`  *(e.g. 10 / 20 / 40)*
   > 3. **Years** — `any`  ·  4. **Open-access only** — `no`  ·  5. **Sort** — `best match`
   >
   > Just give me a topic, or adjust any line.

   Map choices to flags: count → `--limit`, years → `--from-year` / `--to-year`,
   open-access → `--open-access-only`, sort → your re-rank preference (step 5).
2. For a focused query, search directly. For a broad/exploratory one, first
   expand it into directions and search the best 2–4 query strings (see
   `references/search.md`, Stages 1–3).
3. Run the retriever (5 keyless APIs in parallel — OpenAlex, Crossref, arXiv,
   Semantic Scholar, Europe PMC — deduped and rule-scored):

   ```bash
   python scripts/search_papers.py "your query" --limit 40 --compact --save
   #   --save     : ALWAYS include this — writes the FULL list (every card, with
   #                abstracts + links) to search-results.md. This is the complete
   #                record; it can't be truncated by what you show in chat.
   #   --compact  : numbered one-line-per-paper list — USE THIS when the count is
   #                large (~25+) so the whole list is scannable and fully shown
   #   --markdown : richer cards (title + abstract + links) — good for smaller
   #                sets or when the user wants detail
   #   --brief    : plain table, for your own quick skim
   #   (omit all) : full JSON records with abstracts, for re-ranking / writing
   # filters:  --from-year 2021 --to-year 2026  --open-access-only
   # sources:  --sources openalex,crossref,arxiv,s2,europepmc
   ```
4. **Present the full numbered list, and hand over the saved file.** Always run
   with **`--save`** — the script writes every result to `search-results.md`
   (printed path on stderr). That file is the guaranteed-complete record, so even
   if the chat view ends up partial, nothing is lost. In chat, show the script's
   stdout **as-is** (use **`--compact`** for large counts → a clean 1…N list they
   can scan, or **`--markdown`** cards for smaller/detailed sets) — the script,
   not the model, produces the layout, so it's identical on any model. Then point
   the user to the file **as a clickable Markdown link with the relative path** so
   it opens in Claude's preview pane — write it exactly as
   `📄 Full list (all N, with abstracts): [search-results.md](search-results.md)`.
   (Use the relative path, not an absolute one — absolute paths render gray/
   non-clickable. This works because you ran the script from the workspace, so the
   file is right there.) A `--markdown` card looks like:

   > ### 1. [Paper title](https://doi.org/…)  ← title links to the original
   > Authors  ·  Year  ·  *Venue*  ·  cited by N  ·  via Source  ·  🟢 Open Access
   > > abstract snippet…
   > [📄 Open paper](url) · [⬇ PDF](pdf_url) · [🔗 DOI](https://doi.org/…)

   Every card carries **clickable links that jump straight to the original paper,
   its open-access PDF, and its DOI** — keep them intact.

   **Hard output rules (do not override these for "helpfulness"):**
   - **Show ALL N cards.** Present the full list, from item 1 through the
     `— end of N results —` line. If the user asked for 200 and the databases
     returned 169, list all 169. Never stop early, never show only "highlights"
     or "the most representative", never collapse the rest into "… and more". The
     saved `search-results.md` is your backstop — it always holds the complete set.
   - **One flat numbered list, 1…N.** Do NOT reorganise into themes, categories,
     sections, or sub-numbered buckets. One continuous `1, 2, 3 … N` sequence —
     that number is how the user refers back ("deep-read #3", "summarise #7, #12").
   - **Don't shorten cards.** Keep each item's metadata + links. You may add at
     most a one-line "why" under a card.
   - A long reply is expected when the count is large. You may add ONE line
     *after* the full list offering to narrow/filter/deep-read — never *instead*.
5. **Re-rank by fit FIRST, then number 1…N.** This is required, not optional —
   the script's `rule_score` is only a keyword prior, so its raw order floats
   keyword-soup to the top (wrong domain, wrong sense of an ambiguous word — e.g.
   "players" matching *football players* or game-theory *players* when the user
   meant video-game players). Before presenting: judge each result's fit to the
   user's ACTUAL question (see `references/search.md`), **lead with the genuinely
   on-target papers, and push clearly off-target ones to the bottom — or tag them
   `⚠ likely off-target`**. Then renumber the full set 1…N. Keep every paper
   (the file has them all); only DROP papers if the user explicitly asked to
   filter. For a very large list where full reordering is impractical, at least
   lead with the strongest hits and flag the obvious off-targets. Papers with
   a `pdf_url` are open-access and can be deep-read by number next.
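
The reorder-then-renumber part of step 5 can be sketched as below. This is only an illustration, not the skill's own code: the fit judgement itself stays with the model, so the sketch takes the set of off-target ids as an input. The function name `renumber_by_fit` and the `id` / `title` record fields are hypothetical.

```python
from typing import Dict, List, Set

OFF_TARGET_TAG = "⚠ likely off-target"


def renumber_by_fit(papers: List[Dict[str, str]], off_target: Set[str]) -> List[str]:
    """Put on-target papers first, flagged ones last, then number the set 1..N.

    `papers` uses hypothetical `id` and `title` fields; `off_target` holds the
    ids that were judged off-target beforehand. No paper is dropped.
    """
    # Stable partition: the original order is preserved inside each group.
    keep_first = [p for p in papers if p["id"] not in off_target]
    push_last = [p for p in papers if p["id"] in off_target]

    lines = []
    for n, paper in enumerate(keep_first + push_last, start=1):
        suffix = f"  {OFF_TARGET_TAG}" if paper["id"] in off_target else ""
        lines.append(f"{n}. {paper['title']}{suffix}")
    return lines
```

Every input paper appears exactly once in the output, so the numbers the user later refers to ("deep-read #3") always map to a single result.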

The script auto-retries Semantic Scholar on rate limiting (HTTP 429), which clears
most skips. If `s2` is still skipped often, set a free API key with
`export S2_API_KEY=...` (from semanticscholar.org/product/api), which should stop
most of the 429 responses. Any skipped source is reported on stderr, and the
remaining sources carry the search. The full pipeline is described in
`references/search.md`.
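
As a rough illustration of steps 1 and 3, the mapping from setup choices to command-line flags might be assembled as follows. The flags are the ones listed above; the helper name `build_search_command`, its parameters, and the exact cut-off of 25 for switching to `--compact` are assumptions made for this sketch.

```python
import sys
from typing import List, Optional


def build_search_command(
    topic: str,
    limit: int = 20,
    from_year: Optional[int] = None,
    to_year: Optional[int] = None,
    open_access_only: bool = False,
    script: str = "scripts/search_papers.py",  # replace with the skill's real path
) -> List[str]:
    """Build an argv list for search_papers.py from the user's setup choices."""
    # sys.executable sidesteps the python / python3 naming difference.
    cmd = [sys.executable, script, topic, "--limit", str(limit)]
    if from_year is not None:
        cmd += ["--from-year", str(from_year)]
    if to_year is not None:
        cmd += ["--to-year", str(to_year)]
    if open_access_only:
        cmd.append("--open-access-only")
    # Large result sets read better as one line per paper (assumed cut-off: 25).
    cmd.append("--compact" if limit >= 25 else "--markdown")
    cmd.append("--save")  # always keep the complete record on disk
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd)` from the user's workspace, so that `search-results.md` is written there rather than into the skill folder.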

## Capability 2 — Deep Read

**When:** the user wants to actually read, summarise, or extract findings from a
specific paper (by title, DOI, arXiv id, or URL) — including one they just found
in a search.

1. Resolve and extract the PDF (tries direct URL → arXiv-by-DOI → Unpaywall →
   OpenAlex → Semantic Scholar):

   ```bash
   python scripts/fetch_pdf.py --doi 10.1145/3313831.3376234 --text-only
   python scripts/fetch_pdf.py --arxiv 1706.03762 --text-only
   python scripts/fetch_pdf.py --pdf-url https://.../paper.pdf
   python scripts/fetch_pdf.py --title "Attention is all you need"
   # --text-only prints just the extracted text; drop it for full JSON
   #   (per-page array + resolved URL + which sources were tried)
   # reads the first ~12 pages by default; for a long paper's results section
   # raise it, e.g. --max-pages 20 --max-chars 60000   (--email for Unpaywall)
   ```
2. If `resolved_pdf_url` is null, the paper is paywalled with no free copy — say
   so plainly and offer to try another identifier or work from the abstract.
   **Never fabricate the contents.**
3. Produce the evidence-aware reading report using the prompt in
   `references/deep_read.md`, from the extracted text **only**. If extraction was
   `truncated`, disclose that the report covers just the analysed excerpt. You
   can also extract typed claims to hand off to a writing task.
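
The branching in steps 2 and 3 can be sketched as a small check on the JSON that `fetch_pdf.py` prints when `--text-only` is omitted. The field names `resolved_pdf_url` and `truncated` come from the steps above; treating both as top-level keys, and the helper name and its return labels, are assumptions of this sketch.

```python
import json


def classify_fetch_result(raw_json: str) -> str:
    """Decide how to proceed from fetch_pdf.py's JSON output.

    Returns one of:
      "paywalled": no free copy was resolved; say so, never invent contents
      "partial":   text was extracted but truncated; disclose the excerpt limit
      "complete":  text was extracted without truncation
    """
    result = json.loads(raw_json)
    if result.get("resolved_pdf_url") is None:
        return "paywalled"
    if result.get("truncated"):
        return "partial"
    return "complete"
```

For a `"paywalled"` result the next move is to offer another identifier or work from the abstract; for `"partial"`, rerunning with a higher `--max-pages` / `--max-chars` may recover the missing sections.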

## Setup

```bash
python -m pip install -r scripts/requirements.txt   # only pypdf, for deep read
```
Search needs nothing beyond the Python 3.8+ standard library; all sources are
keyless. If `python` isn't found, use `python3`. Set `UNPAYWALL_EMAIL=you@domain`
(or pass `--email`) to be polite to Unpaywall and slightly improve OA hit-rate.
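
For context, the Unpaywall REST API looks a DOI up at `https://api.unpaywall.org/v2/<doi>` and expects an `email` query parameter. The sketch below shows how an address from `--email` or `UNPAYWALL_EMAIL` could end up in such a request URL. The helper names and the flag-over-environment precedence are assumptions; how `fetch_pdf.py` itself builds the request is not shown in this document.

```python
import os
from typing import Optional
from urllib.parse import quote, urlencode


def resolve_email(cli_email: Optional[str] = None) -> Optional[str]:
    """Prefer an explicit --email value, else fall back to UNPAYWALL_EMAIL."""
    return cli_email or os.environ.get("UNPAYWALL_EMAIL")


def unpaywall_url(doi: str, email: str) -> str:
    """Build the Unpaywall v2 lookup URL for a DOI."""
    # Keep the "/" inside the DOI; percent-encode anything unsafe.
    return f"https://api.unpaywall.org/v2/{quote(doi, safe='/')}?{urlencode({'email': email})}"
```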

## Honesty contract

Search results are real papers, not invented. Deep-read reports use only the
extracted PDF text; gaps are disclosed, not filled with guesses. When a PDF is
paywalled or an abstract is too thin to support a claim, say so rather than
fabricating — that candor is the point.