Mnemosyne MCP Server: Python Tools Detail
Level 3 (Detail) — Session archiving and gathering scripts.
Concept
The tools/ directory contains two Python utilities for archiving Gemini CLI sessions and converting them to clean Markdown. They live in the mnemosyne-mcp-server repository but operate on session data, not on the MCP service directly.
chronicler.py
File: tools/chronicler.py (285 lines)
Purpose: scans .gemini directories for session JSON files, deduplicates by basename, copies to /workspace/chats/, and transforms to clean Markdown in /workspace/chats/md/.
Pipeline
1. gather_sessions()
   ├─ Scan SEARCH_ROOTS ([~/, /workspace]) for .gemini dirs
   ├─ Glob **/session*.json and **/checkpoint*.json
   └─ Dedup by basename, copy to /workspace/chats/
2. transform_json_to_md()
   └─ For each JSON in /workspace/chats/:
      ├─ Skip if > 50 MB
      ├─ Skip meta-sessions (see below)
      ├─ Parse messages from "messages" or "history" key
      ├─ Clean protocol noise
      ├─ Smart truncate at 5000 chars
      └─ Write MD to /workspace/chats/md/
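The message-parsing step in the pipeline above can be sketched as follows. This is a minimal illustration, not the code in chronicler.py: the helper name extract_messages is hypothetical, and checking "messages" before "history" is an assumption about precedence.

```python
def extract_messages(data: dict) -> list:
    """Return the message list from a parsed session JSON object.

    Assumption: "messages" is preferred when both keys are present.
    """
    for key in ("messages", "history"):
        value = data.get(key)
        if isinstance(value, list):
            return value
    # Neither key holds a list: treat the session as empty.
    return []
```

Returning an empty list rather than raising lets a caller treat malformed sessions the same way as empty ones.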
Search Roots
SEARCH_ROOTS = [Path("/home/tazpod"), Path("/workspace")]
Size Protection
- JSONs larger than 50 MB are skipped entirely
- Markdown output shorter than 100 characters is discarded
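The two thresholds above could be expressed roughly as below. The constant and function names are hypothetical, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption; the script may use a decimal megabyte instead.

```python
MAX_JSON_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" means 50 MiB
MIN_MD_CHARS = 100


def is_oversized(size_bytes: int) -> bool:
    """True when a session JSON exceeds the size limit and should be skipped."""
    return size_bytes > MAX_JSON_BYTES


def should_keep_markdown(markdown: str) -> bool:
    """True when the rendered Markdown is long enough to be written out."""
    return len(markdown) >= MIN_MD_CHARS
```

In practice the size would come from something like Path.stat().st_size before the file is opened, so oversized files are never loaded into memory.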
Meta-Session Filtering (is_meta_session())
Checks the first 5 messages for:
- “KNOWLEDGE EXTRACTION PROTOCOL”
- “CHIEF ARCHIVIST”
If either string is found, the session is classified as archival work and skipped.
Exceptions (always kept):
- checkpoint-mnemosyne-bulk-upload.json
- session-2026-02-20T09-14-fee061f7.json
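A minimal sketch of this check, under stated assumptions: the real signature of is_meta_session() is not documented here, so the filename parameter is hypothetical, the match is assumed to be case-sensitive, and each message is serialized whole because the message schema is not specified.

```python
import json

META_MARKERS = ("KNOWLEDGE EXTRACTION PROTOCOL", "CHIEF ARCHIVIST")
ALWAYS_KEEP = {
    "checkpoint-mnemosyne-bulk-upload.json",
    "session-2026-02-20T09-14-fee061f7.json",
}


def is_meta_session(messages: list, filename: str = "") -> bool:
    """Classify a session as archival work based on its first 5 messages."""
    if filename in ALWAYS_KEEP:
        return False  # listed exceptions are never filtered out
    for message in messages[:5]:
        # Serialize the whole message so the check does not depend on its shape.
        text = json.dumps(message, ensure_ascii=False)
        if any(marker in text for marker in META_MARKERS):
            return True
    return False
```

Because the check is a plain substring match, a session that merely quotes one of these phrases early on would also be classified as a meta-session.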
Protocol Noise Removal (clean_protocol_noise())
Filters lines containing these keywords:
| Keyword | Source |
|---|---|
| TAZLAB KNOWLEDGE EXTRACTION PROTOCOL | Instruction block |
| Chief Archivist | Role header |
| SESSION LOG BELOW | Session delimiter |
| — Content from referenced files — | File inclusion marker |
| # Session: | Markdown artifact |
| [TRONCAMENTO CHIRURGICO: | Truncation marker |
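The line filter described by the table can be sketched like this. It is an illustration rather than the script's code: the keyword tuple is copied from the table, except that the file inclusion marker is matched without its surrounding dash decoration, which is an assumption.

```python
NOISE_KEYWORDS = (
    "TAZLAB KNOWLEDGE EXTRACTION PROTOCOL",
    "Chief Archivist",
    "SESSION LOG BELOW",
    "Content from referenced files",  # assumption: dash decoration omitted
    "# Session:",
    "[TRONCAMENTO CHIRURGICO:",
)


def clean_protocol_noise(text: str) -> str:
    """Drop every line that contains one of the noise keywords."""
    kept = [
        line
        for line in text.splitlines()
        if not any(keyword in line for keyword in NOISE_KEYWORDS)
    ]
    return "\n".join(kept)
```

The filter removes whole lines, so any legitimate text sharing a line with a keyword is removed as well.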
Text Truncation (smart_truncate())
Hard limit of 5000 characters per text block. When exceeded:
- Keep first 2500 chars
- Insert truncation marker with removed character count
- Keep last 2500 chars
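The head-and-tail truncation above can be sketched as follows. The marker wording here is hypothetical; the noise table suggests the script's marker begins with "[TRONCAMENTO CHIRURGICO:", but its full format is not documented.

```python
def smart_truncate(text: str, limit: int = 5000) -> str:
    """Keep the first and last limit/2 characters of an over-long block."""
    if len(text) <= limit:
        return text
    half = limit // 2
    removed = len(text) - 2 * half
    # Hypothetical marker text; only the removed-character count is documented.
    marker = f"\n[TRUNCATED: {removed} characters removed]\n"
    return text[:half] + marker + text[-half:]
```

Keeping both ends preserves the opening context and the final result of a long block, at the cost of the middle.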
Deduplication Logic
- First pass: dedup by basename across all .gemini directories (first seen wins)
- Second pass: if basename already exists in /workspace/chats/, compare mtime and size to decide whether to overwrite
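A minimal sketch of the two passes, assuming the overwrite rule is "source is newer or sizes differ". The text above only says that mtime and size are compared, so that rule and all function names here are hypothetical.

```python
import shutil
from pathlib import Path


def dedup_by_basename(paths: list) -> list:
    """First pass: keep the first path seen for each basename."""
    unique = {}
    for path in paths:
        unique.setdefault(path.name, path)
    return list(unique.values())


def needs_copy(src: Path, dest: Path) -> bool:
    """Second pass: decide whether src should replace an existing dest."""
    if not dest.exists():
        return True
    src_stat, dest_stat = src.stat(), dest.stat()
    # Assumed rule: overwrite when the source is newer or the sizes differ.
    return src_stat.st_mtime > dest_stat.st_mtime or src_stat.st_size != dest_stat.st_size


def copy_sessions(paths: list, dest_dir: Path) -> list:
    """Copy deduplicated session files into dest_dir; return the files written."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in dedup_by_basename(paths):
        dest = dest_dir / src.name
        if needs_copy(src, dest):
            shutil.copy2(src, dest)  # copy2 also preserves the modification time
            copied.append(dest)
    return copied
```

Note that "first seen wins" makes the result depend on scan order, so two different sessions that share a basename cannot both be archived under this scheme.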
gather_sessions.py
File: tools/gather_sessions.py (52 lines)
Simplified subset of chronicler.py — gather step only.
Usage:
python tools/gather_sessions.py [<root_dir>]
Behavior:
- Scan <root_dir> (default: current directory) for .gemini directories
- Glob **/session*.json and **/checkpoint*.json
- Copy to /workspace/chats/ with collision-safe renaming (prefixes parent dir name)
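The scan and rename behavior can be sketched as below. This is not the script's code: both function names are hypothetical, and applying the parent-directory prefix only when a name is already taken, joined with an underscore, is an assumption about how the renaming works.

```python
from pathlib import Path

PATTERNS = ("**/session*.json", "**/checkpoint*.json")


def find_session_files(root: Path) -> list:
    """Find session and checkpoint JSON files under every .gemini directory."""
    found = []
    for gemini_dir in root.rglob(".gemini"):
        if not gemini_dir.is_dir():
            continue
        for pattern in PATTERNS:
            found.extend(sorted(gemini_dir.glob(pattern)))
    return found


def collision_safe_name(src: Path, taken: set) -> str:
    """Pick a destination filename, prefixing the parent dir name on a clash."""
    name = src.name
    if name in taken:
        # Assumed format: <parent dir>_<original name>
        name = f"{src.parent.name}_{src.name}"
    taken.add(name)
    return name
```

A caller would keep one taken set for the whole run and copy each file to the destination directory under the returned name.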
Dependencies
File: tools/requirements.txt
google-genai
psycopg2-binary
google-cloud-alloydb-connector
pyyaml
openai
requests
Note: not all dependencies are directly used by the current tools. The file appears to be a superset that includes potential future tool requirements (e.g., google-genai, psycopg2-binary, openai).
Known Issues
- Brittle meta-session heuristic: string matching in is_meta_session() can produce false positives if a legitimate session mentions these keywords in context
- No config file: search roots, size limits, and filters are hardcoded
- Installation gap: no setup.py or pyproject.toml; the tools must be run from the repository root with dependencies manually installed
- No error recovery: if a single JSON fails to parse, it is skipped with only a log message, with no retry or other reporting mechanism
See Also
- Parent topic: Architecture
- Sibling details: Ingest Memory Detail, Retrieve Memories Detail, Deployment Detail
- Related entity: tazpod (session dirs are in the .gemini directory within the container)