Mnemosyne MCP Server: Python Tools Detail

Level 3 (Detail) — Session archiving and gathering scripts.

Concept

The tools/ directory contains two Python utilities for archiving Gemini CLI sessions and converting them to clean Markdown. They live in the mnemosyne-mcp-server repository but operate on session data, not on the MCP service directly.

chronicler.py

File: tools/chronicler.py (285 lines)

Purpose: scans .gemini directories for session JSON files, deduplicates by basename, copies to /workspace/chats/, and transforms to clean Markdown in /workspace/chats/md/.

Pipeline

1. gather_sessions()
   └─ Scan SEARCH_ROOTS ([/home/tazpod, /workspace]) for .gemini dirs
      └─ Glob **/session*.json and **/checkpoint*.json
         └─ Dedup by basename, copy to /workspace/chats/

2. transform_json_to_md()
   └─ For each JSON in /workspace/chats/:
      ├─ Skip if > 50 MB
      ├─ Skip meta-sessions (see below)
      ├─ Parse messages from "messages" or "history" key
      ├─ Clean protocol noise
      ├─ Smart truncate at 5000 chars
      └─ Write MD to /workspace/chats/md/
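
The "messages" or "history" lookup in step 2 could look roughly like the sketch below. The helper name extract_messages and the order of preference ("messages" first) are assumptions made for illustration; the actual code in chronicler.py may differ.

```python
from typing import Any


def extract_messages(data: dict[str, Any]) -> list[Any]:
    """Return the message list of a parsed session JSON, or [] if none is found.

    Assumption: "messages" is tried before "history"; the source only says
    that one of the two keys is used.
    """
    for key in ("messages", "history"):
        value = data.get(key)
        if isinstance(value, list):
            return value
    return []
```

A session that carries neither key yields an empty list, which a caller can treat as "nothing to transform".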

Search Roots

SEARCH_ROOTS = [Path("/home/tazpod"), Path("/workspace")]
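
As a rough illustration of the scan step, the sketch below walks each root, finds directories named .gemini, and applies the two glob patterns from the pipeline. The function name find_session_files is hypothetical, and details such as sorting and skipping missing roots are assumptions rather than confirmed behavior.

```python
from pathlib import Path

# Glob patterns taken from the pipeline description.
PATTERNS = ("**/session*.json", "**/checkpoint*.json")


def find_session_files(roots: list[Path]) -> list[Path]:
    """Collect session/checkpoint JSON files found under any .gemini directory."""
    found: list[Path] = []
    for root in roots:
        if not root.is_dir():
            continue  # assumption: missing roots are skipped quietly
        for gemini_dir in root.rglob(".gemini"):
            if not gemini_dir.is_dir():
                continue
            for pattern in PATTERNS:
                found.extend(sorted(gemini_dir.glob(pattern)))
    return found
```

Files outside a .gemini directory are ignored even when their names match the patterns. Duplicates are left for the deduplication step.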

Size Protection

  • JSONs larger than 50 MB are skipped entirely
  • Markdown output shorter than 100 characters is discarded
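
The two limits can be expressed as a pair of small predicates. This is a minimal sketch: the constant and function names are hypothetical, and reading "50 MB" as 50 * 1024 * 1024 bytes is an assumption.

```python
MAX_JSON_BYTES = 50 * 1024 * 1024  # assumption: "50 MB" interpreted as 50 MiB
MIN_MD_CHARS = 100


def is_oversized(size_bytes: int) -> bool:
    """True when a JSON file exceeds the size limit and should be skipped."""
    return size_bytes > MAX_JSON_BYTES


def is_too_short(markdown: str) -> bool:
    """True when generated Markdown is shorter than the minimum and should be discarded."""
    return len(markdown) < MIN_MD_CHARS
```

For a pathlib.Path, the size would come from path.stat().st_size.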

Meta-Session Filtering (is_meta_session())

Checks the first 5 messages for:

  • “KNOWLEDGE EXTRACTION PROTOCOL”
  • “CHIEF ARCHIVIST”

If either string is found, the session is classified as archival work and skipped.

Exceptions (always kept):

  • checkpoint-mnemosyne-bulk-upload.json
  • session-2026-02-20T09-14-fee061f7.json
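
A minimal sketch of this check follows. It assumes each message is a dict with a text field under a "content" key and that matching is a case-sensitive substring test; neither detail is confirmed by the source, and the signature is illustrative only.

```python
META_MARKERS = ("KNOWLEDGE EXTRACTION PROTOCOL", "CHIEF ARCHIVIST")

# Sessions that are kept even if they contain a marker.
ALWAYS_KEEP = {
    "checkpoint-mnemosyne-bulk-upload.json",
    "session-2026-02-20T09-14-fee061f7.json",
}


def is_meta_session(filename: str, messages: list[dict]) -> bool:
    """Return True if the session looks like archival work and should be skipped."""
    if filename in ALWAYS_KEEP:
        return False
    for message in messages[:5]:  # only the first 5 messages are inspected
        text = str(message.get("content", ""))  # "content" key is an assumption
        if any(marker in text for marker in META_MARKERS):
            return True
    return False
```

A marker that first appears in the sixth message or later does not trigger the filter.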

Protocol Noise Removal (clean_protocol_noise())

Filters lines containing these keywords:

Keyword                                 Source
TAZLAB KNOWLEDGE EXTRACTION PROTOCOL    Instruction block
Chief Archivist                         Role header
SESSION LOG BELOW                       Session delimiter
— Content from referenced files —       File inclusion marker
# Session:                              Markdown artifact
[TRONCAMENTO CHIRURGICO:                Truncation marker
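
Line-level filtering against the keywords above could be sketched as follows. The matching is assumed to be a case-sensitive substring test. For the file inclusion marker only the inner text is matched here, because the exact dash characters around it are uncertain.

```python
NOISE_KEYWORDS = (
    "TAZLAB KNOWLEDGE EXTRACTION PROTOCOL",
    "Chief Archivist",
    "SESSION LOG BELOW",
    "Content from referenced files",  # surrounding dashes omitted (form uncertain)
    "# Session:",
    "[TRONCAMENTO CHIRURGICO:",
)


def clean_protocol_noise(text: str) -> str:
    """Drop every line that contains one of the protocol keywords."""
    kept = [
        line
        for line in text.splitlines()
        if not any(keyword in line for keyword in NOISE_KEYWORDS)
    ]
    return "\n".join(kept)
```

Because whole lines are dropped, a legitimate sentence that happens to quote one of these keywords is removed as well.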

Text Truncation (smart_truncate())

Hard limit of 5000 characters per text block. When exceeded:

  • Keep first 2500 chars
  • Insert truncation marker with removed character count
  • Keep last 2500 chars
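
The head-and-tail truncation can be sketched as below. Only the "[TRONCAMENTO CHIRURGICO:" prefix of the marker comes from the keyword list in this document; the rest of the marker text and the limit parameter are assumptions for illustration.

```python
def smart_truncate(text: str, limit: int = 5000) -> str:
    """Keep the first and last limit // 2 characters of an over-long text block."""
    if len(text) <= limit:
        return text
    half = limit // 2
    removed = len(text) - 2 * half
    # Marker wording after the prefix is hypothetical.
    marker = f"\n[TRONCAMENTO CHIRURGICO: {removed} chars removed]\n"
    return text[:half] + marker + text[len(text) - half:]
```

With the default limit, a 6000-character block keeps 2500 characters from each end and reports 1000 characters removed.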

Deduplication Logic

  • First pass: dedup by basename across all .gemini directories (first seen wins)
  • Second pass: if basename already exists in /workspace/chats/, compare mtime and size to decide whether to overwrite
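
The two passes might be sketched like this. The first function follows the "first seen wins" rule directly. The overwrite rule in the second function (copy when the source is newer or differs in size) is an assumption, since the source only says that mtime and size are compared. Both function names are hypothetical.

```python
from pathlib import Path


def dedup_by_basename(paths: list[Path]) -> list[Path]:
    """First pass: keep only the first path seen for each basename."""
    seen: dict[str, Path] = {}
    for path in paths:
        seen.setdefault(path.name, path)
    return list(seen.values())


def needs_copy(src: Path, dest: Path) -> bool:
    """Second pass: decide whether src should overwrite an existing dest."""
    if not dest.exists():
        return True
    src_stat, dest_stat = src.stat(), dest.stat()
    # Assumed rule: overwrite when the source is newer or the sizes differ.
    return src_stat.st_mtime > dest_stat.st_mtime or src_stat.st_size != dest_stat.st_size
```

Deduplicating by basename means two different sessions with the same file name in separate .gemini directories collapse into one.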

gather_sessions.py

File: tools/gather_sessions.py (52 lines)

Simplified subset of chronicler.py — gather step only.

Usage:

python tools/gather_sessions.py [<root_dir>]

Behavior:

  1. Scan <root_dir> (default: current directory) for .gemini directories
  2. Glob **/session*.json and **/checkpoint*.json
  3. Copy to /workspace/chats/ with collision-safe renaming (prefixes parent dir name)
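
Collision-safe renaming could work along the lines of the sketch below. It assumes the parent directory prefix is applied only when the plain basename is already taken, and it uses an underscore separator. The numeric fallback is an addition for the case where the prefixed name also collides. None of these details are confirmed by the source.

```python
from pathlib import Path


def collision_safe_name(src: Path, taken: set[str]) -> str:
    """Pick a destination file name for src that is not already in taken."""
    candidate = src.name
    if candidate in taken:
        # Prefix the parent directory name, as described in the behavior list.
        candidate = f"{src.parent.name}_{src.name}"
    counter = 1
    while candidate in taken:
        # Hypothetical fallback: add a counter if the prefixed name is taken too.
        candidate = f"{src.parent.name}_{counter}_{src.name}"
        counter += 1
    taken.add(candidate)
    return candidate
```

Unlike the basename deduplication in chronicler.py, this approach keeps every file, at the cost of possible duplicates in /workspace/chats/.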

Dependencies

File: tools/requirements.txt

google-genai
psycopg2-binary
google-cloud-alloydb-connector
pyyaml
openai
requests

Note: not all dependencies are directly used by the current tools. The file appears to be a superset that includes potential future tool requirements (e.g., google-genai, psycopg2-binary, openai).

Known Issues

  1. Brittle meta-session heuristic: is_meta_session() relies on plain string matching, so a legitimate session that merely mentions one of the marker strings in its first 5 messages is misclassified as archival work and skipped
  2. No config file: search roots, size limits, and filters are hardcoded
  3. Installation gap: no setup.py or pyproject.toml — must be run from the repository root with dependencies manually installed
  4. No error recovery: if a single JSON fails to parse, it is skipped with only a log message; there is no retry and no summary of failed files

See Also