Mnemosyne MCP Server: Python Tools Detail

Level 3 (Detail) — Session archiving and gathering scripts.

Concept

The tools/ directory contains two Python utilities for archiving Gemini CLI sessions and converting them to clean Markdown. They live in the mnemosyne-mcp-server repository but operate on session data, not on the MCP service directly.

chronicler.py

File: tools/chronicler.py (285 lines)

Purpose: scans .gemini directories for session JSON files, deduplicates them by basename, copies them to /workspace/chats/, and transforms them into clean Markdown in /workspace/chats/md/.

Pipeline

1. gather_sessions()
   └─ Scan SEARCH_ROOTS ([/home/tazpod, /workspace]) for .gemini dirs
      └─ Glob **/session*.json and **/checkpoint*.json
         └─ Dedup by basename, copy to /workspace/chats/

2. transform_json_to_md()
   └─ For each JSON in /workspace/chats/:
      ├─ Skip if > 50 MB
      ├─ Skip meta-sessions (see below)
      ├─ Parse messages from "messages" or "history" key
      ├─ Clean protocol noise
      ├─ Smart truncate at 5000 chars
      └─ Write MD to /workspace/chats/md/
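Step 2 accepts either of two top-level keys for the message list. Below is a minimal sketch of that lookup. It assumes each file parses to a JSON object whose message list sits directly under `messages` or `history`, and it checks `messages` first, which is an assumed precedence. The helper names `extract_messages` and `load_messages` are hypothetical and are not taken from chronicler.py.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def extract_messages(session: dict[str, Any]) -> list[Any]:
    """Return the message list stored under "messages" or "history"."""
    for key in ("messages", "history"):  # assumed precedence
        value = session.get(key)
        if isinstance(value, list):
            return value
    # Neither key holds a list: treat the session as empty.
    return []


def load_messages(path: Path) -> list[Any]:
    """Parse one session file and return its messages (hypothetical helper)."""
    with path.open(encoding="utf-8") as fh:
        data = json.load(fh)
    return extract_messages(data) if isinstance(data, dict) else []
```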

Search Roots

SEARCH_ROOTS = [Path("/home/tazpod"), Path("/workspace")]
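The discovery step can be sketched as follows. It finds every .gemini directory under each root, globs for the two file patterns, and applies the first-seen-wins basename deduplication described under Deduplication Logic. This sketch assumes .gemini directories may sit at any depth under a root and that traversal order is sorted. The function name `find_session_files` is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

SEARCH_ROOTS = [Path("/home/tazpod"), Path("/workspace")]
PATTERNS = ("session*.json", "checkpoint*.json")


def find_session_files(roots: list[Path]) -> dict[str, Path]:
    """Map each session basename to the first file seen with that name."""
    found: dict[str, Path] = {}
    for root in roots:
        if not root.is_dir():
            continue
        # Every directory named .gemini anywhere under the root (assumed depth).
        for gemini_dir in sorted(root.rglob(".gemini")):
            if not gemini_dir.is_dir():
                continue
            for pattern in PATTERNS:
                # "**/" also matches files directly inside gemini_dir.
                for path in sorted(gemini_dir.glob(f"**/{pattern}")):
                    found.setdefault(path.name, path)  # first seen wins
    return found
```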

Size Protection

JSONs larger than 50 MB are skipped entirely
Markdown output shorter than 100 characters is discarded
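Both guards reduce to simple comparisons. This minimal sketch assumes "50 MB" means 50 mebibytes (50 × 1024 × 1024 bytes) and that the 100-character floor applies to the raw Markdown length. The actual constants and whitespace handling in chronicler.py may differ, and the function names are hypothetical.

```python
from pathlib import Path

# Assumption: MB = mebibyte; the real script may use 50_000_000 instead.
MAX_JSON_BYTES = 50 * 1024 * 1024
MIN_MARKDOWN_CHARS = 100


def json_too_large(path: Path) -> bool:
    """True if the session JSON exceeds the size cap and should be skipped."""
    return path.stat().st_size > MAX_JSON_BYTES


def markdown_worth_keeping(markdown: str) -> bool:
    """True unless the rendered Markdown is shorter than the minimum length."""
    return len(markdown) >= MIN_MARKDOWN_CHARS
```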

Meta-Session Filtering (`is_meta_session()`)

Checks the first 5 messages for:

“KNOWLEDGE EXTRACTION PROTOCOL”
“CHIEF ARCHIVIST”

If either string is found, the session is classified as archival work and skipped.

Exceptions (always kept):

checkpoint-mnemosyne-bulk-upload.json
session-2026-02-20T09-14-fee061f7.json
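The filter and its exception list can be sketched as below. Several details here are assumptions: the signature (filename plus parsed message list), the case-sensitive substring match, and the way text is pulled out of a message. The hypothetical `message_text` helper reads a `content` field and otherwise falls back to `str()`.

```python
from __future__ import annotations

from typing import Any

META_MARKERS = ("KNOWLEDGE EXTRACTION PROTOCOL", "CHIEF ARCHIVIST")
ALWAYS_KEEP = {
    "checkpoint-mnemosyne-bulk-upload.json",
    "session-2026-02-20T09-14-fee061f7.json",
}


def message_text(message: Any) -> str:
    """Hypothetical helper: best-effort text of one message."""
    if isinstance(message, dict):
        content = message.get("content", "")
        return content if isinstance(content, str) else str(content)
    return str(message)


def is_meta_session(filename: str, messages: list[Any]) -> bool:
    """Classify a session as archival work, honoring the exception list."""
    if filename in ALWAYS_KEEP:
        return False
    # Only the first 5 messages are inspected.
    for message in messages[:5]:
        text = message_text(message)
        if any(marker in text for marker in META_MARKERS):
            return True
    return False
```

Because only the opening messages are scanned, a legitimate session that quotes one of the markers early on is still dropped. This is the false-positive risk listed under Known Issues.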

Protocol Noise Removal (`clean_protocol_noise()`)

Filters lines containing these keywords:

Keyword	Source
TAZLAB KNOWLEDGE EXTRACTION PROTOCOL	Instruction block
Chief Archivist	Role header
SESSION LOG BELOW	Session delimiter
--- Content from referenced files ---	File inclusion marker
# Session:	Markdown artifact
[TRONCAMENTO CHIRURGICO:	Truncation marker (Italian for "surgical truncation")
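Line-level filtering against the keyword table can be sketched as a single comprehension. This sketch assumes a plain case-sensitive substring match per line. It also assumes the file-inclusion marker is the ASCII `--- Content from referenced files ---` string that Gemini CLI emits, in case the dashes were rendered differently in the table.

```python
NOISE_KEYWORDS = (
    "TAZLAB KNOWLEDGE EXTRACTION PROTOCOL",
    "Chief Archivist",
    "SESSION LOG BELOW",
    "--- Content from referenced files ---",  # assumed ASCII form
    "# Session:",
    "[TRONCAMENTO CHIRURGICO:",
)


def clean_protocol_noise(text: str) -> str:
    """Drop every line that contains one of the noise keywords."""
    kept = [
        line
        for line in text.splitlines()
        if not any(keyword in line for keyword in NOISE_KEYWORDS)
    ]
    return "\n".join(kept)
```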

Text Truncation (`smart_truncate()`)

Hard limit of 5000 characters per text block. When exceeded:

Keep first 2500 chars
Insert truncation marker with removed character count
Keep last 2500 chars
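The head-and-tail rule works out as follows. For a 6000-character block, 6000 − 2500 − 2500 = 1000 characters are removed from the middle. In this sketch the marker wording `[TRUNCATED: N characters removed]` is hypothetical, since the real marker text is not documented here. The head and tail are derived as halves of the limit so that other limits stay valid.

```python
def smart_truncate(text: str, limit: int = 5000) -> str:
    """Keep the head and tail of an oversized block, noting how much was cut."""
    if len(text) <= limit:
        return text
    head = limit // 2          # 2500 for the default limit
    tail = limit - head        # 2500 for the default limit
    removed = len(text) - head - tail
    # Hypothetical marker wording; the real script's marker may differ.
    marker = f"\n[TRUNCATED: {removed} characters removed]\n"
    return text[:head] + marker + text[len(text) - tail:]
```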

Deduplication Logic

First pass: dedup by basename across all .gemini directories (first seen wins)
Second pass: if a file with the same basename already exists in /workspace/chats/, compare mtime and size to decide whether to overwrite it
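The second pass needs an overwrite rule. The exact comparison is not documented, so this sketch assumes one: overwrite when the source is newer or when the sizes differ. Both `should_overwrite` and `copy_if_needed` are hypothetical names.

```python
import shutil
from pathlib import Path


def should_overwrite(source: Path, existing: Path) -> bool:
    """Assumed rule: replace the existing copy if the source is newer or differs in size."""
    src, dst = source.stat(), existing.stat()
    return src.st_mtime > dst.st_mtime or src.st_size != dst.st_size


def copy_if_needed(source: Path, dest_dir: Path) -> bool:
    """Copy source into dest_dir unless an up-to-date copy is already there."""
    dest = dest_dir / source.name
    if dest.exists() and not should_overwrite(source, dest):
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves the mtime, so an unchanged file compares equal next run.
    shutil.copy2(source, dest)
    return True
```

Using `shutil.copy2` rather than `shutil.copy` matters for this rule. A plain copy would stamp the destination with the current time, which would hide genuinely newer sources on later runs.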

gather_sessions.py

File: tools/gather_sessions.py (52 lines)

Simplified subset of chronicler.py — gather step only.

Usage:

python tools/gather_sessions.py [<root_dir>]

Behavior:

Scan <root_dir> (default: current directory) for .gemini directories
Glob **/session*.json and **/checkpoint*.json
Copy to /workspace/chats/ with collision-safe renaming (prefixes parent dir name)
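The gather-only behavior can be sketched as below. Several choices are assumptions for illustration only: the rename format `<parent>_<name>`, the numeric fallback when even the prefixed name is taken, and the function names. The script's actual naming scheme is not documented beyond "prefixes parent dir name".

```python
from __future__ import annotations

import shutil
from pathlib import Path

DEST = Path("/workspace/chats")
PATTERNS = ("session*.json", "checkpoint*.json")


def destination_for(path: Path, dest_dir: Path) -> Path:
    """Pick a free target name, prefixing the parent dir name on collision."""
    target = dest_dir / path.name
    if not target.exists():
        return target
    # Hypothetical "<parent>_<name>" format, then a counter as a last resort.
    target = dest_dir / f"{path.parent.name}_{path.name}"
    counter = 1
    while target.exists():
        target = dest_dir / f"{path.parent.name}_{counter}_{path.name}"
        counter += 1
    return target


def gather(root: Path, dest_dir: Path = DEST) -> list[Path]:
    """Copy every session/checkpoint JSON found under root's .gemini dirs."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied: list[Path] = []
    for gemini_dir in sorted(root.rglob(".gemini")):
        if not gemini_dir.is_dir():
            continue
        for pattern in PATTERNS:
            for path in sorted(gemini_dir.glob(f"**/{pattern}")):
                target = destination_for(path, dest_dir)
                shutil.copy2(path, target)
                copied.append(target)
    return copied


def main(argv: list[str]) -> None:
    """CLI entry point: optional root directory, defaulting to the cwd."""
    root_dir = Path(argv[0]) if argv else Path.cwd()
    for target in gather(root_dir):
        print(target)
```

Calling `main(sys.argv[1:])` from an `if __name__ == "__main__":` guard gives the `python tools/gather_sessions.py [<root_dir>]` usage shown above. Unlike chronicler.py's basename deduplication, this approach keeps every copy, so rerunning it over the same tree accumulates prefixed duplicates.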

Dependencies

File: tools/requirements.txt

google-genai
psycopg2-binary
google-cloud-alloydb-connector
pyyaml
openai
requests

Note: not all dependencies are directly used by the current tools. The file appears to be a superset that includes potential future tool requirements (e.g., google-genai, psycopg2-binary, openai).

Known Issues

Brittle meta-session heuristic: string matching in is_meta_session() can produce false positives if a legitimate session mentions these keywords in context
No config file: search roots, size limits, and filters are hardcoded
Installation gap: no setup.py or pyproject.toml — must be run from the repository root with dependencies manually installed
No error recovery: if a single JSON fails to parse, it is skipped with only a log message; there is no retry and no summary of the files that failed
