Entity: capture

Brief: Manages the capture queue — staging, crawling, and popping knowledge entries for LLM synthesis.

Overview

ouro/scripts/capture.py is the primary knowledge ingestion tool. It stages code and notes in ouro/wiki/capture-queue.md for later synthesis by an LLM agent, which makes it the entry point to the Capture-Synthesize Loop.

Functions

stage(input_str)

@param input_str A file path or raw text string.

Appends a timestamped entry to the capture queue. If a valid file path is given, reads its content; otherwise treats the input as a raw snippet. Handles binary file detection and encoding errors gracefully.

@snippet stage-entry-format

### Capture [2026-05-03T10:00:00]
- **Source**: `ouro/scripts/capture.py`
- **Content**:
```<content>```
---

If the queue contains *(Empty)*, it is removed before appending.
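The staging behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in capture.py: the names format_entry and stage_sketch, the explicit queue_path and now parameters, the "raw snippet" source label, and the exact fence layout around the content are all assumptions made so the example is self-contained.

```python
from datetime import datetime
from pathlib import Path

FENCE = "`" * 3  # built indirectly so this example nests cleanly in a fenced block
EMPTY_MARKER = "*(Empty)*"


def format_entry(source: str, content: str, now: datetime) -> str:
    """Render one entry in the stage-entry-format layout (fence layout assumed)."""
    stamp = now.replace(microsecond=0).isoformat()
    return (
        f"### Capture [{stamp}]\n"
        f"- **Source**: `{source}`\n"
        f"- **Content**:\n"
        f"{FENCE}\n{content}\n{FENCE}\n"
        f"---\n"
    )


def stage_sketch(input_str: str, queue_path: Path, now: datetime) -> None:
    """Append one entry to the queue file, dropping the empty marker first."""
    candidate = Path(input_str)
    try:
        is_file = candidate.is_file()
    except (OSError, ValueError):
        is_file = False  # e.g. a long note that is not a usable path

    if is_file:
        source = input_str
        # errors="replace" keeps undecodable bytes from aborting the capture
        content = candidate.read_text(encoding="utf-8", errors="replace")
    else:
        source, content = "raw snippet", input_str  # label is an assumption

    existing = queue_path.read_text(encoding="utf-8") if queue_path.exists() else ""
    existing = existing.replace(EMPTY_MARKER, "").rstrip()
    prefix = existing + "\n\n" if existing else ""
    queue_path.write_text(prefix + format_entry(source, content, now), encoding="utf-8")
```

Passing the timestamp in as a parameter is only a convenience for testing; the real stage() takes a single argument.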

crawl(directory)

@param directory Root directory to walk recursively. Defaults to . via CLI.

Walks all non-binary files and calls stage() on each, skipping ignored directories, sensitive files, and the ouro/wiki/ directory itself to avoid recursive capture. Reports the number of staged files and, separately, the number of skipped sensitive files.
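One way to express the directory walk is with os.walk and in-place pruning, shown below as a sketch. The function name iter_crawl_candidates, the wiki_dir parameter, and the contents of IGNORED_DIRS here are hypothetical; the real ignore list lives in capture.py. The sensitivity and binary checks are left out to keep the example focused on traversal.

```python
import os
from pathlib import Path

# Hypothetical values for illustration; the real list is defined in capture.py.
IGNORED_DIRS = {".git", "node_modules", "__pycache__"}


def iter_crawl_candidates(directory, wiki_dir):
    """Yield files under directory, never descending into ignored or wiki dirs."""
    root = Path(directory)
    wiki = Path(wiki_dir).resolve()
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        # Assigning to dirnames[:] prunes the walk, so os.walk does not
        # descend into the removed directories at all.
        dirnames[:] = sorted(
            d for d in dirnames
            if d not in IGNORED_DIRS and (current / d).resolve() != wiki
        )
        for name in sorted(filenames):
            yield current / name
```

Pruning the wiki directory during the walk, rather than filtering files afterwards, is what keeps the queue file itself from being captured.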

get_git_changed_files(depth=1)

@param depth Number of commits to look back for changed files. Defaults to 1.

Runs four git commands to collect the full set of recently touched files:

- unstaged tracked changes (git diff --name-only)
- staged changes (git diff --name-only --cached)
- untracked new files (git ls-files --others --exclude-standard)
- files changed in the last depth commits (git diff --name-only HEAD~{depth} HEAD)

Returns a set of resolved absolute Path objects. Silently skips any command that fails (e.g. git not installed, not a repo).
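The four commands and the skip-on-failure behaviour could look roughly like this. It is a sketch under assumptions: the name git_changed_files_sketch and the cwd parameter are hypothetical, and paths are resolved relative to cwd, which is only correct when cwd is the repository root.

```python
import subprocess
from pathlib import Path
from typing import Set


def git_changed_files_sketch(depth: int = 1, cwd: str = ".") -> Set[Path]:
    """Collect recently touched files; any failing git command is skipped."""
    commands = [
        ["git", "diff", "--name-only"],                           # unstaged
        ["git", "diff", "--name-only", "--cached"],               # staged
        ["git", "ls-files", "--others", "--exclude-standard"],    # untracked
        ["git", "diff", "--name-only", f"HEAD~{depth}", "HEAD"],  # recent commits
    ]
    changed: Set[Path] = set()
    for cmd in commands:
        try:
            result = subprocess.run(
                cmd, cwd=cwd, capture_output=True, text=True, check=True
            )
        except (OSError, subprocess.CalledProcessError):
            # OSError: git missing or cwd invalid; CalledProcessError: not a
            # repo, or HEAD~depth does not exist yet.
            continue
        for line in result.stdout.splitlines():
            if line.strip():
                # Assumes cwd is the repository root.
                changed.add((Path(cwd) / line.strip()).resolve())
    return changed
```

Because results go into a set, a file reported by more than one command appears only once.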

crawl_git(directory, depth=1)

@param directory Root directory to restrict results to. Defaults to . via CLI.

@param depth Passed through to get_git_changed_files().

Git-aware alternative to crawl(). Calls get_git_changed_files() to determine which files to stage, then applies the same sensitivity, binary, and ignore-list filters as crawl(). Only files within directory are staged. Recommended for ongoing sessions where the wiki already exists — avoids re-queuing unchanged files.
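The "only files within directory" restriction can be sketched with Path.relative_to, which raises ValueError for paths outside the given root. The helper name within_directory is hypothetical, and the sensitivity, binary, and ignore-list filters are omitted here.

```python
from pathlib import Path


def within_directory(changed, directory):
    """Return the changed paths that live under directory, in sorted order."""
    root = Path(directory).resolve()
    kept = []
    for path in sorted(changed):
        try:
            path.resolve().relative_to(root)
        except ValueError:
            continue  # path lies outside the requested directory
        kept.append(path)
    return kept
```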

Module-level constants (edit in capture.py to customise for your project) include IGNORED_DIRS, SENSITIVE_NAMES, and SENSITIVE_SUFFIXES.

See ADR-005 for the rationale.

is_sensitive(file_path)

Combines SENSITIVE_NAMES, SENSITIVE_SUFFIXES, and heuristic keyword matching (secret, credential, password, passwd, apikey, api_key, token, private_key in the filename) to decide whether a file should be skipped during crawl.

Warning: IGNORED_DIRS, SENSITIVE_NAMES, SENSITIVE_SUFFIXES, and the keyword list in is_sensitive() are project-agnostic defaults. Projects with non-standard secret naming conventions should update these constants directly in capture.py.
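A minimal sketch of the three-part check is shown below. The keyword tuple is taken from the description above; the contents of SENSITIVE_NAMES and SENSITIVE_SUFFIXES are placeholders chosen for illustration, not the values shipped in capture.py, and case-insensitive substring matching is an assumption.

```python
from pathlib import Path

# Placeholder values; the real constants are defined in capture.py.
SENSITIVE_NAMES = {".env", "id_rsa"}
SENSITIVE_SUFFIXES = {".pem", ".key"}
# Keyword list as described above.
SENSITIVE_KEYWORDS = (
    "secret", "credential", "password", "passwd",
    "apikey", "api_key", "token", "private_key",
)


def is_sensitive_sketch(file_path) -> bool:
    """Return True if the filename matches a name, suffix, or keyword rule."""
    path = Path(file_path)
    name = path.name.lower()
    if name in SENSITIVE_NAMES:
        return True
    if path.suffix.lower() in SENSITIVE_SUFFIXES:
        return True
    # Substring match on the filename only, not on parent directories.
    return any(keyword in name for keyword in SENSITIVE_KEYWORDS)
```

With substring matching as sketched here, a harmless name such as tokenizer.py is also flagged because it contains "token". That errs on the side of skipping files, which is the safer failure mode for a capture tool.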

pop()

Reads and prints the first ### Capture [...] entry from the queue, removes it, and rewrites the file. If no entries remain, restores the *(Empty)* marker. Used by the LLM agent to process one entry at a time during synthesis.
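The pop logic can be sketched as a pure function over the queue text. The name pop_first_entry and the returned (entry, remaining) tuple are assumptions made for testability; the real pop() prints the entry and rewrites the file. The sketch splits on header lines, so captured content that itself contains a line starting with "### Capture [" would be split incorrectly.

```python
import re

ENTRY_HEADER = re.compile(r"^### Capture \[", re.MULTILINE)
EMPTY_MARKER = "*(Empty)*"


def pop_first_entry(queue_text: str):
    """Return (first_entry, remaining_text); first_entry is None if none exist."""
    starts = [match.start() for match in ENTRY_HEADER.finditer(queue_text)]
    if not starts:
        return None, queue_text
    # The first entry runs up to the next header, or to the end of the text.
    end = starts[1] if len(starts) > 1 else len(queue_text)
    entry = queue_text[starts[0]:end].strip()
    remaining = (queue_text[:starts[0]] + queue_text[end:]).strip()
    if not ENTRY_HEADER.search(remaining):
        # No entries left: restore the empty marker.
        remaining = (remaining + "\n\n" if remaining else "") + EMPTY_MARKER
    return entry, remaining + "\n"
```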

is_binary(file_path)

Heuristic check — reads the first 1024 bytes of a file and returns True if a null byte is found.
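The null-byte heuristic fits in a few lines. The sketch below uses a hypothetical name and a probe_size parameter for illustration, and does not handle unreadable files.

```python
def is_binary_sketch(file_path, probe_size: int = 1024) -> bool:
    """Return True if a null byte appears in the first probe_size bytes."""
    with open(file_path, "rb") as handle:
        return b"\x00" in handle.read(probe_size)
```

Two consequences follow from the heuristic itself: text encoded as UTF-16 usually contains null bytes and is treated as binary, and a file whose first null byte falls after the probe window is treated as text.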

CLI Usage

# Stage a specific file
python ouro/scripts/capture.py path/to/file.py

# Stage a raw architectural note
python ouro/scripts/capture.py "Decision: use composition over inheritance in the plugin loader."

# Crawl only git-changed files (recommended after initial setup)
python ouro/scripts/capture.py --crawl --git

# Include changes from the last N commits (here N = 3)
python ouro/scripts/capture.py --crawl --git 3

# Crawl the whole project (use for initial wiki population)
python ouro/scripts/capture.py --crawl

# Crawl a specific directory
python ouro/scripts/capture.py --crawl src/

# Pop the first entry from the queue (used during synthesis)
python ouro/scripts/capture.py --pop

Note: QUEUE_PATH is resolved relative to Path.cwd(), so the script must be run from the project root.

Known Gaps