Entity: ADR-005

Brief: capture.py --crawl now skips credential directories and sensitive files to prevent secrets from entering the capture queue.

Context

capture.py --crawl stages every readable text file it finds. With only a basic ignored_dirs set (.git, node_modules, etc.), a crawl of a real project would routinely pick up .env files, PEM keys, SSH private keys, AWS credentials, and Terraform state — writing them verbatim into the plaintext capture-queue.md.

This is a silent data-leak risk: the queue is committed to version control in some workflows, and is read by LLM agents that may log or cache context.

Decision

Three module-level constants (a set of sensitive directory names, a set of sensitive filename patterns, and the keyword list) and one guard function, is_sensitive(), were added to capture.py.
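A minimal sketch of the shape these additions take. The constant names, directory names, and extensions below are illustrative assumptions, not the literal capture.py source; the keyword list matches the one documented in the Note at the end of this ADR.

```python
from pathlib import Path

# Illustrative values -- review and adjust for your project (see README callout).
SENSITIVE_DIRS = {".ssh", ".aws", ".gnupg", "secrets"}
SENSITIVE_EXTENSIONS = {".pem", ".key", ".p12", ".tfstate"}
SENSITIVE_KEYWORDS = {
    "secret", "credential", "password", "passwd",
    "apikey", "api_key", "token", "private_key",
}

def is_sensitive(path: Path) -> bool:
    """Return True if a path looks like it holds credentials or secrets."""
    # Any ancestor directory with a sensitive name disqualifies the file.
    if any(part in SENSITIVE_DIRS for part in path.parts):
        return True
    name = path.name.lower()
    # .env, .env.local, etc., plus known key/credential/state extensions.
    if name.startswith(".env") or path.suffix.lower() in SENSITIVE_EXTENSIONS:
        return True
    # Keyword substring match on the filename (the false-positive source).
    return any(kw in name for kw in SENSITIVE_KEYWORDS)
```

Note that the check operates purely on the path, so it can run before the file is ever opened.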

The crawl loop calls is_sensitive() before is_binary() and before stage(). Skipped sensitive files are counted and reported in the crawl summary.
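The ordering matters: because is_sensitive() runs first and inspects only the path, a flagged file is never opened or read. A hedged sketch of the loop shape, with hypothetical stand-ins for the real helpers in capture.py:

```python
from pathlib import Path

# Hypothetical stand-ins for the real capture.py helpers.
def is_sensitive(path: Path) -> bool:
    return "secret" in path.name.lower()

def is_binary(path: Path) -> bool:
    # Unlike is_sensitive(), this has to read the file's bytes.
    return b"\x00" in path.read_bytes()[:1024]

staged_files: list[Path] = []

def stage(path: Path) -> None:
    staged_files.append(path)

def crawl(root: Path) -> tuple[int, int]:
    """Stage readable text files under root; skip sensitive files unopened."""
    staged = skipped_sensitive = 0
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if is_sensitive(path):        # checked first: file is never opened
            skipped_sensitive += 1
            continue
        if is_binary(path):           # only now do we touch the file's bytes
            continue
        stage(path)
        staged += 1
    # Counts surface in the crawl summary.
    return staged, skipped_sensitive
```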

ouro/README.md was updated with a callout directing users to review and adjust these constants for their project.

Alternatives Considered

Trade-offs

Rationale

Secure-by-default is the right posture for a tool that reads arbitrary project files into a plaintext log. The constants are clearly labeled and easy to edit; the README callout directs users to do so. The risk of silently excluding a non-sensitive file is far lower than the risk of silently capturing a secret.

Note: The keyword list in is_sensitive() (secret, credential, password, passwd, apikey, api_key, token, private_key) is the most likely source of false positives. Projects that use these words in non-secret filenames (e.g. token_counter.py) should remove the relevant keywords from the list.
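The false positive and its remedy can be shown concretely. The helper below is an illustrative reduction of the keyword check to its substring match; the real is_sensitive() also checks directories and extensions.

```python
from pathlib import Path

SENSITIVE_KEYWORDS = {
    "secret", "credential", "password", "passwd",
    "apikey", "api_key", "token", "private_key",
}

def name_is_sensitive(path: Path, keywords=SENSITIVE_KEYWORDS) -> bool:
    """Substring keyword match on the filename only (illustrative)."""
    name = path.name.lower()
    return any(kw in name for kw in keywords)

# "token" matches token_counter.py -- a false positive.
assert name_is_sensitive(Path("token_counter.py"))

# Removing the offending keyword restores the file to the crawl.
adjusted = SENSITIVE_KEYWORDS - {"token"}
assert not name_is_sensitive(Path("token_counter.py"), adjusted)
```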