Architecture

Lore combines markdown-first knowledge storage with an operational pipeline for ingest, compile, retrieval, and quality enforcement.

4-Layer Wiki

Index (wiki/index.md) — always consulted first
Articles (wiki/articles/*.md) — concept articles with backlinks and provenance
Derived (wiki/derived/) — Q&A answers, slides, charts
Assets (wiki/assets/) — local images

Supporting state lives in .lore/:

raw/: normalized ingest artifacts keyed by content hash
manifest.json: source-to-raw tracking, compile timestamps, extracted hashes
wiki/concepts.json: normalized concept metadata generated after compile
wiki/concepts/: per-concept detail pages (optional)
wiki/deprecated/: soft-deleted articles retained for audit
db.sqlite: FTS/backlink database
compile.lock: active compile mutex file

Repository Layout Snapshot

Path	Purpose
`.lore/raw/`	Source-derived extracted artifacts keyed by hash
`.lore/wiki/articles/`	Compiled concept pages with provenance
`.lore/wiki/deprecated/`	Soft-deleted articles (audit trail)
`.lore/wiki/index.md`	Canonical topic index
`.lore/wiki/concepts.json`	Concept index with aliases/tags/confidence
`.lore/wiki/derived/qa/`	Filed query answers
`.lore/db.sqlite`	FTS and link graph tables
`.lore/logs/`	JSONL command run logs

4-Phase Pipeline

Ingest — raw/ populated with extracted.md + meta.json
Compile — 6-step pipeline: diff → extract concepts → match → generate ops → apply → reindex
Query — Q&A via BFS/DFS traversal, filed to derived/qa/
Lint — orphans, gaps, ambiguous claims, and line-aware diagnostics surfaced

Operationally, these phases are idempotent and can be re-run incrementally.

Compile is hash-incremental by default: unchanged extracted content is skipped based on manifest.json extractedHash fields.
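
The hash check can be sketched as follows. This is a minimal illustration, assuming manifest.json maps each source key to an object with an extractedHash field and that the hash is SHA-256 of the extracted text; the manifest shape, the hash algorithm, and the function names are assumptions, not Lore's actual code.

```python
import hashlib


def content_hash(text: str) -> str:
    """Hash extracted text; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_compile(manifest: dict, source_key: str, extracted_text: str) -> bool:
    """Return True when the extracted content differs from the recorded hash."""
    entry = manifest.get(source_key, {})
    return entry.get("extractedHash") != content_hash(extracted_text)


# Hypothetical manifest with one tracked source.
manifest = {"notes/auth.md": {"extractedHash": content_hash("JWT auth notes")}}
unchanged = needs_compile(manifest, "notes/auth.md", "JWT auth notes")
changed = needs_compile(manifest, "notes/auth.md", "JWT auth notes, revised")
```

A source that is missing from the manifest is treated as changed, so first-time sources always compile.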

Pipeline Diagram

flowchart LR
    A[Ingest] --> B[Raw artifacts + manifest]
    B --> C[Compile]
    C --> D[Index rebuild + concepts]
    D --> E[Search and query]
    D --> F[Lint diagnostics]

Compile Sub-Pipeline

flowchart TD
    A[Raw sources] --> B{Diff: changed?}
    B -->|No| C[Skip]
    B -->|Yes| D[Extract Concepts]
    D --> E{Concepts?}
    E -->|No| F[Batch Create]
    E -->|Yes| G[Match to articles]
    G --> H[Generate Operations]
    H --> I[Apply to disk]
    F --> I
    I --> J[Reindex + concepts.json]

Provenance Model

Every article tracks which sources contributed to which lines. Two mechanisms:

Inline Provenance

Lines carry trailing HTML comments that list the contributing source hashes:

The auth service uses JWT. <!-- sources:abc123(extracted) def456(inferred) -->

When the LLM reads articles for matching, provenance comments are stripped. The LLM sees clean, numbered lines.
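
One way to produce that clean view is a pair of regular expressions over the comment format shown above. The sketch below is illustrative only: the function names and the exact numbered-line layout (a pilcrow followed by the line number) are assumptions.

```python
import re

# Matches the trailing comment format shown above, including leading whitespace.
PROVENANCE = re.compile(r"\s*<!--\s*sources:(.*?)-->")
# Matches one "hash(kind)" token inside the comment.
SOURCE = re.compile(r"(\w+)\((\w+)\)")


def parse_line(line: str) -> tuple[str, list[tuple[str, str]]]:
    """Split a line into clean text and a list of (hash, kind) pairs."""
    match = PROVENANCE.search(line)
    if match is None:
        return line, []
    sources = SOURCE.findall(match.group(1))
    return PROVENANCE.sub("", line), sources


def llm_view(article: str) -> str:
    """Strip provenance and prefix each line with a pilcrow line number."""
    clean = [parse_line(line)[0] for line in article.splitlines()]
    return "\n".join(f"¶{i} {text}" for i, text in enumerate(clean, start=1))
```

Keeping parsing and stripping in one function means the same pattern can serve both the LLM view and the search indexer.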

Cumulative References

Every article ends with a ## References section listing all source hashes ever merged:

## References
- abc123 (extracted)
- def456 (inferred)

This section is system-managed and hidden from LLM context.
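
Because the list is cumulative, a merge has to keep every earlier entry and append only unseen ones. A minimal sketch, assuming the entry format shown above; the helper name and the choice to preserve first-seen order are assumptions.

```python
import re

# Matches entries such as "- abc123 (extracted)".
ENTRY = re.compile(r"^- (\w+) \((\w+)\)$")


def merge_references(section: str, new_sources: list[tuple[str, str]]) -> str:
    """Return a References section that keeps old entries and appends new ones."""
    entries = dict.fromkeys(
        match.groups() for line in section.splitlines() if (match := ENTRY.match(line))
    )
    for source in new_sources:
        entries.setdefault(tuple(source))  # no-op when the entry already exists
    lines = ["## References"] + [f"- {sha} ({kind})" for sha, kind in entries]
    return "\n".join(lines)
```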

Related Section

The ## Related section is auto-generated from the [[wiki-links]] found in the article body.
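
A rough sketch of that generation step follows. The alias form `[[target|label]]`, the bullet layout of the output, and the function name are assumptions made for illustration.

```python
import re

# Captures the link target; an optional "|label" alias is ignored (assumed syntax).
WIKI_LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def related_section(body: str) -> str:
    """Render a ## Related block from unique wiki-link targets, first seen first."""
    seen: dict[str, None] = {}
    for target in WIKI_LINK.findall(body):
        seen.setdefault(target.strip(), None)
    lines = ["## Related"] + [f"- [[{target}]]" for target in seen]
    return "\n".join(lines)
```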

Operation Model

The LLM outputs line-level operations (JSON). Each operation carries a sources array for provenance:

Operation	Effect
`replace`	Replace one line
`insert-after`	Insert after a target line
`delete-range`	Remove lines (validated: start ≤ end)
`replace-range`	Replace a span of lines
`split`	Split article into two
`append-source`	Add sources to existing lines (no content change)
`soft-delete`	Move article to deprecated

Operations are applied sequentially per source. Line references use ¶ (pilcrow) prefixes to distinguish them from YAML line numbers.
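
To make the operation model concrete, here is a sketch of an applier for four of the operations. The JSON field names (op, line, start, end, text, lines), the 1-based numbering, and the rule that each operation sees the result of the previous one are all assumptions; only the operation names and the sources array come from the description above.

```python
def tag(text: str, sources: list[str]) -> str:
    """Append an inline provenance comment when sources are given."""
    return f"{text} <!-- sources:{' '.join(sources)} -->" if sources else text


def apply_op(lines: list[str], op: dict) -> list[str]:
    """Apply one line-level operation and return the new list of lines."""
    kind = op["op"]
    sources = op.get("sources", [])
    if kind == "replace":
        i = op["line"] - 1
        return lines[:i] + [tag(op["text"], sources)] + lines[i + 1:]
    if kind == "insert-after":
        i = op["line"]
        return lines[:i] + [tag(op["text"], sources)] + lines[i:]
    if kind in ("delete-range", "replace-range"):
        start, end = op["start"], op["end"]
        if start > end:  # mirrors the start > end validation
            raise ValueError(f"invalid span: start {start} > end {end}")
        new = [tag(t, sources) for t in op.get("lines", [])] if kind == "replace-range" else []
        return lines[:start - 1] + new + lines[end:]
    raise ValueError(f"unsupported operation in this sketch: {kind}")


article = ["Auth overview.", "Uses sessions.", "Legacy note."]
ops = [
    {"op": "replace", "line": 2, "text": "Uses JWT.", "sources": ["abc123(extracted)"]},
    {"op": "delete-range", "start": 3, "end": 3},
]
for op in ops:
    article = apply_op(article, op)
```

Returning a new list on every call keeps each step easy to validate before anything is written to disk.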

Compile Reliability Controls

Control	Purpose
PID lock file (`compile.lock`)	Prevent overlapping compile runs
Hash-based skipping	Avoid recompiling unchanged extracted content
Per-source retry (1 attempt)	Recover from malformed LLM output
Zero-concept skip	Sources without extractable concepts go to batch create
`start > end` validation	Range operations reject invalid spans
Post-compile index rebuild	Keep FTS/graph state aligned with article set
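
The lock-file control can be illustrated with an exclusive-create pattern. This is a sketch under assumptions: the function name is hypothetical, and stale-lock detection (checking whether the recorded PID is still alive) is deliberately left out.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def compile_lock(path: Path):
    """Hold an exclusive lock file containing this process's PID."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        holder = path.read_text().strip() or "unknown"
        raise RuntimeError(f"compile already running (pid {holder})")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield
    finally:
        # Release the lock even if the compile body raised.
        path.unlink(missing_ok=True)
```

A second caller fails fast instead of waiting, which suits a command-line tool where overlapping runs are usually a mistake.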

Ingest and Metadata Flow

Ingest writes .lore/raw/<sha>/extracted.md and .lore/raw/<sha>/meta.json.

Metadata can include:

canonical source identity
folder-derived topical tags
heuristic memory type tags
timestamps and provenance fields

Duplicate content is detected by hash and reuses existing raw entries.
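
A minimal sketch of this write-or-reuse step, assuming the hash is SHA-256 of the extracted text and that meta.json holds at least the source identity. The function name, the meta fields, and the hash algorithm are assumptions.

```python
import hashlib
import json
from pathlib import Path


def ingest(root: Path, source: str, extracted: str) -> tuple[str, bool]:
    """Write raw/<sha>/ artifacts under root; return (sha, created)."""
    sha = hashlib.sha256(extracted.encode("utf-8")).hexdigest()
    raw_dir = root / "raw" / sha
    if raw_dir.exists():
        return sha, False  # duplicate content: reuse the existing entry
    raw_dir.mkdir(parents=True)
    (raw_dir / "extracted.md").write_text(extracted, encoding="utf-8")
    meta = {"source": source, "sha": sha}
    (raw_dir / "meta.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
    return sha, True
```

Because the directory name is the content hash, ingesting the same text from two different paths resolves to one raw entry.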

Metadata tags can originate from path hints and memory-pattern heuristics to improve downstream classification and discovery.

Query Flow and Normalization

The query command uses hybrid retrieval that combines FTS candidates with graph context.

Question text normalization is optional and controlled by:

CLI flags: --normalize-question, --no-normalize-question
env default: LORE_QUERY_NORMALIZE

Normalization is intentionally conservative to avoid mutating technical tokens.
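
How the two flags and the environment default might interact is sketched below. The precedence (an explicit flag overrides the environment) and the accepted truthy values are assumptions; only the flag names and the variable name come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with a tri-state normalize option: True, False, or None (unset)."""
    parser = argparse.ArgumentParser(prog="query-sketch")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--normalize-question", dest="normalize",
                       action="store_const", const=True, default=None)
    group.add_argument("--no-normalize-question", dest="normalize",
                       action="store_const", const=False, default=None)
    return parser


def resolve_normalize(argv: list[str], env: dict) -> bool:
    """Explicit flag wins; otherwise fall back to LORE_QUERY_NORMALIZE."""
    args = build_parser().parse_args(argv)
    if args.normalize is not None:
        return args.normalize
    # Assumed truthy spellings; the real tool may accept others.
    return env.get("LORE_QUERY_NORMALIZE", "").lower() in {"1", "true", "yes"}
```

Passing the environment as a plain dict (for example `dict(os.environ)`) keeps the resolution logic easy to test.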

Retrieval sequence:

load index context
run FTS candidate selection
expand one-hop neighbors via link graph
synthesize answer through LLM
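
The neighbor-expansion step of this sequence can be sketched against a links table with the (from_slug, to_slug) columns described in this document. Treating edges as bidirectional at this stage and the function name are assumptions.

```python
import sqlite3


def expand_one_hop(conn: sqlite3.Connection, candidates: list[str]) -> list[str]:
    """Return candidates followed by their direct neighbors, without duplicates."""
    if not candidates:
        return []
    ordered = dict.fromkeys(candidates)  # preserves candidate order
    marks = ",".join("?" * len(candidates))
    rows = conn.execute(
        f"SELECT from_slug, to_slug FROM links "
        f"WHERE from_slug IN ({marks}) OR to_slug IN ({marks}) "
        f"ORDER BY from_slug, to_slug",
        candidates + candidates,
    )
    for from_slug, to_slug in rows:
        ordered.setdefault(from_slug)
        ordered.setdefault(to_slug)
    return list(ordered)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (from_slug TEXT, to_slug TEXT)")
conn.executemany(
    "INSERT INTO links VALUES (?, ?)",
    [("auth", "jwt"), ("jwt", "tokens"), ("billing", "auth")],
)
context = expand_one_hop(conn, ["auth"])
```

Here `tokens` is two hops from `auth`, so it stays out of the expanded context.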

Graph and Search Storage

SQLite structures:

fts: full-text index (slug, title, body) with ranking/snippets
links: conceptual edges (from_slug, to_slug)

lore path computes shortest conceptual paths via BFS over undirected adjacency derived from links.
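
A sketch of BFS over undirected adjacency, using plain (from_slug, to_slug) pairs in place of the database table. The function name and the alphabetical tie-breaking between equally short paths are assumptions.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Return the shortest list of slugs from start to goal, or None."""
    adjacency = {}
    for a, b in links:
        # Each directed row contributes an edge in both directions.
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None
```

BFS visits nodes in order of distance, so the first time the goal is dequeued the recorded path has the minimum number of edges.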

Index Integrity and Guardrails

Index rebuild can run in standard or repair mode:

standard: regenerate DB artifacts from manifest/raw state
repair: recover missing manifest entries from existing raw folders
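
Repair mode can be pictured as a scan of the raw folders for hashes the manifest does not know about. This is a sketch only: the manifest shape (source key mapped to an object with a sha field), the meta.json source field, and the function name are assumptions.

```python
import json
from pathlib import Path


def repair_manifest(root: Path, manifest: dict) -> list[str]:
    """Add manifest entries for raw folders that are not tracked yet."""
    tracked = {entry.get("sha") for entry in manifest.values()}
    recovered = []
    for meta_path in sorted((root / "raw").glob("*/meta.json")):
        sha = meta_path.parent.name
        if sha in tracked:
            continue
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        # Fall back to the hash itself when meta.json has no source identity.
        manifest[meta.get("source", sha)] = {"sha": sha}
        recovered.append(sha)
    return recovered
```

Running the repair twice recovers nothing the second time, which matches the idempotent behavior expected of the pipeline.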

Backlink indexing filters low-signal wiki-link targets (for example stopword-only links) to reduce graph noise.
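
A stopword-only filter might look like the sketch below. The stopword list, the tokenization rule, and the function names are assumptions chosen for illustration.

```python
import re

# Small illustrative stopword list; a real list would be longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def is_low_signal(target: str) -> bool:
    """True when a link target is empty or made up only of stopwords."""
    words = re.findall(r"[a-z0-9]+", target.lower())
    return not words or all(word in STOPWORDS for word in words)


def filter_links(edges):
    """Drop (from_slug, to_slug) edges whose target is low-signal."""
    return [(src, dst) for src, dst in edges if not is_low_signal(dst)]
```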

Provenance comments are stripped from article bodies before FTS indexing so they do not pollute search results.

Guardrail benefits:

fewer noisy edges in concept graph
cleaner lint gap/orphan signals
more stable path and neighbor exploration

MCP Maintenance Surface

The MCP server exposes maintenance and diagnostics tools for automation loops, including:

duplicate checks before ingest
raw tag distribution summaries
orphan/gap/ambiguity lint summaries
index rebuild and repair triggers

SQLite Schema

fts — FTS5 virtual table (slug, title, body) with Porter stemming
links — backlinks graph (from_slug, to_slug)
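
The two structures can be reproduced in an in-memory database to see the effect of Porter stemming. This sketch assumes the local SQLite build includes FTS5; the column constraints and the sample row are illustrative, not taken from Lore.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with the Porter stemming tokenizer.
conn.execute("CREATE VIRTUAL TABLE fts USING fts5(slug, title, body, tokenize='porter')")
conn.execute("CREATE TABLE links (from_slug TEXT NOT NULL, to_slug TEXT NOT NULL)")
conn.execute(
    "INSERT INTO fts VALUES (?, ?, ?)",
    ("auth", "Auth Service", "The service issues signed tokens."),
)
# The singular query term matches the plural "tokens" because both share a stem.
rows = conn.execute(
    "SELECT slug, snippet(fts, 2, '[', ']', '...', 8) FROM fts "
    "WHERE fts MATCH ? ORDER BY rank",
    ("token",),
).fetchall()
```

The `snippet` call targets column index 2 (body) and wraps the matched term in brackets, which is one way to produce the ranking and snippet output mentioned earlier.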
