# Architecture
Lore combines markdown-first knowledge storage with an operational pipeline for ingest, compile, retrieval, and quality enforcement.
## 4-Layer Wiki

- **Index** (`wiki/index.md`) — always consulted first
- **Articles** (`wiki/articles/*.md`) — concept articles with backlinks and provenance
- **Derived** (`wiki/derived/`) — Q&A answers, slides, charts
- **Assets** (`wiki/assets/`) — local images
Supporting state lives in `.lore/`:

- `raw/` — normalized ingest artifacts keyed by content hash
- `manifest.json` — source-to-raw tracking, compile timestamps, extracted hashes
- `wiki/concepts.json` — normalized concept metadata generated after compile
- `wiki/concepts/` — per-concept detail pages (optional)
- `wiki/deprecated/` — soft-deleted articles retained for audit
- `db.sqlite` — FTS/backlink database
- `compile.lock` — active compile mutex file
## Repository Layout Snapshot
| Path | Purpose |
|---|---|
| `.lore/raw/` | Source-derived extracted artifacts keyed by hash |
| `.lore/wiki/articles/` | Compiled concept pages with provenance |
| `.lore/wiki/deprecated/` | Soft-deleted articles (audit trail) |
| `.lore/wiki/index.md` | Canonical topic index |
| `.lore/wiki/concepts.json` | Concept index with aliases/tags/confidence |
| `.lore/wiki/derived/qa/` | Filed query answers |
| `.lore/db.sqlite` | FTS and link graph tables |
| `.lore/logs/` | JSONL command run logs |
## 4-Phase Pipeline

- **Ingest** — `raw/` populated with `extracted.md` + `meta.json`
- **Compile** — 6-step pipeline: diff → extract concepts → match → generate ops → apply → reindex
- **Query** — Q&A via BFS/DFS traversal, filed to `derived/qa/`
- **Lint** — surfaces orphans, gaps, ambiguous claims, and line-aware diagnostics
Operationally, these phases are idempotent and can be re-run incrementally.
Compile is hash-incremental by default: unchanged extracted content is skipped based on the `extractedHash` fields in `manifest.json`.
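As a minimal sketch of the hash-based skip check (the manifest field layout here is illustrative; only the `extractedHash` field name comes from the document):

```python
import hashlib

def needs_recompile(manifest: dict, source_id: str, extracted_text: str) -> bool:
    """Return True when the current content hash differs from the
    extractedHash recorded in manifest.json, i.e. recompile is needed."""
    current = hashlib.sha256(extracted_text.encode("utf-8")).hexdigest()
    recorded = manifest.get(source_id, {}).get("extractedHash")
    return current != recorded

# A source whose recorded hash matches its current content is skipped.
manifest = {"doc-1": {"extractedHash": hashlib.sha256(b"same text").hexdigest()}}
print(needs_recompile(manifest, "doc-1", "same text"))    # False -> skip
print(needs_recompile(manifest, "doc-1", "edited text"))  # True  -> recompile
```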
## Pipeline Diagram

```mermaid
flowchart LR
    A[Ingest] --> B[Raw artifacts + manifest]
    B --> C[Compile]
    C --> D[Index rebuild + concepts]
    D --> E[Search and query]
    D --> F[Lint diagnostics]
```
## Compile Sub-Pipeline

```mermaid
flowchart TD
    A[Raw sources] --> B{Diff: changed?}
    B -->|No| C[Skip]
    B -->|Yes| D[Extract Concepts]
    D --> E{Concepts?}
    E -->|No| F[Batch Create]
    E -->|Yes| G[Match to articles]
    G --> H[Generate Operations]
    H --> I[Apply to disk]
    F --> I
    I --> J[Reindex + concepts.json]
```
## Provenance Model
Every article tracks which sources contributed to which lines. Two mechanisms:
### Inline Provenance

Lines carry `<!-- sources:HASH(CONFIDENCE) -->` comments:

```
The auth service uses JWT. <!-- sources:abc123(extracted) def456(inferred) -->
```
When the LLM reads articles for matching, provenance comments are stripped. The LLM sees clean, numbered lines.
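The stripping step can be sketched as a regex pass that removes provenance comments and emits numbered lines (the regex and numbering format are assumptions, not the tool's exact implementation):

```python
import re

# Matches inline provenance comments such as
# "<!-- sources:abc123(extracted) def456(inferred) -->".
PROVENANCE_RE = re.compile(r"\s*<!--\s*sources:[^>]*-->")

def strip_provenance(lines):
    """Return clean, numbered lines as the LLM would see them."""
    return [
        f"{i}: {PROVENANCE_RE.sub('', line).rstrip()}"
        for i, line in enumerate(lines, start=1)
    ]

article = ["The auth service uses JWT. <!-- sources:abc123(extracted) def456(inferred) -->"]
print(strip_provenance(article))  # ['1: The auth service uses JWT.']
```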
### Cumulative References

Every article ends with a `## References` section listing all source hashes ever merged:

```
## References
- abc123 (extracted)
- def456 (inferred)
```
This section is system-managed and hidden from LLM context.
## Related Section

A `## Related` section is auto-generated from `[[wiki-links]]` found in the article body.
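Extracting link targets for the Related section might look like the following sketch (the regex, and the handling of piped `[[target|label]]` links, are assumptions):

```python
import re

# Captures the target of [[target]] or [[target|label]] wiki-links.
WIKI_LINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")

def related_targets(body: str) -> list[str]:
    """Collect unique wiki-link targets in order of first appearance."""
    seen, out = set(), []
    for target in WIKI_LINK_RE.findall(body):
        t = target.strip()
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

body = "JWT flows live in [[auth-service]] and [[token-rotation]]; see [[auth-service]] again."
print(related_targets(body))  # ['auth-service', 'token-rotation']
```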
## Operation Model
The LLM outputs line-level operations (JSON). Each operation carries a sources array for provenance:
| Operation | Effect |
|---|---|
| `replace` | Replace one line |
| `insert-after` | Insert after a target line |
| `delete-range` | Remove lines (validated: start ≤ end) |
| `replace-range` | Replace a span of lines |
| `split` | Split article into two |
| `append-source` | Add sources to existing lines (no content change) |
| `soft-delete` | Move article to deprecated |
Operations are applied sequentially per source. Line references use a `¶` (pilcrow) prefix to distinguish them from YAML line numbers.
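A minimal sketch of sequential application for three of the operations above (the JSON field names `op`, `line`, `start`, `end`, and `text` are illustrative; `sources` handling is omitted):

```python
def apply_operations(lines: list[str], ops: list[dict]) -> list[str]:
    """Apply line-level operations in order. Line numbers are 1-based."""
    for op in ops:
        kind = op["op"]
        if kind == "replace":
            lines[op["line"] - 1] = op["text"]
        elif kind == "insert-after":
            lines.insert(op["line"], op["text"])
        elif kind == "delete-range":
            start, end = op["start"], op["end"]
            if start > end:  # validated: start must not exceed end
                raise ValueError("invalid range: start > end")
            del lines[start - 1:end]
        else:
            raise ValueError(f"unknown operation: {kind}")
    return lines

ops = [
    {"op": "replace", "line": 2, "text": "BETA"},
    {"op": "insert-after", "line": 3, "text": "delta"},
    {"op": "delete-range", "start": 1, "end": 1},
]
print(apply_operations(["alpha", "beta", "gamma"], ops))  # ['BETA', 'gamma', 'delta']
```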
## Compile Reliability Controls
| Control | Purpose |
|---|---|
| PID lock file (`compile.lock`) | Prevent overlapping compile runs |
| Hash-based skipping | Avoid recompiling unchanged extracted content |
| Per-source retry (1 attempt) | Recover from malformed LLM output |
| Zero-concept skip | Sources without extractable concepts go to batch create |
| `start > end` validation | Range operations reject invalid spans |
| Post-compile index rebuild | Keep FTS/graph state aligned with article set |
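The PID lock file can be sketched with atomic exclusive creation (a sketch under assumptions: the real lock may also validate whether the recorded PID is still alive):

```python
import os
import tempfile

def acquire_compile_lock(path: str) -> bool:
    """Try to take the compile mutex. O_CREAT|O_EXCL makes creation
    atomic, so only one compile run can hold the lock at a time."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))  # record the holder's PID
    return True

def release_compile_lock(path: str) -> None:
    os.remove(path)

lock = os.path.join(tempfile.mkdtemp(), "compile.lock")
print(acquire_compile_lock(lock))  # True: first caller wins
print(acquire_compile_lock(lock))  # False: lock already held
release_compile_lock(lock)
```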
## Ingest and Metadata Flow

Ingest writes `.lore/raw/<sha>/extracted.md` and `.lore/raw/<sha>/meta.json`.
Metadata can include:
- canonical source identity
- folder-derived topical tags
- heuristic memory type tags
- timestamps and provenance fields
Duplicate content is detected by hash and reuses existing raw entries.
Metadata tags can originate from path hints and memory-pattern heuristics to improve downstream classification and discovery.
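Duplicate detection by content hash can be sketched as a lookup against existing `raw/` entries (sha256 and the exact folder naming are assumptions):

```python
import hashlib
import tempfile
from pathlib import Path

def ingest_key(content: bytes) -> str:
    """Content hash used as the raw/ folder name."""
    return hashlib.sha256(content).hexdigest()

def is_duplicate(raw_root: Path, content: bytes) -> bool:
    """A source whose hash already exists under raw/ reuses that entry."""
    return (raw_root / ingest_key(content)).exists()

root = Path(tempfile.mkdtemp())
(root / ingest_key(b"same doc")).mkdir()        # simulate an existing raw entry
print(is_duplicate(root, b"same doc"))   # True: reuse existing raw entry
print(is_duplicate(root, b"new doc"))    # False: ingest fresh
```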
## Query Flow and Normalization

`query` uses hybrid retrieval combining FTS and graph context.
Question text normalization is optional and controlled by:

- CLI flags: `--normalize-question`, `--no-normalize-question`
- env default: `LORE_QUERY_NORMALIZE`
Normalization is intentionally conservative to avoid mutating technical tokens.
Retrieval sequence:
- load index context
- run FTS candidate selection
- expand one-hop neighbors via link graph
- synthesize answer through LLM
## Graph and Search Storage

SQLite structures:

- `fts` — full-text index (`slug`, `title`, `body`) with ranking/snippets
- `links` — conceptual edges (`from_slug`, `to_slug`)

`lore path` computes shortest conceptual paths via BFS over undirected adjacency derived from `links`.
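The BFS described above can be sketched as follows (slugs and edges are made up for illustration):

```python
from collections import deque

def shortest_path(edges, start, goal):
    """BFS over undirected adjacency derived from (from_slug, to_slug) rows."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)  # treat edges as undirected
    queue, prev = deque([start]), {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk predecessors back to start
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no conceptual path

edges = [("auth", "jwt"), ("jwt", "rotation"), ("auth", "sessions")]
print(shortest_path(edges, "sessions", "rotation"))  # ['sessions', 'auth', 'jwt', 'rotation']
```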
## Index Integrity and Guardrails
Index rebuild can run in standard or repair mode:
- standard: regenerate DB artifacts from manifest/raw state
- repair: recover missing manifest entries from existing raw folders
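Repair mode can be sketched as a scan that backfills manifest entries from existing `raw/` folders (the placeholder entry shape is purely illustrative):

```python
import tempfile
from pathlib import Path

def repair_manifest(manifest: dict, raw_root: Path) -> dict:
    """Any raw/<hash>/ folder missing from the manifest gets a
    minimal recovered entry; existing entries are left untouched."""
    for entry in raw_root.iterdir():
        if entry.is_dir() and entry.name not in manifest:
            manifest[entry.name] = {"recovered": True}
    return manifest

root = Path(tempfile.mkdtemp())
for h in ("abc123", "def456"):
    (root / h).mkdir()  # simulate existing raw folders

m = repair_manifest({"abc123": {"recovered": False}}, root)
print(sorted(m))  # ['abc123', 'def456']
```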
Backlink indexing filters low-signal wiki-link targets (for example stopword-only links) to reduce graph noise.
Provenance comments are stripped from article bodies before FTS indexing so they do not pollute search results.
Guardrail benefits:
- fewer noisy edges in concept graph
- cleaner lint gap/orphan signals
- more stable path and neighbor exploration
## MCP Maintenance Surface
The MCP server exposes maintenance and diagnostics tools for automation loops, including:
- duplicate checks before ingest
- raw tag distribution summaries
- orphan/gap/ambiguity lint summaries
- index rebuild and repair triggers
## SQLite Schema

- `fts` — FTS5 virtual table (`slug`, `title`, `body`) with Porter stemming
- `links` — backlinks graph (`from_slug`, `to_slug`)
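A minimal sketch of this schema using Python's `sqlite3` (the exact DDL in the real tool may differ; the sample row is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# FTS5 virtual table with Porter stemming, mirroring the documented columns.
conn.execute(
    "CREATE VIRTUAL TABLE fts USING fts5(slug, title, body, tokenize='porter')"
)
conn.execute("CREATE TABLE links (from_slug TEXT, to_slug TEXT)")

conn.execute(
    "INSERT INTO fts VALUES ('auth-service', 'Auth Service', "
    "'Uses JWT tokens for authentication')"
)
conn.execute("INSERT INTO links VALUES ('auth-service', 'jwt')")

# Porter stemming lets 'authenticate' match 'authentication'.
rows = conn.execute("SELECT slug FROM fts WHERE fts MATCH 'authenticate'").fetchall()
print(rows)  # [('auth-service',)]
```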