LLM Models

Lore uses OpenRouter for core LLM operations (for example compile, query, explain, and angela decision capture). Configure the model in .lore/config.json:

{
  "model": "deepseek/deepseek-v4-pro",
  "temperature": 0.3,
  "maxTokens": 4096
}

maxTokens is optional. If set, Lore sends max_tokens with that value. If unset, Lore omits max_tokens and relies on the provider/model default output limit.

Model Selection Guide

Use this as a practical starting point, then tune based on your own latency/cost/quality goals.

Workload	Suggested model style	Why
Large multi-document compile runs	Long-context, cost-efficient model	Better tolerance for larger source windows
Interactive query/explain loops	Balanced quality/speed model	Fast iteration while preserving answer quality
Decision capture (Angela)	High instruction-following model	Cleaner concise decision summaries

Suggested Defaults by Intent

Intent	Example config
Stability-focused	`temperature: 0.2`, set `maxTokens`
Exploration-focused	`temperature: 0.4-0.6`, optional `maxTokens`
Cost control	Lower `maxTokens`, run incremental compile frequently

Recommended Models

deepseek/deepseek-v4-pro -- strong long-context and multilingual performance
openai/gpt-4o -- strong quality/speed balance
anthropic/claude-3.5-sonnet -- strong reasoning
google/gemini-pro-1.5 -- large context window

Configuration Examples

Balanced default

{
  "model": "openai/gpt-4o",
  "temperature": 0.3,
  "maxTokens": 4096
}

Long-context compile emphasis

{
  "model": "deepseek/deepseek-v4-pro",
  "temperature": 0.2
}

Creative synthesis emphasis

{
  "model": "anthropic/claude-3.5-sonnet",
  "temperature": 0.6,
  "maxTokens": 6000
}

Temperature and maxTokens Tuning

Setting	Lower value effect	Higher value effect
`temperature`	More deterministic outputs	More diverse wording and structure
`maxTokens`	Tighter responses, lower output cost	Longer responses, higher output cost

Compile reliability note:

If a compile batch response is truncated (finish_reason=length), Lore retries with smaller batches.
Setting maxTokens too low can increase retry frequency on larger source sets.

Replicate Models

cuuupid/marker -- PDF/document extraction (.pdf, .docx, .pptx, .xlsx, .epub)
yorickvp/llava-13b -- image OCR/captioning (.png, .jpg, .jpeg, .webp, .gif, .bmp)

Replicate models are used for ingest parsing, not for compile/query/explain generation.

Notes

OpenRouter model choice is per-repo (.lore/config.json).
OpenRouter credentials can be set via lore settings set openrouterApiKey <value> --scope global or TELEPAT_OPENROUTER_KEY.
Replicate credentials can be set via lore settings set replicateApiToken <value> --scope global or TELEPAT_REPLICATE_TOKEN.
Environment variables take precedence over stored settings at runtime.

Model Selection Guide​

Suggested Defaults by Intent​

Recommended Models​

Configuration Examples​

Balanced default​

Long-context compile emphasis​

Creative synthesis emphasis​

Temperature and maxTokens Tuning​

Replicate Models​

Notes​

Related Docs​