Tier 2 - The Built-In Knowledge Base
When your context requirements extend beyond a single session or a single document, both OpenWebUI and LibreChat within GLBNXT Workspace support the creation of a persistent knowledge base through a separate configuration step. This allows you to build a collection of documents that the model can draw on across multiple conversations, without requiring you to re-upload or re-paste content each time.
What a Knowledge Base Does
A knowledge base stores a collection of documents in a processed form that allows relevant passages to be retrieved automatically when you ask a question. Rather than loading every document in full into the context window for every query, the system identifies which parts of the stored collection are most relevant to your current query and injects only those passages into the context.
This process is called Retrieval-Augmented Generation, or RAG. At a conceptual level, it works in three stages:
Indexing. When you add a document to the knowledge base, it is split into chunks of a defined size, and each chunk is converted into a numerical vector representation called an embedding. The embedding encodes the semantic content of the chunk in a high-dimensional space, such that chunks with similar meaning are positioned close together in that space.
Retrieval. When you submit a query, the same embedding process is applied to your query text. The system then searches the vector store for chunks whose embeddings are closest to the query embedding, measured by cosine similarity or a similar distance metric. The top-ranked chunks are selected as the retrieved context.
Generation. The retrieved chunks are prepended to the model's context window along with your query, and the model generates a response grounded in the retrieved content.
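The three stages above can be sketched in miniature. The bag-of-words "embedding" below is a deliberately crude stand-in for the neural embedding models these platforms actually use, and the function names, chunk size, and toy documents are illustrative, not OpenWebUI or LibreChat internals:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a word-count vector. Real systems use neural
    # embedding models; this only illustrates the pipeline's shape.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def index(documents: list[str], chunk_size: int = 8) -> list[tuple[str, Counter]]:
    # Indexing: split each document into fixed-size word chunks, embed each.
    store = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((chunk, embed(chunk)))
    return store

def retrieve(query: str, store: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    # Retrieval: rank stored chunks by similarity to the query embedding,
    # keep only the top-K.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

store = index([
    "The retention period for transaction records is seven years.",
    "Onboarding workflow: new staff receive accounts on day one.",
])
context = retrieve("how long are transaction records retained", store)

# Generation: the retrieved chunks are prepended to the prompt that is
# actually sent to the model.
prompt = "Context:\n" + "\n".join(context) + \
    "\n\nQuestion: how long are transaction records retained?"
```

Note that retrieval here succeeds only because the query and the relevant chunk share surface terms; a real embedding model would also match paraphrases, which is the point of the semantic-search discussion below.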
Understanding this process at a conceptual level is important because it clarifies why the knowledge base sometimes fails to surface the information you expected. The retrieval step is a semantic similarity search, not a keyword search and not a logical inference. It finds chunks that are semantically close to your query as expressed, not chunks that contain the information you need as you understand it. The distinction matters when your query and the relevant document use different terminology for the same concept.
Adding Documents to the Knowledge Base
In both OpenWebUI and LibreChat, documents are added to the knowledge base through the configuration interface rather than through the conversation input. The specific navigation path varies between versions, but the pattern is consistent: navigate to the knowledge or documents section of the settings, create a collection, and add files to it. That collection is then available to select as a context source when starting or continuing a conversation.
The most important discipline at this stage is document quality. The knowledge base is only as good as what you put into it. Several properties of the source documents directly affect retrieval quality:
Text extractability. PDFs generated from scanned images without OCR processing contain no extractable text. The knowledge base cannot index content it cannot read. Ensure documents are text-based before adding them.
Structural clarity. Documents with clear section headings, consistent formatting, and well-defined logical units produce better chunks than dense, unstructured text. The chunking process is largely mechanical, and well-structured source material gives it more useful natural break points to work with.
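To see why structure helps, compare a heading-aware splitter with purely fixed-interval splitting: the former keeps each section's heading together with its body text, so every chunk is self-describing. The sketch below is illustrative only; the actual chunkers in these platforms are configurable and more sophisticated:

```python
def chunk_by_headings(text: str) -> list[str]:
    # Split a document at markdown-style headings so each chunk is one
    # self-contained section rather than an arbitrary slice of text.
    chunks, current = [], []
    for line in text.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = (
    "# Retention\nRecords are kept seven years.\n"
    "# Access\nOnly auditors may read them."
)
sections = chunk_by_headings(doc)
# Each section carries its own heading, so a retrieved chunk
# arrives with the context needed to interpret it.
```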
Terminological consistency. If the same concept is referred to by different names in different documents, retrieval for any single term will miss the documents using alternative terminology. Where possible, normalising terminology across a document collection before indexing improves recall significantly.
Writing Effective Retrieval Queries
The retrieval step in a RAG pipeline matches your query semantically against indexed chunks. Writing queries that surface the right content is therefore part of the skill of working with a knowledge base.
Several principles govern effective retrieval queries:
Use the terminology present in the source documents. If your documents refer to a process as "onboarding workflow" and your query uses "new employee setup," the semantic similarity between the two phrasings may or may not be sufficient for retrieval. Where you know the terminology used in your documents, mirror it in your queries.
Be specific rather than general. A general query such as "tell me about our data policy" will retrieve chunks from across many documents. A specific query such as "what is the retention period for customer transaction records under our data retention policy" retrieves a much smaller, more relevant set of chunks.
Decompose multi-part questions. If your query has several distinct sub-questions, each drawing on different parts of the knowledge base, consider submitting them as separate queries rather than a single compound question. Each sub-question will retrieve more targeted content.
Instruct the model to ground its response in retrieved content. Including an instruction such as "base your response only on the documents provided" or "if the retrieved documents do not contain sufficient information to answer this question, say so explicitly" prevents the model from supplementing retrieved context with parametric knowledge that may be less accurate or less relevant.
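A grounding instruction of this kind can be wrapped around the retrieved chunks in a simple prompt template. This is a minimal sketch, not a built-in platform feature; the function name and exact wording are illustrative:

```python
def grounded_prompt(retrieved_chunks: list[str], question: str) -> str:
    # Prepend an explicit grounding instruction so the model declines to
    # answer rather than falling back on its parametric knowledge.
    context = "\n\n".join(retrieved_chunks)
    return (
        "Base your response only on the documents provided below. "
        "If they do not contain sufficient information to answer the "
        "question, say so explicitly.\n\n"
        f"Documents:\n{context}\n\n"
        f"Question: {question}"
    )
```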
Honest Limitations of Built-In RAG
The built-in knowledge base in OpenWebUI and LibreChat is a capable tool for persistent, multi-document context management. It is not a perfect one, and understanding its limitations allows you to design around them rather than being surprised by them.
Chunk boundary failures. The chunking process splits documents at fixed intervals or at detected structural boundaries. Information that spans a chunk boundary may be split across two chunks, neither of which individually contains enough context to answer the query. This is a fundamental limitation of chunked retrieval and is most pronounced for information that is distributed across a document rather than concentrated in a single passage.
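Where the platform exposes chunking parameters, a common mitigation is overlapping chunks: a fact that straddles one boundary still appears whole in a neighbouring window. The sketch below assumes word-level chunking with hypothetical sizes, purely to show the mechanism:

```python
def chunk_with_overlap(words: list[str], size: int = 6, overlap: int = 2) -> list[str]:
    # Sliding windows of `size` words, each starting `size - overlap`
    # words after the last, so adjacent chunks share `overlap` words.
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

words = "the retention period for customer records is seven years".split()
chunks = chunk_with_overlap(words)
# Without overlap, "records" and "is seven years" would land in
# different chunks; with overlap, the second chunk contains the
# whole phrase "customer records is seven years".
```

Overlap trades index size for robustness: each fact is stored more than once, but fewer facts are severed at a boundary.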
Top-K retrieval limits. The retrieval step returns a fixed number of top-ranked chunks, often between three and ten depending on configuration. If the relevant information is not in the top-K results, it will not reach the model. Improving retrieval coverage requires either tuning the retrieval parameters, improving the source documents, or accepting that some queries will require manual context supplementation.
Semantic gap between query and content. As noted above, retrieval is a semantic similarity operation. It does not understand the logical relationship between your query and the document content. A query about the consequences of a policy will not reliably retrieve the policy itself unless the policy document discusses its own consequences explicitly.
No cross-document reasoning at retrieval time. The retrieval step operates on individual chunks. It does not synthesise information across multiple chunks before presenting it to the model. If answering your query requires combining information from three separate documents, all three relevant chunks must independently rank highly enough to be retrieved. There is no retrieval-time reasoning that connects them.