
AI Agency Osnabrück Automation & Workflows BFSG-compliant web design Westerkappeln · Region within 100 km of Osnabrück · Germany-wide

DigElite Chatbots · Knowledge base · RAG

Chatbot with its own knowledge base (RAG).

The knowledge base of a DigElite chatbot is built from the customer's own documents (PDF manuals, Word files, website content, FAQs, bylaws, and OZG service descriptions) and queried through a Retrieval-Augmented Generation (RAG) layer. Each answer can optionally cite its source (document and section). If the system cannot find a sufficiently similar source, the chatbot honestly states "I have no information on that" instead of speculating; the similarity threshold that triggers this response is configurable per application.

What RAG can technically achieve

Answers from your documents — not from the model knowledge.

Retrieval-Augmented Generation is the architectural answer to the hallucination problem of classic chatbots. Instead of letting the language model answer from what it learned in training, we first search the customer documents for the relevant sections, pass them to the model, and let it formulate the answer from those sections.

1 — Indexing

Documents are broken down into sections (chunks), semantically encoded as vector embeddings, and stored in a local vector database (e.g., PostgreSQL with pgvector, Qdrant, Chroma). Everything is hosted on the customer's server.
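A minimal sketch of the chunking step, assuming plain text has already been extracted from the source file. Fixed word windows with overlap are one common strategy (sentence- or heading-aware splitting is another); the `Chunk` type, the `chunk_document` function and the default sizes are illustrative assumptions, not DigElite's actual settings.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str    # source document, e.g. its file name
    position: int  # index of the chunk within the document
    text: str


def chunk_document(doc_id: str, text: str, size: int = 200, overlap: int = 40) -> list[Chunk]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words.

    Hypothetical helper with illustrative defaults.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Stop once the remaining words are already covered by the previous window's overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [Chunk(doc_id, i, " ".join(words[s:s + size])) for i, s in enumerate(starts)]
```

Each chunk would then be embedded and written to the vector database together with `doc_id` and `position`; keeping that metadata is what later makes a source citation (document and section) possible.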

2 — Retrieval

For each query, the question is converted into an embedding and compared with the embeddings stored in the vector database. The most similar sections (typically 3–5) are compiled as context for the answer.
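The retrieval step, sketched with a deliberately simple stand-in: `embed` here is a hashed bag-of-words vector, not a semantic embedding model, so it only matches shared words. In a real deployment an embedding model produces the vectors and the vector database (e.g. pgvector or Qdrant) performs the nearest-neighbour search; `embed`, `retrieve` and `k=4` are illustrative assumptions.

```python
import hashlib
import math
import re


def embed(text: str, dim: int = 256) -> list[float]:
    """Toy stand-in for an embedding model: hashed bag of words, L2-normalised."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        # Deterministic bucket per word (Python's built-in hash() is salted per process).
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(question: str, index: list[tuple[str, list[float]]], k: int = 4) -> list[tuple[float, str]]:
    """Return the k (cosine similarity, chunk text) pairs most similar to the question."""
    q = embed(question)
    # Both vectors are normalised, so the dot product equals the cosine similarity.
    scored = [(sum(a * b for a, b in zip(q, vec)), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Usage: build the index once with `index = [(t, embed(t)) for t in chunk_texts]`, then call `retrieve(question, index)` per query.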

3 — Response Generation

The LLM (Aleph Alpha, Mistral, Llama) receives the question together with the retrieved sections and writes an answer based solely on them. The sources are passed along with the answer: traceability is built in, not added later.
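How the retrieved sections might be packed into the model input so that the answer can cite them. The instruction wording and the `source`/`text` fields are a hypothetical schema for illustration, not DigElite's actual prompt.

```python
def build_prompt(question: str, sections: list[dict]) -> str:
    """Assemble a grounded prompt from retrieved sections.

    Each section is a dict with hypothetical keys 'source' (document + section)
    and 'text'. Sections are numbered so the model can cite them as [1], [2], ...
    """
    context = "\n\n".join(
        f"[{i}] ({s['source']})\n{s['text']}" for i, s in enumerate(sections, start=1)
    )
    return (
        "Answer the question using ONLY the numbered sections below. "
        "Cite the section numbers you used, e.g. [1]. "
        "If the sections do not contain the answer, say so.\n\n"
        f"Sections:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the sections lets the application map citations in the model's reply back to the stored document and section metadata.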

Which document formats work?

From what you already have.

  • PDF — Manuals, user guides, white papers, OZG service descriptions, statutes.
  • Word / RTF — internal documentation, contribution regulations, office FAQs.
  • Excel / CSV — structured tables (e.g. contribution levels, event schedules).
  • Website content — crawled, or taken directly from WordPress (posts, pages, custom post types).
  • Markdown / Plain Text — Wiki content, Confluence exports, GitHub documentation.
  • API sources — optional connection to existing knowledge APIs (e.g., internal CRM data, product databases).
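Tables need one extra step before chunking: a bare row such as `reduced,30` means little without its header. One approach, sketched here under the assumption of a CSV export with a header line, turns every row into a self-describing text chunk. The helper name, file name and column names are made up for illustration.

```python
import csv
import io


def csv_rows_to_chunks(csv_text: str, source: str) -> list[dict]:
    """Turn each table row into a text chunk of the form "column: value; ...".

    Hypothetical helper; empty cells are skipped so they add no noise.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    chunks = []
    for row_no, row in enumerate(reader, start=1):
        text = "; ".join(f"{col}: {val}" for col, val in row.items() if val)
        chunks.append({"source": f"{source}, data row {row_no}", "text": text})
    return chunks
```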

Hallucination protection

"I have no information on that": a feature, not a bug.

The biggest risk of traditional AI chatbots is that they freely invent answers ("hallucination"). With DigElite, the answer generator is strictly constrained: it may only respond if the retrieval layer finds matching sources. Below a configurable similarity threshold, the chatbot explicitly responds, "I don't have any information on that in our knowledge base. Would you like to speak with a member of staff?" and hands the conversation over in a structured manner.
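The threshold rule can be sketched as a small decision function on top of retrieval. The 0.75 threshold and the returned dictionary shape are illustrative assumptions; suitable values depend on the embedding model and are tuned per application.

```python
FALLBACK = ("I don't have any information on that in our knowledge base. "
            "Would you like to speak with a member of staff?")


def answer_or_handover(hits: list[tuple[float, str]], threshold: float = 0.75) -> dict:
    """Decide between answering and handing over to staff.

    `hits` are (similarity, chunk_text) pairs from retrieval. Only chunks at or
    above the (illustrative) threshold may be used as context; if none qualify,
    the bot returns the fallback message instead of letting the model guess.
    """
    relevant = [text for score, text in hits if score >= threshold]
    if not relevant:
        return {"action": "handover", "message": FALLBACK}
    return {"action": "answer", "context": relevant}
```

Filtering individual chunks (rather than only checking the best score) also keeps weakly related sections out of the context the model sees.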

"A chatbot that freely invents things is dangerous. We build chatbots that honestly say when they don't know something. That's the most important quality a business chatbot can have."

— Philipp Herrmann, founder of DigElite

Frequently Asked Questions

What potential customers should ask before deployment.

What if our documents change frequently?

The knowledge base can be updated incrementally: when you upload a new PDF or change a page, the system re-indexes only the changed sections, so no complete rebuild is necessary. For highly dynamic content (e.g., dates, prices), knowledge sources can be connected directly to a database or API so that updates take effect in real time.
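One way to detect "only the changed sections" is to store a content hash per chunk and compare on upload. This is a sketch under the assumption that chunk IDs stay stable between document versions; `content_hash` and `plan_reindex` are hypothetical helpers, not part of a specific product API.

```python
import hashlib


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(old_hashes: dict[str, str], new_chunks: dict[str, str]) -> dict:
    """Compare stored hashes (chunk id -> hash) with current chunks (chunk id -> text).

    Only chunks listed under "embed" need a new embedding; chunks under "delete"
    no longer exist and are removed from the vector database. "hashes" is stored
    for the next comparison.
    """
    new_hashes = {cid: content_hash(text) for cid, text in new_chunks.items()}
    to_embed = sorted(cid for cid, h in new_hashes.items() if old_hashes.get(cid) != h)
    to_delete = sorted(cid for cid in old_hashes if cid not in new_hashes)
    return {"embed": to_embed, "delete": to_delete, "hashes": new_hashes}
```

Unchanged chunks keep their existing embeddings, which is what makes the update cheap compared with a full rebuild.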

Where is the knowledge data physically located?

In your own WordPress database or the local vector database on your server. Nothing is sent to an external indexing provider. If you are using an API-LLM (Aleph Alpha, Mistral La Plateforme), only the relevant contextual sections are transmitted along with the question during operation—no blanket transmission of your entire knowledge base.

How large can the knowledge base be?

Typical sizes for SMEs, associations, and public administrations (hundreds to a few thousand documents, 10–500 MB of text) are not a problem. Vector databases scale to millions of chunks. For very large knowledge bases, we discuss in an initial consultation whether splitting them into several areas is advisable.
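A back-of-the-envelope estimate of what the upper end of that range means for the vector store. The assumptions (about one byte per character, 1,000-character chunks without overlap, 1,024-dimensional float32 embeddings, MB taken as 10^6 bytes) are illustrative; real values depend on the chosen embedding model and chunking.

```python
def estimate_index_size(text_bytes: int, chunk_chars: int = 1_000, dim: int = 1_024) -> tuple[int, int]:
    """Rough (chunk count, raw vector bytes) for a knowledge base.

    Assumes ~1 byte per character, non-overlapping chunks and float32 vectors;
    all parameters are illustrative, not measured DigElite values.
    """
    chunks = -(-text_bytes // chunk_chars)  # ceiling division
    vector_bytes = chunks * dim * 4         # 4 bytes per float32 component
    return chunks, vector_bytes
```

Under these assumptions, `estimate_index_size(500_000_000)` gives 500,000 chunks and 2,048,000,000 bytes, roughly 2 GB of raw vectors before index overhead, still below the millions-of-chunks scale mentioned above.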

Can we restrict source visibility per user group?

Yes. The knowledge base is segmentable (e.g., "public," "members," "internal staff"). In multi-tenant deployments (umbrella organization with regional associations), a regional association user sees only their own knowledge segments plus the overarching content of the umbrella organization. Permissions are controlled via WordPress roles.
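A sketch of segment- and tenant-based filtering. The role-to-segment mapping is hypothetical (`subscriber` and `editor` are WordPress default role slugs, but which segments they unlock is a per-project decision), as is the convention that `tenant=None` marks umbrella-organisation content shared with every regional association.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    segment: str                  # e.g. "public", "members", "internal"
    tenant: Optional[str] = None  # None = umbrella-organisation content (hypothetical convention)


# Hypothetical mapping from WordPress role slugs to knowledge segments.
ROLE_SEGMENTS = {
    "subscriber": {"members"},
    "editor": {"members", "internal"},
}


def visible_chunks(chunks: list[Chunk], roles: list[str], tenant: Optional[str]) -> list[Chunk]:
    """Keep chunks whose segment the user's roles unlock and that belong to the
    user's own tenant or to the umbrella organisation. "public" is always visible."""
    segments = {"public"}
    for role in roles:
        segments |= ROLE_SEGMENTS.get(role, set())
    return [c for c in chunks if c.segment in segments and c.tenant in (None, tenant)]
```

In this sketch the filter would run before similarity ranking, so restricted sections never reach the language model; in a vector database the same condition would typically become a metadata filter on the query.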

15 minutes is enough to get an impression.

We'll chat live with our own chatbot on nordzypern.live and show you how it responds to real documents, when it honestly says "I don't know," and how it hands the conversation over to a human. No sales pitch, no 47-slide deck.

Watch the chatbot live & get an initial consultation
Book an appointment