DigElite Chatbots · Knowledge base · RAG

Chatbot with its own knowledge base (RAG)

The knowledge base of a DigElite chatbot is built from the customer's own documents (PDF manuals, Word files, website content, FAQs, bylaws, and OZG service descriptions) and queried through a Retrieval-Augmented Generation (RAG) layer. Each answer can optionally cite its source (document and section). If the system cannot find a suitable source, the chatbot states "I have no information on that" instead of speculating. The similarity threshold for this behavior is configurable for each application.


What RAG can technically achieve

Answers from your documents, not from the model's knowledge.

Retrieval-Augmented Generation is the architectural answer to the hallucination problem of classic chatbots. Instead of letting the language model answer from its own knowledge, we first search the customer documents for the relevant sections, pass them to the model, and only then let it formulate an answer from them.

1 — Indexing

Documents are split into sections (chunks), encoded semantically as vector embeddings, and stored in a local vector database (e.g., PostgreSQL with pgvector, Qdrant, Chroma). Everything is hosted on the customer's server.
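The indexing step can be sketched roughly as follows. This is a minimal illustration, not the DigElite implementation: `split_into_chunks`, `toy_embed`, and `index_document` are hypothetical names, the chunk size is an arbitrary assumption, and the hashed bag-of-words vector merely stands in for a real embedding model.

```python
import hashlib
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    doc_id: str
    position: int
    text: str
    embedding: list[float]


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Split on blank lines, then pack paragraphs into chunks of roughly max_chars.

    A single paragraph longer than max_chars is kept whole in this sketch.
    """
    chunks, current = [], ""
    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 1 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current} {paragraph}".strip()
    if current:
        chunks.append(current)
    return chunks


def toy_embed(text: str, dims: int = 64) -> list[float]:
    """Stand-in for a real embedding model: hashed bag-of-words, L2-normalised."""
    vector = [0.0] * dims
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode("utf-8")).hexdigest(), 16) % dims
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector)) or 1.0
    return [v / norm for v in vector]


def index_document(doc_id: str, text: str, max_chars: int = 200) -> list[Chunk]:
    """Chunk a document and attach an embedding to every chunk."""
    parts = split_into_chunks(text, max_chars)
    return [Chunk(doc_id, i, part, toy_embed(part)) for i, part in enumerate(parts)]


sample = "Membership fees are due in January.\n\nThe board meets monthly."
chunks = index_document("statutes.pdf", sample, max_chars=40)
```

In a real deployment the resulting records would be written to the vector database instead of being kept in a Python list.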

2 — Retrieval

For each query, the question is converted into an embedding and compared against the embeddings stored in the vector database. The most similar sections (typically 3–5) are compiled as context for the answer.
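A small sketch of the retrieval step, assuming cosine similarity as the comparison metric (the page does not say which metric is used). The three-dimensional embeddings are hand-made for illustration, and `top_k` is a hypothetical helper; a vector database would perform this search internally.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, indexed_chunks, k=3):
    """Return the k most similar chunks as (score, text) pairs, best first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), text)
        for text, embedding in indexed_chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hand-made embeddings, purely illustrative.
index = [
    ("Membership fees are due in January.", [0.9, 0.1, 0.0]),
    ("The board meets on the first Monday of each month.", [0.1, 0.9, 0.0]),
    ("Events are announced on the website.", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
```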

3 — Response Generation

The LLM (Aleph Alpha, Mistral, Llama) receives the question together with the retrieved context sections and writes an answer based solely on these sections. Sources are included with the answer: traceability is built in, not added later.
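How question and context might be combined into one prompt is sketched below. The prompt wording, `RetrievedSection`, `build_prompt`, and `format_sources` are assumptions made for illustration; the actual prompt and the call to the model are not described on this page and are left out.

```python
from dataclasses import dataclass


@dataclass
class RetrievedSection:
    document: str
    section: str
    text: str


def build_prompt(question: str, sections: list[RetrievedSection]) -> str:
    """Number the retrieved sections and instruct the model to answer only from them."""
    context = "\n".join(
        f"[{i}] ({s.document}, {s.section}) {s.text}"
        for i, s in enumerate(sections, start=1)
    )
    return (
        "Answer the question using only the numbered sections below. "
        "Cite the section numbers you used. "
        "If the sections do not contain the answer, say that you have no information.\n\n"
        f"Sections:\n{context}\n\nQuestion: {question}"
    )


def format_sources(sections: list[RetrievedSection]) -> list[str]:
    """Source list (document + section) to display next to the answer."""
    return [f"{s.document}, {s.section}" for s in sections]


sections = [RetrievedSection("fees.pdf", "Section 2", "Membership fees are due in January.")]
prompt = build_prompt("When are fees due?", sections)
sources = format_sources(sections)
```

Keeping the source list separate from the generated text means the citation shown to the user comes from the retrieval layer and not from the model's output.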

Which document formats work?

Whatever you already have.

PDF — Manuals, user guides, white papers, OZG service descriptions, statutes.
Word / RTF — internal documentation, contribution regulations, office FAQs.
Excel / CSV — structured tables (e.g. contribution levels, event schedules).
Website content — crawled or directly from the WordPress content (posts, pages, custom post types).
Markdown / Plain Text — Wiki content, Confluence exports, GitHub documentation.
API sources — optional connection to existing knowledge APIs (e.g., internal CRM data, product databases).

Hallucination protection

"I have no information on that": a feature, not a bug.

The biggest risk of traditional AI chatbots is the free invention of answers ("hallucination"). With DigElite, the answer generator is strictly bound to the retrieval layer: it may only respond if that layer finds matching sources. Below a configurable similarity threshold, the chatbot explicitly responds, "I don't have any information on that in our knowledge base. Would you like to speak with a member of staff?" and hands over the conversation in a structured manner.
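A minimal sketch of such a threshold gate, assuming retrieval returns (score, text) pairs. The value 0.75 is an arbitrary example and not a DigElite default; `GateResult` and `apply_threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

FALLBACK_MESSAGE = (
    "I don't have any information on that in our knowledge base. "
    "Would you like to speak with a member of staff?"
)


@dataclass
class GateResult:
    answerable: bool
    sections: list[str]
    fallback: Optional[str]


def apply_threshold(
    scored_sections: list[tuple[float, str]], threshold: float = 0.75
) -> GateResult:
    """Keep only sections at or above the threshold; refuse if none remain."""
    relevant = [text for score, text in scored_sections if score >= threshold]
    if not relevant:
        return GateResult(answerable=False, sections=[], fallback=FALLBACK_MESSAGE)
    return GateResult(answerable=True, sections=relevant, fallback=None)


confident = apply_threshold([(0.91, "Fees are due in January."), (0.40, "Events calendar.")])
unsure = apply_threshold([(0.42, "Events calendar.")])
```

Because the gate runs before the model is called, an unanswerable question never reaches the answer generator in this sketch; only the fallback message and the handover are triggered.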

"A chatbot that freely invents things is dangerous. We build chatbots that honestly say when they don't know something. That's the most important quality a business chatbot can have."

— Philipp Herrmann, founder of DigElite

Frequently Asked Questions

What potential customers should ask before deployment.

What if our documents change frequently?

The knowledge base can be updated incrementally: when you upload a new PDF or change a page, the system re-indexes only the changed sections; no complete rebuild is necessary. For highly dynamic content (e.g., event dates, prices), knowledge sources can be connected directly to a database or API so that updates take effect in real time.
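One common way to detect changed sections is to compare content fingerprints. The sketch below assumes that approach; the page does not say how DigElite detects changes, and `fingerprint` and `plan_reindex` are hypothetical helpers.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect whether a section has changed."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_reindex(
    stored: dict[str, str], current_sections: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Compare stored fingerprints with current text.

    stored maps section id -> fingerprint from the last indexing run;
    current_sections maps section id -> current text.
    Returns (ids to embed again, ids to delete from the index).
    """
    to_index = [
        section_id
        for section_id, text in current_sections.items()
        if stored.get(section_id) != fingerprint(text)
    ]
    to_delete = [section_id for section_id in stored if section_id not in current_sections]
    return to_index, to_delete


stored = {
    "fees": fingerprint("Fees are due in January."),
    "board": fingerprint("The board meets monthly."),
    "old-event": fingerprint("Summer party 2019."),
}
current = {
    "fees": "Fees are due in January.",          # unchanged
    "board": "The board meets every two weeks.",  # edited
    "contact": "Write to the office by email.",   # new
}
to_index, to_delete = plan_reindex(stored, current)
```

Only "board" and "contact" need new embeddings here; "fees" is skipped and "old-event" is removed from the index.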

Where is the knowledge data physically located?

The data resides in your own WordPress database or in the local vector database on your server. Nothing is sent to an external indexing provider. If you use an API-based LLM (Aleph Alpha, Mistral La Plateforme), only the context sections relevant to the current question are transmitted along with that question during operation; your knowledge base is never transmitted wholesale.

How large can the knowledge base be?

Typical sizes for SMEs, associations, and public administrations (hundreds to a few thousand documents, 10–500 MB of text) are not a problem. Vector databases scale into the range of millions of chunks. For very large knowledge bases, we check in advance, during the initial consultation, whether segmentation into several areas is advisable.
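A back-of-the-envelope estimate shows why these sizes stay well below the million-chunk range. All parameters are assumptions chosen for illustration (about 1 byte per character, 1,000 characters per chunk, 1,024-dimensional float32 embeddings); real values depend on the chunking strategy and the embedding model.

```python
import math


def estimate_index(
    text_megabytes: int,
    chars_per_chunk: int = 1000,
    dims: int = 1024,
    bytes_per_float: int = 4,
) -> tuple[int, float]:
    """Return (number of chunks, raw embedding storage in MB) for a text corpus."""
    total_chars = text_megabytes * 1_000_000  # assumes roughly 1 byte per character
    chunks = math.ceil(total_chars / chars_per_chunk)
    vector_megabytes = chunks * dims * bytes_per_float / 1_000_000
    return chunks, vector_megabytes


small = estimate_index(10)    # lower end of the typical range
large = estimate_index(500)   # upper end of the typical range
```

Under these assumptions, 10 MB of text yields about 10,000 chunks and 500 MB about 500,000 chunks, with roughly 2 GB of raw embedding data at the upper end.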

Can we restrict source visibility per user group?

Yes. The knowledge base is segmentable (e.g., "public," "members," "internal staff"). In multi-tenant deployments (umbrella organization with regional associations), a regional association user sees only their own knowledge segments plus the overarching content of the umbrella organization. Permissions are controlled via WordPress roles.
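The filtering logic can be sketched as follows. The role names, the `ROLE_SEGMENTS` mapping, and the tenant identifiers are hypothetical and are not actual WordPress role names; in practice the mapping would be derived from the WordPress roles of the logged-in user.

```python
from dataclasses import dataclass

UMBRELLA = "umbrella"

# Hypothetical mapping from user role to the segments that role may read.
ROLE_SEGMENTS = {
    "visitor": {"public"},
    "member": {"public", "members"},
    "staff": {"public", "members", "internal"},
}


@dataclass(frozen=True)
class SegmentedChunk:
    tenant: str
    segment: str
    text: str


def visible_chunks(chunks, role: str, tenant: str):
    """Filter chunks before retrieval: own tenant plus umbrella, limited by role."""
    allowed_segments = ROLE_SEGMENTS.get(role, {"public"})  # unknown roles see public only
    allowed_tenants = {tenant, UMBRELLA}
    return [
        c for c in chunks
        if c.segment in allowed_segments and c.tenant in allowed_tenants
    ]


knowledge = [
    SegmentedChunk(UMBRELLA, "public", "Umbrella statutes"),
    SegmentedChunk("bavaria", "members", "Bavaria fee schedule"),
    SegmentedChunk("saxony", "members", "Saxony fee schedule"),
    SegmentedChunk("bavaria", "internal", "Bavaria staff handbook"),
]
member_view = visible_chunks(knowledge, role="member", tenant="bavaria")
```

Applying the filter before the similarity search means restricted sections can never appear in the context passed to the model.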

Where you can continue reading.

This feature is part of the DigElite chatbot family. See the product overview or the thematically related clusters.

German and European LLMs

RAG only unfolds its full effect with a suitable model.


Service chat on the website

Most common channel for a RAG knowledge base.


DigElite Chatbots — Overview

Pillar with all cluster topics.


15 minutes is enough to get an impression.

We chat live with our own chatbot on nordzypern.live and show you how it responds to real documents, when it honestly says "I don't know," and how it hands the conversation over to a human. No sales pitch, no slide 47.
