⚠🤖 Conversation AI still not reliable in April 2025
Joshua Zhang
Trained on 200 websites; even where the info exists, answers are horrendously inconsistent.
I suspect the issues are:
- Vector encoding: OpenAI uses up to 20 chunks with some token overlap to formulate replies. From the image, GHL only uses 1 to 3 knowledge chunks with no token overlap, which results in heavy hallucination when there is no context (see the retrieval sketch right after this list).
OpenAI defaults to a minimum of 5 chunks for 3.5 Turbo and suggests 20 chunks for GPT-4 models. GHL maxes out at 3 chunks, and very frequently I see only 1 website or FAQ source from which info is pulled. THIS IS WHY IT UNDER-PERFORMS.
- Model: GPT-4 Turbo has limited intelligence and suffers from the "lost in the middle" info gap. Please switch to GPT-4.1 ASAP. It's just as cheap and way better.
- FAQ: When an FAQ is added, the bot defaults to the FAQ and ignores the chunks, rather than considering the FAQ together with the chunks. This results in wonky answers.
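To illustrate the chunk-count point: the LLM only ever sees what the retriever hands it, so 1-3 chunks vs 20 is the whole ballgame. Here is a minimal sketch of top-k cosine-similarity retrieval (all names here are hypothetical, not GHL's actual code):

```python
import numpy as np

def top_k_chunks(query_vec, chunk_vecs, chunks, k=20):
    """Return the k most similar chunks; the LLM only ever sees these."""
    # Cosine similarity between the query and every stored chunk vector.
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec)
    )
    order = np.argsort(sims)[::-1][:k]
    return [(chunks[i], float(sims[i])) for i in order]
```

With k=3, a single weak match dominates the context; with k=20 the model has enough surrounding material to answer instead of guessing.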
If cost is an issue, allow users to select a model based on their needs and charge a markup accordingly.
The infrastructure is great, but the model's inability to answer correctly is a BIG show-stopper.
Either that, or let us plug in our own assistants and build usable chatbots on top of GHL infrastructure.
Suggestions:
- Model: 4o or 4.1
- Vector encoding: 20 chunks (minimum), 300 tokens per chunk, with 15% token overlap for redundancy (sketched in the code after this list).
- Check the knowledge-chunk code:
> It seems to be pulling chunks from only 1 source (1 website or 1 FAQ). Is it unable to pull from multiple parts of 1 website, or from multiple websites / FAQs?
> It should be able to pull up to 20 small chunks of relevant text from a consolidated knowledge base / text body.
- Handle the FAQ as a system prompt or as part of another vector-encoded file (preferred), with an internal prompt that references FAQ files as the priority, since they are based on feedback.
- Hardcode internally to always process the vector stores and consider the chunks when formulating answers, so the bot has the info it needs. The prompt should specify that if the info is not in the chunks, no such product or service exists.
- An API option to allow the bot to query the internet to formulate replies.
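To make the vector-encoding and FAQ suggestions concrete, here is a rough sketch of the settings I have in mind. The tokenizer choice (tiktoken's cl100k_base) and the prompt wording are my assumptions, not GHL internals:

```python
import tiktoken

def chunk_text(text, chunk_tokens=300, overlap_ratio=0.15):
    """Split text into 300-token chunks with 15% token overlap."""
    enc = tiktoken.get_encoding("cl100k_base")
    ids = enc.encode(text)
    step = int(chunk_tokens * (1 - overlap_ratio))  # 255-token stride
    return [enc.decode(ids[i:i + chunk_tokens]) for i in range(0, len(ids), step)]

def build_messages(faq_text, chunks, question):
    # The FAQ rides in the system prompt so it is weighed WITH the chunks,
    # and the prompt pins the model to the retrieved context only.
    system = (
        "Answer using the FAQ and CONTEXT below. Prefer the FAQ when they "
        "conflict. If the answer is in neither, say the product or service "
        "does not exist.\n\nFAQ:\n" + faq_text
        + "\n\nCONTEXT:\n" + "\n---\n".join(chunks)
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": question}]
```

The overlap means a sentence split across a chunk boundary still appears whole in at least one chunk, so retrieval doesn't lose it.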
Joshua Zhang
Abhishek Kumar
Following up on our conversation on Zoom.
I have detailed a list of findings comparing OpenAI replies vs. Conversation AI and sent them to you over email.
~~~~~~~~~~~~
Executive Summary:
~~~~~~~~~~~~
Finding 1:
Filter the top 10 chunks above a 0.4 score for LLM consideration (see test findings below, and the selection sketch at the end of this summary).
Chunks at 0.4 may be a weak match, but it's still a match. Some context is better than no context, which is what leads to hallucination.
Finding 2:
Check the RAG bug where only 1 chunk per file is allowed. The system should allow multiple chunks from the same source file.
Finding 3:
Allow for 50% chunk overlap so the LLM can cross-reference. OpenAI recommends this whether the files are small, short texts or we combine all knowledge-base info into 1 large .txt file.
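For Findings 1 and 2, the selection logic boils down to something like this (a plain-Python sketch; the tuple layout is made up for illustration):

```python
def select_chunks(scored, min_score=0.4, max_chunks=10):
    """scored: list of (chunk_text, source_file, similarity) tuples."""
    # Finding 1: a 0.4 match is weak, but it is still a match; keep it.
    kept = [c for c in scored if c[2] >= min_score]
    kept.sort(key=lambda c: c[2], reverse=True)
    # Finding 2: no dedup on source_file, so multiple chunks from the
    # same website or FAQ can all make it into the context.
    return kept[:max_chunks]
```

Finding 3 is just the overlap knob: the chunker sketched in my first post, with overlap_ratio=0.5 instead of 0.15.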
~~~~~~~~~~~~
Let me know if we need another call to run through the findings or if you all need any help.
I really wish to use Conversation AI with its native LLM integrations instead of my own OpenAI assistant webhook.
Hope these findings help you guys tune your Conversation AI settings.
Joshua Zhang
Abhishek Kumar
Sorry it took a while to get back.
Sent you an email to find a common time to discuss the Retrieval-Augmented Generation (RAG) issues in Conversation AI with your engineers:
- Recommended number of chunks
- Chunk % overlap
Abhishek Kumar
Hey Joshua, please schedule a call with me: https://speakwith.us/abhishekkumar
Joshua Zhang
Abhishek Kumar scheduled.
Joshua Zhang
Abhishek Kumar I'm in the waiting room.
Abhishek Kumar
Joshua Zhang joining in 10 min
Joshua Zhang
Abhishek Kumar thanks for the call last week.
When is your engineer back?
Dropped you 2 emails:
1 on the calendars
1 on the Conversation AI chunk fix
Abhishek Kumar
Joshua Zhang I will schedule a call this week.