Trained on 200 websites; even where the info exists, answers are horrendously inconsistent.
I suspect the issues are:
  • Vector encoding : OpenAI uses up to 20 chunks with some token overlap to formulate replies. From the image, GHL only uses 1 to 3 knowledge chunks with no token overlap. This results in heavy hallucination when there is no context.
OpenAI defaults to a minimum of 5 chunks for GPT-3.5 Turbo and suggests 20 chunks for GPT-4 models. GHL maxes out at 3 chunks, and very frequently I only see one website or FAQ source that the info was pulled from. THIS IS WHY IT UNDER-PERFORMS.
  • Model : GPT-4 Turbo has limited intelligence and suffers from the "lost in the middle" info gap. Please switch to GPT-4.1 ASAP. It's just as cheap and wayyyy better.
  • FAQ : When an FAQ is added, the bot defaults to the FAQ and ignores the chunks, rather than considering the FAQ together with the chunks. This results in wonky answers.
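For context, the chunking-with-overlap approach described above (and suggested below: 300 tokens per chunk, 15% overlap) can be sketched roughly like this. Whitespace-split words stand in for real tokenizer tokens, and the function name is illustrative, not GHL's actual code:

```python
def chunk_text(text, chunk_size=300, overlap_ratio=0.15):
    """Split text into fixed-size chunks whose tails overlap.

    Uses whitespace "words" as a stand-in for real tokens; a production
    version would count tokens with the model's tokenizer instead.
    """
    tokens = text.split()
    overlap = int(chunk_size * overlap_ratio)  # 45 tokens at 300 / 15%
    step = chunk_size - overlap                # advance 255 tokens per chunk
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the text
    return chunks
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which is exactly what the 1-to-3-chunk, zero-overlap setup loses.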
If cost is an issue, allow users to select a model based on their needs and charge a markup accordingly.
The infrastructure is great, but the model's inability to answer correctly is a BIG show stopper.
Either that, or let us plug in our own assistants and build usable chatbots tapping into the GHL infrastructure.
Suggestions:
  • Model : 4o or 4.1
  • Vector encoding : 20 chunks (minimum), 300 tokens per chunk, with 15% token overlap for redundancy.
  • Check Knowledge Chunk Code
> Seems to be pulling chunks from only 1 source (1 website, or 1 FAQ). Unable to pull from multiple parts of 1 website, or from multiple websites / FAQs?
> Should be able to pull up to 20 small chunks of relevant text from a consolidated knowledge base / text body.
  • FAQ : Handle the FAQ as a system prompt, or as part of another vector-encoded file (preferred). Use an internal prompt to reference the FAQ files as the priority, since they are based on feedback.
  • Internal hardcode to always process the vector stores and consider the chunks when formulating answers. This way, it has the info it needs. Prompt it to specify that if the info is not in the chunks, no such product or service exists.
  • API option to allow the bot to query the internet to formulate replies.
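The consolidated-retrieval and always-ground-on-chunks suggestions above can be sketched together like this. Word-overlap scoring is a toy stand-in for real embedding similarity, and the function names and prompt wording are examples of the idea, not GHL's actual implementation:

```python
from collections import Counter

def top_chunks(query, chunks, k=20):
    """Rank chunks from ONE consolidated knowledge base and return the
    top k, regardless of which source (website or FAQ) each came from.

    `chunks` is a list of (source_id, text) pairs. Counting shared query
    words is a toy stand-in for embedding similarity.
    """
    q = Counter(query.lower().split())
    def score(item):
        _, text = item
        words = Counter(text.lower().split())
        return sum(min(q[w], n) for w, n in words.items() if w in q)
    return sorted(chunks, key=score, reverse=True)[:k]

def build_prompt(question, retrieved):
    """Always inject the retrieved chunks and tell the model they are the
    only source of truth (instruction wording here is illustrative)."""
    context = "\n\n".join(f"[{src}] {text}" for src, text in retrieved)
    return (
        "Answer using ONLY the context below. If the answer is not in the "
        "context, state that no such product or service exists.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

Because the candidate pool mixes every website and FAQ chunk before ranking, the top 20 can span multiple sources instead of locking onto one, and the prompt guard keeps the model from inventing products that aren't in the chunks.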