

OpenAI released a new ChatGPT API yesterday. But how does it actually compare to the existing API? It will take some time before there is a definitive answer, but here are some initial thoughts.

In this article we will evaluate the performance of a chain on question answering over a particular dataset. This chain takes a query, does a "retrieval" step to look up relevant documents in a vector store, and then does a "generation" step to pass them, along with the original query, to a model to get back an answer. We will hold the "retrieval" step constant, so we are just evaluating the "generation" step of the chain. Because I'm lazy, I also enlisted the help of the ChatGPT API itself to help do this evaluation.

What models/prompts are we comparing? First up, we've got the standard text-davinci-003 model, with the standard VectorDBQAChain prompts. Then there's the new ChatGPT model. There would be two potential ways to do this. One would be to use a wrapper and treat ChatGPT as just another LLM.
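A minimal sketch of that wrapper idea (the class and method names here are my own illustrations, not from any particular library): a thin adapter that takes the flat prompt string a chain would normally send to a completion model, packages it as a single-user-message chat request, and hands back the reply text, so the chain can call it like any other text-in/text-out LLM.

```python
# Sketch: wrap a chat-style model so it behaves like a plain
# text-in/text-out LLM. `client` is anything exposing a
# `chat(messages) -> str` method; all names here are illustrative.

class ChatAsLLM:
    def __init__(self, client, system_message=None):
        self.client = client
        self.system_message = system_message

    def __call__(self, prompt: str) -> str:
        # Package the flat prompt as a chat conversation:
        # optional system message first, then the prompt as the user turn.
        messages = []
        if self.system_message:
            messages.append({"role": "system", "content": self.system_message})
        messages.append({"role": "user", "content": prompt})
        return self.client.chat(messages)
```

Because the adapter only depends on a `chat(messages)` method, you can swap in a real API client or a stub without the chain noticing the difference.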

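As for getting the ChatGPT API to help with the evaluation itself, one way to sketch it (the prompt wording and function names below are my own, not from any library) is a grading prompt plus a tiny parser: show the model the question, the chain's answer, and the reference answer, and ask for a CORRECT/INCORRECT verdict it can extract programmatically.

```python
# Sketch: grade a predicted answer against a reference answer by
# asking a chat model to act as a grader. Prompt wording is illustrative.

GRADE_PROMPT = """You are a teacher grading a quiz.
Question: {question}
Student answer: {answer}
True answer: {reference}
Reply with GRADE: CORRECT or GRADE: INCORRECT."""

def build_grading_prompt(question: str, answer: str, reference: str) -> str:
    # Fill the template; the result is what gets sent to the grader model.
    return GRADE_PROMPT.format(question=question, answer=answer, reference=reference)

def parse_grade(model_reply: str) -> bool:
    # Treat any reply containing "GRADE: CORRECT" as a pass.
    # ("GRADE: INCORRECT" does not contain that substring.)
    return "GRADE: CORRECT" in model_reply.upper()
```

The parser is deliberately forgiving about case and surrounding text, since chat models often wrap the verdict in extra commentary.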