Which AI Model Should You Use?

Voxjar lets you select between our base model and GPT-4 when creating or editing your scorecards.

Our base model is fine-tuned to follow instructions and does so better than the standard GPT-3.5 (ChatGPT) in a call scoring environment. It is selected by default on all scorecards, but you can change it by clicking the scorecard header and then selecting a model from the dropdown menu.

[Image: select a language model for call scoring]

For most cases, the Voxjar Instruct Model (our base model) will give you fantastic results. It understands context well, references the transcript, and follows instructions closely. Combined with prompt controls and a testing loop, this gives you a solid starting point for automating call scoring.

There are a few cases when you might want to use GPT-4.

(If there is another model that you would like us to add, let us know.)

When to use GPT-4

GPT-4 is, by all measures, the most capable large language model as of this writing.

Because of that, you will likely get answers and reasoning based on a better understanding of your call data.

This will be especially noticeable in a handful of scenarios:

  • If your questions/answers require the AI to comprehend multiple topics
  • If your calls follow a less structured flow
  • If the AI will be scoring longer calls (30 minutes+)
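The three scenarios above can be read as a simple checklist. The sketch below is only an illustration of that checklist: `ScorecardProfile`, `suggest_model`, and the `"base"` / `"gpt-4"` labels are hypothetical names made up for this example and are not part of Voxjar. The 30-minute threshold comes from the list above.

```python
from dataclasses import dataclass


@dataclass
class ScorecardProfile:
    """Hypothetical description of the calls a scorecard covers (not a Voxjar type)."""
    multi_topic_questions: bool   # questions/answers span multiple topics
    unstructured_flow: bool       # calls follow a less structured flow
    typical_call_minutes: float   # typical call length, in minutes


def suggest_model(profile: ScorecardProfile, long_call_minutes: float = 30) -> str:
    """Return "gpt-4" if any scenario from the list applies, otherwise "base"."""
    if (
        profile.multi_topic_questions
        or profile.unstructured_flow
        or profile.typical_call_minutes >= long_call_minutes
    ):
        return "gpt-4"
    return "base"


# Example: short, scripted, single-topic calls stay on the base model.
print(suggest_model(ScorecardProfile(False, False, 8)))
# Example: 45-minute calls fall under the "longer calls" scenario.
print(suggest_model(ScorecardProfile(False, False, 45)))
```

Treat the result as a first guess only.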

Whether you need GPT-4 usually becomes clear once you run a few tests with both models. Sometimes the difference is not noticeable; other times you'll see it right away.

GPT-4 Caveats

GPT-4's capability comes with a couple of tradeoffs.

GPT-4 is a premium language model and uses 2x the credits of our base model.

GPT-4 is up to 10x slower than our base model. This is usually not an issue, but testing and manually queued AI evaluations can take up to a minute longer.
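To get a rough feel for these two tradeoffs, the arithmetic can be sketched as follows. Only the 2x credit multiplier and the 10x worst-case slowdown come from this article. The per-call credit cost and the base-model evaluation time are placeholder assumptions, and the function names are hypothetical, so substitute the figures from your own account.

```python
# Multipliers stated in the article.
GPT4_CREDIT_MULTIPLIER = 2   # GPT-4 uses 2x the credits of the base model
GPT4_MAX_SLOWDOWN = 10       # GPT-4 is up to 10x slower than the base model

MODELS = ("base", "gpt-4")   # hypothetical labels for this sketch


def estimate_credits(num_calls: int, model: str, base_credits_per_call: float = 1.0) -> float:
    """Estimate credits used to score num_calls calls.

    base_credits_per_call is an assumed placeholder, not a published price.
    """
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    multiplier = GPT4_CREDIT_MULTIPLIER if model == "gpt-4" else 1
    return num_calls * base_credits_per_call * multiplier


def worst_case_seconds(base_seconds: float, model: str) -> float:
    """Upper bound on one evaluation's duration, given an assumed base-model time."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    slowdown = GPT4_MAX_SLOWDOWN if model == "gpt-4" else 1
    return base_seconds * slowdown


# Example with assumed inputs: 500 calls, and 6 seconds per base-model evaluation.
print(estimate_credits(500, "base"), estimate_credits(500, "gpt-4"))
print(worst_case_seconds(6, "gpt-4"))
```

The slowdown figure is an upper bound ("up to 10x"), so the second function gives a worst case rather than an expected duration.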