3 Selecting and evaluating AI tools

This chapter covers

Distinctions among different types of AI, or ways of using AI, and how to select the most appropriate one
How to assess AI’s performance and select models
Common ways to measure AI’s performance at a task

This chapter provides guidance on selecting an AI model or tool and assessing its performance at a given task. We kick off by discussing three common distinctions between types of AI: proprietary versus open-source AI, off-the-shelf versus fine-tuned AI, and AI apps versus foundation models. We explain what each distinction means and how to pick the most suitable type. Afterward, we discuss a common process for assessing AI’s performance, which uses separate datasets for validation and testing. We also discuss some common performance measures, such as accuracy. The appendix includes a catalog of popular generative AI tools.

Proprietary vs. open source

In proprietary AI, the user isn’t allowed to modify or even see the code that powers the underlying ML models. The inner workings of the technology are kept secret to prevent others from copying it. One common way of using proprietary AI is through customer-facing apps such as ChatGPT. These tend to charge users a monthly subscription to access the service, although some provide a free tier that grants access to a reduced number of features.

How to decide

Off-the-shelf vs. fine-tuning

How to decide

Customer-facing AI apps vs. foundation models

How to decide

Model validation, selection, and testing

Training set

Validation set

Test set
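
As a rough illustration of how one dataset can be divided into the three sets named above, here is a minimal sketch in plain Python. The function name `split_dataset`, the 70/15/15 proportions, and the fixed random seed are our own assumptions for the example, not a recommended recipe; libraries such as scikit-learn offer ready-made alternatives.

```python
import random


def split_dataset(examples, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle the examples and split them into training, validation, and test sets.

    The fractions and the seed are illustrative defaults (assumptions).
    """
    shuffled = list(examples)
    # A seeded generator makes the shuffle, and so the split, reproducible.
    random.Random(seed).shuffle(shuffled)

    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)

    test_set = shuffled[:n_test]
    val_set = shuffled[n_test:n_test + n_val]
    train_set = shuffled[n_test + n_val:]  # everything that is left over
    return train_set, val_set, test_set


# Hypothetical dataset: the integers 0..99 stand in for 100 labeled examples.
train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))
```

The point of the sketch is that each example lands in exactly one set, so a model is never validated or tested on data it was trained on.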

Performance measures

Accuracy
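
Accuracy is commonly defined as the fraction of predictions that match the true labels. A minimal sketch of that definition follows; the function name `accuracy` and the spam/ham labels are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty dataset")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for five emails and a model's predictions for them.
labels = ["spam", "ham", "spam", "ham", "ham"]
predictions = ["spam", "ham", "ham", "ham", "ham"]

score = accuracy(labels, predictions)  # 4 of the 5 predictions are correct
print(score)
```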

Precision and recall
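
Under the usual definitions, precision is the share of predicted positives that are truly positive, and recall is the share of true positives that the model manages to find. The sketch below computes both for a binary task; the function name `precision_recall`, the choice of 1 as the positive label, and the convention of returning 0.0 when a denominator is zero are assumptions made for the example.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_pos = sum(1 for t, p in pairs if t != positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)

    # Assumed convention: report 0.0 when there is nothing to divide by.
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall


# Hypothetical data: 1 marks a positive case, 0 a negative one.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
# 3 true positives, 2 false positives, 1 false negative:
# precision = 3/5, recall = 3/4
print(precision, recall)
```

In this made-up example the model finds most of the positives (higher recall) but also raises two false alarms (lower precision), which a single accuracy figure would not reveal.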

Mean absolute error and root mean squared error
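
For tasks that predict a number rather than a category, mean absolute error (MAE) averages the absolute differences between predictions and true values, while root mean squared error (RMSE) takes the square root of the average squared difference, which penalizes large errors more heavily. A minimal sketch of both follows; the function names and the sample values are hypothetical.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between truth and prediction."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the average squared difference."""
    mean_squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_squared)


# Hypothetical true values and predictions; the errors are 1, 0, -2, and -1.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(actual, predicted)       # (1 + 0 + 2 + 1) / 4
rmse = root_mean_squared_error(actual, predicted)  # sqrt((1 + 0 + 4 + 1) / 4)
print(mae, rmse)
```

Here RMSE comes out larger than MAE because the single error of size 2 weighs more once it is squared.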

Summary