Evaluating Model Behavior

Evaluating LLM behavior is a core OWL function, and one that requires more technical setup than the others. Each evaluation runs tests against (1) a selected book portfolio, using (2) the selected LLM(s) and (3) a selected constitution. The key steps of an evaluation are:

Setting the Evaluation Parameters

You can optionally control two sets of evaluation parameters in your Settings, which you open from the drop-down menu under your account name at the right side of the top menu.

Choosing Book Portfolio, Constitution, LLM(s) and Tests

Evaluations are tied to book portfolios, so the first step in setting up an evaluation is to choose a portfolio from Public Portfolios or My Portfolios. After opening the portfolio by clicking View, choose Start Evaluation.

To set up the evaluation details, choose the relevant values in the fields for the constitution, the LLM(s), and the tests.

Running the Evaluation

To run an evaluation, click Start Evaluation at the bottom of the Start Portfolio Evaluation screen.

The evaluation results appear on screen and can be accessed later from the portfolio screen by clicking Evaluations. They include substantial detail about the evaluation process, including explanations of the scores.
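
As a way of picturing what one evaluation consists of, the choices described in this section (a book portfolio, a constitution, one or more LLMs, and a set of tests) can be sketched as a small record. This is an illustration only, not part of OWL: the class name EvaluationSetup, its field names, the missing_choices helper, and the rule that every choice must be non-empty are all assumptions made for the example, and the values shown are placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class EvaluationSetup:
    """Hypothetical record of the choices made before starting an evaluation."""

    portfolio: str            # name of the selected book portfolio
    constitution: str         # name of the selected constitution
    llms: Tuple[str, ...]     # one or more selected LLMs
    tests: Tuple[str, ...]    # the tests to run

    def missing_choices(self) -> List[str]:
        """Return the names of any choices that were left empty.

        Assumption for this sketch: every choice must be filled in
        before the evaluation can start.
        """
        missing = []
        if not self.portfolio:
            missing.append("portfolio")
        if not self.constitution:
            missing.append("constitution")
        if not self.llms:
            missing.append("llms")
        if not self.tests:
            missing.append("tests")
        return missing


# Placeholder values, not real OWL portfolios, constitutions, models, or tests.
setup = EvaluationSetup(
    portfolio="My Portfolio",
    constitution="Example Constitution",
    llms=("model-a", "model-b"),
    tests=("test-1",),
)
print(setup.missing_choices())
```

Here missing_choices returns an empty list, because all four choices are filled in. A setup with no constitution or no LLMs would list those names instead.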