Evaluating Model Behavior
[To be completed following discussion of changes to evaluation pipeline.]
Evaluating LLM behavior is a core OWL function, and it requires more technical setup than other functions. Each evaluation runs tests against specified values for three inputs: (1) a selected books portfolio, (2) one or more selected LLMs and (3) a selected constitution. The key steps of an evaluation are:
- Setting the evaluation parameters (optional)
- Choosing LLM(s), constitution and tests
- Running the evaluation
Setting the Evaluation Parameters
You can optionally control two sets of evaluation parameters in your Settings, which you can open from the drop-down menu under your account name on the right side of the top menu.
[To be completed.]
Choosing LLM(s), Constitution and Tests
[To be completed.]
Running the Evaluation
[To be completed.]