Overview

DynamicMetric measures how closely a model's actual output matches an expected output, judged against a set of user-specified criteria. It is called dynamic because the criteria (for example, accuracy or relevance) are chosen for each evaluation rather than fixed in advance. Each criterion is scored separately, so the results show which aspects of the output fall short instead of collapsing everything into a single opaque number.

DynamicMetric delegates this per-criterion analysis to the evaluateDynamic function.

Methods

evaluateDynamic Function

This function evaluates the alignment of the actual output with the expected output based on dynamic criteria specified for the evaluation.

  • input: The original question or statement provided to the model.
  • actualOutput: The output the model actually generated in response to the input.
  • expectedOutput: The reference output the model should ideally produce for this input.
  • criteria: An array of evaluation criteria. Each criterion names one aspect to judge, such as accuracy or relevance.

It builds a prompt asking an OpenAI model to compare the texts, and the model returns a numerical score and a reason for each criterion. The function returns a promise that resolves to an array of evaluation results, one per criterion.
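The prompt-and-parse flow described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the model is asked to reply with a JSON array of { criteria, score, reason } objects with scores in [0, 1]. The names DynamicCriterion, CriterionScore, buildDynamicPrompt, parseDynamicResponse, ModelCall, and evaluateDynamicSketch are all hypothetical. The OpenAI request is replaced by an injected callModel function so the sketch runs offline. The real prompt wording and response format in @evalkit/core may differ.

```typescript
// Hypothetical shapes, modeled on the example in this document.
interface DynamicCriterion {
  type: string; // e.g. "Accuracy" or "Relevance"
}

interface CriterionScore {
  criteria: string;
  score: number; // assumed to lie in [0, 1]
  reason: string;
}

// Stand-in for the OpenAI request (hypothetical; the real library calls the model itself).
type ModelCall = (prompt: string) => Promise<string>;

// Builds a single prompt that asks the model to judge every criterion at once.
function buildDynamicPrompt(
  input: string,
  actualOutput: string,
  expectedOutput: string,
  criteria: DynamicCriterion[],
): string {
  const criteriaList = criteria.map((c) => `- ${c.type}`).join("\n");
  return [
    "Compare the actual output with the expected output for the given input.",
    `Input: ${input}`,
    `Actual output: ${actualOutput}`,
    `Expected output: ${expectedOutput}`,
    "Score each criterion from 0 to 1 and explain the score:",
    criteriaList,
    'Reply with a JSON array of {"criteria", "score", "reason"} objects.',
  ].join("\n");
}

// Parses the model's JSON reply, drops malformed entries, and clamps scores to [0, 1].
function parseDynamicResponse(raw: string): CriterionScore[] {
  const parsed: unknown = JSON.parse(raw);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of criterion scores");
  }
  return parsed
    .filter(
      (r): r is CriterionScore =>
        typeof r === "object" &&
        r !== null &&
        typeof r.criteria === "string" &&
        typeof r.score === "number" &&
        typeof r.reason === "string",
    )
    .map((r) => ({ ...r, score: Math.min(1, Math.max(0, r.score)) }));
}

// Hypothetical counterpart of evaluateDynamic: build the prompt, call the model, parse the reply.
async function evaluateDynamicSketch(
  input: string,
  actualOutput: string,
  expectedOutput: string,
  criteria: DynamicCriterion[],
  callModel: ModelCall,
): Promise<CriterionScore[]> {
  const prompt = buildDynamicPrompt(input, actualOutput, expectedOutput, criteria);
  return parseDynamicResponse(await callModel(prompt));
}

// Usage with a canned reply in place of a real model call.
const fakeModel: ModelCall = async () =>
  JSON.stringify([
    { criteria: "Accuracy", score: 0.5, reason: "canned reason" },
    { criteria: "Relevance", score: 0.5, reason: "canned reason" },
  ]);

evaluateDynamicSketch(
  "What is the economic impact of renewable energy?",
  "Renewable energy creates jobs and reduces greenhouse gas emissions.",
  "Renewable energy significantly contributes to the economy by creating jobs and driving sustainable practices.",
  [{ type: "Accuracy" }, { type: "Relevance" }],
  fakeModel,
).then((results) => console.log(results));
```

Injecting callModel keeps the parsing logic testable without network access. Validating and clamping the reply guards against a model that returns extra or malformed entries.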

DynamicMetric Class

DynamicMetric uses the evaluateDynamic function to evaluate the model's output against the expected output, criterion by criterion.

  • input: The original input provided to the model.
  • actualOutput: The actual output from the model.
  • expectedOutput: The reference output the model should ideally produce for this input.
  • criteria: The array of evaluation criteria, in the same form accepted by evaluateDynamic.

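The evaluateSteps method runs the evaluation and returns an overall score computed from the per-criterion results, along with the reason for each criterion's score. Note that the example output below describes this overall score as the number of passed criteria divided by the total number of criteria, not as an average of the per-criterion scores.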
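The aggregation step can be sketched as below. With the example's values (two criteria, each scoring 0.5 and failing), the passed-over-total rule gives 0, which matches the output shown, while a plain average of the scores would give 0.5. The sketch therefore computes both figures side by side. aggregateDynamicResults, DynamicSummary, and the averageScore field are hypothetical. The rule that the overall result passes only when every criterion passes is also an assumption, chosen because it is consistent with the example but not confirmed by the library.

```typescript
// Per-criterion result, shaped like the entries in the example output.
interface CriterionResult {
  criteria: string;
  score: number;
  reason: string;
  passed: boolean;
}

// Hypothetical summary shape; averageScore is included only for comparison.
interface DynamicSummary {
  passed: boolean;
  score: number; // fraction of criteria that passed
  averageScore: number; // mean of the per-criterion scores
  results: CriterionResult[];
}

// Hypothetical aggregation: overall pass requires every criterion to pass.
function aggregateDynamicResults(results: CriterionResult[]): DynamicSummary {
  if (results.length === 0) {
    // With no criteria there is nothing to pass; treat it as a failure.
    return { passed: false, score: 0, averageScore: 0, results };
  }
  const passedCount = results.filter((r) => r.passed).length;
  const scoreSum = results.reduce((sum, r) => sum + r.score, 0);
  return {
    passed: passedCount === results.length,
    score: passedCount / results.length,
    averageScore: scoreSum / results.length,
    results,
  };
}

// Usage with the per-criterion scores from the example output (reasons omitted).
const summary = aggregateDynamicResults([
  { criteria: "Accuracy", score: 0.5, reason: "see example output", passed: false },
  { criteria: "Relevance", score: 0.5, reason: "see example output", passed: false },
]);
console.log(summary.score, summary.averageScore); // prints: 0 0.5
```

Counting passed criteria makes the overall score depend on each criterion's pass threshold, whereas averaging rewards partial credit. The two can diverge sharply, as they do here.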

Example

import { evaluate, DynamicMetric, DynamicEvaluationCriteria } from '@evalkit/core';

const criteria = [
    { type: "Accuracy" },
    { type: "Relevance" }
];

// Top-level await requires an ES module; await is harmless if evaluate returns a plain value
const result = await evaluate({
    // Original input question provided to the model
    input: "What is the economic impact of renewable energy?",
    // Actual output generated by the model
    actualOutput: "Renewable energy creates jobs and reduces greenhouse gas emissions.",
    // Expected output ideally generated from the input
    expectedOutput: "Renewable energy significantly contributes to the economy by creating jobs and driving sustainable practices.",
    // The evaluation criteria defined above
    criteria,
}, [DynamicMetric]);

console.log(result);

// Example output (scores and reasons are produced by the LLM and may vary between runs)
{
  passed: false,
  // Overall score: passed criteria divided by total criteria (0 of 2 passed here)
  score: 0,
  results: [
    {
      criteria: 'Accuracy',
      score: 0.5,
      reason: 'The actual output partially addresses the economic impact by mentioning job creation but misses the point of driving sustainable practices.',
      passed: false
    },
    {
      criteria: 'Relevance',
      score: 0.5,
      reason: 'The actual output addresses part of the economic impact by mentioning job creation but lacks information about driving sustainable practices which is mentioned in the expected output.',
      passed: false
    }
  ]
}