Overview

The Faithfulness metric evaluates how accurately model-generated text reflects the provided context. The score is the number of truthful statements divided by the total number of statements in the generated text. A high faithfulness score indicates that the generated content stays consistent with the context without introducing inaccuracies or distortions.
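
As a minimal sketch of the ratio described above, the score can be computed from two counts. The helper name faithfulnessScore and its handling of an empty statement list are assumptions made for illustration; they are not part of @evalkit/core.

```typescript
// Hypothetical helper, not part of @evalkit/core:
// converts statement counts into a faithfulness score between 0 and 1.
function faithfulnessScore(truthfulCount: number, totalCount: number): number {
  // Assumption: a text with no statements has no defined score, so reject it.
  if (totalCount <= 0) {
    throw new RangeError("totalCount must be greater than 0");
  }
  if (truthfulCount < 0 || truthfulCount > totalCount) {
    throw new RangeError("truthfulCount must be between 0 and totalCount");
  }
  return truthfulCount / totalCount;
}

const allTruthful = faithfulnessScore(1, 1); // 1
const mostlyTruthful = faithfulnessScore(3, 4); // 0.75
```

Under this reading, a score of 1 means every statement was judged truthful, and lower values mean some statements were not supported by the context.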

FaithfulnessMetric uses the evaluateFaithfulness function to compute this metric.

Methods

evaluateFaithfulness Function

This function measures how faithful the generated text is to the given context. It takes two parameters:

  • output: The text generated by the model.
  • context: The reference context against which the output is evaluated.

It splits the output into individual statements and uses a pre-trained AI model to assess whether each statement is truthful relative to the context. The function returns a promise that resolves to a numeric score representing the proportion of truthful statements; in the example below, 1 truthful statement out of 1 gives a score of 1.
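
The split-then-judge flow can be sketched as follows. This is not the library's implementation: the names splitStatements, StatementJudge, and sketchEvaluateFaithfulness are hypothetical, the sentence splitter is deliberately naive, and the judge is an injected stand-in for the pre-trained model.

```typescript
// Hypothetical stand-in for the pre-trained model: resolves to true when
// the statement is judged truthful relative to the context.
type StatementJudge = (statement: string, context: string) => Promise<boolean>;

// Naive splitter (an assumption for illustration): breaks text after
// sentence-ending punctuation. It will mis-split decimals such as "3.5".
function splitStatements(output: string): string[] {
  const matches = output.match(/[^.!?]+[.!?]*/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Sketch of the scoring flow: split, judge each statement, return the
// fraction judged truthful.
async function sketchEvaluateFaithfulness(
  output: string,
  context: string,
  judge: StatementJudge,
): Promise<number> {
  const statements = splitStatements(output);
  // Assumption: an output with no statements scores 0.
  if (statements.length === 0) {
    return 0;
  }
  const verdicts = await Promise.all(
    statements.map((statement) => judge(statement, context)),
  );
  const truthfulCount = verdicts.filter((verdict) => verdict).length;
  return truthfulCount / statements.length;
}
```

Passing the judge in as a parameter keeps the sketch testable with a stub; in practice the judge would wrap a call to whichever model performs the truthfulness check.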

FaithfulnessMetric Class

The FaithfulnessMetric class wraps the evaluateFaithfulness function to compute the faithfulness score. It takes the same two inputs:

  • output: The text generated by the model.
  • context: The reference context used for evaluation.

The evaluateSteps method invokes evaluateFaithfulness and returns a detailed result containing the faithfulness score and the reasons behind it, which indicate how accurately the generated text reflects the provided context.
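
To illustrate what a result with a score and reasons might look like, here is a rough sketch that builds one from per-statement verdicts. The field names passed, score, and reasons mirror the example output in this document; the StatementVerdict type, the buildResult helper, the 0.5 pass threshold, and the wording of the reasons for unsupported statements are all assumptions, not the library's behavior.

```typescript
// Hypothetical per-statement verdict; not a type exported by @evalkit/core.
interface StatementVerdict {
  statement: string;
  truthful: boolean;
}

// Shape mirrors the example output shown in this document.
interface SketchResult {
  passed: boolean;
  score: number;
  reasons: string[];
}

// Hypothetical helper: the threshold default of 0.5 is an assumption.
function buildResult(verdicts: StatementVerdict[], threshold = 0.5): SketchResult {
  if (verdicts.length === 0) {
    return {
      passed: false,
      score: 0,
      reasons: ["No statements found in the generated text."],
    };
  }
  const unsupported = verdicts.filter((v) => !v.truthful);
  const score = (verdicts.length - unsupported.length) / verdicts.length;
  const reasons =
    unsupported.length === 0
      ? ["All statements in the generated text are truthful."]
      : unsupported.map((v) => `Not supported by the context: "${v.statement}"`);
  return { passed: score >= threshold, score, reasons };
}
```

Listing each unsupported statement as its own reason is one possible design; it lets a reader trace a low score back to the specific sentences that caused it.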

Example

import { evaluate, FaithfulnessMetric } from '@evalkit/core';

evaluate({
    // The generated text from an LLM
    output: "Investing in renewable energy can lead to long-term economic benefits.",
    // The context against which to evaluate the text
    context: "Renewable energy includes sources like solar and wind power.",
}, [FaithfulnessMetric])

// outputs
{
  passed: true,
  // The number of truthful statements in the text is 1 out of 1, resulting in a faithfulness score of 1.
  score: 1,
  reasons: ['All statements in the generated text are truthful.']
}