Installation

EvalKit is distributed as multiple NPM packages.
We’ll focus on the evaluations framework package, which you can install with:

npm install --save-dev @evalkit/core

OpenAI Configuration

EvalKit currently uses the OpenAI API to perform evaluations, so you’ll need an OpenAI account and an API key, or a compatible custom endpoint configured as described below.

Standard OpenAI

Simply set your OpenAI API key in your environment:

OPENAI_API_KEY=your-api-key
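The examples below read the key from process.env; if you keep secrets in a local .env file, load it first with a tool such as dotenv (an extra dependency, not part of EvalKit):

// index.ts (assumes dotenv is installed: npm install dotenv)
import 'dotenv/config'; // populates process.env from .env before anything else runs
import { evaluate } from '@evalkit/core';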

Or if you prefer to set it programmatically:

import { config } from '@evalkit/core';

config.setOpenAIConfig({
  apiKey: process.env.OPENAI_KEY // the key can come from any variable or source, not just OPENAI_API_KEY
});
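Call setOpenAIConfig before your first evaluation so the client is created with the right credentials (a reasonable assumption for client libraries; check the package docs for the exact initialization order). For example:

import { config, evaluate, BiasDetectionMetric } from '@evalkit/core';

// Configure first...
config.setOpenAIConfig({ apiKey: process.env.OPENAI_KEY });

// ...then run evaluations.
const result = await evaluate({
  output: "Example output to check for bias.",
}, [BiasDetectionMetric]);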

Custom OpenAI/Azure OpenAI

For custom OpenAI endpoints (like Azure OpenAI), use the config:

import { config } from '@evalkit/core';

config.setOpenAIConfig({
  apiKey: process.env.MY_AZURE_KEY,
  baseURL: process.env.MY_AZURE_ENDPOINT,
  apiVersion: '2023-05-15',  // Optional, defaults to 2023-05-15
  deploymentName: process.env.MY_DEPLOYMENT
});

The configuration is environment-agnostic: use any environment variables or configuration source you prefer. The required fields for custom endpoints are listed below, with a fail-fast sketch after the list:

  • apiKey
  • baseURL
  • deploymentName (for Azure OpenAI)
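A missing field typically only surfaces as a runtime error from the API, so it can help to validate the required values up front. A minimal sketch, assuming the same environment variable names as the Azure example above:

import { config } from '@evalkit/core';

const apiKey = process.env.MY_AZURE_KEY;
const baseURL = process.env.MY_AZURE_ENDPOINT;
const deploymentName = process.env.MY_DEPLOYMENT;

// Fail fast if a required Azure OpenAI field is missing.
if (!apiKey || !baseURL || !deploymentName) {
  throw new Error('MY_AZURE_KEY, MY_AZURE_ENDPOINT, and MY_DEPLOYMENT must all be set');
}

config.setOpenAIConfig({ apiKey, baseURL, deploymentName });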

Configuration File

You can also configure EvalKit using an evalkit.config.ts (or .js) file in your project root:

// evalkit.config.ts
import type { EvalKitConfig } from '@evalkit/core';

const config: EvalKitConfig = {
  openai: {
    // Your OpenAI API key (defaults to OPENAI_API_KEY env var)
    apiKey: process.env.OPENAI_API_KEY,
    
    // Optional: For Azure OpenAI
    // baseURL: 'https://your-resource.openai.azure.com',
    // apiVersion: '2023-05-15',
    // deploymentName: 'your-deployment',
  },
  
  reporting: {
    // Which report formats to generate ('json' and/or 'html')
    // Leave empty for console-only output
    outputFormats: ['json', 'html'],
    
    // Where to save the report files
    outputDir: './eval-reports'
  }
};

export default config;
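If your project doesn’t use TypeScript, the same configuration works as evalkit.config.js. A sketch assuming a CommonJS project (in an ESM project, use export default instead), with the shape checked via JSDoc since @evalkit/core ships its types:

// evalkit.config.js
/** @type {import('@evalkit/core').EvalKitConfig} */
module.exports = {
  openai: {
    apiKey: process.env.OPENAI_API_KEY,
  },
  reporting: {
    outputFormats: ['json'],
    outputDir: './eval-reports',
  },
};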

The config file supports:

OpenAI Settings

Same as the programmatic configuration above.

Reporting Settings

  • outputFormats: Array of report formats to generate
    • []: Console output only (default)
    • ['json']: JSON reports
    • ['html']: HTML reports
    • ['json', 'html']: Both formats
  • outputDir: Directory where reports will be saved (default: './eval-reports')
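For example, a minimal config file that keeps console-only output (this assumes both top-level sections are optional, which the defaults above suggest):

// evalkit.config.ts
import type { EvalKitConfig } from '@evalkit/core';

const config: EvalKitConfig = {
  reporting: {
    // An empty array means console output only, the default.
    outputFormats: [],
  },
};

export default config;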

Writing Your First Evaluation

The EvalKit evaluations framework provides a simple API for evaluating text with built-in metrics, using the OpenAI models configured above. Here’s a quick example to get you started:

import { evaluate, BiasDetectionMetric, RelevancyMetric, DynamicMetric } from '@evalkit/core';

// Simple evaluation with a single metric
const biasResult = await evaluate({
  output: "The company prefers to hire young and energetic candidates.",
}, [BiasDetectionMetric]);

// Advanced evaluation with multiple metrics
const advancedResult = await evaluate({
  input: "What are the benefits of renewable energy?",
  output: "Renewable energy sources reduce emissions and help fight climate change.",
  expectedOutput: "Renewable energy sources like solar and wind power reduce greenhouse gas emissions and help combat climate change.",
  criteria: [{ type: "Accuracy" }, { type: "Relevance" }]
}, [RelevancyMetric, DynamicMetric]);
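Top-level await only works in an ES module context; otherwise, wrap the calls in an async function. A minimal runner sketch (the file name and logging are illustrative, not part of EvalKit):

// run-evals.ts
import { evaluate, BiasDetectionMetric } from '@evalkit/core';

async function main() {
  const result = await evaluate({
    output: "The company prefers to hire young and energetic candidates.",
  }, [BiasDetectionMetric]);

  // The exact result shape is defined by the package; pretty-print it to inspect.
  console.log(JSON.stringify(result, null, 2));
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});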