Registration for the NIST GenAI evaluation is now open.

Evaluating Generative AI Technologies

A NIST evaluation program to support research in Generative AI technologies.

GenAI Challenge Problem Overview

NIST GenAI is a new evaluation program administered by the NIST Information Technology Laboratory to assess generative AI technologies developed by the research community around the world. NIST GenAI is an umbrella program that supports a range of generative AI research evaluations by providing a platform for test and evaluation. These evaluations will inform the work of the U.S. AI Safety Institute at NIST.

The objectives of the NIST GenAI evaluation include but are not limited to:

  • Creating evolving benchmark datasets,
  • Facilitating the development of content authenticity detection technologies for different modalities (text, audio, image, video, code),
  • Conducting a comparative analysis using relevant metrics, and
  • Promoting the development of technologies for identifying the source of fake or misleading information.

NIST GenAI Pilot

The pilot study aims to measure and understand system behavior when discriminating between synthetic and human-generated content in the text-to-text (T2T) and text-to-image (T2I) modalities. The pilot addresses the research question of how human content differs from synthetic content, and how the evaluation findings can guide users in differentiating between the two. The generator task creates high-quality outputs, while the discriminator task determines whether a target output was generated by an AI model or a human.

Generator teams will be tested on their system's ability to generate synthetic content that is indistinguishable from human-produced content.

Discriminator teams will be tested on their system's ability to detect synthetic content created by generative AI models including large language models (LLMs) and deepfake tools.
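To make the discriminator task concrete, the sketch below shows one way such a run might be scored: each system output is a score estimating how likely an item is synthetic, and the scores are compared against ground-truth labels. The function names, label convention (1 = synthetic, 0 = human), and metrics here are illustrative assumptions, not the official NIST GenAI scoring protocol.

```python
# Hypothetical scoring sketch for a synthetic-vs-human discriminator.
# Convention (an assumption): label 1 = synthetic, label 0 = human-produced;
# each score is the system's estimated probability that the item is synthetic.

def detection_accuracy(scores, labels, threshold=0.5):
    """Fraction of items whose thresholded score matches the label."""
    correct = sum(
        (score >= threshold) == bool(label)
        for score, label in zip(scores, labels)
    )
    return correct / len(labels)

def auc(scores, labels):
    """Area under the ROC curve, computed pairwise: the probability that
    a randomly chosen synthetic item is scored above a random human item,
    with ties counted as half."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A threshold-free metric such as AUC is useful here because it ranks systems by how well they separate the two classes, independent of any particular operating point a deployment might choose.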

Pilot evaluations provide valuable lessons for future research on cutting-edge technologies and guidance for responsible and safe use of digital content.

[Image: important dates for the NIST GenAI evaluations]