Startup Aims to Revolutionise AI Testing with Independent Evaluation System

tech360.tv
Apr 12, 2024

Vals.ai aims to fill a gap in the tech industry by developing an independent, standardised test for evaluating AI services. The startup collaborates with researchers and industry experts to create a neutral, third-party review system for large language models. Vals.ai has received pre-seed funding and investor interest, highlighting the demand for unbiased testing in the AI industry.

Langston Nashold and Rayan Krishnan (Credit: Sedgwick McCray)

A new startup, Vals.ai, is on a mission to address a critical gap in the tech industry: the lack of an independent, standardised test for evaluating AI services. While tech companies constantly release new AI products claiming to outperform competitors, there is no objective measure to validate these claims. To close that gap, Vals.ai is developing a neutral, third-party review system for large language models, in collaboration with researchers at Stanford and industry experts in fields such as accounting, law, and finance.

The founders of Vals.ai, Rayan Krishnan and Langston Nashold, dropped out of their AI-focused master's program at Stanford to pursue their vision. Together with founding engineer Rez Havaei, they aim to create a comprehensive evaluation framework that goes beyond the limited assessments currently available. The startup draws on academic and industry-specific datasets to formulate its test questions and ensure a thorough evaluation.

After a successful small preview earlier this year, Vals.ai officially launched on Thursday. The startup has already secured pre-seed funding from Pear VC, with additional support from a scout investor for Sequoia. This investment highlights the growing demand for unbiased testing as more companies consider integrating AI into their operations.

The need for an independent evaluation system is evident as AI models are increasingly deployed in critical sectors such as healthcare and law. Krishnan emphasises that it is still uncertain whether these models are suitable for real-world applications. Furthermore, because large language models are trained on vast amounts of online data, they may already have encountered benchmark questions and answers during training, which would compromise the integrity of the evaluation.

While various researchers, analysts, and AI influencers have attempted to create benchmarks and informal reviews, there is no industry consensus on the best approach or the most trusted evaluators. Vals.ai aims to fill this void by providing a standardised and reliable evaluation system that can be universally adopted.

Finding a solution matters all the more as competition in the AI market intensifies. OpenAI, once the undisputed leader, now faces stiff competition from companies like Anthropic, Google, and Cohere. With AI companies making bold claims about their models' capabilities, external validation becomes crucial. Vals.ai's initial report has already revealed shortcomings in leading models' performance on tax-related questions, highlighting the need for further scrutiny.

The data collected by Vals.ai also suggests that AI system performance may vary across industries. For instance, models achieved higher accuracy on legal reasoning tasks than on tax-related queries. This finding has piqued the interest of the legal community, underscoring the relevance and potential impact of Vals.ai's evaluation system.

Krishnan acknowledges that there is still work to be done in applying AI models to specific domains or tasks. However, he believes that these models possess immense potential and can be trained to excel in specialised areas, much like a well-educated individual who requires additional training to become an expert in a specific field.

Source: BLOOMBERG