LONDON, Oct 16 (Reuters) – Some of the most prominent artificial intelligence models are failing to comply with European regulations in key areas such as cybersecurity resilience and discriminatory output, according to data seen by Reuters.
A leaderboard published Wednesday by LatticeFlow assigned each model a score between 0 and 1. Models developed by Alibaba, Anthropic, OpenAI, Meta and Mistral all received an average score of 0.75 or higher.
However, the company’s ‘Large Language Model (LLM) Checker’ revealed the shortcomings of some models in key areas, highlighting where companies may need to dedicate resources to ensure compliance.
Companies that fail to comply with the AI Act face fines of up to 35 million euros ($38 million) or 7% of global annual turnover.
MIXED RESULTS
But LatticeFlow’s test, developed in collaboration with researchers from Swiss university ETH Zurich and Bulgarian research institute INSAIT, provides an early indicator of specific areas where tech companies are at risk of non-compliance.
For example, discriminatory output has been a persistent problem in the development of generative AI models, which mirror human biases in gender, race, and other areas when prompted.
When testing for “prompt hijacking,” a type of cyberattack in which hackers disguise a malicious prompt as legitimate to extract sensitive information, the LLM Checker awarded Meta’s “Llama 2 13B Chat” model a score of 0.42. In the same category, French startup Mistral’s “8x7B Instruct” model received a score of 0.38.
The test is designed in line with the text of the AI Act and will be expanded to include further enforcement measures as they are introduced. LatticeFlow said the LLM Checker would be available for free for developers to test the compliance of their models online.
Petar Tsankov, the company’s CEO and co-founder, told Reuters that the test results were generally positive and offered companies a roadmap to help them refine their models in line with the AI law.
“The EU is still working out all the compliance benchmarks, but we are already seeing some gaps in the models,” he said. “With an increased focus on optimizing compliance, we believe model providers can be well prepared to meet regulatory requirements.”
Meta declined to comment. Alibaba, Anthropic, Mistral and OpenAI did not immediately respond to requests for comment.
Although the European Commission cannot verify external tools, it has been kept informed throughout the development of the LLM Checker and has described it as a “first step” in putting the new laws into practice.
A spokesperson for the European Commission said: “The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements.”
($1 = 0.9173 euros)
Reporting by Martin Coulter; Editing by Hugh Lawson