Researchers from ETH Zurich, the Bulgarian AI research institute INSAIT – founded in collaboration with ETH and EPFL – and the ETH spin-off LatticeFlow AI have provided the first comprehensive technical interpretation of the EU AI Act for General Purpose AI (GPAI) models. This makes them the first to translate the legal requirements that the EU sets for future AI models into concrete, measurable and verifiable technical requirements.
Such a translation is highly relevant for the ongoing implementation of the EU AI Act: it gives model developers a practical way to see how well they align with future EU legal requirements. A mapping of high-level regulatory requirements onto actionable benchmarks has not existed to date, so the work can serve as an important reference point both for model training and for the EU AI Act Code of Practice currently being drafted.
The researchers tested their approach on twelve popular generative AI models such as ChatGPT, Llama, Claude and Mistral. These large language models (LLMs), being highly capable and intuitive to use, have contributed enormously to the growing popularity and spread of artificial intelligence (AI) in everyday life.
With the increasing spread of these – and other – AI models, the ethical and legal requirements for the responsible use of AI also grow: sensitive questions arise, for example, about data protection, privacy and the transparency of AI models. Models should not be ‘black boxes’ but should produce results that are as explainable and traceable as possible. Moreover, they must function fairly and not discriminate against anyone.
The implementation of the AI Act must be technically clear
Against this background, the EU AI Act, which the EU adopted in March 2024, is the world’s first AI legislative package that comprehensively aims to maximize public trust in these technologies and minimize their unwanted risks and side effects.
“The EU AI Act is an important step towards the development of responsible and trustworthy AI,” says ETH computer scientist Martin Vechev, head of the Laboratory for Secure, Trustworthy and Intelligent Systems and founder of INSAIT, “but so far we lack a clear and precise technical interpretation of its high-level legal requirements.
“This makes it difficult both to develop legally compliant AI models and to assess the extent to which these models actually comply with the legislation.”
The EU AI Act provides a clear legal framework to contain the risks of so-called General Purpose Artificial Intelligence (GPAI), meaning AI models that can perform a wide range of tasks. However, the law does not specify how its broad legal requirements should be interpreted technically; the corresponding technical standards are still being developed before the regulations for high-risk AI models come into effect in August 2026.
“However, the success of the AI Act’s implementation will largely depend on how well it succeeds in developing concrete, precise technical requirements and compliance-oriented benchmarks for AI models,” says Petar Tsankov, CEO and, together with Vechev, founder of the ETH spin-off LatticeFlow AI, which focuses on the implementation of reliable AI in practice.
“If there is no standard interpretation of what key concepts such as safety, explainability or traceability exactly mean in (GP)AI models, it remains unclear to model developers whether their AI models are in accordance with the AI Act,” adds Robin Staab, computer scientist and PhD candidate in Vechev’s research group.
Test of twelve language models reveals shortcomings
The methodology developed by the ETH researchers provides a starting point and basis for discussion. The researchers have also developed an initial ‘compliance checker’, a set of benchmarks that can be used to assess how well AI models meet the likely requirements of the EU AI Act.
In view of the ongoing implementation of the legal requirements in Europe, the ETH researchers have made their findings public in a study published on the arXiv preprint server. They also made their results available to the EU AI Office, which plays a key role in implementing and enforcing the AI Act – and therefore also in model evaluation.
In a study that is largely understandable even to non-experts, the researchers first clarify key terms. Starting from six central ethical principles specified in the EU AI Act (human agency, data protection, transparency, diversity, non-discrimination and fairness), they derive 12 associated, technically clear requirements and link them to 27 state-of-the-art evaluation benchmarks.
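To make the structure of such a mapping concrete, here is a minimal, purely illustrative Python sketch of how principles, requirements and benchmarks could be linked and aggregated into per-principle scores. It is not the COMPL-AI implementation; the requirement and benchmark names below and the simple averaging scheme are assumptions chosen only for illustration.

```python
# Illustrative sketch: principle -> requirement -> benchmark mapping and a naive
# aggregation into per-principle scores. All names below are hypothetical examples,
# not the actual COMPL-AI taxonomy.

from statistics import mean

MAPPING = {
    "transparency": {
        "interpretability": ["self_explanation_consistency"],
        "disclosure_of_ai": ["ai_disclosure_qa"],
    },
    "diversity_non_discrimination_fairness": {
        "absence_of_bias": ["bias_benchmark_qa", "stereotype_probe"],
        "fair_outcomes": ["counterfactual_fairness_suite"],
    },
}

def aggregate(benchmark_scores):
    """Average benchmark scores (each in [0, 1]) up to the principle level."""
    report = {}
    for principle, requirements in MAPPING.items():
        requirement_scores = []
        for benchmarks in requirements.values():
            known = [benchmark_scores[b] for b in benchmarks if b in benchmark_scores]
            if known:  # requirements with no measurable benchmark are skipped here
                requirement_scores.append(mean(known))
        report[principle] = mean(requirement_scores) if requirement_scores else float("nan")
    return report

if __name__ == "__main__":
    scores = {
        "ai_disclosure_qa": 0.82,
        "bias_benchmark_qa": 0.64,
        "stereotype_probe": 0.71,
        "counterfactual_fairness_suite": 0.58,
    }
    print(aggregate(scores))  # prints a per-principle average for this toy input
```

In this sketch, requirements for which no benchmark score is available are simply skipped; the study, by contrast, explicitly flags such gaps – precisely the kind of missing technical tooling the researchers highlight below.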
Importantly, they also identify areas where concrete technical checks for AI models are less well developed or even non-existent, encouraging researchers, model providers and regulators alike to push further for an effective implementation of the EU AI Act in these areas.
Impetus for further improvement
The researchers applied their benchmark approach to twelve prominent large language models (LLMs). The results make it clear that none of the models analyzed today fully meets the requirements of the EU AI Act. “Our comparison of these large language models shows that there are shortcomings, especially when it comes to requirements such as robustness, diversity and fairness,” says Staab.
This is partly because, in recent years, model developers and researchers have prioritized general model capabilities and performance over ethical and social requirements such as fairness and non-discrimination.
The researchers also found that even important AI concepts such as explainability remain conceptually unclear. In practice, suitable tools for explaining how the results of a complex AI model were arrived at are lacking: what is not entirely clear conceptually is also virtually impossible to evaluate technically.
The research makes it clear that several technical requirements, including those related to copyright infringement, cannot currently be measured reliably. For Staab, one thing is clear: “Focusing the model evaluation on capabilities alone is not enough.”
That said, the researchers are focused on more than just evaluating existing models. For them, the EU AI Act is a first example of how legislation will change the development and evaluation of AI models in the future.
“We see our work as an impetus to enable the implementation of the AI Act and to obtain practical recommendations for model providers,” says Vechev, “but our methodology can go beyond the EU AI Act, as it is also adaptable to other, similar legislation.”
“Ultimately, we want to encourage a balanced development of LLMs, taking into account both technical aspects such as competence and ethical aspects such as fairness and inclusivity,” Tsankov adds.
The researchers are making their benchmark tool COMPL-AI available on GitHub to spark technical discussion. The results and methods of their benchmarking can be analyzed and visualized there. “We have published our benchmark suite as open source so that other researchers from industry and the scientific community can participate,” says Tsankov.
More information:
Philipp Guldimann et al, COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act, arXiv (2024). DOI: 10.48550/arxiv.2410.07959