LONDON, Oct 16 (Reuters) – Some of the most prominent artificial intelligence models are failing to comply with European regulations in key areas such as cybersecurity resilience and discriminatory output, according to data seen by Reuters.
A leaderboard published Wednesday by LatticeFlow assigned each model a score between 0 and 1. Models developed by Alibaba, Anthropic, OpenAI, Meta and Mistral all received an average score of 0.75 or higher.
However, the company’s ‘Large Language Model (LLM) Checker’ revealed the shortcomings of some models in key areas, highlighting where companies may need to dedicate resources to ensure compliance.
Companies that fail to comply with the AI Act face fines of up to 35 million euros ($38 million) or 7% of global annual turnover.
MIXED RESULTS
But LatticeFlow’s test, developed in collaboration with researchers from Swiss university ETH Zurich and Bulgarian research institute INSAIT, provides an early indicator of specific areas where tech companies are at risk of non-compliance.
For example, discriminatory output has been a persistent problem in the development of generative AI models, which mirror human biases in gender, race, and other areas when prompted.
When testing for “prompt hijacking,” a type of cyberattack in which hackers disguise a malicious prompt as legitimate to extract sensitive information, the LLM Checker awarded Meta’s “Llama 2 13B Chat” model a score of 0.42. In the same category, French startup Mistral’s “8x7B Instruct” model received a score of 0.38.
The test is designed in line with the text of the AI Act and will be expanded to include further enforcement measures as they are introduced. LatticeFlow said the LLM Checker would be available for free for developers to test the compliance of their models online.
Petar Tsankov, the company’s CEO and co-founder, told Reuters that the test results were generally positive and offered companies a roadmap to help them refine their models in line with the AI law.
“The EU is still working out all the compliance benchmarks, but we are already seeing some gaps in the models,” he said. “With an increased focus on optimizing compliance, we believe model providers can be well prepared to meet regulatory requirements.”
Meta declined to comment. Alibaba, Anthropic, Mistral and OpenAI did not immediately respond to requests for comment.
Although the European Commission cannot verify external tools, it has been kept informed throughout the development of the LLM Checker and has described it as a “first step” in putting the new laws into practice.
A spokesperson for the European Commission said: “The Commission welcomes this study and AI model evaluation platform as a first step in translating the EU AI Act into technical requirements.”
($1 = 0.9173 euros)
Reporting by Martin Coulter; Editing by Hugh Lawson