New report reveals major security flaws in multimodal AI models

Enkrypt AI has released a report detailing new vulnerabilities in multimodal AI models that could pose risks to public safety.

The Multimodal Safety Report by Enkrypt AI unveils significant security failures in the way generative AI systems handle combined text and image inputs. According to the findings, these vulnerabilities could allow harmful prompt injections hidden within benign images to bypass safety filters and trigger the generation of dangerous content.

The company's red teaming exercise evaluated several widely used multimodal AI models for their vulnerability to harmful outputs. Tests were conducted across the safety and harm categories outlined in the NIST AI Risk Management Framework. The research highlighted how recent jailbreak techniques exploit the integration of text and images to circumvent existing content filters.

"Multimodal AI promises incredible benefits, but it also expands the attack surface in unpredictable ways," said Sahil Agarwal, Chief Executive Officer of Enkrypt AI. "This research is a wake-up call: the ability to embed harmful textual instructions within seemingly innocuous images has real implications for enterprise liability, public safety, and child protection."

The report focused on two multimodal models developed by Mistral—Pixtral-Large (25.02) and Pixtral-12b. Enkrypt AI's analysis found that these models are 60 times more likely to generate child sexual exploitation material (CSEM)-related textual responses than prominent alternatives such as OpenAI's GPT-4o and Anthropic's Claude 3.7 Sonnet. The findings raise concerns that some multimodal models lack sufficient safeguards against generating such material.

In addition to CSEM risks, the study revealed that these models were 18 to 40 times more susceptible to generating chemical, biological, radiological, and nuclear (CBRN) information when tested with adversarial inputs. The vulnerability was linked not to malicious text prompts but to prompt injections concealed within image files, indicating that such attacks could evade standard detection and filtering systems.
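
The attack pattern described here relies on the fact that instructions rendered as pixels never pass through a text-only moderation filter. As a rough illustration of one countermeasure, the sketch below runs optical character recognition (OCR) over an incoming image and screens the extracted text before the image reaches a model. The file name and the `contains_disallowed_content` check are hypothetical placeholders, not part of the Enkrypt AI report or any specific vendor API; a production guardrail would use a far more capable classifier than a toy denylist.

```python
# Illustrative sketch only: pre-screen an image by extracting any embedded
# text with OCR before forwarding it to a multimodal model.
from PIL import Image
import pytesseract


def contains_disallowed_content(text: str) -> bool:
    """Hypothetical placeholder for a real text-moderation classifier."""
    blocked_phrases = ["ignore previous instructions"]  # toy denylist
    lowered = text.lower()
    return any(phrase in lowered for phrase in blocked_phrases)


def screen_image_input(image_path: str) -> bool:
    """Return True if the image appears safe to forward to the model."""
    # OCR the image so instructions rendered as pixels become visible text.
    extracted_text = pytesseract.image_to_string(Image.open(image_path))
    return not contains_disallowed_content(extracted_text)


if __name__ == "__main__":
    if screen_image_input("user_upload.png"):
        print("Image passed pre-screening; forward to the model.")
    else:
        print("Embedded text flagged; block or escalate for review.")
```

OCR-based screening only catches legible rendered text; it would not detect more subtle manipulations, which is one reason the report pairs guardrails with ongoing red teaming rather than treating any single filter as sufficient.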

These weaknesses threaten to undermine the intended uses of generative AI and underscore the need for improved safety alignment across the industry. The report emphasises that such risks are present in any multimodal model lacking comprehensive security measures.

Based on the findings, Enkrypt AI urges AI developers and enterprises to address these emerging risks promptly. The report outlines several recommended best practices, including integrating red teaming datasets into safety alignment processes, conducting continuous automated stress testing, deploying context-aware multimodal guardrails, establishing real-time monitoring and incident response systems, and creating model risk cards to transparently communicate potential vulnerabilities.
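
To make the continuous stress-testing recommendation concrete, the sketch below replays a red-teaming dataset against a model and records any responses that score above a harm threshold. It is a minimal outline under stated assumptions: `query_model` and `score_harm` are hypothetical stand-ins for whatever model client and harm classifier an organisation actually uses, and the dataset format is invented for illustration; none of these are APIs described in the Enkrypt AI report.

```python
# Illustrative sketch only: a minimal automated stress-testing loop over a
# red-teaming dataset of combined text and image prompts.
import json
from typing import Callable


def run_stress_test(
    prompts_path: str,
    query_model: Callable[[str, str], str],   # hypothetical model client
    score_harm: Callable[[str], float],       # hypothetical harm classifier
    threshold: float = 0.5,
) -> list[dict]:
    """Replay adversarial text+image prompts and record harmful outputs."""
    with open(prompts_path) as f:
        # Assumed format: [{"text": "...", "image": "path.png"}, ...]
        test_cases = json.load(f)

    findings = []
    for case in test_cases:
        response = query_model(case["text"], case["image"])
        harm_score = score_harm(response)
        if harm_score >= threshold:
            findings.append(
                {"case": case, "score": harm_score, "response": response}
            )
    return findings
```

Run on a schedule and fed back into safety alignment, a loop like this supports the report's call for real-time monitoring and transparent model risk reporting rather than one-off evaluations.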

"These are not theoretical risks," added Sahil Agarwal. "If we don't take a safety-first approach to multimodal AI, we risk exposing users—and especially vulnerable populations—to significant harm."

Enkrypt AI's report also provides details about its testing methodology and suggested mitigation strategies for organisations seeking to reduce the risk of harmful prompt injection attacks within multimodal AI systems.
