🚤 Marine Bench

Can AI answer 25 questions about Regulator Marine boats?

The Experiment

We asked 6 top LLMs (7 configurations, since Gemini 3 Flash was run both with and without web search) to answer 25 specific questions drawn from a Regulator Marine product catalog.

Model	Score	Accuracy
Grok 4.1 Fast	1/25	4%
Gemini 3 Flash	3/25	12%
Claude Opus 4.5	5/25	20%
DeepSeek v3.2	5/25	20%
GPT-5.2 (Instant)	8/25	32%
Gemini 3 Flash + Search	8/25	32%
Gemini 3 Pro	11/25	44%
🎯 With RAG Context	25/25	100%
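
The Accuracy column is just correct answers divided by the 25 questions. A minimal sketch of that arithmetic, using the scores from the table; the names `scores` and `accuracy_percent` are illustrative and not part of the original benchmark code:

```python
# Correct answers out of 25, copied from the table above.
TOTAL_QUESTIONS = 25

scores = {
    "Grok 4.1 Fast": 1,
    "Gemini 3 Flash": 3,
    "Claude Opus 4.5": 5,
    "DeepSeek v3.2": 5,
    "GPT-5.2 (Instant)": 8,
    "Gemini 3 Flash + Search": 8,
    "Gemini 3 Pro": 11,
    "With RAG Context": 25,
}


def accuracy_percent(correct, total=TOTAL_QUESTIONS):
    """Return accuracy as a whole-number percentage."""
    return round(100 * correct / total)


# Print one tab-separated row per configuration, matching the table layout.
for model, correct in scores.items():
    print(f"{model}\t{correct}/{TOTAL_QUESTIONS}\t{accuracy_percent(correct)}%")
```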

44%

Best model without context. Even Gemini 3 Pro misses 14 of the 25 questions.

100%

Any model with RAG. Context quality matters more than model choice.
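
This page does not describe the retrieval setup behind the RAG run, so the following is only a rough sketch of the general idea: pick the passages most relevant to a question and put them in the prompt. It assumes a toy word-overlap retriever; the function names and the sample passages are invented for illustration and are not real catalog data.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, passages, k=2):
    """Return the k passages sharing the most tokens with the question."""
    q_tokens = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda p: len(q_tokens & tokenize(p)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Placeholder passages, made up for this example (NOT real catalog content).
passages = [
    "Example Boat A has a fuel capacity of 100 gallons.",
    "Example Boat B has a beam of 9 feet.",
    "The warranty section covers hull defects.",
]

question = "What is the fuel capacity of Example Boat A?"
prompt = build_prompt(question, passages, k=1)
print(prompt)
```

A production setup would typically swap the word-overlap ranking for embedding-based search, but the shape is the same: the model answers from the supplied passages rather than from memory.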

32%

Web search isn't enough. Search lifts Gemini 3 Flash from 12% to 32%, but generic web data ≠ your proprietary docs.

Model choice matters less than having the right context.
Grok 4.1 Fast scores 4% on its own, but with your docs it beats Gemini 3 Pro's 44% without them.