ARGUS

Autonomous Role-based Guard for UI Security.

Academic Research | Northeastern University | IDOR Detection | LLM Agents

ARGUS is a lightweight, browser-driven agentic AI framework designed to automate the detection of Insecure Direct Object Reference (IDOR) vulnerabilities — the most prevalent form of Broken Access Control in modern web applications.

By combining multimodal screenshot analysis with cross-role API probing through an Eyes-Brain-Hands architecture, ARGUS bridges the gap between static scanners and manual penetration testing — operating entirely as a black-box tool requiring no source code access.


Abstract

Broken Access Control remains the #1 risk in the OWASP Top 10:2025, with 100% of applications tested exhibiting some form of broken access control. This paper introduces ARGUS, a browser-based multi-agent system that combines Large Language Models for reasoning, Playwright for browser automation, and Retrieval-Augmented Generation (RAG) to maintain context across multi-step user flows.

Evaluated against OWASP Juice Shop, ARGUS demonstrates that model selection and prompt specificity both materially affect detection reliability and efficiency. Lighter models require explicit workflow scaffolding; stronger models may perform better without it.

Authors

Jubril A. Akanbi

Northeastern University

Linghe Zhou

Northeastern University

Yiyang Wang

Northeastern University

Dr. Maryam Tanha

Northeastern University (Supervisor)

Industry Partner: GreenHat Security — advisory and financial support.

System Architecture

Eyes — Perception Layer

Captures screenshots, DOM content, and network traffic through Playwright to build a comprehensive picture of application state at every step.
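
As a rough illustration of the perception step, the sketch below uses Playwright's synchronous Python API to capture a screenshot, the DOM, and response metadata in a single page load. The `Observation` and `NetworkCall` classes, the `capture` and `api_calls` helpers, and the `/api/` and `/rest/` URL markers are assumptions made for this example; they are not ARGUS's actual interfaces, and the source does not state which language ARGUS is written in.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkCall:
    method: str
    url: str
    status: int


@dataclass
class Observation:
    """One snapshot of application state: pixels, markup, and traffic."""
    url: str
    screenshot_png: bytes
    dom_html: str
    calls: list[NetworkCall] = field(default_factory=list)


def capture(url: str) -> Observation:
    """Load a page once and record what the browser saw (hypothetical helper)."""
    # Imported lazily so the data classes are usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    calls: list[NetworkCall] = []
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        # Record every HTTP response the page triggers while loading.
        page.on(
            "response",
            lambda r: calls.append(NetworkCall(r.request.method, r.url, r.status)),
        )
        page.goto(url, wait_until="networkidle")
        obs = Observation(
            url=page.url,
            screenshot_png=page.screenshot(full_page=True),
            dom_html=page.content(),
            calls=calls,
        )
        browser.close()
    return obs


def api_calls(
    obs: Observation, markers: tuple[str, ...] = ("/api/", "/rest/")
) -> list[NetworkCall]:
    """Keep only calls whose URL looks like an API endpoint (a simple heuristic)."""
    return [c for c in obs.calls if any(m in c.url for m in markers)]
```

Filtering the captured traffic down to API-looking calls is one plausible way to hand a reasoning layer a short list of endpoints worth probing; a real system would likely need a more robust heuristic than substring matching.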

Brain — Reasoning Engine

Three specialized LLM agents — Discovery, Strategy, and Analyzer — coordinated by an Orchestrator, with a RAG module grounded in OWASP patterns.

Hands — Action Layer

Playwright-powered browser automation: navigation, UI interaction, API request replay, and role/session switching for cross-role testing.
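
The cross-role replay idea can be sketched independently of any browser. In the example below, a read request that succeeds for the resource owner is replayed under a second role and the two responses are compared. The `Response` type, the `Sender` callable, the `probe_idor` function, and its three verdict strings are hypothetical names chosen for illustration; the classification rule is a simplified assumption, not the decision logic ARGUS uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Response:
    status: int
    body: str


# A sender takes (role, method, url) and returns the response seen by that role.
Sender = Callable[[str, str, str], Response]


def probe_idor(send: Sender, owner: str, other: str, url: str) -> str:
    """Replay a GET for the owner's resource as another role and classify the result."""
    baseline = send(owner, "GET", url)
    if not 200 <= baseline.status < 300:
        return "inconclusive"  # the owner could not read the resource either
    replay = send(other, "GET", url)
    if replay.status in (401, 403, 404):
        return "enforced"  # the server refused or hid the resource
    if 200 <= replay.status < 300 and replay.body == baseline.body:
        return "potential-idor"  # another role received the owner's data
    return "inconclusive"
```

Injecting the sender keeps the check testable with a stub and lets the same logic run on top of a real browser session. An identical body under a different role is only a signal: public resources also match, so a finding like this still needs review.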

The three layers coordinate through the Access Map, a dynamic knowledge structure that records endpoints, roles, and observed authorization outcomes. The Strategy Agent is augmented with a ChromaDB-backed RAG module for OWASP-grounded decision making.
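
To make the Access Map concrete, here is a minimal sketch of a structure that records which role observed which HTTP status on which endpoint. The class layout and the `record`, `untested`, and `shared_access` methods are assumptions for illustration; the source describes only what the Access Map stores, not how it is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AccessMap:
    """Records, per endpoint, the HTTP status each role observed."""
    outcomes: dict[str, dict[str, int]] = field(
        default_factory=lambda: defaultdict(dict)
    )

    def record(self, endpoint: str, role: str, status: int) -> None:
        self.outcomes[endpoint][role] = status

    def untested(self, roles: list[str]) -> list[tuple[str, str]]:
        """(endpoint, role) pairs not yet probed: candidates for the next step."""
        return [
            (endpoint, role)
            for endpoint, seen in self.outcomes.items()
            for role in roles
            if role not in seen
        ]

    def shared_access(self) -> list[str]:
        """Endpoints that more than one role could read successfully."""
        return [
            endpoint
            for endpoint, seen in self.outcomes.items()
            if sum(200 <= status < 300 for status in seen.values()) > 1
        ]


access_map = AccessMap()
access_map.record("/rest/basket/1", "alice", 200)
access_map.record("/rest/basket/1", "bob", 200)
access_map.record("/api/Users/1", "alice", 200)
print(access_map.untested(["alice", "bob"]))  # [('/api/Users/1', 'bob')]
print(access_map.shared_access())             # ['/rest/basket/1']
```

The endpoint paths and role names are placeholders. Keying on the endpoint alone matches read-only GET probing; covering other HTTP methods would require adding the method to the key.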

Experimental Results

Model choice and prompt design both drive outcomes.

Two controlled experiments against OWASP Juice Shop (3 trials per condition) evaluated how model capability and prompt specificity affect IDOR detection reliability and efficiency.

Model	Structured Prompt (Avg. Iterations)	Loose Prompt (Avg. Iterations)	Key Insight
Haiku-4.5	5.3	No successful runs (0 / 3)	Depends entirely on explicit workflow structure; detection success collapses to 0% without it.
Qwen-3.6-plus	3.7	5.7	Maintains 100% success but compensates with ~57% more output tokens under loose prompting.
GPT-5-mini	8.0	4.0	Performs better without explicit steps; the structured prompt introduces unnecessary overhead.
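
For readers comparing the two prompt conditions, the relative change in average iterations follows directly from the table. The short calculation below is a convenience sketch; `relative_change` is a name introduced here, and Haiku-4.5 is left out because it has no successful loose-prompt runs to average. These percentages describe iterations only and are a different metric from the output-token figure quoted for Qwen-3.6-plus.

```python
def relative_change(structured: float, loose: float) -> float:
    """Percent change in average iterations when moving from structured to loose."""
    return (loose - structured) / structured * 100.0


# (structured, loose) average iterations, copied from the table above.
avg_iterations = {
    "Qwen-3.6-plus": (3.7, 5.7),
    "GPT-5-mini": (8.0, 4.0),
}

for model, (structured, loose) in avg_iterations.items():
    change = relative_change(structured, loose)
    print(f"{model}: {change:+.0f}% iterations under the loose prompt")
# Qwen-3.6-plus: +54% iterations under the loose prompt
# GPT-5-mini: -50% iterations under the loose prompt
```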

Reliability is not determined by model tier

Detection success is not guaranteed by model capability alone: prompt structure materially affected outcomes for every model tested.

Light models need scaffolding

Haiku-4.5's detection success dropped from 100% to 0% without an explicit phased workflow, so structured prompts are a functional requirement for lower-tier models.

Stronger models may be over-constrained

GPT-5-mini achieved its best results (4.0 iterations on average) under a loosened prompt, suggesting that structured workflows can introduce unnecessary overhead.

Conclusion

Agentic security testing is a viable direction.

ARGUS demonstrates that a browser-based, multimodal LLM agent can autonomously detect IDOR vulnerabilities in OWASP Juice Shop. The findings indicate that careful co-design of model selection and prompt strategy is essential for deploying such systems in practice.

Future work should prioritize extending beyond read-only GET-based probing to cover PUT, PATCH, and DELETE methods; expanding the vulnerability taxonomy to include RBAC bypass and privilege escalation; and generalizing ARGUS to work across diverse web applications as a configurable, user-facing tool.

Future Directions

Expand IDOR testing to PUT/PATCH/DELETE HTTP methods for unauthorized modification detection

Extend vulnerability taxonomy: RBAC bypass, privilege escalation, mass assignment

Build a re-seeding pipeline for the RAG knowledge base to keep pace with evolving vulnerability patterns

Generalize ARGUS to unseen web applications, moving from single-target to multi-target configuration

Package as a configurable, user-facing tool for security practitioners

Get in Touch

Interested in this research?

Whether you want to collaborate, apply this approach to your application, or discuss findings — we'd like to hear from you.