Security testing is hard. Scaling it is harder. Making it accessible without dumbing it down? That’s the challenge we set out to solve.
After two years building FuzzForge, iterating on feedback from security teams and solving real-world challenges, we’ve learned one thing: the future of security automation isn’t about building the perfect SAST engine or the most autonomous AI pentester. It’s about orchestrating heterogeneous security tools into intelligent, auditable workflows that learn and adapt.
We’re not the 20th vendor claiming autonomous hacking. We’re building transparent, deterministic, composable security automation.
In this series of articles, we’ll walk you through our engineering journey, from problem to solution. This first article explains why we’re building FuzzForge and what our approach is. Follow-up articles will dive into the how: workflow orchestration with Temporal, sandboxing untrusted code, integrating AI without black boxes, and scaling infrastructure.
One clarification upfront: “FuzzForge” suggests fuzzing-only, but the ‘Forge’ is key. We orchestrate security workflows across SAST, fuzzing, dynamic analysis, and custom modules. Complete security pipelines, not just fuzzers.
Vulnerability research takes years to master. It combines deep software internals knowledge, attack pattern expertise, and the creativity to chain isolated issues into exploitable vulnerabilities. Development teams need security validation, but it’s not their core skill. Security experts don’t scale. They’re expensive, scarce, and time-constrained.
The industry faces a fundamental problem: modern teams deploy multiple times per day, yet traditional pentesting happens quarterly due to cost and availability. This leaves applications running untested in production, with attackers often discovering vulnerabilities first.
The market has fractured into specialized approaches, each optimizing for different tradeoffs. Before diving in, some key acronyms: SAST (Static Application Security Testing), ASPM (Application Security Posture Management), BAS (Breach and Attack Simulation), PTaaS (Penetration Testing as a Service), and CRS (Cyber Reasoning Systems).

Several philosophical approaches have emerged:
SAST Tools (Semgrep, CodeQL, SonarQube) prioritize speed and developer experience. Semgrep raised $100M in 2025, validating the "fast feedback, fewer false positives" approach. CodeQL goes deeper with semantic analysis but requires specialized expertise.

Fuzzing Platforms like OSS-Fuzz have discovered 13,000+ vulnerabilities since 2016 with zero false positives: crashes are proof. They excel at what they do but miss cross-technique workflows and struggle with contextual prioritization. Setup complexity and long campaign times limit adoption.
DevSecOps platforms (Checkmarx, Veracode, Snyk) consolidate multiple techniques into unified dashboards. Snyk pioneered "developer-first security," raising ~$1.3B and reaching an $8.5B peak valuation through aggressive acquisitions (DeepCode, Fugue, Enso, Helios, Probely, Invariant Labs). These platforms offer one-stop shopping but create vendor lock-in; Gartner notes they're "jack-of-all-trades" platforms where individual tools may not be best-of-breed. ASPM solutions aggregate findings into prioritized views but don't orchestrate workflows.
AI-Powered Pentesting has exploded in 2024-2025:
The appeal is clear: AI doesn’t sleep, scales infinitely, and learns every protocol. The concern is equally clear: black-box AI decisions, explainability gaps, and “AI slop” polluting bug bounty programs. Every major vendor positions AI as “Copilot, not Autopilot.” Augmentative rather than fully autonomous.
BAS Platforms (market projected $2.4B by 2029) continuously test defenses by emulating MITRE ATT&CK tactics. Key players: Pentera ($250M raised, $1B unicorn), Picus Security ($80M, 4,000+ threats library), Cymulate ($141M), and AttackIQ ($79M, MITRE’s founding partner). BAS validates whether controls detect threats but runs scripted attacks rather than discovering new vulnerabilities.
PTaaS (Penetration Testing as a Service, market projected $301M by 2029) combines automated scanning with human ethical hackers. Unlike traditional pentesting (3-4 week setup, annual cadence, PDF reports), PTaaS launches in 24-72 hours with real-time dashboards. Key players: Cobalt, Synack (FedRAMP authorized), HackerOne (2M+ ethical hackers), and NetSPI.
Manual Consultants remain the gold standard for deep assessments but don't scale. Big Four firms dominate enterprise security audits at $2,000-4,000 per person-day, delivering quality but not frequency.
Cyber Reasoning Systems (CRS) represent the holy grail: fully autonomous systems that find vulnerabilities, generate exploits, and patch binaries without human intervention. The concept emerged from DARPA’s Cyber Grand Challenge (2016). ForAllSecure’s Mayhem won, later securing a $45M DoD contract before Bugcrowd acquired it in 2025. The successor AIxCC (2025) showed progress: 86% of synthetic vulnerabilities identified, 18 real zero-days discovered. But here’s the reality: CRS remains largely research and POC, not productized. No commercial product delivers true end-to-end autonomous vulnerability research at scale. This is exactly the space FuzzForge aims to bridge, not by promising magic, but by coordinating proven techniques into practical workflows.
Every approach makes tradeoffs.
Nobody has solved this. Neither have we. But we’ve chosen different tradeoffs based on what security teams actually need.
So how do we address these tensions? The core problem is that the tools exist but the orchestration doesn't, and that is exactly what FuzzForge is built to solve.
The future of security automation isn’t building the best SAST engine or most autonomous AI pentester. It’s coordinating heterogeneous tools into intelligent, auditable workflows that adapt to each organization’s needs.
We don’t believe in magic. We believe in composable, auditable, scalable workflows.
We don’t reinvent Semgrep or AFL. We chain them together.
Each module is an isolated container with standardized inputs/outputs. Users integrate their preferred SAST scanners (Semgrep, CodeQL), fuzzers (AFL, LibFuzzer), dynamic analyzers, and custom tools. The intelligence is in the orchestration: “If Semgrep finds buffer overflow in user_input.c, launch 48-hour AFL campaign on that code path.”
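As a sketch of what such an orchestration rule might look like (the module names, finding fields, and escalation logic here are hypothetical illustrations, not FuzzForge's actual API):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    """A normalized finding from any module's standardized output."""
    tool: str
    rule: str
    file: str

def next_steps(finding: Finding) -> list[dict]:
    """Map a SAST finding to follow-up campaigns (illustrative rule)."""
    steps = []
    # If Semgrep flags a buffer overflow, escalate to a targeted fuzzing campaign.
    if finding.tool == "semgrep" and "buffer-overflow" in finding.rule:
        steps.append({
            "module": "afl",            # hypothetical fuzzing module name
            "target": finding.file,     # focus the campaign on the flagged path
            "duration_hours": 48,
        })
    return steps

plan = next_steps(Finding(tool="semgrep", rule="c.buffer-overflow", file="user_input.c"))
print(plan[0]["module"], plan[0]["duration_hours"])  # afl 48
```

The point is that the escalation logic lives in the workflow, not inside any single tool.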
Example workflow: Upload a firmware image → Binwalk extracts filesystem → Ghidra/angr decompiles binaries → Semgrep scans extracted code → AFL fuzzes network services for 72h → QEMU emulation validates crashes → unified vulnerability report with CVE mapping and PoCs. Same orchestration applies to APKs, iOS apps, source repos, or CI/CD pipelines.
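A workflow like this can be modeled as an ordered list of stages, each consuming the previous stage's standardized output. The sketch below is for shape only; the stage schema and module execution are stubbed, not FuzzForge's real definition format:

```python
# Declarative sketch of the firmware workflow: each stage names the
# containerized module to run, what it consumes, and what it produces.
FIRMWARE_WORKFLOW = [
    {"module": "binwalk",  "consumes": "firmware_image",  "produces": "filesystem"},
    {"module": "ghidra",   "consumes": "filesystem",      "produces": "decompiled_code"},
    {"module": "semgrep",  "consumes": "decompiled_code", "produces": "sast_findings"},
    {"module": "afl",      "consumes": "filesystem",      "produces": "crashes"},
    {"module": "qemu",     "consumes": "crashes",         "produces": "validated_crashes"},
    {"module": "reporter", "consumes": "validated_crashes", "produces": "report"},
]

def run(workflow, artifacts):
    """Execute stages in order, threading artifacts between them (stubbed)."""
    for stage in workflow:
        inputs = artifacts.get(stage["consumes"])
        # In the real system each module runs in its own container;
        # here we just record that the stage produced its artifact.
        artifacts[stage["produces"]] = f"{stage['module']}({inputs})"
    return artifacts

result = run(FIRMWARE_WORKFLOW, {"firmware_image": "fw.bin"})
print("report" in result)  # True
```

Swapping the stage list is all it takes to retarget the same engine at APKs, iOS apps, or source repos.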
FuzzForge supports multiple usage modes.
Tradeoff: Orchestration complexity vs. flexibility. We avoid NIH syndrome and vendor lock-in but require more setup than single-tool solutions.
Every finding includes full lineage: which tool, what configuration, which test case, when, with what evidence. Temporal-based orchestration ensures reproducibility. Same workflow + same code = identical results.
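A minimal sketch of such a lineage record (the field names are ours, not FuzzForge's schema). The deterministic fingerprint excludes the timestamp, so two runs of the same workflow on the same code yield the same ID:

```python
import hashlib
import json
from dataclasses import dataclass, asdict

@dataclass(frozen=True)
class Lineage:
    """Full provenance for one finding."""
    tool: str
    tool_version: str
    config: str       # exact configuration used
    test_case: str    # input that triggered the finding
    timestamp: str    # when it ran
    evidence: str     # e.g. crash log or matched pattern

def fingerprint(lineage: Lineage) -> str:
    """Deterministic hash over everything except the timestamp,
    so identical workflow + identical code yields an identical ID."""
    payload = {k: v for k, v in asdict(lineage).items() if k != "timestamp"}
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]

a = Lineage("semgrep", "1.x", "rules=c.buffer-overflow", "user_input.c:42", "t1", "match")
b = Lineage("semgrep", "1.x", "rules=c.buffer-overflow", "user_input.c:42", "t2", "match")
print(fingerprint(a) == fingerprint(b))  # True: reproducible despite different timestamps
```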
Contrast this with black-box AI tools where “AI found 15 vulnerabilities” tells you nothing about confidence, false positive likelihood, or how to verify.
Tradeoff: Verbosity vs. transparency. Our logs are detailed, sometimes too detailed. In security, post-incident analysis requires complete information.
We use AI strategically through specialized agents powered by RAG (Retrieval-Augmented Generation), grounding decisions in domain-specific knowledge (CVEs, vulnerability patterns, codebase context) to significantly reduce hallucinations, though we implement additional validation layers for critical security decisions. Details on our agentic architecture in a future article.
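The RAG pattern itself is simple to sketch: retrieve the most relevant domain documents, then ground the model's prompt in them. The word-overlap scoring and tiny corpus below are toy stand-ins for a real embedding model and vector store:

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query.
    A real system would use embeddings and a vector store."""
    q = set(query.lower().split())
    scored = sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def grounded_prompt(query: str, corpus: list[str]) -> str:
    """Build a prompt whose context is restricted to retrieved documents,
    so the model answers from evidence rather than free association."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context."

corpus = [
    "CVE-2021-1234: heap buffer overflow in libfoo image parser",  # made-up CVE entry
    "Pattern: unchecked strcpy into fixed-size stack buffer",
    "Project uses libfoo 1.2 for image decoding",
]
print(grounded_prompt("buffer overflow in libfoo", corpus))
```

Constraining the model to retrieved context is what keeps its output checkable, which is the property the validation layers build on.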
AI agents run in two execution modes depending on context and risk: autonomous where it is safe, supervised where decisions are critical.
Deployment flexibility: Cloud APIs (OpenAI, Anthropic), self-hosted (Ollama, vLLM), or fully on-premise with fine-tuned SLMs or even TRMs for air-gapped environments.
The difference from black-box AI tools: full explainability. Every agent decision shows the model used, retrieved context, reasoning chain, and confidence scores. You can replay any workflow to audit exactly what happened.
Tradeoff: Less “fully autonomous” than some promise, more autonomous than pure advisory tools. We’re in the middle: automated where safe, supervised where critical.
We run untrusted code: malicious samples, vulnerable apps, proof-of-concept exploits. Each runs in isolated containers. A crashing fuzzer doesn’t affect other workflows. Malware in dynamic analysis can’t escape to infrastructure.
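As an illustration of that isolation posture, a module container can be launched with no network, dropped capabilities, and a read-only filesystem. The flags below are standard Docker options; the image and mount names are hypothetical, and we only build the argument list here:

```python
def sandbox_cmd(image: str, workdir: str) -> list[str]:
    """Build a locked-down `docker run` invocation for an untrusted module.
    Nothing is executed; this just constructs the argument list."""
    return [
        "docker", "run",
        "--rm",                                  # discard the container afterwards
        "--network", "none",                     # no network: samples can't phone home
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--read-only",                           # immutable root filesystem
        "--memory", "2g", "--cpus", "2",         # resource isolation between workflows
        "-v", f"{workdir}:/work:ro",             # inputs mounted read-only
        image,
    ]

cmd = sandbox_cmd("fuzzforge/afl-module", "/tmp/job-42")  # hypothetical image name
print("--network" in cmd and "none" in cmd)  # True
```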
From 1 scan to 1000 parallel fuzzing campaigns. Quick SAST (minutes) to long fuzzing (weeks). Resource isolation prevents interference.
Tradeoff: Infrastructure overhead vs. security guarantees. Containers cost more than shared execution, but cross-contamination is unacceptable.
In short: FuzzForge orchestrates multi-technique security workflows with full transparency, modularity, and AI assistance.
FuzzForge is: a workflow orchestrator that chains existing security tools (SAST, fuzzing, dynamic analysis, custom modules) into transparent, reproducible, auditable pipelines.

FuzzForge is not: a new SAST engine, a standalone fuzzer, or a black-box autonomous AI pentester.
Specialization is advancing, but integration is lagging.
Organizations deploy SonarQube for code quality + Snyk for dependencies + Semgrep for SAST + Pentera for BAS + CAI for offensive testing. Yet they lack unified orchestration beyond ASPM dashboards that aggregate findings into a single view. ASPM solves “too many dashboards” but doesn’t orchestrate workflows.
What's missing is an orchestration layer that coordinates these tools into end-to-end workflows rather than merely aggregating their findings. The gap is clear: tools exist, orchestration doesn't. FuzzForge fills that gap.
Workflow-first, not tool-first: Most platforms build around one technique (SAST, AI pentesting) then expand. We started with orchestration and integrate whatever makes sense. An IoT workflow might combine Binwalk (firmware extraction), Ghidra (decompilation), Semgrep (code analysis), AFL (fuzzing), Frida (runtime), and custom protocol analyzers. No single-tool platform covers this.
Bring-your-own-tools AND bring-your-own-AI: Want CodeQL instead of Semgrep? Both in different workflows? Custom SAST for proprietary languages? All supported. Same for AI: use OpenAI APIs, self-host Ollama, or run fine-tuned SLMs on-premise. Contrast with Checkmarx/Veracode (closed ecosystems), Tenzai (cloud-only, unclear integration), or Snyk (optimized for their stack).
On-premise & air-gapped deployment: FuzzForge runs entirely on-premise with support for both local LLMs (Ollama, vLLM, fine-tuned SLMs) and external APIs. For air-gapped environments, organizations can deploy with local LLMs only, ensuring no code leaves their infrastructure.
Auditable AI: When agents run autonomously, you see the full reasoning chain, not just “AI found vulnerabilities.” You can replay any workflow to debug or audit decisions.
Multi-technique integration: Security isn’t SAST or fuzzing. It’s SAST and fuzzing and dynamic analysis. We integrate all three. Example: SAST finds buffer overflow → dynamic confirms it’s reachable → fuzzing generates 10K payloads → runtime monitoring confirms execution. This mirrors how human researchers work.
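The chain above reads as a confirmation funnel: each technique narrows the previous one's output so only validated findings reach the report. A stubbed sketch, where every stage's logic is fake and exists only to show the shape:

```python
def sast(code):            # static analysis: cheap, broad, noisy (stub)
    return [{"id": "buf-overflow", "file": "user_input.c"}]

def reachable(finding):    # dynamic analysis: is the flagged path reachable? (stub)
    return finding["file"] == "user_input.c"

def fuzz(finding, n=10_000):  # fuzzing: hammer the reachable path (stub: 2 "crashes")
    return [f"payload-{i}" for i in range(n) if i % 5_000 == 0]

def confirmed(finding, crashes):  # runtime monitoring: did execution prove it? (stub)
    return len(crashes) > 0

report = []
for f in sast("repo/"):
    if reachable(f):
        crashes = fuzz(f)
        if confirmed(f, crashes):
            report.append({**f, "evidence": crashes})
print(len(report))  # 1
```

Each stage attaches its evidence to the finding, which is what makes the final report verifiable rather than a bare list of claims.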
Flexible execution modes: Quick SAST on commits (minutes), deep fuzzing campaigns (days/weeks), one-shot firmware analysis, or autonomous AI agents running 24/7. Upload a binary, point to a repo, or integrate into CI/CD. Same powerful workflows, different entry points.
We’re not claiming superiority everywhere. SAST tools have faster cold starts and simpler setup. DevSecOps platforms excel at compliance reporting. AI pentesting tools offer more autonomous creative exploitation. BAS platforms focus on control validation with executive-friendly reporting. Manual consultants bring business context and social engineering skills we can’t replicate.
We optimize for: reproducibility, auditability, flexibility, multi-technique integration, scale.
FuzzForge is for anyone who wants to find vulnerabilities efficiently, from development teams that need security validation to security experts who need to scale their time.
Not for: Compliance-checkbox-only needs (use DevSecOps platforms), teams expecting fully autonomous pentesting with zero configuration or oversight.
Security automation isn’t about finding the perfect tool. It’s about orchestrating the right tools for each situation.
This article covered the why (the security accessibility problem) and the what (our workflow orchestration approach). Next we'll detail the how: workflow orchestration with Temporal, sandboxing untrusted code, integrating AI without black boxes, and scaling infrastructure.
We started development two years ago, alongside the DARPA challenge (without being able to participate, since we're French and not US-based). We are working toward six demos by February 2025 (IoT, Android, iOS, Rust/Go, open-source), built by a team of ten dedicated engineers. We're not claiming victory. We're sharing our journey, reasoning, and tradeoffs.
The security automation landscape is evolving rapidly. We believe there’s space for an approach that prioritizes transparency, flexibility, and orchestration over black-box magic.
Feedback welcome. Interested in collaborating or contributing modules? Reach out.
FuzzForge Team
Founded in 2021 and headquartered in Paris, FuzzingLabs is a cybersecurity startup specializing in vulnerability research, fuzzing, and blockchain security. We combine cutting-edge research with hands-on expertise to secure some of the most critical components in the blockchain ecosystem.
Contact us for an audit or long-term partnership!