Low-cost drones have changed the threat landscape faster than defenses can keep up. Counter-UAS, the ability to detect, track, and respond to unauthorized drones, is becoming essential for protecting critical infrastructure, securing airspace around events, and supporting military operations. The systems that do this well need to fuse perception, coordination, and real-time decision-making in ways that don't fall apart at the seams.
These systems operate in highly dynamic environments and require a high degree of assurance, so there is a pressing need to identify their failure cases, such as missed detections or insufficient area coverage. Most existing work, however, evaluates perception, tracking, or planning components in isolation, while in real deployments the dominant failures arise from their interaction. I built this system to understand where counter-UAS autonomy actually breaks in practice. I wanted a setup where I could rapidly prototype a full counter-UAS pipeline, observe its behavior end-to-end, and diagnose failure modes as they emerge from the coupling between perception, motion, and multi-agent coordination.
Counter-UAS is a systems problem
Simulating multiple agents, running detection and localization in the loop, and tracking shared state quickly introduces integration overhead. When iteration time is limited, progress is constrained less by individual algorithms and more by how easily the system can be composed, observed, and adjusted. I needed a unified way to prototype across simulation, perception, coordination, and visualization.
After considering a few options, I decided to build a multi-drone detection and localization system using GRID. With GRID I could start building locally, scale to the cloud, drop in cutting-edge detection models, and run fully asynchronous multi-agent control with ease.
Building a multi-drone detection and localization system
At a high level, the system looks like this:
- AirGen for multi-drone simulation
- Out-of-the-box vision detection models
- Thread-safe multi-agent control for patrol and intrusion behaviors
- Flask + OpenLayers for a live geospatial dashboard
- Rerun for internal 3D and perception debugging
All agents share a single simulation world but operate independently, each running detection and localization and reporting its results to a shared map.
I started by setting up a geographically anchored AirGen environment with multiple drones:
- Defender drones on patrol
- A single intruder drone entering the airspace
Each drone is spawned with a known offset, allowing all local coordinates to be mapped into a shared NED frame, and later into longitude/latitude for visualization.
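As a rough illustration of the frame bookkeeping, here is a minimal sketch of the offset and coordinate conversions involved. The anchor coordinates, spawn offsets, and helper names are placeholders rather than the actual project code, and the NED-to-lat/lon step uses a simple flat-earth approximation that is fine over a few kilometers.

```python
import math

# Geographic anchor of the simulation origin (illustrative values).
ANCHOR_LAT, ANCHOR_LON = 47.6205, -122.3493
EARTH_RADIUS_M = 6_378_137.0

# Known spawn offsets of each drone in the shared NED frame (meters).
SPAWN_OFFSETS_NED = {
    "defender_1": (0.0, 0.0, 0.0),
    "defender_2": (40.0, -30.0, 0.0),
    "intruder":   (120.0, 80.0, 0.0),
}

def local_to_shared_ned(drone_id, local_ned):
    """Shift a drone-local NED position by its spawn offset into the shared frame."""
    on, oe, od = SPAWN_OFFSETS_NED[drone_id]
    return (local_ned[0] + on, local_ned[1] + oe, local_ned[2] + od)

def ned_to_latlon(north_m, east_m):
    """Flat-earth approximation around the anchor, accurate enough for visualization."""
    dlat = math.degrees(north_m / EARTH_RADIUS_M)
    dlon = math.degrees(east_m / (EARTH_RADIUS_M * math.cos(math.radians(ANCHOR_LAT))))
    return ANCHOR_LAT + dlat, ANCHOR_LON + dlon

# Example: a defender reports a local position; map it into the shared frame and onto the map.
shared = local_to_shared_ned("defender_2", (10.0, 5.0, -25.0))
lat, lon = ned_to_latlon(shared[0], shared[1])
```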
With the simulation in place, I defined meaningful multi-agent behavior. Each defender drone was assigned an independent patrol pattern designed to maximize airspace coverage while avoiding overlap with other agents. GRID made it straightforward to express these behaviors through APIs, which meant I could iterate on patrol logic and coordination patterns without getting bogged down in low-level controller details.
The patrol followed a pentagon-shaped trajectory, with each drone starting at a different phase offset. This ensured spatial diversity while keeping the behavior simple and interpretable. Importantly, patrol and perception were not treated as separate phases. Detection ran continuously while each drone was in motion, reflecting how a real-world system would operate. Being able to wire this up quickly in Python made it easy to experiment with different patrol geometries and immediately see their impact on coverage.
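The patrol geometry itself is only a few lines of Python. Below is a minimal sketch of how pentagon waypoints with per-drone phase offsets can be generated; the radius, altitude, and drone names are illustrative, and the actual waypoint-following goes through GRID's control APIs rather than being shown here.

```python
import math

def pentagon_waypoints(center_ne, radius_m, altitude_m, phase_offset):
    """Five vertices of a pentagon patrol loop in the shared NED frame.

    phase_offset rotates the whole pattern so each drone starts at a different
    angle, spreading coverage across the airspace.
    """
    cn, ce = center_ne
    waypoints = []
    for k in range(5):
        theta = phase_offset + 2.0 * math.pi * k / 5.0
        north = cn + radius_m * math.cos(theta)
        east = ce + radius_m * math.sin(theta)
        waypoints.append((north, east, -altitude_m))  # NED: negative down = positive altitude
    return waypoints

# Three defenders share the same geometry but start a third of a revolution apart.
patrols = {
    f"defender_{i + 1}": pentagon_waypoints((0.0, 0.0), 60.0, 30.0, i * 2.0 * math.pi / 3.0)
    for i in range(3)
}
```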
From a systems perspective, each drone behaved autonomously, but all agents shared access to the same simulation world. Thread-safe coordination ensured consistent state updates without sacrificing parallelism.
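Concretely, that coordination boils down to a lock-protected shared state object that every drone thread writes into and the dashboard reads from. The sketch below is an assumed structure, not the exact one in the project:

```python
import threading
from dataclasses import dataclass, field

@dataclass
class SharedAirspaceState:
    """State shared across all agent threads; every access goes through the lock."""
    _lock: threading.Lock = field(default_factory=threading.Lock)
    drone_poses: dict = field(default_factory=dict)         # drone_id -> (n, e, d, yaw)
    intruder_estimates: list = field(default_factory=list)  # (n, e, d, confidence)

    def update_pose(self, drone_id, pose):
        with self._lock:
            self.drone_poses[drone_id] = pose

    def report_intruder(self, estimate):
        with self._lock:
            self.intruder_estimates.append(estimate)

    def snapshot(self):
        # Copy under the lock so readers never see a half-updated map.
        with self._lock:
            return dict(self.drone_poses), list(self.intruder_estimates)

state = SharedAirspaceState()

def patrol_loop(drone_id):
    # Each drone runs in its own thread: fly to the next waypoint, run detection,
    # localize, then publish results into the shared state.
    ...

threads = [
    threading.Thread(target=patrol_loop, args=(d,), daemon=True)
    for d in ("defender_1", "defender_2")
]
for t in threads:
    t.start()
```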
Running vision-based detection with GRID
Once the drones were patrolling, perception became the central focus. Instead of training or integrating custom models, I used GRID to access out-of-the-box object detection models and run them directly on the simulated camera feeds.
Open-vocabulary detection allowed the system to search for intruding drones using natural language prompts, making the perception layer flexible and easy to adapt. This removed the typical overhead of dataset preparation, model selection, and deployment.
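Conceptually, the perception step reduces to prompting an open-vocabulary detector on each camera frame. The `detect_objects` wrapper below is a hypothetical stand-in for the actual GRID model call, not its real API; the prompt text and confidence threshold are the parts worth iterating on.

```python
# Hypothetical wrapper around an open-vocabulary detector; the function name and
# return schema are assumptions, not GRID's actual interface.
def detect_objects(rgb_image, prompt: str, confidence_threshold: float = 0.4):
    """Return a list of dicts: {'box': (x1, y1, x2, y2), 'score': float, 'label': str}."""
    raise NotImplementedError("bind this to your detection backend")

def find_intruder_candidate(rgb_image):
    # Natural-language prompt instead of a fixed class list: easy to adapt per scenario.
    detections = detect_objects(rgb_image, prompt="a small quadcopter drone in the sky")
    # Keep only the single highest-confidence hit; localization consumes it downstream.
    return max(detections, key=lambda d: d["score"], default=None)
```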
By treating perception as an on-demand capability rather than a pipeline to maintain, I could focus entirely on how detections were consumed by the system. GRID effectively turned complex vision models into modular building blocks, enabling rapid iteration on higher-level behavior without touching model infrastructure. GRID's optimized inference across multiple simultaneous drone feeds was another crucial factor in picking its ready-to-use models.
Estimating 3D positions from visual detections
Raw detections are not actionable unless they are grounded in space. To make detections useful for counter-UAS, each 2D bounding box had to be converted into a 3D position estimate.

For every detection, the system combined camera geometry, estimated depth, and the observing drone's pose to infer the intruder's location in the shared world frame. Only the highest-confidence detection was used at each step to keep the localization signal stable and interpretable. In the first iteration of the solution, I used all detected drones, which led to friendly drones being identified as intruders. In the second iteration, I filtered out, after localization, any predicted intruder positions that coincided with known friendly drone positions. This filtering step made intruder localization noticeably more reliable.
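For reference, here is a rough sketch of the localization math under standard pinhole-camera assumptions, plus the friendly-rejection filter. The forward-facing, gimbal-free camera assumption and the fixed rejection radius are simplifications of what the real pipeline does.

```python
import numpy as np

def pixel_to_world(box, depth_m, cam_intrinsics, drone_pos_ned, drone_yaw_rad):
    """Back-project the box center to a 3D point in the shared NED frame.

    Assumes a forward-facing camera rigidly aligned with the drone body,
    which keeps the rotation to a single yaw term.
    """
    fx, fy, cx, cy = cam_intrinsics
    u = (box[0] + box[2]) / 2.0   # box = (x1, y1, x2, y2) in pixels
    v = (box[1] + box[3]) / 2.0

    # Camera (optical) frame: x right, y down, z forward.
    x_cam = (u - cx) / fx * depth_m
    y_cam = (v - cy) / fy * depth_m
    z_cam = depth_m

    # Camera -> body frame (x forward, y right, z down) for a forward-facing camera.
    body = np.array([z_cam, x_cam, y_cam])

    # Body -> shared NED via yaw rotation, then translate by the drone's position.
    c, s = np.cos(drone_yaw_rad), np.sin(drone_yaw_rad)
    rot_yaw = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return rot_yaw @ body + np.asarray(drone_pos_ned)

def is_friendly(candidate_ned, friendly_positions_ned, radius_m=5.0):
    """Reject estimates that coincide with a known defender's position."""
    return any(np.linalg.norm(candidate_ned - np.asarray(p)) < radius_m
               for p in friendly_positions_ned)
```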
Visualizing live geospatial system behavior for debugging
To make the system observable and debuggable, all agent states and localization estimates were exposed through a lightweight web interface using Flask. The geo-anchored map view was created using OpenLayers.
Each drone continuously published its position and orientation, while localization results were streamed as georeferenced points with associated confidence and depth metadata. These signals were rendered on a live map, providing an operational view of patrol behavior and intruder tracking.
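On the dashboard side, the endpoint is plain Flask serving GeoJSON that OpenLayers can render directly. A stripped-down version is shown below, assuming the shared state object and NED-to-lat/lon helper sketched earlier:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/state")
def airspace_state():
    # `state`, snapshot(), and ned_to_latlon() come from the earlier sketches.
    poses, estimates = state.snapshot()
    features = []
    for drone_id, (n, e, d, yaw) in poses.items():
        lat, lon = ned_to_latlon(n, e)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"id": drone_id, "role": "defender", "yaw": yaw, "alt_m": -d},
        })
    for n, e, d, conf in estimates:
        lat, lon = ned_to_latlon(n, e)
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"role": "intruder_estimate", "confidence": conf, "alt_m": -d},
        })
    return jsonify({"type": "FeatureCollection", "features": features})
```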
Observing the system in action, with patrols dispersing across the airspace, detections triggering in real time, and localization estimates converging on the intruder, turned debugging into iterative refinement rather than blind troubleshooting. That observability, combined with how readily the pieces composed, is what made this project tractable. GRID provided simulation, state-of-the-art detection, and asynchronous multi-agent control as modular building blocks rather than separate systems to integrate and maintain. The result was more time spent on patrol geometry and localization logic, less on infrastructure. For counter-UAS, where failures emerge from interactions between components rather than any single subsystem, that composability is essential.