
Note · 2026-02-14

How cognitive radio networks are solving the spectrum crisis

Static spectrum allocation broke a decade ago. The fix is not more spectrum, it is a radio that learns when to talk and when to listen.

The wireless spectrum is not running out. Most licensed bands sit empty most of the time. The problem is that the device with the licence is not the device that needs the bandwidth. Cognitive radio fixes this with three moves: sense who is using the spectrum, predict who will use it next, and allocate dynamically. Deep RL handles the prediction and allocation under partial observability, the technical term for "the radio cannot see everything". The 6G use case is where this stops being academic.

The crisis is not what you think

The headline framing is a spectrum shortage. The reality is a spectrum allocation problem. FCC measurement campaigns from the early 2010s already showed that most licensed bands sit empty 70 to 90 percent of the time depending on geography and time of day. The frequencies are not used up. They are reserved for someone who is not using them right now.

Cognitive radio is the proposal that you let other devices use those frequencies opportunistically, as long as they yield instantly when the licence-holder needs them. Two roles. The primary user owns the licence. The secondary user borrows the bandwidth, with no right to interfere.

Why static allocation broke

Three pressures broke the old model. First, IoT device count exploded. Tens of billions of devices need slivers of bandwidth, briefly, often. Static allocation cannot scale to that. Second, video and AR/VR pushed average bandwidth per session into the tens of megabits. Cellular networks ran out of contiguous 5G spectrum almost immediately on launch. Third, machine-to-machine traffic patterns are nothing like human-to-internet patterns. Bursty, location-correlated, latency-sensitive in unpredictable ways.

What cognitive radio actually does

Three steps in a loop:

  • Sense: the secondary user listens to the spectrum and classifies which slices are in use right now.
  • Predict: given the sensed state and history, estimate which slices will stay free for the next few seconds.
  • Allocate: transmit on the predicted-free slices, monitor for primary-user return, and yield instantly if detected.

The interesting part is that the agent never sees the full spectrum state. It only sees what its sensor can detect, which is a narrow slice with noise and missed detections. Formally this is a POMDP, a partially-observable Markov decision process.
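The loop above can be sketched in a few lines. Everything here is illustrative: the slice count, the energy-detector threshold, and the Gaussian noise model are assumptions for the sketch, not parameters from any deployed system, and the naive threshold predictor stands in for the learned policy.

```python
import random

random.seed(42)        # reproducible demo
N_SLICES = 8
SENSE_THRESHOLD = 0.5  # energy-detector threshold (illustrative value)

def sense(true_occupancy, noise=0.1):
    """Noisy partial observation: the agent sees sensed energy, not ground truth."""
    return [occ + random.gauss(0, noise) for occ in true_occupancy]

def predict_free(observation):
    """Naive predictor: a slice sensed below threshold is assumed free.
    (A real system replaces this with a learned policy.)"""
    return [i for i, energy in enumerate(observation) if energy < SENSE_THRESHOLD]

def allocate(free_slices):
    """Transmit on one predicted-free slice; yield (None) if none look free."""
    return free_slices[0] if free_slices else None

# One tick of the sense -> predict -> allocate loop.
true_occupancy = [1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0]  # primaries on 0, 3, 5
observation = sense(true_occupancy)
chosen = allocate(predict_free(observation))
```

The key detail the sketch preserves: the policy only ever sees `observation`, never `true_occupancy`, which is exactly the partial-observability gap that makes this a POMDP.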

Why deep RL beats classical control here

Classical spectrum-allocation algorithms (auctions, game-theoretic optimisations, water-filling) assume the agent knows the state. POMDPs break those assumptions. Deep RL handles partial observability natively, learns from sensed slices, and can scale to high-dimensional observation spaces (cooperative sensing across multiple secondary users).

The trade-off is sample efficiency. Deep RL needs a lot of training data, which in cognitive radio means a lot of simulated spectrum scenarios. The 2025 Bentham CYBPRO paper I co-authored uses a deep-Q architecture with energy as a first-class component of the reward, achieving 25 to 30 percent energy savings at 0.93 sensing AUC.
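The shape of an energy-aware reward is easy to show, even though the weights below are illustrative placeholders and not the CYBPRO paper's actual values: throughput earns reward, colliding with a primary user costs heavily, and every joule spent is subtracted.

```python
def reward(throughput_mbps, primary_collision, energy_joules,
           collision_penalty=10.0, energy_weight=0.5):
    """Energy as a first-class reward term. The penalty and weight
    are illustrative, not values from the CYBPRO paper."""
    r = throughput_mbps
    if primary_collision:
        r -= collision_penalty  # failing to yield dominates the penalty
    r -= energy_weight * energy_joules
    return r

# Same throughput and energy, with and without a primary-user collision:
clean = reward(5.0, False, 2.0)      # 5.0 - 0.5 * 2.0 = 4.0
collision = reward(5.0, True, 2.0)   # 5.0 - 10.0 - 1.0 = -6.0
```

With energy in the reward rather than bolted on as a constraint, the policy learns to prefer low-power slices directly, which is where the reported 25 to 30 percent savings come from.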

Where 6G makes this real

5G already pushed dynamic spectrum sharing in concept. 6G makes it foundational. Three reasons:

  • Terahertz bands. Massive raw bandwidth, terrible propagation. Static allocation in terahertz is even more wasteful than in sub-6 GHz.
  • Integrated sensing and communication. The radio is also a sensor. The sensing function and the communication function share spectrum.
  • Dense heterogeneous deployments. Picocells, small cells, IoT clusters, drones, satellite backhaul, all in the same area. The static allocation table that worked for 4G macrocells does not work here.

Where the security layer matters

A cognitive radio that learns is also a cognitive radio that can be poisoned. Adversarial inputs to the sensor can mislead the policy. Reward shaping can be gamed by malicious primary-user impersonators. The exploration policy can leak information about secondary-user identity. None of these is theoretical; all are well documented in the adversarial-RL literature.
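The sensor-poisoning case is the simplest to illustrate. In this hypothetical sketch (threshold and energy values are made up), an attacker injects a small amount of energy into a free channel, pushing it over the detection threshold so the secondary user wrongly yields, a denial of service against the borrower rather than the licence-holder.

```python
THRESHOLD = 0.5  # illustrative energy-detector threshold

def detect_busy(sensed_energy, threshold=THRESHOLD):
    """Simple energy detector: above threshold means 'primary user present'."""
    return sensed_energy >= threshold

clean_energy = 0.42   # channel is actually free
perturbation = 0.10   # small adversarial energy injection

without_attack = detect_busy(clean_energy)                  # agent would transmit
with_attack = detect_busy(clean_energy + perturbation)      # agent is tricked into yielding
```

The same logic applies one layer up: a perturbation that is small relative to the observation can still flip a learned policy's action, which is why robustness has to be evaluated against the policy, not just the detector.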

That is why the security layer is part of the design, not an afterthought. See the 6G cognitive radio security topic page for the standing perspective and the research deep-dive for full methodology.

What is next

Three open problems worth watching: federated training across edge nodes (so the cognitive policy improves without centralising training data), adversarial-robustness benchmarks for the policy itself, and PQC-protected telemetry feeding the controller. The third one is where this thread crosses into post-quantum cryptography and IoT PQ-EDHOC.
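The federated-training idea reduces to a familiar aggregation step. This is a minimal federated-averaging sketch under assumed conditions (equal-weight nodes, a flat weight vector standing in for the policy network); the point is that only weights travel to the aggregator, never raw spectrum observations.

```python
def fed_avg(local_weights):
    """Federated averaging: each edge node trains locally, and only
    weight vectors leave the node. Equal node weighting is assumed."""
    n = len(local_weights)
    dim = len(local_weights[0])
    return [sum(w[i] for w in local_weights) / n for i in range(dim)]

# Three edge nodes, each with a locally trained policy weight vector.
node_a = [0.2, 0.8]
node_b = [0.4, 0.6]
node_c = [0.6, 0.4]
global_policy = fed_avg([node_a, node_b, node_c])  # ≈ [0.4, 0.6]
```

In the cognitive-radio setting the appeal is twofold: sensed spectrum data stays on the edge node (privacy, and less telemetry to protect), and each node's policy still benefits from what every other node has seen.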


This article was originally published on Medium. The canonical version lives here.