Why "cognitive" radio at all
5G already uses massive MIMO and beamforming. 6G goes further: dense heterogeneous deployments, terahertz bands, ultra-dynamic spectrum sharing, integrated sensing and communication, and orders of magnitude more devices per square kilometre. A statically configured radio cannot keep up. The next-generation radio learns: it senses the spectrum, predicts primary-user activity, and allocates dynamically.
That makes it a POMDP: a partially observable Markov decision process. The radio observes spectrum slices, not the underlying occupancy state. Decisions are sequential. Rewards are delayed. Deep reinforcement learning is the natural tool.
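To make the framing concrete, here is a minimal toy loop, assuming a two-state Markov chain for primary-user occupancy. The parameters (`N_CHANNELS`, `P_ON`, `P_OFF`, `NOISE_STD`) are illustrative choices, not drawn from any paper.

```python
# Toy POMDP sketch: primary-user occupancy is the hidden state;
# the radio only ever sees a noisy per-channel energy reading.
import numpy as np

rng = np.random.default_rng(0)
N_CHANNELS = 8
P_ON, P_OFF = 0.2, 0.3      # hypothetical PU on/off transition probabilities
NOISE_STD = 0.4             # sensing noise on the observed spectrum slice

occupancy = rng.random(N_CHANNELS) < 0.5   # latent state, never observed

def step(action):
    """Transmit on channel `action`; return (observation, reward)."""
    global occupancy
    # The latent state evolves regardless of what the radio believes.
    turn_on  = (~occupancy) & (rng.random(N_CHANNELS) < P_ON)
    turn_off = occupancy & (rng.random(N_CHANNELS) < P_OFF)
    occupancy = (occupancy | turn_on) & ~turn_off
    # Reward: throughput on a free channel, penalty for a PU collision.
    reward = -1.0 if occupancy[action] else 1.0
    # Observation: noisy energy per channel, not the true occupancy.
    obs = occupancy.astype(float) + rng.normal(0.0, NOISE_STD, N_CHANNELS)
    return obs, reward

obs, r = step(action=3)   # the agent must infer occupancy from the obs history
```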
Where security gets weird
Securing a static radio means securing the protocol stack: authentication, key exchange, ciphers, integrity. That is standard cryptographic work: PQ-EDHOC, AES-GCM, the rest.
Securing a learning radio means all of that plus securing the policy. Adversarial inputs to the spectrum sensor can poison the policy gradient. Reward shaping can be gamed. The agent's exploration policy can leak information about secondary-user identity. The attack surface is no longer just the protocol stack: it is the controller's induction.
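To see what an attack on the sensing channel looks like, here is an FGSM-style evasion sketch against a toy linear policy. Everything here is hypothetical (the weights `W`, the perturbation budget `eps`); training-time poisoning abuses the same input channel, just accumulated over many gradient steps.

```python
# Illustrative evasion on the sensing input against a toy linear policy.
import numpy as np

rng = np.random.default_rng(1)
N = 8
W = rng.normal(size=(N, N))          # policy logits = W @ obs (toy model)

def act(obs):
    return int(np.argmax(W @ obs))   # greedy channel choice

obs = rng.normal(size=N)             # a clean spectrum observation
clean_action = act(obs)

# For a linear policy, the gradient of (logit_target - logit_clean)
# w.r.t. the observation is just W[target] - W[clean_action], so the
# FGSM step toward a chosen (occupied) channel is its sign times eps.
target = (clean_action + 1) % N
eps = 0.3                            # small perturbation budget
delta = eps * np.sign(W[target] - W[clean_action])
adv_action = act(obs + delta)
print(clean_action, adv_action)   # a small input shift can flip the decision
```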
The energy-aware reward insight
The default reward in cognitive radio is spectrum efficiency: bits per Hz, throughput, fairness. Energy comes in as a constraint, not an objective. That works for plug-in deployments but not for battery- or solar-powered 6G IoT, which is a large fraction of the projected device population.
The 2025 Bentham CYBPRO paper treats energy as a first-class component of the reward, with an explicit Pareto coefficient the controller can adjust at runtime to slide along the efficiency-energy trade-off. Result: 25 to 30 percent energy savings at 0.93 sensing AUC, where the equivalent fixed-allocation baseline either burns more power or misses primary-user activity.
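A minimal sketch of what such a reward could look like, assuming the simplest scalarisation: a weighted sum with a runtime-tunable coefficient. The names (`lam`, `throughput_bps_hz`, `energy_j`) are my own; the paper's actual formulation may be richer.

```python
def reward(throughput_bps_hz, energy_j, collided,
           lam=0.5, collision_penalty=1.0):
    """Scalarised two-objective reward for an energy-aware agent.

    lam = 1.0 recovers the classic spectrum-efficiency-only objective;
    lam = 0.0 optimises energy alone. Adjusting lam at runtime moves the
    operating point along the efficiency-energy Pareto front without
    retraining from scratch.
    """
    if collided:                     # hitting a primary user is always bad
        return -collision_penalty
    return lam * throughput_bps_hz - (1.0 - lam) * energy_j
```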
The POMDP through-line to the rest of my work
The exploration-vs-exploitation tradeoff under partial observability is the same problem that shows up in my IEEE ICEACE 2024 sole-author paper on Compressed Suffix Memory. Suffix-tree-based instance methods give you a state abstraction that scales to non-trivial POMDPs without the exponential state explosion of vanilla USM (Utile Suffix Memory), and Boltzmann sampling beats epsilon-greedy when Q-values are themselves noisy estimates over latent states. See the CSM research deep-dive.
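A quick sketch of the action-selection contrast, with an illustrative temperature `tau`: epsilon-greedy commits hard to whichever arm a noisy estimate happens to rank first, while Boltzmann sampling spreads trials in proportion to estimated value, so a single bad sample costs less.

```python
import numpy as np

rng = np.random.default_rng(2)

def boltzmann(q, tau=0.5):
    z = (q - q.max()) / tau          # subtract max for numerical stability
    p = np.exp(z)
    p /= p.sum()
    return int(rng.choice(len(q), p=p))

def eps_greedy(q, eps=0.1):
    if rng.random() < eps:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))

# Two near-tied channels under noisy Q-estimates: the greedy rule locks
# onto whichever one the noise favours; Boltzmann keeps sampling both.
q_noisy = np.array([1.00, 0.98, 0.10, 0.05]) + rng.normal(0, 0.05, 4)
print([boltzmann(q_noisy) for _ in range(10)])
print([eps_greedy(q_noisy) for _ in range(10)])
```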
The same instinct shows up in the energy-aware cognitive radio paper. Same POMDP playbook, different payoff structure.
What is next
- Adversarial-robustness benchmark for the policy itself, not just the protocol stack.
- Hybrid PQC-protected telemetry feeding the controller, so the input channel is at least cryptographically protected even when the policy is not.
- Federated training across edge nodes, the IoT angle that ties this to IoT PQ-EDHOC and the smart-meter PQC stack.