Safe-Reinforcement Learning for Wireless Resource Allocation

Investigators: Gustavo de Veciana (ECE, UT Austin) and Sanjay Shakkottai (ECE, UT Austin)
Students and Participants:


Support: This project is supported by the National Science Foundation under Award CNS-1910112.

Goal: Next-generation wireless networks are being engineered to meet a complex mix of application requirements, from traditional mobile broadband (e.g., web browsing, video streaming) to emerging applications (e.g., augmented reality, self-driving cars, industrial automation, robotics) with heterogeneous and far more stringent reliability and latency requirements. The ability to support these requirements is central to delivering the new business models and revenue streams that would justify deployments of the new technology. Wireless scheduling and resource allocation will therefore take center stage among the enabling technologies for such networks. Meanwhile, reinforcement learning (RL) using deep networks has emerged as a powerful framework for devising policies that optimize the performance of complex systems, including wireless systems; however, these policies usually come with no formal guarantees. The central thesis of this proposal is that RL-based resource allocation policies without operational guarantees, e.g., throughput optimality/stability, are unlikely to be accepted or deployed; a key requirement for making these techniques usable is therefore to develop approaches that provide safety guarantees. The proposed research will advance the state of the art in safe reinforcement learning, with specific applications to wireless systems, and is expected to benefit other application domains as well as society more broadly through planned efforts in education, innovation, achieving diversity, engaging the community and industry, and disseminating results to a wider public.

This proposal centers on the development and analysis of a safe reinforcement learning (Safe-RL) framework that optimizes rewards over short time scales while also providing theoretically strong long-term throughput-optimality guarantees matching those of state-of-the-art wireless scheduling algorithms. The key underlying observation is that many of today's scheduling algorithms derive their performance guarantees from Lyapunov analysis. We propose the use of guardrails, constraints on the state-dependent actions of Safe-RL, which guarantee that the wireless system's Lyapunov evolution stays within a bounded perturbation of that under classical algorithms. This guarantee, in turn, ensures that Safe-RL inherits the safety/stability properties of state-of-the-art schedulers while leveraging RL to realize complex performance tradeoffs. The proposed research is organized into three inter-related thrusts.
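To make the guardrail idea concrete, the following is a minimal, hypothetical Python sketch (not the project's implementation) of a single-server downlink with several queues, where the quadratic Lyapunov function L(q) = 0.5*||q||^2 underlies MaxWeight's throughput optimality. The RL agent's scheduling action is accepted only when its estimated one-step Lyapunov drift is within a slack epsilon of the drift under the MaxWeight action; otherwise the scheduler falls back to MaxWeight. The names rl_action and epsilon, and the toy traffic model, are illustrative assumptions.

import numpy as np

def lyapunov_drift(q, rates, arrivals, action):
    # One-step drift of L(q) = 0.5*||q||^2 when user `action` is served.
    q_next = q + arrivals
    q_next[action] = max(q_next[action] - rates[action], 0.0)
    return 0.5 * np.sum(q_next ** 2) - 0.5 * np.sum(q ** 2)

def guardrailed_action(q, rates, arrivals_est, rl_action, epsilon=0.0):
    # Accept the RL action only if its drift is within `epsilon` of MaxWeight's;
    # otherwise override with the MaxWeight action (queue length x service rate).
    maxweight_action = int(np.argmax(q * rates))
    drift_mw = lyapunov_drift(q, rates, arrivals_est, maxweight_action)
    drift_rl = lyapunov_drift(q, rates, arrivals_est, rl_action)
    return rl_action if drift_rl <= drift_mw + epsilon else maxweight_action

# Toy usage: 3 users; a stand-in "RL policy" greedily picks the best channel.
rng = np.random.default_rng(0)
q = np.zeros(3)
for t in range(100):
    rates = rng.uniform(0.5, 2.0, size=3)              # per-slot service rates
    arrivals = rng.poisson(0.4, size=3).astype(float)  # per-slot arrivals
    rl_action = int(np.argmax(rates))                  # placeholder for a learned policy
    a = guardrailed_action(q, rates, arrivals, rl_action, epsilon=1.0)
    q = q + arrivals
    q[a] = max(q[a] - rates[a], 0.0)
print("final queue lengths:", q)

The slack epsilon controls the bounded perturbation: epsilon = 0 recovers MaxWeight exactly, while larger values give the learned policy more freedom to trade short-term reward against drift.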

Broader impact and activities


Publications to date

Job Dispatching Policies for Queueing Systems with Unknown Service Rates,
Tuhinangshu Choudhury, Gauri Joshi, Weina Wang, and Sanjay Shakkottai. In MobiHoc '21: Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, July 2021.


MmWave Codebook Selection in Rapidly-Varying Channels via Multinomial Thompson Sampling,
Yi Zhang, Soumya Basu, Sanjay Shakkottai, and Robert Heath. In MobiHoc '21: Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, July 2021.


Online Learning for Hierarchical Scheduling to Support Network Slicing in Cellular Networks,
J. Song, G. de Veciana, and S. Shakkottai. To appear in Performance Evaluation, pages 1-22, December 2021.


Meta-scheduling for the wireless downlink through learning with bandit feedback,
J. Song, G. de Veciana, and S. Shakkottai. Accepted to IEEE/ACM Transactions on Networking, pages 1-19, June 2020. Extended version.


Auto-tuning for cellular scheduling through bandit-learning and low-dimensional clustering,
I. Tariq, R. Sen, T. Novlan, S. Akoum, M. Majumndar, G. de Veciana, and S. Shakkottai. Submitted to IEEE/ACM Transactions on Networking, pages 1-20, April 2020.


Online channel-state clustering and multiuser capacity learning for wireless scheduling,
I. Tariq, R. Sen, G. de Veciana, and S. Shakkottai. In Proc. IEEE INFOCOM, pages 1-9, April 2019.


Joint scheduling of URLLC and eMBB traffic in 5G wireless networks,
A. Anand, G. de Veciana, and S. Shakkottai. IEEE/ACM Transactions on Networking, 28(2):1-15, April 2020.