Safe-Reinforcement Learning for Wireless Resource Allocation

Investigators: Gustavo de Veciana (ECE, UT Austin) and Sanjay Shakkottai (ECE, UT Austin)
Students and Participants:

Jianhan Song (PhD student)
Geetha Chandresakeran (PhD student)
Kartik Patel (PhD student)
Isfar Tariq (PhD student)
Rajat Sen (Ph.D., currently at Amazon)
Arjun Anand (Ph.D., currently at Intel)
Additional collaboration with researchers at AT&T: T. Novlan, S. Akoum, and M. Majumndar.

Support: This project is supported by National Science Foundation Award CNS-1910112.

Goal: Next-generation wireless networks are being engineered to meet a complex mix of application requirements, from traditional mobile broadband (e.g., web browsing, video streaming) to emerging applications (e.g., augmented reality, self-driving cars, industrial automation, robotics) with heterogeneous and much more stringent reliability-latency requirements. The ability to support these requirements is central to delivering the new business models and revenue streams that would enable deployments of the new technology. Thus, wireless scheduling and resource allocation will take center stage as enabling technologies for such networks. Meanwhile, reinforcement learning (RL) using deep networks has emerged as a powerful framework for devising policies that optimize the performance of complex systems (including wireless systems); however, these policies usually do not come with any formal guarantees. The central thesis of this proposal is that RL-based resource allocation policies without operational guarantees, e.g., throughput-optimality/stability, are unlikely to be accepted and/or deployed; a key requirement for making these techniques usable is therefore to develop approaches that ensure safety guarantees. The proposed research effort will advance the state of the art in safe reinforcement learning, with specific applications to wireless systems, and is also expected to benefit other application domains as well as society more broadly, through planned efforts in education, innovation, achieving diversity, engaging the community and industry, and disseminating results to a wider public.

This proposal centers on the development and analysis of a safe reinforcement learning (Safe-RL) framework, which optimizes rewards over short time scales but also provides theoretically strong long-term throughput-optimality guarantees for state-of-the-art wireless scheduling algorithms. The key underlying observation is that many of today’s scheduling algorithms derive their performance guarantees from Lyapunov analysis. We propose the use of guardrails, i.e., constraints on the state-dependent actions of Safe-RL, which guarantee that the wireless system’s Lyapunov evolution stays within a bounded perturbation of that of classical algorithms. This guarantee, in turn, ensures that Safe-RL inherits the safety/stability properties of state-of-the-art schedulers, while leveraging RL to realize complex performance tradeoffs. The proposed research is organized into three inter-related thrusts.

Thrust 1 develops the foundations and representations for Safe-RL at the core of this proposal, along with a theoretical basis for safety guarantees and new classes of efficient learning for wireless system network abstractions.
Thrust 2 focuses on the application of Safe-RL theory to wireless resource allocation, including addressing challenges associated with joint scheduling of real-time and broadband traffic, learning and exploiting traffic patterns, and exploring the degree to which a policy hits guardrails as an indication of system anomalies or the need for re-optimization.
Thrust 3 centers on the challenging but necessary task of validating the proposed Safe-RL framework by leveraging an industrial-strength multi-cell simulator.
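
As a purely illustrative sketch of the guardrail idea described above (not the project's actual algorithm), the Python snippet below assumes a single-channel downlink in which the classical reference scheduler is MaxWeight, i.e., it serves the user with the largest product of queue length and service rate. The hypothetical function `guarded_action` accepts the action proposed by a learned policy only if its weight is within an assumed slack of the MaxWeight choice, and otherwise falls back to MaxWeight. The function names, the `slack` parameter, and the choice of MaxWeight as the reference are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def max_weight_action(queues: Sequence[float], rates: Sequence[float]) -> int:
    """Return the index of the user with the largest queue * rate product."""
    weights = [q * r for q, r in zip(queues, rates)]
    return max(range(len(weights)), key=weights.__getitem__)


def guarded_action(
    queues: Sequence[float],
    rates: Sequence[float],
    proposed: int,
    slack: float,
) -> Tuple[int, bool]:
    """Apply an assumed guardrail to an action proposed by a learned policy.

    The proposed user is served only if its weight is within `slack` of the
    MaxWeight choice; otherwise the guardrail overrides it with MaxWeight.
    Returns (chosen user, whether the guardrail was hit).
    """
    weights = [q * r for q, r in zip(queues, rates)]
    best = max_weight_action(queues, rates)
    if weights[proposed] >= weights[best] - slack:
        return proposed, False  # proposal stays inside the guardrail
    return best, True  # guardrail hit: fall back to the classical scheduler


# Hypothetical state: three users with queue lengths and per-slot rates.
queues = [10.0, 2.0, 8.0]
rates = [1.0, 3.0, 1.0]  # weights are 10, 6, and 8
accepted = guarded_action(queues, rates, proposed=2, slack=3.0)
overridden = guarded_action(queues, rates, proposed=1, slack=3.0)
```

Under these assumptions, the returned flag could be tallied over time to measure how often a policy hits its guardrails, in the spirit of the anomaly indicator mentioned under Thrust 2.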

Broader impact and activities

G. de Veciana presented invited talks at the IEEE WIOPT Workshop on Machine Learning in Wireless Communications (WMLC), IISC, and Intel.
S. Shakkottai presented an invited talk at the IEEE WIOPT Workshop on Machine Learning in Wireless Communications (WMLC).

Publications to date

Job Dispatching Policies for Queueing Systems with Unknown Service Rates,
Tuhinangshu Choudhury, Gauri Joshi, Weina Wang, and Sanjay Shakkottai. In MobiHoc '21: Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, July 2021.

MmWave Codebook Selection in Rapidly-Varying Channels via Multinomial Thompson Sampling,
Yi Zhang, Soumya Basu, Sanjay Shakkottai, and Robert Heath. In MobiHoc '21: Proceedings of the Twenty-second International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, July 2021.

Online Learning for Hierarchical Scheduling to Support Network Slicing in Cellular Networks,
J. Song, G. de Veciana, and S. Shakkottai. To appear in Performance Evaluation, pages 1-22, December 2021.

Meta-scheduling for the wireless downlink through learning with bandit feedback,
J. Song, G. de Veciana, and S. Shakkottai. Accepted to IEEE/ACM Transactions on Networking, pages 1-19, June 2020. Extended version.

Auto-tuning for cellular scheduling through bandit-learning and low-dimensional clustering,
I. Tariq, R. Sen, T. Novlan, S. Akoum, M. Majumndar, G. de Veciana, and S. Shakkottai. Submitted to IEEE/ACM Transactions on Networking, pages 1-20, April 2020.

Online channel-state clustering and multiuser capacity learning for wireless scheduling,
I. Tariq, R. Sen, G. de Veciana, and S. Shakkottai. In Proc. IEEE INFOCOM, pages 1-9, April 2019.

Joint scheduling of URLLC and eMBB traffic in 5G wireless networks,
A. Anand, G. de Veciana, and S. Shakkottai. IEEE/ACM Transactions on Networking 28(2):1-15, April 2020.