Research Statement

Prof. Brian L. Evans

bevans@ece.utexas.edu

02/22/17

1.0 Introduction

My research and teaching interests are in processing signals to (1) increase speed and reliability in communication systems and (2) improve visual quality during image/video acquisition and display. To achieve these goals, my research group develops theory, algorithms, simulations and real-time prototypes. In deriving algorithms, we keep implementation constraints in mind: our algorithms will ultimately be implemented using fixed-point data and arithmetic on targets that are constrained in memory size and memory input/output rates. We gather our algorithms, along with other leading algorithms, into freely distributable MATLAB toolboxes that we release on the Internet. We also implement our algorithms in software and hardware in embedded prototypes and full-system testbeds. We evaluate tradeoffs in application performance vs. implementation complexity, first at a coarse level using desktop simulation, and then at a fine level using embedded targets. Targets include digital signal processors, x86 processors and field programmable gate arrays (FPGAs).

My research group also develops system-level electronic design automation methods and tools for multicore embedded systems. A fundamental problem in multicore systems is the conflict between concurrency and predictability. To resolve this conflict, we abstract the representation of software by using formal models of computation. We use the Synchronous Dataflow model and extend the Process Network model. Both models guarantee deadlock-free execution that gives the same results whether the program is run sequentially, across multiple cores, or across multiple processors. Both models are well suited for streaming discrete-time signal processing algorithms for baseband communications as well as speech, audio, image and video applications.

2.0 Digital Communications Systems

2.1 DSL Communication Systems

Orthogonal Frequency Division Multiplexing (OFDM) forms each symbol via an inverse fast Fourier transform (inverse FFT). The symbol is periodically extended by copying its last few samples to the front, which is known as the cyclic prefix. The receiver often applies a channel shortening filter to reduce the effective channel impulse response so that it is no longer than the cyclic prefix. Once the channel fits within the prefix, equalization can be performed in the FFT domain with a single complex multiplication per subcarrier, which reduces complexity.
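
As a concrete illustration of why the cyclic prefix enables low-complexity equalization, the following Python/NumPy sketch builds one OFDM symbol, passes it through a short channel, and equalizes with a single complex division per subcarrier. The symbol length, prefix length, and channel taps are illustrative values, not parameters from our systems.

    import numpy as np

    N, CP = 64, 16                       # subcarriers and cyclic prefix length (illustrative)
    h = np.array([1.0, 0.5, 0.2])        # channel impulse response, shorter than the prefix

    # Transmitter: one OFDM symbol formed by an inverse FFT of N QPSK subcarriers
    X = np.exp(1j * (np.pi / 2) * np.random.randint(4, size=N))
    x = np.fft.ifft(X)
    tx = np.concatenate([x[-CP:], x])    # copy the last CP samples to the front (cyclic prefix)

    # Channel: linear convolution; the prefix absorbs the spill-over between symbols
    rx = np.convolve(tx, h)[CP:CP + N]   # receiver discards the prefix

    # Receiver: because the channel fits within the prefix, equalization is one
    # complex multiplication (here, a division) per subcarrier in the FFT domain
    X_hat = np.fft.fft(rx) / np.fft.fft(h, N)
    print(np.max(np.abs(X_hat - X)))     # ~0, up to numerical precision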

Digital Subscriber Line (DSL) communication seeks to achieve data rates as high as 22 Mbps over the twisted-pair copper wiring used by legacy phone systems, in order to provide Internet service to homes and small offices. In Asymmetric DSL (ADSL) receivers, a channel shortening filter can increase the bit rate by 16x over not using one, for the same bit error rate. For ADSL, we developed the first channel shortening training method that maximizes a measure of bit rate and is realizable in real-time fixed-point software. Our algorithm doubled the bit rate over the best training method at the time and required only a software change in existing receivers. We also developed a dual-path channel shortening structure, which increased bit rate by another 20%. (More info)

We designed and implemented a testbed to empower designers to evaluate and visualize tradeoffs in communication performance vs. implementation complexity at the system level. The testbed uses a type of OFDM known as discrete multitone (DMT) modulation as found in ADSL systems, and has two transmitters and two receivers. The 2x2 DMT testbed can execute in real time using National Instruments embedded hardware over physical cables, or on the PC using cable models. Baseband processing for the physical and medium access control layers is in C++ and runs on an embedded x86 dual-core processor. The baseband code contains multiple algorithms for each of the following structures: peak-to-average power ratio reduction, echo cancellation, equalization, bit allocation, channel shortening, channel tracking and crosstalk cancellation. Crosstalk cancellation gives 90% of the gain in bit rate. The sponsor deployed the testbed in the field. (More info)

2.2 Wi-Fi and Smart Grid Communications

In unlicensed frequency bands, communication speed and reliability are limited by interference instead of thermal noise. Interference comes from communication services as well as non-communication electronic equipment. In smart grid communications over power lines, interference from switching power supplies can be 40-50 dB greater than thermal noise in the 3-500 kHz unlicensed band. In wireless smart grid and Wi-Fi communications, operating microwave ovens sweep up and down the 2.4 GHz unlicensed band.

Supported by Intel and NI, we developed statistical models of interference from Wi-Fi networks and clusters of Wi-Fi networks. Based on the models, we developed Wi-Fi receiver methods to double bit rates (or reduce bit error rates by 10x) in the presence of strong interference. (More info)

Supported by Semiconductor Research Corporation (SRC), with liaisons IBM, NXP, and TI, we derived statistical models of interference for communication over power lines, as used in smart grid infrastructure for local utilities. The IEEE 1901.2 powerline communication standard adopted our models. Based on the models, we developed communication receiver methods to quadruple bit rates (or reduce bit error rates by 100x) in the presence of strong interference. We validated the methods in a real-time testbed, which led to a student paper award at the 2013 Asilomar Conf. Signals, Systems & Comp. (best in track; second best overall). Receiver methods are standard-compliant but high in complexity. We developed joint transmitter-receiver designs to reduce complexity by 10x and achieve similar performance. One of those methods won the Best Paper Award at the 2013 IEEE Int. Symp. Power Line Communications. (More info)

Supported by a second SRC contract, again with liaisons TI and NXP, we developed with researchers at UT Dallas methods for smart grid communications that simultaneously transmit the same data over power lines and the 900 MHz unlicensed band. We have demonstrated a reduction of 10-100x in bit error rate. We have also validated the approach in a real-time testbed using NI hardware and software. (More info)

2.3 Cellular Communications

In licensed frequency bands, cellular communications research seeks ways to meet the annual 2-3x worldwide increase in data demand. At the same time, cellular infrastructure companies and service providers seek to reduce capital and operating costs to offset the decline in monthly fees for service contracts.

For cellular basestations, we developed the first algorithm to allocate subcarrier frequencies and power to multiple users that optimizes bit rates, has linear complexity, and is realizable in fixed-point hardware/software. These basestations transmit to all users at the same time by using a distinct subset of subcarrier frequencies for each user. The subsets are not necessarily contiguous. Optimal allocation of user subcarrier frequencies and power requires mixed-integer programming, which is computationally intractable for common scenarios (e.g. 1536 carrier frequencies and 30 users). Our algorithms are available for continuous and discrete rates, and apply to perfect or partial knowledge of channel state. Prior to our breakthrough, engineers relied on heuristics with quadratic complexity for sub-optimal resource allocation. (More info)
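
The sketch below is not the algorithm from our papers; it only illustrates the flavor of an allocation whose complexity grows linearly with the number of subcarriers, using hypothetical channel gains, a greedy strongest-user rule, and equal power per subcarrier.

    import numpy as np

    num_subcarriers, num_users, total_power = 1536, 30, 1.0       # illustrative scenario
    rng = np.random.default_rng(0)
    gains = rng.rayleigh(size=(num_users, num_subcarriers)) ** 2  # hypothetical channel gains

    # One pass over the subcarriers: assign each to its strongest user (greedy, not optimal)
    owner = np.argmax(gains, axis=0)
    power = total_power / num_subcarriers                         # equal power per subcarrier
    best_gains = gains[owner, np.arange(num_subcarriers)]
    sum_rate = np.log2(1.0 + best_gains * power).sum()
    print("users served:", np.unique(owner).size, "  sum rate (bits/symbol):", sum_rate)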

Another line of research to increase data rates and reduce operating costs in dense urban cellular networks is to aggregate the computing at 8-10 nearby basestations into one supercomputing node, also known as a cloud radio access network. The aggregation allows scalability of computing nodes with customer demand, eases maintenance, and reduces energy costs. Energy costs can account for 12% of operating costs.

An enabling technology is compression of the cellular signals at each basestation to reduce the cost of transporting them to/from the supercomputing node. Supported by Huawei, our first method achieved 3x compression for a single antenna with less than 2% loss in signal quality. Our second method achieved 8x compression using multiple antennas while increasing signal quality at the same time, a dramatic improvement. (More info)

3.0 Digital Image/Video Processing

3.1 Image Printing and Display

Image halftoning algorithms reduce the intensity and color resolution of an image to match those of the display. Examples include rendering a 24-bit color image on a 12-bit color display, or an 8-bit grayscale image on a binary device such as a reflective screen. One way to achieve the illusion of higher resolution is to push the quantization error at each spatial location, and in each appropriate color channel, into high spatial frequencies where the human visual system is less sensitive. One such method, error diffusion, filters the quantization error at a pixel and feeds the result forward to pixels that have not yet been quantized.
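
As a minimal illustration of this feedback of quantization error, here is a grayscale sketch in Python using the classic Floyd-Steinberg error filter weights (our work concerns color error diffusion; this example only shows the basic mechanism):

    import numpy as np

    def error_diffuse(image):
        """Binary halftone of an 8-bit grayscale image via Floyd-Steinberg error diffusion."""
        img = image.astype(float) / 255.0
        out = np.zeros_like(img)
        rows, cols = img.shape
        for r in range(rows):
            for c in range(cols):
                out[r, c] = 1.0 if img[r, c] >= 0.5 else 0.0        # quantize to black or white
                err = img[r, c] - out[r, c]                         # quantization error at this pixel
                # filter the error and feed it to neighboring pixels not yet quantized
                if c + 1 < cols:                  img[r, c + 1]     += err * 7 / 16
                if r + 1 < rows and c > 0:        img[r + 1, c - 1] += err * 3 / 16
                if r + 1 < rows:                  img[r + 1, c]     += err * 5 / 16
                if r + 1 < rows and c + 1 < cols: img[r + 1, c + 1] += err * 1 / 16
        return out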

For color halftoning by error diffusion, we have developed a unified theoretical framework, methods to compensate for the image distortion it induces, and methods for halftone quality assessment. The framework linearizes color error diffusion by replacing the color quantizer with a matrix gain plus an additive uncorrelated noise source. We then apply linear methods to compensate for image distortion, including vector-valued prefiltering to invert the signal transfer function and vector-valued adaptive filtering to reduce the visibility of color quantization noise. We compensate for false textures in the halftone (i.e. textures that are not visible in the original) by replacing the quantizer with a lookup table that flips the outcome near threshold values. All compensation methods have low enough complexity to be incorporated into a commercial printer or display driver. (More info)
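
A one-line sketch of the linearized quantizer model described above (symbols chosen here for illustration): the color quantizer output for a vector-valued input u at a pixel is modeled as

    Q(u) ≈ K u + n

where K is the matrix gain and n is an additive noise term modeled as uncorrelated with u. The vector-valued prefiltering and adaptive filtering mentioned above then operate on this linearized model.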

3.2 Video Display on Handheld Devices

In fall 2010, my research group developed new halftoning algorithms for displaying video on handheld devices with reduced grayscale resolution, such as e-readers. Halftoning achieves the illusion of higher resolution by pushing the quantization error at each spatial location into spatial frequencies where the human visual system is less sensitive. For display of 8-bit grayscale video on 1-bit black/white displays, we assessed and compensated for two key perceived temporal artifacts: the dirty window effect and flicker. (More info)

3.3 Video Acquisition on Smart Phones

The quality of videos acquired by smart phone cameras is severely affected by unintentional camera motion, such as up-and-down motion caused by walking or jitter caused by hand shake, as well as rolling shutter effects. To reduce weight, size, and cost, smart phone cameras do not have hardware shutters. Instead, the matrix of light sensors is read out and reset row-by-row, which is known as a rolling shutter. Rolling shutter effects occur due to fast camera motion and include skew, smear and wobble distortion.

Rolling shutter effect rectification and video stabilization consist of (1) camera motion estimation, (2) camera motion regeneration, and (3) frame synthesis. With support from TI, we developed a video rectification/stabilization algorithm for a handheld platform by fusing gyroscope measurements and video analytics. First, we estimate camera motion for each row. Second, we smooth the sequence of camera motions over all frames. Last, we synthesize frames based on the difference between original and regenerated camera motion. We developed a smart phone app and MATLAB software that runs at 7 frames/s. The approach is feasible for real-time implementation on a smart phone. The work won a Top 10% Paper Award at the 2012 IEEE Int. Work. Multimedia Sig. Proc. Online demonstrations are available. (More info)
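
A greatly simplified sketch of the three steps in Python (per-frame horizontal translation only; our method fuses gyroscope data with video analytics and handles per-row rolling shutter motion):

    import numpy as np
    from scipy.ndimage import shift, uniform_filter1d

    def stabilize(frames, frame_motion):
        """frames: list of 2-D grayscale arrays; frame_motion: estimated horizontal motion per frame (pixels)."""
        # (1) camera motion estimation is assumed already done (frame_motion)
        path = np.cumsum(frame_motion)                        # camera trajectory over the frames
        # (2) regenerate a smooth camera path (a moving average serves as the smoother here)
        smooth_path = uniform_filter1d(path, size=15, mode='nearest')
        # (3) synthesize frames by warping each frame by the original-minus-smoothed difference
        return [shift(f, (0, smooth_path[i] - path[i]), order=1)
                for i, f in enumerate(frames)]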

3.4 Visual Quality Assessment

Automated visual quality assessment (VQA) of pictures can accelerate design of image acquisition, compression and display algorithms. Ubiquitous standard dynamic range (SDR) images provide 8 bits per color per pixel. High dynamic range (HDR) images, which can be captured by smart phones and digital cameras, enhance the range of luminance/chrominance values by using 16 or 32 bits per color per pixel.

For synthetic SDR scenes and natural HDR images, we have designed and released public databases, conducted subjective VQA experiments, evaluated VQA algorithms, and proposed no-reference VQA algorithms. No-reference means that the processed/compressed image is available but the original source image is not available for a comparison. This matches the more common use case of taking pictures and browsing pictures online. For the HDR image database, we also conducted the first large-scale subjective study using the Amazon Mechanical Turk crowdsourced platform to gather 300,000+ opinion scores on 1,800+ images from 5,000+ unique observers, and compared those results against VQA algorithm results. Among no-reference VQA algorithms, those based on scene statistics have the highest correlations with human visual quality scores for synthetic SDR and natural HDR images. One of our synthetic SDR image quality papers received a Top 10% Paper Award at the 2015 IEEE Int. Conf. Image Processing. (More info)

4.0 System-level Electronic Design Automation Tools

4.1 System on Chip Design

We automate the mapping of streaming signal processing tasks onto multicore processors to achieve high throughput, low latency and real-time performance. We model tasks using the Synchronous Dataflow (SDF) model of computation. An SDF program is represented as a directed graph, in which edges are first-in first-out queues of bounded size. Each node in the graph is enabled for execution when enough data values are available on each of its input edges. When a node completes its execution, the data values produced on each output edge are enqueued. We address simultaneous partitioning and scheduling of SDF graphs onto heterogeneous multicore platforms to optimize throughput, latency and cost. We generate Pareto tradeoff curves to allow a system engineer to explore design tradeoffs in possible partitions and schedules. Case studies include an MP3 decoder.
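
As a minimal illustration of SDF semantics, here is a hypothetical two-node graph in Python (not taken from our tools): the source produces 2 tokens per firing, the sink consumes 3, and a periodic schedule computed from those rates repeats forever in bounded memory.

    from collections import deque

    edge = deque()                       # first-in first-out queue connecting the two nodes
    count = 0

    def src_fire():                      # produces 2 tokens each time it fires
        global count
        for _ in range(2):
            edge.append(count)
            count += 1

    def snk_fire():                      # consumes 3 tokens per firing (the schedule guarantees availability)
        print("consumed", [edge.popleft() for _ in range(3)])

    # Periodic static schedule derived from the rates: 3 source firings balance 2 sink firings
    # and return the edge to its initial (empty) state, so the schedule can repeat in bounded memory.
    for fire in [src_fire, src_fire, snk_fire, src_fire, snk_fire]:
        fire()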

4.2 Scalable Software Framework

We realize high-throughput, scalable software on multicore processors by extending the Process Network (PN) model of computation. A PN program is represented as a directed graph, in which nodes are concurrent processes and edges are first-in first-out queues. Nodes map to threads. PN guarantees predictability of results regardless of the rates or order in which processes execute. Thus, correctness of a program does not depend on the use of explicit synchronization mechanisms, such as mutual exclusion. In PN, a queue could grow without bound. Our Computational PN (CPN) framework schedules programs in bounded memory when possible. To increase throughput, CPN decouples input/output management in the queues from computation in the nodes. C++ programs in our CPN framework automatically scale to multiple cores via thread scheduling by an operating system, such as Linux. The same CPN program can run on a single core or multiple cores, without any change to the code. Case studies include a 3-D beamformer. (More info)
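
A minimal sketch of the process-network idea in Python, with each node as a thread and a bounded queue as the connecting edge (our CPN framework is in C++ and additionally decouples queue input/output from computation); the node names here are illustrative.

    import threading, queue

    edge = queue.Queue(maxsize=16)        # bounded first-in first-out edge between two nodes

    def producer():                       # node: generates a stream of samples
        for n in range(100):
            edge.put(n)                   # blocks when the queue is full (back-pressure)
        edge.put(None)                    # end-of-stream marker

    def consumer():                       # node: consumes the stream
        total = 0
        while (item := edge.get()) is not None:
            total += item
        print("sum =", total)             # same result regardless of how the threads are scheduled

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads: t.start()
    for t in threads: t.join()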

5.0 Brief Biography

Dr. Brian L. Evans is the Engineering Foundation Professor of Electrical and Computer Engineering at The University of Texas at Austin. He earned his B.S.E.E.C.S. (1987) degree from the Rose-Hulman Institute of Technology, and his M.S.E.E. (1988) and Ph.D.E.E. (1993) degrees from the Georgia Institute of Technology. From 1993 to 1996, he was a post-doctoral researcher at the University of California, Berkeley. In 1996, he joined the faculty at UT Austin.

Prof. Evans was elevated to IEEE Fellow "for contributions to multicarrier communications and image display". He has published more than 230 refereed conference and journal papers, and graduated 27 PhD and 10 MS students. He has received three teaching awards and three top paper awards, as well as a 1997 US National Science Foundation CAREER Award. (Overview slides)

