Haris Vikalo – Selected projects

  1. Structured low-rank matrix factorization with applications to genomics.

    In many problems, one seeks to represent a data matrix by the product of two matrices: one capturing meaningful information and the other specifying how that information is combined to generate the data. We consider the matrix decomposition that arises in the context of discovery of genetic variations; there, the observed matrix contains noisy samples of the product of an informative matrix having discrete-valued entries and a matrix with rows that are standard unit basis. We developed and analyzed convergence of structurally-constrained iterative algorithms for finding this decomposition. Extensive benchmarking demonstrates superiority of the proposed framework over state-of-the-art techniques.


  2. Supervision and smoothness in clustering.

    In many applications, we are given partial or uncertain relations between objects that need to be clustered. We developed soft-constraint semi-supervised affinity propagation (SCSSAP) algorithm which adds supervision to affinity propagation (AP) without strictly enforcing instance-level constraints. This formulation is particularly advantageous in the presence of noisy labels or noisy constraints since the penalty parameter of SCSSAP can be tuned to express our confidence in instance-level constraints. In the presence of label and constraint noise, SCSSAP outperforms both AP and other existing semi-supervised clustering algorithms.


  3. Base-calling algorithms for next-generation DNA sequencing systems.

    Novel sequencing technologies will provide significant improvements in many aspects of human condition, ultimately leading towards the understanding, diagnosis, treatment, and prevention of diseases. Motivated by the need for high-performing sequencing systems, we have studied the problem of identifying the order of nucleotides from noisy measurements – the so-called base-calling problem. The latest high- throughput next-generation sequencing platforms are capable of generating tens of millions of reads in parallel and hence, in addition to being accurate, base-calling algorithms need also be computationally efficient and scalable. In our work, we have explored the trade-off between accuracy and speed of base-calling and developed several algorithms characterized by different accuracy/complexity features. Selected publications:

  4. Signal processing for real-time biosensor arrays.

    Real-time DNA microarrays observe the dynamics of binding between molecular targets and the biosensing elements that capture them. The paradigm shift in data acquisition, from measuring a single steady-state data point in conventional microarrays to measuring full dynamics of the binding process in a real-time microarray, requires the development of novel estimation techniques. We have shown that molecular binding in real-time microarrays can be modeled as a continuous-time Markov process that, under some mild and practical assumptions, can be approximated by a diffusion process ( i.e., described by a stochastic differential equation). Therefore, estimation of the number of target molecules from the acquired signal can be posed as the problem of inferring parameters of a discretely sampled diffusion process. Selected publications:

    • M. Shamaiah, X. Shen, and H. Vikalo, "Estimating parameters of sampled diffusion processes in affinity biosensors," IEEE Transactions on Signal Processing, vol. 60, no. 6, June 2012, pp: 3228-3239.[Inferring parameters of a discretely sampled diffusion process is challenging because the transition density of the underlying diffusion process is not available in closed form. In this paper, we approximate the transition density using Hermite polynomial expansion, and develop a sequential Monte Carlo parameter estimation scheme. Experimental results demonstrate that real-time biosensing enables significant performance improvements compared to the conventional technology.]

    • A. Hassibi, H. Vikalo, J. L. Riechmann, and B. Hassibi, "Real-time DNA microarray analysis," Nucleic Acids Research, vol. 37, no. 20, 2009, e132:1-12.[This paper presents novel affinity biosensing platform that estimates amounts of target molecules from the acquired temporal samples of the target-probe binding process.]

  5. Resource-constrained sensor networks.

    In sensor networks, due to limited power and bandwidth resources, communication from sensors to a fusion center is often limited. For a rational use of the sensor network resources, the fusion center typically schedules only a subset of the available sensors for transmission, and the selected sensors transmit only partial information. Recently, we showed that in a variety of scenarios sensor selection can be cast as maximization of a submodular function over uniform matroids. It turns out that a computationally efficient greedy sensor selection algorithm is guaranteed to perform within (1-1/e) of the optimal solution. We extended these results to the scenario where the communication between sensors and the fusion center is unreliable. Our other work in this area includes reconstruction of time-varying sparse signals in sensor networks where the sensors are allowed to transmit only quantized information to the fusion center. Selected publications:

    • M. Shamaiah, S. Banerjee, and H. Vikalo, ``Greedy sensor selection: Leveraging submodularity,'' IEEE Conference on Decision and Control (CDC), Atlanta, GA, December 2010. [This paper formulates sensor selection problem as the maximization of a submodular function over uniform matroids, and proposes solving it via a greedy algorithm. Structural properties of the problem are used to significantly reduce the complexity of the naive greedy scheme.]

    • M. Shamaiah, S. Banerjee, and H. Vikalo, "Greedy sensor selection under channel uncertainty," IEEE Wireless Communications Letters, 2012. [This is a generalization of the previous work to the scenario where the communication links between sensors and fusion center are unreliable.]

  6. Cognitive radio networks.

    We develop distributed algorithms for efficient spectrum sensing and access strategies in cognitive radio networks. The developed schemes rely on probabilistic graphical representation of the considered problems, and employ message-passing techniques to find approximate solutions. Selected publications:

    • M. Shamaiah, S.-H. Lee, S. Vishwanath, and H. Vikalo, "Distributed algorithms for spectrum access in cognitive radio relay networks," IEEE Journal on Sel. Areas in Communications - Cognitive Radio Series, November 2012. [Here we consider the scenario where primary users permit secondary users access to the resource (spectrum) as long as they consent to aiding the primary users as relays in addition to transmitting their own data. Given a pool of primary and secondary users, we desire to optimize overall network utility by determining the best configuration/pairing of secondary users with primary users. Given such formulation, we develop an algorithm based on affinity propagation technique that is completely distributed in its structure.]

  7. Algorithms for solving integer least-squares problems in communications.

    In many communication systems, ML decoding reduces to an integer least-squares problem of finding the solution to the system of linear e quations where the unknown vector is comprised of integers, while the given vector and coefficient matrix are comprised of real entries. Geometrically, this can be interpreted as a search for the closest point in a given lattice. The closest point search in a lattice is well known to be an NP-hard problem and thus designers often turn to suboptimal criteria with computationally efficient solutions. While the former is true for the general integer least-squares problems, our main observation has been that in communications applications, the given vector is not arbitrary. Instead, it is an unknown point in a finite lattice that has been perturbed by an additive noise vector of known statistical properties. This leads to a probabilistic, rather than the more traditional deterministic, notion of complexity. Thus, a natural question to ask is “what are the statistics of the algorithms that solve integer least-squares problems?” We answered this question for the so-called sphere decoding algorithm, for which we found closed-form analytic expressions for the expected complexity and the complexity variance. We showed that for a wide range of SNRs, the expected complexity of sphere decoding is comparable to the complexity of the best heuristics, and is sub-cubic. This suggests that ML detection, previously thought to be intractable, can in fact be implemented with a complexity similar to that of heuristic methods, but with significant performance gains – a result with many practical implications. Selected publications: