Haris Vikalo – Selected projects
Structured lowrank matrix factorization with applications to genomics.
In many problems, one seeks to represent a data matrix by the product of two matrices:
one capturing meaningful information and the other specifying how that information is
combined to generate the data. We consider the matrix decomposition that arises in the
context of discovery of genetic variations; there, the observed matrix contains noisy
samples of the product of an informative matrix having discretevalued entries and a
matrix with rows that are standard unit basis. We developed and analyzed convergence
of structurallyconstrained iterative algorithms for finding this decomposition.
Extensive benchmarking demonstrates superiority of the proposed framework over
stateoftheart techniques.
Supervision and smoothness in clustering.
In many applications, we
are given partial or uncertain relations between objects that need to be clustered. We developed
softconstraint
semisupervised affinity propagation (SCSSAP) algorithm which adds supervision to affinity
propagation (AP) without strictly enforcing instancelevel constraints. This formulation is
particularly advantageous in the presence of noisy labels or noisy constraints since the penalty
parameter of SCSSAP can be tuned to express our confidence in instancelevel constraints. In the
presence of label and constraint noise, SCSSAP outperforms both AP and other existing semisupervised
clustering algorithms.
Basecalling algorithms for nextgeneration DNA sequencing systems.
Novel sequencing technologies will provide significant improvements in many aspects of human
condition, ultimately leading towards the understanding, diagnosis, treatment, and prevention of
diseases. Motivated by the need
for highperforming sequencing systems, we have studied the problem of identifying the order of
nucleotides from noisy measurements – the socalled basecalling problem. The latest high
throughput nextgeneration sequencing platforms are capable of generating tens of millions of reads
in parallel and hence, in addition to being accurate, basecalling algorithms need also be
computationally efficient and scalable. In our work, we have explored the tradeoff between
accuracy and speed of basecalling and developed several algorithms characterized by different
accuracy/complexity features. Selected publications:
 S. Das and H. Vikalo,
"OnlineCall: Fast online parameter
estimation and base calling for Illumina's next generation
sequencing," Bioinformatics, vol. 28, no. 13, pp. 16771683, 2012.
[OnlineCall is a computationally efficient online learning scheme for Illumina's platforms that
achieves significantly lower error rates than the stateoftheart commercial method, while
providing three orders of magnitude speedup as compared to recently proposed competing
techniques.]

X. Shen and H. Vikalo,
"ParticleCall: A particle filter for base calling
in Illumina's nextgeneration sequencing systems,"
BMC Bioinformatics, vol. 13, no. 160, July 2012.
[ParticleCall is a sequential Monte Carlo (i.e., particle filtering) algorithm shown to
outperform all existing (as of July 2012) basecalling scheme for Illumina platform.]
Signal processing for realtime biosensor arrays.
Realtime DNA microarrays observe the dynamics of binding between molecular targets
and the biosensing elements that capture them. The paradigm shift in data acquisition,
from measuring a single steadystate data point in conventional microarrays to measuring
full dynamics of the binding process in a realtime microarray, requires the development
of novel estimation techniques. We have shown that molecular binding in realtime
microarrays can be modeled as a continuoustime Markov process that, under some mild
and practical assumptions, can be approximated by a diffusion process (
i.e., described by a stochastic differential equation). Therefore, estimation of the
number of target molecules from the acquired signal can be posed as the problem of
inferring parameters of a discretely sampled diffusion process.
Selected publications:

M. Shamaiah, X. Shen, and H. Vikalo,
"Estimating parameters of sampled diffusion processes in affinity biosensors,"
IEEE Transactions on Signal Processing, vol. 60, no. 6, June 2012,
pp: 32283239.[Inferring parameters of a discretely sampled diffusion process is
challenging because the transition density of the underlying diffusion process is not
available in closed form. In this paper, we approximate the transition density using
Hermite polynomial expansion, and develop a sequential Monte Carlo parameter estimation
scheme. Experimental results demonstrate that realtime biosensing enables significant performance improvements compared to the conventional technology.]

A. Hassibi, H. Vikalo, J. L. Riechmann, and B. Hassibi,
"Realtime DNA microarray analysis,"
Nucleic Acids Research, vol. 37, no. 20, 2009, e132:112.[This
paper presents novel affinity biosensing platform that estimates amounts of
target molecules from the acquired temporal samples of the targetprobe binding process.]
Resourceconstrained sensor networks.
In sensor networks, due to limited power and bandwidth resources, communication from sensors
to a fusion center is often limited. For a rational use of the sensor network resources,
the fusion center typically schedules only a subset of the available sensors for transmission,
and the selected sensors transmit only partial information. Recently, we showed that in a
variety of scenarios sensor selection can be cast as maximization of a submodular function
over uniform matroids. It turns out that a computationally efficient greedy sensor
selection algorithm is guaranteed to perform within (11/e) of the optimal solution. We
extended these results to the scenario where the communication between sensors and the
fusion center is unreliable. Our other work in this area includes reconstruction of
timevarying sparse signals in sensor networks where the sensors are allowed to transmit
only quantized information to the fusion center. Selected publications:

M. Shamaiah, S. Banerjee, and H. Vikalo,
``Greedy sensor selection: Leveraging submodularity,''
IEEE Conference on Decision and Control (CDC),
Atlanta, GA, December 2010. [This paper formulates sensor selection problem as the
maximization of a submodular function over uniform matroids, and proposes solving it via
a greedy algorithm. Structural properties of the problem are used to significantly
reduce the complexity of the naive greedy scheme.]
 M. Shamaiah, S. Banerjee, and H. Vikalo, "Greedy sensor selection under
channel uncertainty," IEEE Wireless Communications Letters, 2012.
[This is a generalization of the previous work to the scenario where the
communication links between sensors and fusion center are unreliable.]
Cognitive radio networks.
We develop distributed algorithms for efficient spectrum sensing and access strategies
in cognitive radio networks. The developed schemes rely on probabilistic graphical
representation of the considered problems, and employ messagepassing techniques to
find approximate solutions. Selected publications:
 M. Shamaiah, S.H. Lee, S. Vishwanath, and H. Vikalo, "Distributed algorithms
for spectrum access in cognitive radio relay networks," IEEE
Journal on Sel. Areas in Communications  Cognitive Radio Series,
November 2012. [Here we consider the scenario where primary users permit secondary
users access to the resource (spectrum) as long as they consent to aiding the primary
users as relays in addition to transmitting their own data. Given a pool of primary and
secondary users, we desire to optimize overall network utility by determining the best
configuration/pairing of secondary users with primary users. Given such formulation,
we develop an algorithm based on affinity propagation technique that is completely
distributed in its structure.]
Algorithms for solving integer leastsquares problems in communications.
In many communication systems, ML decoding reduces
to an integer leastsquares problem of finding the solution to the system of linear e
quations where the unknown vector is comprised of integers, while the given vector and
coefficient matrix are comprised of real entries. Geometrically, this can be interpreted
as a search for the closest point in a given lattice. The closest point search in a lattice
is well known to be an NPhard problem and thus designers often turn to suboptimal criteria
with computationally efficient solutions. While the former is true for the general integer
leastsquares problems, our main observation has been that in communications applications,
the given vector is not arbitrary. Instead, it is an unknown point in a finite lattice that has
been perturbed by an additive noise vector of known statistical properties. This leads to
a probabilistic, rather than the more traditional deterministic, notion of complexity. Thus,
a natural question to ask is “what are the statistics of the algorithms that solve integer
leastsquares problems?” We answered this question for the socalled sphere decoding algorithm,
for which we found closedform analytic expressions for the expected complexity and the
complexity variance. We showed that for a wide range of SNRs, the expected complexity of sphere
decoding is comparable to the complexity of the best heuristics, and is subcubic. This
suggests that ML detection, previously thought to be intractable, can in fact be implemented
with a complexity similar to that of heuristic methods, but with significant performance
gains – a result with many practical implications. Selected publications:

S. Barik and H. Vikalo,
"Sparsityaware sphere decoding: Algorithms and complexity analysis,"
IEEE Transactions on Signal Processing, vol. 62,
no. 9, May 2014, pp: 22122225.

H. Vikalo and B. Hassibi,
"On sphere decoding algorithm. II.
Generalizations, secondorder statistics, and applications to
communications," IEEE Transactions on Signal
Processing, vol. 53, no. 8, August 2005, pp. 2819  2834.

B. Hassibi and H. Vikalo,
"On sphere decoding algorithm. I.
Expected complexity," IEEE Transactions on
Signal Processing, vol. 53, no. 8, August 2005,
pp. 2806  2818.
