Technical Report Abstracts


TR-00-01

Emerson, E. Allen, John W. Havlicek, and Richard J. Trefler. "Virtual Symmetry Reduction." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-01. January 2000. 18 pages.

We provide a general method for ameliorating state explosion via symmetry reduction in certain asymmetric systems, such as systems with many similar, but not identical, processes. The method applies to systems whose structures (i.e., state transition graphs) have more state symmetries than arc symmetries. We introduce a new notion of "virtual symmetry" that strictly subsumes earlier notions of "rough symmetry" and "near symmetry" [ET99]. Virtual symmetry is the most general condition under which the structure of a system is naturally bisimilar to its quotient by a group of state symmetries. We give several example systems exhibiting virtual symmetry that are not amenable to symmetry reduction by earlier techniques: a one-lane bridge system, where the direction with priority for crossing changes dynamically; an abstract system with an asymmetric communication network; and a system with asymmetric resource sharing motivated by the drinking philosophers problem. These examples show that virtual symmetry reduction applies to a significantly broader class of asymmetric systems than could be handled before.

Download


TR-00-03

Emerson, E. Allen and Vineet Kahlon. "Reducing Model Checking of the Many to the Few." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-03. January 2000. 13 pages.

Systems with an arbitrary number of homogeneous processes occur in many applications. The Parametrized Model Checking Problem (PMCP) is to determine whether a temporal property is true for every size instance of the system. Unfortunately, it is undecidable in general. We are able to establish, nonetheless, decidability of the PMCP in quite a broad framework. We consider asynchronous systems comprised of an arbitrary number n of homogeneous copies of a generic process template. The process template is represented as a synchronization skeleton while correctness properties are expressed using Indexed CTL*nX. We reduce model checking for systems of arbitrary size n to model checking for systems of size (up to) a small cutoff size c. This establishes decidability of PMCP as it is only necessary to model check a finite number of relatively small systems. Efficient decidability can be obtained in some cases. The results generalize to systems comprised of multiple heterogeneous classes of processes, where each class is instantiated by many homogeneous copies of the class template (e.g., m readers and n writers).

Download


TR-00-04

Sankaralingam, Karthikeyan, Ramadass Nagarajan, Stephen W. Keckler, and Doug Burger. "SimpleScalar Simulation of the PowerPC Instruction Set Architecture." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-04. February 2001. 20 pages.

In this report, we describe a modification to the SimpleScalar tool set to support the PowerPC ISA. Our work is based on Version 3.0 of the publicly available SimpleScalar tool set. We briefly describe features of the PowerPC ISA relevant to the simulator and provide operating system specific implementation details. We made modifications to the suite of five simulators that model the micro-architecture at different levels of detail. The timing simulator sim-outorder simulates PowerPC binaries on the Register Update Unit (RUU) micro-architecture. The five simulators were tested by simulating the SPEC CPU95 benchmarks to completion. The tool set simulates binaries compiled for 32-bit IBM AIX running on PowerPC.

Download


TR-00-05

Gupta, Shashank, Stephen W. Keckler, and Doug Burger. "Technology Independent Area and Delay Estimations for Microprocessor Building Blocks." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-05. February 2001. 26 pages.

As technologies continue to shrink and chips continue to grow in size, the transistor budget available for design is growing rapidly. Accurate area and delay estimates for basic structures would help designers estimate the area and delay of larger blocks, which in turn could be used in preliminary analyses of new designs. In this work, we identify some of the basic architectural building blocks and provide area and delay estimates for them. Published die photos of various processors were analyzed to produce area estimates, which were converted to technology-independent units using the process parameters of the chip fabrication. Using those estimates, a number of design configurations are analyzed and projections are made about the area and delay of future designs.
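
As a worked example of the kind of conversion involved (the report's exact unit system may differ), one common technology-independent convention is to express area in units of lambda squared, where lambda is taken as half the minimum drawn feature size of the process:

    def area_in_lambda2(area_mm2, feature_size_um):
        """Convert an area measured from a die photo (mm^2) to lambda^2 units.

        Assumes the Mead-Conway convention lambda = feature_size / 2; the
        report may use a different normalization.
        """
        lambda_um = feature_size_um / 2.0
        area_um2 = area_mm2 * 1.0e6          # 1 mm^2 = 10^6 um^2
        return area_um2 / (lambda_um ** 2)

    # Example: a 2 mm^2 structure measured on a die built in a 0.35 um process
    # (numbers are illustrative only).
    print(area_in_lambda2(2.0, 0.35))        # ~6.5e7 lambda^2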

Download


TR-00-06

Hrishikesh, M.S., Doug Burger, and Stephen W. Keckler. "Impact of Technology Scaling on Instruction Execution Throughput." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-06. 2000. 44 pages.

Future technologies will enable faster and more numerous transistors on-chip. However, poor wire scaling as semiconductor devices shrink will increase on-chip communication delays. In this report, we explore two methods by which processor pipelines of the future may be designed -- deep pipelines with large structure capacities and short pipelines with small structure capacities. We perform our study for fifteen different clocks and across seven technologies. We show that for both design methods, optimal performance is obtained when the amount of useful logic per pipeline stage corresponds to a delay of 6 fan-out-of-four (FO4) inverter delays. We also study the effect of the size and latency of critical microarchitectural structures on performance, measured in instructions per cycle (IPC). We quantify the effect of structure latencies and show that the access penalties of the level-1 cache, the branch predictor, and the instruction window have the largest effect on IPC. In addition, we quantify the effect of functional unit execution latencies on IPC.
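
To make the FO4 metric concrete (with made-up numbers, not values from the report): if one fan-out-of-four inverter delay is assumed to be 90 ps in some process, and each stage spends 6 FO4 on useful logic plus an assumed 2 FO4 of latch and clock-skew overhead, the implied cycle time and clock frequency follow directly:

    # Illustrative arithmetic only; the FO4 delay and overhead are assumptions,
    # not data from the report.
    fo4_delay_ps = 90.0          # assumed delay of one fan-out-of-four inverter
    useful_logic_fo4 = 6         # useful logic per stage (the report's optimum)
    overhead_fo4 = 2             # assumed latch + skew overhead per stage

    cycle_time_ps = (useful_logic_fo4 + overhead_fo4) * fo4_delay_ps
    clock_ghz = 1000.0 / cycle_time_ps
    print(cycle_time_ps, clock_ghz)   # 720.0 ps -> ~1.39 GHz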

Download


TR-00-07

Berger, Emery, Kathryn McKinley, Robert Blumofe, and Paul Wilson. "Hoard: A Scalable Memory Allocator for Multithreaded Applications." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-07. April 2000. 11 pages.

Parallel, multithreaded programs such as web servers, database managers, news servers, and scientific applications are becoming increasingly prevalent. For these C and C++ applications, the memory allocator is often a bottleneck that severely limits program performance and scalability on multiprocessor systems. Previous allocators suffer from problems that include poor performance and scalability, and heap organizations that introduce false sharing. Worse, many allocators exhibit a "blowup" in memory consumption when confronted with a producer-consumer pattern of object allocation and freeing. This blowup can increase memory consumption by a factor of P (the number of processors) or lead to unbounded memory consumption. Such pathological behavior can cause premature program termination by exhausting all available swap space. This paper introduces Hoard, a fast, highly scalable allocator that avoids false sharing and blowup. Hoard is the first allocator to simultaneously solve the above problems. Hoard combines one global heap and P per-processor heaps with a novel discipline that provably bounds blowup and has near zero synchronization costs in the common case. Our results on eleven programs demonstrate that Hoard yields low average fragmentation and improves overall program performance over the standard Solaris allocator by up to a factor of 60 on 14 processors, and up to a factor of 18 over the next best allocator we tested.
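
A minimal sketch of the heap organization described above (per-processor heaps backed by one global heap), with an arbitrary threshold standing in for Hoard's actual emptiness invariant; all names and constants here are illustrative, not Hoard's implementation:

    import collections

    SUPERBLOCK_SIZE = 8          # bytes per superblock (illustrative)
    RETURN_THRESHOLD = 2         # free superblocks kept locally before spilling

    class ToyHoard:
        """Sketch of a one-global-heap + P per-processor-heaps organization.

        This is not Hoard's algorithm; it only illustrates how returning free
        memory to a global heap keeps a producer-consumer pattern from growing
        every per-processor heap without bound.
        """
        def __init__(self, num_processors):
            self.global_free = collections.deque()                # free superblocks
            self.local_free = [collections.deque() for _ in range(num_processors)]

        def allocate_superblock(self, cpu):
            if self.local_free[cpu]:                  # reuse local memory first
                return self.local_free[cpu].pop()
            if self.global_free:                      # then the global heap
                return self.global_free.pop()
            return bytearray(SUPERBLOCK_SIZE)         # finally, ask the OS

        def free_superblock(self, cpu, block):
            self.local_free[cpu].append(block)
            # Spill excess free memory to the global heap.
            while len(self.local_free[cpu]) > RETURN_THRESHOLD:
                self.global_free.append(self.local_free[cpu].popleft())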

Download


TR-00-09

Yang, Y. Richard and Simon S. Lam. "General AIMD Congestion Control." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-09. May 2000. 35 pages.

Instead of the increase-by-one, decrease-to-half strategy used in TCP Reno for congestion window adjustment, we consider the general case in which the increase value and decrease ratio are parameters. That is, in the congestion avoidance state, the window size is increased by "alpha" per window of packets acknowledged, and it is decreased to "beta" of the current value when there is a congestion indication. We refer to this window adjustment strategy as general additive increase multiplicative decrease (GAIMD). We present the (mean) sending rate of a GAIMD flow as a function of "alpha", "beta", loss rate, mean round-trip time, mean timeout value, and the number of packets acknowledged by each ACK. We conducted extensive experiments to validate this sending rate formula. We found the formula to be quite accurate for a loss rate of up to 20%. We also present in this paper a simple relationship between "alpha" and "beta" for a GAIMD flow to be TCP-friendly, that is, for the GAIMD flow to have approximately the same sending rate as a TCP flow under the same path conditions. We present results from simulations in which TCP-friendly GAIMD flows compete for bandwidth with TCP Reno flows and with TCP SACK flows, on a DropTail link as well as on a RED link. We found that the GAIMD flows were highly TCP-friendly. Furthermore, these GAIMD flows have reduced rate fluctuations compared to TCP flows.
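
The window adjustment rule described above is easy to state in code. The sketch below uses the abstract's parameters directly; the TCP-friendly pairing of alpha and beta shown in the comment is the commonly cited form alpha = 4(1 - beta^2)/3 and should be treated as illustrative rather than as the report's exact derivation:

    def gaimd_update(cwnd, alpha, beta, congestion_indicated):
        """One GAIMD adjustment step.

        Increase by alpha per window of acknowledged packets; on a congestion
        indication, decrease the window to beta times its current value.
        (TCP Reno corresponds to alpha = 1, beta = 0.5.)
        """
        if congestion_indicated:
            return beta * cwnd
        return cwnd + alpha

    # One commonly cited TCP-friendly pairing (illustrative):
    # alpha = 4 * (1 - beta**2) / 3, e.g. beta = 0.875 gives alpha ~= 0.31,
    # a much smoother adjustment than Reno's (1, 0.5).
    beta = 0.875
    alpha = 4 * (1 - beta ** 2) / 3
    print(round(alpha, 3))   # 0.312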

Download


TR-00-10

Yang, Y. Richard, Min Sik Kim, and Simon S. Lam. "Optimal Partitioning of Multicast Receivers." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-10. May 2000. 12 pages.

Multicast sessions may have a large number of receivers with heterogeneous reception capacities. To accommodate this heterogeneity, various multi-rate schemes, based upon the use of layering or replication, have been proposed. We consider in this paper the optimal partitioning of receivers into groups for multi-rate schemes. For a general class of utility functions, we formulate the partitioning problem as an optimization problem to maximize the sum of receiver utilities. We present an efficient dynamic programming algorithm to solve the partitioning problem, and prove that the solution it finds is optimal. We also show that the majority of the benefit of a multi-rate scheme can be gained by using a small number of groups (or layers), say 4 to 5. To illustrate our solution approach, we apply it to the case where receiver capacities are determined by multi-rate max-min fair rates. A complete protocol for receiver rates computation, rates collection, optimal receiver partitioning, and receiver adaptation is presented. We then compare our approach with other multi-rate approaches as well as a single-rate approach. Experimental results show that our approach provides substantial performance improvements.
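
A minimal dynamic-programming sketch of the partitioning step, under a simplified model in which receivers with sorted capacities are grouped into contiguous segments and each group is served at the smallest capacity in the segment; the utility function and this grouping model are assumptions for illustration, not the report's exact formulation or its efficient algorithm:

    import math

    def partition_receivers(capacities, max_groups, utility):
        """Maximize the sum of receiver utilities using at most max_groups groups.

        capacities        : receiver reception capacities (any order)
        utility(rate, cap): utility of a receiver with capacity cap served at rate
        Assumes receivers are grouped into contiguous segments of the sorted
        capacities and that a group is served at its smallest member's capacity.
        """
        caps = sorted(capacities)
        n = len(caps)
        NEG = float("-inf")
        # best[g][i] = best total utility of the first i receivers using g groups
        best = [[NEG] * (n + 1) for _ in range(max_groups + 1)]
        best[0][0] = 0.0
        for g in range(1, max_groups + 1):
            best[g][0] = 0.0
            for i in range(1, n + 1):
                for j in range(i):                 # last group covers receivers j..i-1
                    rate = caps[j]                 # smallest capacity in that group
                    gain = sum(utility(rate, caps[k]) for k in range(j, i))
                    best[g][i] = max(best[g][i], best[g - 1][j] + gain)
        return best[max_groups][n]

    # Example: 4 receivers, 2 groups, logarithmic utility of the received rate.
    print(partition_receivers([1, 2, 4, 8], 2, lambda rate, cap: math.log(1 + rate)))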

Download


TR-00-11

Cardone, Richard, Don Batory, and Calvin Lin. "Java Layers: Extending Java to Support Component-Based Programming." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-11. June 2000. 17 pages.

Java Layers extends the Java programming language by implementing a software component model based on layer composition. Each layer implements a single design feature and may contain code that crosscuts multiple classes. Layer composition enables large software applications to be constructed in a more modular way, and with a higher level of semantic checking, than is typically achieved using current programming techniques such as object-oriented frameworks. This paper describes the Java Layers language extension.

Download


TR-00-13

Yang, Y. Richard, Min Sik Kim, Xincheng Zhang, and Simon S. Lam. "Two Problems of TCP AIMD Congestion Control." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-13. September 2000. 6 pages.

The AIMD (additive increase, multiplicative decrease) algorithm is used in many congestion control protocols, including TCP in the Internet. In this paper, we compare AIMD and MAIMD (multiplicative-additive increase, multiplicative decrease). We find that the convergence speeds to fair states of AIMD and MAIMD are close to each other. However, MAIMD has some advantages; for example, it can make use of available network bandwidth much faster than AIMD. We also investigate AIMD behavior under a more realistic asynchronous system model. We find that, under this model, an AIMD system can have more than one attractor, which can be another contributor to the fairness problem of TCP.

Download


TR-00-14

Yang, Y. Richard and Simon S. Lam. "Analysis of Binomial Congestion Control." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-14. September 2000. 16 pages.

Binomial congestion control was proposed by Bansal and Balakrishnan in [2]. However, the sending rate derivation in [2] is greatly simplified and does not consider the effect of timeouts. The purpose of this paper is to analyze the sending rate of the binomial congestion window adjustment policy, considering both triple-duplicate loss indications and timeout loss indications. We also consider the selection of "alpha" and "beta" such that the IIAD and SQRT congestion control strategies [2] are TCP-friendly. This paper suggests that the authors of Binomial should test their protocol under higher-loss scenarios. The remainder of this paper is organized as follows. In Section 2, we describe Binomial congestion control and state the analysis assumptions. The details of the derivations are given in the Appendix. In Section 3, we use the sending rate formula to derive conditions under which a Binomial flow is TCP-friendly.
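
For reference, the binomial window adjustment family analyzed here can be written as the following update rules; the concrete (k, l) pairs for IIAD and SQRT are given as commonly stated and should be checked against [2]:

    def binomial_update(cwnd, alpha, beta, k, l, loss_indicated):
        """Binomial congestion window adjustment (sketch of the rules from [2]).

        Increase: w <- w + alpha / w**k   per round-trip time without loss.
        Decrease: w <- w - beta * w**l    on a loss indication.
        (k, l) = (0, 1) gives TCP-style AIMD; IIAD is commonly stated as
        (k, l) = (1, 0) and SQRT as (k, l) = (0.5, 0.5).
        """
        if loss_indicated:
            return max(cwnd - beta * cwnd ** l, 1.0)   # floor added for the sketch
        return cwnd + alpha / cwnd ** k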

Download


TR-00-15

Gouda, Mohamed, Chin-Tser Huang, and Eric Li. "Anti-Replay Window Protocols for Secure IP." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-15. July 2000. 6 pages.

The anti-replay window protocol is used to secure IP against an adversary that can insert (possibly replayed) messages in the message stream from a source computer to a destination computer in the Internet. In this paper, we verify the correctness of this important protocol using standard methods (i.e. auxiliary variables, annotation, and invariants). We show that despite the adversary, the protocol delivers each message at most once, and discards a message only if another copy of this message has already been delivered, or the message has suffered a reorder of degree w or more, where w is the window size. We then develop two variations of this protocol: one variation uses two windows of size w/2 each, and the other uses w windows of size one each. We use the same standard methods to show that each of these protocols delivers every message at most once, and discards a message only if another copy of this message has already been delivered, or the message has suffered a reorder of degree w+d or more, where d is the sum of current distances between successive windows in the protocol. These two protocols are shown to be more effective than the original protocol.
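
A minimal sketch of the single-window protocol's receiver-side check, keeping track of the last w sequence numbers; variable names are illustrative, and the verified protocol in the report also covers the authentication fields this sketch omits:

    class AntiReplayWindow:
        """Sliding anti-replay window of size w (receiver side, sketch).

        A message is delivered at most once; it is discarded only if it was
        already delivered or if it arrives reordered by degree w or more.
        """
        def __init__(self, w):
            self.w = w
            self.highest = 0                 # highest sequence number accepted
            self.seen = set()                # accepted numbers within the window

        def accept(self, seq):
            if seq > self.highest:           # new message: slide the window forward
                self.seen.add(seq)
                self.highest = seq
                self.seen = {s for s in self.seen if s > self.highest - self.w}
                return True
            if seq <= self.highest - self.w: # reordered by degree w or more: discard
                return False
            if seq in self.seen:             # replay of a delivered message: discard
                return False
            self.seen.add(seq)               # late but within the window: deliver
            return True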

Download


TR-00-17

Warshaw, Lane and Daniel P. Miranker. "Application Semantics for Active Monotonic Database Applications." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-17. July 2000. 27 pages.

We refine an active-database application taxonomy, proposed by Stonebraker, to include monotonic log monitoring applications (MLM). MLMs are a subclass of hard rule systems where triggering events are restricted to monotonic relations. We develop a formal semantic model for the MLM class. We then prove the correctness of concurrency schemes for applications within the model. Our results demonstrate that only minimal coupling mode support is necessary for the database integration of hard rule systems obeying the MLM restrictions.

Download


TR-00-18

Gorinsky, Sergey and Harrick Vin. "Additive Increase Appears Inferior." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-18. October 2000. 15 pages.

Feedback-based adjustment of load is a common mechanism for resource allocation in computer networks. This paper disputes the popular beliefs that the additive-increase multiplicative-decrease adjustment policy is optimal or even necessary for convergence to fair resource sharing. We demonstrate that, in the classic synchronous model, additive increase does not guarantee the quickest convergence to fairness. Moreover, not only fairness but also efficiency converges very slowly under additive increase. For an asynchronous model, we show that the additive-increase multiplicative-decrease algorithm fails to converge to optimal fairness. We observe that the TCP congestion control algorithm suffers from the problems detected by our analysis and is unfair.

Download


TR-00-19

Gouda, Mohamed G., E. N. Elnozahy, C.-T. Huang, and T. M. McGuire. "Hop Integrity in Computer Networks." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-19. August 2000. 28 pages.

A computer network is said to provide hop integrity iff, whenever any router p in the network receives a message m supposedly from an adjacent router q, p can check that m was indeed sent by q, was not modified after it was sent, and is not a replay of an old message sent from q to p. In this paper, we describe three protocols that can be added to the routers in a computer network so that the network can provide hop integrity. These three protocols are a secret exchange protocol, a weak integrity protocol, and a strong integrity protocol. All three protocols are stateless, require small overhead, and do not constrain the network protocol in the routers in any way.
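
As a rough illustration of the per-hop check (not the report's actual protocols, which include a soft-state secret exchange and separate weak and strong variants), a router can verify that a message came from its neighbor and was not modified by checking a keyed digest computed with the pairwise secret; HMAC-SHA256 is used here only as a stand-in keyed digest:

    import hmac, hashlib

    def tag_message(shared_secret: bytes, payload: bytes, seq: int) -> bytes:
        # Sender q: attach a keyed digest over the payload and a sequence number
        # (the sequence number is what lets the receiver reject replays).
        return hmac.new(shared_secret, seq.to_bytes(8, "big") + payload,
                        hashlib.sha256).digest()

    def verify_hop(shared_secret: bytes, payload: bytes, seq: int,
                   tag: bytes, last_seq_seen: int) -> bool:
        # Receiver p: accept only if the digest matches (m really came from q and
        # was not modified) and the sequence number is fresh (not a replay).
        expected = tag_message(shared_secret, payload, seq)
        return hmac.compare_digest(expected, tag) and seq > last_seq_seen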

Download


TR-00-20

Dahlin, Mike, Bharat Chandra, Lei Gao, Amjad-Ali Khoja, Amol Nayate, Asim Razzaq, and Anil Sewani. "Using Mobile Extensions to Support Disconnected Services." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-20. August 2000. 15 pages.

This paper examines the design and implementation of mobile extensions, a distributed operating system abstraction for supporting disconnected access to dynamic distributed services. The goal of mobile extensions is to make it easy for service providers to deploy services that make use of caching, hoarding, asynchronous messaging, and application-level adaptation to cope with mobility, network failures, and server failures. We identify resource management as a crucial problem in this environment, develop a novel popularity-based resource management policy, and demonstrate that under web service workloads it allocates resources nearly as efficiently as traditional schedulers, while under workloads with more aggressive resource users it provides much stronger performance isolation. Overall, we find that for the four web service workloads we study, mobile extensions can reduce failures by a factor of 5.9 to 16.7 for those applications able to provide tolerable service when disconnected.

Download


TR-00-21

Gunter, Brian C., Wesley Reiley, and Robert A. van de Geijn. "Implementation of Out-of-Core Cholesky and QR Factorizations with POOCLAPACK." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-21. September 2000. 11 pages.

In this paper, a parallel implementation of out-of-core Cholesky factorization is used to introduce the Parallel Out-of-Core Linear Algebra Package (POOCLAPACK), a flexible infrastructure for the parallel implementation of out-of-core linear algebra operations. POOCLAPACK builds on the Parallel Linear Algebra Package (PLAPACK) for in-core parallel dense linear algebra computation. Despite the extreme simplicity of POOCLAPACK, the out-of-core Cholesky factorization implementation is shown to achieve in excess of 80% of peak performance on a 64-node configuration of the Cray T3E-600. The insights gained from examining the Cholesky factorization have been applied to the much more difficult and important QR factorization operation. Preliminary results for a parallel implementation of the resulting OOC QR factorization algorithm are included.
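
For orientation, the tile-by-tile structure that makes an out-of-core (and parallel) formulation natural is visible even in a plain sequential blocked Cholesky factorization; the sketch below is in-core NumPy and omits all of the PLAPACK/POOCLAPACK distribution and disk-staging machinery:

    import numpy as np

    def blocked_cholesky(A, nb):
        """Lower-triangular blocked Cholesky, A = L L^T (sequential, in-core sketch).

        Each iteration touches only a diagonal block, the panel below it, and the
        trailing matrix -- the block traversal an out-of-core version stages
        between disk and memory.
        """
        n = A.shape[0]
        L = np.tril(A).astype(float)
        for k in range(0, n, nb):
            kb = min(nb, n - k)
            L[k:k+kb, k:k+kb] = np.linalg.cholesky(L[k:k+kb, k:k+kb])
            if k + kb < n:
                Lkk = L[k:k+kb, k:k+kb]
                # Panel: solve X * Lkk^T = A_panel for X.
                L[k+kb:, k:k+kb] = np.linalg.solve(Lkk, L[k+kb:, k:k+kb].T).T
                # Trailing update (only the lower triangle is used afterwards).
                P = L[k+kb:, k:k+kb]
                L[k+kb:, k+kb:] -= P @ P.T
        return np.tril(L)

    # Quick check on a random symmetric positive definite matrix.
    M = np.random.rand(8, 8)
    A = M @ M.T + 8 * np.eye(8)
    L = blocked_cholesky(A, 3)
    print(np.allclose(L @ L.T, A))   # True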

Download


TR-00-22

Li, Xiaozhou, Y. Richard Yang, Mohamed Gouda, and Simon S. Lam. "Batch Updates of Key Trees." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-22. September 2000. 18 pages.

Earlier work introduced the concept of key trees for secure group communications and discussed how to update a key tree in response to a single rekey request (a join or a leave). In a real application, however, rekey requests are likely to be processed in batches rather than in real time. In this paper we consider the case where there is an equal number of joins and leaves in a batch. In this case, we can replace a leave by a join at the same location in the key tree; we call this an update. We describe a method, called the rekey subtree method, to efficiently process batch updates. The method identifies the keys that need to be updated and updates those keys only once. The method reduces server cost substantially compared to the naive approach, which processes updates one after another. We also describe how to modify the rekey subtree method when the numbers of joins and leaves are unequal. To analyze the rekey subtree method's performance, we derive exact expressions for the server's encryption cost in the best, worst, and average cases. We also derive an exact expression for a user's average decryption cost. If an application only wants to minimize the server's cost, it can use the server cost expressions to decide what degree the key tree should use. Our analysis and experiments show that 4 is usually the optimal degree when the number of updates is below some threshold (about 1/4 of the number of users); when the number of updates is above the threshold, a key star (a key tree whose degree equals the number of users) is optimal. If the application has users of limited capacity, it should choose a larger degree to reduce the user's work, at the cost of increasing the server's.
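
A small sketch of the key-identification step, using an implicit array-based key tree of degree d (the indexing scheme is illustrative, not the report's): the keys that must be refreshed are exactly those on the paths from the updated leaves to the root, and collecting them in a set ensures each is updated only once:

    def keys_to_update(updated_leaf_nodes, degree):
        """Key-tree nodes touched by a batch of updates, each listed once.

        The key tree is stored implicitly as an array: node 0 is the root (group
        key) and the parent of node i > 0 is (i - 1) // degree.  Callers pass the
        node indices of the leaves whose users joined or left.
        """
        touched = set()
        for leaf in updated_leaf_nodes:
            node = leaf
            while node != 0:
                node = (node - 1) // degree      # move to the parent key
                touched.add(node)                # set membership => updated once
        return touched

    # Example: degree-4 tree, updates at leaf nodes 5 and 7 share parent key 1,
    # which is refreshed only once along with the root (group) key 0.
    print(sorted(keys_to_update([5, 7], 4)))     # [0, 1]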

Download


TR-00-23

Yang, Y. Richard, Min Sik Kim, and Simon S. Lam. "Transient Behaviors of TCP-friendly Congestion Control Protocols." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-23. September 2000. 39 pages.

We investigate the fairness, smoothness, responsiveness, and aggressiveness of TCP and three representative TCP-friendly congestion control protocols: GAIMD, TFRC, and TEAR. The properties are evaluated both analytically and experimentally by studying protocol responses to three network environment changes. The first environment change is the inherent fluctuations in a stationary network environment. We consider three types of sending rate variations: smoothness, short-term fairness, and long-term fairness. For a stationary environment, we observe that smoothness and fairness are positively correlated. We derive an analytical expression for the sending rate coefficient of variation for each of the four protocols. These analytical results match well with experimental results. The other two environment changes we study are a step increase of network congestion and a step increase of available bandwidth. Protocol responses to these changes reflect their responsiveness and aggressiveness, respectively.

Download


TR-00-24

Yang, Y. Richard and Simon S. Lam. "A Secure Group Key Management Communication Lower Bound." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-24. September 2000. 9 pages.

We discuss in this paper a lower bound on communication cost for a secure group key management system. To model the rekeying process, we introduce the concept of rekey encryption graphs. Using rekey encryption graphs, we show that there exists a sequence of join and leave requests such that the amortized per-request communication cost is Omega(ln n), where n is the number of users who have ever joined the secure group.

Download


TR-00-25

Cardone, Richard and Calvin Lin. "Static Virtual Types in Java Layers." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-25. September 2000. 13 pages.

This specification describes how a simple, static form of virtual typing can be implemented in a parametrically polymorphic version of Java. We discuss the need for virtual types and the tradeoffs of integrating virtual types into a statically typed, object-oriented language. Though our discussion is couched in terms of Java Layers, the design can be used in more general settings.

Download


TR-00-26

. "REPORT NEVER PUBLISHED." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-26. .


TR-00-27

. "REPORT NEVER PUBLISHED." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-27. .


TR-00-28

Gunnels, John A. and Robert A. van de Geijn. "Formal Linear Algebra Methods Environment (FLAME) Overview." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-28. November 2000. 25 pages.

Since the advent of high-performance distributed-memory parallel computing, the need for intelligible code has become ever greater. The development and maintenance of libraries for these architectures is simply too complex to be amenable to conventional approaches to coding, and attempting to employ traditional methodology has led to the production of an abundance of inefficient, anfractuous code that is difficult to maintain and nigh-impossible to upgrade. Having struggled with these issues for more than a decade, we have arrived at a conclusion that is somewhat surprising to us: the answer is to apply formal methods from Computer Science to the development of high-performance linear algebra libraries. The resulting approach has consistently produced aesthetically pleasing, coherent code that greatly facilitates performance analysis, intelligent modularity, and the enforcement of program correctness via assertions. Since the technique is completely language-independent, it lends itself equally well to a wide spectrum of programming languages (and paradigms), ranging from C and FORTRAN to C++ and Java to graphical programming languages like those used in LabVIEW. In this paper, we illustrate our observations by looking at the development of the Formal Linear Algebra Methods Environment (FLAME) for implementing linear algebra algorithms on sequential architectures. This environment demonstrates that lessons learned in the distributed-memory world can guide us toward better approaches to coding even in the sequential world.

Download


TR-00-29

Kaufmann, Matt and J Strother Moore. "ACL2 Workshop 2000 Proceedings, Part A." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-29. November 2000. 125 pages.

ACL2 stands for "A Computational Logic for Applicative Common Lisp." It is a programming language, a formal mathematical logic, and a mechanical theorem proving system. ACL2 is used primarily in the formal verification of computer hardware and software, but also finds use as a programming environment and in the checking of proofs from more traditional areas of mathematics. See the ACL2 home page (http://www.cs.utexas.edu/users/moore/acl2) for more details on the system. This technical report contains the proceedings of the second ACL2 Workshop. See http://www.cs.utexas.edu/users/moore/acl2/workshop-2000 for more information. Supporting scripts for many of the papers may be found at this site.

Download


TR-00-30

Manolios, Panagiotis. "Enumerating the Strings of a Regular Expression." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-30. March 2001. 11 pages.

We present a Haskell solution to the problem of enumerating the strings of a regular expression with respect to a regular preorder, a term we soon define. A version of this problem was considered in a note by Jayadev Misra. We have generalized the problem. The use of Haskell was suggested by Edsger W. Dijkstra and our solution makes critical use of Haskell's lazy evaluation.

Download


TR-00-31

Gorinsky, Sergey, K. K. Ramakrishnan, and Harrick Vin. "Addressing Heterogeneity and Scalability in Layered Multicast Congestion Control." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-31. March 2001. 12 pages.

In this paper, we design SIM, a protocol that integrates three distinct mechanisms - Selective participation, Intra-group transmission adjustment, and Menu adaptation - to solve the general multicast congestion control problem. We argue that only a solution that includes elements of each mechanism can scale and adapt to heterogeneity in network and receiver characteristics. In our protocol, these mechanisms operate at different time scales and distribute the responsibility of adaptation to different entities in the network. To our knowledge, SIM is the first protocol for layered multicast that adjusts not only the subscription levels of the receivers but also the transmission rates of the layers. We show that SIM is efficient and stable in the presence of heterogeneous receivers and dynamic changes in the bottlenecks and session membership. SIM also outperforms RLM in terms of stability and efficiency.

Download


TR-00-32

Gorinsky, Sergey and Harrick Vin. "Analysis of Binary Adjustment Algorithms in Fair Heterogeneous Networks." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-32. March 2001. 20 pages.

Many congestion control schemes rely on binary notifications of congestion from the network: on detecting network congestion, they reduce transmission rates; and on receiving a signal indicating no congestion, they increase transmission rates. For conventional networks with First-In First-Out (FIFO) scheduling of packets, the effectiveness of such algorithms has been evaluated with respect to their responsiveness, smoothness, and fairness properties. Recently, it has been argued that it is possible to design high-speed network routers that can guarantee fair allocation of link capacities and buffers. In networks that employ such routers, fairness is ensured by the routers, thereby making responsiveness and smoothness the two main criteria for evaluating and selecting a binary adjustment algorithm. In this paper, we consider binary adjustment algorithms with four increase policies proposed in the literature: multiplicative increase (MI), additive increase (AI), inverse-square-root increase (ISI), and inverse increase (II). We analyze these algorithms in fair heterogeneous networks. We find that the multiplicative increase policy, which is considered inappropriate for conventional networks due to its poor fairness properties, provides superior performance over the other policies in fair networks.
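
One natural reading of the four increase policies, written as per-feedback rate updates (the exact constants and normalization used in the report may differ), is sketched below:

    import math

    def increase(rate, policy, a=1.0):
        """Apply one no-congestion increase step under the named policy (sketch)."""
        if policy == "MI":    # multiplicative increase
            return rate * (1.0 + a)
        if policy == "AI":    # additive increase
            return rate + a
        if policy == "ISI":   # inverse-square-root increase
            return rate + a / math.sqrt(rate)
        if policy == "II":    # inverse increase
            return rate + a / rate
        raise ValueError(policy)

    def decrease(rate, b=0.5):
        """Common multiplicative decrease on a congestion indication."""
        return rate * b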

Download


TR-00-33

Agaram, Kartik K., Stephen W. Keckler, and Doug Burger. "Characterizing the SPHINX Speech Recognition System." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-33. January 2001. 11 pages.

This paper examines SPHINX, a system for speaker independent, large vocabulary, continuous speech recognition. We find that SPHINX in particular, and speech recognition systems in general, display behavior that is substantially different from the compute-bound benchmarks that have traditionally driven popular computer system design. SPHINX applies considerable load on the memory hierarchy, with a large primary working set and poor locality. In this paper we quantify these results, and correlate them with the source code, showing that they are a consequence of the algorithms used, rather than specific implementation details of the processor, or the way the application is coded. The unprecedented growth of speech recognition applications makes it imperative that system designers lend them due consideration when designing the next generation of computer systems.

Download


TR-00-34

Gunnels, John A., Daniel S. Katz, Enrique S. Quintana-Orti, and Robert A. van de Geijn. "Fault-Tolerant High-Performance Matrix Multiplication." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-34. December 2000. 16 pages.

In this paper, we extend the theory of algorithmic fault-tolerant matrix-matrix multiplication, C = AB, in a number of ways. First, we propose low-overhead methods for detecting errors introduced not only in C but also in A and/or B. Second, we theoretically show that the methods will detect all errors as long as only one entry is corrupted. Third, we propose a low-overhead rollback approach to correct errors once detected. Finally, we give a high-performance implementation of matrix-matrix multiplication that incorporates these error detection and correction methods. Empirical results demonstrate that the methods work well in practice with an acceptable level of overhead relative to high-performance implementations without fault-tolerance.
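
A minimal sketch of the checksum idea underlying algorithmic fault tolerance for C = AB (the classic row/column checksum construction; the report's scheme additionally protects A and B themselves and adds rollback-based correction):

    import numpy as np

    def checksum_product(A, B):
        """Multiply with an extra checksum row (from A) and checksum column (from B)."""
        Af = np.vstack([A, A.sum(axis=0, keepdims=True)])   # append column sums as a row
        Bf = np.hstack([B, B.sum(axis=1, keepdims=True)])   # append row sums as a column
        return Af @ Bf                                       # (m+1) x (n+1) result

    def locate_single_error(Cf, tol=1e-8):
        """Return (row, col) of a single corrupted entry of C, or None if consistent."""
        C = Cf[:-1, :-1]
        row_ok = np.isclose(C.sum(axis=1), Cf[:-1, -1], atol=tol)
        col_ok = np.isclose(C.sum(axis=0), Cf[-1, :-1], atol=tol)
        if row_ok.all() and col_ok.all():
            return None
        return int(np.argmin(row_ok)), int(np.argmin(col_ok))

    A, B = np.random.rand(4, 3), np.random.rand(3, 5)
    Cf = checksum_product(A, B)
    Cf[2, 1] += 1.0                        # inject a single error into C
    print(locate_single_error(Cf))         # (2, 1)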

Download


TR-00-35

Gao, Youxin. "Interconnect Optimization in Deep Sub-micron Design under the Transmission Line Model." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-35. March 2001. 127 pages.

As VLSI technology has been scaled down to 0.18 micron in recent years and is expected to be scaled down to 0.05 micron in the near future, interconnect delay becomes an important factor in achieving high performance. In deep sub-micron design, interconnect delay is shown to be 10 to a few hundred times larger than the intrinsic gate delay for a global interconnect, and thus dominates the circuit delay. To reduce interconnect delay, wire-sizing and buffer insertion/sizing are two effective techniques. One of the approaches to wire-sizing is continuous wire-sizing. In continuous wire-sizing, the shape of a wire is described by a continuous function, and the objective is to find a shape function which minimizes delay or minimizes area subject to a delay bound. In the first part of this dissertation, we present some continuous wire-sizing results under the Elmore delay model. In the second part of this dissertation, we present some wire-sizing results under the transmission line model. In the third part of this dissertation, we present a graph-based algorithm for optimal buffer insertion under accurate delay models.

Download


TR-00-36

Adams, William Edward. "Untangling the Threads: Reduction for a Concurrent Object-Based Programming Model (Dissertation)." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-36. August 2000. 292 pages.

Reduction is a technique for simplifying reasoning about systems where a set of sequential executions, which we call "threads", are executed concurrently. In a concurrent execution, steps from different threads are interleaved. A reduction theorem shows that a concurrent execution is equivalent to an execution in which all the steps of a given thread appear contiguously. These steps can be replaced by a single step representing the complete execution of the thread. Applying reduction to each thread in turn, we reduce a concurrent execution to a sequence of atomic steps, one for each thread. This is a sequential execution, where threads execute one at a time. Reasoning about the sequential reduction is significantly simpler than reasoning about the original execution. In the sequential execution, we do not need to consider the interactions of steps from different threads. We describe a model for concurrent systems, called "Seuss". In this model, a program is a set of independently executing boxes, which communicate using a form of remote procedure call. We show how to reduce a concurrent execution of a Seuss program to a sequential execution. Since every concurrent execution is equivalent to a sequential execution, we can understand all possible executions of a Seuss program by understanding just its sequential executions. We show three main results. One gives sufficient conditions to guarantee that all threads in an execution terminate. The other two concern reduction of executions in which all threads terminate. We express the reduction results relative to restrictions on which pairs of threads can run concurrently. The first reduction theorem shows a restriction on concurrency that guarantees that every execution can be reduced to a sequential execution. The second reduction theorem gives a stronger restriction on concurrency (so less concurrency is allowed in an execution), but, in return, gives a reduced sequential execution that has stronger fairness properties (and thus better progress properties) than we get with the first theorem. To show these results, we use an operational semantics for a simple Seuss language. The semantics is such that our results apply to any language implementing the Seuss model.

Download


TR-00-37

Hewett, Micheal. "Computational Perceptual Attention." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-37. May 10, 2002. 80 pages.

This dissertation describes CPA, a general-purpose mechanism for expressing and implementing attention policies that control the allocation of resources among sensory processing tasks in a robot or other advanced intelligent system. A wide variety of attention policies can be expressed in this mechanism, which also supports soft real-time constraints on perceptual processing. Intelligent systems can become inundated with data, resulting in perceptual overload and a consequent inability to formulate a timely or appropriate response. Perceptual overload is often modulated by a perceptual attention mechanism that filters and prioritizes incoming data. Most existing attention mechanisms are tailored to the specific task the system is performing. A general-purpose attention mechanism must have a task-independent interface for controlling attention; support a heterogeneous set of sensors; support heterogeneous methods for processing sensor data; and support real-time throughput constraints. The CPA is a general-purpose attention mechanism that supports multimodal perceptual attention. Using it, an intelligent system can enact and control a variety of attention policies for any type or combination of sensor or sensor data. An intelligent system dynamically creates multiple heterogeneous perception tasks in accord with behavioral goals and installs them in the CPA. The CPA supports two general categories of perception tasks: detectors, which do not retain information between perception cycles; and trackers, which do. Perception tasks are prioritized using an attention policy and are executed using a priority-based scheduler. A wide range of attention policies can be expressed in this mechanism, including policies that dynamically modify perception priorities, policies in which emergency input overrides normal perception processing, and policies that dynamically change the level of resistance to perceptual distractions. Results show that perception intervals as short as 100 milliseconds can be achieved with a five-sensor robot under a variety of attention policies. Analysis of the system's performance under perceptual load shows that qualitatively different attention policies can be realized in the attention mechanism. We show that intelligent systems can use the CPA to implement the four primary characteristics of human perceptual attention: selective attention, sustained attention, divided attention, and top-down control.

Download


TR-00-38

Tarafdar, Ashis. "Software Fault Tolerance in Distributed Systems Using Controlled Re-Execution." The University of Texas at Austin, Department of Computer Sciences. Technical Report TR-00-38. August 2000. 145 pages.

Distributed applications are particularly vulnerable to synchronization faults. An important approach to tolerating synchronization faults is "rollback recovery", which involves restoring a previous state and re-executing. Existing rollback recovery methods depend on chance and cannot guarantee that synchronization faults do not recur during re-execution. We propose a new rollback recovery method, "controlled re-execution", based on selectively adding synchronizations during re-execution to ensure that synchronization faults do not recur. The controlled re-execution method gives rise to three interesting questions: How do we determine the synchronizations that ensure a safe re-execution? How do we monitor an application to detect faulty global conditions? How well does controlled re-execution perform in practice?

The first part of the dissertation addresses the "predicate control" problem which takes a computation and a global property and adds synchronizations to the computation to maintain the property. We design efficient algorithms to solve the problem for many useful predicates, including disjunctive predicates and various types of mutual exclusion predicates. These predicates correspond to commonly encountered synchronization faults such as races.

The second part of the dissertation investigates the "predicate detection" problem which involves determining whether a global property occurs in a computation. We address the problem for the useful class of conjunctive predicates and in the context of an "extended" model of computation that allows improved predicate detection over the conventional model. We show that, in general, the problem is NP-Complete. However, an efficient solution is demonstrated for the useful cases of "receive-ordered" and "send-ordered" computations. Further, this solution can be used to achieve an improved, though exponential, solution for the general problem.

The third part of the dissertation involves an experimental study of the controlled re-execution method. We evaluate the controlled re-execution method in tolerating race faults and find that the extra tracing costs imposed are within tolerable limits and that it greatly enhances the likelihood of recovery. We conclude that controlled re-execution is an effective and desirable method for tolerating races in long-running, non-interactive distributed applications.

Download


Questions to trcenter@cs.utexas.edu