#Number
TR-PDS-1995-018

#Title
Scheduling Queries With Large Out-of-Core Data Sets

#Author
Aman Sinha and Craig Chase

#Abstract
  We consider the problem of dynamic, preemptive scheduling of queries
  that have a shared data set.  The model consists of (a) a processing
  node with limited physical memory, (b) a disk resident data set, (c)
  Poisson arrival of queries, (d) known probability distributions of
  processing times and (e) fixed, non-zero cost of data fetches from
  the disk.  The objective is to minimize the flow time of the
  queries.  In this paper we assume that the queries may be scheduled
  in any order.  Data is fetched from the disk in units of ``chunks'',
  which are assumed to be fixed size.  Each query may process the data
  chunks in any order.  Since the data is shared, a single data chunk
  may be required by multiple queries.  Hence, an efficient scheduling
  algorithm must ensure that the chunk is read from disk a minimum
  number of times.  It is well known that the shortest expected
  remaining processing time (SERPT) rule can minimize the flow time
  for memory resident tasks.  However, it is unclear how successfully
  SERPT can be adapted to exploit data locality.

  In this paper we show that SERPT can be easily augmented to include
  data prefetching.  Through simulation, we show that
  prefetched-SERPT (PSERPT) achieves very good locality of reference
  and is very effective at minimizing flow time even with very
  fine-grained computation, and a high-degree of inter-query data
  locality.


#Bib
@TechReport{SC95,
  author = "Aman Sinha and Craig Chase",
  title = "Scheduling Queries With Large Out-of-Core Data Sets",
  institution =  "Parallel and Distributed Systems Laboratory, ECE
                  Dept. University of Texas at Austin",
  year =         "1995",
  number =    "TR-PDS-1995-018",
  note =      "submitted to the 1996 SIGMOD conference",
  note =      "available via ftp or WWW at maple.ece.utexas.edu
                  as technical report ECE-PDS-1995-018",
}