#Number TR-PDS-1995-018 #Title Scheduling Queries With Large Out-of-Core Data Sets #Author Aman Sinha and Craig Chase #Abstract We consider the problem of dynamic, preemptive scheduling of queries that have a shared data set. The model consists of (a) a processing node with limited physical memory, (b) a disk resident data set, (c) Poisson arrival of queries, (d) known probability distributions of processing times and (e) fixed, non-zero cost of data fetches from the disk. The objective is to minimize the flow time of the queries. In this paper we assume that the queries may be scheduled in any order. Data is fetched from the disk in units of ``chunks'', which are assumed to be fixed size. Each query may process the data chunks in any order. Since the data is shared, a single data chunk may be required by multiple queries. Hence, an efficient scheduling algorithm must ensure that the chunk is read from disk a minimum number of times. It is well known that the shortest expected remaining processing time (SERPT) rule can minimize the flow time for memory resident tasks. However, it is unclear how successfully SERPT can be adapted to exploit data locality. In this paper we show that SERPT can be easily augmented to include data prefetching. Through simulation, we show that prefetched-SERPT (PSERPT) achieves very good locality of reference and is very effective at minimizing flow time even with very fine-grained computation, and a high-degree of inter-query data locality. #Bib @TechReport{SC95, author = "Aman Sinha and Craig Chase", title = "Scheduling Queries With Large Out-of-Core Data Sets", institution = "Parallel and Distributed Systems Laboratory, ECE Dept. University of Texas at Austin", year = "1995", number = "TR-PDS-1995-018", note = "submitted to the 1996 SIGMOD conference", note = "available via ftp or WWW at maple.ece.utexas.edu as technical report ECE-PDS-1995-018", }