Extending the SDSS Batch Query System to
the National Virtual Observatory Grid
María A. Nieto-Santisteban (1)
William O'Mullane (1)
Jim Gray (2)
Nolan Li (1)
Tamás Budavari (1)
Alex Szalay (1)
Aniruddha R. Thakar (1)
(1) The Johns Hopkins University
(2) Microsoft Research
The Sloan Digital Sky Survey science database is approaching 1 TB in size.
While the vast majority of queries normally execute in seconds or minutes,
this interactive response time can be degraded disproportionately by a small
fraction of queries that take hours or days to run, either because they
require non-index scans of the largest tables or because they request very
large result sets. In response, a job submission and tracking system with
multiple queues has been developed. The transfer of very large result
sets from queries over the network is another serious problem. Statistics
suggested that much of this data transfer is unnecessary; users would prefer
to store results locally in order to allow further cross-matching and filtering.
To allow local analysis, a system was developed that gives users their own
personal database (MYDB) at the portal site. Users may transfer data to their
MYDB, and then perform further analysis before extracting it to their own
machine.
We now intend to extend the MYDB and asynchronous query ideas to multiple NVO
nodes. This implies developing, in a distributed manner, several features
that have been demonstrated for a single node in the SDSS Batch Query System
(CasJobs). The generalization of asynchronous queries necessitates some form of
MYDB storage as well as workflow tracking services on each node and coordination
strategies among nodes.
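
As a rough illustration of the multi-queue job submission and tracking idea described above, the following minimal Python sketch routes each submitted query to a "short" or "long" queue and records its status. It is not the CasJobs implementation: the class and method names, the two queue names, and the use of an estimated run time as the routing criterion are all assumptions made for this example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    job_id: int
    sql: str
    estimated_minutes: float  # hypothetical routing input, not from the paper
    status: str = "queued"


class MultiQueueScheduler:
    """Toy job tracker with one queue for quick queries and one for long ones."""

    def __init__(self, short_limit_minutes=1.0):
        # Assumed threshold: jobs at or below it go to the "short" queue.
        self.short_limit_minutes = short_limit_minutes
        self.queues = {"short": deque(), "long": deque()}
        self.jobs = {}
        self._next_id = 1

    def submit(self, sql, estimated_minutes):
        """Register a job, place it in a queue, and return its id."""
        job = Job(self._next_id, sql, estimated_minutes)
        self._next_id += 1
        if estimated_minutes <= self.short_limit_minutes:
            name = "short"
        else:
            name = "long"
        self.queues[name].append(job)
        self.jobs[job.job_id] = job
        return job.job_id

    def run_next(self, queue_name):
        """Take the oldest job from one queue and mark it finished.

        A real system would execute the SQL here; this sketch only
        updates the tracking state.
        """
        queue = self.queues[queue_name]
        if not queue:
            return None
        job = queue.popleft()
        job.status = "finished"
        return job

    def status(self, job_id):
        """Look up the tracked status of a previously submitted job."""
        return self.jobs[job_id].status
```

Because each queue is drained independently, a long table scan waiting in the "long" queue does not delay a quick lookup in the "short" queue, which is the behavior the separate queues are meant to provide.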