Sloan Digital Sky Survey

Abstracts for ADASS XIII (Strasbourg, 2003) in no particular order


HTM2: Spatial Toolkit for the Virtual Observatory 
Gyorgy Fekete JOHNS HOPKINS UNIVERSITY
Alex Szalay JOHNS HOPKINS UNIVERSITY
Jim Gray MICROSOFT RESEARCH


The hierarchical triangular mesh (HTM) is a discrete foundation for
describing location, size and shape on the celestial sphere. Indices derived
from HTM descriptors are used in a relational database for managing spatial
information. New features in the current implementation include searches
based on arbitrary polygons, on convex hulls of polygons, or on any region
bounded by great or small circles. These functions are accessed through a
language implemented as an extension to the Microsoft SQL Server 2000
relational database engine. The core HTM tools can also be used through
interfaces in several languages, such as C++, C# and Java. An extensive XML
specification for describing spatial structures to support spatial queries
is also under development.
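As an illustration of how an HTM index is derived, here is a minimal Python sketch of the usual construction. It starts from 8 root triangles on an octahedron (ids 8 to 15). Each level splits a triangle into 4 children at its edge midpoints, so every level appends two bits to the id. The vertex ordering and child numbering follow the commonly published HTM convention and are assumptions here; this is not the HTM2 library's API.

```python
import math

# Octahedron vertices used for the 8 root triangles.
V = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0)]

# Root triangles S0..S3 (ids 8..11) and N0..N3 (ids 12..15), counterclockwise.
ROOTS = [
    (8, (V[1], V[5], V[2])), (9, (V[2], V[5], V[3])),
    (10, (V[3], V[5], V[4])), (11, (V[4], V[5], V[1])),
    (12, (V[1], V[0], V[4])), (13, (V[4], V[0], V[3])),
    (14, (V[3], V[0], V[2])), (15, (V[2], V[0], V[1])),
]
EPS = 1e-12  # tolerance so points on an edge still land in some triangle

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def normalize(a):
    n = math.sqrt(dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)

def midpoint(a, b):
    # Midpoint of the great-circle edge, pushed back onto the unit sphere.
    return normalize((a[0] + b[0], a[1] + b[1], a[2] + b[2]))

def inside(p, tri):
    # p is inside a counterclockwise spherical triangle if it lies on the
    # inner side of all three great circles through its edges.
    v0, v1, v2 = tri
    return (dot(cross(v0, v1), p) >= -EPS and dot(cross(v1, v2), p) >= -EPS
            and dot(cross(v2, v0), p) >= -EPS)

def radec_to_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def htm_id(p, depth):
    """Integer id of the depth-`depth` trixel containing direction p."""
    p = normalize(p)
    for node, tri in ROOTS:
        if inside(p, tri):
            break
    else:
        raise ValueError("point not in any root triangle")
    for _ in range(depth):
        v0, v1, v2 = tri
        w0, w1, w2 = midpoint(v1, v2), midpoint(v0, v2), midpoint(v0, v1)
        children = [(v0, w2, w1), (v1, w0, w2), (v2, w1, w0), (w0, w1, w2)]
        for k, child in enumerate(children):
            if inside(p, child):
                node, tri = node * 4 + k, child  # append two bits per level
                break
        else:
            raise ValueError("numerical failure during descent")
    return node
```

Because children extend the parent id by two bits, all trixels below a given node occupy one contiguous id range. That property is what makes such ids useful as an ordinary B-tree index in a relational database.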



Batch Query System with Interactive Local Storage for SDSS and the VO


William O'Mullane (1)
Jim Gray (2)
Nolan Li (1)
Tamas Budavari (1)
Maria A. Nieto-Santisteban (1)
Alex Szalay (1)
(1) The Johns Hopkins University
(2) Microsoft Research


The Sloan Digital Sky Survey science database is approaching 1 TB in size.
A small fraction of the queries submitted to the database server take hours
or days to run, either because they require non-index scans of the
largest tables or because they request very large result sets. Such
queries inevitably slow down the vast majority of queries, which would
normally execute in seconds or minutes. We have therefore developed a job
submission and tracking system with multiple queues, in which execution time
is strictly limited.
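The queue idea can be pictured with a small simulation. The queue names, the limits, and the rule of routing a job to the shortest queue that covers its estimated runtime are all hypothetical. A real system would also have to kill the running database query rather than just flag it. The sketch only models the policy that a job exceeding its queue's limit is cancelled, so it cannot hold up the short interactive queries.

```python
from dataclasses import dataclass

@dataclass
class Queue:
    name: str
    time_limit_s: float  # hard cap on execution time in this queue

@dataclass
class Job:
    job_id: int
    sql: str
    estimated_s: float   # estimate used only for routing
    status: str = "queued"

# Hypothetical queues: a short interactive one and a long batch one.
QUEUES = [Queue("quick", 60.0), Queue("long", 8 * 3600.0)]

def route(job, queues=QUEUES):
    """Send the job to the shortest queue whose limit covers its estimate."""
    for q in sorted(queues, key=lambda q: q.time_limit_s):
        if job.estimated_s <= q.time_limit_s:
            return q
    raise ValueError(f"job {job.job_id} exceeds every queue limit")

def run(job, queue, actual_runtime_s):
    """Simulated execution: a job that would run past the limit is cancelled."""
    job.status = "cancelled" if actual_runtime_s > queue.time_limit_s else "finished"
    return job.status
```

For example, a job routed to "quick" that actually needs 90 s ends up "cancelled", whereas a one-hour job routed to "long" can finish.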

Since the exponential growth in the size of online datasets has
outstripped the increase in network bandwidth (and this discrepancy is
only likely to get worse in the future), the transfer of very large
query result sets over the network is another serious problem
that must be addressed. Statistics suggested that some form of local
storage would alleviate this problem because, in many cases, users do not
need to download results immediately; they would prefer to store results
locally to allow further cross-matching and filtering. The system provides
this local space in the form of MYDB. Selects may be performed into MYDB,
and tables in MYDB may be extracted to files (FITS, VOTable) and then
transferred using FTP/HTTP to the user's machine. Tables may also be shared
with other users through a groups mechanism.
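The MYDB workflow (select into a personal space, then extract a table to a file) can be mimicked with Python's built-in sqlite3 module. This is only a stand-in for SQL Server. An attached database named "mydb" plays the role of MYDB. SQLite's CREATE TABLE ... AS SELECT replaces the T-SQL SELECT ... INTO form, and the table and column names are hypothetical. The export writes a minimal VOTable 1.1 document (RESOURCE, TABLE, FIELD, TABLEDATA) without the optional namespace or metadata.

```python
import sqlite3
import xml.etree.ElementTree as ET

def make_demo_db():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE PhotoObj (objID INTEGER, ra REAL, dec REAL, r REAL)")
    con.executemany("INSERT INTO PhotoObj VALUES (?, ?, ?, ?)",
                    [(1, 180.0, 0.1, 17.2), (2, 180.1, 0.2, 21.5), (3, 180.2, 0.3, 16.8)])
    # A second, attached database stands in for the user's MYDB.
    con.execute("ATTACH DATABASE ':memory:' AS mydb")
    return con

def select_into_mydb(con, name, query):
    # Table name is trusted in this sketch; a real service must validate it.
    con.execute(f"CREATE TABLE mydb.{name} AS {query}")

def votable_type(value):
    if isinstance(value, int):
        return {"datatype": "long"}
    if isinstance(value, float):
        return {"datatype": "double"}
    return {"datatype": "char", "arraysize": "*"}

def export_votable(con, name):
    """Serialize mydb.<name> as a minimal VOTable string."""
    cur = con.execute(f"SELECT * FROM mydb.{name}")
    names = [d[0] for d in cur.description]
    rows = cur.fetchall()
    vot = ET.Element("VOTABLE", version="1.1")
    table = ET.SubElement(ET.SubElement(vot, "RESOURCE"), "TABLE", name=name)
    for i, col in enumerate(names):
        # Infer the VOTable datatype from the first non-null value.
        sample = next((r[i] for r in rows if r[i] is not None), "")
        ET.SubElement(table, "FIELD", name=col, **votable_type(sample))
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for r in rows:
        tr = ET.SubElement(tabledata, "TR")
        for v in r:
            ET.SubElement(tr, "TD").text = "" if v is None else str(v)
    return ET.tostring(vot, encoding="unicode")
```

Usage: `select_into_mydb(con, "bright", "SELECT objID, ra, dec, r FROM PhotoObj WHERE r < 18")` keeps the result on the server. `export_votable(con, "bright")` produces the file the user would then download.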

For the Virtual Observatory we need to extend MYDB to the notion of VOSPACE:
a collection of local spaces (on the Grid, MYDB and other systems) where
data may be replicated, registered, transferred, and shared inside a VO
Grid-enabled environment.

Spatial Queries with VO Regions in SQL
Jim Gray (1), Alex Szalay (2), George Fekete (2), Adrian Pope (2),
Ani Thakar (2), Wil O'Mullane (2), Peter Kunszt (3)
(1) Microsoft Research
(2) The Johns Hopkins University
(3) CERN, Geneva

Using the Region part of the VO Space-Time metadata representation, we have
implemented a set of SQL user-defined functions that perform complex
spatial queries over objects stored in a SQL Server database. The functions,
implemented in Transact-SQL, rely on a small number of basic functions in the
HTM2 library, which is linked to the database through a DLL. The functions
perform various spatial operations on points and polygons:
    (i) point inside of a polygon,
   (ii) boolean operators on spherical polygons,
  (iii) polygons containing a point,
   (iv) all points in a polygon, etc.
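Operations (i) and (iv) reduce to a simple vector test, sketched below in Python. The sketch assumes a convex spherical polygon, smaller than a hemisphere, whose vertices are given counterclockwise as seen from outside the sphere. Every edge is then a great-circle arc. A point is inside when it lies on the inner side of every edge plane, which is the sign of a triple product. The function names are hypothetical and do not reproduce the Transact-SQL functions described above, and the boolean operators (ii) are not shown.

```python
import math

def radec_to_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def inside_polygon(p, vertices):
    """(i) Point inside a convex spherical polygon (counterclockwise unit vectors)."""
    n = len(vertices)
    # (v_i x v_{i+1}) . p >= 0 means p is on the left of edge i.
    return all(dot(cross(vertices[i], vertices[(i + 1) % n]), p) >= 0.0
               for i in range(n))

def points_in_polygon(catalog, vertices):
    """(iv) Ids of all (id, ra, dec) catalog entries inside the polygon."""
    return [obj_id for obj_id, ra, dec in catalog
            if inside_polygon(radec_to_vector(ra, dec), vertices)]
```

For example, a 2-by-2 degree square with corners (179, -1), (181, -1), (181, 1) and (179, 1), listed in that order, contains (180, 0) but not (180, -2).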

For SDSS DR1 we have built an extensive table of geometric objects
and boundaries that can be used to censor objects in regions masked by
satellite trails and bright stars. We will also implement a Web Service that
can calculate the intersection of an external input region with the survey
footprint, or tell whether a point is inside the survey or not.

It is also possible to build procedures that calculate the intersections of
various observational and operational constraints (target selection areas,
spectroscopic plates) and subsequently compute separate targeting and
observing efficiencies from the database. Such data are extremely important
for estimating statistical completeness in studies of large-scale structure.
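A "is this point usable?" check can be sketched with the region model used by HTM-style tools. A halfspace (n, c) keeps the points p with n · p >= c. A value c = 0 gives a great-circle boundary and c != 0 a small circle. A convex is an intersection of halfspaces, and a region is a union of convexes. In this sketch the footprint is a union of RA/Dec boxes and each mask is a circular cap around a bright star. The shapes and function names are hypothetical and do not reflect the DR1 table schema.

```python
import math

def radec_to_vector(ra_deg, dec_deg):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def circle(ra, dec, radius_deg):
    """Convex holding the cap of the given radius around (ra, dec)."""
    return [(radec_to_vector(ra, dec), math.cos(math.radians(radius_deg)))]

def box(ra1, ra2, dec1, dec2):
    """RA/Dec rectangle (RA span under 180 degrees) as four halfspaces."""
    a1, a2 = math.radians(ra1), math.radians(ra2)
    return [
        ((-math.sin(a1), math.cos(a1), 0.0), 0.0),            # RA >= ra1
        ((math.sin(a2), -math.cos(a2), 0.0), 0.0),            # RA <= ra2
        ((0.0, 0.0, 1.0), math.sin(math.radians(dec1))),      # Dec >= dec1
        ((0.0, 0.0, -1.0), -math.sin(math.radians(dec2))),    # Dec <= dec2
    ]

def in_convex(p, convex):
    return all(dot(n, p) >= c for n, c in convex)

def in_region(p, region):
    """A region is a union (list) of convexes."""
    return any(in_convex(p, convex) for convex in region)

def usable(ra, dec, footprint, masks):
    """Inside the survey footprint and outside every masked area."""
    p = radec_to_vector(ra, dec)
    return in_region(p, footprint) and not in_region(p, masks)
```

With `footprint = [box(170, 190, -5, 5)]` and `masks = [circle(180, 0, 0.5)]`, the position (175, 0) is usable. The position (180, 0.1) falls inside the mask, and (200, 0) lies outside the footprint.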


Open SkyQuery - VO Compliant Dynamic Federation of Astronomy Archives

Tamas Budavari, Alexander S. Szalay, Tanu Malik, Aniruddha R. Thakar,
William O'Mullane, Roy Williams, Jim Gray, Bob Mann, Naoki Yasuda...

We discuss the redesign of the SkyQuery architecture, originally built as
a simple proof of concept for the dynamic federation of astronomical archives.
The design of Open SkyQuery is based on higher-level services that
extend the basic functionality of the current VO standard, ConeSearch.
Open SkyQuery implements the VO specifications for data
access, retrieval and spatial join. Data are published via Web Services
called SkyNodes, which provide rich functionality including footprint
coverage. SkyNodes are discovered through the VO registry. We propose
at least two levels of SkyNode compliance (Core and Advanced). We
will also provide templates for publishing data into a SkyNode.
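The spatial join at the heart of a federated query can be illustrated with the simplest possible criterion. Each object in catalog A is paired with the nearest object in catalog B within a fixed angular radius. This naive O(N·M) Python sketch uses the haversine formula for separations. It is a swapped-in, simplified technique. A production service such as SkyQuery may use a more elaborate (for example probabilistic) match and distribute the work across SkyNodes.

```python
import math

def angular_sep_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation via the haversine formula (stable for small angles)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def cross_match(cat_a, cat_b, radius_arcsec):
    """For each (id, ra, dec) in A, the nearest B within the radius.

    Returns (id_a, id_b, separation_arcsec) tuples."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in cat_a:
        best = None
        for id_b, ra_b, dec_b in cat_b:
            d = angular_sep_deg(ra_a, dec_a, ra_b, dec_b)
            if d <= radius_deg and (best is None or d < best[1]):
                best = (id_b, d)
        if best is not None:
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches
```

In practice the inner loop would be replaced by an indexed cone search (for example over HTM ranges) on each node.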

Keywords: SkyQuery, VOTable, Registry, ADQL


Making FITS available in .NET and its Applications
Vivek Haridas, Tamas Budavari, William O'Mullane, Alex Szalay,
Alberto Conti, Bill Pence, Antonio Volpicelli, Ani Thakar

The Flexible Image Transport System (FITS) is a powerful and widely
adopted means of exchanging astronomical data, and a great number of tools
and libraries are available on many platforms to facilitate working with
FITS.

We present Fits.Net, a library written to facilitate the development of
astronomical data analysis tools on the Microsoft .NET platform. It has
been developed as a wrapper over one of the most popular and time-tested
FITS libraries, CFITSIO. Fits.Net merges the speed and robustness of
CFITSIO with the language independence of Microsoft .NET technology and a
simple Document Object Model (DOM). We believe this library will be
intuitive for .NET programmers.
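To show what a FITS library exposes as the "document" in a DOM-style header model, here is a minimal stdlib-only Python sketch of the FITS header format. Headers consist of 80-character cards in 2880-byte blocks. A card holds an 8-character keyword, "= " in columns 9-10, and a value that is right-justified to column 30 for numbers and logicals or single-quoted for strings, and the header ends with an END card. It is not the Fits.Net or CFITSIO API, and it ignores COMMENT/HISTORY cards, long strings and other details of the standard.

```python
BLOCK, CARD = 2880, 80

def make_card(key, value):
    if isinstance(value, bool):                  # check bool before int
        text = ("T" if value else "F").rjust(20)
    elif isinstance(value, (int, float)):
        text = repr(value).rjust(20)             # fixed format: ends at column 30
    else:                                        # strings: quoted, '' escapes '
        text = "'" + str(value).replace("'", "''").ljust(8) + "'"
    return (key.upper().ljust(8)[:8] + "= " + text).ljust(CARD)[:CARD]

def build_header(items):
    raw = "".join(make_card(k, v) for k, v in items) + "END".ljust(CARD)
    return (raw + " " * (-len(raw) % BLOCK)).encode("ascii")  # pad to 2880 bytes

def parse_value(field):
    field = field.strip()
    if field.startswith("'"):
        out, i = [], 1
        while i < len(field):
            if field[i] == "'":
                if i + 1 < len(field) and field[i + 1] == "'":
                    out.append("'")              # escaped quote inside the string
                    i += 2
                    continue
                break                            # closing quote
            out.append(field[i])
            i += 1
        return "".join(out).rstrip()             # trailing blanks are not significant
    field = field.split("/", 1)[0].strip()       # drop an inline comment
    if not field:
        return None
    if field in ("T", "F"):
        return field == "T"
    try:
        return int(field)
    except ValueError:
        return float(field)

def parse_header(raw):
    header, text = {}, raw.decode("ascii")
    for start in range(0, len(text), CARD):
        card = text[start:start + CARD]
        key = card[:8].rstrip()
        if key == "END":
            break
        if card[8:10] == "= ":                   # value indicator
            header[key] = parse_value(card[10:])
    return header
```

A round trip such as `parse_header(build_header([("SIMPLE", True), ("BITPIX", 16)]))` returns the same keyword/value pairs, which is the kind of object model a .NET wrapper can present over CFITSIO.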

We present the design and usage patterns of the library in C#. We also
discuss performance issues of the library. Finally, we present a number of
applications and web services that currently run on this library.

ImgCutout, an Engine of Instantaneous Astronomical Discovery.
Maria A. Nieto-Santisteban (JHU), Alex Szalay (JHU), Jim Gray
(Microsoft Research).

ImgCutout is a web application that enables professional astronomers
and the general public to interactively visualize and explore large,
complex astronomical data sets.

The application consists of a web interface that calls a web service,
which accesses SkyServer, a 1 TB SQL Server database containing catalog
data for 100 million objects, spectra and images from the Sloan Digital
Sky Survey. ImgCutout builds, in real time, color mosaic images of
user-selected regions of the sky, and overlays additional information
about astronomical and spatial objects in the database, including
boundaries of survey fields and aperture plates, outlines of individual
objects, data quality masks, and the locations of photometric
and spectroscopic objects. The tool can search for lists of known
objects, provide detailed information about selected objects, and
formulate new database queries.
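Overlaying catalog objects on a cutout requires mapping (RA, Dec) to pixel positions. A common choice for small fields is the gnomonic (tangent-plane) projection about the image center, sketched below in Python. Which projection ImgCutout actually uses is not stated above, so both the projection and the north-up, east-left image orientation are assumptions. The pixel scale is a parameter rather than a value taken from the survey.

```python
import math

def gnomonic(ra_deg, dec_deg, ra0_deg, dec0_deg):
    """Standard coordinates (xi, eta), in radians, about tangent point (ra0, dec0)."""
    ra, dec, ra0, dec0 = map(math.radians, (ra_deg, dec_deg, ra0_deg, dec0_deg))
    cos_c = (math.sin(dec0) * math.sin(dec)
             + math.cos(dec0) * math.cos(dec) * math.cos(ra - ra0))
    if cos_c <= 0:
        raise ValueError("point is 90 degrees or more from the image center")
    xi = math.cos(dec) * math.sin(ra - ra0) / cos_c
    eta = (math.cos(dec0) * math.sin(dec)
           - math.sin(dec0) * math.cos(dec) * math.cos(ra - ra0)) / cos_c
    return xi, eta

def sky_to_pixel(ra, dec, ra0, dec0, width, height, arcsec_per_pixel):
    """Pixel (x, y) of an overlay marker in a north-up, east-left cutout."""
    xi, eta = gnomonic(ra, dec, ra0, dec0)
    scale = math.radians(arcsec_per_pixel / 3600.0)  # radians per pixel
    x = width / 2.0 - xi / scale    # RA increases to the left (east)
    y = height / 2.0 - eta / scale  # image rows increase downward
    return x, y
```

For a 512 by 512 cutout at 0.4 arcsec per pixel, an object 10 arcsec north of the center lands about 25 pixels above the center row.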
  
Our presentation illustrates the instantaneous discovery process possible
with ImgCutout.

VO Enabled Mirage and the IVOAClient Package
  
Samuel Carliles (JHU), Tin Kam Ho (Bell Labs), Wil O'Mullane (JHU)
  
Astronomers commonly analyze astronomical data by imposing different views on
the data, frequently viewing it in image form or as multi-dimensional plots.
The ability to correlate the data across these views, in order to see
manifestations of patterns across views, would be a powerful tool. The Mirage
data visualization application offers this functionality. To increase the
value of this tool to astronomers, we have added two features to Mirage,
namely a module for viewing FITS images and the ability to load VOTable data.
While adding the VOTable functionality to Mirage, we also developed a
separate Java package, called the IVOAClient package, which can easily be
integrated into any other Java application to provide the ability to load
VOTable data via Cone Search queries submitted to any Cone Search service
published in a Registry, or by a direct SDSS CAS query.
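The two steps a VOTable-aware client performs, building a Cone Search request and reading the VOTable reply, can be sketched in Python. The RA, DEC and SR parameters (decimal degrees) are those of the Cone Search convention. The base URL is hypothetical, and the parser handles only a single TABLE with TABLEDATA, returning cell values as strings. The sketch is not the IVOAClient Java API.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

def cone_search_url(base_url, ra, dec, sr):
    """Append RA, DEC and SR (decimal degrees) to a Cone Search base URL."""
    if base_url.endswith(("?", "&")):
        sep = ""
    elif "?" in base_url:
        sep = "&"
    else:
        sep = "?"
    return base_url + sep + urlencode({"RA": ra, "DEC": dec, "SR": sr})

def local(tag):
    # Strip an XML namespace such as {http://www.ivoa.net/xml/VOTable/v1.1}.
    return tag.rsplit("}", 1)[-1]

def parse_votable(xml_text):
    """Rows of the first TABLE as dicts keyed by FIELD name (values as text)."""
    root = ET.fromstring(xml_text)
    names = [el.get("name") for el in root.iter() if local(el.tag) == "FIELD"]
    rows = []
    for tr in (el for el in root.iter() if local(el.tag) == "TR"):
        cells = [(td.text or "") for td in tr if local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))
    return rows
```

A client would fetch `cone_search_url("http://example.org/cone", 180.0, 0.5, 0.1)` with any HTTP library and pass the response body to `parse_votable` before handing the rows to a plotting view.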

We describe the usage of VO Enabled Mirage, the process of writing a data view
module which can be incorporated into Mirage, and the process of integrating
the IVOAClient package into any Java application.

From FITS to SQL - Loading and Publishing the SDSS Data
Ani Thakar, JHU
Alex Szalay, JHU
Jim Gray, Microsoft

The Sloan Digital Sky Survey Data Release 1 (DR1) contains nearly 1 TB of
catalog data published online as the Catalog Archive Server (CAS) at
http://skyserver.pha.jhu.edu/dr1. The DR1 CAS is the end product of a data
loading pipeline that takes the FITS files exported by the Linux-based
SDSS Operational Database (OpDB), converts them to CSV (comma-separated
values) format, and loads them into a relational DBMS (SQL Server) running
on MS Windows.

Loading the data is potentially the most time-consuming and labor-intensive
part of archive operations, and it is also the most critical: it is
realistically your one chance to get the data right.  We attempted to automate
it as much as possible, and to make it easy to diagnose data and loading
errors.  

We describe this pipeline, focusing on the highly automated SQL data loader
framework (sqlLoader), a distributed workflow system of modules that check,
load, validate and publish the data to the databases. The workflow is
described by a directed acyclic graph (DAG) whose nodes are the processing
modules. It is designed for parallel loading on a cluster and is controlled
from an ASP web interface (Load Monitor).
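A DAG-driven loader can be pictured with Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). This sketch groups tasks into "waves" whose members have no unfinished prerequisites and could therefore run on different cluster nodes at the same time. The task names and the exact dependency structure (check, then load per data chunk, with one validate and one publish step) are hypothetical simplifications of the stages named above, not the sqlLoader's actual graph.

```python
from graphlib import TopologicalSorter

def build_load_dag(chunks):
    """Map each task to the set of tasks it depends on."""
    graph = {}
    for c in chunks:
        graph[f"check:{c}"] = set()
        graph[f"load:{c}"] = {f"check:{c}"}
    graph["validate"] = {f"load:{c}" for c in chunks}  # wait for every load
    graph["publish"] = {"validate"}
    return graph

def parallel_batches(graph):
    """Waves of tasks that may run concurrently, in dependency order."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        waves.append(ready)
        ts.done(*ready)  # in a real loader, mark done as each task finishes
    return waves
```

For two chunks this yields four waves: both checks, then both loads, then validate, then publish.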

The validation stage, in particular, represents a systematic and thorough
scrubbing of the data before it is deemed worthy of publishing. The publish
step merges the different data products into a set of linked tables that can be
efficiently searched with specialized indices and pre-computed joins.
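Two checks that typically appear in this kind of scrubbing are primary-key uniqueness and referential integrity. The Python sketch below illustrates them on in-memory rows. The table and column names (Field.fieldID, PhotoObj.objID, PhotoObj.fieldID) are hypothetical, and the actual validation suite is not described here. Inside a database the same checks would usually be SQL queries, such as GROUP BY ... HAVING COUNT(*) > 1 for duplicates and a LEFT JOIN that finds rows with no parent.

```python
def check_unique(rows, key):
    """Values of `key` that occur more than once."""
    seen, dupes = set(), set()
    for r in rows:
        if r[key] in seen:
            dupes.add(r[key])
        seen.add(r[key])
    return sorted(dupes)

def check_foreign_key(child_rows, fk, parent_rows, pk):
    """Foreign-key values in the child table with no matching parent row."""
    parent_keys = {r[pk] for r in parent_rows}
    return sorted({r[fk] for r in child_rows if r[fk] not in parent_keys})

def validate(fields, photo):
    """List of problems; an empty list means the data may be published."""
    problems = []
    problems += [f"duplicate fieldID {k}" for k in check_unique(fields, "fieldID")]
    problems += [f"duplicate objID {k}" for k in check_unique(photo, "objID")]
    problems += [f"PhotoObj row references missing fieldID {k}"
                 for k in check_foreign_key(photo, "fieldID", fields, "fieldID")]
    return problems
```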

We are in the process of making the sqlLoader generic and portable enough so
that other archives may adapt it to load, validate and publish their data.


Spectrum and bandpass services for the Virtual Observatory

Laszlo Dobos, Eotvos University
Alex S. Szalay, Johns Hopkins University
Istvan Csabai, Eotvos University
Tamas Budavari, Johns Hopkins University


We present easy-to-use web applications and Web Services to search, plot
and manage spectral energy distributions and filter profiles. We provide
keyword search, advanced query forms and SQL interfaces to select spectra
or bandpasses that may be retrieved in a variety of formats including
XML, VOTable and ASCII.

All SDSS DR1 spectra, as well as the entire 2dF catalog, have been loaded
into a database, adding up to almost half a million SEDs. Registered users
can also upload their own data, making it available to the rest of the
community, and are free to modify or delete it at any time. Scientific
services allow users to build rest-frame composite spectra from selected
spectra.
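Building a rest-frame composite involves three steps, sketched below in Python: de-redshift each spectrum (lambda_rest = lambda_obs / (1 + z)), resample it onto a common wavelength grid, and normalize it before averaging. Normalizing by the mean flux in a caller-chosen window and taking an unweighted mean are assumptions made for this sketch. The service described above may weight, mask, or normalize differently.

```python
from bisect import bisect_left

def interp(x, xs, ys):
    """Linear interpolation on ascending xs; None outside the sampled range."""
    if x < xs[0] or x > xs[-1]:
        return None
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

def composite(spectra, grid, norm_window):
    """Mean rest-frame composite of (wavelengths, fluxes, z) spectra on `grid`."""
    lo, hi = norm_window
    sums = [0.0] * len(grid)
    counts = [0] * len(grid)
    for wl, flux, z in spectra:
        rest = [w / (1.0 + z) for w in wl]           # shift to the rest frame
        resampled = [interp(g, rest, flux) for g in grid]
        window = [v for g, v in zip(grid, resampled) if lo <= g <= hi and v is not None]
        if not window:
            continue                                  # does not cover the window
        norm = sum(window) / len(window)
        for i, v in enumerate(resampled):
            if v is not None:
                sums[i] += v / norm
                counts[i] += 1
    # Grid points covered by no spectrum are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]
```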

The bandpass database contains a growing collection of photometric filters
and offers the same search interfaces. Using the spectrum and filter profile
core services, we plan to build higher-level services that help astronomers
create color-color diagrams and simulated catalogs, and estimate distances to
extragalactic objects.
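Color-color diagrams built from spectra and filter profiles rest on synthetic photometry. A spectrum is averaged through each filter's transmission curve, and the ratio of two band fluxes gives a color. This Python sketch uses the photon-counting mean flux density, the integral of f_lambda T lambda d(lambda) divided by the integral of T lambda d(lambda), evaluated with the trapezoidal rule on the filter's wavelength grid. It reports colors without zero points, so real magnitudes (for example AB) would differ by a constant per filter. The box-shaped filters in the usage note are hypothetical.

```python
import math
from bisect import bisect_left

def interp(x, xs, ys):
    """Linear interpolation on ascending xs; the spectrum must cover x."""
    if x < xs[0] or x > xs[-1]:
        raise ValueError(f"spectrum does not cover wavelength {x}")
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    return ys[i - 1] + (ys[i] - ys[i - 1]) * (x - xs[i - 1]) / (xs[i] - xs[i - 1])

def trapz(xs, ys):
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0 for i in range(len(xs) - 1))

def band_flux(spec_wl, spec_flux, filt_wl, filt_t):
    """Photon-weighted mean f_lambda through a filter with transmission filt_t."""
    f = [interp(w, spec_wl, spec_flux) for w in filt_wl]
    num = trapz(filt_wl, [fi * t * w for fi, t, w in zip(f, filt_t, filt_wl)])
    den = trapz(filt_wl, [t * w for t, w in zip(filt_t, filt_wl)])
    return num / den

def color(spec_wl, spec_flux, blue, red):
    """m_blue - m_red up to a zero-point constant; blue/red are (wl, T) pairs."""
    return -2.5 * math.log10(band_flux(spec_wl, spec_flux, *blue)
                             / band_flux(spec_wl, spec_flux, *red))
```

For a spectrum that is flat in f_lambda the color is zero. A spectrum rising toward the red gives a positive (redder) color.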



Other NVO talks of interest

Searchable Registry for the National Virtual Observatory

Gretchen Greene STScI  
William O'Mullane JHU 
Bob Hanisch STScI 
Niall Gaffney STScI

As part of the NVO framework development initiative, a prototype Astronomical
Registry was designed to serve resource metadata across the internet to the
world community. In addition to many VO-standard Cone Search and Simple Image
Access (SIA) services, the registry provides mechanisms for publishing custom
archive services with their associated metadata. The registry is mirrored at
two sites, the Space Telescope Science Institute and Johns Hopkins University,
and additionally harvests resources from the Caltech and NCSA OAI repositories.
Web services and forms were implemented so that higher-level applications,
such as the NASA Data Inventory Service (DIS), can integrate with the registry
independently. These interfaces provide basic add, edit and remove operations
and include standard SQL query support. The registry is built with .NET
technology, using an MS SQL Server database, the IIS web server, and C# code.
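Harvesting from an OAI repository follows the OAI-PMH pattern. The client issues a ListRecords request with a metadataPrefix (oai_dc is the format every OAI-PMH repository must support). It then keeps re-issuing the request with only the returned resumptionToken until an empty token marks the last page. In this Python sketch the network call is an injected `fetch` function, only record headers are extracted, and the base URL is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, token=None, metadata_prefix="oai_dc", from_date=None):
    if token:  # a resumptionToken request carries no other arguments
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)

def harvest(base_url, fetch, from_date=None):
    """Collect record headers across all pages; fetch(url) returns XML text."""
    records, token = [], None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token, from_date=from_date)))
        for rec in root.iter(OAI_NS + "record"):
            header = rec.find(OAI_NS + "header")
            records.append({
                "identifier": header.findtext(OAI_NS + "identifier"),
                "datestamp": header.findtext(OAI_NS + "datestamp"),
                "deleted": header.get("status") == "deleted",
            })
        tok = root.find(f".//{OAI_NS}resumptionToken")
        token = tok.text.strip() if tok is not None and tok.text else None
        if not token:  # missing or empty token: last page
            return records
```

Injecting `fetch` keeps the paging logic testable offline. A real harvester would pass a function that performs the HTTP GET.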



Title

       Astronomical Data Query Language:
       Simple Query Protocol for the Virtual Observatory

Authors
       Naoki Yasuda, NAOJ
       Wil O'Mullane, JHU
       Vivek Haridas, JHU
       Alex Szalay, JHU
       Masatoshi Ohishi, NAOJ
       Yoshihiko Mizumoto, NAOJ

Abstract

The Astronomical Data Query Language (ADQL) is a proposed standard
query language for the interoperability of the International Virtual
Observatory. The data servers in the International Virtual Observatory
could be searched using an ADQL query. The servers would return
VOTables as a result of the query.

The development of SkyQuery and JVOQL which perform queries on
distributed databases revealed that a standard underlying query
protocol is required to access individual data servers. In the Virtual
Observatory Query Language (VOQL) architecture proposed in the
International Virtual Observatory Alliance, ADQL is a simple
underlying protocol which would allow data servers to join the
International Virtual Observatory. The higher-level language, VOQL, will be
built on top of ADQL and other services.

ADQL is the XML equivalent of a scaled-down SQL grammar. The only
operation permitted is a "select". A "circle" clause has been added to
SQL to facilitate astronomical queries. The ADQL schema is defined in
XML Schema (XSD), and a query described in ADQL will be passed as XML
in parse-tree form.
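Passing a query "in parse-tree form as XML" means the client sends a structured tree rather than SQL text, and each data server translates that tree into its own SQL. This Python sketch shows both sides for a select with a circle clause. The element names (Select, SelectionList, Column, From, Where, Circle) are hypothetical and are not the ADQL XSD. The server side assumes a table that stores unit vectors in columns cx, cy and cz, so the circle becomes a dot-product cut: separation <= r is equivalent to p · center >= cos r.

```python
import math
import xml.etree.ElementTree as ET

def build_query(columns, table, ra, dec, radius_deg):
    """Client side: SELECT columns FROM table WHERE CIRCLE(ra, dec, r) as XML."""
    select = ET.Element("Select")
    selection = ET.SubElement(select, "SelectionList")
    for name in columns:
        ET.SubElement(selection, "Column", name=name)
    ET.SubElement(select, "From", table=table)
    where = ET.SubElement(select, "Where")
    ET.SubElement(where, "Circle", ra=str(ra), dec=str(dec), radius=str(radius_deg))
    return ET.tostring(select, encoding="unicode")

def to_sql(xml_text):
    """Server side: translate the parse tree into SQL for a (cx, cy, cz) table."""
    select = ET.fromstring(xml_text)
    columns = ", ".join(c.get("name") for c in select.find("SelectionList"))
    table = select.find("From").get("table")
    circle = select.find("Where/Circle")
    ra, dec, radius = (math.radians(float(circle.get(k))) for k in ("ra", "dec", "radius"))
    x = math.cos(dec) * math.cos(ra)  # unit vector of the circle center
    y = math.cos(dec) * math.sin(ra)
    z = math.sin(dec)
    return (f"SELECT {columns} FROM {table} "
            f"WHERE cx*{x!r} + cy*{y!r} + cz*{z!r} >= {math.cos(radius)!r}")
```

Because the tree, not the SQL dialect, is what travels between services, each server is free to map the circle onto its own spatial index.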

All data servers joining the International Virtual Observatory would

Resource Registries for the Virtual Observatory
R. Plante, G. Greene, B. Hanisch, T. McGlynn, W. O'Mullane,
R. Williams, R. Williamson

Data discovery will be a core utility of the Virtual
Observatory (VO). Registries that contain high-level descriptions of
resources such as archives and services are essential for making data
discovery efficient in a distributed environment. We review a
framework architecture for VO registries currently under development
within an International Virtual Observatory Alliance (IVOA) working
group. We also describe a prototype implementation of the framework
developed as part of the National Virtual Observatory (NVO) project. We
illustrate how institutions can publish descriptions of their
resources within their own registries. Other registries specialize in
harvesting these descriptions to centralized locations where users may
search them. These searchable registries can be global in their
holdings or specialized toward particular subjects, audiences, or
resource types. Because the availability of resources changes over
time, the framework must allow registries to update their contents
automatically. We show how our prototype registry supports
the NVO's first publicly released service, a Data Inventory Service.
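Keeping a searchable registry current means merging each incremental harvest into the local copy. New or updated descriptions replace older ones, records the publisher has deleted are removed, and the next harvest starts from the latest datestamp seen. The Python sketch below shows that merge. The record layout is hypothetical, and it assumes all datestamps share one ISO 8601 format, so that string comparison orders them correctly.

```python
def merge_harvest(local, harvested):
    """Apply harvested records to a local {identifier: record} registry copy."""
    for rec in harvested:
        ident = rec["identifier"]
        current = local.get(ident)
        if current is not None and current["datestamp"] > rec["datestamp"]:
            continue  # stale update; keep the newer local description
        if rec["deleted"]:
            local.pop(ident, None)
        else:
            local[ident] = {"datestamp": rec["datestamp"], "title": rec.get("title")}
    return local

def next_from_date(harvested, previous):
    """Datestamp to use as the lower bound of the next incremental harvest."""
    return max([r["datestamp"] for r in harvested] + [previous])
```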

Title:
-----
Resource Metadata for the Virtual Observatory
  
Requested Presentation Type:
---------------------------
poster
  
Authors:
-------
Robert Hanisch, Space Telescope Science Institute
Tony Linde, University of Leicester
Ray Plante, National Center for Supercomputing Applications-University of
Illinois
Anita Richards, Jodrell Bank Observatory
Elizabeth Auden, Mullard Space Science Laboratory
Keith Noddle, University of Leicester
Gretchen Greene, Space Telescope Science Institute
Wil O'Mullane, The Johns Hopkins University


Abstract:
--------
  The location and access methods of astronomical resources (catalogs,
observation logs, and data archives) and associated computational services
(e.g., data processing pipelines, source extraction services, theoretical
simulations) in the Virtual Observatory will be determined by querying
dynamic resource registries. These registries function as a sort of
yellow pages, providing descriptive information (metadata) about the
resources in order to locate information and services in response to user
queries. The metadata also needs to describe the provenance of the
information, provide some indication of the data quality, quantity, and
type, and guide users to information appropriate to their needs (e.g.,
research-oriented data archives vs. educational resources).

  We describe the content and structure of the resource metadata being
developed for the international VO.  We have implemented several prototype
registries based on these metadata definitions, and will share 'lessons
learned' concerning metadata integrity and consistency.  We also describe
the challenges we expect in registry maintenance in an inherently
distributed environment.
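Metadata integrity checks of the kind mentioned above can be made concrete with a small record type and a validator, sketched in Python. The fields loosely follow the descriptive elements listed in the abstract: identifier, title, publisher, type, and a content level distinguishing research from educational use. The `ivo://` identifier prefix follows the IVOA identifier scheme. The exact field set and the controlled vocabulary of content levels are hypothetical, not the published metadata definitions.

```python
from dataclasses import dataclass, field

# Hypothetical subset of a controlled vocabulary for the intended audience.
CONTENT_LEVELS = {"General", "University", "Research"}

@dataclass
class Resource:
    identifier: str
    title: str
    publisher: str
    resource_type: str                      # e.g. "Catalog", "Archive", "Service"
    content_level: list = field(default_factory=list)
    description: str = ""

def check(res):
    """Return a list of integrity problems; empty means the record passes."""
    problems = []
    if not res.identifier.startswith("ivo://"):
        problems.append("identifier should be an ivo:// URI")
    for name in ("title", "publisher", "resource_type"):
        if not getattr(res, name).strip():
            problems.append(f"missing {name}")
    for level in res.content_level:
        if level not in CONTENT_LEVELS:
            problems.append(f"unknown content level {level!r}")
    return problems
```

Running the same checks in every registry that publishes or harvests a record is one way to keep descriptions consistent across a distributed set of registries.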



William O'Mullane
Last Modified :Wednesday, November 24, 2004 at 9:29:59 AM , $Revision 1.17 $