Type of presentation: Oral

ID-4-O-3387 Organizing, Managing and Searching Large Collections of Images:  A New Resource To Handle High-Throughput Imaging

Morgan D. G.1, Gopu A.2, Young M.2, Hayashi S.2
1Electron Microscopy Center, Indiana University - Bloomington, 2Pervasive Technology Institute, Indiana University - Bloomington
dagmorga@indiana.edu

With the advent of digital imaging, it has become easy to record more images in less time than ever before. Individuals, research groups and imaging facilities can drown in the images recorded for a single project, let alone those accumulated over a career or years of facility use. High-throughput methods that use imaging as a recording and screening tool have made these problems worse. Archiving such large image collections can be difficult, though university and commercial cloud storage can help. Approaches to storage and access have been and continue to be developed, and here we present a new resource designed with these issues in mind.

Groups from the IU-Bloomington Electron Microscopy Center (EMC) and Pervasive Technology Institute (PTI) have begun a project to deal with the tidal wave of images produced using EMC resources. The EMC PPA (Portal, Pipeline & Archive) is an offshoot of a project for handling images from the One Degree Imager on the WIYN 3.5m telescope at Kitt Peak, Arizona. The concept behind both projects is that images are uploaded from the recording instrument(s) into a database that is both archival and searchable. Images are stored with calibration and reference data, instrument metadata and tag words that help users search for particular images. Access to the images is controlled. The images and all database functions are accessible through any web browser. Images are stored locally on the Data Capacitor shared file system and archivally in the Scholarly Data Archive (SDA). A processing pipeline running on the Big Red II computing cluster handles compute-intensive data reduction and image processing.

Fig. 1 shows a typical search page. The most commonly used metadata search fields appear in the upper left, and all available fields can be accessed using the Other Fields button at the right. Tag words and anti-tag words (tags used to exclude images) can be searched simultaneously. Search results are shown at the bottom as rows of image data, each starting with an image thumbnail. Results can be sorted by any column, and users can customize both which columns are displayed and their order.
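As a rough illustration of how tag words, anti-tag words and metadata fields can combine in a single query, the sketch below filters in-memory records and sorts the hits by a chosen column. It is a hypothetical Python sketch: the `ImageRecord` fields, the `search` function and the sample records are assumptions made for illustration, not the EMC PPA's schema or code, which queries a database rather than a Python list.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ImageRecord:
    # Hypothetical record; field names are illustrative, not the portal's schema.
    name: str
    acquired: date
    magnification: int
    voltage_kv: int
    tags: set[str] = field(default_factory=set)


def search(records, tags=(), anti_tags=(), sort_key="acquired", reverse=False, **metadata):
    """Keep records that carry every tag word, none of the anti-tag words,
    and match every metadata field passed as a keyword argument; then sort
    the hits by the attribute named in sort_key."""
    required = set(tags)
    excluded = set(anti_tags)
    hits = [
        r for r in records
        if required <= r.tags                # all tag words present
        and not (excluded & r.tags)          # no anti-tag word present
        and all(getattr(r, k) == v for k, v in metadata.items())
    ]
    return sorted(hits, key=lambda r: getattr(r, sort_key), reverse=reverse)


# Made-up sample records, used only to exercise the sketch.
records = [
    ImageRecord("img_001", date(2014, 3, 2), 50000, 200, {"ferritin", "negative-stain"}),
    ImageRecord("img_002", date(2014, 3, 1), 50000, 200, {"ferritin", "cryo"}),
    ImageRecord("img_003", date(2014, 2, 28), 30000, 120, {"ferritin", "negative-stain", "drift"}),
]

hits = search(records, tags={"ferritin"}, anti_tags={"drift"})
print([r.name for r in hits])
```

Here `img_003` is dropped because it carries the anti-tag `drift`, and the remaining hits are ordered by acquisition date. Passing a different `sort_key` or `reverse=True` mimics clicking a column header in the results table.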

Search results can be turned into a "collection," which can be displayed either as shown in Fig. 1 or as the grid view in Fig. 2. A collection can be downloaded or sent into the pipeline for processing, and its individual images can be examined and either retained in or rejected from the collection. Tag words can be edited both for individual images and for whole collections.
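The collection operations described above (rejecting images after inspection and editing tags per image or per collection) can be modeled with a small data structure. This is a hypothetical Python sketch: the `Collection` class and its method names are assumptions for illustration and are not the EMC PPA's actual data model; only the collection name `FirstImages_0001` comes from Fig. 2.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical stand-in for a saved collection: an ordered list of image
    IDs plus per-image and collection-wide tag sets."""
    name: str
    image_ids: list[str]
    image_tags: dict[str, set[str]] = field(default_factory=dict)
    collection_tags: set[str] = field(default_factory=set)

    def reject(self, image_id: str) -> None:
        # Remove one image after inspection; the rest keep their order.
        self.image_ids.remove(image_id)

    def tag_image(self, image_id: str, tag: str) -> None:
        # Add a tag word to a single image in the collection.
        self.image_tags.setdefault(image_id, set()).add(tag)

    def tag_collection(self, tag: str) -> None:
        # Add a tag word to the collection as a whole.
        self.collection_tags.add(tag)


coll = Collection("FirstImages_0001", ["img_001", "img_002", "img_003"])
coll.reject("img_003")
coll.tag_image("img_001", "best-contrast")
coll.tag_collection("ferritin")
print(coll.image_ids)
```

Keeping tags in separate per-image and per-collection structures lets a tag edit on the collection leave the individual image tags untouched, matching the two editing levels the text describes.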

The EMC PPA is a work in progress. We hope it will help our users manage their image data, and that it will also help the EMC track facility use and mine information that would otherwise be lost among the thousands of images recorded each year. Now that this framework exists, its concepts can be extended to other types of image data and sources.


Fig. 1: Search Page. Images can be searched using metadata tags (e.g., acquisition date, magnification and accelerating voltage), as well as with tags a user associates with particular images and data sets. Metadata fields are extracted from the images themselves, while other searchable tags are entered during and after uploading images into the portal.

Fig. 2: Grid View of Collection Called FirstImages_0001. Groups of images can be assembled into permanent "collections" that are easily examined, manipulated and downloaded. Hovering the mouse over an image in this grid view pops up details about that image, while clicking the magnifying glass leads to an exploration page for that single image.