Summary of CGL Lab Activities January-December 2007

The Community Grids Laboratory (CGL) was established in July 2001 as one of Indiana University’s Pervasive Technology Laboratories. It is funded by the Lilly Endowment, which provides about one third of the funding, with the remainder coming from federal and industry sources. CGL is located in the Indiana University Research Park (Showers) in Bloomington. Its staff includes Director Geoffrey Fox, Associate Director Marlon Pierce, 4 senior (post-doctoral) research associates, 3 software engineers, and 14 PhD candidates. We have an international visitors program: 3 Chinese, 1 Japanese and 1 Korean scholar visited in 2006-2007, supported by their governments for periods of 3 months to a year. The students participate in Indiana University’s academic program while performing research in the laboratory. 17 CGL students have received their PhDs since the start of the lab, and we expect around 4 more students to graduate in 2008.

The Laboratory is devoted to the combination of excellent technology and its application to important scientific problems. Fox has worked in this fashion since he set up the Caltech Concurrent Computation Program (C3P) almost 25 years ago. The technologies we use have changed with the field. We started with parallel computing from 1983 until 1995, then moved to Web-based computing and education with collaborative technologies. Around 2000, we focused on Grids, broadly defined to include communities and collaboration. Recently our focus has been multi-core programming and applications, while the Grid work continues with a Web 2.0 flavor.

Research and Development Activity

Grid Architecture

We continue our core research in Grid and Web services architecture, which acts as a backdrop to all our projects. We finished a general analysis of services with Dennis Gannon to identify areas where further work is needed by the global community. This activity identified data and metadata federation as a critical area where there are some approaches but no consensus on even the appropriate architecture. Our research in Grid management was well received at international conferences. We now believe that practical systems will inevitably mix Web 2.0 with Grid/Web services; this has been a recent focus, with the implications of Cloud computing especially important. We are also exploring the integration of coarse-grain parallel computing with Grid workflow, looking for possible unified approaches, which we term Parallel Programming 2.0. This work benefits greatly from our strong involvement with the Open Grid Forum, where we lead eScience and several study groups.

Parallelism and Multi-core Chips

The computer industry will be revolutionized by new chip architectures with multiple cores (processing units) on the same chip. This is illustrated by the Cell processor that IBM has developed for gaming, highlighted in their new Indianapolis Advanced Chip Technology Center. Moreover, even commodity Intel chips now have 4 cores and will have over 100 cores in five years’ time. These designs require lower power and potentially offer huge performance increases. However, exploiting them requires taking parallel computing expertise, now largely confined to the science and engineering domain, and applying it to the broad range of applications that run on commodity clients and servers. We are just starting a major effort in this area, funded by Microsoft and in collaboration with Rice University, the University of Tennessee, and the Barcelona Supercomputing Center, with initial work focused on studying a range of AMD and Intel multi-core architectures and their performance. We are looking into a possible universal runtime for the different forms of parallelism and also at parallel data mining algorithms for multicore chips. Initial parallel algorithms for Cheminformatics and Geographical Information Systems (GIS) have been developed with a complete performance analysis. The first papers have been prepared and were well received at international conferences. The GIS work is collaborative with the POLIS Center at IUPUI.
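The data mining kernels referred to above (for example, the cluster assignment step at the heart of many Cheminformatics and GIS algorithms) are naturally data-parallel across cores. The sketch below is purely illustrative and is not the lab's implementation; all names are hypothetical. It uses Python's multiprocessing module to split cluster assignment across worker processes:

```python
# Illustrative sketch: data-parallel cluster assignment across cores.
# Hypothetical names; the lab's actual multicore implementations differ.
from multiprocessing import Pool

def nearest_center(args):
    point, centers = args
    # Assign a point to its closest cluster center (squared Euclidean distance).
    return min(range(len(centers)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centers[i])))

def assign_clusters(points, centers, workers=2):
    # Distribute the independent per-point assignments over worker processes.
    with Pool(workers) as pool:
        return pool.map(nearest_center, [(p, centers) for p in points])

if __name__ == "__main__":
    pts = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (4.9, 5.0)]
    labels = assign_clusters(pts, centers=[(0.0, 0.0), (5.0, 5.0)])
    print(labels)  # [0, 0, 1, 1]
```

Because each point's assignment is independent, this step scales with the number of cores; the harder research questions concern synchronization and memory traffic in the subsequent center-update step.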

Semantic Scholar Grid

This is a new project, started in 2006, that is exploring futuristic models for scientific publishing by developing Web 2.0 social networks to support the sharing, annotation and semantic analysis of scientific data and papers. We are building Web service tools that allow integration of capabilities of key systems such as del.icio.us, Connotea, CiteULike, Windows Live Academic and Google Scholar. The initial system is complete, and extensive testing will begin early in 2008. Two PhD theses will be based on this work over the next year, covering the difficult consistency questions for metadata prepared on different web sites as well as the overall architecture. This will consider improved security models.

Chemical Informatics and Cyberinfrastructure Collaboratory (CICC)

The NIH-funded CICC project is building the Web Services, Web portals, databases, and workflow tools that can be used to investigate the abundance of publicly available data on drug-like molecules contained in the NIH’s PubChem and DTP databases.  As part of this effort, we have developed numerous services, including online services for accessing statistical packages, data services, and user interfaces that allow users to search for full three-dimensional chemical structures for the ten million molecules, including one million drug-like molecules, currently in PubChem.  These can be used as inputs to many other calculations.  A prominent example is an online docking results service that we also developed, which calculates the ability of the drug-like molecules to attach themselves to much larger proteins.  The initial versions of these calculations were used in the inaugural run of Indiana University’s Big Red supercomputer.  This database and the related Pub3D (which contains 3D structures for drug-like molecules) are currently online and are based on the entire PubChem catalog (over 10 million molecules). We have also developed Web Services in collaboration with Cambridge University for performing chemistry-specific document mining. This text mining tool (OSCAR) can be used to extract chemical information and other metadata from abstracts and journal articles available from the NIH Entrez PubMed system.  We have used this information to drive simulations (such as the structural calculations described above) on Big Red, but we also see many other applications.

Minority-Serving Institutions Cyberinfrastructure Outreach Projects 

This initiative will help ensure that a diverse group of scientists, engineers, and educators from historically underrepresented minority institutions are actively engaged in the development of new Cyberinfrastructure (CI) tools, strategies, and processes.  Our key strategy was not to identify particular universities to work with but rather to interact with the Alliance for Equity in Higher Education. This consortium is formed by AIHEC (American Indian Higher Education Consortium), HACU (Hispanic Association of Colleges and Universities) and NAFEO (National Association for Equal Opportunity in Higher Education) and ensures our efforts will have systemic impact on at least 335 Minority Serving Institutions. Our current flagship activity is the MSI-CIEC (Minority-Serving Institutions Cyberinfrastructure Empowerment Coalition), which builds on the success of our initial MSI CI2 (Minority-Serving Institutions Cyberinfrastructure Institute) project. Activities include workshops, campus visits and pro-active linkage of MSI faculty with Cyberinfrastructure researchers. As part of this project, we host the MSI-CIEC project wiki (http://www.msi-ciec.us) and have developed the MSI-CIEC Portal. This portal is designed to combine the Web 2.0 concepts of social networks and online bookmarking and tagging.  By using the portal and services, researchers can bookmark URLs (such as journal articles) and describe them with simple keyword tags.  Tagging in turn builds up tag clouds and helps users identify others with similar interests.  User profiles provide contact information, areas of research interest, tag cloud profiles, and RSS feeds of the user’s publications.  The value of social networking sites depends directly on the amount of data and the number of users, so to populate the portal’s database, we imported NSF information on previously awarded projects (available from http://www.nsf.gov/awardsearch/) and records from the TeraGrid allocations database.
This information was converted into tags and user profiles, allowing users to use tags to search through awards by NSF directorate, find the top researchers in various fields, and find networks of collaborators.
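The tag-cloud mechanism described above amounts to counting tag frequencies over all bookmarks and scaling the counts to display sizes. A minimal sketch of the idea follows; the function and data names are hypothetical and this is not the portal's actual code:

```python
# Hypothetical sketch of turning tagged bookmarks into a tag cloud:
# count tag frequencies, then linearly scale counts to display sizes.
from collections import Counter

def tag_cloud(bookmarks, min_size=1, max_size=5):
    """bookmarks: list of (url, [tags]); returns {tag: display_size}."""
    counts = Counter(tag for _, tags in bookmarks for tag in tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {tag: min_size + round((n - lo) * (max_size - min_size) / span)
            for tag, n in counts.items()}

bookmarks = [
    ("nsf.gov/award/1", ["cyberinfrastructure", "grids"]),
    ("nsf.gov/award/2", ["grids"]),
    ("nsf.gov/award/3", ["grids", "education"]),
]
print(tag_cloud(bookmarks))  # {'cyberinfrastructure': 1, 'grids': 5, 'education': 1}
```

The same frequency table supports the interest-matching feature: two users whose tag clouds share high-frequency tags are candidate collaborators.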

Earthquake Crisis Management in a Grid of Grids Architecture

This DoD Phase II SBIR is led by Anabas, with CGL and Ball Aerospace as subcontractors, and is creating an environment to build and manage Net-Centric Sensor Grids from services and component Grids. CGL technologies including our GIS and NaradaBrokering systems are used, and CGL will also supply non-military applications including earthquake crisis management. The project currently focuses on wireless sensors (RFID, GPS, Lego robot and video sensors) that are integrated and managed using lightweight Linux computers (Nokia N800 tablets and Gumstix miniature computers); these will be supported in an initial system that will allow initial deployment and dynamic real-time management of collaborative sensor Grids.

Particle Physics Analysis Grid

This DoE Phase II STTR aims at an interactive Grid using streaming data, optimized for the physics analysis stage of LHC data grids. This differs from the mainstream work of the Open Science Grid and EGEE, which concentrates on the initial batch processing of the raw data. We have come up with a novel concept (“Rootlets”) that provides a distributed collaborative implementation of the important CERN Root analysis package. We have built a prototype based on CGL’s NaradaBrokering and the Clarens software from our collaborators at Caltech. It allows collaborative data analysis from multiple distributed repositories and can be applied to any of a class of data analysis approaches we call composable. Interestingly, this includes information retrieval applications, and in the future we will support Google MapReduce and the statistics package R.
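A "composable" analysis in this sense is one where the same map and reduce steps can run independently over each distributed repository and the partial results merge associatively, which is why MapReduce-style applications fit the model. A minimal sketch of the pattern follows; it is illustrative only, with hypothetical names, since the Rootlet implementation operates on Root data rather than Python lists:

```python
# Toy illustration of the composable analysis pattern: the same map/reduce
# pair runs locally over each data partition (e.g. at its own repository),
# and the partial results combine associatively into a global answer.
from functools import reduce

def composable_analysis(partitions, map_fn, reduce_fn):
    # Analyze each partition independently (this is the distributable step)...
    partials = [reduce(reduce_fn, map(map_fn, part)) for part in partitions]
    # ...then merge the partial results; associativity makes order irrelevant.
    return reduce(reduce_fn, partials)

# Example: a global event count over two "repositories" of events.
partitions = [[{"e": 1}, {"e": 2}], [{"e": 3}]]
total = composable_analysis(partitions, map_fn=lambda ev: 1,
                            reduce_fn=lambda a, b: a + b)
print(total)  # 3
```

Histogramming, counting, and many information retrieval statistics fit this shape, which is what makes the class broader than particle physics alone.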

Polar Grid

This is a new activity stemming from our collaboration with Elizabeth City State University (ECSU, an HBCU) in North Carolina. We are working with the CReSIS NSF Science and Technology Center, led by the University of Kansas, to define and implement Cyberinfrastructure to support modeling and remote sensing of ice sheets. The recent dramatic evidence of the impact of climate change on the Polar Regions makes this an urgent project of great societal importance. CGL Assistant Director Marlon Pierce spent a week in July at ECSU instructing students and research staff on Grid computing, deploying a Condor high throughput computing testbed, and establishing requirements for their science gateway to Polar Grid. We were awarded an NSF Major Research Instrumentation (MRI) grant for this work, which will deploy field and base Sensor Grids linked to dedicated analysis systems (Linux clusters) at Indiana University and ECSU. The first stage of this work focuses on data analysis with parallel SAR (Synthetic Aperture Radar) algorithms and the second stage on a new generation of simulation models for glaciers and their melting. These will exploit data gathered by CReSIS and analyzed on Polar Grid.

QuakeSim and GIS Grid Project

The QuakeSim project (formerly known as SERVOGrid) was refunded through NASA’s AIST and ACCESS programs.  The AIST funding continues work led by Dr. Andrea Donnellan at NASA JPL to build the distributed computing infrastructure (i.e. Cyberinfrastructure) begun under previous NASA AIST and CT program grants.  The Community Grids Lab’s focus in this project is to convert the QuakeSim portal and services into an NSF TeraGrid Science Gateway.  We have updated the QuakeSim portal to be compliant with current Java and Gateway standards.  We are also developing workflow and planning services based on the University of Wisconsin’s Condor-G software that will enable QuakeSim codes such as GeoFEST and Virtual California to run on the best available NSF and NASA supercomputers.

The NASA ACCESS project is a joint project that combines team members from the QuakeSim project with the NASA REASoN project.  Our work here is to develop and exchange portal components and Web Services with the REASoN team.   Exchanged components include GRWS (a GPS data service developed by UCSD/Scripps), Analyze_tseri (portlets and services developed by CGL and adopted by the REASoN team), and RDAHMM (GPS data mining services developed by CGL using JPL codes and adopted by the REASoN team).  The RDAHMM portlets and services are currently being expanded to allow historical analysis of network state changes in the SCIGN (Southern California) and BARD (Northern California) GPS networks. We have also developed services and portlets for interacting with real-time GPS data streams from the California Real Time Network (CRTN).  This stream management was based on CGL’s NaradaBrokering software, and we demonstrated its scalability to networks 20 times the size of the current CRTN.

Our work during this period was dominated by a complete redevelopment of the QuakeSim portal and several of its Web Services for GPS station analysis and seismic deformation analysis.  These included major revisions to the GeoFEST, Disloc, Simplex, Analyze_tseri, and RDAHMM services to make them more self-contained and independent of the portal clients (that is, they can be easily used by other client applications, such as the Taverna workflow composer).   We built portlet web interfaces that combine Java Server Faces and Ajax/Google Maps.  We have also recently developed a plotting service that produces Google KML markups of grids and vector points, useful for representing the results of applications such as Disloc and Simplex.
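For reference, producing KML for vector points is straightforward: each point becomes a Placemark containing a Point with a coordinates element. The sketch below is an assumption-laden illustration of the format, not the plotting service's actual code:

```python
# Hedged sketch of emitting Google Earth KML placemarks for named points.
# Hypothetical function and station names; illustrative of the format only.
import xml.etree.ElementTree as ET

def points_to_kml(points):
    """points: list of (name, lon, lat); returns a KML document as a string."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    doc = ET.SubElement(kml, "Document")
    for name, lon, lat in points:
        pm = ET.SubElement(doc, "Placemark")
        ET.SubElement(pm, "name").text = name
        pt = ET.SubElement(pm, "Point")
        # KML coordinates are longitude,latitude,altitude.
        ET.SubElement(pt, "coordinates").text = f"{lon},{lat},0"
    return ET.tostring(kml, encoding="unicode")

print(points_to_kml([("GPS-01", -117.5, 34.1)]))
```

A file in this form loads directly into Google Earth or the Google Maps API, which is what makes KML a convenient output format for deformation and GPS results.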

Open Grid Computing Environments (OGCE)

The OGCE project provides downloadable, generic portal software for building scientific Web portals and gateways.  This NSF-funded project is a consortium of several universities and is led by CGL.  The OGCE project won a major continuation award from the NSF Office of Cyberinfrastructure this year, allowing us to continue the work initially begun under the NSF Middleware Initiative program in 2003.  The OGCE website (also recently revised) is http://www.collab-ogce.org.

A significant milestone was the release of version 2.2 of the core portal software, which completely reorganized and streamlined the build system.  This build system has been integrated with the NMI testbed to provide nightly builds on over 25 operating systems (Mac OS and Linux variants).  The OGCE 2.2 release includes several portlets and services that are designed to work with the NSF’s TeraGrid.  These include job submission and management portlets (GRAM, Condor, Condor-G), information portlets (GPIR and QBETS), and remote file management (FileManager), which allows users to interact with data files on IU’s Data Capacitor and HPSS storage system.  The OGCE Workflow Suite (XBaya, XRegistry, and GFAC components, all adapted from software developed by the NSF-funded LEAD project at IU) is a major new addition to the download, allowing users to create composite jobs out of individual Web services.  The OGCE’s other major release was the beta version of Grid Tag Libraries and Beans (GTLAB), an XML markup language that extends Java Server Faces and greatly simplifies the development of Grid portlets using reusable tag libraries.  In the same spirit, we are collaborating with Gregor von Laszewski at Rochester Institute of Technology to develop a JavaScript version of the CoG Kit to provide Web 2.0-compatible Grid client development libraries.

The OGCE portlet components can be deployed into Java Specification Request 168 compliant containers such as GridSphere and Sakai.  Our modified build system uses the GridSphere container by default but is extensible to support other containers.  We are modifying our build process to give developers a choice between Sakai and GridSphere containers in the automated builds. 

Also under the OGCE banner, we continued our collaboration with Dr. Rick McMullen’s PTL laboratory on their CIMA portal.  A CGL graduate student is currently completing the development of a set of instrument Atom news feeds.  These are web-based content feeds of CIMA instrument metadata that can be integrated with popular news-readers such as iGoogle and Sage.

Finally, the OGCE team led the third Grid Computing Environments workshop (GCE 07) at Supercomputing.  This year’s workshop featured over 20 peer-reviewed and invited talks.

NaradaBrokering Project

As part of the NaradaBrokering project we had 9 new releases (versions 1.3.2, 2.0.1, 2.0.2, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2 and 3.1.3) within this reporting period. These releases added support for graphical deployment of distributed broker networks, improved performance at high publish rates, and resolved compatibility issues with Microsoft’s new Vista operating system.

During this timeframe we presented our research on securely tracking the availability of entities in distributed systems, typically a precursor to any fault tolerance scheme that masks failures in distributed components. This research was presented at the 21st IEEE International Parallel and Distributed Processing Symposium in Long Beach, California.

Failures often take place in distributed settings. As part of our research we have developed a framework for incorporating fine-tunable redundancies in our reliable delivery scheme. Our algorithm guarantees reliable delivery of streams in the presence of multiple failures within the system by fine-tuning the redundancy associated with the repositories responsible for storing streams.  This work was presented at the 2007 IEEE Conference on Autonomic Computing in Jacksonville, Florida.
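The idea of tunable redundancy can be illustrated simply: if each stream event is stored on R of N repositories, replay survives the loss of up to R-1 repositories, and R is the knob that trades storage cost against fault tolerance. The sketch below is a toy illustration of that trade-off; it is not NaradaBrokering's implementation, and all names are hypothetical:

```python
# Toy model of fine-tunable redundancy for stream storage: each event is
# written to `redundancy` of the N repositories (round-robin by sequence
# number), so replay tolerates up to redundancy-1 repository failures.
class RedundantStreamStore:
    def __init__(self, n_repositories, redundancy):
        assert 1 <= redundancy <= n_repositories
        self.repos = [dict() for _ in range(n_repositories)]
        self.r = redundancy

    def store(self, seq, event):
        # Place each event on r repositories chosen by sequence number.
        for k in range(self.r):
            self.repos[(seq + k) % len(self.repos)][seq] = event

    def replay(self, failed=()):
        # Reconstruct the ordered stream from the surviving repositories.
        merged = {}
        for i, repo in enumerate(self.repos):
            if i not in failed:
                merged.update(repo)
        return [merged[s] for s in sorted(merged)]

store = RedundantStreamStore(n_repositories=3, redundancy=2)
for seq, event in enumerate(["a", "b", "c", "d"]):
    store.store(seq, event)
print(store.replay(failed={1}))  # ['a', 'b', 'c', 'd']
```

Raising `redundancy` to 3 would make the same stream survive two simultaneous repository failures, at 50% more storage cost.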

The Clarens effort by the High Energy Physics Group at Caltech has developed a framework for wrapping capabilities within ROOT (a powerful particle physics analysis software suite from CERN) as distributed services. As part of a collaborative effort between IU and Caltech, we are developing a loosely-coupled framework using NaradaBrokering to discover, and load-balance accesses to, services that are available to physicists for analyzing, and collaborating over, data produced in particle physics experiments. This included a site visit in March 2007 to lay the groundwork for the algorithms and use cases.

A prototype system for collaborative analysis of particle physics data was demonstrated at the Supercomputing conference in Reno, Nevada and was very well received.

OMII Software

We were funded by the UK Open Middleware Infrastructure Institute (OMII) to develop core Web Service (Grid) support for reliable messaging (FIRMS) and notification (FINS). Both these software packages have been successfully deployed within the latest version of the OMII Container. These projects are complete.

Collaboration Grids - Global MMCS (MultiMedia Collaboration System)

This project generated important input for the audio/video transport component of NaradaBrokering. We are now focusing on improving the core infrastructure and on the application to e-Sports: sharing and annotating real-time video between trainers and athletes. We are exploring collaborations with China on the 2008 Olympics through a project entitled e-Sports.

e-Sports

This is a new effort (spun off from GlobalMMCS and working closely with NaradaBrokering development) that we have started in the past few months to enable the following capabilities:

  1. Manage streams: Play multiple streams while eliminating network induced effects through the use of services for ordering, buffering and jitter reduction.
  2. Enable active replays: This is the ability to playback certain sections of a live stream while retaining the ability to revert to the live streams at any time.
  3. Stream annotation: This refers to the ability to annotate streams, with text or graphics, and to replay the annotated streams at a later time. The newly modified stream is then made available for playback or further annotation.
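The buffering and ordering service in item 1 is conceptually a jitter buffer: packets arriving out of order are held until the sequence is contiguous, then released in order. The following is a toy sketch of the idea with hypothetical names; the real system operates on RTP media streams, not strings:

```python
# Toy jitter buffer: hold out-of-order packets and release them in sequence,
# removing network-induced reordering before playback.
import heapq

class JitterBuffer:
    def __init__(self):
        self.heap = []      # min-heap of (sequence number, payload)
        self.next_seq = 0   # next sequence number owed to the player

    def push(self, seq, payload):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        # Release packets only while the sequence is contiguous.
        out = []
        while self.heap and self.heap[0][0] == self.next_seq:
            out.append(heapq.heappop(self.heap)[1])
            self.next_seq += 1
        return out

buf = JitterBuffer()
buf.push(1, "frame1")
print(buf.pop_ready())  # [] -- frame0 has not yet arrived
buf.push(0, "frame0")
print(buf.pop_ready())  # ['frame0', 'frame1']
```

Active replay (item 2) layers on top of the same machinery: a replay resets `next_seq` into the recorded past while the live stream continues to accumulate in storage.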

We have developed several components of this system, and expect to have a functional prototype of this system in the next few months.

Educating the Residents of Indiana and Beyond (includes outreach)

The Community Grids Laboratory has major activities in outreach to Minority Institution faculty and students. These efforts are motivated by the observation of Dr. Richard Tapia, Rice University Professor and distinguished Hispanic American scientist:

“No first-world nation can maintain the health of its economy or society when such a large part of its population remains outside all scientific and technological endeavors.”

We have been successful in four proposals in this area (three to NSF and one to the Lumina Foundation, Indianapolis). Our work hinges on the observation that Cyberinfrastructure and its underlying Grid technology inherently bridge the Digital Divide and can broaden participation in science and provide better education and business opportunities. Currently our activities focus on providing education and health applications for the Navajo Nation’s Grid and on working with the HBCUs Elizabeth City State and Jackson State. We hosted an undergraduate student this summer from Jackson State as part of IU’s university-wide HBCU initiative. There are clear ways that our work could be extended to K-12 education, but proposals in this area have not been successful.

Prof. Fox and CGL staff members frequently lecture on their research and broader topics as part of seminars and courses offered at Indiana University and IUPUI.  A comprehensive list of presentations is available from http://grids.ucs.indiana.edu/ptliupages/presentations/.  The following presentations highlight our outreach seminars and lectures to students and general (non-technical, non-specialist) audiences:

  • Geoffrey Fox, “Net-Centric Sensor Grids” Seminar Presentation at Indiana University November 27 2007.
  • Geoffrey C. Fox and Marlon E. Pierce, “Web 2.0 for eScience: SC07 Education Program Tutorial,” Education Program Tutorial at SC07 November 12 2007 Reno Nevada.
  • Geoffrey Fox, “Computational Infrastructure for Policy Informatics,” Workshop on Policy Informatics in an Interdependent World, Washington DC September 13 2007.
  • Marlon Pierce, “Web Service Foundations: WSDL and SOAP,” I590 Class IUPUI April 5 2007.
  • Geoffrey Fox, “Informatics and Particle Physics Experiments” lecture at I573  Class March 27 2007.
  • Marlon Pierce, “QuakeSim: Grid Computing, Web Services, and Portals for Earthquake Science,” Department of Mechanical Engineering IUPUI January 25 2007.
  • Geoffrey Fox, “Global Grids Web 2.0 and Globalization Informatics,” Colloquium, Indiana University, January 12 2007.
  • Geoffrey Fox, “Cyberinfrastructure across the Globe,” Seminar to Indiana University Computer Science Honors Students, January 8 2007.

Accelerating Economic Growth

The Community Grids Lab partners with several small business ventures.  These are described in greater detail in the previous sections.

  • We have won a DOD Phase II SBIR to dynamically build and manage Grids.  CGL partners with Anabas (a small startup company that leads the project) and Ball Aerospace.
  • We have won a DOE Phase II STTR with Caltech and Deep Web Technologies.
  • Anabas and CGL have recently won an additional DOE Phase I STTR for collaborative visualization systems for Plasma Physics.

Bringing Distinction to Indiana University and the State of Indiana

Geoffrey Fox continues as Vice President responsible for eScience for the Open Grid Forum. He was program chair of two major conferences this year: the annual eScience conference in Bangalore, India in December 2007, and the Open Grid Forum event in Seattle in October 2007. Indiana University will host the 2008 eScience event at IUPUI in December with Fox as general chair, in collaboration with Prof. Dennis Gannon. An interesting milestone is that Fox has reached a total of 57 Ph.D. theses supervised. Fox has also been given courtesy positions at the University of Southampton (UK, renewal), the University of Houston Downtown and the Alliance for Equity in Higher Education, recognizing the importance of collaborative work.

Lab Outlook January-June 2008

Grid Architecture

We continue this foundation activity focusing on interaction of Grid, Web 2.0 and digital library technology.

Parallelism and Multi-core Chips

We expect this activity to grow in importance with a focus on applications that are likely on future multicore clients. This application work will be in conjunction with research in new run-time systems.

Semantic Scholar Grid

This flagship Web 2.0 activity will be augmented by a broader range of projects looking at ways of using Web 2.0 approaches in file transfer, document sharing and people networking in science and education.

Chemical Informatics and Cyberinfrastructure Collaboratory (CICC)

We will continue our work to populate new data services that add value to NIH PubChem. This work will focus on the calculation, storage, and efficient search of structural conformers for drug-like molecules.  In addition to being computationally intensive, this work will expand our databases by a factor of 10 or more.  It is therefore crucial that we investigate scalable database partitioning techniques.  This is naturally done using the clustering techniques developed in our Multicore work, so we will implement partitioning on that basis.
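The partitioning idea can be sketched simply: cluster the molecules by a feature vector and assign each cluster to its own database shard, so that a similarity query needs to touch only the few partitions near it in feature space. The code below is a hypothetical illustration with a toy one-dimensional feature and a naive k-means loop, not our production scheme:

```python
# Hypothetical sketch: cluster feature vectors with naive k-means, then treat
# each cluster as a database shard. Feature choice and names are illustrative.
def kmeans_partition(vectors, k, iters=10):
    centers = vectors[:k]  # naive seeding: first k vectors
    groups = [[] for _ in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in vectors:
            # Assign each vector to its nearest center.
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(v, centers[c])))
            groups[i].append(v)
        # Recompute centers as the mean of each group (keep old center if empty).
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups  # one list of vectors per shard

shards = kmeans_partition([(0.0,), (0.1,), (9.0,), (9.2,)], k=2)
print([len(s) for s in shards])  # [2, 2]
```

In the real setting the vectors would be high-dimensional molecular descriptors, and shard balance (avoiding one giant cluster) is part of what makes the scalability question nontrivial.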

CICC also provides a significant opportunity for investigations in applying Web 2.0 techniques to e-Science.  We are evaluating the use of Yahoo Pipes as an example workflow/mashup building tool that can be used to encode scientific use cases combining several CICC RSS feeds.  We are also investigating problems in adapting Start Pages such as Netvibes and iGoogle to support ecosystems of user interface widgets, including CICC gadgets.  Finally, we will investigate the important new area of microformatting for metadata expression and management.

Open Grid Computing Environments (OGCE)

The OGCE project has recently been awarded its second NSF grant from the Office of Cyberinfrastructure.  Our  next major release will be available for the TeraGrid Conference in June 2008 and will be accompanied by a hands-on tutorial.  The release will include an enhanced version of our workflow suite tools and components for interacting with the TeraGrid information services.

Minority-Serving Institutions Cyberinfrastructure Outreach Projects

Our recent awards give us plenty to do, with a Web 2.0 portal for MSI faculty and students as a major focus. A hard problem is identifying those scientists at MSIs who are good candidates for collaborating on eScience projects. Our idea is to promote self-identification (bottom-up) using Web 2.0, rather than traditional top-down approaches that tend to locate the same small group of outreach candidates.

We are currently completing the initial phase of the MSI-CIEC portal, which combines various Web 2.0 approaches (AJAX, shared bookmarking, profile building, researcher matchmaking, and community building).  This portal will be available starting January 15.  We will also investigate integrating the MSI-CIEC portal with Facebook and the Google-led OpenSocial networking sites.

Earthquake Crisis Management in a Grid of Grids Architecture

This work will continue with a focus on sensor Grids integrated with lightweight computing devices.

Polar Grid

We will work with Elizabeth City State University to bring up a prototype Polar Grid and Science Gateway.  This will include workshops on data analysis, Grid and Portal architecture. A major challenge will be support of data gathering expeditions from May 1, 2008 to June 15, 2008 in Greenland and December 2008 to January 2009 in Antarctica.

NaradaBrokering and Particle Physics Analysis Grid

During this period we plan to release software that incorporates production implementations of our scheme for the scalable tracking of distributed entities. An additional capability will be support for the voluminous recording and replay of multimedia streams produced within the eSports project.

We also plan to release a prototype version of the framework, using NaradaBrokering, which will be used within Clarens to discover, and load-balance accesses to, services that are available to physicists for analyzing, and collaborating over, data produced in particle physics experiments.

QuakeSim and GIS Work

We also plan several enhancements to existing portlets and services for better integration with GPS analysis applications.  One of our goals is to extend the current portal into a TeraGrid Science Gateway for selected codes, particularly GeoFEST.  This will allow us to use the much larger high performance computing resources that the TeraGrid provides.  We will be using our GTLAB project (from the OGCE work) to provide these new capabilities.

e-Sports: We have developed several components of the eSports system and expect to have a functional prototype in the next few months. This software will harness the distributed repository capability available within NaradaBrokering to facilitate the recording and replay of multimedia streams.