
XPS: FULL: DSD: Collaborative Research: Rapid Prototyping HPC Environment for Deep Learning

Sponsored by: National Science Foundation


The impact of Big Data is all around us and is enabling a plethora of commercial services. Furthermore, it is establishing the fourth paradigm of scientific investigation, in which discovery comes from mining data rather than from theories verified by observation. Big Data has established a new discipline (Data Science) with vibrant research activities across several areas of computer science. This “Rapid Python Deep Learning Infrastructure” (RaPyDLI) project advances Deep Learning (DL), a novel and exciting artificial intelligence approach to Big Data problems that also involves a sophisticated model and a corresponding “big compute” requirement calling for high-end supercomputer architectures. DL has already seen success in areas such as speech recognition, drug discovery, and computer vision, where self-driving cars are an early target. DL offers a very general, unbiased way of analyzing large data sets, inspired by the view of the brain as a set of connected neurons. As with the brain, the artificial neurons learn from experience, captured in a “training dataset”, and the resulting “trained network” can then be used to make decisions. Trained on voices, a DL network can enhance voice recognition; trained on images, it can recognize the objects they contain. A recent study by the Stanford participants in this project trained 10 billion connections on 10 million images to recognize objects in an image. This study used a dataset approximately 0.1% the size of the data “learnt” by an adult human in a lifetime and one billionth of the total digital data stored in the world today. The 1.5 billion images uploaded to social media sites every day underscore the staggering size of Big Data. The project aims to enhance DL by allowing it to use large supercomputers efficiently and by providing a convenient DL computing environment that enables rapid prototyping, i.e., interactive experimentation with new algorithms.
This will enable DL to be applied to much larger datasets, such as those “seen” by a human in a lifetime. The RaPyDLI partnership of Indiana University, the University of Tennessee, and Stanford makes this possible with expertise in parallel computing algorithms and runtimes, Big Data, clouds, and DL itself.

RaPyDLI will reach out to DL practitioners through workshops, both to gather requirements for its software and to obtain feedback on it. Furthermore, it will proactively reach out to under-represented communities with summer experiences and DL curriculum modules that include demonstrations built as “Deep Learning as a Service”.

RaPyDLI will be built as a set of open-source modules that can be accessed from a Python user interface but executed interoperably in a C/C++ or Java environment on the largest supercomputers or clouds, with interactive analysis and visualization. RaPyDLI will support GPU accelerators and Intel Xeon Phi coprocessors, as well as a broad range of storage approaches including files, NoSQL, HDFS, and databases. In addition to software, RaPyDLI will include benchmarks and will offer a repository where users can contribute high-level code for a range of neural networks, benefiting both research and education.