Contract Awarded to Develop Computational Chemistry Dataset Storage Solution

Illinois Rocstar has been awarded a contract entitled “The Chemlab Chemistry and Materials Science Lifecycle Data Repository.” The Chemlab project is designed not only to give scientists and engineers a place to archive and document their simulation datasets, including provenance information and output data, but also to offer a location where those datasets can be shared and found again later.

Many sources of chemical and materials data are available on the web; some are commercial, while others are freely available. Many universities (e.g., the University of Illinois and MIT) maintain pages dedicated to helping their students and researchers find databases that can be accessed through university library subscriptions. There are even data repositories for specific disciplines. There is no shortage of data sources, although it can still be difficult to find what is needed, especially for highly specific or otherwise unusual compounds and materials. The Chemlab project does not aim to be another chemistry/materials database, although we do intend to help researchers locate other databases. Rather, it stems from previous work Illinois Rocstar has performed in the area of Simulation Lifecycle Management. Scientists and engineers performing chemistry- and materials-oriented modeling and simulation (M&S) certainly need to find appropriate data for their simulation activities, but another important and rarely addressed facet of M&S is the so-called “dataset lifecycle” of the simulation datasets and results themselves: birth→document→use→archive→search→reuse.

From a community perspective, a place to share data (when appropriate) is paramount. Other researchers with permission will be able to find and use the datasets, which in turn facilitates collaboration and reuse of data. Our vision is to make a public version of Chemlab freely available on the web, giving researchers the option to make their datasets public. Private Chemlab instances will also be available so that proprietary or sensitive data that cannot be shared may be stored behind company, government, or university firewalls. By implementing a multisite search capability, we intend to allow private organizations to keep their data private while still being able to search the public instance seamlessly.