Giving big data publishing the royal treatment

U.K. royal society jumps into 21st century with the MarkLogic NoSQL database, opening 170 years of content to public view

The Royal Society of Chemistry (RSC) in the United Kingdom is Europe's largest organization for advancing the chemical sciences. The RSC is also 170 years old -- so it's no surprise the assets it's accumulated since the 1840s would be unwieldy to manage, publish, and otherwise make more broadly available. The recent explosive growth of digital assets has only exacerbated the problem.

A NoSQL database from MarkLogic offered the solution RSC was looking for, unlocking a treasure trove of assets and enabling the RSC to publish three times as many journals and four times as many articles. It also gave the Society the ability to develop new educational applications to make chemistry accessible to a wider audience.

Supported by an international publishing business and worldwide members, the RSC's activities span education, conferences, science policy, and the promotion of chemistry to the public. Its history is rooted in a combination of societies that were integrated as one in 1980: The Chemical Society, The Society for Analytical Chemistry, The Royal Institute of Chemistry, and The Faraday Society. The accumulated content includes more than 1 million images, millions of science data files, and hundreds of thousands of articles from more than 200,000 authors. On top of that, add the recent capture of social media, video, and other digital content.

RSC determined the MarkLogic document database was the right solution to create one integrated repository -- and make it easily accessible to anyone online, from entrepreneurs to researchers to educators around the world. The key to MarkLogic is in how it stores content as XML documents: Information that should not or cannot be expressed in a straightforward fashion as rows and columns -- such as contracts, manuals, books, emails, tweets, and metadata -- is well suited to MarkLogic's XML-based, document-centric model.

David Leeming, manager at the projects office for the RSC, commented, "A book chapter is very different to a journal article whereas in a relational model you couldn't work that out, and you couldn't put those two together. We can just fill out all our XML into MarkLogic and actually then bring it out together as a single integrated delivery mechanism."
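To make Leeming's point concrete, here is a minimal sketch of how two differently shaped XML documents can sit side by side and still be searched with one query. It uses Python's standard library rather than MarkLogic's own API, and the element names, titles, and the `search` helper are hypothetical stand-ins, not the RSC's actual markup or code.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with different shapes (not the RSC's real markup).
JOURNAL_ARTICLE = """
<article>
  <title>Catalysis in ionic liquids</title>
  <journal>Example Journal of Chemistry</journal>
  <abstract>A study of catalysis in ionic liquid solvents.</abstract>
</article>
"""

BOOK_CHAPTER = """
<chapter number="3">
  <book>Example Handbook of Solvents</book>
  <title>Ionic liquids as solvents</title>
  <section><para>Ionic liquids dissolve many catalysts.</para></section>
</chapter>
"""


def search(documents, term):
    """Return (root tag, title) for every document whose text mentions term."""
    hits = []
    for xml_text in documents:
        root = ET.fromstring(xml_text)
        # Flatten all text in the document, whatever its structure.
        full_text = " ".join(root.itertext()).lower()
        if term.lower() in full_text:
            hits.append((root.tag, root.findtext("title")))
    return hits


results = search([JOURNAL_ARTICLE, BOOK_CHAPTER], "ionic liquid")
print(results)
```

Neither document had to be reshaped into a shared table layout first; the query simply walks whatever structure each one brings. A production document database does this with indexes rather than a linear scan, so treat the loop above as an illustration of the idea only.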

A well-known trait of many NoSQL systems is their schema-less nature: the structure of the data does not have to be rigidly defined before the application can be built, whereas a fixed schema is a standard requirement for applications built on relational databases. With MarkLogic, information can be loaded as is, which is especially efficient for indexing and querying information with poorly defined, changing, and/or unknowable schemas.
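As a rough illustration of what "loaded as is" can mean, the sketch below indexes whatever elements each document happens to contain, with no schema declared beforehand. It is written with Python's standard library and is not MarkLogic's indexing mechanism; the `index_document` helper and the sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_document(index, doc_id, xml_text):
    """Record every (element name, word) pair found in the document."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Only the text directly inside each element is indexed here.
        for word in (element.text or "").lower().split():
            index[(element.tag, word)].add(doc_id)


# The index starts empty: no element names or types are declared up front.
index = defaultdict(set)
index_document(
    index, "doc1",
    "<article><title>Green chemistry</title><author>A. Example</author></article>",
)
index_document(
    index, "doc2",
    "<video><title>Chemistry for kids</title><duration>120</duration></video>",
)

print(sorted(index[("title", "chemistry")]))
print(sorted(index[("duration", "120")]))
```

The second document introduces a `duration` element the first one never had, and it becomes searchable immediately. In a relational design, the equivalent change would typically mean altering a table definition before any data could be loaded.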

Each piece of content is automatically tagged, which allows users to discover content quickly and understand the context around it, connecting the dots between different pieces of research, video, journal articles, or images. RSC's platform has also added new applications for children, journals for researchers, social features, and mobile capabilities, all powered by MarkLogic.

Dr. Robert Parker, RSC Chief Executive, said, "The RSC began the process of making its data more open, social and mobile in 2010 and chose MarkLogic to bring all of its data into one database capable of handling any data, at any volume, in any structure. The project has already resulted in a 30 percent increase in the number of visitors accessing its 500,000 journal articles, a 70 percent increase in volume of searches on its educational websites, and an expanded international profile, with significant growth in visits from researchers in India, China, and Brazil."

This article, "Giving big data publishing the royal treatment," was originally published at InfoWorld.com. Read more of Andrew Lampitt's Think Big Data blog, and keep up on the latest developments in big data at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Copyright © 2013 IDG Communications, Inc.