COVID-19 Datasets Bring AI Experts, Life Sciences Researchers Together For A Cure

The COVID-19 Open Research Dataset has been built by the Bio-IT community to support research into treatments for the virus. (GETTY IMAGES)

By Allison Proffitt, Editorial Director, AI Trends

The entire Bio-IT community is eager to contribute to treatments, diagnostics, and vaccines for SARS-CoV-2 and the disease it causes, COVID-19. Companies are donating consulting services, compute resources, tools for clinical trials, and much more. But the biggest donation may be the sheer volume of data being pooled for researchers to mine for answers.

On March 16, the Allen Institute for AI (AI2), Chan Zuckerberg Initiative (CZI), Georgetown University’s Center for Security and Emerging Technology (CSET), Microsoft, and the National Library of Medicine (NLM) released the COVID-19 Open Research Dataset (CORD-19).

The dataset, accessible through the Allen Institute for AI’s Semantic Scholar platform, includes scholarly literature about COVID-19, SARS-CoV-2, and the coronavirus group.

Semantic Scholar is a free, AI-powered tool for navigating scientific literature, Doug Raymond, general manager of Semantic Scholar, told AI Trends. Established in 2015, Semantic Scholar collects peer-reviewed journal articles, publications from preprint servers, related GitHub repositories, blog posts, clinical trial data, presentations, videos, and more; in all, it includes more than 180 million papers.

Doug Raymond, General Manager, Semantic Scholar

The CORD-19 dataset currently includes more than 47,000 scholarly articles, 36,000 of them full-text articles from PubMed, identified with a search query covering COVID-19, coronavirus, SARS, MERS, and other relevant terms. Preprints from bioRxiv and medRxiv are included based on the same query. The dataset covers coronaviruses in general, and its papers date back to the 1970s, Raymond said.

“We’ve partnered with Elsevier, the World Health Organization, and a number of other institutions to get the full text of the articles, and then we’ve created a structured representation of this data in JSON format, which allows you to see all the metadata, the full text,” he said. “We’re planning to add additional metadata such as citations, which show the links between the different papers.”
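For readers who want to work with the data, here is a minimal Python sketch of reading one structured JSON record and pulling out its title, section names, and full text. It assumes a simplified record layout: the `paper_id`, `metadata`, `body_text`, `section`, and `text` field names are assumptions for illustration rather than a guaranteed description of the released files, and the sample record is made up, not real CORD-19 data.

```python
import json
from typing import Any, Dict

# A tiny, made-up record. The field names are ASSUMPTIONS about the general
# shape of a CORD-19 full-text JSON file, used here for illustration only.
SAMPLE_RECORD = """
{
  "paper_id": "example-0001",
  "metadata": {"title": "An example coronavirus paper", "authors": []},
  "abstract": [{"text": "A short abstract.", "section": "Abstract"}],
  "body_text": [
    {"text": "First paragraph.", "section": "Introduction"},
    {"text": "Second paragraph.", "section": "Introduction"},
    {"text": "Third paragraph.", "section": "Results"}
  ]
}
"""


def summarize_paper(raw_json: str) -> Dict[str, Any]:
    """Parse one JSON record and return its ID, title, section names, and full text."""
    doc = json.loads(raw_json)
    paragraphs = doc.get("body_text", [])
    return {
        "paper_id": doc.get("paper_id"),
        "title": doc.get("metadata", {}).get("title", ""),
        # dict.fromkeys keeps first-seen order while dropping repeated section names.
        "sections": list(dict.fromkeys(p.get("section", "") for p in paragraphs)),
        # Join paragraph texts in document order to recover the body text.
        "full_text": "\n".join(p.get("text", "") for p in paragraphs),
    }


if __name__ == "__main__":
    info = summarize_paper(SAMPLE_RECORD)
    print(info["title"])     # An example coronavirus paper
    print(info["sections"])  # ['Introduction', 'Results']
```

Keeping the text as a list of section-tagged paragraphs, rather than one blob, is what makes this kind of structured representation easier to mine than the original PDFs: a tool can search only the "Results" sections, for example.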

Researchers can download the CORD-19 dataset, which is currently updated weekly. Raymond says the team is working toward daily updates.

In addition to the data pool, the AI2 team has released tools. CoViz lets researchers identify associations between concepts that occur in the CORD-19 dataset, and CORD-19 Explorer is a search engine built on top of the dataset.

“Essentially this is a way to take what previously was thousands of PDFs of papers and make it very, very easy to review that literature for any particular research interest.”

Structure Advantage

There is, in fact, a wealth of information on COVID-19 and coronaviruses in general, and many groups are working to collect and share it. The World Health Organization has a COVID-19 Research Database, and the National Institutes of Health's LitCOVID resource also tracks COVID-19 literature. Microsoft has set up both a COVID-19 Resource Page and CORD-19 AI Powered Search. Overton has created a COVID-19 Policy Dataset, and the Cochrane Library has curated a COVID-19 Literature Review Collection.

“We’re sitting on this treasure trove of science we’ve created over—literally—the last century. We want to make anything relevant to COVID-19 open to the world to find a treatment and get us through what we’re going through right now, which is just surreal,” said Michael Dennis, echoing the thoughts of many.

Michael Dennis, VP of Innovation, Chemical Abstracts Service

Dennis is VP of Innovation at Chemical Abstracts Service (CAS), a division of the American Chemical Society. For more than 100 years, CAS has been cataloging small molecules, including their chemical structures, sequences, toxicity, and known biological activity. CAS has built a candidate compound dataset of about 50,000 compounds, chosen for their structural similarity to known antiviral compounds and for the druggability and toxicity of those structures. The collection is available within the CORD-19 dataset.

“It will give scientists a head start, if you will,” Dennis said.

CAS started by compiling a list of all known antivirals using SciFinder-n, the CAS discovery platform for mining the 100 million small molecules in the CAS Registry.

“We pulled out known antiviral compounds. One example is remdesivir; that has a CAS registry number and we know a lot about that molecule including its shape. We ended up with about 100 known antivirals. We didn’t focus on just COVID-19; we didn’t focus on just coronaviruses. We went a little broader,” Dennis said. From there, the team expanded the pool of candidates by running substructure and similarity searches for compounds structurally similar to those 100 known antivirals, and then refined the list by size, toxicity, and biological activity, looking for anti-infective agents, respiratory system agents, and enzyme inhibitors.

“We ended up with a candidate compound dataset of about 50,000 compounds,” Dennis said. “We can’t guarantee they’re going to treat a [viral infection], but they’re related to known antivirals based on all the work we did.”
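CAS has not published the details of its workflow, but the general idea of similarity searching can be illustrated. The sketch below is a generic example, not CAS's method: it assumes each compound is represented as a set of "on" bits from a structural fingerprint and scores pairs with Tanimoto similarity, a common measure in cheminformatics. The fingerprints, the 0.7 threshold, and the heavy-atom size cap are all hypothetical values chosen for illustration.

```python
from typing import Dict, FrozenSet, List

# HYPOTHETICAL representation: each compound is the set of "on" bit
# positions of a structural fingerprint.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity: shared bits divided by bits set in either."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def expand_candidates(
    known: Dict[str, Fingerprint],
    library: Dict[str, Fingerprint],
    heavy_atoms: Dict[str, int],
    threshold: float = 0.7,      # assumed similarity cutoff
    max_heavy_atoms: int = 50,   # assumed size cap for the refinement step
) -> List[str]:
    """Keep library compounds similar to any known reference and within the size cap."""
    hits = []
    for name, fp in library.items():
        # Best match against any known reference compound (0.0 if there are none).
        best = max((tanimoto(fp, ref) for ref in known.values()), default=0.0)
        # Compounds with no recorded size default to 0 and are not excluded by the cap.
        if best >= threshold and heavy_atoms.get(name, 0) <= max_heavy_atoms:
            hits.append(name)
    return sorted(hits)


if __name__ == "__main__":
    known = {"ref_antiviral": frozenset({1, 2, 3, 4})}
    library = {
        "cmpd_1": frozenset({1, 2, 3, 4, 5}),  # 4/5 = 0.80
        "cmpd_2": frozenset({1, 9}),           # 1/5 = 0.20
        "cmpd_3": frozenset({1, 2, 3}),        # 3/4 = 0.75, but too large below
    }
    sizes = {"cmpd_1": 30, "cmpd_2": 20, "cmpd_3": 80}
    print(expand_candidates(known, library, sizes))  # ['cmpd_1']
```

Splitting the work into a similarity pass and a separate filter mirrors the two stages Dennis describes: first widen the pool around known antivirals, then narrow it by properties such as size.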

CAS released its COVID-19 structures dataset in mid-March and made it available through the CORD-19 dataset hosted at Semantic Scholar. CAS is already working on additional datasets. “We’re starting to look at SAR data—structure activity relationship data. It has to do with how these molecules might bind to a target, a protein. That relationship is important in the treatment of any disease,” Dennis says.

Uniting Effort

Dennis says the CAS dataset has been downloaded by pharma companies, biotechs, and academic researchers all over the globe. Many are organizations CAS has had long relationships with, but some are new. “They’re organizations that aren’t traditional biotech or pharmaceutical companies. They’re organizations that focus more on software and AI. They normally wouldn’t license tools like SciFinder, but they want access to this kind of rocket fuel for their AI engines,” he said.

On the AI side, Raymond is seeing a similar convergence. “We’re seeing a lot of interest from both communities,” he said. “The NLP community which is using natural language processing techniques to try to unearth information embedded in this dataset is very much engaged and has been publishing tools and new reviews and information based on what we’ve released. We’re also seeing the medical research community take a great deal of interest in the resources as well.”

Both Dennis and Raymond believe that offering these biomedical datasets to both life sciences researchers and AI researchers will accelerate the discovery of a cure.

“I think it’s going to be a hybrid [effort],” Dennis said of the future cure. “I think it’s going to be the combo of the AI tech with the more traditional science that’s going to unlock the next treatment for COVID-19. And it’s out there. I’m 100% convinced we will find it.”

Raymond agreed. “We were founded as an AI institute for the common good. To have a threat like COVID-19 [impacts all of us.] It’s a great opportunity to show how AI can support a better way of doing science. We hope that not only are we able to help find treatments and ultimately a cure for COVID-19, but we’re able to accelerate scientific progress more generally.”

Learn more about the CORD-19 dataset.