
covid19sum

Description:

CORD-19 is a resource of over 45,000 scholarly articles, including over 33,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses.

To help organize information in the COVID-19 scientific literature through abstractive summarization, this dataset parses those articles into document-summary pairs of full_text-abstract or introduction-abstract.

Features include strings of: abstract, full_text, sha (hash of the PDF), source_x (source of publication), title, doi (digital object identifier), license, authors, publish_time, journal, url.

Additional Documentation: Explore on Papers With Code
Homepage: https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge
Source code: tfds.summarization.Covid19sum
Versions:
- 1.0.0 (default): No release notes.
Download size: Unknown size
Dataset size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
This dataset needs to be downloaded manually through the Kaggle API:
kaggle datasets download allen-institute-for-ai/CORD-19-research-challenge
Place the downloaded zip file in the manual folder.
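
As a minimal sketch of the manual-download workflow described above, the following Python helper checks that a zip file is present in the manual folder before asking TFDS to prepare the dataset. The helper names `manual_zips` and `load_covid19sum` are hypothetical and not part of TFDS; the sketch assumes `tensorflow_datasets` is installed and that the Kaggle zip has already been placed in the manual folder. It returns every split the builder produces rather than assuming a split name.

```python
import os

# Default manual folder named in the instructions above.
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/downloads/manual/"


def manual_zips(manual_dir=DEFAULT_MANUAL_DIR):
    """Return the sorted names of .zip files found in the manual folder."""
    manual_dir = os.path.expanduser(manual_dir)
    if not os.path.isdir(manual_dir):
        return []
    return sorted(
        name for name in os.listdir(manual_dir) if name.lower().endswith(".zip")
    )


def load_covid19sum(manual_dir=DEFAULT_MANUAL_DIR):
    """Prepare covid19sum from a manually downloaded zip and return its splits.

    Hypothetical convenience wrapper, not a TFDS function.
    """
    manual_dir = os.path.expanduser(manual_dir)
    # Fail early with a readable message if nothing was downloaded yet.
    if not manual_zips(manual_dir):
        raise FileNotFoundError(
            f"No zip file found in {manual_dir}; run the kaggle download first."
        )

    # Imported lazily so the folder check above works without TFDS installed.
    import tensorflow_datasets as tfds

    config = tfds.download.DownloadConfig(manual_dir=manual_dir)
    builder = tfds.builder("covid19sum")
    builder.download_and_prepare(download_config=config)
    # With no split argument, as_dataset() returns a dict of all splits.
    return builder.as_dataset()
```

Calling `load_covid19sum()` with no arguments uses the default manual folder; pass a different path if the zip file was saved elsewhere.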
Auto-cached (documentation): Unknown
Splits:

Split	Examples

Feature structure:

FeaturesDict({
    'abstract': string,
    'authors': string,
    'body_text': Sequence({
        'section': string,
        'text': string,
    }),
    'doi': string,
    'journal': string,
    'license': string,
    'publish_time': string,
    'sha': string,
    'source_x': string,
    'title': string,
    'url': string,
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
abstract	Tensor	string
authors	Tensor	string
body_text	Sequence
body_text/section	Tensor	string
body_text/text	Tensor	string
doi	Tensor	string
journal	Tensor	string
license	Tensor	string
publish_time	Tensor	string
sha	Tensor	string
source_x	Tensor	string
title	Tensor	string
url	Tensor	string

Supervised keys (See as_supervised doc): ('body_text', 'abstract')
Figure (tfds.show_examples): Not supported.
Examples (tfds.as_dataframe): Missing.
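
Because `body_text` is a sequence of section/text entries rather than a single string, the supervised pair `('body_text', 'abstract')` usually needs flattening before it can feed a summarization model. The sketch below shows one possible way to do that in plain Python. The helper names are hypothetical, and it assumes each example has already been converted to NumPy or Python values (for instance with `tfds.as_numpy`), so that `body_text` is a dict holding parallel `section` and `text` sequences of byte strings.

```python
def to_text(value, encoding="utf-8"):
    """Decode a bytes value (as yielded by TFDS string features) to str."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return str(value)


def join_body_text(body_text):
    """Flatten the body_text sequence into one document string.

    Each entry contributes its section title (when non-empty) followed by
    its text; entries are separated by a blank line.
    """
    parts = []
    for section, text in zip(body_text["section"], body_text["text"]):
        section, text = to_text(section), to_text(text)
        parts.append(f"{section}\n{text}" if section else text)
    return "\n\n".join(parts)


def to_summarization_pair(body_text, abstract):
    """Turn one (body_text, abstract) supervised example into (document, summary)."""
    return join_body_text(body_text), to_text(abstract)
```

With a dataset loaded using `as_supervised=True` and wrapped in `tfds.as_numpy`, each element is a `(body_text, abstract)` tuple that can be passed directly to `to_summarization_pair`. Joining sections with their titles is only one design choice; dropping the titles or keeping just the introduction section would work equally well with this structure.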
Citation:

@ONLINE {CORD-19-research-challenge,
    author = "An AI challenge with AI2, CZI, MSR, Georgetown, NIH & The White House",
    title  = "COVID-19 Open Research Dataset Challenge (CORD-19)",
    month  = "april",
    year   = "2020",
    url    = "https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge"
}
