- Description:
The Yahoo Learning to Rank Challenge dataset (also called "C14") is a Learning-to-Rank dataset released by Yahoo. The dataset consists of query-document pairs represented as feature vectors and corresponding relevance judgment labels.
The dataset contains two versions:
- set1: Containing 709,877 query-document pairs.
- set2: Containing 172,870 query-document pairs.
You can specify whether to use the set1 or set2 version of the dataset as follows:
import tensorflow_datasets as tfds
ds = tfds.load("yahoo_ltrc/set1")
ds = tfds.load("yahoo_ltrc/set2")
If only yahoo_ltrc is specified, the yahoo_ltrc/set1 option is selected by default:
# This is the same as `tfds.load("yahoo_ltrc/set1")`
ds = tfds.load("yahoo_ltrc")
Homepage: https://research.yahoo.com/datasets
Source code: tfds.ranking.yahoo_ltrc.YahooLTRC
Versions:
- 1.0.0: Initial release.
- 1.1.0 (default): Add query and document identifiers.
Download size: Unknown size
Manual download instructions: This dataset requires you to download the source data manually into download_config.manual_dir (defaults to ~/tensorflow_datasets/downloads/manual/):
Request access for the C14 Yahoo Learning To Rank Challenge dataset on https://research.yahoo.com/datasets. Extract the downloaded dataset.tgz file and place the ltrc_yahoo.tar.bz2 file in manual_dir/.
Supervised keys (See as_supervised doc): None
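For the manual download step, it can help to confirm that the archive is where TFDS will look before triggering dataset preparation. The following is a minimal sketch, not part of the dataset's own code: `archive_is_present` and `load_yahoo_ltrc` are hypothetical helper names, and the loader assumes the standard TFDS `tfds.download.DownloadConfig(manual_dir=...)` and `download_and_prepare_kwargs` parameters.

```python
from pathlib import Path

# Archive name and default directory as given in the manual download instructions.
ARCHIVE_NAME = "ltrc_yahoo.tar.bz2"
DEFAULT_MANUAL_DIR = "~/tensorflow_datasets/downloads/manual"


def archive_is_present(manual_dir=DEFAULT_MANUAL_DIR):
    """Return True if the manually downloaded archive is a file in `manual_dir`."""
    return (Path(manual_dir).expanduser() / ARCHIVE_NAME).is_file()


def load_yahoo_ltrc(config="set1", split="train", manual_dir=DEFAULT_MANUAL_DIR):
    """Hypothetical helper: load one split, pointing TFDS at `manual_dir`."""
    if not archive_is_present(manual_dir):
        raise FileNotFoundError(f"Expected {ARCHIVE_NAME} in {manual_dir}")

    # Imported lazily so the file check above works without TFDS installed.
    import tensorflow_datasets as tfds

    download_config = tfds.download.DownloadConfig(
        manual_dir=str(Path(manual_dir).expanduser())
    )
    return tfds.load(
        f"yahoo_ltrc/{config}",
        split=split,
        download_and_prepare_kwargs={"download_config": download_config},
    )
```

Calling `load_yahoo_ltrc("set2", split="vali")` would then fail early with a clear message if the archive has not been placed yet, rather than partway through preparation.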
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{chapelle2011yahoo,
title={Yahoo! learning to rank challenge overview},
author={Chapelle, Olivier and Chang, Yi},
booktitle={Proceedings of the learning to rank challenge},
pages={1--24},
year={2011},
organization={PMLR}
}
yahoo_ltrc/set1 (default config)
Dataset size: 795.39 MiB
Auto-cached (documentation): No
Splits:
Split | Examples |
---|---|
'test' | 6,983 |
'train' | 19,944 |
'vali' | 2,994 |
- Feature structure:
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 699), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
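The `None` dimensions in this structure suggest that each example holds one query together with a variable-length list of documents, so `doc_id`, `float_features`, and `label` should all share the same first dimension. A minimal sketch of a consistency check follows; `check_example` is a hypothetical helper, and the toy dictionary is a hand-made stand-in using plain Python lists rather than real dataset output.

```python
def check_example(example, num_features=699):
    """Check that one per-query example is internally consistent.

    Returns the number of documents listed for the query.
    Use num_features=700 for the set2 config.
    """
    num_docs = len(example["label"])
    if len(example["doc_id"]) != num_docs:
        raise ValueError("doc_id and label have different lengths")
    if len(example["float_features"]) != num_docs:
        raise ValueError("float_features and label have different lengths")
    for row in example["float_features"]:
        if len(row) != num_features:
            raise ValueError(f"expected {num_features} features, got {len(row)}")
    return num_docs


# Toy stand-in for one set1 example: a single query with three documents.
toy = {
    "query_id": "q1",
    "doc_id": [0, 1, 2],
    "float_features": [[0.0] * 699 for _ in range(3)],
    "label": [2.0, 0.0, 1.0],
}

print(check_example(toy))  # 3
```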
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
doc_id | Tensor | (None,) | int64 | |
float_features | Tensor | (None, 699) | float64 | |
label | Tensor | (None,) | float64 | |
query_id | Text | | string | |
yahoo_ltrc/set2
Dataset size: 194.92 MiB
Auto-cached (documentation): Yes
Splits:
Split | Examples |
---|---|
'test' | 3,798 |
'train' | 1,266 |
'vali' | 1,266 |
- Feature structure:
FeaturesDict({
'doc_id': Tensor(shape=(None,), dtype=int64),
'float_features': Tensor(shape=(None, 700), dtype=float64),
'label': Tensor(shape=(None,), dtype=float64),
'query_id': Text(shape=(), dtype=string),
})
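Because the description counts query-document pairs while each example appears to group all documents of one query, it is sometimes convenient to flatten an example into one row per pair. The sketch below assumes that grouping; `iter_pairs` is a hypothetical helper, and the toy dictionary is a hand-made stand-in rather than real dataset output.

```python
def iter_pairs(example):
    """Yield (query_id, doc_id, label, features) for each document of a query."""
    rows = zip(example["doc_id"], example["label"], example["float_features"])
    for doc_id, label, features in rows:
        yield example["query_id"], doc_id, label, features


# Toy stand-in for one set2 example: a single query with two documents.
toy_example = {
    "query_id": "q42",
    "doc_id": [0, 1],
    "float_features": [[0.1] * 700, [0.2] * 700],
    "label": [1.0, 3.0],
}

pairs = list(iter_pairs(toy_example))
print(len(pairs))  # 2
```

Summing the number of yielded pairs over every example of every split is one way to compare a prepared dataset against the pair counts quoted in the description.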
- Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
 | FeaturesDict | | | |
doc_id | Tensor | (None,) | int64 | |
float_features | Tensor | (None, 700) | float64 | |
label | Tensor | (None,) | float64 | |
query_id | Text | | string | |