Downloads

Download Datasets

DATASET	DESCRIPTION	LINKS
MNIST	The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.	Download
CIFAR-10	The CIFAR-10 dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images.	Download
CIFAR-100	This dataset is just like CIFAR-10, except that it has 100 classes containing 600 images each. There are 500 training images and 100 testing images per class. The 100 classes in CIFAR-100 are grouped into 20 superclasses. Each image comes with a "fine" label (the class to which it belongs) and a "coarse" label (the superclass to which it belongs).	Download
IMDB	A dataset for sentiment classification containing a large set of movie reviews.	Download
Reuters	In 2000, Reuters released a corpus of Reuters news stories for use in research and development of natural language processing, information retrieval, or machine learning systems.	Download
Caltech101	Pictures of objects belonging to 101 categories, with about 40 to 800 images per category; most categories have about 50 images. The dataset was collected in September 2003, and each image is roughly 300 x 200 pixels.	Download Details
The Street View House Numbers (SVHN)	SVHN is a real-world image dataset for developing machine learning and object recognition algorithms with minimal requirements for data preprocessing and formatting. Like MNIST, it consists of small cropped images of digits, but it incorporates an order of magnitude more labeled data (over 600,000 digit images) and comes from a significantly harder, unsolved, real-world problem: recognizing digits and numbers in natural scene images. SVHN is obtained from house numbers in Google Street View images.	Details
ImageNet	ImageNet is an image database organized according to the WordNet hierarchy (currently only the nouns), in which each node of the hierarchy is depicted by hundreds and thousands of images. According to the ImageNet project, there is currently an average of over five hundred images per node. The project is intended as a resource for researchers, educators, students, and anyone else with an interest in pictures.	Details
Caltech 256	Caltech 256 is a dataset with 256 object categories, each of which has at least 80 images. In total, this dataset has 30608 images.	Download Details
MovieLens	20 million ratings and 465,000 tag applications applied to 27,000 movies by 138,000 users. Includes tag genome data with 12 million relevance scores across 1,100 tags.	Download Details
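
The per-class and total image counts quoted for CIFAR-10 and CIFAR-100 can be cross-checked with a little arithmetic. The following is only a minimal sketch: the numbers are copied from the descriptions in the table, nothing is read from the downloaded files, and the helper name total_images is a hypothetical one chosen for illustration.

```python
def total_images(num_classes: int, images_per_class: int) -> int:
    """Return the image count implied by a per-class figure."""
    return num_classes * images_per_class


# Figures below are taken from the table's descriptions, not from the data files.
cifar10_total = total_images(10, 6000)    # 10 classes, 6000 images per class
cifar100_total = total_images(100, 600)   # 100 classes, 600 images per class
cifar100_train = total_images(100, 500)   # 500 training images per class
cifar100_test = total_images(100, 100)    # 100 testing images per class

# The stated CIFAR-10 split (50000 training + 10000 test) should match the total.
assert cifar10_total == 50_000 + 10_000
# The CIFAR-100 per-class split should add up to its per-class total.
assert cifar100_total == cifar100_train + cifar100_test

print(cifar10_total, cifar100_total, cifar100_train, cifar100_test)
```

Both datasets come out at the same overall size, with the same 50,000 / 10,000 train/test split; CIFAR-100 simply spreads those images over ten times as many classes.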