Dataset Downloads

Preprocessed Datasets

Atherosclerosis Risk Factors: A dataset of behavioral habits and biomedical test results for 1417 patients, used to analyze atherosclerosis risk factors. The class label (normal, risk, or pathologic) is known for 1224 of these patients.

Benchmark Datasets: A collection of 7 datasets used to benchmark association rule extraction algorithms such as Apriori, Close, A-Close, and MaxMiner. The collection includes both sparse and dense datasets.
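To illustrate the kind of task these benchmarks exercise, here is a minimal sketch of Apriori-style frequent itemset mining in Python. It is an illustrative, level-wise implementation written for this page, not the code used to produce any published benchmark results; the function name `apriori`, the in-memory list-of-sets input, and the absolute support threshold `min_support` are all assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Illustrative Apriori sketch (not a reference implementation).

    transactions: list of sets of items (assumed in-memory format).
    min_support: absolute minimum number of transactions containing an itemset.
    Returns a dict mapping each frequent itemset (frozenset) to its support count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(prev, 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset of a candidate must be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Count candidate supports with one pass over the transactions.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(apriori(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

On dense datasets, where most items co-occur frequently, a level-wise search like this one enumerates a very large number of frequent itemsets; algorithms such as Close and A-Close (closed itemsets) and MaxMiner (maximal itemsets) were designed to reduce that output, which is why benchmarks typically include both sparse and dense data.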

Cancer SAGE: Three datasets of cancer SAGE (Serial Analysis of Gene Expression) data derived from GEO and SAGE Map. The original dataset contains information on 27679 tags for 90 biological conditions. Based on available biological information, these tags were filtered to select 822 genes, and then 516 genes and 74 conditions of interest.

DeRisi et al. Yeast: Gene expression measurements for 5984 yeast genes under 7 biological conditions selected by DeRisi et al., integrated with 111 gene annotations drawn from 6 heterogeneous sources of biological knowledge.

Eisen et al. Yeast: Gene expression measurements for 2465 yeast genes under 79 biological conditions selected by Eisen et al., integrated with 658 gene annotations drawn from 5 heterogeneous sources of biological knowledge.
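The page does not specify how expression measurements and annotations are combined or stored. A common approach for association rule mining is to discretize each gene's expression values into items (for example "over-expressed in condition X") and merge them with the gene's annotation terms, giving one transaction per gene. The Python sketch below shows that idea under loudly stated assumptions: the dict-based input format, the function name `build_transactions`, the item naming scheme, and the log-ratio thresholds `up` and `down` are all hypothetical and are not taken from these datasets.

```python
def build_transactions(expression, annotations, up=1.0, down=-1.0):
    """Hypothetical sketch: one item set per gene from expression + annotations.

    expression: dict gene -> dict condition -> expression log-ratio (assumed format).
    annotations: dict gene -> set of annotation terms (assumed format).
    up/down: hypothetical discretization thresholds on the log-ratio.
    """
    transactions = {}
    for gene, profile in expression.items():
        items = set()
        for condition, value in profile.items():
            # Values between the thresholds are treated as "unchanged" and dropped.
            if value >= up:
                items.add(f"{condition}:over")
            elif value <= down:
                items.add(f"{condition}:under")
        # Merge the gene's annotation terms into the same transaction.
        items |= annotations.get(gene, set())
        transactions[gene] = items
    return transactions


if __name__ == "__main__":
    # Hypothetical genes, conditions, and annotation terms for illustration only.
    expr = {"geneA": {"cond1": 1.5, "cond2": -0.2},
            "geneB": {"cond1": -2.0, "cond2": 0.0}}
    ann = {"geneA": {"pathway:glycolysis"}}
    for gene, items in build_transactions(expr, ann).items():
        print(gene, sorted(items))
```

The resulting item sets can then be fed to any frequent itemset or association rule algorithm, so that rules can relate expression patterns to annotations.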

Human–HIV-1 Protein Interactions: Two datasets on interactions between human proteins and HIV-1 proteins. The first contains only interaction data for 1433 human proteins and 19 HIV-1 proteins. The second combines this interaction data with annotations for the human proteins: Gene Ontology biological annotations with the TAS (Traceable Author Statement) evidence code, and related publication annotations from PubMed and Reactome.