Nicolas PASQUIER ♦ Université Côte d'Azur

Eisen et al. Yeast

This dataset was constructed by integrating the Eisen et al. gene expression data for 2465 Yeast genes under 79 biological conditions and 658 annotations of these genes:

  • The gene expression data of Eisen et al. (1998) cover 79 biological conditions from 4 experiments on the cell cycle, sporulation, temperature shock and diauxic shift. These measurements were discretized using the NorDi algorithm with a 95% confidence level.
  • The 658 gene annotations were extracted from 5 biological data sources: 24 GO Slim terms, 14 pathways, 25 transcriptional regulators, 14 phenotypes and 581 PubMed IDs.

The resulting matrix contains 2465 rows (genes) and 737 columns (79 discretized expression levels and 658 annotations).
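To illustrate how under-expressed and over-expressed cutoff thresholds can turn an expression ratio into a discrete item, here is a minimal Python sketch. It is not the NorDi algorithm itself: it only applies cutoffs that are assumed to be already computed. The condition names, cutoff values, item labels and the choice of inclusive comparisons and per-condition cutoffs are all hypothetical and are not taken from the dataset files.

```python
from typing import Dict, List, Optional, Tuple


def discretize(value: float, under_cutoff: float, over_cutoff: float) -> Optional[str]:
    """Map one expression ratio to 'under', 'over', or None (no significant change).

    Assumption: comparisons are inclusive; the real files may use another convention.
    """
    if value <= under_cutoff:
        return "under"
    if value >= over_cutoff:
        return "over"
    return None


def discretize_profile(
    profile: Dict[str, float],
    cutoffs: Dict[str, Tuple[float, float]],
) -> List[str]:
    """Turn one gene's expression profile into a list of discrete items.

    profile: condition name -> expression ratio
    cutoffs: condition name -> (under_cutoff, over_cutoff), assumed per condition
    """
    items = []
    for condition, value in profile.items():
        under, over = cutoffs[condition]
        label = discretize(value, under, over)
        if label is not None:
            # Hypothetical item naming scheme: "<condition>:<label>"
            items.append(f"{condition}:{label}")
    return items


# Hypothetical cutoffs and profile, for illustration only.
cutoffs = {"heat_0": (-0.8, 0.9), "heat_20": (-1.1, 1.2)}
profile = {"heat_0": 1.4, "heat_20": -0.3}
items = discretize_profile(profile, cutoffs)
print(items)
```

In this sketch, a gene contributes an item only for the conditions in which it is under- or over-expressed; unchanged measurements produce no item.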

Data Files

  • Expression measurements: Expression ratios of 2465 Yeast genes under 79 biological conditions.
  • Experiment description: Description of the 79 experiments.
  • Cutoffs: Under-expressed and over-expressed cutoff thresholds computed by NorDi.
  • Discretized expression measurements: Discretization of the expression measurements performed by NorDi.
  • Eisen dataset: Data matrix of 2465 rows (genes) and 737 columns (expression levels and annotations). Each row contains the expression profile over the 79 biological conditions (measurements discretized by NorDi) and at most 658 gene annotations (24 GO Slim terms, 14 pathways, 25 transcriptional regulators, 14 phenotypes and 581 PubMed IDs).

Reference

GenMiner: Mining informative association rules from genomic data, Ricardo Martinez, Claude Pasquier and Nicolas Pasquier, Proceedings of the IEEE BIBM international conference on Bioinformatics and Biomedicine, pages 15-22, IEEE Computer Society, 2007.

Detailed Description

The gene expression measurements were selected from the dataset published in Cluster analysis and display of genome-wide expression patterns, M. Eisen, P. Spellman, P. O. Brown and D. Botstein, Proceedings of the National Academy of Sciences of the USA, 95:25(14863-14868), 1998.

Gene annotations were collected from the following sources:

  • Gene Ontology: Extracted from GO terms (version of 27/08/2007), annotations (version of 24/08/2007) and GO Slim terms.
  • Literature: Associations between Yeast genes and PubMed IDs (version of 8/08/2007).
  • Pathways: Information on the metabolic pathways in which each gene is involved (version of 9/08/2007).
  • Phenotypes: Annotations were retrieved from the Saccharomyces Genome Database (version of 9/08/2007).
  • Transcriptional regulators: Information on transcriptional regulators that bind to promoter regions was extracted using a p-value threshold of 0.0005.

Experimental Results

The equivalence classes and the Informative Basis of association rules extracted from this dataset can be downloaded from the GenMiner page. The minimum support threshold was set so that the extracted minimal non-redundant association rules concern at least 7 genes (about 0.28% of the 2465 genes). The minimum confidence threshold, which defines the minimal precision of the extracted association rules, was set to 50%.
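The two thresholds can be made concrete with a small Python sketch that computes the support (number of genes) and the confidence of a single rule on a toy dataset. This is only an illustration of the two measures under stated assumptions: it does not reproduce GenMiner's extraction of minimal non-redundant rules, and the item names and toy gene rows are hypothetical.

```python
from typing import FrozenSet, Iterable, List

MIN_SUPPORT_GENES = 7   # a rule must concern at least 7 genes
MIN_CONFIDENCE = 0.5    # a rule must hold for at least 50% of matching genes
TOTAL_GENES = 2465      # number of rows in the dataset


def support_count(rows: List[FrozenSet[str]], itemset: Iterable[str]) -> int:
    """Number of gene rows that contain every item of the itemset."""
    wanted = frozenset(itemset)
    return sum(1 for row in rows if wanted <= row)


def confidence(rows: List[FrozenSet[str]], antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """Fraction of rows containing the antecedent that also contain the consequent."""
    antecedent = frozenset(antecedent)
    covered = support_count(rows, antecedent)
    if covered == 0:
        return 0.0
    return support_count(rows, antecedent | frozenset(consequent)) / covered


def is_retained(rows: List[FrozenSet[str]], antecedent: Iterable[str],
                consequent: Iterable[str]) -> bool:
    """Check one rule against both thresholds."""
    both = frozenset(antecedent) | frozenset(consequent)
    return (support_count(rows, both) >= MIN_SUPPORT_GENES
            and confidence(rows, antecedent, consequent) >= MIN_CONFIDENCE)


# Toy rows with hypothetical items: 7 genes carry both items,
# 1 gene carries only the expression item, 2 only the annotation.
rows = ([frozenset({"heat_0:over", "stress_response"})] * 7
        + [frozenset({"heat_0:over"})]
        + [frozenset({"stress_response"})] * 2)

rule_support = support_count(rows, {"heat_0:over", "stress_response"})
rule_confidence = confidence(rows, {"heat_0:over"}, {"stress_response"})
relative_min_support = MIN_SUPPORT_GENES / TOTAL_GENES

print(rule_support, rule_confidence, round(relative_min_support, 4))
```

On the toy rows, the rule from the expression item to the annotation concerns 7 genes with a confidence of 7/8, so it satisfies both thresholds; a rule supported by 6 genes or fewer would be discarded regardless of its confidence.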

Related Publications

Mining association rule bases from integrated genomic data and annotations, Ricardo Martinez, Nicolas Pasquier and Claude Pasquier, Proceedings of the CIBB international conference on Computational Intelligence methods for Bioinformatics and Biostatistics, Salerno, Italy, 2008.

GenMiner: Mining informative association rules from integrated gene expression data and annotations, Ricardo Martinez, Nicolas Pasquier and Claude Pasquier, Bioinformatics, Oxford University Press, September, 2008.