Supplementary Materials for:

GenMiner: mining non-redundant association rules from integrated gene expression data and annotations. Bioinformatics (Oxford, England) 2008

Software

The software can be downloaded here.

Data sources

Gene expression measures are those used by Eisen et al. (Cluster analysis and display of genome-wide expression patterns, PNAS December 8, 1998 vol. 95 no. 25). This dataset is discretized using the NorDi algorithm at a 95% confidence level.

Gene annotations were collected from the following sources:

Processed Data files

Eisen dataset

Expression ratios of 2465 yeast genes under 79 biological conditions.

Microarray Experiments

Description of the 79 experiments.

Cutoffs

Under-expressed and over-expressed cutoff thresholds computed by NorDi.

Discretized expression measures

Discretization of expression measures performed by NorDi.
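As a rough illustration of how a pair of cutoffs turns expression measures into discrete levels, here is a minimal Python sketch. It is not NorDi's own code: the function name discretize, the -1/0/1 labels, the inclusive comparisons and the example cutoff values are all assumptions made for illustration. The real thresholds are the ones in the Cutoffs file.

```python
def discretize(values, under_cutoff, over_cutoff):
    """Map each value to -1 (under-expressed), 1 (over-expressed) or 0 (unchanged).

    Missing measurements (None) are passed through unchanged.
    Whether the comparisons are strict or inclusive is an assumption here.
    """
    levels = []
    for value in values:
        if value is None:
            levels.append(None)
        elif value <= under_cutoff:
            levels.append(-1)
        elif value >= over_cutoff:
            levels.append(1)
        else:
            levels.append(0)
    return levels


# Hypothetical expression values and cutoffs, for illustration only.
example = discretize([-1.2, 0.1, None, 0.9], under_cutoff=-0.5, over_cutoff=0.6)
```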

Data mining context

Data matrix of 2465 rows (genes) and 737 columns (discretized expression levels and annotations). Each row contains the gene's expression profile over the 79 biological conditions (values discretized by NorDi) and at most 658 gene annotations (24 GOSlim terms, 14 pathways, 25 transcriptional regulators, 14 phenotypes and 581 PubMed IDs).
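The column count can be checked from the figures above, and one row of the matrix can be viewed as a set of items. The Python sketch below is only an illustration: the helper row_to_items, the item names such as "cond1+" and the annotation label in the usage note are hypothetical and do not reflect the actual encoding of the file.

```python
# Annotation counts as stated in the description of the data mining context.
ANNOTATION_COUNTS = {
    "GOSlim terms": 24,
    "pathways": 14,
    "transcriptional regulators": 25,
    "phenotypes": 14,
    "PubMed IDs": 581,
}
N_CONDITIONS = 79

n_annotations = sum(ANNOTATION_COUNTS.values())
n_columns = N_CONDITIONS + n_annotations


def row_to_items(levels, annotations):
    """Turn one gene's row into a set of items (hypothetical item naming).

    levels: discretized values, one per condition (-1, 0, 1 or None).
    annotations: iterable of annotation labels attached to the gene.
    """
    items = set()
    for index, level in enumerate(levels, start=1):
        if level == 1:
            items.add(f"cond{index}+")   # over-expressed in condition `index`
        elif level == -1:
            items.add(f"cond{index}-")   # under-expressed in condition `index`
    items.update(annotations)
    return items
```

For example, row_to_items([1, 0, -1], ["path:example"]) yields the three items "cond1+", "cond3-" and "path:example".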

Equivalence classes

Frequent closed itemsets and their generators extracted by Close with a minsupport of 0.005. Each class is represented by a line of the form

[Generator] [Closed itemset] n

where 'n' is the number of items in the class.
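To make the link between a generator and its closed itemset concrete, the following Python sketch computes the closure of an itemset on a toy context and shows that both have the same support. This is a naive illustration of the standard closure definition, not the Close implementation; the transactions and item names are made up.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def closure(itemset, transactions):
    """Largest itemset shared by all transactions that contain `itemset`."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        # Convention chosen for this sketch: an unsupported itemset is returned as-is.
        return frozenset(itemset)
    return frozenset.intersection(*covering)


# Toy context: each transaction stands for one gene's set of items.
transactions = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b", "c"}),
    frozenset({"b", "c"}),
    frozenset({"d"}),
]

generator = frozenset({"a"})
closed = closure(generator, transactions)
```

Here the generator {a} and its closure {a, b, c} are contained in exactly the same two transactions, so they belong to the same equivalence class.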

Exact association rules

All exact association rules displayed in the form

[antecedent] => [consequent] supp=s conf=c

where 's' and 'c' are the support and the confidence of the rule, respectively.
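Assuming the usual definitions (the support of a rule is the fraction of rows containing both antecedent and consequent, and its confidence is that count divided by the number of rows containing the antecedent), the two measures can be sketched in Python as follows. The toy transactions and item names are hypothetical.

```python
def rule_support(antecedent, consequent, transactions):
    """Fraction of transactions containing antecedent and consequent together."""
    both = antecedent | consequent
    return sum(1 for t in transactions if both <= t) / len(transactions)


def rule_confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that also contain the consequent."""
    covered = [t for t in transactions if antecedent <= t]
    if not covered:
        raise ValueError("antecedent does not occur in any transaction")
    return sum(1 for t in covered if consequent <= t) / len(covered)


# Toy context, for illustration only.
transactions = [
    frozenset({"x", "y"}),
    frozenset({"x", "y"}),
    frozenset({"x"}),
    frozenset({"z"}),
]

x, y = frozenset({"x"}), frozenset({"y"})
supp_y_x = rule_support(y, x, transactions)      # rule [y] => [x]
conf_y_x = rule_confidence(y, x, transactions)   # every row with y also has x
conf_x_y = rule_confidence(x, y, transactions)   # only two of three rows with x have y
```

In this toy context, [y] => [x] is an exact rule (confidence 1), whereas [x] -> [y] is only approximate.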

Approximate association rules

All approximate association rules with a confidence greater than or equal to 0.5, displayed in the form

[antecedent] -> [consequent] supp=s conf=c

where 's' and 'c' are the support and the confidence of the rule, respectively.
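A reader who wants to post-process the rule files could parse each line along these lines. This Python sketch assumes that the square brackets, the arrows "=>" and "->", and the "supp=" and "conf=" fields appear literally as in the forms shown above; the actual files may differ, and the sample line in the usage note is invented.

```python
import re

# Pattern for both rule forms: "=>" marks exact rules, "->" approximate ones.
RULE_LINE = re.compile(
    r"^\[(?P<antecedent>[^\]]*)\]\s*(?P<arrow>=>|->)\s*\[(?P<consequent>[^\]]*)\]"
    r"\s+supp=(?P<supp>\S+)\s+conf=(?P<conf>\S+)\s*$"
)


def parse_rule(line):
    """Parse one rule line into a dict; raise ValueError if the format differs."""
    match = RULE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized rule line: {line!r}")
    return {
        "antecedent": match.group("antecedent"),   # raw text between the brackets
        "consequent": match.group("consequent"),
        "exact": match.group("arrow") == "=>",
        "supp": float(match.group("supp")),
        "conf": float(match.group("conf")),
    }


def keep_confident(rules, min_conf=0.5):
    """Keep the parsed rules whose confidence is at least `min_conf`."""
    return [rule for rule in rules if rule["conf"] >= min_conf]
```

For instance, parse_rule("[a] -> [b] supp=0.01 conf=0.75") returns a dict with "exact" set to False and "conf" equal to 0.75. The text inside the brackets is kept unsplit because the item separator is not specified here.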