GenMiner

Genomic Data Miner

Description

Java implementation of the GenMiner algorithm for mining equivalence classes and minimal non-redundant association rules from gene expression data.

The software is distributed as an executable JAR file with a graphical user interface. It integrates the R implementation of the Normalized Discretization method (Nordi) to preprocess gene expression data, and a Java implementation of an algorithm based on frequent closed itemsets to extract equivalence classes and minimal non-redundant association rules from these data.
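To make the notion of a closed itemset concrete, here is a minimal sketch of the closure operation on toy data. It is not the GenMiner code: the class name, the helper names and the item labels (such as "heat+") are hypothetical. The closure of an itemset is taken here as the set of items shared by every gene that contains it, and itemsets with the same closure fall into the same equivalence class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ClosureSketch {
    /** Builds a sorted itemset from item labels (hypothetical helper). */
    static Set<String> items(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    /**
     * Returns the items shared by all rows that contain the itemset.
     * If no row contains it, the itemset is returned unchanged
     * (a simplification made for this sketch).
     */
    static Set<String> closure(Set<String> itemset, List<Set<String>> rows) {
        Set<String> shared = null;
        for (Set<String> row : rows) {
            if (!row.containsAll(itemset)) {
                continue;
            }
            if (shared == null) {
                shared = new TreeSet<>(row);
            } else {
                shared.retainAll(row);
            }
        }
        return shared == null ? new TreeSet<>(itemset) : shared;
    }

    public static void main(String[] args) {
        // Each row stands for one gene; the labels are made up.
        List<Set<String>> rows = Arrays.asList(
            items("heat+", "ribosome", "cytoplasm"),
            items("heat+", "ribosome", "cytoplasm"),
            items("heat+", "nucleus"),
            items("ribosome", "cytoplasm"));

        // Both itemsets have the same closure, so they belong to the
        // same equivalence class in this toy example.
        System.out.println(closure(items("ribosome"), rows));  // prints [cytoplasm, ribosome]
        System.out.println(closure(items("cytoplasm"), rows)); // prints [cytoplasm, ribosome]
    }
}
```

In this toy dataset, "ribosome" and "cytoplasm" would each act as a generator of the closed itemset containing both.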

Downloads

GenMiner 2.0 (application and user guide)

Reference

GenMiner: Mining informative association rules from genomic data, Ricardo Martinez, Nicolas Pasquier and Claude Pasquier, Bioinformatics, Oxford University Press, September, 2008.

Experimental Results

These results were obtained from the annotation-enriched Eisen et al. dataset, which contains integrated gene expression measures for 2465 yeast genes and 737 columns (79 discretized gene expression levels and 658 gene annotations).

The gene expression measures were discretized by the Nordi algorithm with a 95% confidence level. The minimal confidence threshold was set to 50% and the minimal support threshold to 0.3% (the extracted association rules correspond to at least 7 genes).
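As an illustration of how the two thresholds act on a rule, the sketch below computes support and confidence with their usual definitions on a toy dataset. It is an assumption-laden example, not the GenMiner implementation: the class, method and item names are hypothetical, support is taken as the fraction of genes containing both sides of the rule, and how the tool itself rounds the support threshold to a gene count is not covered.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class RuleMeasures {
    // Thresholds quoted in the text, written as fractions.
    static final double MIN_SUPPORT = 0.003;
    static final double MIN_CONFIDENCE = 0.5;

    /** Builds an itemset from item labels (hypothetical helper). */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** Counts the rows (genes) that contain every item of the itemset. */
    static int count(Set<String> itemset, List<Set<String>> rows) {
        int n = 0;
        for (Set<String> row : rows) {
            if (row.containsAll(itemset)) {
                n++;
            }
        }
        return n;
    }

    /** Fraction of rows that contain both the antecedent and the consequent. */
    static double support(Set<String> antecedent, Set<String> consequent,
                          List<Set<String>> rows) {
        if (rows.isEmpty()) {
            return 0.0;
        }
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return (double) count(both, rows) / rows.size();
    }

    /** Among rows containing the antecedent, fraction that also contain the consequent. */
    static double confidence(Set<String> antecedent, Set<String> consequent,
                             List<Set<String>> rows) {
        int withAntecedent = count(antecedent, rows);
        if (withAntecedent == 0) {
            return 0.0;
        }
        Set<String> both = new HashSet<>(antecedent);
        both.addAll(consequent);
        return (double) count(both, rows) / withAntecedent;
    }

    /** True when the rule reaches both thresholds. */
    static boolean passesThresholds(Set<String> antecedent, Set<String> consequent,
                                    List<Set<String>> rows) {
        return support(antecedent, consequent, rows) >= MIN_SUPPORT
            && confidence(antecedent, consequent, rows) >= MIN_CONFIDENCE;
    }
}
```

With this reading, a rule whose confidence is exactly 1.0 would count as an exact rule, and any kept rule with a lower confidence as an approximate one.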

Equivalence classes: Frequent closed itemsets and their generators extracted by JClose with a minimal support threshold of 0.003. Each equivalence class is represented by a line of the form:
[Generator] [Closed itemset] n
where n is the support (number of genes) of the equivalence class.
Exact association rules: Informative basis for exact association rules (confidence = 100%) displayed in the form:
[antecedent] => [consequent] supp=s conf=c
where s is the support and c is the confidence of the rule.
Approximate association rules: Informative basis for approximate association rules, with a confidence greater than or equal to 0.5, displayed in the form:
[antecedent] -> [consequent] supp=s conf=c
where s is the support and c is the confidence of the rule.
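
For readers who want to post-process these result files, the following sketch parses the three line formats described above. It rests on assumptions not stated on this page: the class and field names are hypothetical, the text inside the brackets is kept as a raw string because the item separator is not specified, and supp and conf are assumed to be plain decimal numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ResultLineParser {
    // [antecedent] => [consequent] supp=s conf=c   (exact rule)
    // [antecedent] -> [consequent] supp=s conf=c   (approximate rule)
    private static final Pattern RULE = Pattern.compile(
        "^\\[(.*?)\\]\\s*(=>|->)\\s*\\[(.*?)\\]\\s+supp=(\\S+)\\s+conf=(\\S+)\\s*$");

    // [Generator] [Closed itemset] n
    private static final Pattern EQ_CLASS = Pattern.compile(
        "^\\[(.*?)\\]\\s*\\[(.*?)\\]\\s+(\\d+)\\s*$");

    static final class Rule {
        final String antecedent;   // raw text between the first brackets
        final String consequent;   // raw text between the second brackets
        final boolean exact;       // true for "=>", false for "->"
        final double support;
        final double confidence;

        Rule(String antecedent, String consequent, boolean exact,
             double support, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.exact = exact;
            this.support = support;
            this.confidence = confidence;
        }
    }

    static final class EquivalenceClass {
        final String generator;
        final String closedItemset;
        final int support;         // number of genes

        EquivalenceClass(String generator, String closedItemset, int support) {
            this.generator = generator;
            this.closedItemset = closedItemset;
            this.support = support;
        }
    }

    /** Parses an exact or approximate rule line; rejects anything else. */
    static Rule parseRule(String line) {
        Matcher m = RULE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a rule line: " + line);
        }
        return new Rule(m.group(1), m.group(3), m.group(2).equals("=>"),
            Double.parseDouble(m.group(4)), Double.parseDouble(m.group(5)));
    }

    /** Parses an equivalence class line; rejects anything else. */
    static EquivalenceClass parseEquivalenceClass(String line) {
        Matcher m = EQ_CLASS.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not an equivalence class line: " + line);
        }
        return new EquivalenceClass(m.group(1), m.group(2), Integer.parseInt(m.group(3)));
    }
}
```

If the actual files write the values differently (for example a confidence given as a percentage), the two patterns would need to be adjusted accordingly.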

Related Publications

Mining association rule bases from integrated genomic data and annotations, Ricardo Martinez, Nicolas Pasquier and Claude Pasquier, Proceedings of the CIBB international conference on Computational Intelligence methods for Bioinformatics and Biostatistics, Salerno, Italy, 2008.

GenMiner: Mining informative association rules from genomic data, Ricardo Martinez, Claude Pasquier and Nicolas Pasquier, Proceedings of the IEEE BIBM international conference on Bioinformatics and Biomedicine, pages 15-22, IEEE Computer Society, 2007.