
University of Louisville Bioinformatics

Bioinformatics Research at the University of Louisville





Ernur Saka1, Benjamin J. Harrison2,3, Kirk West4, Jeffrey C. Petruska2,3, Eric C. Rouchka1

  1. Department of Computer Engineering and Computer Science, University of Louisville.
  2. Department of Anatomical Sciences and Neurobiology, University of Louisville.
  3. Department of Neurological Surgery, Kentucky Spinal Cord Injury Research Center, University of Louisville.
  4. Department of Biochemistry and Molecular Biology, University of Arkansas for Medical Sciences.

Since the introduction of microarrays in 1995, researchers worldwide have used both commercial and custom-designed microarrays to study differential expression of transcribed genes. Publicly available databases such as ArrayExpress and the Gene Expression Omnibus (GEO) have made millions of samples readily available. One of the main drawbacks of microarray data analysis is the selection of the sets of probes that represent a particular transcript of interest, especially because transcript-specific knowledge (notably regarding alternative splicing) is dynamic in nature. We have therefore developed a framework for reannotating and reassigning probe groups for Affymetrix® GeneChip® technology based on a set of annotations of interest. This framework addresses three issues in Affymetrix® GeneChip® data analysis: removing nonspecific probes, updating probe-to-target mappings based on the latest genome knowledge, and grouping probes into gene-, transcript-, and region-based (UTR, exon, CDS) probe sets. Updated gene and transcript probe sets provide more specific analysis results based on current genomic and transcriptomic knowledge. The framework selects unique probes, aligns them to gene annotations, and generates a custom Chip Description File (CDF).
The analysis results reveal that only 87% of the Affymetrix® GeneChip® HG-U133 Plus 2 probes align uniquely to the current human assembly (hg38) without mismatches. For the same Affymetrix® GeneChip®, 90% of the transcripts annotated by Affymetrix® remain after reannotation. We also tested the new mappings on the publicly available data series GSE48611 obtained from GEO. Our gene-based CDF identified 276 additional differentially expressed genes (DEGs) that had not previously been identified.
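The probe selection and grouping steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tuple layouts for alignments and features, and the toy coordinates are all assumptions made for illustration. It keeps only probes with exactly one perfect-match genomic location, then assigns each surviving probe to every annotated feature (gene, transcript, or region) that fully contains it. Writing the resulting groups to a CDF file is not shown.

```python
from collections import defaultdict


def select_unique_probes(alignments):
    """Keep probes that have exactly one perfect-match location.

    alignments: iterable of (probe_id, chrom, start, mismatches) tuples
    (a hypothetical layout, e.g. parsed from an aligner's output).
    Returns {probe_id: (chrom, start)}.
    """
    perfect_hits = defaultdict(set)
    for probe_id, chrom, start, mismatches in alignments:
        if mismatches == 0:
            perfect_hits[probe_id].add((chrom, start))
    # Probes with several perfect hits are nonspecific and are dropped;
    # probes with no perfect hit never enter the dictionary.
    return {
        probe_id: next(iter(locations))
        for probe_id, locations in perfect_hits.items()
        if len(locations) == 1
    }


def group_probes_by_feature(unique_probes, features, probe_length=25):
    """Group probes into probe sets, one per annotated feature.

    features: list of (feature_id, chrom, start, end) tuples with
    half-open coordinates (hypothetical layout). A probe joins a probe
    set only if it lies entirely inside the feature.
    """
    probe_sets = defaultdict(list)
    for probe_id, (chrom, start) in sorted(unique_probes.items()):
        for feature_id, f_chrom, f_start, f_end in features:
            inside = f_start <= start and start + probe_length <= f_end
            if chrom == f_chrom and inside:
                probe_sets[feature_id].append(probe_id)
    return dict(probe_sets)


# Toy data, invented for illustration only.
alignments = [
    ("p1", "chr1", 100, 0),
    ("p2", "chr1", 150, 0),
    ("p2", "chr2", 900, 0),  # second perfect hit: p2 is nonspecific
    ("p3", "chr1", 300, 1),  # only a mismatched hit: p3 is dropped
    ("p4", "chr1", 120, 0),
]
features = [("geneA", "chr1", 90, 200), ("geneB", "chr1", 280, 400)]

unique = select_unique_probes(alignments)
probe_sets = group_probes_by_feature(unique, features)
print(probe_sets)
```

The same grouping function could serve gene-, transcript-, or region-based probe sets by passing a different feature list, which mirrors how the framework produces several CDF variants for one array.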


  • Human Genome U133 Plus 2.0 Ensembl Gene CDF
  • Human Genome U133 Plus 2.0 Ensembl Transcript CDF
  • Human Genome U133 Plus 2.0 Gene Region (CDS, Exon, UTR) CDF


  • Mouse Genome 430 2.0 Ensembl Gene CDF
  • Mouse Genome 430 2.0 Ensembl Transcript CDF
  • Mouse Genome 430 2.0 Gene Region (CDS, Exon, UTR) CDF


  • Rat Genome 230 2.0 Ensembl Gene CDF
  • Rat Genome 230 2.0 Ensembl Transcript CDF
  • Rat Genome 230 2.0 Gene Region (CDS, Exon, UTR) CDF

Supported by NIH NIGMS P20GM103436 (Nigel Cooper, PI). The contents of this work are solely the responsibility of the authors and do not represent the official views of the NIH or the National Institute of General Medical Sciences (NIGMS).

Saka E, Harrison B, West K, Petruska J, Rouchka E. (2016) Framework for reanalysis of publicly available Affymetrix® GeneChip® datasets based on functional regions of interest. To appear in Proceedings of the 6th IEEE Conference on Computational Advances in Bio and Medical Sciences (ICCABS 2016). October 13-15, 2016, Atlanta, GA, USA. (PDF).
