Sample Projects

AbsIDConvert: Absolute gene ID conversion tool

With the availability of gene- and protein-centric databases (NCBI, Ensembl, UCSC, and others), as well as the wide variety of platforms for measuring gene expression (Affymetrix, Agilent, custom arrays, and RNA-Seq), biological researchers need reliable methods for converting identifiers from one type to another. AbsIDConvert is based on the idea that genomic identifiers can be converted to genomic intervals, so that conversion between identifiers requires only finding overlapping intervals.

Mohammad F, Flight RM, Harrison BJ, Petruska JC, Rouchka EC: AbsIDconvert: An absolute approach for converting genetic identifiers at different granularities. BMC Bioinformatics 2012, 13:229. doi:10.1186/1471-2105-13-229.
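
The interval-overlap idea described above can be illustrated with a minimal sketch. This is not AbsIDConvert's code: the `Interval` type, the `convert` function, the strand-aware overlap rule, and all coordinates are assumptions made for illustration only.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval with 1-based, inclusive coordinates (assumed convention)."""
    chrom: str
    start: int
    end: int
    strand: str


def overlaps(a: Interval, b: Interval) -> bool:
    """True when both intervals share at least one base on the same strand."""
    return (
        a.chrom == b.chrom
        and a.strand == b.strand
        and a.start <= b.end
        and b.start <= a.end
    )


def convert(query_id, source, target):
    """Map one identifier to every target identifier whose interval overlaps it."""
    query = source[query_id]
    return sorted(tid for tid, iv in target.items() if overlaps(query, iv))


# Hypothetical identifiers and coordinates, for illustration only.
probes = {"probe_A": Interval("chr1", 1000, 1050, "+")}
genes = {
    "gene_X": Interval("chr1", 900, 2000, "+"),
    "gene_Y": Interval("chr1", 5000, 6000, "+"),
    "gene_Z": Interval("chr1", 900, 2000, "-"),
}

print(convert("probe_A", probes, genes))  # ['gene_X']
```

A linear scan is enough for a handful of features; a genome-scale lookup would more plausibly use an interval tree or sorted-interval index.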

Available as web interface and virtual machine

ANGIOGENES: A Knowledge Database for Angiogenesis

ANGIOGENES is a tool to explore and compare the expression profiles of transcripts in endothelial cells.
Muller R, Weirick T, John D, Militello G, Chen W, Dimmeler S, Uchida S. (2016) ANGIOGENES: Knowledge database for protein-coding and noncoding RNA genes in endothelial cells. Scientific Reports 6:32475. (DOI: 10.1038/srep32475; PMID: 27582018)

Available as Web Interface

C-It-Loci: A Knowledge Database for Tissue-Enriched Loci

C-It-Loci is a tool to explore and compare the expression profiles of conserved loci among various tissues in three organisms. Conserved loci are pairs of adjacent homologous protein-coding genes shared between one or more species. Expression profiles are based on RNA-seq data from many sources to derive tissue enrichment or specificity. Classifications of transcripts are based on the latest release of Ensembl, which will be updated in a timely manner. In addition to protein-coding genes, expression profiles of yet-to-be-characterized long non-coding RNAs (lncRNAs) are included. To define species conservation of lncRNAs, we introduced a concept called "positional conservation" on top of "sequence conservation", which is the most common way to find homologous lncRNAs among species. We anticipate that C-It-Loci will be a valuable tool for in silico screening of tissue-enriched lncRNAs to be studied further by biological experiments.
Weirick T, John D, Dimmeler S, Uchida S. (2016) C-It-Loci: A Knowledge Database for Tissue-Enriched Loci. Bioinformatics 31(21):3537-3543. (DOI: 10.1093/bioinformatics/btv410; PMID: 26163692)
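
One way to read the definition of conserved loci given above is as a search for neighbouring gene pairs whose orthologs are also neighbours in another species. The sketch below follows that reading; it is not C-It-Loci's implementation, and the function names, gene orders, and ortholog table are hypothetical.

```python
def adjacent_pairs(gene_order):
    """Return the set of unordered neighbouring gene pairs along one chromosome."""
    return {frozenset(pair) for pair in zip(gene_order, gene_order[1:])}


def conserved_loci(order_a, order_b, orthologs):
    """Find adjacent pairs in species A whose orthologs are adjacent in species B."""
    pairs_b = adjacent_pairs(order_b)
    conserved = []
    for g1, g2 in zip(order_a, order_a[1:]):
        if g1 in orthologs and g2 in orthologs:
            if frozenset((orthologs[g1], orthologs[g2])) in pairs_b:
                conserved.append((g1, g2))
    return conserved


# Hypothetical gene orders along one chromosome and a one-to-one ortholog table.
human = ["A", "B", "C", "D"]
mouse = ["b", "a", "d", "x", "c"]
orthologs = {"A": "a", "B": "b", "C": "c", "D": "d"}

print(conserved_loci(human, mouse, orthologs))  # [('A', 'B')]
```

Under this reading, a lncRNA located at such a pair in both species would be a candidate for positional conservation even when its sequence has diverged.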

Available as Web Interface

CategoryCompare: High-throughput data meta-analysis using gene annotations

CategoryCompare is a methodology for cross-platform and cross-sample comparison of high-throughput data at the annotation level (such as Gene Ontology terms, KEGG pathways, and gene sets (GSEA)). This approach allows for the comparison of datasets from heterogeneous platforms. CategoryCompare provides a visualization built on Cytoscape that lets users quickly view the features shared between annotations. CategoryCompare is available as an R Bioconductor package. A web version of CategoryCompare that employs Cytoscape.js is currently under construction.
Flight RM, Harrison BJ, Mohammad F, Bunge MB, Moon LDF, Petruska JC, Rouchka EC: categoryCompare, an analytical tool based on feature annotations. Frontiers in Genetics 2014, 5:98. doi: 10.3389/fgene.2014.00098

Available as R Bioconductor package

CSI-UTR: Cleavage Site Interval Analysis of 3'UTR Differential Expression

Untranslated regions of the 3' end of transcripts (3'UTRs) are critical for controlling transcript abundance and location. 3'UTR configuration is highly regulated and provides functional diversity, similar to alternative splicing of exons. Detailed transcriptome-wide profiling of 3'UTR structures may help elucidate mechanisms regulating cellular functions. This profiling is more difficult than for coding sequences (CDS), where exon/intron boundaries are well-defined. To enable this, we developed a new approach, CSI-UTR. Meaningful configurations of the 3'UTR are determined using cleavage site intervals (CSIs) that lie between functional alternative polyadenylation (APA) sites. The functional APA sites are defined using publicly available polyA-seq datasets biased to the site of polyadenylation. CSI-UTR can be applied to any RNA-Seq dataset, regardless of the 3' bias.
Harrison BJ, Park JW, Gomes C, Petruska JC, Sapio MR, Iadarola MJ, Rouchka EC. (2018) Detection of significantly different expressed cleavage site intervals within 3' untranslated regions using CSI-UTR. Under review.
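
The notion of cleavage site intervals lying between APA sites can be sketched as a simple partition of a 3'UTR. This is an illustration under stated assumptions, not the CSI-UTR source code: the function name is hypothetical, coordinates are treated as 0-based half-open on the plus strand, and the UTR start and end are assumed to act as outer boundaries.

```python
def cleavage_site_intervals(utr_start, utr_end, apa_sites):
    """Split a plus-strand 3'UTR into intervals bounded by consecutive APA sites.

    Sites outside the UTR are ignored; duplicates are collapsed.
    """
    sites = sorted({s for s in apa_sites if utr_start < s < utr_end})
    bounds = [utr_start, *sites, utr_end]
    return list(zip(bounds, bounds[1:]))


# Hypothetical UTR and APA coordinates; the site at 1200 falls outside the UTR.
print(cleavage_site_intervals(100, 1000, [400, 750, 1200]))
# [(100, 400), (400, 750), (750, 1000)]
```

Read coverage could then be summarized per interval rather than per whole 3'UTR, which is the level at which differential expression is tested in the description above.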

Available as source code

DNA Motif Detection Using Particle Swarm Optimization and Expectation-Maximization

The Motif Swarm Algorithm aids in motif discovery. Motif discovery, the process of finding a meaningful pattern of nucleotides or amino acids that is shared by two or more molecules, is an important part of the study of gene function. In this work, we developed a hybrid motif discovery approach that combines Particle Swarm Optimization (PSO) with the Expectation-Maximization (EM) algorithm. In the proposed algorithm, we use PSO to generate a seed for the EM algorithm.
Hardin CT, Rouchka EC (2005). DNA Motif Detection Using Particle Swarm Optimization and Expectation-Maximization. Proc IEEE Swarm Intell Symp., 2005:181-184. (PMCID: 137489, PMID: 20436786)

Available as source code

Multiple Primer Design

MPrime is an interface that allows the efficient high-throughput detection of multiple primers or oligonucleotides for genic regions in the human, mouse, rat, zebrafish, or fruit fly genomes. To choose the regions of interest for primer or oligo design, you must choose the organism you are interested in, as well as the genic regions of interest. Genic regions can be identified by gene name, GenBank or RefSeq accession, or keyword. Additionally, MPrime 1.3 now allows you to enter FASTA-formatted sequences. Before primers are designed, you will be sent to a page that allows you to select the genic regions you wish to use.
Rouchka EC, Khalyfa A, Cooper NGF. (2005) MPrime: efficient large scale multiple primer and oligonucleotide design for customized gene arrays. BMC Bioinformatics, 6:175. (doi:10.1186/1471-2105-6-175).

Available as a web interface

RBF-TSS: Identification of transcription start sites using radial basis functions

RBF-TSS is a novel method for identifying transcription start sites (TSSs) that improves upon published TSS detection models. RBF-TSS incorporates a metric feature based on oligonucleotide positional frequencies, taking into account the nature of promoters. A radial basis function network for identifying transcription start sites is created using non-overlapping chunks (windows) of size 50 and 500 on the human genome.
Mahdi RN, Rouchka EC. (2009) RBF-TSS: Identification of transcription start site in human using radial basis functions network and oligonucleotide positional frequencies. PLoS One, 4(3):e4878. (doi: 10.1371/journal.pone.0004878)
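
The two ingredients named above, non-overlapping windows and oligonucleotide positional frequencies, can be sketched as follows. This is only a rough illustration of those terms, not the RBF-TSS feature definition or its network: the function names, the choice to drop a trailing partial window, and the toy sequences are all assumptions.

```python
from collections import Counter


def windows(sequence, size):
    """Yield (start, chunk) for non-overlapping windows; a trailing partial window is dropped."""
    for start in range(0, len(sequence) - size + 1, size):
        yield start, sequence[start:start + size]


def positional_frequencies(aligned, k):
    """For each position, the fraction of aligned sequences carrying each k-mer there."""
    n = len(aligned)
    length = min(len(s) for s in aligned)
    table = []
    for pos in range(length - k + 1):
        counts = Counter(s[pos:pos + k] for s in aligned)
        table.append({kmer: c / n for kmer, c in counts.items()})
    return table


# Hypothetical aligned promoter fragments, for illustration only.
aligned = ["TATAAG", "TATACG", "GATAAG"]
table = positional_frequencies(aligned, k=2)
print(table[0])

print(list(windows("ACGTACGTAC", 4)))  # [(0, 'ACGT'), (4, 'ACGT')]
```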

Available as source code

RenalDB: A Knowledge Database for Kidney RNA Expression

RenalDB is a tool designed to assist researchers in hypothesis-driven research of lncRNAs by allowing in silico screening of enriched/specific transcripts of humans, mice, and zebrafish with respect to nephrotic tissues and cells, developmental stages, and other metadata.
Weirick T, Militello G, Ponomareva Y, John D, Doring C, Dimmeler S, Uchida S. (2018) Logic programming to infer complex RNA expression patterns from RNA-seq data. Briefings in Bioinformatics 19(2):199-209. (PMID: 28011754; doi: 10.1093/bib/bbw117)

Available as web interface

RNA Map Analysis and Plotting Server

rMAPS is a web server that systematically generates RNA-maps for the analysis of RNA-binding protein (RBP) binding sites that have position-dependent functions. Users can easily analyze binding sites around differential alternative splicing events for over 100 known RBPs. rMAPS can also analyze CLIP-seq peaks around differential alternative splicing events to generate an RNA-map of a CLIP-seq experiment.
Park JW, Jung S, Rouchka EC, Tseng Y, and Xing Y. rMAPS: RNA Map Analysis and Plotting Server for Alternative Exon Regulation. Nucleic Acids Research, 2016 (PMID:27174931; PMCID: PMC4987942; doi: 10.1093/nar/gkw410)

Available as web server

rMotifGen: Random Motif Generator for Genomic Sequences

rMotifGen generates a number of random DNA or amino acid sequences containing short sequence motifs. Each motif consensus can be either user-defined or randomly generated. Insertions and mutations within these motifs are created according to user-defined parameters. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms.
Rouchka EC, Hardin CT (2007). rMotifGen: random motif generator for DNA and protein sequences. BMC Bioinformatics, 8:292. (doi:10.1186/1471-2105-8-292).
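
The general idea of planting mutated motif instances into random background sequences can be sketched as below. This is not rMotifGen's code or parameter set: the function names, the uniform background, and the single `rate` parameter are assumptions, and only substitutions are modeled (insertions within motifs are left out for brevity).

```python
import random

DNA = "ACGT"


def mutate(motif, rate, rng):
    """Substitute each motif position with a different base with probability `rate`."""
    out = []
    for base in motif:
        if rng.random() < rate:
            out.append(rng.choice([b for b in DNA if b != base]))
        else:
            out.append(base)
    return "".join(out)


def generate(n_seqs, length, motif, rate, seed=0):
    """Return (sequence, motif_position, planted_instance) tuples."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    results = []
    for _ in range(n_seqs):
        background = "".join(rng.choice(DNA) for _ in range(length))
        pos = rng.randrange(length - len(motif) + 1)
        instance = mutate(motif, rate, rng)
        seq = background[:pos] + instance + background[pos + len(motif):]
        results.append((seq, pos, instance))
    return results


# Hypothetical consensus motif and parameters.
for seq, pos, instance in generate(n_seqs=3, length=30, motif="TATAAT", rate=0.1):
    print(pos, instance, seq)
```

Keeping the planted position and instance alongside each sequence gives a ground truth against which a motif finder's predictions can be scored.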

Available as a web interface and source code


UGAHash

UGAHash is an algorithm designed to provide a distributed method for assigning accession numbers to genomic features. The UGAHash website introduces the utility of UGAHash by providing resources for mapping human genomic features among various bioinformatic databases and database versions, with a focus on lncRNAs.
Weirick, T., John, D., Uchida, S. (2016). . Briefings in Bioinformatics, bbv067. (PMID: 26921280; doi: 10.1093/bib/bbw017)
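
One general way to assign accessions without a central registry is to derive them from the feature itself, so that independent parties compute the same identifier for the same feature. The sketch below illustrates that general idea only; it is not the UGAHash algorithm, and the function name, key layout, `FEAT` prefix, and use of SHA-256 are all assumptions.

```python
import hashlib


def feature_accession(assembly, chrom, start, end, strand, prefix="FEAT"):
    """Derive a deterministic accession from a feature's genomic description."""
    key = f"{assembly}|{chrom}|{start}|{end}|{strand}"
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return f"{prefix}_{digest[:16]}"


# Hypothetical feature coordinates: two independent calls agree on the accession.
a = feature_accession("GRCh38", "chr1", 1000, 2000, "+")
b = feature_accession("GRCh38", "chr1", 1000, 2000, "+")
print(a == b)  # True
```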

Available as a web interface and source code