The testing data set consists of all new genes in dbTSSv5 (which is based on hg17) that did not appear in dbTSSv4 (the training and validation data). Genes with more than 30% mRNA overlap were removed from consideration. The complete list of the 1024 testing genes was taken from http://www.fml.tuebingen.mpg.de/raetsch/projects/arts.

We downloaded the corresponding sequences from http://genome.ucsc.edu. For every gene on the positive strand, the downloaded region was:
Min(start, tss) - 2000 ----> end + 1000
For every gene on the negative strand, the reverse complement of the following region was downloaded:
Max(end, tss) + 2000 ----> start - 1000

Note: The training and validation data sets were downloaded from http://www.fml.tuebingen.mpg.de/raetsch/projects/arts. More details are available in the published paper.
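
The region rule for the two strands can be sketched in Python as below. This is a minimal illustration, not the script used to build the data set: the function names `region_bounds` and `reverse_complement`, the parameter names, and the "+"/"-" strand labels are all hypothetical. The note above does not state whether coordinates are 0-based or 1-based, so the sketch only reproduces the arithmetic as written and returns the two endpoints in the order the note lists them.

```python
# Hypothetical helpers illustrating the region rule described above.
# Assumption: start, end and tss are genomic coordinates with start <= end;
# the coordinate convention (0-based or 1-based) is not given in the note.

def region_bounds(start, end, tss, strand, upstream=2000, downstream=1000):
    """Return the (from, to) endpoints in the order the note writes them."""
    if strand == "+":
        # Min(start, tss) - 2000 ----> end + 1000
        return (min(start, tss) - upstream, end + downstream)
    if strand == "-":
        # Max(end, tss) + 2000 ----> start - 1000
        return (max(end, tss) + upstream, start - downstream)
    raise ValueError("strand must be '+' or '-'")


# Translation table for complementing DNA bases (N stays N, case is kept).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence string."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(region_bounds(10000, 20000, 9500, "+"))
    print(region_bounds(10000, 20000, 20500, "-"))
    print(reverse_complement("ACGTN"))
```

For a negative-strand gene the first endpoint is larger than the second, which mirrors the note: the region is read from the high coordinate down to the low one, and the sequence is then reverse complemented so that it reads 5' to 3' relative to the gene.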