Absolute ID Conversion Tools


Tutorials and Frequently asked questions (FAQs)




How to set the range search parameters

Because every identifier is reduced to its genomic coordinates, we are working with intervals, and it is best to keep this information in a range tree or interval tree, a data structure built on an augmented red-black tree. The reason for using an interval tree is efficiency. Bioconductor's IRanges package handles this very efficiently. We adopted this implementation, and the input parameters are the same as those of the “findOverlaps” function in the IRanges package. We expose only the four parameters that we considered relevant. The “findOverlaps” function finds all the intervals that overlap a “query” interval.

Various parameters can be interpreted as follows:

Type of overlap (type): By specifying the “type” parameter, one can select specific types of overlap. By default, “any” overlap is accepted. If “type” is “start” or “end”, only those intervals that have a matching start or end location are reported. If “equal” is specified, both the “start” and the “end” must match. If “within” is specified, all subjects that wholly contain the query are reported.

Maximum gap (maxgap): Intervals with a separation of “maxgap” or less are considered overlapping. It takes non-negative integer values. Default is 0.

Minimum overlap (minoverlap): Two intervals are considered overlapping if at least “minoverlap” positions overlap. “minoverlap” takes positive integer values. Default is 1.

Which overlap do you want in your result (select):
When “select” is “all”, all overlapping intervals are reported. When “select” is “first”, only the first overlapping interval in the subject is reported. For “last” and “arbitrary”, only the last or an arbitrary overlapping interval in the subject is reported, respectively.
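As a rough illustration of how “type” and “select” interact, the following Python sketch applies them with a plain linear scan. It is not the IRanges implementation and does not use an interval tree. The function names are hypothetical, intervals are assumed to be inclusive (start, end) pairs, “minoverlap” is applied only to “any” overlaps, “maxgap” is left out, and “arbitrary” simply returns the first hit.

```python
from typing import List, Optional, Tuple, Union

Interval = Tuple[int, int]  # (start, end), both ends inclusive (assumption)


def is_hit(query: Interval, subject: Interval,
           overlap_type: str = "any", minoverlap: int = 1) -> bool:
    """Return True if `subject` matches `query` for the given overlap type."""
    qs, qe = query
    ss, se = subject
    if overlap_type == "any":
        # Number of positions shared by both intervals (<= 0 if disjoint).
        shared = min(qe, se) - max(qs, ss) + 1
        return shared >= minoverlap
    if overlap_type == "start":
        return qs == ss
    if overlap_type == "end":
        return qe == se
    if overlap_type == "equal":
        return qs == ss and qe == se
    if overlap_type == "within":
        # The subject wholly contains the query.
        return ss <= qs and qe <= se
    raise ValueError(f"unknown overlap type: {overlap_type}")


def find_hits(query: Interval, subjects: List[Interval],
              overlap_type: str = "any", minoverlap: int = 1,
              select: str = "all") -> Union[List[int], Optional[int]]:
    """Return subject indices: a list for "all", otherwise one index or None."""
    hits = [i for i, subject in enumerate(subjects)
            if is_hit(query, subject, overlap_type, minoverlap)]
    if select == "all":
        return hits
    if select not in ("first", "last", "arbitrary"):
        raise ValueError(f"unknown select value: {select}")
    if not hits:
        return None
    # "arbitrary" may return any hit; this sketch returns the first one.
    return hits[-1] if select == "last" else hits[0]


subjects = [(1, 10), (5, 8), (20, 30), (5, 30)]
print(find_hits((5, 8), subjects))                        # all "any" overlaps
print(find_hits((5, 8), subjects, overlap_type="start"))  # matching start only
print(find_hits((5, 8), subjects, select="last"))         # one index
```

A real interval tree answers the same queries without scanning every subject, which is why the tool relies on IRanges for large inputs.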



Mapping parameters

Only a few of the Bowtie alignment parameters are used here. For the other Bowtie parameters, it is better to download Bowtie and run it as a standalone program. We considered three important parameters, described below:

Mismatches allowed (v): Bowtie will report all alignments with at most v mismatches. The default value in the tool is 0 (no mismatches), so only completely conserved alignments are reported. Gapped alignments are not considered, as Bowtie does not support them. When v is set to 3, execution is slower because backtracking is performed to look for alignments.

Reporting the alignments:
With this option, one can flexibly select which alignments are reported. The “All” option reports all valid alignments as determined by the alignment policies set by the other parameters. These alignments are not in any specific order. However, if “All Best” is selected, the output will be all valid alignments in best-to-worst order. If “k” is selected, Bowtie will report up to k valid alignments per read. If “k Best” is selected, those k alignments will be reported in best-to-worst order. Bowtie is slower when the value of k is large and/or a “best” option is selected.

Do not report (m) - Refrain from reporting alignments
Selecting this option and entering a value m suppresses all alignments for a particular read if more than m reportable alignments exist for it. Reportable alignments are those that would be reported according to the given parameters. The default value is -1, which means no limit.
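The options above correspond to the Bowtie command-line flags -v, -a, -k, --best and -m. The sketch below shows one way they could be assembled into a command line. How AbsIDconvert actually invokes Bowtie is not described here, so the function name, the report labels and the index and read file names are assumptions made for illustration.

```python
from typing import List


def bowtie_args(index: str, reads: str, v: int = 0,
                report: str = "all", k: int = 1, m: int = -1) -> List[str]:
    """Build a Bowtie argument list from the three form parameters.

    `report` is one of "all", "all_best", "k", "k_best" (hypothetical labels
    standing in for the "All", "All Best", "k" and "k Best" choices).
    """
    if not 0 <= v <= 3:
        raise ValueError("v must be between 0 and 3")
    args = ["bowtie", "-v", str(v)]
    if report in ("all", "all_best"):
        args.append("-a")            # report all valid alignments
    elif report in ("k", "k_best"):
        if k < 1:
            raise ValueError("k must be at least 1")
        args += ["-k", str(k)]       # report up to k valid alignments
    else:
        raise ValueError(f"unknown report option: {report}")
    if report.endswith("best"):
        args.append("--best")        # best-to-worst order
    if m >= 0:
        args += ["-m", str(m)]       # -1 in the form means "no limit"
    return args + [index, reads]


# Hypothetical index and read file names.
print(" ".join(bowtie_args("mm9", "reads.fq")))
print(" ".join(bowtie_args("mm9", "reads.fq", v=3, report="k_best", k=2, m=5)))
```

Returning a list rather than a single string keeps the arguments safe to pass to a process launcher without shell quoting.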



How to check the array type of identifiers

Steps

1. Select Organism (Required).

2. Enter your list of identifiers in the text box. Optionally you can also upload a file containing these identifiers.

Once the submit button is pressed and everything went fine, the output will look something like the figure below. Your identifiers are looked up in a MySQL table to determine their type. The message shows how many identifiers were submitted and how many were unique, and, of those unique identifiers, how many were found in each array type. You can ignore the prefixes mouse and mm9, as they belong to internal database table names. The output below shows that, out of 115 identifiers, 7 are found in the Affymetrix exon array of mouse (MoEx_1_0_st_v1), 15 mapped to Entrez ID and 71 mapped to gene names.
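The counts in that message can be pictured with a small Python sketch. It replaces the MySQL lookup with in-memory sets, so the function name, the table names and their contents are hypothetical, and matching is assumed to be exact and case-sensitive.

```python
from typing import Dict, Iterable, Set


def summarize_types(identifiers: Iterable[str],
                    tables: Dict[str, Set[str]]) -> Dict[str, object]:
    """Count submitted, unique and per-type matches for a list of identifiers."""
    submitted = list(identifiers)
    unique = set(submitted)
    # For each identifier type, count the unique identifiers it contains.
    found = {name: len(unique & known) for name, known in tables.items()}
    return {"submitted": len(submitted), "unique": len(unique), "found": found}


# Hypothetical lookup tables standing in for the database.
tables = {
    "gene_name": {"Tyr", "Sox17", "Xkr4"},
    "entrez_id": {"19888", "20671"},
}
print(summarize_types(["Tyr", "Tyr", "19888", "unknown_id"], tables))
```

An identifier can be counted under more than one type if it appears in several tables, which is why the per-type counts need not add up to the number of unique identifiers.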



How to do ID conversion when identifiers are given as input

Steps to process the input

1. Select Organism and Genome Version (Required).

2. Select Type of your input (Required).

3. Select Target types, that is, the output types you want your inputs to be converted into (Required).

4. Interval range parameters are optional, and defaults are set. If you want to change these parameters, click ‘+’ and change the values.

5. Enter your list of identifiers in the text box. Optionally you can also upload a file containing these identifiers (Required).

An error will be shown if any required field is empty or not selected. You can either enter text or upload a file, but not both.

An example input may be as follows (copy and paste the following text into the text box to see the result):

TYR LGALS3 uchl1 HXB CD24 SERPINA1 COL11A1 NNMT S100A11 CD9 HLA-DRB1 SPP1 COL5A2 PLAU FN1 PRSS11 MCAM PFN2 LGALS3BP ANXA1 497097 100038975 664792 619785 19888 20671 664830

OR

tyr Xkr4 LOC100418032 Gm7341 LOC100503874 Gm10568 Gm6101 Rp1 Sox17 Gm7357 Gm7369 Gm6085 Gm6119 Gm2053 Gm6123 Mrpl15 Lypla1 LOC100503730 Tcea1 Gm6104 Rgs20 LOC100417473 Atp6v1h Gm7182 LOC100418201 Oprk1 Npbwr1 LOC100418441 Rb1cc1 Gm2147 Gm7417 HXB
LGALS3
uchl1
HXB
CD24
SERPINA1
COL11A1
NNMT
S100A11
CD9
HLA-DRB1
SPP1
COL5A2
PLAU
FN1
PRSS11
MCAM
PFN2
LGALS3BP
ANXA1
497097
100038975
664792

619785
19888
20671
664830
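The example inputs above separate identifiers with spaces or with line breaks. A minimal Python sketch of reading such input, and of the "text or file, not both" rule, could look like this. The function name is hypothetical, and treating any whitespace as a separator is an assumption about how the tool tokenizes its input.

```python
from typing import List, Optional


def read_identifiers(text: Optional[str] = None,
                     file_text: Optional[str] = None) -> List[str]:
    """Split pasted text or uploaded file content into identifiers."""
    has_text = bool(text and text.strip())
    has_file = bool(file_text and file_text.strip())
    if has_text == has_file:
        # Either both were supplied or neither was.
        raise ValueError("supply either pasted text or a file, not both")
    source = text if has_text else file_text
    # str.split() with no argument splits on spaces, tabs and line breaks
    # and ignores blank lines.
    return source.split()


print(read_identifiers("TYR LGALS3 uchl1\nHXB\n\n497097"))
```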

Initially, the input screen looks like the figure below:

When all the required inputs are selected and/or supplied and the submit button is pressed, AbsIDconvert checks whether all the inputs are in the required format. If there is no error, your input is shown with each element in a separate cell (as shown below), which indicates that the given input format is correct.

Now you can press “Process Input!” and wait for the result. The wait time is directly proportional to the number of output types you want, so for a quick result select fewer output types. There is no limit on the file size; however, if you want to perform high-throughput data analysis, it is better to download the virtual machine version or the complete code and run it on a local machine.

If no error is reported, you will see output similar to the one below. You can download the ID conversion result or view it in the browser itself.

1. “Genomic intervals” contains the genomic intervals that are mapped from your input. The fields are “seqnames” (the same as “space”), “start”, “end”, “width”, “strand”, and “name”. An example genomic interval output is shown in the figure below.

In the table:
seqnames: the chromosome
start: the start location of the identifier on the space (chromosome)
end: the end location on the space (chromosome)
width: the width of the input sequence
strand: the strandedness
name: the name of your identifier

2. Clicking any of the buttons in the “View converted” column shows the conversion result in the browser itself as a paginated table, as shown in the figure below. This file can be downloaded with “Save As” (the save image at the top). It also contains the proper links to the authoritative databases.

3. The links in “Download raw file as overlapping intervals” contain all the information about the input intervals and the corresponding target intervals. The raw overlapping file looks as follows. Each row corresponds to an identifier mapped to some other identifier of the required output type. You can see two sets of fields with space, start, end, width, strand and name. The first set corresponds to the input, while the second is the mapped identifier and its location on the genome. This file is useful for showing the result on a track, and we are working on getting this done.

4. “View consolidated result” shows the conversion result in one place as a table. There are links to the authoritative websites for the corresponding identifiers. The file can also be downloaded with ‘Save As’. The screenshot looks as follows:

5. You can also view the conversion result as a custom track in the UCSC Genome Browser. Select the identifier types that you want included in the custom annotation file, and AbsIDconvert will generate files that can be downloaded for later use.

6. It is better to download all your data to your local machine. You can also delete all your analysis results by pressing the “Free memory” button, which deletes all the data files created during the analysis. We also run a program that deletes all files older than a day.
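For readers who want to build a UCSC custom track from downloaded results themselves, the Python sketch below writes intervals in BED format. It is not the file AbsIDconvert generates. The function name is hypothetical, and the code assumes the input coordinates are 1-based with both ends inclusive, so the start is shifted by one to match BED's 0-based start.

```python
from typing import Iterable, Tuple

Row = Tuple[str, int, int, str, str]  # (space, start, end, strand, name)


def to_bed_track(rows: Iterable[Row], track_name: str) -> str:
    """Format intervals as a BED custom track with a leading track line."""
    lines = [f'track name="{track_name}" description="{track_name}"']
    for space, start, end, strand, name in rows:
        # BED allows "+", "-" or "." for strand; "*" (not known) becomes ".".
        bed_strand = strand if strand in ("+", "-") else "."
        # Columns: chrom, chromStart (0-based), chromEnd, name, score, strand.
        lines.append("\t".join(
            [space, str(start - 1), str(end), name, "0", bed_strand]))
    return "\n".join(lines) + "\n"


rows = [("chr1", 3034961, 3034986, "*", "chr1_3034961_3034986")]
print(to_bed_track(rows, "AbsIDconvert"), end="")
```

If the downloaded coordinates turn out to be 0-based already, the `start - 1` adjustment should be dropped.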


Please contact us with comments, feedback or bug reports.



Intervals (Genomic Coordinates) as query

Steps to process the intervals

1. Select Organism and Genome Version (Required).

2. Select Target types, that is, the types you want your input to be converted to (Required).

3. Interval range parameters are optional, and defaults are set. If you want to change these parameters, click ‘+’ and change the values.

4. Enter your list of intervals in the text box. The intervals MUST be in comma-separated, tab-delimited or standard interval format and must contain the fields space, start and end. The name and strand fields are optional; if they are not supplied, AbsIDconvert sets the name automatically to "space_start_end" and the strand to "*" (not known). You can also upload a file containing these intervals. The standard interval format cannot have more than the three compulsory fields. (Required)

Allowed file types are ‘txt’, ‘csv’, ‘tsv’, ‘text’, and ‘range’

Example intervals may be similar to any of the following types:

Either a comma-separated or tab-delimited file with ‘space’, ‘start’ and ‘end’ as compulsory fields. Include the field names in the first line. The ‘name’ field is optional; if it is not supplied, AbsIDconvert adds a name to each interval, built from the compulsory fields separated by ‘_’ (‘space_start_end’).

Comma Separated
"space","start","end"
"chr1",3034961,3034986
"chr1",3034996,3035021
"chr1",3042608,3042633
"chr2",3042620,3042645
"chr2",3045437,3045462
"chr1",3045565,3045590
"chr9",3045705,3045730
"chr1",3045794,3045819
"chr1",3045890,3045915
"chrX",3046807,3046832
"chrY",3046862,3046887
"chr10",3047029,3047054
"chr12",3047077,3047102
Comma separated file
"probeName","probeSize","strand","space","start","end","name"
"1456340_at",25,"-","chr1",3034961,3034986,"1456340_at:363:39"
"1456340_at",25,"-","chr1",3034996,3035021,"1456340_at:58:317"
"1434615_x_at",25,"+","chr1",3042608,3042633,"1434615_x_at:337:707"
"1434615_x_at",25,"+","chr1",3042620,3042645,"1434615_x_at:75:201"
"1437867_at",25,"+","chr1",3045437,3045462,"1437867_at:491:209"
"1437867_at",25,"+","chr1",3045565,3045590,"1437867_at:476:671"
"1437867_at",25,"+","chr1",3045705,3045730,"1437867_at:672:355"
"1437867_at",25,"+","chr1",3045794,3045819,"1437867_at:383:617"
"1437867_at",25,"+","chr1",3045890,3045915,"1437867_at:401:341"
"1437534_at",25,"+","chr1",3046807,3046832,"1437534_at:692:185"
"1437534_at",25,"+","chr1",3046862,3046887,"1437534_at:256:703"
"1437534_at",25,"+","chr1",3047029,3047054,"1437534_at:142:705"
"1437534_at",25,"+","chr1",3047077,3047102,"1437534_at:202:437"
"1437534_at",25,"+","chr1",3047096,3047121,"1437534_at:375:499"
"1437534_at",25,"+","chr1",3047099,3047124,"1437534_at:224:417"
"1437534_at",25,"+","chr1",3047113,3047138,"1437534_at:21:671"
"1437534_at",25,"+","chr1",3047123,3047148,"1437534_at:425:505"
"1437534_at",25,"+","chr1",3047163,3047188,"1437534_at:340:3"
Tab delimited file
space	start	end
chr1	3034961	3034986
chr1	3034996	3035021
chr1	3042608	3042633
chr2	3042620	3042645
chr2	3045437	3045462
chr1	3045565	3045590
chr9	3045705	3045730
chr1	3045794	3045819
chr1	3045890	3045915
chrX	3046807	3046832
chrY	3046862	3046887
chr10	3047029	3047054
chr12	3047077	3047102
Tab delimited
probeName	probeSize	strand	space	start	end	name
1456340_at	25	-	chr1	3034961	3034986	1456340_at:363:39
1456340_at	25	-	chr1	3034996	3035021	1456340_at:58:317
1434615_x_at	25	+	chr1	3042608	3042633	1434615_x_at:337:707
1434615_x_at	25	+	chr1	3042620	3042645	1434615_x_at:75:201
1437867_at	25	+	chr1	3045437	3045462	1437867_at:491:209
1437867_at	25	+	chr1	3045565	3045590	1437867_at:476:671
1437867_at	25	+	chr1	3045705	3045730	1437867_at:672:355
1437867_at	25	+	chr1	3045794	3045819	1437867_at:383:617
1437867_at	25	+	chr1	3045890	3045915	1437867_at:401:341
1437534_at	25	+	chr1	3046807	3046832	1437534_at:692:185
1437534_at	25	+	chr1	3046862	3046887	1437534_at:256:703
1437534_at	25	+	chr1	3047029	3047054	1437534_at:142:705
1437534_at	25	+	chr1	3047077	3047102	1437534_at:202:437
1437534_at	25	+	chr1	3047096	3047121	1437534_at:375:499
1437534_at	25	+	chr1	3047099	3047124	1437534_at:224:417
1437534_at	25	+	chr1	3047113	3047138	1437534_at:21:671
1437534_at	25	+	chr1	3047123	3047148	1437534_at:425:505
1437534_at	25	+	chr1	3047163	3047188	1437534_at:340:3
Standard Interval format
space:start-end
chr1:3034961-3034986
chr1:3034996-3035021
chr1:3042608-3042633
chr2:3042620-3042645
chr2:3045437-3045462
chr1:3045565-3045590
chr9:3045705-3045730
chr1:3045794-3045819
chr1:3045890-3045915
chrX:3046807-3046832
chrY:3046862-3046887
chr10:3047029-3047054
chr12:3047077-3047102
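The three input formats above can be read with a short Python sketch like the following. It is an illustration rather than AbsIDconvert's own parser: the function name and the format-detection rules are assumptions, and only the defaults for name ("space_start_end") and strand ("*") are taken from the description above.

```python
import csv
import io
import re
from typing import Dict, List

# Matches one line of the standard interval format, e.g. chr1:3034961-3034986
_STANDARD = re.compile(r"^(\S+):(\d+)-(\d+)$")


def parse_intervals(text: str) -> List[Dict[str, object]]:
    """Parse comma-separated, tab-delimited or standard interval input."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    if _STANDARD.match(lines[-1]):
        rows = []
        for ln in lines:
            if ln == "space:start-end":
                continue  # header line of the standard format
            match = _STANDARD.match(ln)
            if match is None:
                raise ValueError(f"not in standard interval format: {ln}")
            rows.append(dict(zip(("space", "start", "end"), match.groups())))
    else:
        # The first line holds the field names; pick the delimiter from it.
        delimiter = "\t" if "\t" in lines[0] else ","
        reader = csv.DictReader(io.StringIO("\n".join(lines)),
                                delimiter=delimiter)
        rows = list(reader)
    intervals = []
    for row in rows:
        space, start, end = row["space"], int(row["start"]), int(row["end"])
        intervals.append({
            "space": space,
            "start": start,
            "end": end,
            # Optional fields fall back to the documented defaults.
            "strand": row.get("strand") or "*",
            "name": row.get("name") or f"{space}_{start}_{end}",
        })
    return intervals


print(parse_intervals('"space","start","end"\n"chr1",3034961,3034986'))
print(parse_intervals("space:start-end\nchrX:3046807-3046832"))
```

Extra columns such as probeName and probeSize are simply ignored by this sketch, since only space, start, end, strand and name are used.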

If your input is not in the required format, you can convert some other formats into the required format using the file format converter tools on this website.

Initially, the input screen for converting intervals to the corresponding identifiers looks like the figure below:

When all the required inputs are selected or supplied and the submit button is pressed, AbsIDconvert checks whether all the inputs are in the required format.

If there is no error, you get output with each element in a separate cell (as shown below), which indicates that the given input format is correct.

Now you can press “Process Input!” and wait for the result. The wait time is directly proportional to the number of output types you want, so for a quick result select fewer output types. There is no limit on the file size; however, if you want to perform high-throughput data analysis, it is better to download the virtual machine version or the complete code and run it on a local machine.

If no error is reported, you will see output similar to the one below. You can download the ID conversion result or view it in the browser itself.

Pressing “perform ID conversion” finds all the target identifiers that you selected for the aligned intervals. The output will look as shown below.

1. “Genomic intervals” contains the genomic intervals that are mapped from your input. The fields are “seqnames” (the same as “space”), “start”, “end”, “width”, “strand”, and “name”. An example genomic interval output is shown in the figure below.

In the table:
seqnames: the chromosome
start: the start location of the identifier on the space (chromosome)
end: the end location on the space (chromosome)
width: the width of the input sequence
strand: the strandedness
name: the name of your identifier
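The fields listed above can be modelled as a small record, sketched here in Python. The class name is hypothetical. The width calculation assumes the IRanges convention, in which both start and end are inclusive; whether the downloaded files follow exactly this convention is an assumption worth checking against your own output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicInterval:
    """One row of the "Genomic intervals" table."""
    seqnames: str        # chromosome, the same as "space"
    start: int           # start location on the chromosome
    end: int             # end location on the chromosome
    strand: str = "*"    # "+", "-" or "*" (not known)
    name: str = ""       # identifier name

    @property
    def width(self) -> int:
        # Inclusive coordinates: a one-base interval has start == end.
        return self.end - self.start + 1


row = GenomicInterval("chr1", 3034961, 3034986, name="chr1_3034961_3034986")
print(row.width)
```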

2. Clicking any of the buttons in the “View converted” column shows the conversion result in the browser itself as a paginated table, as shown in the figure below. This file can be downloaded with “Save As” (the save image at the top). It also contains the proper links to the authoritative databases.

3. The links in “Download raw file as overlapping intervals” contain all the information about the input intervals and the corresponding target intervals. The raw overlapping file looks as follows. Each row corresponds to an identifier mapped to some other identifier of the required output type. You can see two sets of fields with space, start, end, width, strand and name. The first set corresponds to the input, while the second is the mapped identifier and its location on the genome. This file is useful for showing the result on a track, and we are working on getting this done.

4. “View consolidated result” shows the conversion result in one place as a table. There are links to the authoritative websites for the corresponding identifiers. The file can also be downloaded with ‘Save As’. The screenshot looks as follows:

5. You can also view the conversion result as a custom track in the UCSC Genome Browser. Select the identifier types that you want included in the custom annotation file, and AbsIDconvert will generate files that can be downloaded for later use.

6. It is better to download all your data to your local machine. You can also delete all your analysis results by pressing the “Free memory” button, which deletes all the data files created during the analysis. We also run a program that deletes all files older than a day.
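If you run the virtual machine version or the complete code locally, you may want a similar clean-up of old result files. The Python sketch below shows one possible approach. It is not the program used on the server: the function name is hypothetical, file age is judged by modification time, and the directory you pass in is entirely your own choice.

```python
import os
import time
from typing import List, Optional

ONE_DAY = 24 * 60 * 60  # seconds


def delete_old_files(directory: str, max_age_seconds: float = ONE_DAY,
                     now: Optional[float] = None) -> List[str]:
    """Delete regular files in `directory` older than `max_age_seconds`.

    Returns the sorted names of the files that were removed.
    Subdirectories are left untouched.
    """
    now = time.time() if now is None else now
    removed = []
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        age = now - entry.stat().st_mtime
        if age > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Scheduling a call such as `delete_old_files("/path/to/results")` once a day would mimic the behaviour described above; the path shown is a placeholder.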


Please contact us with comments, feedback or bug reports.
