Overview

The Total Integrated Archive of short-Read and Array (TIARA) database contains personal genomic information obtained from heterogeneous technologies, including next-generation sequencing (NGS) and ultra-high-resolution comparative genomic hybridization (CGH) arrays. This database improves the accuracy of detecting personal genomic variations such as SNPs, short indels, and structural variants (SVs). TIARA also allows comparison of genomic variants between whole genome sequencing and transcriptome sequencing for matched samples, and presents features of allele-specific gene expression and transcriptional base modification (TBM), or RNA editing. All data are derived from the publications listed on the "Publications" page.

Getting Started

The User Interface


The user can specify the genomic region and individuals of interest for browsing in the Control Panel (A). Areas (B), (C), (D), (E) and (F) present, respectively, the RefSeq genes, SNPs, the Integrative Multi-Omics Display Window, short indels, and read depths (RDs) from whole genome sequencing and transcriptome sequencing. Areas (G) and (H) present the CNV regions and log2 ratios from the high-resolution CGH array data, respectively. When the user selects (or deselects) an individual genome data set, the personal genome data are displayed in (or removed from) the display windows [(C), (D), (E), (F) and (G)].

[(A) Control panel, (B) RefSeq genes display window, (C) SNP display window, (D) Integrative Multi-Omics Display Window, (E) Indel display window, (F) Read Depth display window, (G) CNV region display window, (H) Log2 ratio display window]
(A) Control panel
1) Genomic region selection
The user can specify the human genome reference sequence (hg18 or hg19) and the genomic region of interest for browsing. The "Clear Position" button erases the text in the genomic position box.


2) Variants selection
By clicking the check boxes for SNP, Indel, Read Depth (DNA), Read Depth (RNA) or Log2 ratio in the variants selection group (②), users can choose the genomic variants to be provided in display windows.
Because the quantity of data has grown and transcriptome sequence data have been added, displaying all genomic variants simultaneously can be confusing. For ease of use, the 'Group by Variants/Samples' combo box groups the display either by genomic variant type or by sample. The two figures to the right and below show the retrieval of genomic regions arranged by samples and by variants, respectively, according to the user's selection. This grouping is particularly useful for viewing the newly added transcriptome data.











































3) Retrievals by gene name
The 'GeneSearch' button from the original TIARA has been removed. Instead, the user can retrieve information for any gene of interest by typing its name into the 'Gene Name' box and then browse the genome data for that gene. For example, the user can browse the TP53 gene locus as in the figure to the right. The "Clear Gene" button removes the text in the 'Gene Name' text box.















4) Sample selection
The user can specify the samples whose data they would like to browse in the ③ DNA-Seq data sample list box, ④ RNA-Seq data sample list box and ⑤ CGH array sample list box. When the user selects or deselects individual genome data in these check-box sets (③, ④ or ⑤), the SNPs, Multi-Omics features, indels, and read depths for the personal genome data are displayed in or removed from the SNP display window (C), Integrative Multi-Omics Display Window (D), Indel display window (E), and Read Depth display window (F) for DNA-Seq and RNA-Seq (gene expression), respectively.
Furthermore, the CNV lists and log2 ratios for the personal genomes are also displayed in or removed from the CNV regions window (G) and log2 ratio display window (H), respectively.
For the user's convenience, "Select All" and "De-select All" buttons are located in the ③ DNA-Seq data sample list box, ④ RNA-Seq data sample list box and ⑤ CGH array sample list box. These buttons allow the user to select or de-select all individuals with one click.


5) Integrative viewing of Multi-Omics data


The Integrative Multi-Omics Display Window (D) displays instances of allele-specific expression as green points and TBMs as purple points at their corresponding genomic positions. Users may click on one of these points to see the number of reads supporting the reference and variant alleles in whole genome sequencing and transcriptome sequencing, together with the statistical significance. The figure on the left shows the integrative viewing of genomic variants with allele-specific expression in the gene SEC22B at position 143,815,304 bp on chromosome 1.































6) Download XML document
The 'XMLDownload' button exports an XML document that contains structured information describing the SNPs, indels, RDs, and log2 ratios visualized in the genome browser. The downloaded XML document permits analysis of the selected genomic region using other genomic browsers or custom scripts of the user’s creation.
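As an illustration of what such a custom script might look like, the following minimal Python sketch groups SNP records from an exported file by genomic position. The schema of the TIARA export is not described on this page, so every tag and attribute name used here (`tiara`, `snp`, `indel`, `sample`, `chrom`, `pos`, `zygosity`) and the indel record are hypothetical placeholders that would need to be adapted to the actual file.

```python
import xml.etree.ElementTree as ET

# Hypothetical export. The real TIARA XML schema is not documented here,
# so all tag and attribute names below are assumptions for illustration.
EXAMPLE_XML = """
<tiara>
  <snp sample="AK1" chrom="chr14" pos="74583581" zygosity="hom"/>
  <snp sample="AK2" chrom="chr14" pos="74583581" zygosity="het"/>
  <indel sample="AK1" chrom="chr14" pos="74583900" type="deletion"/>
</tiara>
"""


def snps_by_position(xml_text):
    """Map (chromosome, position) to a list of (sample, zygosity) pairs."""
    root = ET.fromstring(xml_text.strip())
    grouped = {}
    # iter("snp") visits only <snp> elements, so indel records are skipped.
    for snp in root.iter("snp"):
        key = (snp.get("chrom"), int(snp.get("pos")))
        grouped.setdefault(key, []).append(
            (snp.get("sample"), snp.get("zygosity"))
        )
    return grouped


snps = snps_by_position(EXAMPLE_XML)
for (chrom, pos), calls in snps.items():
    print(chrom, pos, calls)
```

Grouping by position mirrors the browser's side-by-side comparison of individuals at one locus; the same loop could be repeated for indel, read depth, or log2 ratio records once their real element names are known.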











7) Reference-error-free log2 ratio
When the 'Absolute called' checkbox (⑥) is deselected, the log2 ratio display window (H) shows the conventional log2 ratio relative to the CGH reference DNA (NA10851); when it is selected, the window shows the reference-free log2 ratio instead (Park H et al., Nature Genet. 2010 and Ju YS et al., Nucleic Acids Research 2010). For example, when studying individual AK1 in a region where the reference (NA10851) itself has a copy gain or copy loss, the Absolute call option provides corrected CNV calls. Users can also see the CNV results reported in Conrad et al. (Nature 2010).
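The reason a reference-relative log2 ratio can mislead is simple arithmetic, sketched below in Python. This is an illustration only, not the calling procedure used in the cited papers: it assumes the reference copy number at the locus is already known, and the function names are hypothetical.

```python
import math


def conventional_log2_ratio(sample_cn, reference_cn):
    """Log2 ratio of sample signal to reference signal at one locus."""
    return math.log2(sample_cn / reference_cn)


def absolute_copy_number(log2_ratio, reference_cn):
    """Recover the sample copy number once the reference copy number is known."""
    return reference_cn * 2 ** log2_ratio


# Assumed scenario: the sample is a normal diploid (2 copies), but the
# reference DNA carries a copy gain (3 copies) at this locus.
ratio = conventional_log2_ratio(sample_cn=2, reference_cn=3)
print(round(ratio, 3))  # negative, so the sample looks like a copy loss
print(round(absolute_copy_number(ratio, reference_cn=3)))  # corrected copy number
```

In this scenario the conventional ratio is negative even though the sample is normal; accounting for the reference copy number restores the expected value of 2.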


8) Sample selection synchronization
When the 'Sample List Sync' check box is selected, clicking a sample in any one of the ③ DNA-Seq data sample list box, ④ RNA-Seq data sample list box or ⑤ CGH array sample list box selects all data types for that sample together.


(B) SNP display window
All SNPs detected across multiple individuals for a selected genomic region are displayed as points in the SNP display window. Homozygous and heterozygous SNPs are colored in blue and red, respectively. Users may click on one of these SNPs to receive information about the short read data. For example, the figure below shows the information on short reads for the SNP at the 74,583,581 bp position of chromosome 14. A comparison of multiple genomes provides clues as to the functional impact of each variant. For example, the SNP shown in the figure below appears to have a high frequency because five individuals have the SNP (homozygous SNPs for AK1 and AK6 and heterozygous SNPs for AK2, AK4, and NA10851).
A popup window is available to display the SNPs common to multiple individuals, as shown in the figure below.


(C) Indel display window
The start positions of indels are marked with a circle. As with SNPs, homozygous and heterozygous indels are colored in blue and red, respectively. Insertions are indicated by a filled circle and deletions are indicated by an open circle.


(D) Utilities

1) Gene Expression
The 'Gene Expression' button opens the window shown below. The expression levels of RefSeq genes are normalized as reads per kilobase of exon per million mapped reads (RPKM), as calculated in Ju et al. (2011). Clicking a gene name displays the genomic variants in that gene's region in each display window. Users can also extract specific data by setting filter conditions. For RPKM, the user can specify a threshold value with one of the operators (>=, >, <=, or <). The number of genes to retrieve can be set with the 'Top' and 'Bottom' options. The window also reports the number of query results. When a query has finished, the results can be filtered by chromosome and sorted by the average RPKM across all individuals using the ascending (▲) or descending (▼) button. The result columns are Gene Name, Chromosome, Start position of gene, Stop position of gene, Average RPKM of all individuals, and RPKM of each individual.
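For readers who want to reproduce this kind of query outside the browser, the sketch below shows the standard RPKM formula and a filter resembling the operator and 'Top'/'Bottom' options. It is a hedged illustration: the function names, the gene names in the usage example, and the way the threshold combines with 'Top' and 'Bottom' are assumptions, not a description of TIARA's internals.

```python
import operator

# The four comparison operators offered for the RPKM filter.
OPERATORS = {
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def rpkm(exon_reads, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon per million mapped reads."""
    return exon_reads * 1e9 / (exon_length_bp * total_mapped_reads)


def filter_genes(avg_rpkm_by_gene, op, threshold, top=None, bottom=None):
    """Keep genes passing the threshold, sorted by descending average RPKM.

    Assumption: 'top' and 'bottom' are applied after the threshold filter.
    """
    compare = OPERATORS[op]
    kept = [(g, v) for g, v in avg_rpkm_by_gene.items() if compare(v, threshold)]
    kept.sort(key=lambda gene_value: gene_value[1], reverse=True)
    if top is not None:
        return kept[:top]
    if bottom is not None:
        return kept[-bottom:] if bottom else []
    return kept


# Hypothetical gene names and values, for illustration only.
example = {"GENE_A": rpkm(500, 2000, 10_000_000), "GENE_B": 3.0, "GENE_C": 10.0}
print(filter_genes(example, ">=", 5, top=1))
```

Here 500 reads on 2,000 bp of exon with 10 million mapped reads gives an RPKM of 25, so `GENE_A` is the single top hit above the threshold of 5.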




















2) CNV Regions


In a previous publication (Park et al., 2010), we studied Asian-specific CNVs in 30 Asian samples; these are made available to users worldwide through TIARA. The CNV segments called in each sample were merged into groups, termed 'CNV elements' (CNVEs), when segments overlapped by more than 50%. TIARA contains the absolute copy number states for 5,177 Asian CNVEs. In this window, clicking a gene name displays the genomic variants of that region in the display windows. The columns are, in order, the CNVR ID assigned by GMI, Chromosome, Start position of CNVR, Stop position of CNVR, Log2 ratio of each individual within the CNVR, and Gene annotation.
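The overlap-based merging can be sketched as follows. The text above does not say whether the 50% overlap is measured reciprocally or against a single segment, nor how groups are built, so this Python sketch assumes reciprocal overlap and a simple greedy pass over sorted segments on one chromosome. It is not the published procedure, and the function names are hypothetical.

```python
def reciprocal_overlap(a, b):
    """Smaller of the two overlap fractions for segments a and b, each (start, end)."""
    shared = min(a[1], b[1]) - max(a[0], b[0])
    if shared <= 0:
        return 0.0
    return min(shared / (a[1] - a[0]), shared / (b[1] - b[0]))


def merge_into_cnves(segments, min_overlap=0.5):
    """Greedily merge segments whose reciprocal overlap exceeds min_overlap.

    Each segment is compared with the most recent merged element only,
    which is an assumption made to keep the sketch short.
    """
    elements = []
    for seg in sorted(segments):
        if elements and reciprocal_overlap(elements[-1], seg) > min_overlap:
            last = elements[-1]
            elements[-1] = (min(last[0], seg[0]), max(last[1], seg[1]))
        else:
            elements.append(seg)
    return elements


# Made-up coordinates: the first two segments overlap by well over 50%.
print(merge_into_cnves([(100, 200), (120, 210), (500, 600)]))
```

With these made-up coordinates the first two segments share 80 bp (80% and about 89% of their lengths) and collapse into one element, while the third stays separate.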




















3) Unknown transcripts
In the study of Ju et al. (2011), to identify transcripts expressed at genomic loci not previously annotated as genes (unknown transcripts), we analyzed RNA short reads that could not be aligned to the cDNA sequences generated by our study. We aligned these unmapped reads to the human genome sequence, excluding known genic regions in the RefSeq, UCSC, Ensembl and GenBank databases. Furthermore, we removed short reads overlapping known expressed sequence tags, as well as transcripts detected in only one individual. After filtering, there were 4,414 unknown transcripts, each found in at least two individuals, that do not overlap any known genes or pseudogenes. The columns are the in-house index ID, chromosome, start position of the unknown transcript, stop position of the unknown transcript, individuals expressing the unknown transcript, number of individuals, the nearest gene to the unknown transcript, and the genomic distance between the unknown transcript and the nearest gene.
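The filtering logic and the last two table columns can be illustrated with a short Python sketch. It is a simplification under stated assumptions: intervals are half-open (start, end) pairs on a single chromosome, the read alignment step is omitted, and all function names, coordinates and gene names are hypothetical rather than taken from the study.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def distance(a, b):
    """Gap in bp between two intervals, or 0 if they overlap."""
    return max(0, max(a[0], b[0]) - min(a[1], b[1]))


def filter_unknown_transcripts(candidates, known_regions, min_individuals=2):
    """Keep candidates seen in enough individuals that avoid all known regions.

    candidates: list of ((start, end), [individual names]).
    known_regions: intervals of known genes, pseudogenes and ESTs.
    """
    kept = []
    for region, individuals in candidates:
        unique = sorted(set(individuals))
        if len(unique) < min_individuals:
            continue
        if any(overlaps(region, known) for known in known_regions):
            continue
        kept.append((region, unique))
    return kept


def nearest_gene(region, genes):
    """Return (gene name, distance) for the gene closest to region."""
    name, coords = min(genes.items(), key=lambda item: distance(region, item[1]))
    return name, distance(region, coords)


# Made-up example data.
candidates = [
    ((1000, 1200), ["AK1", "AK2"]),
    ((5000, 5100), ["AK1"]),          # only one individual: removed
    ((8000, 8300), ["AK2", "AK4"]),   # overlaps a known region: removed
]
kept = filter_unknown_transcripts(candidates, known_regions=[(7900, 8100)])
print(kept)
print(nearest_gene((1000, 1200), {"GENE_X": (2000, 3000), "GENE_Y": (100, 400)}))
```

Only the first candidate survives both filters, and its nearest gene is the hypothetical `GENE_Y`, 600 bp upstream.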


















Genomic Medicine Institute, Seoul National University College of Medicine, 28 Jongno-Gu, Yongon-Dong, Seoul 110-799, Korea