Plant IsomiR Atlas

1) IsomiR Identification

Plant IsomiR Atlas (PIA) is a database of isomiRs identified across plant species. Version 1.0 of PIA contains 196,829 unique isomiR signatures (98,734 unique isomiR sequences) identified from 6,167 plant miRNA hairpins using 667 Illumina small RNA sequencing datasets from 23 species. The genomes, primary transcripts and annotation information for these species are mostly from Phytozome; those for Nelumbo nucifera are from lotus-db (Table 1).

Table 1 Genome resources used for isomiR identification
Species | Source | Genome | Primary transcripts | Annotation (GO background)
Amborella trichopoda | PhytozomeV11 | Atrichopoda_291_v1.0.fa.gz | Atrichopoda_291_v1.0.transcript_primaryTranscriptOnly.fa.gz | Atrichopoda_291_v1.0.annotation_info.txt
Arabidopsis lyrata | PhytozomeV11 | Alyrata_107_v1.fa.gz | Alyrata_107_v1.0.transcript_primaryTranscriptOnly.fa.gz | Alyrata_107_v1.0.annotation_info.txt
Arabidopsis thaliana | PhytozomeV11 | Athaliana_167_TAIR9.fa.gz | Athaliana_167_TAIR10.transcript_primaryTranscriptOnly.fa.gz | Athaliana_167_TAIR10.annotation_info.txt
Brachypodium distachyon | PhytozomeV10 | Bdistachyon_283_assembly_v2.0.fa.gz | Bdistachyon_283_v2.1.transcript_primaryTranscriptOnly.fa.gz | Bdistachyon_283_v2.1.annotation_info.txt
Brassica rapa | PhytozomeV11 | BrapaFPsc_277_v1.fa.gz | BrapaFPsc_277_v1.3.transcript_primaryTranscriptOnly.fa.gz | BrapaFPsc_277_v1.3.annotation_info.txt
Carica papaya | PhytozomeV11 | Cpapaya_113_r.Dec2008.fa.gz | Cpapaya_113_ASGPBv0.4.transcript_primaryTranscriptOnly.fa.gz | Cpapaya_113_ASGPBv0.4.annotation_info.txt
Citrus clementina | PhytozomeV11 | Cclementina_182_v1.fa.gz | Cclementina_182_v1.0.transcript_primaryTranscriptOnly.fa.gz | Cclementina_182_v1.0.annotation_info.txt
Citrus sinensis | PhytozomeV11 | Csinensis_154_v1.fa.gz | Csinensis_154_v1.1.transcript_primaryTranscriptOnly.fa.gz | Csinensis_154_v1.1.annotation_info.txt
Glycine max | PhytozomeV11 | Gmax_275_v2.0.fa.gz | Gmax_275_Wm82.a2.v1.transcript_primaryTranscriptOnly.fa.gz | Gmax_275_Wm82.a2.v1.annotation_info.txt
Gossypium raimondii | PhytozomeV11 | Graimondii_221_v2.0.fa.gz | Graimondii_221_v2.1.transcript_primaryTranscriptOnly.fa.gz | Graimondii_221_v2.1.annotation_info.txt
Malus domestica | PhytozomeV11 | Mdomestica_196_v1.0.fa.gz | Mdomestica_196_v1.0.transcript_primaryTranscriptOnly.fa.gz | Mdomestica_196_v1.0.annotation_info.txt
Manihot esculenta | PhytozomeV11 | Mesculenta_305_v6.fa.gz | Mesculenta_305_v6.1.transcript_primaryTranscriptOnly.fa.gz | Mesculenta_305_v6.1.annotation_info.txt
Medicago truncatula | PhytozomeV11 | Mtruncatula_285_Mt4.0.fa.gz | Mtruncatula_285_Mt4.0v1.transcript_primaryTranscriptOnly.fa.gz | Mtruncatula_285_Mt4.0v1.annotation_info.txt
Nelumbo nucifera | http://lotus-db.wbgcas.cn/ | Lotus_mega_gap1M.fa | lotus_marker_all_mega.gff.CDS | gene.go.tbl
Oryza sativa | PhytozomeV11 | Osativa_323_v7.0.fa.gz | Osativa_323_v7.0.transcript_primaryTranscriptOnly.fa.gz | Osativa_323_v7.0.annotation_info.txt
Populus trichocarpa | PhytozomeV11 | Ptrichocarpa_210_v3.0.fa.gz | Ptrichocarpa_210_v3.0.transcript_primaryTranscriptOnly.fa.gz | Ptrichocarpa_210_v3.0.annotation_info.txt
Setaria italica | PhytozomeV11 | Sitalica_312_v2.fa.gz | Sitalica_312_v2.2.transcript_primaryTranscriptOnly.fa.gz | Sitalica_312_v2.2.annotation_info.txt
Solanum lycopersicum | PhytozomeV11 | Slycopersicum_225_iTAGv2.40.fa.gz | Slycopersicum_225_iTAGv2.3.transcript_primaryTranscriptOnly.fa.gz | Slycopersicum_225_iTAGv2.3.annotation_info.txt
Solanum tuberosum | PhytozomeV11 | Stuberosum_206_v3.fa.gz | Stuberosum_206_v3.4.transcript_primaryTranscriptOnly.fa.gz | Stuberosum_206_v3.4.annotation_info.txt
Sorghum bicolor | PhytozomeV11 | Sbicolor_313_v3.0.fa.gz | Sbicolor_313_v3.1.transcript_primaryTranscriptOnly.fa.gz | Sbicolor_313_v3.1.annotation_info.txt
Triticum aestivum | PhytozomeV11 | Taestivum_296_v2.fa.gz | Taestivum_296_v2.2.transcript_primaryTranscriptOnly.fa.gz | Taestivum_296_v2.2.annotation_info.txt
Vitis vinifera | PhytozomeV11 | Vvinifera_145_Genoscope.12X.fa.gz | Vvinifera_145_Genoscope.12X.transcript_primaryTranscriptOnly.fa.gz | Vvinifera_145_Genoscope.12X.annotation_info.txt
Zea mays | PhytozomeV11 | Zmays_284_AGPv3.fa.gz | Zmays_284_5b+.transcript_primaryTranscriptOnly.fa.gz | Zmays_284_5b+.annotation_info.txt

The species-specific hairpin and mature sequences used for isomiR identification come from miRBase and the Plant Non-coding RNA Database; we merged the miRNA sequences from these two databases and removed redundant entries. Most sequencing datasets are from the NCBI SRA database and are converted into collapsed FASTA files by an in-house Perl script, which calls cutadapt to remove adapter sequences accurately. The clean reads are then used for isomiR identification by isomiRIden, a Perl script modified from isomiR2Function.

Briefly, sequenced reads and canonical miRNAs are mapped to the species-specific pre-miRNAs allowing no mismatches, and comparing the mapping information identifies templated isomiRs together with their positions relative to the canonical miRNAs. Reads that do not map to any precursor are then mapped to the genome, again allowing no mismatches. Reads that do not map to the genome are mapped to the species-specific pre-miRNAs a second time, allowing up to two mismatches; comparing this mapping information identifies non-templated isomiRs, their positions relative to the canonical miRNAs, and their mismatch positions. Finally, the identified isomiRs are indexed and classified into categories (Figure 1). For more information, see our paper "isomiR2Function: An Integrated Workflow for Identifying MicroRNA Variants in Plants".
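Collapsing reads into a FASTA file means storing each distinct read once, with its abundance recorded in the header. The sketch below shows the idea in Python rather than Perl. It assumes reads have already been adapter-trimmed (for example with cutadapt). The header layout "r<rank>_x<count>" and the function names are hypothetical and are not taken from the PIA in-house script.

```python
from collections import Counter
from typing import Iterable


def collapse_reads(reads: Iterable[str]) -> list[tuple[str, str]]:
    """Collapse identical, already-trimmed reads into (header, sequence) pairs.

    The "r<rank>_x<count>" header is a hypothetical convention; the PIA
    in-house Perl script may format headers differently.
    """
    # Normalise case and skip empty lines before counting.
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # Most abundant first; ties broken by sequence for a stable order.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(f"r{i}_x{n}", seq) for i, (seq, n) in enumerate(ordered, start=1)]


def to_fasta(records: list[tuple[str, str]]) -> str:
    """Render (header, sequence) pairs as FASTA text."""
    return "".join(f">{h}\n{s}\n" for h, s in records)
```

Keeping the count in the header lets later steps recover read abundance without storing duplicate sequences.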

Figure 1 Workflow of plant isomiR identification
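The decision order of the identification workflow (exact hit on the precursor, then genome check, then a hit with up to two mismatches) and the comparison of a read's ends with the canonical miRNA can be sketched as follows. This is a simplified illustration, not isomiRIden itself. It assumes 0-based, half-open coordinates on the hairpin, uses plain string search and an ungapped scan in place of a real aligner, and passes the genome result in as a boolean. The end labels such as "3'ext1" and the "set aside" outcome for genome-matching reads are hypothetical and do not necessarily match PIA's categories.

```python
from typing import Optional

Span = tuple[int, int]  # 0-based, half-open [start, end) on the hairpin


def find_exact(read: str, hairpin: str) -> Optional[Span]:
    """Zero-mismatch hit of the read on the hairpin, or None."""
    i = hairpin.find(read)
    return None if i < 0 else (i, i + len(read))


def best_mismatch_hit(read: str, hairpin: str, max_mm: int = 2) -> Optional[tuple[int, list[int]]]:
    """Ungapped scan allowing up to max_mm mismatches.

    Returns (start, mismatch positions within the read) for the offset with
    the fewest mismatches, or None if no offset qualifies.
    """
    best = None
    for s in range(len(hairpin) - len(read) + 1):
        window = hairpin[s:s + len(read)]
        mm = [i for i, (a, b) in enumerate(zip(read, window)) if a != b]
        if len(mm) <= max_mm and (best is None or len(mm) < len(best[1])):
            best = (s, mm)
    return best


def describe_ends(read_pos: Span, mature_pos: Span) -> str:
    """Offsets of a read's ends relative to the canonical mature miRNA (hypothetical labels)."""
    d5 = read_pos[0] - mature_pos[0]  # < 0: 5' extended, > 0: 5' trimmed
    d3 = read_pos[1] - mature_pos[1]  # > 0: 3' extended, < 0: 3' trimmed
    parts = []
    if d5:
        parts.append(f"5'{'ext' if d5 < 0 else 'trim'}{abs(d5)}")
    if d3:
        parts.append(f"3'{'ext' if d3 > 0 else 'trim'}{abs(d3)}")
    return "|".join(parts) or "canonical"


def route_read(read: str, hairpin: str, maps_to_genome: bool) -> str:
    """Apply the workflow's decision order; maps_to_genome stands in for a genome alignment."""
    if find_exact(read, hairpin):
        return "templated"
    if maps_to_genome:
        return "set aside"  # assumption: genome-matching reads are not called as isomiRs here
    if best_mismatch_hit(read, hairpin):
        return "non-templated"
    return "unassigned"
```

For example, with a hairpin built as "GGG" + "TCGGACCAGGCTTCATTCCCC" + "TTT", a read carrying one extra templated 3' base is described as "3'ext1", while a read whose last base differs from the precursor only maps in the mismatch-tolerant pass and is routed as non-templated.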

2) Structure of Plant IsomiR Atlas

PIA was implemented in MySQL, PHP, JavaScript and Perl, and is freely accessible to everyone. The MySQL database of PIA consists of eleven tables: seq, hairpin, exist, evidence, isomiR_alignment, mature_alignment, precursor_alignment, targetfinder, psRNATarget, miRNA_targetfinder and datasets. The datasets table is independent of the others (Figure 2) and stores information about the sequencing datasets. The relations among the remaining tables, together with their columns, are shown in Figure 3, where 'P' marks a primary key, 'I' an indexed column and 'F' a foreign key. A column marked 'F' references the column of the same name, not marked 'F', in another table. For example, evidence.ID, hairpin.ID, targetfinder.ID and psRNATarget.ID are foreign keys referencing seq.ID.
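To make the foreign-key convention concrete, the sketch below builds a toy version of two PIA tables and joins them. It uses Python's built-in sqlite3 module in memory instead of MySQL so that it runs on its own. Only seq.ID and evidence.ID come from the PIA description; the other columns (sequence, dataset, read_count), the placeholder dataset names and the helper functions are hypothetical.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory toy version of the seq and evidence tables.

    Only the ID columns mirror PIA; all other columns are hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    con.executescript(
        """
        CREATE TABLE seq (
            ID       TEXT PRIMARY KEY,                 -- 'P' in Figure 3
            sequence TEXT NOT NULL
        );
        CREATE TABLE evidence (
            ID         TEXT NOT NULL REFERENCES seq(ID), -- 'F': points to seq.ID
            dataset    TEXT NOT NULL,
            read_count INTEGER NOT NULL
        );
        CREATE INDEX evidence_id ON evidence(ID);       -- 'I': indexed column
        """
    )
    con.execute("INSERT INTO seq VALUES (?, ?)", ("iso1", "TCGGACCAGGCTTCATTCCCC"))
    con.executemany(
        "INSERT INTO evidence VALUES (?, ?, ?)",
        [("iso1", "ds1", 120), ("iso1", "ds2", 35)],  # placeholder dataset names
    )
    return con


def total_reads(con: sqlite3.Connection, isomir_id: str):
    """Join evidence to seq through the shared ID column and sum read counts."""
    return con.execute(
        "SELECT s.sequence, SUM(e.read_count) FROM seq AS s "
        "JOIN evidence AS e ON e.ID = s.ID WHERE s.ID = ? GROUP BY s.ID",
        (isomir_id,),
    ).fetchone()
```

Because evidence.ID references seq.ID, inserting an evidence row whose ID is absent from seq raises sqlite3.IntegrityError, which is the integrity guarantee the 'F' marking expresses.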

Figure 2 Column information of the datasets table
Figure 3 Database structure