The primary RNA-Seq dataset included in the public release of POTAGE is the
Chinese Spring tissue series dataset:
non-oriented (unstranded) TruSeq libraries (Illumina) sequenced on an Illumina HiSeq 2000 as 2 × 100 bp paired-end (PE) reads, covering 15 conditions: five wheat organs (root, leaf, stem, spike, grain), each sampled at three developmental stages, with two replicates per condition.
| Stage | Wheat growth stage | Feekes scale | Zadoks scale | Leaves | Root | Stem | Spike | Grain |
|---|---|---|---|---|---|---|---|---|
| Seedling | First leaf through coleoptile | 1 | 10 | x | x | | | |
| Three leaves | 3 leaves unfolded | | 13 | | x | | | |
| Three tillers | Main shoot and 3 tillers | | 23 | x | | | | |
| Spike at 1 cm | Pseudostem erection | 5 | 30 | | | x | | |
| Two nodes | 2nd detectable node | 7 | 32 | | | x | x | |
| Meiosis | Flag leaf ligule and collar visible | 9 | 39 | | x | | x | |
| Anthesis | 1/2 of flowering complete | | 65 | | | x | x | |
| 2 DAA (50 °C·days) | Kernel (caryopsis) watery ripe | | 71 | x | | | | x |
| 14 DAA (350 °C·days) | Medium milk | | 75 | | | | | x |
| 30 DAA (700 °C·days) | Soft dough | | 85 | | | | | x |

DAA: days after anthesis; the corresponding thermal time is given in degree-days (°C·days).
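The sampling design in the table can be written down as a small data structure, which makes the 5 organs × 3 stages = 15 conditions explicit. This is an illustrative sketch only: `DESIGN`, `REPLICATES` and `condition_names` are hypothetical names, and the `<organ>_Z<Zadoks stage>` labels are an assumption modelled on the `spike_Z39` sample name used in the methods.

```python
# Hypothetical encoding of the sampling table: organ -> Zadoks stages sampled.
DESIGN = {
    "root": [10, 13, 39],
    "leaf": [10, 23, 71],
    "stem": [30, 32, 65],
    "spike": [32, 39, 65],
    "grain": [71, 75, 85],
}
REPLICATES = 2  # two replicates per condition in the design


def condition_names(design):
    """Return labels of the form '<organ>_Z<Zadoks stage>', e.g. 'spike_Z39'."""
    return [f"{organ}_Z{stage}" for organ, stages in design.items() for stage in stages]


conditions = condition_names(DESIGN)
print(len(conditions))               # 15
print(len(conditions) * REPLICATES)  # 30 libraries in the full design
```

Note that the methods report only one available replicate for spike_Z39, so the data actually processed fall one library short of the full design.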
The computation of the expression values for the main Chinese Spring dataset available in POTAGE is described in
this paper. The relevant paragraph is included below:
To obtain gene expression information for genes located in the QTL region, we used the PopSeq map for bread wheat
together with the high confidence (HC) gene predictions described in IWGSC (2014) and the publicly available
RNA-Seq data for cv. Chinese Spring.
This dataset covers five different organs (root, leaf, stem, spike, grain) at three developmental stages, each in two replicates.
We used the HC gene predictions (version 2.1) as the reference for mapping the RNA-Seq reads.
The reference was prepared by extracting, from the corresponding IWGSC-CSS contig, the genomic sequence of each predicted gene
together with up to 2 kb of upstream and downstream sequence where available.
The coordinates in the corresponding GTF/GFF transcript annotations file were adjusted accordingly.
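The reference preparation above boils down to slicing a contig with clipped flanks and shifting the annotation coordinates by the number of bases cut off before the gene. The sketch below illustrates that arithmetic under stated assumptions: the helper names are hypothetical, each extracted sequence is assumed to be renamed after its gene, and `contig_1`, `gene_1` and the `IWGSC` source field are toy values, not taken from the actual files.

```python
def extract_with_flanks(contig_seq, gene_start, gene_end, flank=2000):
    """Return (sequence, offset) for a gene plus up to `flank` bp on each side.

    gene_start and gene_end are 1-based, inclusive contig coordinates (GTF/GFF
    convention). Flanks are clipped at the contig ends. `offset` is the number
    of contig bases that precede the extracted region.
    """
    start0 = max(0, gene_start - 1 - flank)        # 0-based start of the slice
    end0 = min(len(contig_seq), gene_end + flank)  # 0-based exclusive end
    return contig_seq[start0:end0], start0


def shift_gtf_record(line, new_seqname, offset):
    """Rewrite one GTF/GFF record so it refers to the extracted sequence."""
    fields = line.rstrip("\n").split("\t")
    fields[0] = new_seqname                   # column 1: sequence name
    fields[3] = str(int(fields[3]) - offset)  # column 4: start
    fields[4] = str(int(fields[4]) - offset)  # column 5: end
    return "\t".join(fields)


# Toy example: a 100 bp "gene" at contig positions 3001-3100.
contig = "A" * 3000 + "G" * 100 + "T" * 3000
seq, offset = extract_with_flanks(contig, 3001, 3100)
record = "contig_1\tIWGSC\texon\t3001\t3100\t.\t+\t.\tgene_id \"gene_1\";"
print(len(seq), offset)  # 4100 1000
print(shift_gtf_record(record, "gene_1", offset))
```

Because only a constant offset is subtracted, the same shift applies to records on either strand.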
RNA-Seq reads were quality-, adapter- and length-trimmed using Trimmomatic (Bolger et al., 2014), version 0.30,
with a custom list of adapter sequences and the following settings: ‘ILLUMINACLIP:adapters.fa:1:6:6 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:6 MINLEN:60’.
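To make the quality settings concrete, the sketch below applies LEADING:3, TRAILING:3, SLIDINGWINDOW:4:6 and MINLEN:60 to a single read's Phred scores, in the order given (Trimmomatic runs its steps in command-line order). It is a simplified model, not Trimmomatic's code: adapter clipping (ILLUMINACLIP) is left out, and the exact cut position Trimmomatic chooses inside a failing window may differ from the "cut at the window start" rule assumed here. `trim_read` is a hypothetical helper.

```python
def trim_read(quals, leading=3, trailing=3, window=4, window_min=6, min_len=60):
    """Simplified LEADING/TRAILING/SLIDINGWINDOW/MINLEN trimming of one read.

    `quals` is a list of Phred scores. Returns the (start, end) slice of the
    bases kept, or None if the trimmed read is shorter than `min_len`.
    Adapter clipping (ILLUMINACLIP) is not modelled here.
    """
    start, end = 0, len(quals)
    # LEADING:3 -> drop 5' bases with quality below 3
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING:3 -> drop 3' bases with quality below 3
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:4:6 -> scan 4-base windows from the 5' end and cut at the
    # start of the first window whose mean quality is below 6 (simplified)
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) < window_min * window:
            end = i
            break
    # MINLEN:60 -> discard reads shorter than 60 bases after trimming
    if end - start < min_len:
        return None
    return start, end


quals = [2, 2] + [30] * 70 + [5] * 4 + [2, 2]
print(trim_read(quals))      # (2, 72)
print(trim_read([30] * 50))  # None
```

In paired-end mode, a pair in which one mate is dropped by MINLEN leaves the surviving mate unpaired.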
After indexing the reference using Bowtie2 (Langmead and Salzberg, 2012) version 2.2.1,
trimmed reads were aligned to the reference using TopHat (Kim et al., 2013) version 2.2.1, not allowing any mismatches or indels.
Both reads of a pair were required to align (--no-mixed setting), and only concordant pair alignments, with both mates on the same reference sequence, were reported (--no-discordant setting).
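The indexing and alignment steps can be sketched as command lines built in Python (not executed here). The file names are hypothetical. The text does not list the options used to forbid mismatches and indels; the sketch assumes TopHat 2's `--read-mismatches`, `--read-gap-length` and `--read-edit-dist`, all set to 0, so treat those three as an illustration rather than the authors' exact invocation.

```python
import shlex


def alignment_commands(reference_fa, index_base, reads_1, reads_2, out_dir):
    """Build (not run) the Bowtie2 indexing and TopHat alignment commands."""
    build = ["bowtie2-build", reference_fa, index_base]
    align = [
        "tophat",
        "-o", out_dir,
        # Assumed flags for "no mismatches or indels"; the text does not list them.
        "--read-mismatches", "0",
        "--read-gap-length", "0",
        "--read-edit-dist", "0",
        "--no-discordant",  # keep only concordant pair alignments
        "--no-mixed",       # require both mates of a pair to align
        index_base, reads_1, reads_2,
    ]
    return build, align


build, align = alignment_commands(
    "hc_genes_flank2kb.fa", "hc_genes_flank2kb",
    "leaf_Z10_1P.fq.gz", "leaf_Z10_2P.fq.gz", "tophat_leaf_Z10",
)
print(shlex.join(build))
print(shlex.join(align))
```

Keeping each command as a list makes it straightforward to pass to `subprocess.run(cmd, check=True)` without shell quoting issues.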
BAM files for the biological replicates were merged, except for a single tissue/stage combination (spike_Z39) for which only one replicate was available.
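The merging rule (merge replicates, pass single-replicate conditions through unchanged) can be sketched as a small planning function. The text does not name the merging tool; `samtools merge out.bam in1.bam in2.bam ...` is assumed here, and the `<organ>_Z<stage>_rep<N>.bam` naming scheme and `plan_merges` helper are hypothetical.

```python
import os
import re
from collections import defaultdict

# Hypothetical naming scheme: <organ>_Z<stage>_rep<N>.bam, e.g. leaf_Z10_rep1.bam
BAM_NAME = re.compile(r"^(?P<condition>[a-z]+_Z\d+)_rep\d+\.bam$")


def plan_merges(bam_paths):
    """Map each condition to a `samtools merge` command, or None if only one BAM exists."""
    groups = defaultdict(list)
    for path in bam_paths:
        match = BAM_NAME.match(os.path.basename(path))
        if match is None:
            raise ValueError(f"unexpected BAM file name: {path}")
        groups[match.group("condition")].append(path)
    plan = {}
    for condition, paths in sorted(groups.items()):
        if len(paths) > 1:
            plan[condition] = ["samtools", "merge", f"{condition}.merged.bam", *sorted(paths)]
        else:
            plan[condition] = None  # single replicate (e.g. spike_Z39): use the BAM as-is
    return plan


plan = plan_merges(["leaf_Z10_rep1.bam", "leaf_Z10_rep2.bam", "spike_Z39_rep1.bam"])
for condition, cmd in plan.items():
    print(condition, cmd)
```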
Expression was quantified with Cufflinks (Roberts et al., 2011) version 2.1.1, supplying (through the --GTF option)
the adjusted reference transcript annotations provided with the HC gene predictions.
All other settings were left at their defaults, except for --max-multiread-fraction 1, --frag-len-std-dev 50 and --max-intron-length 5000.
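For reference, the Cufflinks settings listed above translate into a command like the one built below (not executed). The BAM, GTF and output-directory names are hypothetical. With `--GTF`, Cufflinks estimates expression for the supplied transcripts only and does not assemble novel ones.

```python
import shlex


def cufflinks_command(merged_bam, adjusted_gtf, out_dir):
    """Build (not run) a Cufflinks call with the non-default settings from the text."""
    return [
        "cufflinks",
        "-o", out_dir,
        "--GTF", adjusted_gtf,             # quantify the supplied transcripts only
        "--max-multiread-fraction", "1",
        "--frag-len-std-dev", "50",
        "--max-intron-length", "5000",
        merged_bam,
    ]


print(shlex.join(cufflinks_command(
    "leaf_Z10.merged.bam", "hc_genes_flank2kb.gtf", "cufflinks_leaf_Z10")))
```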
FPKMs (fragments per kilobase of exon per million fragments mapped) per gene (rather than per isoform) were extracted and aggregated in tabular form.
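The sketch below shows the textbook FPKM definition and one way to collect per-gene values from Cufflinks' `genes.fpkm_tracking` output (tab-separated, with `gene_id` and `FPKM` columns) into a gene × condition table. It is an illustration, not the authors' script: the helper names and toy data are hypothetical, and Cufflinks itself normalises by an effective transcript length that accounts for the fragment-length distribution, so its values differ from the plain formula. The sketch also assumes every condition reports the same genes, which is expected when quantifying against a reference GTF.

```python
import csv
import io


def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments / (exon length in kb * millions of mapped fragments)."""
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


def read_gene_fpkms(tracking_text):
    """Parse a Cufflinks genes.fpkm_tracking table into {gene_id: FPKM}."""
    reader = csv.DictReader(io.StringIO(tracking_text), delimiter="\t")
    return {row["gene_id"]: float(row["FPKM"]) for row in reader}


def aggregate(per_condition):
    """Turn {condition: {gene_id: FPKM}} into rows of a gene x condition table."""
    conditions = sorted(per_condition)
    genes = sorted(set().union(*per_condition.values()))
    rows = [["gene_id", *conditions]]
    for gene in genes:
        # A KeyError here means a gene is missing from one condition's table
        rows.append([gene, *(per_condition[c][gene] for c in conditions)])
    return rows


print(fpkm(500, 2000, 10_000_000))  # 25.0
toy = "tracking_id\tgene_id\tFPKM\ng1\tg1\t1.5\ng2\tg2\t0\n"
table = aggregate({"leaf_Z10": read_gene_fpkms(toy), "root_Z10": read_gene_fpkms(toy)})
for row in table:
    print(*row, sep="\t")
```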