Streptomyces venezuelae transcription start sites
Download zip file of genome and track files for genome browsers
The raw data has been deposited in ArrayExpress under accession
number E-MTAB-10690.
RNA was prepared from four time points (10 h, 14 h, 18 h and 24 h) to
represent the transition through development (vegetative growth,
pre-sporulation, onset of sporulation and mid/late sporulation).
For details of how the cDNA libraries were constructed and sequenced
by Vertis please see the accompanying PDF.
Each time point has the following files associated with it (the "x" in
each file name stands for the time point in hours):
1 | xh_control_for.wig | Normalised coverage by control reads on the forward strand |
2 | xh_control_rev.wig | Normalised coverage by control reads on the reverse strand |
3 | xh_TSS_for.wig | Normalised coverage by TSS reads on the forward strand |
4 | xh_TSS_rev.wig | Normalised coverage by TSS reads on the reverse strand |
5 | xh_for_diff.wig | Result of subtracting 1. from 3. |
6 | xh_rev_diff.wig | Result of subtracting 2. from 4. |
7 | xh_for_lfc.wig | Result of subtracting log2 of 1. from log2 of 3. |
8 | xh_rev_lfc.wig | Result of subtracting log2 of 2. from log2 of 4. |
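As a rough illustration of how files 5 to 8 relate to files 1 to 4, the arithmetic can be sketched in Python. This is a minimal sketch, not the pipeline that produced the files: the function name, the use of plain lists and the pseudocount of 1 (added so that positions with zero coverage have a defined log2) are all assumptions made for this example.

```python
import math

def diff_and_lfc(tss, control, pseudocount=1.0):
    """Return (diff, lfc) for two equal-length per-base coverage lists.

    diff mirrors the *_diff.wig description: TSS coverage minus control coverage.
    lfc mirrors the *_lfc.wig description: log2(TSS) minus log2(control).
    The pseudocount is an ASSUMPTION for this sketch; how zero coverage
    was handled in the real files is not stated here.
    """
    if len(tss) != len(control):
        raise ValueError("coverage lists must have the same length")
    diff = [t - c for t, c in zip(tss, control)]
    lfc = [math.log2(t + pseudocount) - math.log2(c + pseudocount)
           for t, c in zip(tss, control)]
    return diff, lfc

# Made-up coverage values for four adjacent positions.
tss_cov = [0.0, 7.0, 31.0, 3.0]
control_cov = [0.0, 3.0, 1.0, 3.0]
diff, lfc = diff_and_lfc(tss_cov, control_cov)
print(diff)  # [0.0, 4.0, 30.0, 0.0]
print(lfc)   # [0.0, 1.0, 4.0, 0.0]
```

Note how the fourth position has equal TSS and control coverage, so it drops to zero in both derived tracks: those reads are not enriched for triphosphate ends.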
If you want to quickly look at the TSS of your favourite gene, the
best place to start is with the xh_for_diff.wig and xh_rev_diff.wig
files (#5 and #6). These show the reads that come only from 5'
triphosphate ends, i.e. the overall normalised coverage with the reads
from monophosphate ends subtracted. However, if you want to look at
the data in more detail, select the xh_control_for/rev.wig (#1 and #2)
and xh_TSS_for/rev.wig (#3 and #4) files. These allow you to compare
the contribution of RNA ends with monophosphates (resulting from mRNA
degradation and from RNA ends that were not generated by transcription
initiation) with that of ends with triphosphates (genuine TSSs). The
xh_for/rev_lfc.wig files (#7 and #8) correspond to the
xh_for/rev_diff.wig files but are on a log2 scale (log2 of the TSS
coverage minus log2 of the control coverage). This can make it easier
to view transcripts of varying abundance across the genome. However,
you can achieve the same thing with the xh_for/rev_diff.wig files by
changing the scale in IGB (see below).
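If you would rather inspect the tracks outside a genome browser, the .wig files are plain text and can be read with a few lines of Python. The sketch below follows the general WIG conventions (fixedStep and variableStep declarations, 1-based positions, optional span); whether the files in the zip use one layout or the other is an assumption, and the function name parse_wig and the example values are made up for illustration.

```python
def parse_wig(lines):
    """Parse WIG-formatted text lines into a {position: value} dict.

    Positions are 1-based. Chromosome names are ignored, which is only
    reasonable for a single-sequence genome (an assumption of this sketch).
    """
    coverage = {}
    mode = None
    pos = step = span = 1
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments and browser/track header lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        if line.startswith(("fixedStep", "variableStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
            continue
        if mode == "variableStep":
            start_text, value_text = line.split()
            start, value = int(start_text), float(value_text)
        elif mode == "fixedStep":
            start, value = pos, float(line)
            pos += step
        else:
            raise ValueError("data line before a step declaration: " + line)
        # A span > 1 means the value applies to several consecutive bases.
        for p in range(start, start + span):
            coverage[p] = value
    return coverage

# Made-up example text, not taken from the real files.
example = """track type=wiggle_0 name=demo
variableStep chrom=NZ_CP018074.1 span=2
10 1.5
20 4.0
fixedStep chrom=NZ_CP018074.1 start=100 step=1
2.0
3.0
""".splitlines()
cov = parse_wig(example)
print(cov)
```

For a real file you would pass an open file handle instead of the example lines, e.g. `parse_wig(open(path))`, where `path` points to one of the .wig files from the zip.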
Instructions/tips for viewing the data
- The 5' triphosphate end-capture RNA-seq data has been mapped
onto the new vnz genome sequence (NZ_CP018074.1).
- The best way to view the data is with Integrated Genome Browser
(IGB), which is free to download, although other packages are
available.
- To open the reference genome, go to "Open Genome" and select the
appropriate .fna file. Go to "Open file" and select the appropriate
.bed file.
- Zoom into a specific region of the chromosome and then open the
RNAseq data you want to look at via "Open file". You will need to
select each file in turn and set the display to "load Genome"
("Data Access" tab).
- Once loaded, you can toggle on/off whether a given track is
visible. The peak height of each track needs to be adjusted so that
all tracks are on the same scale, i.e. comparable. Go to the "Graph"
tab and choose a max peak height. It may be helpful to decrease this
every so often to identify less abundant transcripts. Likewise, if a
transcript is highly abundant, increasing the max peak height may
allow you to detect differences between the time points throughout
development.
- The sequence from each 5' triphosphate end is approximately 60
bp, so if you see something wider, it means there are two promoters
within 60 bp of each other.
- Because of the way the protocol works (see PDF), the data is
strand-specific. Therefore, if you want to look at transcripts in the
forward direction, use the "for" files. If you want to look at
transcripts in the reverse direction, use the "rev" files.
- You should get nucleotide resolution: the first base on the
upstream side is the first base of the transcript.
- You can change the colours of the tracks in the "Data Access" tab:
select "FG" for each sample and then choose the colour you want.
- If you want to view a specific gene, go to "Advanced Search" and
type (or paste) the locus tag you want to see. Remember to use the vnz
locus tags.
- If you want to visualise the nucleotide sequence in any
particular region, zoom into the region and select "Load Sequence".
The nucleotide sequence should then appear below the genome
coordinates (colour coded by nucleotide).
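
The strand and width rules in the tips above (signal blocks of roughly 60 bp, the upstream-most base being the first base of the transcript, wider blocks hinting at two nearby promoters) can be sketched in Python. This is an illustrative sketch only: the function name call_tss, the threshold of 1.0, the dictionary input and the assumption that coverage is stored as positive values on both strands are all choices made for this example, not part of the released data or protocol.

```python
def call_tss(coverage, strand, threshold=1.0, read_len=60):
    """Report candidate TSS positions from a {position: value} diff track.

    Consecutive positions with value >= threshold are merged into blocks.
    On the forward strand ("+") the upstream end of a block is its lowest
    coordinate; on the reverse strand ("-") it is its highest coordinate.
    Returns a list of (tss, block_start, block_end, wider_than_read) tuples.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    positions = sorted(p for p, v in coverage.items() if v >= threshold)
    blocks = []
    for p in positions:
        if blocks and p == blocks[-1][1] + 1:
            blocks[-1][1] = p          # extend the current block
        else:
            blocks.append([p, p])      # start a new block
    calls = []
    for start, end in blocks:
        tss = start if strand == "+" else end
        # Blocks wider than one read may contain a second, nearby promoter.
        wider = (end - start + 1) > read_len
        calls.append((tss, start, end, wider))
    return calls

# Made-up forward-strand signal: one 60 bp block and one 90 bp block.
forward = {p: 5.0 for p in range(100, 160)}
forward.update({p: 5.0 for p in range(300, 390)})
forward_calls = call_tss(forward, "+")
print(forward_calls)  # [(100, 100, 159, False), (300, 300, 389, True)]

# Made-up reverse-strand signal: the TSS is the highest coordinate.
reverse_calls = call_tss({p: 5.0 for p in range(500, 560)}, "-")
print(reverse_calls)  # [(559, 500, 559, False)]
```

A block flagged as wider than one read still needs to be inspected by eye in the browser; the flag only marks regions where a second promoter may be present.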
Matt Bush
June 2021