vnzTSS

Streptomyces venezuelae transcription start sites

Download zip file of genome and track files for genome browsers

The raw data has been deposited in ArrayExpress under accession number E-MTAB-10690.

RNA was prepared at four time points (10 hr, 14 hr, 18 hr and 24 hr) to represent the transition through development (vegetative growth, pre-sporulation, onset of sporulation and mid/late sporulation). For details of how Vertis constructed and sequenced the cDNA libraries, please see the accompanying PDF.

Each time-point has the following files associated with it:

1	xh_control_for.wig	Normalised coverage by control reads on the forward strand
2	xh_control_rev.wig	Normalised coverage by control reads on the reverse strand
3	xh_TSS_for.wig	Normalised coverage by TSS reads on the forward strand
4	xh_TSS_rev.wig	Normalised coverage by TSS reads on the reverse strand
5	xh_for_diff.wig	File 3 minus file 1 (TSS coverage minus control coverage, forward strand)
6	xh_rev_diff.wig	File 4 minus file 2 (TSS coverage minus control coverage, reverse strand)
7	xh_for_lfc.wig	log2 of file 3 minus log2 of file 1 (forward strand)
8	xh_rev_lfc.wig	log2 of file 4 minus log2 of file 2 (reverse strand)
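As a rough illustration of how files #5 to #8 relate to files #1 to #4, the sketch below computes a difference track and a log2 track from two small per-base coverage tables. It is not the pipeline used to produce the files. The function names, the example values and the pseudocount are all assumptions made for illustration; in particular, this README does not say how positions with zero coverage were handled before taking log2.

```python
import math

def diff_track(tss, control):
    """Per-base TSS coverage minus control coverage (the idea behind files #5/#6)."""
    positions = sorted(set(tss) | set(control))
    return {p: tss.get(p, 0.0) - control.get(p, 0.0) for p in positions}

def lfc_track(tss, control, pseudocount=1.0):
    """Per-base log2(TSS) minus log2(control) (the idea behind files #7/#8).

    The pseudocount is an assumption added here so that log2(0) is never
    evaluated; the real files may treat zero coverage differently.
    """
    positions = sorted(set(tss) | set(control))
    return {
        p: math.log2(tss.get(p, 0.0) + pseudocount)
           - math.log2(control.get(p, 0.0) + pseudocount)
        for p in positions
    }

# Hypothetical normalised coverage, keyed by genome coordinate.
tss = {100: 30.0, 101: 28.0, 102: 3.0}
control = {100: 2.0, 101: 4.0, 103: 1.0}

diff = diff_track(tss, control)
lfc = lfc_track(tss, control)
```

A position covered mainly by TSS reads gets a large positive value in both tracks, while a position covered mainly by control reads goes negative. The log2 track compresses the range, which is why it can be easier to read when transcript abundance varies widely.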

If you want a quick look at the TSS of your favourite gene, the best place to start is with the xh_for_diff.wig and xh_rev_diff.wig files (#5 and #6). These show the reads that come only from 5' triphosphate ends, i.e. with the reads from monophosphate ends subtracted from the overall normalised coverage.

If you want to look at the data in more detail, select the xh_control_for/rev.wig (#1 and #2) and xh_TSS_for/rev.wig (#3 and #4) files. These let you view the contribution of RNA ends with monophosphates (resulting from mRNA degradation and RNAs that are not transcribed) versus those with triphosphates (genuine TSSs).

The xh_for/rev_lfc.wig files (#7 and #8) make the same comparison as the xh_for/rev_diff.wig files but on a log2 scale, which can make it easier to view transcripts of varying abundance across the genome. You can achieve a similar effect with the xh_for/rev_diff.wig files by changing the scale in IGB (see below).

Instructions/tips for viewing the data

The 5' triphosphate end-capture RNA-seq data has been mapped onto the new vnz genome sequence (NZ_CP018074.1).
The best way to view the data is with Integrated Genome Browser (IGB), which is free to download, although other packages are available.
To open the reference genome, go to "Open Genome" and select the appropriate .fna file. Go to "Open file" and select the appropriate .bed file.
Zoom into a specific region of the chromosome and then open the RNA-seq data you want to look at via "Open file". You will need to select each file in turn and set the display to "load Genome" ("Data Access" tab).
Once loaded, you can toggle whether a given track is visible. Adjust the peak height of each data file so that all tracks use the same scale and are therefore comparable: go to the "Graph" tab and choose a max peak height. It may be helpful to decrease this every so often to identify less abundant transcripts. Likewise, if a transcript is highly abundant, increasing the max peak height may allow you to detect differences between the time points throughout development.
The sequence from each 5' triphosphate end is approximately 60 bp, so if you see something wider, it means there are two promoters within 60 bp of each other.
Because of the way the protocol works (see PDF), the data is strand-specific. If you want to look at transcripts in the forward direction, use the "for" files. If you want to look at transcripts in the reverse direction, use the "rev" files.
You should get nucleotide resolution: the first base on the upstream side of a peak is the first base of the transcript.
You can change the colours of the tracks in the "Data Access" tab: select "FG" for each sample and then choose the colour you want.
If you want to view a specific gene, go to "Advanced Search" and type (or paste) the locus tag you want to see. Remember to use the vnz locus tags.
If you want to visualise the nucleotide sequence in any particular region, zoom into the region and select "Load Sequence". The nucleotide sequence should then appear below the genome coordinates (colour coded by nucleotide).
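The tips on strand-specificity and nucleotide resolution can also be applied outside a genome browser. The sketch below is a minimal illustration, not part of the released analysis: it groups consecutive positions with coverage at or above a cutoff into peaks and reports the upstream-most base of each one. The function name, the cutoff of 10 and the example coverage are hypothetical. It also assumes that "upstream" means the lowest coordinate for a "for" track and the highest coordinate for a "rev" track, and that coverage values are positive.

```python
def peak_tss_positions(coverage, strand, threshold=10.0):
    """Return the upstream-most base of each contiguous covered run.

    coverage  : dict mapping genome coordinate -> coverage value
    strand    : "for" or "rev", matching the track file naming
    threshold : hypothetical cutoff; choose one that suits your data
    """
    if strand not in ("for", "rev"):
        raise ValueError("strand must be 'for' or 'rev'")

    # Keep only positions at or above the cutoff, in coordinate order.
    kept = sorted(p for p, v in coverage.items() if v >= threshold)

    # Merge consecutive coordinates into [start, end] runs.
    runs = []
    for p in kept:
        if runs and p == runs[-1][1] + 1:
            runs[-1][1] = p
        else:
            runs.append([p, p])

    # Upstream end: lowest coordinate on "for", highest on "rev".
    return [start if strand == "for" else end for start, end in runs]

# Two hypothetical 60 bp peaks on the same coordinates.
example = {p: 50.0 for p in range(1000, 1060)}
example.update({p: 40.0 for p in range(2000, 2060)})

forward_tss = peak_tss_positions(example, "for")
reverse_tss = peak_tss_positions(example, "rev")
```

A run much wider than about 60 bp would still be reported as a single peak here, so closely spaced promoters need inspecting by eye or with a finer method.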

Matt Bush
June 2021