polyadq instructions

Sequences

To submit sequences to polyadq, paste them into the text box on the form. The sequences must be in FASTA format, e.g.:

>sequence header line
CTTTATCTGAAACTTGATTGTCTTAAATGTATTTGTGGAGAAATAAAATTATTGTATATT
TTGTGTAACAGAATCAGTGAGAATAAGCTGGTCGCAAACTGTCTTGCCTAGAGAGAGGGC
GCCTCCCAAAGTGCTGAGATTACAGGCGTGAGTCACTGCACCCAGCCTTGG
>another sequence
TTTGAGACGGAGTCTCACTCTGCCGCCCAGGCTGGAGTGCAGTGGAACGATCTCAGCTCA
CTGCAACCTCCACTTCCTGGGTTCAAGCCATTCTCCTGCCTCAGCCTCCCTAGTAGCTGG
GATTACAGGCGCCCACCACCACACCTGGCTAATTTTTGTGTGTTTTTGGTAGAGACAGGG
etc...
While the header line may be any length, polyadq will only print the first 80 characters as an identifier. The sequence lines may be any length. Whitespace and non-alphabetic characters in the sequence are ignored. The sequence may consist of A, C, G, T, and U characters. Other letters are retained as placeholders, but are disregarded by the various scoring functions. There may be multiple sequences in the sequence file.
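
The input rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described, not polyadq's own parser. The function name `parse_fasta` is hypothetical, and two details are assumptions: that the 80-character limit is counted after the leading ">", and that sequence letters are upper-cased.

```python
def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Assumption: keep at most 80 characters after the ">" marker.
            header, chunks = line[1:].strip()[:80], []
        elif header is not None:
            # Whitespace, digits, and punctuation are ignored; letters other
            # than A, C, G, T, U are kept as placeholders.
            letters = "".join(ch for ch in line if ch.isalpha())
            chunks.append(letters.upper())  # upper-casing is an assumption
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Calling `parse_fasta` on the pasted text gives one pair per sequence, so multiple sequences in one submission are handled in a single pass.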

Due to circumstances beyond my control, uploading sequence files via multipart/form-data encoding is not available; sequences must be pasted into the text box.

Options

polyadq decides whether a given AATAAA or ATTAAA hexamer is a true polyA signal by comparing the hexamer's QDF (quadratic discriminant function) score to a cutoff value. Hexamers scoring above the cutoff are reported as true signals. There are separate cutoff scores for AATAAA and ATTAAA hexamers.
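
The decision rule can be sketched as follows. This shows only the hexamer scan and the cutoff comparison; the QDF scoring itself is not reproduced here. The names `find_hexamers`, `classify`, and `CUTOFFS` are hypothetical, the cutoff values are placeholders rather than polyadq's real defaults, and the 1-based positions and the U-to-T conversion are assumptions.

```python
import re

# Placeholder cutoffs for illustration only; NOT polyadq's real defaults.
CUTOFFS = {"AATAAA": 0.5, "ATTAAA": 0.5}

def find_hexamers(seq):
    """Yield (position, hexamer) for each AATAAA or ATTAAA in seq."""
    seq = seq.upper().replace("U", "T")  # assumption: treat U as T
    # A lookahead is used so that overlapping hexamers are all reported.
    for match in re.finditer(r"(?=(AATAAA|ATTAAA))", seq):
        yield match.start() + 1, match.group(1)  # assumption: 1-based

def classify(hexamer, score, cutoffs=CUTOFFS):
    """Return "POS" if the score is above that hexamer's own cutoff."""
    return "POS" if score > cutoffs[hexamer] else "neg"
```

Keeping the cutoffs in a dictionary keyed by hexamer mirrors the point that each hexamer type has its own cutoff.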

The form provides four methods for setting polyadq's cutoff scores:

Default: By default, polyadq's cutoffs are set to the levels that have given the best performance in our tests. In those tests, these cutoffs gave about 64% sensitivity, 83% specificity, and a correlation coefficient of 0.512. Your mileage may vary.

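Set sensitivity level: polyadq's cutoffs will be set to the levels that (in our tests) give the best specificity at the approximate sensitivity level that you specify.

Set specificity level: The reverse: polyadq's cutoffs will be set to the levels that (in our tests) give the best sensitivity at the approximate specificity level that you specify.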

Set cutoffs: This option lets you set the cutoff levels directly. Not too useful.

The performance statistics used here are only estimates, based on our tests of the program! These tests were performed on a set of (hopefully representative) known polyA signals. Obviously, pathological data sets could be constructed that cause polyadq to perform much worse (or much better). Caveat user.
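
For readers who want to estimate the same kind of statistics on their own data, here is a minimal sketch. This page does not define the statistics, so the formulas are assumptions: sensitivity as TP/(TP+FN), the correlation coefficient as the Matthews correlation, and specificity in two common forms, since some of the gene-finding literature uses TP/(TP+FP) under that name. The function name `performance` is hypothetical.

```python
import math

def performance(tp, fp, tn, fn):
    """Summary statistics from counts of true/false positives/negatives."""
    sensitivity = tp / (tp + fn)
    specificity_tn = tn / (tn + fp)   # one common definition
    specificity_ppv = tp / (tp + fp)  # alternative used in some papers
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Matthews correlation; defined as 0 when a margin is empty.
    cc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity_tn": specificity_tn,
        "specificity_ppv": specificity_ppv,
        "cc": cc,
    }
```

The counts come from comparing polyadq's POS/neg calls against a set of hexamers whose true status is known.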

Output

The output from polyadq looks like this:
Sequence header:
(sequence identifier)

Prediction  Site   Sequence  Score
neg         14426  ATTAAA    0.093228
neg         15153  AATAAA    0.150803
POS         16335  AATAAA    0.381718
POS         17322  AATAAA    0.669579

2 sites found out of 4 considered


For each AATAAA or ATTAAA found in a sequence, a prediction is reported. The prediction line starts with POS or neg, depending respectively on whether polyadq thinks it has found a true polyA signal or not. The position of the signal and its sequence are then listed, followed by its score. Note that the scores assigned to AATAAA and ATTAAA signals are produced by different discriminant functions, and therefore are NOT comparable. So, while you can use the assigned scores to help decide which of two positive AATAAA calls is "better", you can't use them to choose between a positive AATAAA call and a positive ATTAAA call. As a rough rule of thumb (but don't quote me on that), choose the AATAAA signal unless it's really weak and the ATTAAA is really strong.

After the individual site predictions, polyadq reports the number of sites it has found and the number considered in a sequence. At the bottom of the output, polyadq reports the total number of sites found and considered in the whole input sequence file.
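
If you want to post-process the report, the prediction lines are easy to pick out. The sketch below assumes they are whitespace-separated exactly as in the sample output above. The names `parse_sites` and `best_by_hexamer` are hypothetical. Since AATAAA and ATTAAA scores are not comparable, the second function ranks positive calls only within each hexamer type.

```python
import re

# One prediction line: POS/neg, position, hexamer, score.
SITE_LINE = re.compile(r"^(POS|neg)\s+(\d+)\s+(AATAAA|ATTAAA)\s+([0-9.]+)$")

def parse_sites(report):
    """Return one dict per prediction line found in the report text."""
    sites = []
    for line in report.splitlines():
        match = SITE_LINE.match(line.strip())
        if match:
            call, position, hexamer, score = match.groups()
            sites.append({
                "positive": call == "POS",
                "position": int(position),
                "hexamer": hexamer,
                "score": float(score),
            })
    return sites

def best_by_hexamer(sites):
    """Highest-scoring positive call for each hexamer type separately."""
    best = {}
    for site in sites:
        if not site["positive"]:
            continue
        current = best.get(site["hexamer"])
        if current is None or site["score"] > current["score"]:
            best[site["hexamer"]] = site
    return best
```

Header lines, blank lines, and the "sites found" summary do not match the pattern and are simply skipped.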

Reference

J. E. Tabaska and M. Q. Zhang (1999). Detection of polyadenylation signals in human DNA sequences. Gene 231: 77-86.



Jack Tabaska
Last modified: Tue Aug 24 13:04:13 EDT