>sequence header line
CTTTATCTGAAACTTGATTGTCTTAAATGTATTTGTGGAGAAATAAAATTATTGTATATT
TTGTGTAACAGAATCAGTGAGAATAAGCTGGTCGCAAACTGTCTTGCCTAGAGAGAGGGC
GCCTCCCAAAGTGCTGAGATTACAGGCGTGAGTCACTGCACCCAGCCTTGG
>another sequence
TTTGAGACGGAGTCTCACTCTGCCGCCCAGGCTGGAGTGCAGTGGAACGATCTCAGCTCA
CTGCAACCTCCACTTCCTGGGTTCAAGCCATTCTCCTGCCTCAGCCTCCCTAGTAGCTGG
GATTACAGGCGCCCACCACCACACCTGGCTAATTTTTGTGTGTTTTTGGTAGAGACAGGG
etc...
While the header line may be any length, polyadq will only print the
first 80 characters as an identifier. The sequence lines may be any
length. Whitespace and non-alphabetic characters in the sequence are
ignored. The sequence may consist of A, C, G, T, and U characters.
Other letters are retained as placeholders, but are disregarded by the
various scoring functions. There may be multiple sequences in the
sequence file.
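The input rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not polyadq's own reader. The function name read_fasta is hypothetical, and two details are assumptions: the 80 characters are counted after the leading ">", and any text before the first header line is skipped.

```python
def read_fasta(lines):
    """Yield (identifier, sequence) pairs from an iterable of text lines."""
    ident, chunks = None, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if ident is not None:
                yield ident, "".join(chunks)
            # Assumption: the 80-character limit is counted after the ">".
            ident, chunks = line[1:81], []
        elif ident is not None:
            # Drop whitespace and non-alphabetic characters; every letter is
            # kept, including placeholders other than A, C, G, T and U.
            chunks.append("".join(c for c in line if c.isascii() and c.isalpha()))
    if ident is not None:
        yield ident, "".join(chunks)


if __name__ == "__main__":
    demo = [">sequence header line", "CTTTAT CTGAAA 12", ">another sequence", "TTTGAG-NNN"]
    for name, seq in read_fasta(demo):
        print(name, seq)
```

Placeholder letters stay in the sequence, so the positions of later bases are not shifted.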
Due to circumstances beyond my control, uploading files via
multipart/form-data encoding is not available.
Options
polyadq decides whether a given AATAAA or ATTAAA hexamer is a true
polyA signal by comparing the hexamer's QDF score to a cutoff value.
Hexamers scoring above the cutoff are reported as true signals. There
are separate cutoff scores for AATAAA and ATTAAA hexamers.
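As a minimal sketch of this decision rule: the score is compared only to the cutoff for its own hexamer type. The function name call_signal and the cutoff values below are made up for illustration; they are NOT polyadq's defaults, and the QDF scoring itself is not shown.

```python
# Hypothetical cutoffs, for illustration only (not polyadq's real defaults).
EXAMPLE_CUTOFFS = {"AATAAA": 0.2, "ATTAAA": 0.3}


def call_signal(hexamer, score, cutoffs=EXAMPLE_CUTOFFS):
    """Return "POS" if the score is above the cutoff for this hexamer, else "neg"."""
    if hexamer not in cutoffs:
        raise ValueError("only AATAAA and ATTAAA hexamers are scored")
    return "POS" if score > cutoffs[hexamer] else "neg"


if __name__ == "__main__":
    # The same score can be a positive call for one hexamer type and a
    # negative call for the other, because each type has its own cutoff.
    print(call_signal("AATAAA", 0.25))
    print(call_signal("ATTAAA", 0.25))
```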
The form provides four methods for setting polyadq's cutoff scores:
Default: By default, polyadq's cutoffs are set to the levels that have given the best performance in our tests. These are good for about 64% sensitivity, 83% specificity, and a correlation coefficient of 0.512. Your mileage may vary.
The performance statistics used here are only estimates, based on our tests of the program! These tests were performed on a set of (hopefully representative) known polyA signals. Obviously, pathological data sets could be constructed that cause polyadq to perform much worse (or much better). Caveat user.
Set sensitivity level: polyadq's cutoffs will be set to the levels that (in our tests) give the best specificity at the approximate sensitivity level that you specify...
Set specificity level: ...or vice versa.
Set cutoffs: This option lets you set the cutoff levels directly. Not too useful.
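To make the sensitivity/specificity trade-off concrete, here is a rough sketch of how a cutoff could be picked for a target sensitivity from a set of scored test hexamers. This is not how polyadq derives its own tables. The function names and the example scores are invented, and it assumes sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), which may not be the definitions behind the figures quoted above.

```python
def rates(true_scores, false_scores, cutoff):
    """Return (sensitivity, specificity) when scores above cutoff are called POS."""
    tp = sum(s > cutoff for s in true_scores)
    tn = sum(s <= cutoff for s in false_scores)
    return tp / len(true_scores), tn / len(false_scores)


def cutoff_for_sensitivity(true_scores, false_scores, target):
    """Pick the highest candidate cutoff whose sensitivity is still >= target.

    Raising the cutoff can only remove positive calls, so the highest cutoff
    that meets the target is also the one with the best specificity.
    """
    candidates = [float("-inf")] + sorted(set(true_scores) | set(false_scores))
    best = candidates[0]
    for cutoff in candidates:
        sensitivity, _ = rates(true_scores, false_scores, cutoff)
        if sensitivity >= target:
            best = cutoff
    return (best,) + rates(true_scores, false_scores, best)


if __name__ == "__main__":
    # Invented scores for known true and known false hexamers of one type.
    known_true = [0.9, 0.8, 0.6, 0.4, 0.2]
    known_false = [0.7, 0.5, 0.3, 0.1, 0.05]
    print(cutoff_for_sensitivity(known_true, known_false, 0.6))
```

The "Set specificity level" option is the mirror image: fix the specificity and take the best sensitivity available at that level.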
Sequence header: (sequence identifier)
Prediction  Site   Sequence  Score
neg         14426  ATTAAA    0.093228
neg         15153  AATAAA    0.150803
POS         16335  AATAAA    0.381718
POS         17322  AATAAA    0.669579
2 sites found out of 4 considered
Each prediction is labeled POS or neg, depending respectively on
whether polyadq thinks it has found a true polyA signal or not. The
position of the signal and its sequence are then listed, followed by
its score. Note that the scores assigned to
AATAAA and ATTAAA signals are produced by different discriminant
functions, and therefore are NOT comparable. So, while you can
use the assigned scores to help decide which of two positive AATAAA
calls is "better", you can't use them to choose between a positive
AATAAA call and a positive ATTAAA call (but as a rule of thumb, choose
the AATAAA signal unless it's really weak and the ATTAAA is really
strong) (but don't quote me on that).
After the individual site predictions, polyadq reports the number of sites it has found and the number considered in a sequence. At the bottom of the output, polyadq reports the total number of sites found and considered in the whole input sequence file.
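If you want to post-process the output, something along these lines may help. It is a sketch, not part of polyadq. It assumes each prediction sits on its own line as four whitespace-separated fields (prediction, site, sequence, score), and the function names are hypothetical. Positive calls are ranked separately for each hexamer type, since AATAAA and ATTAAA scores are not comparable.

```python
def parse_predictions(text):
    """Return (call, site, hexamer, score) tuples for each prediction line."""
    sites = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the column headings, summary lines and anything else.
        if len(fields) == 4 and fields[0] in ("POS", "neg"):
            call, site, hexamer, score = fields
            sites.append((call, int(site), hexamer, float(score)))
    return sites


def best_positive_by_hexamer(sites):
    """Return {hexamer: (site, score)} for the top-scoring POS call of each type."""
    best = {}
    for call, site, hexamer, score in sites:
        if call == "POS" and (hexamer not in best or score > best[hexamer][1]):
            best[hexamer] = (site, score)
    return best


SAMPLE = """Prediction Site Sequence Score
neg 14426 ATTAAA 0.093228
neg 15153 AATAAA 0.150803
POS 16335 AATAAA 0.381718
POS 17322 AATAAA 0.669579
2 sites found out of 4 considered"""

if __name__ == "__main__":
    parsed = parse_predictions(SAMPLE)
    found = sum(1 for call, _, _, _ in parsed if call == "POS")
    print(found, "sites found out of", len(parsed), "considered")
    print(best_positive_by_hexamer(parsed))
```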