MZEF (Michael Zhang's Exon Finder) is an internal coding exon
prediction program. It starts with a potential exon (AG+ORF+GT; currently
minimum ORF size = 18 bp (9 bp for Arabidopsis) and maximum ORF size = 999 bp
(2000 bp for Arabidopsis)), measures 9 (10 for Arabidopsis) discriminant variables,
and then calculates its posterior exon probability. If the probability P > 1/2,
it will be output as a predicted exon. The output contains the following 7 fields:
Coordinates -- exon boundaries (in bp)
P -- Posterior probability (between 0.5 and 1)
Fri -- Frame preference score for the ith frame of the genomic sequence
Orf -- ORF indicator; "011" (or "211") means the 2nd and 3rd frames are open
3ss -- Acceptor score
Cds -- Coding preference score
5ss -- Donor score
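The candidate definition above (AG + ORF + GT) can be illustrated with a
minimal Python sketch. This is not MZEF's own code: the function names are
hypothetical, "ORF size" is assumed to mean the distance between the AG and
the GT, frames are counted relative to the candidate rather than to the
genomic sequence, and the "2" that can appear in the Orf field is not modelled.

```python
STOPS = {"TAA", "TAG", "TGA"}

def open_frames(segment):
    """Return the frames (0, 1, 2) of `segment` that contain no stop codon."""
    frames = []
    for f in range(3):
        codons = (segment[i:i + 3] for i in range(f, len(segment) - 2, 3))
        if not any(c in STOPS for c in codons):
            frames.append(f)
    return frames

def orf_indicator(frames):
    """Write open frames as a 0/1 string, e.g. [1, 2] -> '011' (assumed encoding)."""
    return "".join("1" if f in frames else "0" for f in range(3))

def candidate_exons(seq, min_len=18, max_len=999):
    """Yield (start, end, frames) for every AG + ORF + GT candidate.

    `start` is the 0-based index just after an AG, `end` is the index of a GT
    (exclusive exon end). Defaults are the non-Arabidopsis limits quoted above.
    """
    seq = seq.upper()
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    for start in starts:
        for end in ends:  # ends are in ascending order
            length = end - start
            if length < min_len:
                continue
            if length > max_len:
                break
            frames = open_frames(seq[start:end])
            if frames:  # at least one frame must be open
                yield start, end, frames
```

The nested loop is quadratic and only meant to show the definition; each
candidate found this way would still have to be scored and pass P > 1/2
before it is reported.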
To run the program, type mzef followed by a return. You will be
asked to enter 4 parameters:
1. the name of the sequence file ('my.seq' for example). The input sequence should
be in FASTA format, maximum size = 200 kb.
2. strand (1 for forward, 2 for reverse).
3. the prior probability (.04 for example), which depends on the gene density
and GC content of the locus (.08 for a high gene-density region).
4. overlapping number (0 for example); 0 means "no overlap", 1 means "at most
1 overlap".
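The role of the prior probability in item 3 can be seen from a generic
Bayes-rule calculation. The sketch below is only an illustration, not MZEF's
internal computation: `likelihood_ratio` is a hypothetical stand-in for how
much better the discriminant variables fit an exon than a non-exon.

```python
def posterior(prior, likelihood_ratio):
    """Posterior exon probability from a prior and a likelihood ratio.

    likelihood_ratio = p(data | exon) / p(data | non-exon)  (hypothetical input)
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)

def min_ratio_for_call(prior, threshold=0.5):
    """Smallest likelihood ratio at which the posterior reaches `threshold`."""
    return (threshold / (1.0 - threshold)) * (1.0 - prior) / prior

if __name__ == "__main__":
    for prior in (0.04, 0.08):
        print(prior, round(min_ratio_for_call(prior), 2))
```

Under this reading, raising the prior from .04 to .08 roughly halves the
evidence needed to reach P > 1/2, so more candidates are reported in
gene-dense regions.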
Output goes to the standard output which may be redirected to a file.
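For batch use, the four answers can be supplied on standard input and the
output redirected to a file. The sketch below assumes that mzef reads its
answers from standard input, one per line, in the order listed above, and
that "200 kb" means 200,000 bases; the helper names are hypothetical.

```python
import shutil
import subprocess

MAX_BASES = 200_000  # assumed meaning of the 200 kb input limit

def fasta_length(path):
    """Count sequence characters in a FASTA file, skipping '>' header lines."""
    with open(path) as handle:
        return sum(len(line.strip()) for line in handle
                   if not line.startswith(">"))

def mzef_answers(seq_file, strand=1, prior=0.04, overlap=0):
    """Build the text typed at the four prompts (file, strand, prior, overlap)."""
    if strand not in (1, 2):
        raise ValueError("strand must be 1 (forward) or 2 (reverse)")
    if not 0 < prior < 1:
        raise ValueError("prior must be between 0 and 1")
    return f"{seq_file}\n{strand}\n{prior}\n{overlap}\n"

def run_mzef(seq_file, out_file, exe="mzef", **answers):
    """Run mzef non-interactively and write its standard output to out_file."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} not found on PATH")
    if fasta_length(seq_file) > MAX_BASES:
        raise ValueError("sequence exceeds the 200 kb limit")
    with open(out_file, "w") as out:
        subprocess.run([exe], input=mzef_answers(seq_file, **answers),
                       stdout=out, text=True, check=True)
```

For example, `run_mzef("my.seq", "my.out", strand=2, prior=0.08)` would
analyse the reverse strand with the high gene-density prior, provided the
executable behaves as assumed.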
mzef is compiled for SunOS 5.5 and mzef.osf is compiled for DEC Alpha
OSF/1 3.2c. In order to allow bench-scientists to have the tool as soon as
possible, I am giving it out before my paper -- "Identification of Protein
Coding Regions in the Human Genome Based on Quadratic Discriminant Analysis"
is published (see reference below). If you have problems or comments, please
contact the author Michael Zhang (mzhang@cshl.org).
After downloading by ftp, all the .dat files should be in ~/MZEF/ (or in a subdirectory defined by
the environment variable MZEFDATA if you have the newer version mzef_new),
and the executable may be in ~/bin/.
-Michael Zhang
May 30, 1996; Nov. 13, 1996; Apr. 9, 1998.
Cold Spring Harbor Laboratory
* MZEF (c) 1997 Cold Spring Harbor Laboratory is available for free to
non-profit institutions using it for non-commercial purposes. Commercial
users may obtain a site license for a fee. For information about obtaining
a commercial use license contact Dr. Carol Dempster, email: dempster@cshl.org,
Phone: 516-367-6885 or Fax: 516-367-8855. ALL RIGHTS RESERVED.
REFERENCE: Identification of Protein Coding Regions in the Human Genome
Based on Quadratic Discriminant Analysis (M.Q.Zhang, PNAS, 94:565-568, 1997).
Click paper for an online version.