GitHub - richarddurbin/hexamer: find likely coding segments in DNA using composition-normalised hexamer tables

Files:

	.gitignore
	AH6.dna
	LICENSE
	Makefile
	README
	hexamer.c
	hextable.c
	readseq.c
	readseq.h
	worm.coding

hexamer and hextable
--------------------

hextable makes files of statistics that hexamer uses to scan for
likely coding regions. The principle is to use 6mers, but to avoid
deriving any information from base composition.  I therefore normalise
the frequencies of each 6mer by dividing by the total frequency of all
6mers with the same base composition.
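
The normalisation described above can be illustrated with a minimal
sketch.  It is not taken from hextable.c: the 2-bit base encoding
(a=0, c=1, g=2, t=3) and the names comp_key and normalise_hexamers
are assumptions made for this example.  Each 6mer count is divided by
the summed count of all 6mers that contain the same number of each
base.

```c
#include <assert.h>

#define NHEX  4096   /* 4^6 possible 6mers */
#define NCOMP 2401   /* 7^4: upper bound on composition keys */

/* Composition key of a 6mer index (assumed encoding: 2 bits per base,
   a=0 c=1 g=2 t=3).  The four base counts are packed base-7, so two
   6mers share a key exactly when they have the same base composition. */
static int comp_key(int hex)
{
    int n[4] = {0, 0, 0, 0};
    assert(hex >= 0 && hex < NHEX);
    for (int i = 0; i < 6; i++) {
        n[hex & 3]++;
        hex >>= 2;
    }
    return ((n[0] * 7 + n[1]) * 7 + n[2]) * 7 + n[3];
}

/* norm[h] = count[h] / (sum of count over 6mers with h's composition).
   Classes with a zero total are given 0.0. */
void normalise_hexamers(const double count[NHEX], double norm[NHEX])
{
    double total[NCOMP] = {0.0};

    for (int h = 0; h < NHEX; h++)
        total[comp_key(h)] += count[h];

    for (int h = 0; h < NHEX; h++) {
        double t = total[comp_key(h)];
        norm[h] = (t > 0.0) ? count[h] / t : 0.0;
    }
}
```

With this scheme a 6mer such as aaaaaa, which is alone in its
composition class, always normalises to 1, so it carries no
information.  That is the intended effect of removing base composition.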

The input of hextable is a fasta file of coding sequences in frame.
The -o file output is an ascii list of 4096 floating point numbers
giving log likelihood ratio scores in bits.  The output on stdout is a
summary of the information content of the table, indicating how
discriminative it is likely to be.  The output of hexamer is the maximal
scoring segments of its input with score greater than or equal to the
threshold T (set with -T), in GFF format
(http://www.sanger.ac.uk/Users/rd/gff.html).
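
As an illustration of reporting segments whose score reaches T, here
is a minimal sketch.  It is not the algorithm from hexamer.c, which
this README does not describe.  It uses a simple running-sum
heuristic: accumulate scores, remember the peak, and close the segment
at the peak whenever the sum falls to zero or below.  Scores after the
peak are dropped rather than rescanned.  The Segment type and
find_segments are hypothetical names, and the input is assumed to be
one score per position.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t start;   /* first position of the segment */
    size_t end;     /* last position of the segment (inclusive) */
    double score;   /* summed score from start to end */
} Segment;

/* Scan score[0..n-1] and store up to max_out segments whose peak
   cumulative score is >= T.  Returns the number of segments stored. */
size_t find_segments(const double *score, size_t n, double T,
                     Segment *out, size_t max_out)
{
    size_t found = 0, start = 0, best_end = 0;
    double sum = 0.0, best = 0.0;

    assert(out != NULL || max_out == 0);

    for (size_t i = 0; i <= n; i++) {
        if (i < n) {
            sum += score[i];
            if (sum > best) {          /* new peak for this run */
                best = sum;
                best_end = i;
            }
        }
        if (i == n || sum <= 0.0) {    /* run ends: report and reset */
            if (best > 0.0 && best >= T && found < max_out) {
                out[found].start = start;
                out[found].end = best_end;
                out[found].score = best;
                found++;
            }
            sum = 0.0;
            best = 0.0;
            start = i + 1;
        }
    }
    return found;
}
```

Raising T reports fewer, higher-confidence segments.  Lowering it
reports more segments at the cost of more false positives.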

Type "make" to build the programs, and "make clean" to remove them.

Example usage:

	hextable -o worm.hex worm.coding
	hexamer -T 20 worm.hex AH6.dna
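
The table file written by the first command is described above as an
ascii list of 4096 floating point numbers.  A minimal sketch of
loading such a file follows.  The function name read_hextable is
hypothetical, the values are assumed to be separated by whitespace,
and the order in which the 4096 entries map to 6mers is not stated in
this README, so the sketch does not assume one.

```c
#include <assert.h>
#include <stdio.h>

#define NHEX 4096   /* one score per possible 6mer */

/* Read NHEX whitespace-separated numbers from fp into table.
   Returns 0 on success, -1 if the file holds fewer than NHEX values
   or contains something that is not a number. */
int read_hextable(FILE *fp, double table[NHEX])
{
    assert(fp != NULL);
    for (int i = 0; i < NHEX; i++) {
        if (fscanf(fp, "%lf", &table[i]) != 1)
            return -1;
    }
    return 0;
}
```

A caller would open the table with fopen("worm.hex", "r"), check the
result for NULL, and pass the stream to read_hextable.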

NB these programs assume that every base is a, c, g or t.  Any n found
in a sequence is converted to c.
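
The n-to-c rule can be sketched as follows.  This is an illustration
rather than the code in readseq.c: the 2-bit encoding (a=0, c=1, g=2,
t=3), the acceptance of upper-case letters, the choice to put the
first base in the most significant bits, and the names base_code and
hexamer_index are all assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Map a base character to 0..3; n is treated as c.
   Returns -1 for any other character. */
int base_code(char ch)
{
    switch (tolower((unsigned char)ch)) {
    case 'a': return 0;
    case 'c': return 1;
    case 'n': return 1;   /* n is converted to c */
    case 'g': return 2;
    case 't': return 3;
    default:  return -1;
    }
}

/* Pack the first six bases of s into an index in 0..4095, first base
   in the most significant bits.  Returns -1 if s is shorter than six
   characters or contains a character that is not a, c, g, t or n. */
int hexamer_index(const char *s)
{
    int index = 0;
    assert(s != NULL);
    for (int i = 0; i < 6; i++) {
        int code = (s[i] == '\0') ? -1 : base_code(s[i]);
        if (code < 0)
            return -1;
        index = (index << 2) | code;
    }
    return index;
}
```

Under this sketch, naaaaa and caaaaa map to the same index, so runs of
n in the input are scored as if they were runs of c.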

Richard Durbin ([email protected]) 9/95-4/98

PS 30/3/99 The original version of hexamer had some initialisation
bugs, which have been fixed today.