Capstone Project

M.H.S. in Bioinformatics

Jichao Chen



Abstract

       Precise spatial and temporal regulation of gene expression is achieved by the binding of transcription factors to their cognate binding sites (motifs) in gene promoters. Large-scale expression profiling data, such as microarray and in situ hybridization, provide lists of presumably co-regulated genes. This program ("Comtifinder") attempts to decipher the process of gene regulation by identifying common motifs in the promoters of a group of genes that have been shown experimentally to be regulated by the same transcription factor. The program has the following features. (1) It searches exhaustively over all possible permutations of short sequences, allowing mismatches at any position, and summarizes the results by taking advantage of the fact that the real motif will be represented multiple times among the short-sequence permutations. (2) It makes maximal use of experimental evidence to set a high cut-off and achieve a high signal-to-noise ratio. (3) It can easily incorporate prior knowledge of the target motif, such as its length or complexity.
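
The exhaustive search in feature (1) and the cut-off in feature (2) can be illustrated with a minimal sketch. It is not Comtifinder's actual implementation: the function names, the Hamming-distance mismatch rule, and the "fraction of promoters" cut-off are all assumptions chosen for illustration, and the toy promoters are made up.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(candidate, promoter, max_mismatches):
    """True if some window of the promoter matches the candidate
    with at most max_mismatches substitutions (at any position)."""
    k = len(candidate)
    return any(
        hamming(candidate, promoter[i:i + k]) <= max_mismatches
        for i in range(len(promoter) - k + 1)
    )


def count_promoter_support(promoters, k, max_mismatches):
    """Enumerate every possible k-mer over ACGT (4**k candidates) and
    count how many promoters contain it within the mismatch tolerance."""
    support = {}
    for letters in product("ACGT", repeat=k):
        candidate = "".join(letters)
        support[candidate] = sum(
            occurs_with_mismatches(candidate, p, max_mismatches)
            for p in promoters
        )
    return support


def candidates_above_cutoff(promoters, k, max_mismatches, min_fraction):
    """Keep candidates found in at least min_fraction of the promoters.
    The cut-off rule here is a hypothetical stand-in for the program's."""
    support = count_promoter_support(promoters, k, max_mismatches)
    needed = min_fraction * len(promoters)
    return sorted(c for c, n in support.items() if n >= needed)


# Toy promoters (invented for this example) that share the motif ACGT.
toy_promoters = ["TTACGTAA", "GGACGTCC", "CACGTTTG"]
exact_hits = candidates_above_cutoff(toy_promoters, 4, 0, 1.0)
one_mismatch_support = count_promoter_support(toy_promoters, 4, 1)
```

With no mismatches allowed and a cut-off requiring every promoter, only the shared 4-mer survives in this toy input. With one mismatch allowed, neighbours of that 4-mer also reach full support, which is the redundancy that feature (1) describes: a real motif shows up several times among the enumerated short sequences. Because the candidate space grows as 4**k, this brute-force form is practical only for short motifs.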