Subdividing globally important zones based on data distribution across multiple genome fragments (Academic Article)

abstract

  • In multiple genome fragments, a globally important mode is a zone represented by a significant change that has a similar impact on every related fragment in the zone. Such a zone may represent cancer-related genes involved in diverse tumors. Globally important zones are characterized by two features: (1) they contain more data points than other areas of the fragments; and (2) the data points are distributed evenly across as many genome fragments as possible. A method for mining globally important zones must provide the following: (1) independence from the data distribution; (2) noise filtering; (3) pattern boundary identification; and (4) zone ranking. We have developed a hierarchical, density-based method called GIZFinder (globally important zone finder) to detect and rank such zones based on two criteria: distribution width and distribution depth. Comparisons on simulated data show that our method performs significantly better than the kernel framework and the sliding window. By experimenting on real cancer gene data, we identify 53 novel cancer genes, some of which have been proven correct.
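
    The two characterizing features above (many data points, spread evenly over many fragments) can be illustrated with a small scoring sketch. This is not GIZFinder itself: the abstract does not define how distribution width and distribution depth are computed, so the definitions below are assumptions made for illustration only. The function name `zone_scores`, the `(fragment_id, position)` input format, and the use of normalized Shannon entropy as an evenness measure are all hypothetical.

```python
from collections import Counter
from math import log


def zone_scores(points, start, end, n_fragments):
    """Score one candidate zone [start, end] (illustrative assumption only).

    points      -- iterable of (fragment_id, position) pairs
    n_fragments -- total number of genome fragments under study

    Returns (depth, width, evenness):
      depth    -- number of data points falling inside the zone
      width    -- fraction of fragments with at least one point in the zone
      evenness -- normalized Shannon entropy of points per fragment (0..1)
    """
    in_zone = [frag for frag, pos in points if start <= pos <= end]
    depth = len(in_zone)
    counts = Counter(in_zone)
    width = len(counts) / n_fragments

    if len(counts) > 1:
        # Entropy of the per-fragment share of points, scaled by its maximum.
        entropy = -sum((c / depth) * log(c / depth) for c in counts.values())
        evenness = entropy / log(n_fragments)
    else:
        # Zero or one fragment covered: no spread at all.
        evenness = 0.0
    return depth, width, evenness


def rank_zones(points, zones, n_fragments):
    """Order candidate zones by width, then evenness, then depth (assumed order)."""
    def key(zone):
        depth, width, evenness = zone_scores(points, zone[0], zone[1], n_fragments)
        return (width, evenness, depth)
    return sorted(zones, key=key, reverse=True)
```

    Under these assumed definitions, a zone whose points come from every fragment in equal numbers ranks above a zone with the same number of points concentrated on a single fragment, which matches feature (2) in the abstract.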

publication date

  • May 2014