Saturday, December 18, 2010

Mutation-prediction software rewarded

California contest looks to boost software that can analyse genetic data.

Genetic data from patients will soon be flooding doctors' offices.

A computer program that predicts the effects of gene mutations has earned its author a doctorate, a stack of journal publications — and now a dancing wind-up toy named Molly.

Yana Bromberg, a bioinformatician at Rutgers University in New Brunswick, New Jersey, won the toy for her program, SNAP, in an experimental contest that culminated on 10 December in Berkeley, California. The competition, called the Critical Assessment of Genome Interpretation (CAGI), asks researchers to predict the biological effects of different mutations, and compares their results against unpublished experimental data.

The contest was conceived by Steven Brenner, a computational genomicist at the University of California, Berkeley, and John Moult, a computational biologist at the University of Maryland in Rockville. Their goal is to accelerate the development of software that can quickly interpret large amounts of genetic data — for example, the whole genome sequence of a tumour from a biopsy.

Data mountain

Such data is already flooding labs and will soon be hitting doctors' offices. "We've already got an enormous amount of data to contend with and we're struggling to make sense of it," says Moult. "I see CAGI as one mechanism to help with that process."

He helped to start a similar competition in 1994, to improve scientists' ability to determine the shapes of proteins from their amino-acid sequences. That effort, named the Critical Assessment of protein Structure Prediction (CASP), challenges scientists to predict protein structures that have been determined experimentally, but not yet published. The results are revealed at a biennial meeting in Pacific Grove, California.

CAGI works in a similar way. Instead of proteins, Brenner, Moult and coordinator Susanna Repo, a postdoc in Brenner's lab, provided several challenges that typically involved determining the biological effect of mutations in particular genes and the proteins they encode.

For instance, one challenge provided entrants with different variations in the cancer-associated gene CHEK2 that had been uncovered by a study of the gene in patients with cancer and healthy people, but not yet published. CAGI participants were asked to determine whether given mutations belonged to a patient or a control.
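As a rough illustration of how an entry in a patient-versus-control challenge like this could be checked against held-out labels, here is a minimal Python sketch. The scoring rule (area under the ROC curve), the function name `auc` and the toy scores are all assumptions made for illustration; the article does not say how CAGI entries were actually assessed.

```python
def auc(scores, labels):
    """Area under the ROC curve, computed by comparing every case/control pair.

    scores: predicted probability that a variant came from a patient (case).
    labels: held-out truth, 1 for case and 0 for control.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0      # case ranked above control: correct ordering
            elif c == k:
                wins += 0.5      # tie counts as half
    return wins / (len(cases) * len(controls))


# Hypothetical entry: five variants with made-up scores and labels.
predicted = [0.9, 0.8, 0.35, 0.6, 0.1]
truth = [1, 1, 1, 0, 0]
print(auc(predicted, truth))
```

A value of 1.0 would mean every patient variant was ranked above every control variant, and 0.5 is what random guessing would be expected to give.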

Despite being hastily organized (some of the challenges were posted just a couple of weeks before their deadlines), CAGI drew more than 100 entries. About 40 people made the trip to Berkeley to learn the results and to collect prizes, which Brenner awarded to anyone who gave a talk on their approach.

Each team tackled these challenges differently. But their entries generally involved either predicting how a certain mutation changes the shape and function of a protein, or scouring genetic databases to determine the effects of similar mutations. "The ones that did best combined a large number of methods together," says Brenner.

Although the organizers were apprehensive about how the contest would work, "it went as well as it possibly could have", says Brenner. He and his team are still analysing the entries, and hope to reveal the official results in a peer-reviewed publication. On the basis of the success of the Berkeley workshop, they plan to hold the contest again within 2 years.

A challenge too far

There were a few hitches, however. One challenge proved so difficult to tackle at short notice that it generated no entries. In another, to predict the consequence of mutations in the tumour-suppressor gene P53, the relevant experiments became contaminated with mould, so the entries could not be compared against real data in time for the 10 December meeting.

Joost Schymkowitz and Frederic Rousseau's team at the Free University of Brussels worked on two of the problems. They fared better on one challenge than on the other, but Schymkowitz points out that failures can be as illuminating as successes because they highlight the shortcomings of particular approaches. "It makes you acutely aware of things you cannot do," he says.

Scott Kahn, chief information officer at the gene-analysis company Illumina in San Diego, California, who attended CAGI as an observer, says that the contest should help to speed up advances in genome prediction. "This does really focus effort in the community," he says.

Brenner, meanwhile, points out that protein-structure predictions improved greatly after CASP started. "Our hope is that same thing will happen here."

Sunday, December 5, 2010

Genomic fault zones come and go

The fragile regions in mammalian genomes thought to play a key role in evolution go through a 'birth and death' process



		IMAGE: In this graphic, the colored marks represent positions of the putative fragile regions in the human genome. The Turnover Fragile Breakage Model suggests that these regions likely form (still active) fragile regions...

The fragile regions in mammalian genomes that are thought to play a key role in evolution go through a "birth and death" process, according to new bioinformatics research performed at the University of California, San Diego. The new work, published in the journal Genome Biology on November 30, could help researchers identify the current fragile regions in the human genome – information that may reveal how the human genome will evolve in the future.

"The genomic architecture of every species on Earth changes on the evolutionary time scale and humans are not an exception. What will be the next big change in the human genome remains unknown, but our approach could be useful in determining where in the human genome those changes may occur," said Pavel Pevzner, a UC San Diego computer science professor and an author on the new study. Pevzner studies genomes and genome evolution from a computational perspective in the Department of Computer Science and Engineering at the UC San Diego Jacobs School of Engineering.

The fragile regions of genomes are prone to "genomic earthquakes" that can trigger chromosome rearrangements, disrupt genes, alter gene regulation and otherwise play an important role in genome evolution and the emergence of new species. For example, humans have 23 pairs of chromosomes while the other great apes have 24 pairs, a consequence of a genome rearrangement that fused two ancestral ape chromosomes into human chromosome 2.




This work was performed by Pevzner and Max Alekseyev – a computer scientist who recently finished his Ph.D. in the Department of Computer Science and Engineering at the UC San Diego Jacobs School of Engineering. Alekseyev is now a computer science professor at the University of South Carolina.

Turnover Fragile Breakage Model

"The main conclusion of the new paper is that these fragile regions are moving," said Pevzner.

In 2003, Pevzner and UC San Diego mathematics professor Glen Tesler published results claiming that genomes have "fault zones" or genomic regions that are more prone to rearrangements than other regions. Their "Fragile Breakage Model" countered the then largely accepted "Random Breakage Model" – which implies that there are no rearrangement hotspots in mammalian genomes. While the Fragile Breakage Model has been supported by many studies in the last seven years, the precise locations of fragile regions in the human genome remain elusive.
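The difference between the two models can be seen in a toy simulation. The Python sketch below is an illustration only, not the analysis from the 2003 paper: the function name `simulate_breaks`, the site counts and the rule that breaks land exclusively on fragile sites are simplifying assumptions.

```python
import random
from collections import Counter


def simulate_breaks(n_sites, n_breaks, fragile_sites=None, seed=0):
    """Draw break positions and report (distinct sites hit, sites hit more than once).

    fragile_sites=None  -> every site is equally likely (random breakage).
    fragile_sites=[...] -> breaks are drawn only from those sites, a
                           deliberately extreme stand-in for fragile breakage.
    """
    rng = random.Random(seed)
    pool = range(n_sites) if fragile_sites is None else list(fragile_sites)
    hits = Counter(rng.choice(pool) for _ in range(n_breaks))
    reused = sum(1 for count in hits.values() if count > 1)
    return len(hits), reused


# Hypothetical numbers: 100,000 possible sites and 300 breaks.
random_model = simulate_breaks(100_000, 300)
fragile_model = simulate_breaks(100_000, 300, fragile_sites=range(0, 100_000, 200))
print("random breakage :", random_model)
print("fragile breakage:", fragile_model)
```

With breaks spread uniformly, hitting the same site twice is rare. When breaks are confined to a small set of sites, reuse becomes common, which is the kind of signal that separates the two models.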

The new work published in Genome Biology offers an update to the Fragile Breakage Model called the "Turnover Fragile Breakage Model." The findings demonstrate that the fragile regions undergo a birth and death process over evolutionary timescales and provide a clue to where the fragile regions in the human genome are located.

Do the Math: Find Fragile Regions

Finding the fragile regions within genomes is akin to looking at a mixed-up deck of cards and trying to determine how many times it has been shuffled.

Looking at a genome, you may identify breaks, but to say it is a fragile region, you have to know that breaks occurred more than once at the same genomic position. "We are figuring out which regions underwent multiple genome earthquakes by analyzing the present-day genomes that survived these earthquakes that happened millions of years ago. The notion of rearrangements cannot be applied to a single genome at a single point in time. It's relevant when looking at more than one genome," said Pevzner, explaining the comparative genomics approach they took.
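A much-simplified Python sketch of this comparative idea follows. It treats each genome as a single linear order of numbered blocks and counts, for every neighboring pair in a reference genome, how many other genomes no longer keep that pair together. The block orders and function names are hypothetical, and the study itself relies on more sophisticated rearrangement analysis. The count is also only a rough proxy, since two genomes can inherit the same break from a common ancestor.

```python
def adjacencies(order):
    """Unordered neighboring pairs in a linear order of (unique) blocks."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def broken_counts(reference, others):
    """For each adjacency in the reference, count the genomes that lack it."""
    other_adjs = [adjacencies(o) for o in others]
    counts = {}
    for a, b in zip(reference, reference[1:]):
        pair = frozenset((a, b))
        counts[(a, b)] = sum(1 for adjs in other_adjs if pair not in adjs)
    return counts


# Hypothetical block orders, not real genome data.
reference = [1, 2, 3, 4, 5, 6]
genome_a = [1, 2, 5, 4, 3, 6]   # blocks 3..5 reversed
genome_b = [1, 2, 4, 3, 5, 6]   # blocks 3..4 reversed

counts = broken_counts(reference, [genome_a, genome_b])
# Positions broken in more than one genome are candidates for reuse.
candidates = [pair for pair, n in counts.items() if n > 1]
print(counts)
print(candidates)
```

In this toy case the position between blocks 2 and 3 is broken in both comparison genomes, so it is the only one flagged as a candidate.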

"It was noticed that while fragile regions may be shared across different genomes, most often such shared fragile regions are found in evolutionarily close genomes. This observation led us to a conclusion that fragility of any particular genomic position may appear only for a limited amount of time. The newly proposed Turnover Fragile Breakage Model postulates that fragile regions are subject to a 'birth and death' process and thus have limited lifespan," explained Alekseyev.

The Turnover Fragile Breakage Model suggests that genome rearrangements are more likely to occur at the sites where rearrangements have recently occurred, and that these rearrangement sites change over tens of millions of years. Thus, the best clue to the current locations of fragile regions in the human genome is offered by rearrangements that happened in our closest relatives: chimpanzees and other primates.

Pevzner is eagerly awaiting sequenced primate genomes from the Genome 10K Project. Sequencing the genomes of 10,000 vertebrate species, including hundreds of primates, is bound to provide new insights on human evolutionary history and possibly even the future rearrangements in the human genome.

"The most likely future rearrangements in human genome will happen at the sites that were recently disrupted in primates," said Pevzner.

Work tied to the new Turnover Fragile Breakage Model may also be useful for understanding genome rearrangements at the level of individuals, rather than entire species. In the future, the computer scientists hope to use similar tools to look at the chromosomal rearrangements that occur within the cells of individual cancer patients over and over again in order to develop new cancer diagnostics and drugs.

Pavel Pevzner is the Ronald R. Taylor Professor of Computer Science at UC San Diego; Director of the NIH Center for Computational Mass Spectrometry; and a Howard Hughes Medical Institute (HHMI) Professor.

"Comparative Genomics Reveals Birth and Death of Fragile Regions in Mammalian Evolution," in Genome Biology, Volume 11 Issue 11, by Max A. Alekseyev from the Department of Computer Science and Engineering, University of South Carolina, Columbia, SC, USA; and Pavel A. Pevzner from the Department of Computer Science and Engineering, University of California, San Diego, CA, USA.

Contact: Daniel Kane
dbkane@ucsd.edu
858-534-3262
University of California - San Diego

Friday, December 3, 2010

WELCOME

Hello,

Welcome, everybody, to our new platform for sharing bioinformatics news and discussions. This blog was started jointly by bioinformatics students at the University of Tampere and the University of Turku. We hope you will take part as actively as you would in your own blog.
Your suggestions and comments are warmly welcomed and appreciated.

Thank you!!