All the genome sequences of organisms known throughout the world are stored in a database belonging to the National Center for Biotechnology Information in the United States. As of today, the database has an additional entry: Caulobacter ethensis-2.0. It is the world's first fully computer-generated genome of a living organism, developed by scientists at ETH Zurich. However, it must be emphasised that although the genome for C. ethensis-2.0 was physically produced in the form of a very large DNA molecule, a corresponding organism does not yet exist.
C. ethensis-2.0 is based on the genome of a well-studied and harmless freshwater bacterium, Caulobacter crescentus, which is a naturally occurring bacterium found in spring water, rivers and lakes around the globe. It does not cause any diseases. C. crescentus is also a model organism commonly used in research laboratories to study the life of bacteria. The genome of this bacterium contains 4,000 genes. Scientists previously demonstrated that only about 680 of these genes are crucial to the survival of the species in the lab. Bacteria with this minimal genome are viable under laboratory conditions.
Beat Christen, Professor of Experimental Systems Biology at ETH Zurich, and his brother, Matthias Christen, a chemist at ETH Zurich, took the minimal genome of C. crescentus as a starting point. They set out to chemically synthesise this genome from scratch, as a continuous ring-shaped chromosome. Such a task was previously seen as a true tour de force: the chemically synthesised bacterial genome presented eleven years ago by the American genetics pioneer Craig Venter was the result of ten years of work by 20 scientists, according to media reports. The cost of that project is said to have totalled 40 million dollars.
Rationalising the production process
While Venter's team made an exact copy of a natural genome, the researchers at ETH Zurich radically altered their genome using a computer algorithm. Their motivation was twofold: one, to make it much easier to produce genomes, and two, to address fundamental questions of biology.
To create a DNA molecule as large as a bacterial genome, scientists must proceed step by step. In the case of the Caulobacter genome, the scientists at ETH Zurich synthesised 236 genome segments, which they subsequently pieced together. "The synthesis of these segments is not always easy," explains Matthias Christen. "DNA molecules not only possess the ability to stick to other DNA molecules, but depending on the sequence, they can also twist themselves into loops and knots, which can hamper the production process or render manufacturing impossible."
Simplified DNA sequences
To synthesise the genome segments in the simplest possible way, and then piece together all segments in the most streamlined manner, the scientists radically simplified the genome sequence without modifying the actual genetic information (at the protein level). There is ample latitude for simplifying genomes, because biology has built-in redundancies for storing genetic information. For example, for many protein components (amino acids), there are two, four or even more ways of writing their information into DNA.
The algorithm developed by the scientists at ETH Zurich makes optimal use of this redundancy of the genetic code. Using this algorithm, the researchers computed the ideal DNA sequence for the synthesis and construction of the genome, which they ultimately utilised for their work.
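The redundancy described above can be illustrated with a toy example. The following Python sketch is not the ETH Zurich algorithm, whose selection criteria are not described here. It only shows the underlying principle: swapping codons for synonymous ones changes the DNA letters while leaving the encoded protein untouched. The codon table is a small excerpt of the standard genetic code, and the PREFERRED mapping and the example sequence are arbitrary assumptions made for illustration.

```python
# Excerpt of the standard genetic code: several codons map to the same amino acid.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",  # alanine: four codons
    "AAA": "K", "AAG": "K",                          # lysine: two codons
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",  # leucine: six codons
    "TTA": "L", "TTG": "L",
    "ATG": "M",                                      # methionine: one codon
    "TAA": "*", "TAG": "*", "TGA": "*",              # stop signals
}

# Hypothetical choice of one codon per amino acid. A real design tool would
# choose codons by synthesis-related criteria; this mapping is arbitrary.
PREFERRED = {"A": "GCT", "K": "AAA", "L": "CTT", "M": "ATG", "*": "TAA"}


def codons(dna):
    """Split a DNA string into consecutive three-letter codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate DNA into a protein string (one letter per amino acid)."""
    return "".join(CODON_TO_AA[c] for c in codons(dna))


def recode(dna):
    """Replace every codon with the preferred synonymous codon."""
    return "".join(PREFERRED[CODON_TO_AA[c]] for c in codons(dna))


def changed_letters(a, b):
    """Count positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


original = "ATGGCGCTGAAGTTGGCCTGA"   # made-up sequence, 7 codons
rewritten = recode(original)

print(translate(original), translate(rewritten))   # same protein twice
print(changed_letters(original, rewritten), "of", len(original), "letters changed")
```

In this toy case, a third of the DNA letters change even though both sequences translate to the same protein, which mirrors on a small scale how a large share of a genome can be rewritten without altering its function at the protein level.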
As a result, the scientists introduced many small modifications into the minimal genome. Taken together, however, these changes are substantial: more than a sixth of the 800,000 DNA letters in the artificial genome were replaced compared to the "natural" minimal genome. "Through our algorithm, we have completely rewritten our genome into a new sequence of DNA letters that no longer resembles the original sequence. However, the biological function at the protein level remains the same," says Beat Christen.
Litmus test for genetics
The rewritten genome is also interesting from a biological perspective. "Our method is a litmus test to see whether we biologists have correctly understood genetics, and it allows us to highlight possible gaps in our knowledge," explains Beat Christen. Naturally, the rewritten genome can contain only information that the researchers have actually understood. Possible "hidden" additional information that is located in the DNA sequence, and has not yet been understood by scientists, would have been lost in the process of creating the new code.
For research purposes, the scientists produced strains of bacteria that contained both the naturally occurring Caulobacter genome and also segments of the new artificial genome. By turning off certain natural genes in these bacteria, the researchers were able to test the functions of the artificial genes. They tested each one of the artificial genes in a multistep process.