Mathematics Explains Why Crispr-Cas9 Sometimes Cuts the Wrong DNA

(Originally published by TU Delft)

February 7, 2018

The discovery of the Cas9 protein has been of great value to medical science. It has simplified gene editing tremendously, and may even make it possible to eliminate many hereditary diseases in the near future. Using Cas9, researchers have the ability to cut DNA in a cell to correct mutated genes, or paste new pieces of genetic material into the newly opened spot. Initially, the Crispr-Cas9 system seemed to be extremely accurate. But unfortunately, it is now apparent that Cas9 sometimes also cuts other DNA sequences similar to the exact sequences it was programmed to target. Scientists at Delft University of Technology have developed a mathematical model that explains why Cas9 cuts some DNA sequences while leaving others alone.

The Crispr-Cas9 system is a defence mechanism that protects bacteria from viruses. If a virus enters a bacterium but does not manage to take over the cell, the defence system cuts out some genetic material from the virus and stores it in the bacterium’s own genome. The built-in viral DNA acts as a genetic memory. If the same virus attacks the bacterium (or its descendants) again, the bacterium quickly recognises the attacker and can send out Cas9 proteins to track it down. Using viral RNA as a sort of 'cheat sheet', the protein hunts for hostile DNA in the cell. If it finds a match, the Crispr-Cas9 system then cuts the viral DNA, incapacitating the threat.

Harmful effects

Scientists initially thought that Crispr-Cas9 only cleaves a piece of DNA if it exactly matches the cheat sheet of RNA that it carries. That assumption has now been proven wrong. The protein sometimes cuts DNA sequences that resemble the material it is looking for, but that differ from it in a few letters. According to researcher Martin Depken of Delft University of Technology, cutting such slightly differing sequences makes sense from an evolutionary point of view. ‘Viruses mutate constantly, and can therefore have a different genetic make-up than what Cas9 is looking for’, he says. ‘By also cutting DNA sequences that are slightly different, the Crispr-Cas9 system can track the evolution of a virus and better protect the bacterium against its foes.’

But in this case, what is good for bacteria is bad for humans. If we want to use Cas9 to erase diseases from our DNA, it is imperative that no genes are cut other than the ones we target. If Cas9 destroys other genetic material, that can have dire consequences.

Reward with energy

Experiments have shown that Crispr-Cas9 is more likely to cut certain non-matching sequences than others. Scientists from the research group of Martin Depken, led by PhD student Misha Klein, wondered what underlying physics determines this preference. According to Depken, the answer is very simple: it's all about the energy it costs to make base pairs that deviate from the RNA template. ‘When Cas9 checks if a DNA sequence is a match, it starts at one end of the strand,’ explains Depken, who is a member of the Kavli Institute for Nanoscience at Delft. ‘Then, it checks all of the letters of the strand in turn. For each match, Cas9 is rewarded with energy, while any mismatch costs energy. The more errors a DNA sequence contains, and the closer they are to the start of the sequence, the more likely it is that the protein will refrain from cutting. Instead, it will unbind from the DNA and continue its search for a piece of genetic material that better matches its RNA template.’

Better predictions

According to Depken, the simple mathematical model developed by his group predicts existing data about Cas9’s cutting behaviour surprisingly well. If an error is situated at the end of the sequence, the protein will likely have amassed enough energy to overcome that hurdle, which increases the probability of cutting. The model also explains why Cas9 refrains from cutting when it encounters a mismatch at the beginning of a sequence, or when two mismatches are close together.
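
The energy bookkeeping described above can be illustrated with a small Python sketch. This is a toy, not the Delft group's model: it replaces probabilities with a hard yes/no rule, compares the template and the target letter by letter as if both were written in DNA letters, and uses made-up values for the reward per match (`gain`) and the penalty per mismatch (`cost`). The function names and the 20-letter example sequence are likewise hypothetical, chosen only for illustration.

```python
def cas9_cuts(template, target, gain=1.0, cost=4.0):
    """Toy rule: walk along the target from the starting end and keep an
    energy balance. Each match adds `gain`, each mismatch subtracts `cost`
    (both values are illustrative assumptions, not measured energies).
    If the balance ever drops below zero, the protein unbinds (False);
    if the end of the sequence is reached, it cuts (True)."""
    if len(template) != len(target):
        raise ValueError("template and target must have the same length")
    balance = 0.0
    for expected, found in zip(template, target):
        balance += gain if expected == found else -cost
        if balance < 0:
            return False  # not enough energy amassed: unbind, keep searching
    return True  # every letter checked without running out of energy: cut


def with_mismatches(sequence, positions):
    """Return a copy of `sequence` with the letters at the given
    0-based positions replaced by a different letter."""
    swap = {"A": "C", "C": "A", "G": "T", "T": "G"}
    letters = list(sequence)
    for position in positions:
        letters[position] = swap[letters[position]]
    return "".join(letters)


# Hypothetical 20-letter template, written in DNA letters for simplicity.
template = "GACGTTACCGGATCAAGTCA"

examples = {
    "perfect match": template,
    "mismatch at the start": with_mismatches(template, [0]),
    "mismatch at the end": with_mismatches(template, [19]),
    "two mismatches close together": with_mismatches(template, [5, 6]),
    "two mismatches far apart": with_mismatches(template, [5, 15]),
}

for label, target in examples.items():
    verdict = "cut" if cas9_cuts(template, target) else "unbind"
    print(f"{label}: {verdict}")
```

With these assumed values, the sketch cuts the perfect match, the target with a mismatch at the end, and the target with two widely spaced mismatches, and it unbinds from the other two. That is the qualitative pattern the article describes; the actual model assigns probabilities rather than a fixed verdict.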

When it comes to the probability that a DNA sequence will be cut, the physical properties of the Cas9 protein itself also play a role. Depken and his colleagues are now looking to incorporate this variable into their model. Ultimately, the model should lead to better predictions of the errors that Cas9 is likely to make. ‘Sometimes there is a choice in what exact location to cut when fixing a gene, and our model will help determine which locations are the best to target’, says Depken. The physical understanding provided by the model can also help efforts to re-engineer Cas9 so that it does not make life-threatening mistakes while editing our DNA.
