Sarah Iqbal is a senior research fellow at the department…
Crystal structure of a chaperonin protein complex, which assists with protein folding. The highlighted portion is a single subunit. Image: Thomas Splettstoesser/Wikimedia Commons, CC BY-SA 3.0.
Artificial intelligence (AI) is long past its early days. From predicting nutrient deficiencies in the soil to helping physicians with medical decisions, AI has been providing novel solutions for tricky real-world challenges.
Now, some researchers have claimed an AI they are working with has solved one of modern biology’s grand challenges: the protein-folding problem.
The British AI company behind this feat, DeepMind, has claimed that its new programme AlphaFold can process the sequence of amino acids that cells use to produce each protein and accurately predict the protein’s shape.
Computational biologists have been dreaming of this technology for decades.
Like tiny machines, proteins catalyse, support and regulate biochemical processes. Proteins also carry messages between cells, help store and transport metabolites, provide structure to cellular architecture and guard cells’ gates. Knowing how they function has significant implications for both biology research and drug development.
At the heart of all these diverse functions lie the shapes of proteins. Each protein is custom-built to serve a purpose. For example, haemoglobin is globular in shape with little pockets. When blood flows through the lungs, these pockets are saturated with oxygen, and that’s how oxygen is transported to other parts of the body.
Proteins are not synthesised as globules or rods or helices. They are born as single chains of smaller units called amino acids.
Think of a protein as a very long word made up of different letters, each representing a different amino acid – like NLYIQWLKDGGPSSGRPPPS.
The order in which amino acids link together is important and unique, so each protein has a different spelling. And once the cell has synthesised a protein, this long chain of letters spontaneously folds into a particular shape.
Any change in the spelling can have wide-ranging effects. To return to the haemoglobin example: patients with sickle-cell anaemia have a point mutation in the haemoglobin beta (HBB) gene that changes the spelling of the protein chain by one letter (one amino acid). This gives rise to fibre-like, instead of globular, haemoglobin proteins that aren’t very good at ferrying oxygen.
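To make the spelling analogy concrete, here is a minimal Python sketch of a one-letter change. It assumes details the article does not give: the string VHLTPEEK is the first eight amino acids of mature human beta-haemoglobin, and the sickle-cell mutation swaps glutamic acid (E) for valine (V) at position 6. The helper name `substitute` is made up for this illustration.

```python
def substitute(sequence: str, position: int, new_residue: str) -> str:
    """Return a copy of `sequence` with the letter at 1-based `position` replaced."""
    if not 1 <= position <= len(sequence):
        raise ValueError("position is outside the sequence")
    return sequence[:position - 1] + new_residue + sequence[position:]


# Assumed example data: start of the mature beta-haemoglobin chain.
NORMAL = "VHLTPEEK"
# Assumed mutation: E -> V at position 6 (the sickle-cell substitution).
SICKLE = substitute(NORMAL, 6, "V")

# List every position where the two spellings differ.
differences = [
    (index + 1, before, after)
    for index, (before, after) in enumerate(zip(NORMAL, SICKLE))
    if before != after
]

print(NORMAL)
print(SICKLE)
print(differences)
```

The two strings differ at exactly one position, yet the proteins they stand for behave very differently.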
In the 1950s, a biologist named Christian Anfinsen demonstrated that proteins’ shapes are not random. When Anfinsen added increasing amounts of urea to a protein called ribonuclease, the protein lost its structure and began clumping together. But when he diluted the mixture with water, the protein started to regain its shape. Anfinsen thus found that a protein’s sequence encodes its eventual shape.
“By definition, the problem of decoding the [protein’s] structure from its sequence is a computational problem,” according to N. Srinivasan, a biophysicist at the Indian Institute of Science (IISc), Bengaluru. “Now that we have genome sequence data available for several different species, we can predict the genes and therefore protein sequences.”
But to be able to pinpoint the protein’s function or to target it in a useful way, we need to work out its structure.
The underlying mathematics is extremely challenging. A protein with 101 amino acids will have 100 links connecting the various units together. If each link can take just three orientations, there will altogether be 3^100, or about 5 × 10^47, possible ways in which the protein’s atoms can be arranged in space.
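The figure quoted above matches a simple counting argument, sketched here in Python. The assumption (not spelled out in the article) is that every link between neighbouring amino acids can independently take one of three orientations. The function name `conformation_count` is hypothetical.

```python
def conformation_count(n_residues: int, states_per_link: int = 3) -> int:
    """Count arrangements of a chain whose links each take one of a few states.

    Assumes the links are independent, so the possibilities simply multiply.
    """
    links = n_residues - 1  # a chain of n units has n - 1 links
    return states_per_link ** links


count = conformation_count(101)
print(f"{count:.1e} possible arrangements for 101 amino acids")

# Each extra amino acid multiplies the total by the number of states per link.
print(conformation_count(102) // count)
```

Because the count grows exponentially with chain length, checking every arrangement one by one quickly becomes impractical for larger proteins.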
If a classical computer with sufficient computing power had to sample all the possible structures to determine the correct one, it might need about 30 years.
But within the cells of our bodies, proteins fold into their requisite shapes within seconds, and proteins designated to perform the same function fold into the same shape.
The weirdness of this event is captured in Levinthal’s paradox. Cyrus Levinthal realised in 1969 that instead of proceeding in discrete steps, like chemical reactions do, proteins seem to ‘know’ which shape to fold into the moment they’re born.
The source of this knowledge is the heart of the protein-folding problem.
Several scientists around the world are trying to crack it – i.e. unravel how proteins decide to fold, and also understand how proteins might fold (or misfold) in different circumstances.
One of the most important tools scientists have had on this front is the Ramachandran plot, named for structural biologist G.N. Ramachandran (also Srinivasan’s mentor). Along with C. Ramakrishnan and V. Sasisekharan, Ramachandran found that simple laws of physics allow the bonds in protein molecules to bend around only in limited ways.
So just by studying these ways, the trio discovered that proteins could fold only into certain shapes, while others were forbidden. The Ramachandran plot is a graphical technique that visualises these possibilities for quicker decision-making.
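The idea of allowed and forbidden shapes can be sketched in a few lines of Python. A Ramachandran plot charts two backbone rotation angles, conventionally called phi and psi, for each amino acid. The rectangular regions below are rough assumptions chosen for illustration, not the contours of a real plot, and the function name `classify` is hypothetical.

```python
# Illustrative only: crude rectangular stand-ins for the favoured regions
# of a Ramachandran plot. Angles are in degrees, from -180 to 180.
REGIONS = {
    "right-handed helix": ((-160.0, -20.0), (-120.0, 50.0)),
    "beta strand": ((-180.0, -45.0), (90.0, 180.0)),
    "left-handed helix": ((20.0, 120.0), (0.0, 90.0)),
}


def classify(phi: float, psi: float) -> str:
    """Name the assumed region containing (phi, psi), or 'disfavoured'."""
    for name, ((phi_lo, phi_hi), (psi_lo, psi_hi)) in REGIONS.items():
        if phi_lo <= phi <= phi_hi and psi_lo <= psi <= psi_hi:
            return name
    return "disfavoured"


# Angle pairs typical of a helix and of a strand, and one outside every box.
for phi, psi in [(-60.0, -45.0), (-120.0, 130.0), (60.0, -120.0)]:
    print(phi, psi, classify(phi, psi))
```

A real plot uses contours derived from the sizes of atoms and from observed structures, but the principle is the same: most angle combinations fall outside the allowed regions.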
As revolutionary as Ramachandran’s work was, DeepMind’s AlphaFold portends a different sort of leap in the modern history of structural biology. In 2018, AlphaFold demonstrated 80% accuracy at predicting protein structures in the Critical Assessment of protein Structure Prediction (CASP) challenge. CASP is an organisation that conducts community-wide experiments to test new software for its ability to correctly predict the folded structures of proteins.
These experiments are practically the Oscars of the protein-folding world. The first was conducted in 1994, and they have since been organised once every two years, with hundreds of teams participating.
During this year’s challenge (CASP14), AlphaFold wasn’t just the winner. It also breached the longstanding barrier of 90% accuracy in structure prediction – a bar set by CASP members. This result sparked claims that AI had solved the protein-folding problem.
This is not correct. The protein-folding problem is not a single entity – like a math problem. The challenge itself is multifold, with three key aspects, Sandhya Bhatia, a graduate student at the National Centre for Biological Sciences, Bengaluru, told The Wire Science.
The first is to determine the protein’s final structure from its generative sequence; the second, to determine how the protein’s atoms change their spatial arrangement as a function of its environment; and the third, to fully reveal the forces that keep a protein stable during this process.
“AlphaFold can guess and predict the structure for small, single-domain proteins, [and] this addresses only the first part of the problem,” Bhatia said.
Indeed, AlphaFold and other similar self-learning programmes can predict only the static 3D structure of a protein.
But proteins are very dynamic. They exist in a state of flux, changing their shapes, swinging their arms. Their shape-shifting ability is what makes them so versatile. So predicting the static structure, while important, is just one step in a longer journey to truly understanding protein folding.
There’s also the issue of predicting useful structures, according to Srinivasan. Even if AlphaFold has deduced a protein’s shape in a given context, scientists will still need to make sure the deduction holds true for the protein’s smallest units and the parts that participate in chemical reactions.
But none of the tools scientists can currently access are capable of generating a clear picture of how the protein structures change in time, and in response to chemical changes.
At present, scientists combine high-resolution microscopic techniques with computational approaches.
Bhatia and her colleagues are studying how monellin proteins – extracted from an African berry plant – fold in different conditions. Their findings are useful for understanding folding in other, similar proteins.
So AlphaFold’s achievement is significant – but it hasn’t exactly ‘solved’ the problem itself.
Then again, it’s worthwhile to celebrate the significance.
Srinivasan started working on protein structures in 1984 as a student at IISc. At the time, the best tool to study protein structures was X-ray diffraction. Researchers observe how a protein crystal scatters an X-ray beam, and the scattering pattern indicates where atoms sit within the crystal.
Since then, scientists have developed several other, more advanced techniques to study proteins. Of them, nuclear magnetic resonance (NMR) and cryo-electron microscopy have been particularly useful for obtaining high-resolution data.
But there is a downside.
For X-ray diffraction studies, scientists need to crystallise the protein – which can take several months to years. NMR and cryo-electron microscopy require the protein samples to be purified first. Cryo-electron microscopy also can’t deal with proteins weighing less than 50 kilodaltons. (One dalton is one-twelfth the mass of one neutral carbon-12 atom in its ground state and at rest.)
It’s easier to sequence proteins, determine the string of letters for each protein, and present it to AlphaFold. So if DeepMind’s claims about AlphaFold survive the closer scrutiny that scientists are currently applying, we can stop bothering with protein crystals and accelerate the study of protein structures, especially of small proteins, Smriti Priya, a senior scientist at the Indian Institute of Toxicology Research, Lucknow, told The Wire Science.
Priya is studying a protein called alpha-synuclein. When alpha-synuclein doesn’t fold properly in brain cells, it can clump together into the plaque commonly found among people with Parkinson’s disease.
According to her, AlphaFold’s success also presents a financial angle important for countries like India. For example, there is only one cryo-electron microscope in Lucknow – and she doesn’t see “the instrument becoming common in low-resource settings in the next ten years”.
“But because AlphaFold is an AI-based programme,” she continued, we can access it “in maybe fewer than five years, if an exorbitant premium is not put on its use”.
The DeepMind team hasn’t said anything about when AlphaFold will become generally available.
In fact, thus far, celebrations of AlphaFold’s prowess have been limited largely to scientific journals. And “these are press releases, not peer-reviewed publications,” Srinivasan cautioned.
“The beauty of science is someone has to demonstrate it in a way you and I can do it. Only then is it useful.”
Sarah Iqbal is a freelance science writer.