“… there are known knowns. There are things we know that we know. There are known unknowns. That is to say there are things that we now know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.”
– Donald Rumsfeld
By a unique quirk of destiny, bipeds on an insignificant green planet evolved the ability to reason, developed exceptional pattern-matching skills, a communal memory of past events, and the means to build devices that help do exactly this. Through these we became collectors and users of knowledge, and masters of the universe. We now live in a world with more known knowns than ever before, where new knowledge accumulates at a rate never before seen in history.
Science has been built upon our ability to ask specific questions and to find answers to them. For millennia, much of our knowledge was built by people making observations and connecting the dots. This continued sporadically, with fact, fiction and fable intermixed, until the establishment of the modern scientific method, now the cornerstone of science. This is an ongoing process of systematic observation, measurement and experimentation, and the formulation and testing of hypotheses, which enables us to draw general conclusions about a phenomenon.
This process of science driving our civilisation is undoubtedly exhilarating. Yet much of the day-to-day practice of science is mundane, focused on the systematic processing of observations and conclusions. Knowledge is built upon an accumulation of findings, through a process followed by most scientists everywhere. Occasionally there are larger advances, and such findings become elevated to discoveries.
How then do you classify a discovery? It is the act of detecting something new and, through it, coming to understand an existing natural phenomenon: the ultimate “unknown unknown”. While “discovery” is synonymous with “finding”, we reserve the term for findings whose scale and significance are groundbreaking. Isaac Newton discovered gravity by working out what it was, and in describing it transformed our understanding of the natural world. To enhance the rate of discoveries, we often rely on a related process: invention. An invention is something that did not exist earlier; it is a work of intuition, distinct from routine craftsmanship, though the process of invention relies on existing knowledge and the use of existing devices.
Discoveries therefore demand exploration, while inventions do not require it. The end product of an invention is an object or artefact, like a phone or a lightbulb, while discoveries lead to theories of natural phenomena, like gravity or evolution. Neither discovery nor invention survives in isolation; the two constantly feed off each other. It took the invention of the telescope to discover the nature of planetary bodies and our solar system, and the microscope to discover how cells work. Individual scientists are differently suited to making inventions or discoveries, yet the best move seamlessly between the two, using invention to make discoveries.
While Edison gets the credit for being a great inventor of his age, his rival Tesla made pioneering inventions that continue to spur discovery to this day. At the highest level, the distinction between invention and discovery blurs. It is therefore essential for science policymakers to recognise this, and to create conditions where both can thrive.
Hypothesis-driven research v. inductive reasoning
So what then are the processes that enable great discoveries and inventions? The modern scientific method is built upon hypothesis-driven research. This starts with the formulation of a hypothesis, or conjecture, based on existing knowledge, which must then be tested systematically. An important requirement of a scientific hypothesis is that it be falsifiable; otherwise it cannot be tested in any meaningful way. Conclusions are drawn only after the hypothesis has been tested. A good hypothesis, therefore, must be based on observations, inferences and previous knowledge; must be testable; and must be found right or wrong by the end of the investigation. For a hypothesis, or model, to yield true discoveries, the onus is on the scientist to relentlessly probe its validity, ruthlessly tear into it, and be its harshest critic. Conclusions must be drawn only after all the holes have been filled, and the data must speak for itself. This is hard, since it is only human to be overly attracted or attached to a hypothesis or model; much of our world is built on such fallacies. Yet this is what is required.
However, there is more to the process of discovery than hypothesis-driven research. Many discoveries are made by chance, stumbled upon. Pasteur famously observed that chance favours the prepared mind, and a prepared mind only emerges from a solid understanding of the scientific method and the ability to form and test hypotheses based on observations. Yet an overreliance on hypothesis-driven research can sometimes stifle discovery. For that, and for the possibility of chance discoveries, another powerful method is required: that of inductive (and sometimes deductive) inquiry.
In inductive reasoning, the premises supply strong evidence for, but not absolute proof of, a conclusion. The conclusion's validity is only probable, based upon the evidence given. In other words, the premises of an inductive logical argument suggest truth but do not ensure it. Inductive reasoning thus moves from individual instances towards general statements. It is sometimes derided by scientists overly tied to hypothesis-driven research as purely descriptive, its practitioners accused of “going fishing”. In reality, the best inductive reasoning is not blind fishing: it follows observable phenomena without introducing too many constraints, and uses them to build and constantly adapt a hypothesis.
Appreciating this distinction is critical for sifting good science from bad. A related process is deductive inquiry. Deductive, or logical, reasoning proceeds from one or more statements to reach a logically certain conclusion; it links premises with conclusions. If all the premises are true, the terms are clear, and the rules of deductive logic are followed, then the conclusion reached is necessarily true.
Inductive and deductive reasoning have both long held a place in the process of inquiry, with deductive reasoning being particularly effective. A famous example of deductive reasoning remains: “All men are mortal; Socrates is a man; therefore Socrates is mortal.” With inductive reasoning, by contrast, even though it is based on observations, the premises may be true while the conclusion is false, and this remains the weakness of this form of inquiry. For example, if all the dogs on your street are black, concluding that all dogs are black would be wrong.
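The contrast between the two can be written schematically. Purely as an illustration, in the notation of first-order logic (which this essay does not otherwise use), the Socrates and black-dog examples look like this:

```latex
% Deduction: if the premises hold, the conclusion is guaranteed.
% "All men are mortal; Socrates is a man; therefore Socrates is mortal."
\forall x\,\bigl(\text{Man}(x) \rightarrow \text{Mortal}(x)\bigr),\;
\text{Man}(\text{Socrates})
\;\vdash\; \text{Mortal}(\text{Socrates})

% Induction: the premises only make the conclusion probable.
% Observing n black dogs does not entail that all dogs are black.
\text{Black}(d_1),\, \text{Black}(d_2),\, \ldots,\, \text{Black}(d_n)
\;\not\vdash\; \forall x\,\bigl(\text{Dog}(x) \rightarrow \text{Black}(x)\bigr)
```

The turnstile (⊢) marks a conclusion that necessarily follows; the struck-through turnstile marks one that does not, however many observations are accumulated.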
A recent example of flawed inductive reasoning used to draw a major conclusion was the proclamation of the existence of DNA with arsenic in place of phosphorus, which should have been presented only as a hypothesis emerging from inference, not as a conclusive statement. A more systematic study, following proper experimental methods, subsequently refuted the idea entirely. Hence, while inductive reasoning can be extremely informative, we must treat its conclusions with skepticism, and use them to formulate fully testable hypotheses that can then be verified or falsified.
Both inductive and deductive reasoning therefore rely heavily on Occam's razor, or the law of parsimony, which simply states: “Among competing hypotheses, the one with the fewest assumptions should be selected.”
Particularly with approaches to discovery based on inductive inference, the pitfall to avoid is confirmation bias. This is the wholly human tendency to confirm, rather than deny, a current hypothesis. We are creatures of habit, inclined to believe things we want to be true rather than things that are necessarily true, and we like to find solutions that are consistent with our world view rather than attempting to refute it. This form of bias can creep easily into research relying on inductive inference, blinding us to what the data actually says. The most successful science and scientists use hypothesis-driven and inductive-inference-based research seamlessly, while ruthlessly curbing confirmation bias and the human tendency to love one's own models.
The shortcomings of Big Data
The utility of inductive inference in biology has been tapped especially well by geneticists, who combine “screens”, which depend heavily on inductive inference, with a broad working hypothesis that can be strengthened as data comes in. In a genetic screen, scientists take a mutagenised population and look for individuals with altered “phenotypes”, or observable traits. From these observations, scientists can infer what might cause the traits, and so propose what the function of a gene is. This inference alone is insufficient, but it helps formulate a hypothesis (on the function of a gene), which can then be tested.
Today, much of modern science – biology in particular – has moved towards using Big Data. Some of these studies have been extremely informative because they have generated excellent, testable hypotheses. However, many big-data studies have yielded little, or have thrown up more confusion, drowning out good science. A recent example of underwhelming results from such approaches is the ENCODE project, which collected huge amounts of genomic data and prematurely proclaimed the non-existence of “junk DNA”, ignoring decades of other research. More systematic analysis of this data, combined with a proper interpretation of decades of genetic and biochemical research, suggests that this isn't true at all.
One reason some big studies yield little is that, in the excitement of being able to generate a lot of data, we sometimes forget that broader principles emerge only if the data generates good, testable hypotheses and allows properly reasoned inductive or deductive inquiry. For good hypothesis-driven science, the hypotheses should come from broader observations of natural systems, with room for inductive and deductive inference.
To paraphrase one of my scientific mentors, we tend to look only where we can see, and if everyone is looking somewhere, we tend to join them there. Yet there are great discoveries to be made where we can't see, and these will only come if we abandon our comfort zones and our reliance on running with the herd, and go the other way. To paraphrase the opening of Star Trek, we need to combine the best of hypothesis-driven and inquiry-based research, and boldly go where no human has gone before.
Sunil Laxman is a scientist at the Institute for Stem Cell Biology and Regenerative Medicine, where his research group studies how cells function, and how they communicate with each other. He has a keen interest in the history and process of science, and how science influences society.