I keep hearing some variation of the following comment: “Living organisms generate the equivalent of exabytes (or zettabytes, or whatever) of information per second. We will need to store all this data, and then analyze it to make sense of what is going on.” The first part of the statement is certainly true. A detailed description of the position and state of every molecule within even a single cell would take enormous amounts of data. Similarly, recordings of the activity of a population of neurons already generate gigabytes of data per second. As our recording techniques improve, the rate at which data is generated will only increase.
However, I am worried about the second part of the statement, for a couple of reasons. If we mindlessly accumulate data, important features may be buried in the mess. As I wrote separately, some have suggested that this is perfectly fine. We just need to feed the whole shebang to some data-crunching supercomputer, and it will tell us what matters and what does not. In this brave new world, scientists would only have to decide which questions need answering; machines would collect and interpret the data for us.
I doubt, though, that our algorithms are this powerful. For the foreseeable future, we will have to play an active part in analyzing and understanding the data. And this means that blindly collecting all the data we can may not be the best approach.
A second, related question concerns the complexity of a satisfactory description of a living organism, say a bacterium or the human brain. I would expect the complexity of the description to dictate how much data we will need to develop it fully.
What constitutes a satisfactory description is subjective. On the one hand, a description can be satisfactory because it gives us the feeling that we understand how the organism functions. On the other, a satisfactory description could give accurate predictions of how an organism behaves without giving us any understanding of the underlying mechanisms.
I am relatively optimistic that we will be able to develop the second type of model. We already have some computational models of organisms that give very good predictions about their behavior (here is an example by Jae Kyoung Kim and collaborators, and here is a computational model of a cell that I wrote about before). However, these models are not simple. I doubt that you can really stare at them and gain a deep understanding of how the model, or the organism, ticks.
Perhaps we will be able to develop models of living organisms that give us both accurate predictions and deep insights into how they function. After all, physicists have given us such descriptions of the physical world. However, I doubt that we will get there just by blindly amassing data.