Something about our field has been bothering me for a while: Overall, we mathematicians do a relatively poor job of presenting our research to a general audience (here is Doron Zeilberger’s comment on the subject). There are certainly some spectacular expositors in the field. But overall, we could do much better in presenting our work. This is a problem both of training and practice.
I work at the intersection of mathematics and biology. Over the years, I have been on a number of thesis committees for graduate students in both fields. Graduate training in the two disciplines is quite different. Importantly, students of mathematics are not trained as extensively to give presentations or to write in an accessible way. There are a few differences in the way we do things that could be responsible:
Mathematicians spend a lot less time preparing graduate students to present their research. Most biologists require students to give talks regularly at lab meetings and at conferences. Many graduate biology programs also require students to give oral progress reports once or twice a year. Although things are changing, this does not seem to be the norm in mathematics.
We also put less emphasis on writing. Publishing one or two peer-reviewed papers is frequently a requirement for a PhD in biology. Moreover, students often have to submit a thesis proposal or a mock grant proposal as part of their qualifying exams. The writing is critically reviewed – I have been on committees where students had to rewrite their proposal several times before it was accepted. For many students in mathematics, the thesis is the first, and sometimes only, original piece of scientific writing they will produce.
The reason for the differences may be that good presentations matter much more in biology. Even one mediocre talk will raise eyebrows, and it can kill your chances of getting a job. And a poorly written grant will not be funded, no matter how good the ideas. In biology there is a large overlap between the people who do excellent research and those who give excellent presentations; this is less true in mathematics. The cynical reply is that these are just better salesmen who get more funding, and hence run bigger and more productive labs. But perhaps these are simply the people who view the presentation of their research as an integral part of their work.
I don’t mean to say that we need to emulate biologists in every way. I see plenty of problems in graduate education in biology – graduate students frequently get no programming experience, and little teaching experience. They most certainly do not learn enough math and statistics.
However, with the current situation in academia, only a handful of PhD students will find academic jobs. Mathematics students entering industry will have learned the persistence and concentrated effort necessary to do research. Many will learn how to program. These are invaluable skills. However, the ability to write well and present ideas clearly is also indispensable. Shouldn’t we do a better job in teaching our students these skills?
Things are getting better. For instance, the students in our SIAM chapter at my university (University of Houston) have organized a student paper presentation event. Participation was strong, and I was impressed with the presentations. Paper exchanges, where students read and comment on each other’s writing, have also been helpful. There are numerous other ways in which we can help our students and postdocs become better communicators. And I think we should consider this an essential part of their training.
I know this issue has been brought up many times, but I just read this excellent post, and wanted to bring it up again.
If you read scientific articles, you have likely encountered p-values many times over. Many people think they understand what a p-value means, but I believe that a lot of them do not. In science we frequently test different hypotheses. We naturally want to determine the probability that a hypothesis is true or false, given the data that was observed. The p-value is frequently interpreted as somehow giving us such a probability. But this is not what the p-value tells you – it only gives you the probability of observing the data you have, or a more extreme sample, under the given hypothesis. If this probability is small, then either you observed a low-probability event, or your hypothesis is wrong.
The main point of this post is to direct you to the following clear discussion of the issue. Although this point has been made so many times, I think it is worth re-emphasizing.
Perhaps I am going out on a limb here, but it seems that we naturally tend toward the Bayesian approach. We can compute the probability of the data given a hypothesis, p(D | H). What we would like is the probability of the hypothesis given the data, p(H | D). We want to be able to say: “The data tells me that this hypothesis is very probably true,” or even “The data tells me that the probability that this hypothesis is true is 99%.” But unless you adopt a Bayesian approach and supply a prior probability for the hypothesis, you cannot go directly from p(D | H) to p(H | D). And p-values deal with the first quantity, not the second.
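Here is a minimal sketch (my own toy calculation, with made-up numbers) of why the two quantities can differ so much: even when the data are unlikely under a hypothesis, the hypothesis can remain probable if it was very likely to begin with. Bayes’ theorem needs a prior, which the p-value alone does not supply.

```python
def posterior(p_D_given_H, p_D_given_notH, prior_H):
    """p(H | D) from Bayes' theorem."""
    p_D = p_D_given_H * prior_H + p_D_given_notH * (1 - prior_H)
    return p_D_given_H * prior_H / p_D

p_D_given_H = 0.04     # the data look "significant": only a 4% chance under H
p_D_given_notH = 0.10  # but they are not much more likely under the alternative
prior_H = 0.95         # and H was considered very probable before the experiment

print(posterior(p_D_given_H, p_D_given_notH, prior_H))  # ~0.88: H is still probably true
```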
Next time you see a p-value in an article, pay attention to how it is interpreted. I am sure you will find many examples where the interpretation is not quite correct.
I just came from a very interesting lecture by Cyrus Mody about Wiener, cybernetics and the counterculture. So let me continue with some further thoughts on Wiener and cybernetics. One thing that Wiener warned about in Cybernetics is a takeover by machines. Despite what this may bring to mind – after all, cybernetics gave us the word cyborg – I do not think that Wiener believed that one day Skynet would become self-aware and exterminate the human race. He writes:
The modern industrial revolution is similarly bound to devalue the human brain at least in its simpler and more routine decisions. Of course, just as the skilled carpenter, the skilled mechanic, the skilled dressmaker have in some degree survived the first industrial revolution, so the skilled scientist and the skilled administrator may survive the second. However, taking the second revolution as accomplished, the average human being of mediocre attainments or less has nothing to sell that is worth anyone’s money to buy.
Thus, Wiener says, our brains now give us the only advantage we still have over machines. But this advantage will not last – and what then?
We are part of many feedback loops involving machines. But how valuable is the human component in these loops? How much longer will it be necessary? Let me give a couple of examples (I am indebted to Evgeny Morozov for some of these).
For instance, if your goal is to maximize profits, you need to minimize the difference between what your audience wants and what you are delivering. In the movie industry that would allow you to avoid another “Waterworld” or “The Adventures of Pluto Nash“. You can avoid such disasters by improving your predictions about what people like. And if you are Netflix, you have the data that allows you to do so. You know not only what movies people streamed, liked and disliked, but also when they paused them and when they skipped ahead. You analyze your data and you find, perhaps, that people want to see Kevin Spacey directed by David Fincher in a remake of the British TV series “House of Cards”. You are guaranteed a success.
Or perhaps you are a punk band that wants to get the biggest possible response from the crowd, i.e. the highest moshpit intensity. You install some sensors in the floor and correlate the dance intensity with the features of the songs played at that moment. You can then design your songs according to what drives the crowd wild. This is what a band in China called Bear Warrior did. Quoting the singer of the band:
…the data helps us understand how we can improve our performance to make the audience respond to our music like we intend.
The potential problem here is that for centuries we have had feedback loops between artists and the public. These loops were, and are, imperfect. We are now making them more efficient, giving the public more of what it wants. But in doing so, are we marginalizing, or even removing, the artist? Could the machines that we put in their place really be creative? This new system could produce sterile solutions that satisfy, or even delight us. But by minimizing the possibility of failure, we may also minimize the possibility of generating something truly new and surprising.
And what happens when humans are taken completely out of the feedback loop? Self-driving cars are in the near future. But as Gary Marcus at NYU asks, do you want to leave the entire decision-making process to a machine? What if you are driving down a narrow bridge with a school bus full of children coming your way? Should your car be allowed to kill you in order to save the children? This may be the moral decision, but should a machine be able to make it? Isaac Asimov thought about this, and came up with some interesting answers.
Wiener himself tried to suggest the changes that are necessary so that all humans remain valued in the future:
The answer, of course, is to have a society based on human values other than buying or selling.
What other values? Wiener does not say.
I do not believe that the singularity is near. But I do believe that, as they did in chess, machines will surpass us in many other ways. As a society we should follow Wiener’s advice, and agree on what we really need to value.
I recently gave a lecture at Rice University as part of a seminar series devoted to the work of Norbert Wiener. I signed up to talk about cybernetics. When I signed up, I had limited knowledge of cybernetics and Wiener’s contributions to the field. Therefore I spent the last couple of months reading about it. In particular, I read Wiener’s original book on the subject fairly carefully, so I wanted to comment on a few aspects.
Every 10 years or so a mathematical theory breaks out of the confines of academia and runs rampant in popular culture. Perhaps it resonates with the current thinking. Or, if you are more cynical, you can say that occasionally, but rarely, mathematicians can be clever salesmen. In my memory we have had catastrophe theory, followed by chaos theory, and more recently network theory. Each of these has a perfectly respectable mathematical foundation – singularity theory, dynamical systems, and graph theory, respectively. However, after they became popular, the theories took on a life of their own, and often became unrecognizable (remember the chaos theorist in Jurassic Park?).
When I started reading about cybernetics, I thought I would find something similar. Surprisingly, this wasn’t the case at all. Cybernetics is far more encompassing than any of the theories mentioned above. It also has much less of a definite mathematical foundation. The ideas that Wiener describes in his book are far-reaching. And most are more relevant today than they were when the book was written.
Wiener’s interest in cybernetics stemmed in part from his work on the development of computers and automated anti-aircraft artillery during WWII. This work led him to the question of automated prediction. How can new data be continuously incorporated to predict the future of a system that includes the observer? Wiener notes that it is necessary to model the system as a feedback loop. Biological systems and humans can be part of such loops, as can machines.
Wiener builds on this idea, noting that feedback is essential in all biological systems. For instance, if you want to pick up a pencil, you need feedback to perform this task well. This feedback is only partly visual. Part of it comes from proprioception, the sense that tells you where each part of your body is (Oliver Sacks gives a good example of what happens when you lose it in his essay “The Disembodied Lady“). Wiener was very interested in this question – and properly incorporating proprioception is still a major obstacle to brain-machine interfaces.
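Here is a minimal sketch (my own illustration, not Wiener’s) of why such feedback matters. A “hand” moves toward a pencil; at each step it senses the remaining error and corrects a fraction of it. A fixed, pre-planned motion with no feedback could not compensate for the errors that accumulate along the way.

```python
def reach(target, gain=0.3, steps=20):
    """Move toward a target by repeatedly correcting a fraction of the sensed error."""
    position = 0.0
    for _ in range(steps):
        error = target - position   # sensed discrepancy (visual or proprioceptive)
        position += gain * error    # correction proportional to the error
    return position

print(reach(target=1.0))  # ~0.999: converges even though every individual step is imperfect
```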
In his book Wiener takes these ideas much further, speculating about neuroscience, the role of machines in society, and linguistics. The book is a treasure trove of provocative ideas, many likely coming from friends who were as brilliant as Wiener himself. But one idea that is clearly his own is the early realization of the dangers of automating our thinking processes. He notes that once we develop thinking machines, humans will become dispensable. He writes:
The modern industrial revolution is similarly bound to devalue the human brain at least in its simpler and more routine decisions. Of course, just as the skilled carpenter, the skilled mechanic, the skilled dressmaker have in some degree survived the first industrial revolution, so the skilled scientist and the skilled administrator may survive the second. However, taking the second revolution as accomplished, the average human being of mediocre attainments or less has nothing to sell that is worth anyone’s money to buy. The answer, of course, is to have a society based on human values other than buying or selling.
He was also very concerned about the impact of the mechanization of the military. He refused to take government funding after the war, and actively opposed sharing his ideas to further military research. This position was certainly not popular during the Cold War. He wrote about it anonymously in a column “The Rebel Scientists” in the Atlantic, but his position is clear in a letter published in the Bulletin of the Atomic Scientists in 1946:
The practical use of guided missiles can only be to kill foreign civilians indiscriminately, and it furnishes no protection whatsoever to civilians in this country…. If therefore I do not desire to participate in the bombing or poisoning of defenseless peoples–and I most certainly do not–I must take a serious responsibility as to those to whom I disclose my scientific ideas. Since it is obvious that with sufficient effort you can obtain my material, even though it is out of print, I can only protest pro forma in refusing to give you any information concerning my past work. However, I rejoice at the fact that my material is not readily available, inasmuch as it gives me the opportunity to raise this serious moral issue. I do not expect to publish any future work of mine which may do damage in the hands of irresponsible militarists.
I will write a bit more about cybernetics in modern society in a future post.
I recently had an interesting discussion with my colleagues Dan Graur and Ricardo Azevedo about their recent critique of the ENCODE project. If you have not read the article, I recommend it highly. I can guarantee that you will be entertained more than by almost any scientific article you’ve read in your life.
I believe that the authors make very good points here – they argue that the claims that 80% of the genome is functional (i.e. there is very little “junk DNA”) are unsupported by the data. A very interesting point is that the data was mined somewhat blindly. Evolutionary biologists who may have had a different perspective and interpretation were not consulted, for instance. I am not a specialist in these areas, and will leave this debate to others. But the article, and the response it received raise a number of other questions.
How best to interpret the large amounts of data that are becoming available is a problem of tremendous importance. However, I wanted to address another question here. One of the main criticisms of the paper by Graur et al. concerns the tone the authors chose to adopt. Some have deemed it too aggressive or disrespectful, or simply unbecoming of a scientific publication.
I know the people who wrote the paper well. I am certain they would not have written a controversial paper for the sake of controversy – they believe deeply in, and stand behind, what they wrote. This was a small group of scientists in a good, but not super-famous, department. They were calling into question the findings of a project that cost $200 million and involved 400 scientists or more. Earlier publications have questioned the results of the ENCODE project, but none have gotten the attention that the paper by Graur et al. did – the snarky tone certainly caught people’s attention, and that was the point.
However, this also exposes a fundamental problem with Big Science. By definition, Big Science projects cost millions or even billions of dollars. They can fund hundreds of labs and thousands of scientists. Could such projects be too big to fail? If a project does not achieve the desired results, will it be possible to admit so? Or suppose that great new data is generated, and our knowledge is advanced by the project, but the results are not easily summarized in a headline or a soundbite. Perhaps they are complicated, not easily interpreted, or simply not definitive. If billions were spent, will there be pressure to come up with headline-worthy claims, and pronouncements that textbooks will have to be rewritten?
When so much money and so many careers are at stake, the rules under which science is done probably change. Openly challenging such results becomes a questionable career move – you have likely just made hundreds or thousands of enemies. A disagreement with one or a few colleagues may not mean that your next grant will go unfunded. But if you have offended hundreds of them, things may look different.
I don’t want to say that big scientific projects should not exist. Some such projects have been amazingly successful in the past – big science gave us the map of the human genome and nuclear weapons (you can argue with the benefit to humanity, but not with the success), and Big Engineering landed people on the Moon. However, there is a political side to such projects that sets them apart. Their goals need to be carefully defined, and, whenever possible, every step needs to be open to the entire scientific community.
In almost all cases, controversy is nothing to be afraid of in science. Controversy is frequently what drives science forward. If the result of Big Science is to tame or extinguish controversy, we should be worried indeed.
There has been a lot of discussion about whether we will really be able to have a deep understanding of scientific results of the future. Steve Strogatz explains the problem succinctly here.
This debate is somewhat abstract. It is therefore interesting to read about how this controversy is playing out in particular scientific disciplines. One of the best examples I saw recently was the exchange between some of the best minds of this and a past generation: it started with a number of remarks that Noam Chomsky made at the Brains, Minds and Machines symposium at MIT. Here is a transcript of Chomsky’s remarks. He also clarifies his position in this interview.
There is a lot here, but I wanted to address one particular point that Chomsky makes. His position is that statistical models may work very well in engineering applications, but they do not (at least typically do not) reveal the underlying principles or rules that govern the universe. Chomsky argues that theories should still guide experiments. As usual, he adopts a fairly extreme position. For instance, he says about Gregor Mendel:
“Yeah, he did the right thing. He let the theory guide the data.”
(as an aside, I think Feynman provided the best response to such arguments: “It does not make any difference how beautiful your guess is. It does not make any difference how smart you are, who made the guess, or what his name is – if it disagrees with experiment it is wrong.”)
However, Chomsky’s position is overall understandable: Science is not just engineering – it should provide insights into the workings of the universe. Or should it?
The answer to Chomsky by Peter Norvig, Head of Research at Google, is very much worth reading. This is a well thought out response, and I can’t really do it justice by summarizing it. Read it, along with the paper by Leo Breiman that is discussed in one of the last paragraphs.
I think there are many interesting questions that are being addressed here, and I felt that sometimes this was not sufficiently clear in the discussion. Let me point out a couple of things that came to mind:
Are probabilistic models useful? I find the disagreement between Norvig and Chomsky about this specific question a bit puzzling. This is perhaps because they are talking mainly about linguistics, and have particular probabilistic models in mind. Of course probabilistic models are useful: quantum mechanics describes the world in probabilistic terms. Statistical models are also often inspired by certain hypotheses about how the world works. If the data points in favor of one model, it frequently provides evidence that this model captures something about how the universe works.
I think the disagreement is about what Leo Breiman calls “algorithmic models”. Such models typically have thousands of parameters, and can be so complex that we cannot understand why or even how they work, even when they give very good predictions. They are frequently not built around any particular hypothesis about how the data were generated. It is therefore quite possible that the principles which make them work would not help us understand the rules of the universe. Thus even understanding such models may not tell us much about the world around us.
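As a small illustration (my own, with simulated data; it assumes NumPy and scikit-learn are installed, and the model choices are only for the sake of the example), compare a two-parameter model whose fitted values you can read off directly with an “algorithmic” model whose thousands of fitted values resist any simple story, even when it predicts well:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * x[:, 0] + 1.0 + rng.normal(0, 1, size=500)   # data generated by a simple rule

simple = LinearRegression().fit(x, y)
blackbox = RandomForestRegressor(n_estimators=200).fit(x, y)

print(simple.coef_, simple.intercept_)   # ~[2.0], ~1.0: the underlying rule is visible
print(sum(t.tree_.node_count for t in blackbox.estimators_))  # many thousands of fitted nodes, no simple story
```

The point is not that the forest is wrong – it may well predict better – but that its fitted structure is not the kind of object from which a human reads off a law of nature.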
If we describe the world in terms of such models, are we still doing science? This is where Norvig and Chomsky disagree. I think that this is somewhat of a semantic dispute. I am certain that both will agree that in all of science we should strive to find simple rules that accurately describe the workings of the universe in a language understandable to humans (this language has traditionally been mathematics).
However, there is a possibility that we will never be able to completely understand how the universe works. The Four Color Theorem was proved 35 years ago with the assistance of computers. We know that the statement is true, but nobody has a clear understanding of why that is so – the proof is simply too complex to be fully comprehended by a human being. What if describing natural phenomena will require models of similar, or higher, complexity?
I don’t see any reason why the universe should be simple enough for us to understand. If it is not, handing over the business of doing science to computers may be our only option.
Here is another interesting take on this issue.
In the book Innumeracy, John Allen Paulos describes our problems in dealing with big numbers. I agree that we can, and should, learn to distinguish easily between a million, a billion, and a trillion. However, some numbers are just too large to grasp. We can’t really comprehend a number as large as a googol, or 10^100. And I don’t know of any useful analogy that would allow me to compare the quantity represented by a googolplex, 10^(10^100), to that of a googol. These quantities, and the difference between them, are just so vast that we can’t really wrap our minds around them. There are only 10^78 or so atoms in the universe, and even that is a number we cannot really fathom. But large numbers pop up even when you ask relatively simple questions in mathematics.
For instance, in how many ways can you arrange a full deck of cards? You can easily show that the answer is 52 x 51 x … x 1, or 52!. Using Stirling’s approximation, sqrt(2*pi*52)*(52/e)^52, we can estimate this number as approximately 8 x 10^67, which is getting close to the number of atoms in the universe. Indeed, whenever you shuffle a deck of cards well, you can be nearly certain that nobody in the history of the world has played with the same arrangement before.
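Here is a quick way to check these figures (a small sketch using only the Python standard library):

```python
import math

exact = math.factorial(52)
stirling = math.sqrt(2 * math.pi * 52) * (52 / math.e) ** 52

print(f"{exact:.3e}")     # 8.066e+67
print(f"{stirling:.3e}")  # 8.053e+67, within about 0.2% of the exact value
```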
Large numbers appear frequently whenever we count the number of possible ways of arranging different objects. You have probably heard that if you put a number of monkeys in a room and let them pound away on typewriters, they will eventually produce all the works of Shakespeare. This may be true, but the amount of time it would take is not really comprehensible.
Let us look at a related question: In his story “The Library of Babel“, the Argentinian writer Jorge Luis Borges described a strange universe that consists of hexagonal cells lined with shelves full of books. Each book is unique, and each is a different arrangement of the same 25 letters and symbols. The narrator reveals that every possible arrangement of these 25 characters is contained somewhere in the library.
Borges writes that therefore:
Everything: the minutely detailed history of the future, the archangels’ autobiographies, the faithful catalogues of the Library, thousands and thousands of false catalogues, the demonstration of the fallacy of those catalogues, the demonstration of the fallacy of the true catalogue, the Gnostic gospel of Basilides, the commentary on that gospel, the commentary on the commentary on that gospel, the true story of your death, the translation of every book in all languages, the interpolations of every book in all books.
Taking into account Borges’ description of each book, we can compute exactly how many of them are contained in the library: each book has 410 pages, each page 40 lines, and each line 80 characters. Therefore there are 410 x 40 x 80 = 1,312,000 characters in each book, and the Library of Babel consists of 25^(410x40x80) = 25^(1,312,000) books. (You can also check here for a more detailed explanation, or go here to download Borges’ number.) I was trying to come up with a way to illustrate how large the Library of Babel is. But I don’t think there is a useful analogy – the best we can do with numbers this big is to write them down. Perhaps my imagination is lacking, but I don’t see any way to illustrate their vastness other than essentially saying “this number is really, really big” in a different way.
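There is no hope of printing this number, but it is easy to size it up (a quick sketch in Python):

```python
import math

characters_per_book = 410 * 40 * 80                       # 1,312,000 characters in each book
digits_in_book_count = characters_per_book * math.log10(25)

print(characters_per_book)          # 1312000
print(round(digits_in_book_count))  # 1834097: the number of books has roughly 1.8 million decimal digits
```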
This is pretty weird stuff. But these numbers are not really all that big. You can easily come up with much bigger numbers yourself. For example, you can order operations in an ascending hierarchy: repeated summation leads to multiplication, and repeated multiplication leads to exponentiation. Repeated exponentiation gives you towers of powers. You can keep going, and soon you’ll reach operations that quickly produce immense integers (check out Ackermann’s notation to get a handle on how to write such numbers).
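One compact way to package this whole hierarchy into a single definition is the Ackermann function (a standard example; the sketch below is mine, not from the original post). Even tiny inputs produce enormous outputs:

```python
def ackermann(m, n):
    """Ackermann function: m indexes how far up the addition/multiplication/exponentiation hierarchy you go."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

print(ackermann(2, 3))  # 9
print(ackermann(3, 3))  # 61
# ackermann(4, 2) equals 2**65536 - 3, a number with 19,729 decimal digits;
# do not try to compute it with this naive recursion.
```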
Still, we have not come even close to infinity. Here is one way to see how far we are: We know that irrational numbers have decimal expansions that never become periodic. But there are different types of irrational numbers. For instance, you can simply arrange 1s and 2s in a non-repeating pattern. Or you can take 0.101001000100001000001…, where the blocks of zeros keep growing. Since this expansion is not eventually periodic, the number is irrational. However, the arrangement of its digits is far from random-looking. When we think of an arbitrary irrational number, we have a notion that every digit, and every group of digits of a given length, should appear equally often. Irrational numbers with this property are called normal (I will actually consider numbers that are normal in any base; these are sometimes called absolutely normal numbers). More precisely, in the decimal expansion of a normal number each digit appears with frequency 1/10, each pair of digits with frequency 1/100, each triplet with frequency 1/1000, and so on. Interestingly, while we do know that almost all irrational numbers are normal, we don’t know whether specific irrational numbers are. Nobody knows whether pi or sqrt(2) is normal.
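As a quick sanity check (a small sketch of my own), the example number above is obviously not normal: the digits 2 through 9 never appear, and the 0s quickly overwhelm the 1s.

```python
def example_digits(n_blocks):
    """Digits of 0.101001000100001..., built as '1' followed by k zeros, for k = 1, 2, 3, ..."""
    digits = []
    for k in range(1, n_blocks + 1):
        digits.append(1)
        digits.extend([0] * k)
    return digits

digits = example_digits(100)
print(digits.count(1) / len(digits))  # ~0.02, far from the frequency 0.1 a normal number would show
```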
As a consequence, any and all finite sequences of digits appear in the expansion of a normal number: Suppose that you have a number that is normal in base 25. We can map each digit in this base to one of the 25 characters that fill the books in the Library of Babel. That means that this number contains the text of every book in the Library of Babel: expand the number in base 25, convert the digits to the corresponding characters, and break the expansion into blocks of 1,312,000 letters. Each such block corresponds to a book. Indeed, each book from the Library, and the entire Library itself, appear in the expansion infinitely many times.
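Here is what that correspondence looks like in code (a sketch of my own; the 25-character alphabet below is a stand-in, since Borges never spells it out):

```python
import string

ALPHABET = string.ascii_lowercase[:22] + " ,."   # 25 symbols, chosen only for illustration
BOOK_LENGTH = 410 * 40 * 80                      # 1,312,000 characters per book

def books_from_base25_digits(digits):
    """Turn a stream of base-25 digits into Library of Babel 'books'."""
    text = "".join(ALPHABET[d] for d in digits)
    return [text[i:i + BOOK_LENGTH] for i in range(0, len(text), BOOK_LENGTH)]

# In the base-25 expansion of a normal number, every possible book eventually appears this way.
```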
Let me finish with a last point that I owe to Lawrence Krauss. I recommend his book entitled A Universe from Nothing. Krauss talks a lot about quantum fluctuations, which are present even in a vacuum. They allow particles to pop in and out of existence all the time. Indeed, Krauss proposes that our entire universe may have simply popped into existence out of nothingness. Even more bizarrely, it is possible that any arrangement of particles may pop into existence out of nothingness and disappear the next instant. If you allow for infinitely many possibilities of this type, then we, along with all our memories, could be nothing more than a quantum fluctuation that exists only for a moment and then disappears. Indeed, this may be a much more likely possibility than the alternative that we truly exist continuously in time.