The Design of Experiments and Their Interpretation


I would like to start this post by asking the readers of this blog to take three minutes to watch this short TED clip:

Arthur Benjamin: Teach statistics before calculus!

I watched this video a few days ago and found it interesting, not only because of Prof Arthur Benjamin’s undoubtedly courageous proposal to teach statistics before calculus, but also because it made me think (again) about how important it is to improve the way I deal with “uncertainty” and probability in my research.
The “invention” of the scientific method (whose framework was shaped over centuries by thinkers such as Alhazen, Bacon, René Descartes, in his 1637 treatise Discourse on the Method, and John Stuart Mill) is, in my view, one of the greatest accomplishments we have achieved as human beings. However, to live up to its power, this “tool” has to be “handled” with care. The scientific method, used in the process of “expanding” our knowledge of nature, has to be translated into well-designed and then well-interpreted experiments. To support this claim, I would like to quote Fisher’s (1935) “The Design of Experiments” (a milestone of modern science), which presents the problem of design and interpretation in scientific experiments:

“WHEN any scientific conclusion is supposed to be proved on experimental evidence, critics who still refuse to accept the conclusion are accustomed to take one of two lines of attack. They may claim that the interpretation of the experiment is faulty, that the results reported are not in fact those which should have been expected had the conclusion drawn been justified, or that they might equally well have arisen had the conclusion drawn been false. Such criticisms of interpretation are usually treated as falling within the domain of statistics…

The other type of criticism to which experimental results are exposed is that the experiment itself was ill designed, or, of course, badly executed. If we suppose that the experimenter did what he intended to do, both of these points come down to the question of the design, or the logical structure of the experiment…

Now the essential point is that the two sorts of criticism I have mentioned come logically to the same thing, although they are usually delivered by different sorts of people and in very different language. If the design of an experiment is faulty, any method of interpretation which makes it out to be decisive must be faulty too. It is true that there are a great many experimental procedures which are well designed in that they may lead to decisive conclusions, but on other occasions may fail to do so; in such cases, if decisive conclusions are in fact drawn when they are unjustified, we may say that the fault is wholly in the interpretation, not in the design. But the fault of interpretation, even in these cases, lies in overlooking the characteristic features of the design which lead to the result being sometimes inconclusive, or conclusive on some questions but not on all. To understand correctly the one aspect of the problem is to understand the other. Statistical procedure and experimental design are only two different aspects of the same whole, and that whole is the logical requirements of the complete process of adding to natural knowledge by experimentation.”

Now, let’s put this into practice. Let’s say we have been good (or lucky) enough that our experiment is among those Fisher describes as well designed, that is, capable of leading to decisive conclusions. Now it comes down to interpretation: a faulty interpretation can lead even a well-designed study to faulty conclusions.

Experimental interpretation is largely based on so-called “null hypothesis significance testing”. By applying statistical procedures, we test experimental hypotheses, we make choices about data, and we estimate how likely results at least as extreme as ours would be if chance alone were at work. When that probability is small, we look for a different explanation, which can potentially be extended into more general conclusions.
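To make the logic of a significance test concrete, here is a minimal sketch in Python of a permutation test for a difference in means. It is only an illustration and is not taken from any of the works cited in this post: the function name `permutation_p_value`, its parameters, and the two sets of measurements are all hypothetical and made up for the example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Estimates the probability, ASSUMING the null hypothesis of no
    difference between the groups is true, of observing a difference
    in means at least as large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are arbitrary,
        # so we reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1

    # Add-one correction: the estimate can never be exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical, made-up measurements (not real data).
control = [36.8, 37.1, 36.9, 37.0, 37.2, 36.7]
treated = [37.4, 37.6, 37.3, 37.8, 37.5, 37.7]

p = permutation_p_value(control, treated)
print(f"Estimated p-value: {p:.4f}")
```

Note what the number returned here is and is not. It is the probability of data at least this extreme given that the null hypothesis is true. It is not the probability that the null hypothesis is true, nor the probability that the result “occurred by chance”.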

However, developing a mature and correct use of statistical testing to make interpretations is one of the most difficult challenges I am facing as a young scientist.

How do we improve the power of our interpretation? In my view, by a better understanding of the “instruments” we have, in other words (and coming back to the opening video), by a better understanding of statistics. For these reasons, I would like to bring to your attention these two very interesting pieces of work, which deal with interpretation and its implications.

Enjoy the reading!

Davide Filingeri
PhD Researcher
Environmental Ergonomics Research Centre
Loughborough University, UK

Misinterpretations of significance: A problem students share with their teachers. 
Haller & Krauss, 2002

The use of significance tests in science has been debated from the invention of these tests until the present time. Apart from theoretical critiques on their appropriateness for evaluating scientific hypotheses, significance tests also receive criticism for inviting misinterpretations. We presented six common misinterpretations to psychologists who work in German universities and found out that they are still surprisingly widespread – even among instructors who teach statistics to psychology students. Although these misinterpretations are well documented among students, until now there has been little research on pedagogical methods to remove them. Rather, they are considered “hard facts” that are impervious to correction. We discuss the roots of these misinterpretations and propose a pedagogical concept to teach significance tests, which involves explaining the meaning of statistical significance in an appropriate way.

On being sane in insane places
Rosenhan, 1972

It is clear that we cannot distinguish the sane from the insane in psychiatric hospitals. The hospital itself imposes a special environment in which the meanings of behavior can easily be misunderstood. The consequences to patients hospitalized in such an environment-the powerlessness, depersonalization, segregation, mortification, and self-labeling-seem undoubtedly countertherapeutic. I do not, even now, understand this problem well enough to perceive solutions. But two matters seem to have some promise. The first concerns the proliferation of community mental health facilities, of crisis intervention centers, of the human potential movement, and of behavior therapies that, for all of their own problems, tend to avoid psychiatric labels, to focus on specific problems and behaviors, and to retain the individual in a relatively non-pejorative environment. Clearly, to the extent that we refrain from sending the distressed to insane places, our impressions of them are less likely to be distorted. (The risk of distorted perceptions, it seems to me, is always present, since we are much more sensitive to an individual's behaviors and verbalizations than we are to the subtle contextual stimuli that often promote them. At issue here is a matter of magnitude. And, as I have shown, the magnitude of distortion is exceedingly high in the extreme context that is a psychiatric hospital.) The second matter that might prove promising speaks to the need to increase the sensitivity of mental health workers and researchers to the Catch 22 position of psychiatric patients. Simply reading materials in this area will be of help to some such workers and researchers. For others, directly experiencing the impact of psychiatric hospitalization will be of enormous use. Clearly, further research into the social psychology of such total institutions will both facilitate treatment and deepen understanding. 
I and the other pseudopatients in the psychiatric setting had distinctly negative reactions. We do not pretend to describe the subjective experiences of true patients. Theirs may be different from ours, particularly with the passage of time and the necessary process of adaptation to one's environment. But we can and do speak to the relatively more objective indices of treatment within the hospital. It could be a mistake, and a very unfortunate one, to consider that what happened to us derived from malice or stupidity on the part of the staff. Quite the contrary, our overwhelming impression of them was of people who really cared, who were committed and who were uncommonly intelligent. Where they failed, as they sometimes did painfully, it would be more accurate to attribute those failures to the environment in which they, too, found themselves than to personal callousness. Their perceptions and behavior were controlled by the situation, rather than being motivated by a malicious disposition. In a more benign environment, one that was less attached to global diagnosis, their behaviors and judgments might have been more benign and effective.

Fisher, R. (1935). The design of experiments. Retrieved from
Haller, H., & Krauss, S. (2002). Misinterpretations of significance: A problem students share with their teachers. Methods of Psychological Research Online, 7(1), 1-20. Retrieved from
Rosenhan, D. (1972). On being sane in insane places. Santa Clara Lawyer, 237-256. Retrieved from