Thursday, December 24, 2015

Are today’s experiments more unethical than Milgram’s?

Everyone knows about Milgram’s famous experiment. At least the patients I can hear from my office discussing it do. In case you don’t know it, here’s a short summary:

In the 1960s, Milgram tested how willing his participants (ordinary people without any special interest in harming others) were to follow orders from an authority (the experimenter), even if that meant putting another person in danger.

In his experiments the participant was given the role of a teacher whose task was to give electrical shocks to a learner whenever the learner made a mistake in a word-pairing task. The learner was a confederate and not really harmed by the electrical shocks. However, the subject didn’t know this: he was given a trial shock himself before the experiment in order to make the electrical stimulator more believable, and was then placed in a separate room from the learner.

In Milgram’s first experiment, 65% of participants continued to the final voltage of 450 V, even though they probably believed the learner was in danger (1, 2): the stimulator was labeled “Danger: Severe Shock” from 375 V to 420 V, and only “XXX” at 435 V and 450 V. Furthermore, the (supposed) learner pounded against the wall at 300 V and 315 V and was not heard from afterwards, i.e. he no longer answered.

This experiment has been repeated several times by Milgram himself and others under slightly different conditions to find out which factors lead to obedience. However, in all the experiments a high proportion of participants “cooperated” until the final shock, despite experiencing high stress while doing so. (2)

Today, it is said, ethics committees would no longer allow this study, precisely because of the high stress that was inflicted on the participants.

However, I wonder what differentiates “modern” studies from Milgram’s. Participants are still stressed and potentially harmed. I’m thinking about PTSD studies, where participants suffering from post-traumatic stress disorder are shown pictures related to their trauma; stress studies, where participants are stressed as much as possible in order to investigate the biological and psychological responses to stressors in healthy participants as well as in participants suffering from various disorders; pain studies, which examine the conditions that lead to more or less subjective pain, or the pain-inhibitory system(s); conditioning studies that involve learned helplessness; and so on…

Now obviously the potential harm that is done to the participants is weighed against the potential benefits of the study: the goal of such studies is to find mechanisms that make people more vulnerable to disorders, or mechanisms that could lead to the development of new treatments. After all, something has to be learned about disorders or suffering in general in order to understand and reduce it.

But, I don’t know… 

Milgram explains (as one of 13 potential contributing factors for the obedience of his subjects) that “the experiment is, on the face of it, designed to attain a worthy purpose – advancement of knowledge about learning and memory. Obedience occurs not as an end in itself, but as an instrumental element in a situation that the subject construes as significant, and meaningful. He may not be able to see its full significance, but he may properly assume that the experimenter does.” (3)

All experiments should be designed to “attain a worthy purpose – advancement of knowledge”, shouldn’t they? I think it would be bad if they weren’t (4)… but are they also significant and meaningful?

I think a study can’t be meaningful if we have no clue whether its reported results are likely to be true. (5)

Reproducibility in psychology is low (6) and most neuroscientific studies are underpowered (7). Ioannidis’ (8) famous paper argues that “most published research findings [in biomedical research] are false.” Negative findings often remain unpublished.
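To see how low power feeds into false findings, here is a small sketch of the positive predictive value formula from Ioannidis’ paper, in its simplest form without the bias term. The prior odds of 1:4 and the two power values are numbers I made up for illustration; they are not estimates for any real field.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of 'significant' findings that reflect a true effect (no bias term).

    prior_odds: ratio R of true to null relationships among those tested (assumed)
    power:      1 - beta, the chance of detecting an effect that really exists
    alpha:      the chance of a 'significant' result when there is no effect
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Illustrative numbers only: one in five tested hypotheses is true (odds 1:4).
well_powered = positive_predictive_value(prior_odds=0.25, power=0.8)
underpowered = positive_predictive_value(prior_odds=0.25, power=0.2)
print(well_powered, underpowered)
```

With these made-up inputs, dropping the power from 80% to 20% lowers the share of true “significant” findings from about four in five to about one in two.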

Yet, for all the p-hacked and irreproducible studies, a lot of resources were used (or wasted): money that could have been spent otherwise, as well as the time and effort of the participants and of the people conducting, writing, publishing and reading the study.

And maybe worse than that, some subjects suffer under the study – like the fake subjects (the learners) in Milgram’s study appeared to: they are given electrical shocks, shown awful pictures, brought into situations they fear or reminded of their worst times.

Is this right? If there were true meaning, such experiments might be justified. The subjects sign a consent form and they know they are free to leave at any time. But just like the real subjects (the teachers) in Milgram’s studies, they usually don’t leave, because they think they are helping science advance and develop new treatments. They don’t know about statistical problems, p-hacking and the pressure to publish anything out of a pile of underpowered noise. They don’t know the study they are participating in might be meaningless.

But I do. (9) I have tortured participants knowing the experiment is worthless. I will probably do similar things again. New study, new hunt for statistical significance at the cost of the participants. This is very wrong.

I don’t know what to do about it, I really don’t. 

Since not every study involves mental or physical pain for the subjects, I could concentrate on such studies or search for another job. But while that might allow me more sleep at night (likely not), it wouldn’t solve the problem (10). After all, the studies are not stressful/painful/frightening for the participants because we want to torture them, but because it is seen as necessary for the “advancement of knowledge” about these states (11).

Therefore what remains is that studies should have sufficient power and be carefully designed to detect effects when present and to avoid unnecessary harm. Everybody agrees with that, yet it is not done.
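To give a feeling for what “sufficient power” means, here is a rough sketch that approximates the power of a two-group comparison with a normal approximation. It is a simplification (a real analysis would use the t distribution and ignores nothing in the far tail), and the effect size of 0.5 and the group sizes are my own illustrative assumptions.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison.

    Normal approximation with known variance; the tiny chance of a
    significant result in the wrong direction is ignored.
    effect_size is the standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic when the effect is real
    expected_z = effect_size * (n_per_group / 2) ** 0.5
    return std_normal.cdf(expected_z - z_crit)


# Illustrative assumption: a medium effect (d = 0.5)
small_study = approx_power(effect_size=0.5, n_per_group=20)
large_study = approx_power(effect_size=0.5, n_per_group=64)
print(round(small_study, 2), round(large_study, 2))
```

Under these assumptions, 20 participants per group would detect the effect only about a third of the time, while roughly 64 per group are needed to reach the conventional 80%.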

As a PhD student I’m not in a position to change that (and I don’t know if anyone is). It is wrong to conduct worthless experiments that (potentially) harm the participants, and it is wrong to do nothing just to reduce one’s own stress.

So what should I do?

______________________________________________

(1) In an interview after the experiment the subjects were asked how painful they thought the last shocks were for the learner, on a 14-point scale from “not painful at all” to “extremely painful”, and the mean answer was 13.42. Milgram, S. (1963). Behavioral Study of Obedience. The Journal of Abnormal and Social Psychology, 67(4). [PDF-link] (see page 5 of pdf / page 375)

(2) Furthermore, according to Milgram’s description, many subjects were extremely nervous when administering the high electrical shocks. They “sweat, tremble, stutter, bite their lips, groan, and dig their finger-nails into their flesh.” Milgram, S. (1963). Behavioral Study of Obedience. The Journal of Abnormal and Social Psychology, 67(4). [PDF-link] (see page 5 of pdf / page 375)


See also: 
Haslam SA, Reicher SD (2012) Contesting the “Nature” Of Conformity: What Milgram and Zimbardo's Studies Really Show. PLoS Biol 10(11): e1001426. doi:10.1371/journal.pbio.1001426 [PDF-link] (page 3)
“However, some of the most compelling evidence that participants' administration of shocks results from their identification with Milgram's scientific goals comes from what happened after the study had ended. In his debriefing, Milgram praised participants for their commitment to the advancement of science, especially as it had come at the cost of personal discomfort. This inoculated them against doubts concerning their own punitive actions, but it also it led them to support more of such actions in the future. “I am happy to have been of service,” one typical participant responded, “Continue your experiments by all means as long as good can come of them. In this crazy mixed up world of ours, every bit of goodness is needed” (S. Haslam, SD Reicher, K Millward, R MacDonald, unpublished data). […] what is shocking about Milgram's experiments is that rather than being distressed by their actions, participants could be led to construe them as “service” in the cause of “goodness.” […]”

(4) Which of course is possible: A goal of scientific experiments can also be to have something to publish or to “show” that one is right (even if that is not clear).

(5) The reasoning behind a study can still be meaningful, of course. E.g. testing a treatment might be meaningful. But any study that tests it with a probability of getting a true result at or below chance level isn’t, imo.


(7) Button, K. et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365-376. doi:10.1038/nrn3475  


(9) And others know too. I just don’t want to speak for other people, because I don’t know what they really think.

(10) That’s what horrible persons always say, right? But I don’t know what’s right.

(11) It would of course still be possible to invite participants who feel stressed anyway to the lab at the moment they experience that emotion. But obviously this has clear disadvantages, since there would be many more confounding variables then. Probably the disadvantages are so big that this would do even more harm when it can be done otherwise. But I don’t know. In some/lots of instances this is of course the only possibility anyway. In others I don’t know if it would make any sense at all. That would then be a total waste as well.

Also interesting: 
Blass, T. (1999). The Milgram Paradigm After 35 Years: Some Things We Now Know About Obedience to Authority. Journal of Applied Social Psychology, 29(5), 955-978. [PDF-link]
Milgram, S. (1974). Obedience to authority: An experimental view. New York: Harper & Row. [PDF-link]

Monday, December 14, 2015


(Sorry for grammar mistakes. If you see any, write me a message (comment).)

Thursday, December 10, 2015

What is probability?

Let’s think of a lottery. What is the probability that you get a prize? Of course you don’t know, because you don’t know how many lottery tickets are marked as prizes. But from the perspective of the Flying Spaghetti Monster we would know, because the omniscient Flying Spaghetti Monster does know how many lottery tickets are marked as prizes. Let’s say it is 10%. If we now pick a ticket, is it then correct to say that this one, this particular ticket, has a 10% chance of being a prize? It has already been decided beforehand whether it is a prize or not. It either is a prize or it isn’t, we just don’t know.

We could of course restate our statement, from saying the lottery ticket has a 10% chance of being a prize to saying we have a 10% chance of getting a prize. That being an event in the future, it might seem less determined. But for everyone who holds a deterministic worldview, the same problem occurs. (And for everybody else as well, if the future is indeed determined. They just don’t have to worry about it.) It is already determined whether we get a prize or not.




The way school book statistics – or frequentist statistics – deals with that is to interpret probability as relative frequency. For example, the relative frequency of a coin landing heads up is about ½, or the relative frequency of rolling a 6 with a regular die is about 1/6, in the long run.
The opposing – or Bayesian – way to look at probability would be to express your degree of belief that a coin will land heads up or that you will roll a 6.
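The “long run” idea can be tried out with a small simulation. This is only a sketch: the fair coin, the fixed seed and the numbers of flips are my own choices for illustration.

```python
import random


def relative_frequency_of_heads(n_flips, seed=0):
    """Flip a simulated fair coin n_flips times and return the share of heads."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


# The relative frequency wobbles for few flips and settles near 1/2 for many.
for n in (10, 1_000, 100_000):
    print(n, relative_frequency_of_heads(n))
```

No single flip is “half heads”; the ½ only shows up as a property of the whole series.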

Obviously we need some prior experience or experiments to have a belief about the probability that a coin lands heads up. When asked to estimate how many blue balls are in a bucket of 10 balls, we might not know and think that every number of blue balls from 0 to 10 is equally likely.
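How such a belief could change with experience can be sketched for the bucket example. I assume here that balls are drawn one at a time and put back after each draw, and that we start from the “every number is equally likely” belief; the three draws are invented.

```python
from fractions import Fraction


def update_blue_ball_belief(draws, n_balls=10):
    """Belief about the number of blue balls after draws with replacement.

    Starts from a uniform belief over 0..n_balls blue balls and applies
    Bayes' rule once per observed draw ("blue" or anything else).
    """
    belief = {k: Fraction(1, n_balls + 1) for k in range(n_balls + 1)}
    for draw in draws:
        for k in belief:
            p_blue = Fraction(k, n_balls)
            # Weight each hypothesis by how well it predicted this draw
            belief[k] *= p_blue if draw == "blue" else 1 - p_blue
        total = sum(belief.values())
        belief = {k: v / total for k, v in belief.items()}
    return belief


# Invented observations: two blue balls, then one of another color
posterior = update_blue_ball_belief(["blue", "blue", "other"])
most_believed = max(posterior, key=posterior.get)
print(most_believed, float(posterior[most_believed]))
```

After these three draws, “0 blue balls” and “10 blue balls” are ruled out completely, and 7 blue balls becomes the single most believed number.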

Let’s go back to the lottery example. When we look at it from a school book statistics viewpoint – or frequentist viewpoint – we do not know what the probability of getting a prize is unless we draw a lot of lottery tickets (with replacement) and calculate the relative frequency of getting a prize. Each one of the lottery tickets itself either wins or it doesn’t.

This might not be feasible, though, because our monetary means might be limited. So from the Bayesian point of view we could include our prior knowledge about winning a lottery into the estimate. If it says “Every ticket wins”, it should be ~100%, though we can doubt the value of the prize. If you can win only 3 cars (nothing else) and it is unknown how many tickets are given out, the probability is probably pretty low. If you set up the lottery and know that 10% of the tickets are marked as prizes you know the probability is 10%.

The important distinction is: From the frequentist viewpoint we can only know the relative frequency of getting a prize. From the Bayesian viewpoint we only know our degree of belief in getting a prize. Both probabilities are updated as more knowledge is gained – for example by experiments – but the updating is done in different ways. But that is beyond the scope of this post.

The question is: What is probability? Is the objective frequentist view the right one or the subjective Bayesian one? Or are both correct at the same time or in different situations?

Let’s look at another example: A doctor tells a patient he has a 50% chance of surviving the next 12 months. But in the end, the only thing we know for sure is that he’ll either die within one year or not. He can’t survive 50%. Of course we can say that 50% of people with the disease die within one year, but that is a statement about the population of people with the disease and not about the individual. The individual either does or does not die (within one year).

Relative frequencies can only be obtained for the population of patients and not for the individual case. Repeated measurements are not possible unless the individual dead patient is able to rise from the dead.

So another way of saying that the patient has a 50% chance of surviving would be to say that the doctor’s degree of belief that the patient survives one year is 50%.

Only a matter of terminology or a really fundamental difference? What do you think?

Monday, December 07, 2015

Being blind to faces?

Well, I myself am well able to recognize faces and distinguish them from other stuff, like wardrobes or household items.

But I can’t distinguish between different faces very well.

Yesterday I took part in a pilot experiment of a colleague. My task was to do the N-back task with faces. The N-back task works as follows: You are presented with pictures of faces, one after the other. If you see the same face twice, with one other face in between, you should press a button. E.g. if the sequence is A, B, C, B, D, B, D, A, B, C, B, a button press should follow at the 4th, 6th, 7th and 11th image (each of them shows the same face as the image two positions earlier).
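For the first example sequence above, the rule can be written down as a few lines of code. This is just my own sketch of the rule “same face as two images earlier”, not the program used in the experiment; the function name is made up.

```python
def n_back_targets(sequence, n=2):
    """Return the 1-based positions where an item equals the item n steps earlier."""
    return [
        position + 1
        for position in range(n, len(sequence))
        if sequence[position] == sequence[position - n]
    ]


# Letters stand in for the face pictures
faces = ["A", "B", "C", "B", "D", "B", "D", "A", "B", "C", "B"]
targets = n_back_targets(faces)
print(targets)  # positions at which the button should be pressed
```

For a computer this is trivial because comparing two letters is trivial; the hard part for me is deciding whether two face pictures are “the same” in the first place.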

The task itself was not supposed to be difficult. My colleague wanted to modulate the response time / accuracy by other means. For me however, the task was impossible to solve, because all faces looked the same.

It was actually quite surprising to see that one should be able to distinguish these faces. I always knew that I was terrible at that – that I can’t watch (or understand) movies because all the actors look the same and I have no chance to distinguish them – but I did not know that other people get an impression of a face within like 2 seconds or so.

So I thought, it might be interesting for other people if I try to explain "how" I see faces:

The strange thing is I am able to distinguish different persons from one another – if I have had sufficient training time. People have a lot of characteristics that are not bound to their face: like their voice, their build, and their style of clothes (or the specific clothes they wear), or – quite close to the face but still good hints – their hair color and length. Once a colleague of mine, who used to have a beard, removed it. That confused me a lot. I knew it was his voice, but he didn’t look like himself. It took me half a day or so – and others calling him by his name – to figure out that he was still him, despite the change in appearance.

Some might think: Well, if the removal of his beard confused you so much, then you must have remembered his face, with his beard, to be confused about that in the first place. And that is true, but I think I probably remembered just his beard and not the rest of his face, and that is why he didn’t look at all like himself once he removed his beard.

Other people, however, can change their facial hair without me even noticing. A little unfair, I know, but I just recognize them by something other than their facial hair. My father, for example, has a bump (pimple-like but permanent) right at the tip of his nose. A feature I liked as a child, because it made him easier to distinguish from all the other men: I knew the man with this nose was my dad.

For other people I don’t know how I recognize them – and if I don’t know them quite well, there is a high chance I don’t recognize them at all. Or I may think two "totally different looking" persons are one and the same (which can be quite confusing), until I notice the difference. Then the difference between the two people becomes obvious to me as well, leaving me wondering why I didn’t see the obvious before.

However, with enough time to learn, I’m not unable to memorize faces, or the visual pattern that makes a face. It is just not easier for me to learn “a face” than to learn another pattern of similar complexity. Houses, for example, seem to me a lot easier to distinguish than people, but maybe that is just because they are. What do I know? But take brains, for example. Most people wouldn’t say brains, or anatomical pictures of brains, are easy to distinguish from one another. I wouldn’t say that either. But contrary to the belief of some, they don’t all look the same. There are substantial differences, maybe even more than in faces, yet they can be difficult to detect – just like differences in faces.

Now for me it would be interesting to know how other people (in their own subjective experience) distinguish different faces from one another.


P.S.: I'm just speaking about myself, not anyone else. 

Unfortunately I have not yet read the book by Oliver Sacks.