Sachs's Astrology File
Three critiques that appeared in 1998. Two are from Correlation 17(1), the third is from Skeptiker 11(3).
Gunter Sachs, The Astrology File: Scientific Proof of the Link Between Star Signs and Human Behaviour. Orion Books, London 1998. Translated from the original German version published in 1997.
Abstract -- The book is promoted on the cover as a "world-wide best seller". Its blurb says "At last the proof which answers the age-old question: Is there anything to astrology?" and "If you do believe in astrology, here's the proof. If you didn't believe in astrology, you will now". Here the first critique by astrologer Peter Niehenke shows how flaws in Sachs's reasoning make his results meaningless. Sachs tested occupations (47 groups), cause of death, criminals, suicides, and compatibility. His base samples were often very large (millions of cases) and were from diverse sources such as the Swiss census of 1990. For each test he corrected for demographic effects in the total population and then applied a statistical chi-squared test to the observed and expected sign totals. There were many significant differences, which he saw as proving the effect of sun signs. But his results are largely meaningless. Many factors influence the distribution of births including climate, culture, and social class, so there are many possible explanations for the observed differences, all more plausible than astrology. And if the sample is huge, as here, even the smallest effect will become statistically significant. Furthermore, Sachs never checked the large number of existing studies, which often found differences opposite to his own. In short, his observation of significant yet contradictory results is only to be expected and does not necessarily have anything to do with astrology.
The second critique by psychologist Suitbert Ertel points out that all the data are tabulated. So despite the flaws in reasoning (see first critique) the data can be re-analysed by improved methods. Factor analysis of the sun sign data for occupations revealed no similarities between similar occupations such as industrial economy and political economy, or medicine and veterinary medicine, or medicine and pharmacy. Nor was there any general ordering such as sciences vs humanities.
A similar analysis of the sun sign data for criminal behaviour was equally negative. As for Sachs's data for 350,763 marriages, sun sign books say the compatibility between two signs is much the same regardless of which one is male or female. But when the liking of one sign for another was ranked for each male sign, and again for each female sign, the mean rank correlation between male and female rankings for all twelve signs was a negligible 0.05, not even weakly significant (p = 0.88). So the sun sign books are wrong. Nor was it any better when Sachs's divorces were analysed in the same way. There was no hint of the links expected if Sachs's results were genuinely due to astrology. This critique is a fascinating step-by-step look at how a sympathetic scientist applies powerful methods to astrological ideas, and at how good science sorts out bad science. Altogether the next best thing to doing the work yourself. The third critique by statistician Herbert Basler finds an abundance of statistical errors and is more suitable for technical readers. Most of the print-media reviews of Sachs's book were scathingly critical of how he interpreted his results. But most reviews seemed to agree that the calculations were correct, which is not the case. 
Sachs regularly makes mistakes, for example in the number of results said to be significant, using the wrong average to compute certain deviations, neglecting one tail of the statistical distribution (so his reported significance levels are inflated), miscalculating the statistics in his "simplified example" (which describes as nonsignificant a drug that cures 75% of cases, whereas if the illness were AIDS it would win a Nobel prize), using samples that differ from the comparison sample (there are 8.81% Pisces births but only 8.54% Pisces deaths, so 0.27% Pisces must be immortal, suggesting that statistics is something done by Zahlenfriedhofsgaertner, gardeners who work in the cemeteries where dead numbers are buried), and wrongly assuming that the comparison data are stable (his sun sign births during 1925-60 differ from those during 1954-76). Such frequent errors would make us doubt Sachs's claims, even if everything else were acceptable (which, as the other critiques show, is not the case). Nevertheless Sachs's book does have one good point -- it contradicts the idea promoted by astrologers that astrology cannot be tested by statistical methods.
In 1998 the British newspaper Daily Mail ran two articles based on Gunter Sachs's new book The Astrology File. The articles claimed that "scientific analysis has proved a link between the stars and how you lead your life" and that "no one has truly analysed in meticulous detail whether human behaviour and astrology really do have a link".
The authors are respectively Dr Peter Niehenke, one of Germany's leading astrologers, Professor Dr Suitbert Ertel, a longtime astro-researcher and Gauquelin expert, and Dr Herbert Basler, an eminent statistician. Each author approaches Sachs's book in his own way but all reach the same conclusion. All articles first appeared in 1998, Niehenke and Ertel in Correlation 17(1), Basler in Skeptiker 11(3).
The Astrology File: scientific proof of sun sign effects?
To add to recent critical interest in sun signs, I would like to comment on a recent German book by Gunter Sachs called Die Akte Astrologie: Wissenschaftlicher Nachweis eines Zusammenhangs zwischen den Sternzeichen und dem menschlichen Verhalten [The Astrology File: a scientific proof of a relationship between signs of the zodiac and human behaviour]. Munich: Goldmann, 1997.
According to the jacket blurb, "The aim of this study is not exactly unambitious: Up to now, the hypothesis that there is a relationship between the signs of the zodiac and human behaviour has been a matter of faith. With the help of statistics, Gunter Sachs has now approached the problem with impartiality. Step by step, he produces proof that astrology is not a myth. It is based on measurable foundations."
Anyone familiar with astrological research will read this with some astonishment. It implies that no attempt has previously been made to test the accuracy of zodiac signs by statistical methods, and that strong proof is now being presented. Both points are incorrect.
The content of the study
Sachs' book has attracted an extraordinary amount of public interest, with many talk shows and long interviews, even in magazines such as Der Spiegel that are not noted for being interested in astrology. Such publicity would normally be welcomed by us astrologers, so it is distressing that I have to point out that Sachs' results are largely meaningless, being due to flaws in the reasoning. Neither Sachs nor his advisors, who included the Statistical Institute of Munich University and the head of the Institut fur Demoskopie (a respected German opinion research institute), had noticed these flaws. Furthermore, it is clear that nobody had suggested checking the large number of existing studies, as is usual in worthwhile scientific activity. Such checks would have quickly identified the flaws, and would have quickly revealed that the tests planned by Sachs were a waste of time.
The flaws in the reasoning
(2) Because many effects exist in the data, we will observe variations no matter how the data are grouped, whether by tropical sign as in Sachs' case, or by sidereal sign, or by month, or by any other period we care to choose. If the fact that variations exist between individual groups were sufficient proof (which is what Sachs claims), I would be able to prove the effectiveness of all these groupings. Of course, as Suitbert Ertel has pointed out, if it could be shown that, of all possible groupings, the tropical signs produce the most significant results, this would be a good start towards establishing the reality of signs. It would not of course be proof, but it would be a good start. But Sachs is not aware of this basic requirement.
(3) As discussed in every statistics textbook, the chi-squared test measures the significance of the effect, not the size of the effect. Given a large enough sample, even the smallest effect will become highly significant. In this case, as noted above, many effects are present, therefore Sachs' discovery of significant results is only to be expected. But we are not entitled to claim that they have anything to do with astrology. Nor are we entitled to claim that they are of practical importance. For example, suppose a certain sun sign in a sample of 100,000 plumbers is found to be significant at the p = 0.000,000,0001 level by chi-squared test with df = 1. At first sight this seems amazing. But it corresponds to only 8.94% of plumbers having that sun sign instead of the 8.33% (= 100/12) expected for a uniform distribution. Not amazing at all. But such is the power of large samples!
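The plumber example can be checked with a short calculation. The sketch below is only an illustration, not Sachs' procedure: the function name chi2_one_sign is hypothetical, it tests one sign against all other signs combined (df = 1) under a uniform expectation of 1/12, and the p value it prints need not match the rounded figure quoted above. It shows how the same 8.94% share moves from unremarkable to overwhelmingly significant purely because the sample grows.

```python
import math

def chi2_one_sign(n, observed_share, expected_share=1 / 12):
    """Chi-squared (df = 1) for one sun sign versus all other signs combined."""
    obs_in = n * observed_share
    exp_in = n * expected_share
    obs_out, exp_out = n - obs_in, n - exp_in
    chi2 = (obs_in - exp_in) ** 2 / exp_in + (obs_out - exp_out) ** 2 / exp_out
    # For df = 1 the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# Same effect size (8.94% instead of 8.33%), three different sample sizes.
for n in (1_000, 10_000, 100_000):
    chi2, p = chi2_one_sign(n, 0.0894)
    print(f"n = {n:>7}: chi2 = {chi2:6.2f}, p = {p:.2g}")
```

The chi-squared value grows in direct proportion to the sample size while the effect itself stays fixed at about 0.6 percentage points.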
(4) The chi-squared test as applied here by Sachs is actually invalid because it assumes that the expectancies are exact theoretical values with no inherent variance due to sampling errors. But Sachs' expectancies are based on observed frequencies that are subject to sampling errors. Consequently the expectancies have a variance for which the chi-squared test makes no allowance, and which effectively inflates the observed significance.
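The effect of treating estimated expectancies as exact can be illustrated with made-up numbers. Everything in the sketch below is hypothetical: the counts, the function names, and the reduction to one category versus all others (df = 1). It compares a goodness-of-fit test that takes the reference share as a fixed theoretical value with a test of homogeneity that treats the reference group as a sample with its own sampling error.

```python
import math

def p_from_chi2_df1(chi2):
    """Upper-tail probability of a chi-squared value with df = 1."""
    return math.erfc(math.sqrt(chi2 / 2))

def goodness_of_fit(sample, reference):
    """Treats the reference proportions as exact expectancies."""
    n, m = sum(sample), sum(reference)
    expected = [n * r / m for r in reference]
    return sum((o - e) ** 2 / e for o, e in zip(sample, expected))

def homogeneity(sample, reference):
    """2 x k contingency test: both rows are samples with their own error."""
    n, m = sum(sample), sum(reference)
    total = n + m
    chi2 = 0.0
    for o, r in zip(sample, reference):
        col = o + r
        for count, row_total in ((o, n), (r, m)):
            e = row_total * col / total
            chi2 += (count - e) ** 2 / e
    return chi2

# Hypothetical counts: [in the category, not in the category]
sample = [60, 40]
reference = [50, 50]

gof = goodness_of_fit(sample, reference)
hom = homogeneity(sample, reference)
print(f"expectancies taken as exact: chi2 = {gof:.2f}, p = {p_from_chi2_df1(gof):.3f}")
print(f"expectancies as a sample:    chi2 = {hom:.2f}, p = {p_from_chi2_df1(hom):.3f}")
```

With these invented counts the first test crosses the 5% level and the second does not. The gap between the two shrinks as the reference group becomes much larger than the sample, so how much it matters in a given case depends on the relative sizes involved.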
The above flaws could not be more elementary, yet they were missed by the professional statisticians that Sachs consulted, including those from the Statistical Institute of Munich University. It is more than disappointing when a professional body, meant to help people planning research and applying statistical methods, does not do its job well.
A better approach
However, when we compare the nurse distributions observed by Smithers with those observed by Sachs, we find huge differences. Thus for no less than eight signs (TA, GE, CN, LE, VI, SC, SG, PI) a positive deviation from the mean in one distribution is matched by a negative deviation in the other. If we do not wish to assume that an absolute difference exists between being a nurse in Britain and being a nurse in Switzerland, then such differences show how the varying birth rates of nurses in the different periods of the year can bear no relationship to the sun signs. Such differences are nothing special and have been known about for a long time, see Dean and Mather (1977, pp.114-117). In this case it is just one more of the problems that Sachs would have avoided had he made a literature search before starting work.
Reply by Sachs
Dean G, Mather A, and 52 others (1977). Recent Advances in Natal Astrology: A Critical Review 1900-1976. Subiaco WA: Analogic.
Huntington E (1938). Season of Birth: Its Relation to Human Abilities. New York: Wiley.
Smithers A (1984). The zodiac test. Guardian [Manchester], March 19, 20, 21 and 22. Summarised and re-analysed by Dean GA, Kelly IW, Rotton J, and Saklofske DH, The Guardian Astrology Study: A Critique and Reanalysis, Skeptical Inquirer 1985, 9, 327-338. Further comments appear in Correlation 1987, 7(1), 26 and 7(2), 22-25.
The articles in Esotera are: Niehenke P (1998). Spektakulaerer Fehlgriff nach den Sternen. Esotera January 1998, pages 78-81. Sachs G, Schwenk HW, and Kuenstler R (1998). Fragwuerdige Ausfuehrungen. Replik auf Dr Peter Niehenkes Stellungnahme zum Buch "Die Akte Astrologie." Esotera March 1998, pages 14-15. Niehenke P (1998). Meine Kritik nicht verstanden. Antwort von Dr Peter Niehenke auf die Stellungnahme des IMWA-Instituts in Esotera 3/98. Esotera April 1998, pages 14-15.
The articles in Meridian are: Niehenke P (1998). Der Denkfehler von Gunter Sachs. Meridian January/February 1998, pages 7-9. The whole issue focussed on Sachs' study. Sachs G, Schwenk HW, and Kuenstler R (1998). Replik auf Dr Peter Niehenkes Stellungnahme zum Buch "Die Akte Astrologie." Meridian May/June 1998, pages 56-57. Niehenke P (1998). Dr Peter Niehenkes Entgegnung. Meridian May/June 1998, page 57.
Scrutiny of Gunter Sachs's excursion into astrological research
Famous photographer and publicity expert Gunter Sachs made an excursion into astrology. Die Akte Astrologie (The Astrology File), Sachs 1997, promises to deliver "Scientific proof of a relationship between sunsigns and human behaviour" (the book's subtitle). But after reading the report, and after checking its tables, I have to conclude that this promise is not fulfilled. The approach does not provide a proof of astrology, nor is it even scientific. Why? Because it lacks scientific rigor, the readiness to search among the results of one's actions for possible errors. The book thus lacks one of the most indispensable guidelines to which the scientific enterprise is committed, ie critique, and to which Sir Karl Popper, one of its pioneering proponents, drew ample attention.
But this is no grounds to choke down applause for some of the book's good features: Its style is elegant, the chapters entertaining. Above all, the author's methods are made transparent, tables and material for scrutiny are provided. But alas, the scrutiny itself is left to the reader. So let me have a close look at this work. Suppose it had been my task to test statistical connections between sunsign and behaviour. What procedures would I have adopted, and what would I have avoided?
1. First, I would have considered pertinent studies by other scientists in order to take advantage of existing insights and results. Sachs does not do so, and is therefore not aware of the contradiction between his own seemingly positive observations and those of others (such as Gauquelin 1978, 1981) that are entirely negative. The contradictions would have raised my suspicion: What is wrong, who errs, could it be me?
2. If like Sachs I had made no attempt to take notice of pertinent literature, I would have conceded the matter in the book's introduction. I would have admitted that I was actually not informed enough to ask (as Sachs does on page 210), "How was it possible that no one -- no science and no power of the modern world -- has ever tried up to these days to get upon the track of ancient astrology?" This question will seem ludicrous to anyone familiar with the numerous published reviews of astrological research such as by Dean & Mather (1977), Eysenck & Nias (1982), and those provided by Pottenger (1995) and Kelly (1997).
3. Similarly, I would not have dared to claim, as Sachs does on pages 210-211, that "During our work we noticed much ambivalence regarding astrology. Ministers, university rectors ..., even though widely interested in astrological research, will never profess such curiosity." Apart from the numerous reviews just mentioned that attest to this curiosity, at least one university professor -- having gone public with his research into astrological matters and being eager to cooperate -- attempted to contact Sachs's group twice but received no reply.
4. Having obtained birth data, how would I analyse it? Let me first describe what Sachs did. He used two types of statistical design, applying the first one to questions like "Is sunsign at birth related to occupation?" For this he listed architects, bakers, butchers, educators, farmers etc, in columns, with rows giving the breakdown of deviations from expectancy by sunsign (Aries, Taurus, Gemini etc). He also included categories other than occupation such as fields of university study (biology, economics, medicine etc), criminal behaviour (theft, fraud, forgery etc), causes of death (asthma, diabetes, cancer etc), and make of automobile (Audi, BMW, Ford etc). Sachs reasoned like this: If a chi-squared test of sunsign frequencies gives a significant result, then the variations are caused by astrological factors.
If Sachs had asked me to comment on this rationale, I would have warned him that his demographic corrections regarding expectancies were far from sufficient. Broad demographic aggregates, Sachs's basis for expectancies, level off all local variation and might therefore give rise to artifactual discrepancies between expectancy and observation for subsamples (a point further discussed in Niehenke's letter). The size of such discrepancies is generally negligible; however, they will inevitably reach statistical significance whenever the sample size is large enough (and Sachs collected masses of data, eg nearly four million birth dates for his occupations). I would also have warned Sachs that even if part of what he found was more than artifact, other factors, unrelated to sunsigns, might account for that, for example seasonal factors at birth (temperature, sunlight, stability of weather conditions, constancy of maternal care etc, see Wendt 1978, and Niehenke's critique). Being obliged to observe scientific caution, I would have suggested testing the more straightforward causal rivals first -- see my Appendix 1 for details.
5. Nevertheless, supposing I wanted to find proof for astrological claims, I would certainly have looked at astrology's teachings and derived from them in advance precise hypotheses such as "If astrology is true, then an association should exist between sunsign X and, say, occupation Y." Sachs ignores such hypothesis testing; significant deviations from expectancy, wherever they may turn up, are all hailed as instances of proof for the ancient wisdom. But suppose Sachs, after collecting the data without any particular hypothesis in mind, had asked me to make the best of his many numbers. I would have suggested that he take a close look at the total of his results and try to find, after the event, as much sense in them as possible. This would require comparing, combining, separating, ie the ordering of observations. At which point, however, he would eventually have discovered that his results actually do not make sense. Why? Let me explain.
First I looked at Sachs's sunsign results for his ten areas of university study. To these I applied a multivariate procedure called principal component analysis (PCA), which is a standard method for reducing a large number of variables to a small number of underlying factors that collectively explain most of the variation. Each of the 10 study variables (see Table below) is defined by 12 sunsign deviations (differences between observed and expected counts for Aries-born students, for Taurus-born students etc). The 12 deviations for one study area can be correlated with the 12 deviations for every other study area to give a 10x10 correlation matrix suitable for PCA analysis. The results suggested that most of the variation could be explained by three or four factors. Four factors were therefore Varimax-rotated (which maximises their separation and maximises their loadings) yielding the distribution of factor loadings shown in Table 1.
Table 1. Varimax-rotated factors underlying sunsign variations of university students (Sachs's sample)
Field of study Factor: 1 2 3 4
Each entry in the table is a factor loading, thus the sunsign variation among architecture students correlates 0.23 with factor 1 and 0.68 with factor 2. To interpret each factor we simply group together the fields of study on which it loads the most. Selecting all fields with substantial loadings (in excess of 0.40), for factor 1 we have veterinary medicine (0.84), political economy (-0.80), and psychology (0.66). For factor 2 we have industrial economy (-0.94), dentistry (0.69), and architecture (0.68). For factor 3 we have biology (0.87), law (0.57), and dentistry (-0.41). For factor 4 we have pharmacy (-0.80), medicine (0.78), and law (0.62).
Having identified the main fields of study underlying each factor, we examine them to see what they have in common. But they seem to have nothing in common. What has dentistry to do with architecture (factor 2), or biology with law (factor 3)? We might expect certain fields to occur together in the same factor, such as industrial economy and political economy, or medicine and veterinary medicine, but they do not. Even in factor 4, the possibly related fields of pharmacy and medicine are of opposite sign, so they are, if at all, related only negatively, which contradicts expectation. In other words none of the factors can be interpreted. Sunsign-based preferences of fields of study have failed to reflect, as they should, obvious similarities between them. Nor has any general order such as sciences vs humanities become manifest.
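The first step of the analysis described above, correlating the 12 sunsign deviations of each field with those of every other field, can be sketched as follows. The deviations and the field labels below are invented for illustration and are not Sachs's figures, and only three fields are used instead of ten. The sketch stops at the correlation matrix; the principal component extraction and Varimax rotation would normally be done with a numerical library and are not shown.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical deviations (observed minus expected), one value per sign AR..PI.
deviations = {
    "field A": [12, -5, 3, 8, -9, 1, -4, 6, -2, -7, 0, -3],
    "field B": [10, -3, 1, 9, -8, 2, -6, 4, -1, -5, -2, -1],
    "field C": [-4, 7, -6, 2, 5, -8, 9, -1, 3, -2, -5, 0],
}

fields = list(deviations)
# Square, symmetric matrix with ones on the diagonal: the input to a PCA.
matrix = [[pearson(deviations[a], deviations[b]) for b in fields] for a in fields]

for name, row in zip(fields, matrix):
    print(f"{name}: " + "  ".join(f"{r:+.2f}" for r in row))
```

If related fields attracted the same sunsigns, their rows would correlate strongly with one another (as the invented fields A and B do here) and they would load on the same factor.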
I repeated the analysis using Sachs's results for various types of criminal behaviour, and obtained a similar chaos. At this point I would normally resign and conclude that zodiac-based variation among students or criminals is mere noise. But I have still to consider Sachs's second type of statistical design.
6. Sachs applied his second design to synastry questions in marriage such as "Do Taurus men prefer Libra women to Gemini women?" or "Do Leo women prefer Cancer men to Sagittarius men?" Using 358,763 marriages from Swiss records, Sachs derived 12 x 12 = 144 man-woman (same as woman-man) sunsign combinations, and submitted each of the 144 deviations from expectancy to a chi-squared test. Significant values popped up here and there, too many to grasp by eye (the full matrix is given in my Appendix 2), so for a clearer outcome I turned to Spearman rank correlations between the various deviations. For example, it is reasonable to assume that if an affinity exists where, say, Aries men like Leo women, then this affinity should have some correspondence with the liking by women, ie Aries women would generally be expected to like Leo men. We might even doubt the plausible correspondence and allow for an inversion, ie hypothesize repulsion instead of liking, where Aries women dislike and therefore do not marry Leo men. To check if either was supported by Sachs's results, I listed the deviations given by Sachs for each combination of Aries men and non-Aries women, and for each combination of Aries women and non-Aries men, as follows:
Partner's sign TA GE CN LE VI LI SC SG CP AQ PI
The more positive the deviation, the more popular the combination, and therefore the more the liking between signs. I then ranked the deviations, highest first, and obtained the following ranks for the rankings by Aries men and Aries women:
Partner's sign TA GE CN LE VI LI SC SG CP AQ PI
When ranked in terms of signs, with the most popular combination first (the two signs like each other the most) and the least popular combination last (the two signs like each other the least), the results are as follows:
In the above table some sunsigns such as LE seem consistently liked, others such as SC seem consistently disliked, while others such as SG show no consistency. Ranking signs in this way hides the fact that the differences are small throughout, for example the most liked sign is typically only about 6% more numerous than expected by chance, the least liked sign is only about 6% less numerous than expected by chance. If there are influences at work at all, they cannot be large. But rather than dwell on individual signs it is more important to know the general trend. Is the order of liking by Aries men related to the order of liking by Aries women? This question is easily answered by calculating the Spearman rank correlation rho between the two sets of rankings, which in this case gives rho = 0.18, an insignificant value of correspondence. This is just for Aries. What of the other signs? I analyzed these in the same way and obtained the following mean correlations between the order of liking by men and the order of liking by women:
Sign AR TA GE CN LE VI LI SC SG CP AQ PI Av
Half the correlations are positive and half are negative, whereas we might expect them to be consistently large and positive. The highest correlation of 0.63 is individually significant (p = 0.03) but not when corrected for the number of tests (np = 0.38). The mean correlation of 0.05 is not even weakly significant (p = 0.88 for a sample of 11 pairs).
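The ranking and correlation procedure used for each sign can be sketched in a few lines. The deviations below are invented for illustration and are not Sachs's figures; the function names are hypothetical, and the simple formula used assumes there are no tied values.

```python
def ranks(values):
    """Rank values from highest (rank 1) to lowest; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(x, y):
    """Spearman rank correlation for untied data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

partner_signs = ["TA", "GE", "CN", "LE", "VI", "LI", "SC", "SG", "CP", "AQ", "PI"]
# Hypothetical percentage deviations from expectancy, NOT Sachs's data.
aries_men = [1.2, -0.8, 0.5, 5.9, -2.1, 0.3, -5.7, 2.4, -1.0, 3.1, -3.3]
aries_women = [-1.5, 2.2, 0.9, 4.8, 1.1, -2.6, -4.9, -3.0, 0.2, -0.4, 3.6]

rho = spearman_rho(aries_men, aries_women)
print(f"rho between the two rankings: {rho:.2f}")
```

A value near +1 would mean that men and women of the same sign rank their partners' signs in the same order, a value near -1 that the orders are reversed, and a value near 0 that the two rankings are unrelated.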
In the above tests each sunsign combination occurs twice, first as say AR-LE and again as LE-AR, so the 11 correlation rho's are not completely independent. Each isolated mistake, each glitch in the data, could therefore make itself felt on two separate occasions. To avoid this problem I applied the same Spearman analysis to the 12 x 11/2 = 66 pairs of unique combinations AR-TA, AR-GE, ... AQ-PI. The mean correlation rho was 0.075, again not even weakly significant (p = 0.53). The outcome was not any better when I applied the same strategy of analysis to Sachs's divorce numbers, the correlation being -0.12 (p = 0.31). Altogether there is no hint of the correspondences we would expect if Sachs's results were genuinely due to astrology. Instead, as before, there is only noise.
In summary, I would not have published such noise in the first place, nor would I have pretended that the results were scientific and meaningful. I would have been aware that publicizing this chaos of numbers could damage the reputation of other approaches into astro-psychological relationships conducted by those who live up to the ideals of Popper's critical science.
However, even if I had neglected all this, I would have returned to my hobby (photography) less hastily. In his final chapter titled "Astrology Adieu" Sachs sets an abrupt end to his excursion; pictures, he says, attract him more than numbers. I do not argue hedonic values. But before leaving the stage I would first have invited critical readers to discuss the book's formidable message. And before eventually leaving the battle field, probably with losses, I would nevertheless have made the masses of my original data available for other researchers who would, more thoroughly than I did, want to test possible correlations between the variables birth, death, crime, marriage, divorce, profession, etc, and cosmo-ecological factors.
After the present article was completed, another critique of Sachs's "Akte Astrologie" by statistician Herbert Basler (1998) appeared in Skeptiker. Basler points at errors in the statistics as applied by Sachs, some of which had escaped my attention. Basler also provides an extensive explanation for possible sampling artefacts as noted above in (4), second paragraph. [Basler's article follows this one.]
Basler H (1998). "Die Akte Astrologie" von Gunter Sachs aus Sicht der mathematischen Statistik. Skeptiker 11 (3), 104-111.
Dean G and Mather A (1977). Recent Advances in Natal Astrology: A Critical Review 1900-1976. Perth, Western Australia: Analogic.
Eysenck HJ and Nias DK (1982). Astrology: Science or Superstition? New York: St Martin's Press.
Gauquelin M (1978). Statistical Tests of Zodiacal Influences. Part I: Profession and Heredity. LERRCP, Paris.
Gauquelin M (1981). Statistical Tests of Zodiacal Influences. Part II: Zodiac and Character-Traits. LERRCP, Paris.
Kelly IW (1997). Modern astrology: A critique. Psychological Reports, 81, 1035-1066. The critical reviews are listed on page 1037.
Niehenke P (1998). Letter to the Editor. Correlation 17(1), 41-44.
Pottenger M, ed (1995). Astrological Research Methods. An ISAR Anthology. Volume I. Los Angeles: International Society for Astrological Research.
Sachs G (1997). Die Akte Astrologie: Wissenschaftlicher Nachweis eines Zusammenhangs zwischen den Sternzeichen und dem menschlichen Verhalten. Munich: Goldmann. Subsequently published in English as The Astrology File: Scientific Proof of the Link between Star Signs and Human Behaviour. London: Orion, 1998.
Wendt HW (1978). Season of birth, introversion, and astrology: A chronobiological alternative. Journal of Social Psychology 105, 243-247.
Ertel Appendix 1
The sunsigns (Aries, Taurus, Gemini, etc) form a sequence of categories that is discontinuous. That is, symbolic differences (such as those based on polarity, triplicity and quadruplicity) between adjacent signs such as Aries and Taurus are not smaller than between non-adjacent signs such as Aries and Gemini, and are possibly even larger.
By contrast, non-astrological factors aligned on the calendar scale (January, February, March etc) are continuous. Thus differences in temperature and rainfall are smaller between adjacent months such as January and February than between non-adjacent months such as January and March. The contrast between astrological factors (discontinuous) and non-astrological factors (continuous) allows a straightforward test.
For example if the differing counts of, say, professions in the twelve sunsign groups reported by Sachs were due to sunsign symbolism, they would be detectable only if the temporal boundaries of sunsign categories are strictly maintained. Consider people born between 21 March and 20 April; these are Aries people. Shift these dates arbitrarily by, say, one week to 28 March and 27 April; of the people born in this period, two-thirds are Aries people, the rest are Taurus people. Thus, by only slightly shifting temporal boundaries, the twelve samples are severely spoiled, at least in terms of sunsign astrology (Aries and Taurus are supposed to be quite different, like apples and potatoes). But there is no severe spoiling in terms of non-astrological seasonal factors.
Now, if the differences of counts among people grouped by sunsign (Sachs's differences) were due to sunsign factors, the variation of counts between those groups would be maximal. So the variation should rapidly dwindle away as the sign boundaries are successively shifted a day at a time for each new computation; and the variation between groups should reach its lowest level at a shift of around 14 days. On the other hand, if the variation between sunsigns had nothing to do with sunsigns and was due to seasonal factors, the variation should stay more or less at an average level, perhaps with smooth oscillations above and below the average level.
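The boundary-shifting test can be sketched with artificial data. The sketch below makes several simplifying assumptions that are not in Sachs's data: a 360-day year with twelve 30-day signs, an extreme "sign" effect in which the daily rate jumps at every boundary, and a purely sinusoidal seasonal effect. All names and numbers are hypothetical.

```python
import math

DAYS, SIGNS = 360, 12          # simplified 360-day year, 30 days per sign
WIDTH = DAYS // SIGNS

def group_variance(daily_counts, shift):
    """Variance of the 12 group totals when every boundary moves by `shift` days."""
    totals = []
    for g in range(SIGNS):
        start = g * WIDTH + shift
        totals.append(sum(daily_counts[(start + d) % DAYS] for d in range(WIDTH)))
    mean = sum(totals) / SIGNS
    return sum((t - mean) ** 2 for t in totals) / SIGNS

# Hypothetical "sign" effect: the daily rate jumps at every sign boundary.
sign_effect = [110 if (day // WIDTH) % 2 == 0 else 90 for day in range(DAYS)]
# Hypothetical seasonal effect: the daily rate varies smoothly over the year.
seasonal = [100 + 10 * math.cos(2 * math.pi * day / DAYS) for day in range(DAYS)]

for shift in (0, 5, 10, 15, 20, 25, 30):
    print(f"shift {shift:2d} days: sign-like {group_variance(sign_effect, shift):9.1f}"
          f"   seasonal {group_variance(seasonal, shift):9.1f}")
```

In this artificial setting the sign-like variation is largest with the true boundaries, falls to zero when the boundaries are shifted by half a sign (15 days here), and returns after a full sign, whereas the seasonal variation stays level whatever the shift. Real data would be noisier, but the contrast between the two patterns is what the test looks for.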
Ertel Appendix 2
Men Women --
"Die Akte Astrologie" by Gunter Sachs
The following critique appeared in Skeptiker 11 (3), 104-111. It has been translated from the original German article by the Dutch statistician Dr J W Nienhuys. Some parts have been condensed to save space. Dr Basler is a well-known statistician whose textbook Grundbegriffe der Wahrscheinlichkeitsrechnung und Statistischen Methodenlehre is in its 11th printing. In general Basler focusses on statistical faults and not on design faults, even though the latter are usually more important.
"Yet another criticism" many readers will think, reflecting on the fact that the 1997 book by Sachs received much attention from the media, and that most of the reviews in the print media were scathingly critical of Sachs's claims. But most of the reviews criticised only his interpretations of the statistical results. All reviewers seemed to agree that the results themselves were correct -- except for "small superficial errors" noted by the reviewer of the Frankfurter Allgemeine. For example the Suddeutsche Zeitung says that "A statistics professor of the university of Munich has declared in writing that the study is statistically correct."
This article will counter such views. Indeed, during his calculation of the statistical results, Sachs regularly makes mistakes in his analysis. For example, in the exposition of the connection between starsigns and suicide (extensively discussed by Niehenke) we find the following: Sachs claims that the actual number of suicides deviates significantly in 5 out of 12 cases from the corresponding chance expectation. But calculation shows that this is wrong in four cases, or at any rate these four deviations are not significant at the level indicated by Sachs.
Allensbacher Institute star sign study: Comparing the incomparable
The deviations in the graph, in percent, starting with Capricorn (January) and ending with Sagittarius (December) are: 0.6, 0.4, 4.5, 0.8, 2.2, 3.4, 2.7, 0.5, -1.2, 1.0, -0.3, 0.9.
If we look at these deviations, we see immediately that there are more positive deviations than negative deviations, whereas we would expect them to even out. In other words we suspect that an error has increased the positive deviations. On p.290 we then read that the Gemini deviation is highly significant with 3.4%, and similarly Pisces with its even greater deviation is also significant. So Allensbach says that there is a highly significant relation between the star signs of the polled people and their answers.
Naturally we suspect that this isn't so, as the positive deviations are probably erroneous. This suspicion is correct. The "average" refers to all 13,283 people, and not to the 10,758 people that gave their birthdays. When we compute from the given numbers of polled people and the given percentage, the number of yes answers (e.g. for Capricorns 837 x 18.2% = 152), we obtain altogether 2038 yes answers, which is 18.9% of the whole group with known star sign, rather than the inadmissible value of 17.6% that Allensbach marks as the "mean value". If Allensbach had used the correct mean value, then the deviations would not have been "highly significant" (an actual test would have used the chi-squared test in a 4-field table, which does not give a significance level of 1%). [Note by translator: I think we should first make a 12x2 chi-squared test on all signs together before picking out the high values. Thus for all signs together I get p=0.24, indicating that nothing special is happening, so there is no need to look for outliers in individual signs. This matter is discussed at length in Basler's follow-up article, not included here.]
These frequent errors make us doubt all evaluations done in the Allensbach study and their reported significance, for example on the interest people take in politics, environmental issues etc. Readers may wonder how such a serious error can occur, where we end up comparing incomparables. Maybe readers will think this reviewer is wrong. Maybe they will think that it wouldn't have made any difference since the 10,758 polled persons with star sign and the 2525 without would answer similarly. But this is not the case. From the given percentages we can compute the total number of yes answers as 2038 for the "star sign group" and 300 for the "no star sign group", which means 18.9% yes compared to 11.9% yes. (Here a chi-squared test of the difference gives p=10^-16.) This means that the target question related to "building" was answered in a different way by people not willing to give their birthday.
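The comparison between the two groups can be checked with a few lines of Python. This is only a sketch: it assumes the counts quoted in the text (about 2038 yes answers among the 10,758 people with a known star sign, i.e. 18.9%, and about 300 among the 2525 without), and the function name chi2_2x2 is made up for this illustration. It uses the standard shortcut formula for a 4-field table without continuity correction, which may differ slightly from the calculation the author actually made.

```python
import math

def chi2_2x2(yes_a, n_a, yes_b, n_b):
    """Chi-squared test (1 df, no continuity correction) for a 2 x 2 table.

    Returns the test statistic and its p-value.
    """
    a, b = yes_a, n_a - yes_a          # group A: yes / no
    c, d = yes_b, n_b - yes_b          # group B: yes / no
    n = n_a + n_b
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail has a closed form via erfc.
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p

# Counts as reconstructed in the text from the published percentages.
stat, p = chi2_2x2(2038, 10758, 300, 2525)
pooled_rate = (2038 + 300) / (10758 + 2525)

print(f"yes rate, star sign known:   {2038 / 10758:.1%}")
print(f"yes rate, star sign unknown: {300 / 2525:.1%}")
print(f"yes rate, everyone pooled:   {pooled_rate:.1%}")
print(f"chi-squared = {stat:.1f}, p = {p:.1e}")
```

The pooled rate reproduces the 17.6% that Allensbach used as the "mean value", which shows where the inadmissible figure came from.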
This effect, where answering behavior differs between responders and non-responders, is very inconvenient for polling purposes. Also, the strength of the effect is different across the 923 questions, so a uniform correction is not possible. This problem in demoscopy is well known, namely that non-responders influence the aggregate result. It appears here in a very harsh light.
I won't go into the methodological objections to the execution of very many single tests with the same material (which is actually mentioned by Sachs as the "multiple test problem"), or the lack of a global comparison for all 12 zodiacal signs (which is what Sachs requires of himself in the first part of the book, see p.48). Despite these problems, the Allensbacher Institute calls its investigation a mere "pilot study", as a kind of prophylactic plea for forgiveness of their statistical sins.
Star signs and suicide
The text is seemingly clear but it needs additional information for a statistical investigation. In the previous chapter, "Who dies of what?" we read (p.149 ff): "As basic material we had the dates of all men and women that died in Switzerland from 1969 through 1994, divided according to their star sign and 32 different causes of death. Altogether this concerns 1,195,174 death cases." In agreement with the advising statisticians, this database was processed by leaving out death causes for which the numbers were too small, and those that lacked meaning according to astromedicine. In this way "the decisive data material was reduced to 20 causes of death and 687,850 death cases", in other words a reduction of more than 40%. The 20 selected causes of death do not involve suicide, so the table heading "Deathcases Switzerland 1969-1994" means only non-suicides with 20 causes of death.
For the statistician a connection between star sign and suicide should be tested by means of a 12 x 2 chi-squared test. This asks whether any of the 12 empirical suicide fractions (eg 2725 out of 30,358 = 8.98% for Aries) and the corresponding fraction for the whole sample (eg 61,582 out of 687,850 = 8.95%) are significantly different.
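A minimal sketch of such a 12 x 2 test in plain Python follows. The per-sign counts are not reproduced in this article, so the numbers in the usage lines are invented purely for illustration and say nothing about the Swiss data. The function names are also made up. The tail probability uses the standard closed-form series for the chi-squared distribution with integer degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for chi-squared with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2-1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r-1/2) / (1*3*...*(2r-1))
    root = math.sqrt(x)
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return math.erfc(root / math.sqrt(2)) + 2 * phi * total

def chi2_r_by_2(group, reference):
    """Global test of homogeneity for an r x 2 table (two columns of counts)."""
    n_g, n_r = sum(group), sum(reference)
    total = n_g + n_r
    stat = 0.0
    for g, r in zip(group, reference):
        row = g + r
        for observed, column_total in ((g, n_g), (r, n_r)):
            expected = row * column_total / total
            stat += (observed - expected) ** 2 / expected
    df = len(group) - 1
    return stat, df, chi2_sf(stat, df)

# INVENTED counts, one per star sign, for illustration only.
suicides = [100, 90, 110, 95, 105, 100, 98, 102, 97, 103, 99, 101]
other_deaths = [1000] * 12
stat, df, p = chi2_r_by_2(suicides, other_deaths)
print(f"chi-squared = {stat:.2f} on {df} df, p = {p:.3f}")
```

Only if this global test is significant would there be any reason to look at individual signs.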
It is therefore immediately clear that Sachs makes a modelling error. His chi-squared test assumes the suicide fraction 8.95% for Aries constituted a portion of all Aries deaths, and this is certainly not the case. [That is, the two samples are not properly matched, see later under the heading "immortal Pisces".] The fractions of the different star signs are not constants, neither for "the death cases", nor for "the births", as we shall show.
The annual fluctuations in the numbers are relatively small in large populations, so that in this case the wrong test and the right test do not give very different answers. Nonetheless, these small fluctuations can explain the otherwise strikingly significant results. Remarkably Sachs doesn't make this mistake in the chapter "Who dies of what?". There he applies the correct tests when he compares the separate death causes and the star signs.
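The small size of the difference between the two tests can be illustrated with the Aries figures quoted above (2725 of 30,358 suicides; 61,582 of 687,850 other deaths). The sketch below rests on one reading of the modelling error, namely that the reference fraction is treated as a fixed constant instead of as an estimate from a second sample. It uses z-tests for a single sign, which is a simplification of the tests discussed in the text, and the variable names are made up.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

aries_suicides, all_suicides = 2725, 30358
aries_deaths, all_deaths = 61582, 687850

frac_suicides = aries_suicides / all_suicides   # about 8.98%
frac_deaths = aries_deaths / all_deaths         # about 8.95%

# Reading 1 (assumed to be the flawed model): the death fraction is a known constant.
se_fixed = math.sqrt(frac_deaths * (1 - frac_deaths) / all_suicides)
z_fixed = (frac_suicides - frac_deaths) / se_fixed

# Reading 2: both fractions are sample estimates, so both contribute variance.
pooled = (aries_suicides + aries_deaths) / (all_suicides + all_deaths)
se_two = math.sqrt(pooled * (1 - pooled) * (1 / all_suicides + 1 / all_deaths))
z_two = (frac_suicides - frac_deaths) / se_two

print(f"fixed-constant model: z = {z_fixed:.3f}, p = {two_sided_p(z_fixed):.2f}")
print(f"two-sample model:     z = {z_two:.3f}, p = {two_sided_p(z_two):.2f}")
```

Because the reference sample is more than twenty times larger than the suicide sample, treating it as fixed inflates the test statistic only slightly.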
- Taurus, Cancer, Sagittarius p=0.05 each
If we use the correct test for these five star signs we find (in the same order as above) the correct significance levels are 0.02, 0.07, 0.09, 0.02, 0.002. So only for Taurus is Sachs right.
It is easy to explain such a serious discrepancy between Sachs and the correct values, simply because Sachs makes a distinction between positive and negative deviations when he discusses single signs. In technical terms he believes he is allowed to apply these tests one-tailed with p=0.05; but he then rejects the null hypothesis not only if the empirical deviation is in the 5% rejection region in the negative part, but also if it is in the positive 5%. This means that the supposed error probability of 5% is actually 10% instead! Which explains the difference between the five "significant" signs of Sachs and the correct test results.
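The doubling of the error probability can be verified directly, assuming for simplicity that the test statistic is standard normal under the null hypothesis. The sketch uses only Python's standard library.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

# Critical value of a one-tailed test at the 5% level.
one_tailed_cut = z.inv_cdf(1 - alpha)

# Rejecting in EITHER direction with that one-tailed cut-off:
actual_error = z.cdf(-one_tailed_cut) + (1 - z.cdf(one_tailed_cut))

# Cut-off that a genuine two-tailed 5% test would need.
two_tailed_cut = z.inv_cdf(1 - alpha / 2)

print(f"one-tailed cut-off: {one_tailed_cut:.3f}")
print(f"error rate when used in both directions: {actual_error:.3f}")
print(f"correct two-tailed cut-off: {two_tailed_cut:.3f}")
```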
A one-tailed test is allowed, but only if the investigator has decided before the results are known, which direction (positive or negative) shall apply. Such a decision typically occurs when earlier studies or theoretical considerations lead us to choose, say, a positive direction because a negative direction isn't interesting anymore.
The question whether such advance decisions for a one-tailed test have actually been made before looking into the data, sometimes leads to discussions between statisticians who deal with pure theory and those who deal with real data. In our case the matter is quite clear: Sachs nowhere makes any advance decisions. Sachs even proves this by making remarks that his findings "do not always conform to the picture drawn by astrological literature" (p.161). In other words he stresses that in each single test he is open to the possibility of positive and negative deviations, which means he should have applied two-tailed tests. Also we might remark that Niehenke says that astrology cannot make statements about zodiacal signs and suicide.
This one-tailedness error continually pops up in Sachs's book. For instance in his chapter "Who marries whom?" he examines every one of the 12 x 12 = 144 star sign combinations to see if any deviate significantly from the expectation. Because Sachs says he uses p=0.05 one-tailed, he effectively uses p=0.10 two-tailed, so among his 25 significant cells (pp.74 ff) we may expect 144 x 0.10 = 14.4 fake significances, with no way of telling which of these are actually fake. This recurring methodical problem (to be distinguished from the one-tailedness error) is treated in the book by Sachs and he calls it the multiple test problem (p.208), but he doesn't draw the appropriate conclusions.
(The licensed statistician Dr Rita Kunstler who advised Sachs writes (p.208) that there was the possibility of a multiple test problem. But it was decided in the case of professions and illnesses to consider "lightly significant" results only as indications, not as proofs, because the number of individual tests is so large, namely 12 x 47 = 564. But even this precaution was the consequence of a "decision", and the advisor thought it was not imperative.)
If we want to be methodically unobjectionable we should act as follows: if we are dealing with a first investigation, we can observe statistical significance only in global independence tests. The results of tests on individual cells can be taken only as conjectures or indications, which should then be tested on new material, at which point it is allowed (before looking at the results) to settle on a one-tailed test. Only after that can we proclaim methodically flawless significance statements, for example that there is a highly significant negative connection between Aquarius men and Taurus women, as Sachs so hastily does (p.74). No doubt such statements greatly increase the book's profitability, which is not an unreasonable expectation because the costs so far of the study (below the 1997 factory price of a Porsche 911, see p.210) have probably already been recouped as bestseller royalties, even after taxes.
The above concerns about the "multiple test problem" also apply to the Allensbach study, because there the evaluation of the samples involved 12 x 923 = 11,076 single tests for the 923 questions.
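The scale of the multiple test problem is easy to make concrete. The sketch below simply multiplies the number of single tests by the effective error rate per test. It treats the tests as if each had exactly that error rate under the null hypothesis, which is an idealisation, and the labels and function name are made up.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of chance 'significances' if no real effect exists."""
    return n_tests * alpha

studies = {
    "marriages (12 x 12 cells)": 12 * 12,
    "occupations (12 x 47 tests)": 12 * 47,
    "Allensbach (12 x 923 tests)": 12 * 923,
}

for label, n_tests in studies.items():
    at_5 = expected_false_positives(n_tests, 0.05)
    at_10 = expected_false_positives(n_tests, 0.10)
    print(f"{label}: {n_tests} tests, "
          f"about {at_5:.1f} chance hits at 5%, {at_10:.1f} at 10%")
```

For the 144 marriage cells at the effective 10% level this gives the 14.4 chance significances mentioned above.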
Significance -- a magic word?
"Let us assume 100 people are ill with an unknown disease. All 100 receive a new medicine. 75 patients are cured. According to our criteria this is not significant, as the curing power of the drug is statistically not proved. When 95 of 100 sick people are cured, we can describe the action of the drug as lightly significant. Only at 99 cures can we call the curing power significant" (p.70).
Any lay person will object that something here must be wrong. Because if the unknown disease were incurable like AIDS, then the discoverer of a new drug that cures 75 out of 100 would be an immediate candidate for the Nobel prize, and the Nobel committee wouldn't bother about consulting a statistician. [Note by translator: If the drug produced 75 cures and also 25 deaths, the discoverer might be an immediate candidate for something other than praise for his lethal concoction.] But if we make the example more precise, for example by supposing the chance for spontaneous recovery to be 50%, then 75 cures in 100 patients would be very highly significant, well below p=0.001. (Actually a one-tailed test would be appropriate here, but even that would reach p=0.001 for only 66 cures.) [Note by translator: The author fails to mention how Sachs seems to confuse "the chance that a sick person gets cured" with "the probability that this or a weaker result would occur if the null hypothesis were true".]
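The figures for the more precise version of the example can be checked with an exact binomial calculation. The sketch assumes, as in the text, 100 patients and a 50% chance of spontaneous recovery under the null hypothesis. The function name is made up.

```python
from math import comb

def upper_tail(k, n=100, p=0.5):
    """Exact probability of k or more cures among n patients by chance alone."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Smallest number of cures whose one-tailed p-value is at or below 0.001.
smallest = next(k for k in range(101) if upper_tail(k) <= 0.001)

print(f"P(75 or more cures) = {upper_tail(75):.1e}")
print(f"cures needed for one-tailed p <= 0.001: {smallest}")
```

So 75 cures would be far beyond any conventional significance level, nowhere near "not significant".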
This means that Sachs's example conjures up hopelessly exaggerated interpretations of significance. After this we cannot exclude the possibility that his bizarre misunderstanding is the reason for the rather unbridled astrological interpretations of what Sachs calls significant results. [The author then describes tests showing that Sachs's marriage results can be easily explained by assuming 4% of the population are influenced by astrology when making their marriage preferences, which is very small compared to the 50% or more who read newspaper horoscopes.]
Are there immortal Pisces? Or what is a sample?
Suppose we test the null hypothesis that the star sign distribution in the general population is identical with that in the death cases. From the Sachs book (pp.114-116) we take the Swiss census data of 1990, which consists of 2,731,766 births between 1925-1960. We take this as the star sign distribution in the general population, and as sample we take the already mentioned 687,850 death cases in Switzerland 1969-1994. The chi-squared test gives p=10^-8 for 11 df. For example there are 8.81% Pisces births (240,677 out of 2,731,766) but only 8.54% Pisces deaths out of all death cases.
Such a result suggests that some Pisces must be immortal. A typical critical reaction would be: statistically correct, but factually probably incorrect, which would express once more the widespread opinion that statistics is something done by dumb calculators and gardeners at number cemeteries [Zahlenfriedhofsgaertner, gardeners who work in the cemeteries where dead numbers are buried.] However, a more sensible conclusion from this example is: The set of death cases is not a random sample from the birth population, as far as the star sign distribution is concerned.
But what has this example to do with the comparison of suicides with death cases? The point is, in both cases we can see that the samples are not representative of the population to which they are being compared. This is clearly so in the case of the deaths (as nobody is immortal). And in the Sachs comparison of suicides with other deaths, the age and birth year are probably not the same. [Note by translator: True. People usually die of old age at around 75, whereas suicides are very roughly the same in all age groups, at least in the Netherlands, which makes the average suicide younger at death than a non-suicidee.] Sachs does not give information about this. Differences would not influence the comparison if the star sign distribution of births was stable across the years, as recognized by Sachs (p.107) when he states: "... the distribution of birth dates over the year is virtually the same over the years, so we can assume, for practical purposes, a relative constancy of the birth distribution across the years."
In fact, if we compare the Swiss star sign distributions for 1925-1960 with those (pp.108-110) for 1954-1976, they turn out to be highly significantly different. For example there are only 8.48% Pisces in 1954-1976 compared to 8.81% in 1925-1960. (Here a chi-squared test gives p=10^-40 for df=1, which is so small that we don't even have to worry about the overlap in years). We might agree with Sachs that for "practical purposes" we might assume a relative constancy, but not for statistical investigations, where such large numbers mean that even the tiniest and most uninteresting differences would yield significant results. Large sample sizes turn every statistical test into a sharp sword that needs much training and care in handling. The danger is that such significances based on tiny effects will lead to hasty and spectacularly wrong interpretations, for example astrological ones.
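How sample size alone drives significance can be shown by holding the difference fixed at the Pisces fractions quoted above (8.81% versus 8.48%) and varying only the number of cases. The sample sizes in the sketch are hypothetical round numbers, not the Swiss totals, and the two groups are assumed to be equally large. The test is an ordinary two-sample z-test for proportions, and the function name is made up.

```python
import math

def two_proportion_p(p1, p2, n1, n2):
    """Two-sided p-value of a pooled two-sample z-test for proportions."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return math.erfc(abs(z) / math.sqrt(2))

# Same tiny difference, HYPOTHETICAL sample sizes per group.
for n in (1_000, 100_000, 1_000_000):
    p = two_proportion_p(0.0881, 0.0848, n, n)
    print(f"n = {n:>9,} per group: p = {p:.2g}")
```

The difference of a third of a percentage point is invisible with a thousand cases per group and overwhelming with a million, although its practical importance is the same in both situations.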
These remarks do not mean that all of Sachs's significant results will turn out to be false after elimination of errors. But they do require re-investigations in which, for example, data are used only if the people have the same birth years. As we already mentioned in the section "Star signs and suicide", further investigations are even more urgent because of the "multiple test problem".
But reason can be cunning
We have also seen how easily errors can creep in to undermine the results. But we have also seen how we can avoid errors by careful re-examination and investigations with new data. It would be great if Sachs himself would use these possibilities to continue his work with an Act II. If difficult-to-explain connections between star signs and human affairs then remained, these would be real hard nuts for astro-skeptics to crack. The road from Akte to Act II would then be a nice illustration of Hegel's idea of the List der Vernunft [Cunning of Reason] which occasionally also promotes the advance of reason by small detours.