There are, Mark Twain was famously fond of declaring, three types of lies: lies, damned lies, and statistics. It's a succinct summing up of something we're all kind of aware of in our bones, even if we don't know the precise explanation for it: that statistics can't entirely be trusted – they're simply too easy to manipulate for nefarious purposes.

Prime example: Simpson's paradox. Beloved by bad statisticians who don't realize it, and very good ones who definitely do, this phenomenon is powerful enough to completely reverse correlations in the data – and all technically without telling a single lie.

So, what is it?


A table of success and failure of a treatment vs. control showing population-wide, male, and female data. Image credit: IFLScience, adapted from Stanford Encyclopedia

What is Simpson’s Paradox?

Imagine you're a doctor deciding whether or not to prescribe a certain intervention for a patient. You have the following information:

What's the obvious course of action? For both male and female subjects, the treatment does better than the control protocol, and your patient is most likely one of those two options – but combine the two groups, and it seems to be ineffective. How can both these things be true?

“Simpson's Paradox is a statistical phenomenon that occurs when you combine subgroups into one group,” statistician Jim Frost explains in a post for his website Statistics by Jim. “The process of aggregating data can cause the apparent direction and strength of the relationship between two variables to change.”
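To see how both things can be true at once, here is a minimal Python sketch. The counts are hypothetical, chosen only to make the effect visible; they are not the figures from the table referenced above. The trick is that the treatment is mostly given to the subgroup with the lower success rate, so pooling the groups drags its overall rate down.

```python
from fractions import Fraction

# Hypothetical (successes, total) counts per subgroup and arm.
# These are illustrative assumptions, NOT the article's table.
counts = {
    "male": {"treatment": (9, 10), "control": (72, 90)},
    "female": {"treatment": (30, 90), "control": (2, 10)},
}
ARMS = ("treatment", "control")


def rate(successes, total):
    """Exact success rate as a fraction."""
    return Fraction(successes, total)


# Success rate within each subgroup.
subgroup_rates = {
    group: {arm: rate(*arms[arm]) for arm in ARMS}
    for group, arms in counts.items()
}

# Success rate after pooling both subgroups into one population.
pooled_rates = {
    arm: rate(
        sum(counts[g][arm][0] for g in counts),
        sum(counts[g][arm][1] for g in counts),
    )
    for arm in ARMS
}

for group, rates in subgroup_rates.items():
    print(group, {arm: f"{float(r):.1%}" for arm, r in rates.items()})
print("pooled", {arm: f"{float(r):.1%}" for arm, r in pooled_rates.items()})
```

With these made-up numbers, the treatment wins in each subgroup (90 percent vs 80 percent for men, about 33 percent vs 20 percent for women) yet loses once the groups are combined (39 percent vs 74 percent).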

50 squares comprising 30 blue and 20 red

Feels like an obvious win for blue, right? Image credit: IFLScience

The “paradox” was first noticed back in 1899, but it wasn't until the 1970s that it got its Groeningesque moniker, when mathematician Colin Blyth named it in honour of the codebreaker and statistician Edward Simpson, who had presented a detailed analysis of the effect in a now-famous 1951 paper.

These days, understanding the phenomenon is more important than ever, as it's utilized by bad actors who want to spread misinformation about COVID-19 or vaccines, or promote unscientific and bigoted opinions. It can even be used to rig elections via gerrymandering: consider the voting pattern in the region below, where each square represents one precinct.

Obviously, there are more votes for the blue party than the red – so, given five representatives, common sense suggests three should be blue and two red. But here's a question: what if we split the precincts up like this?

![The same 50 squares split up into weird shapes such that only two divisions have a majority blue population](https://assets.iflscience.com/assets/articleNo/78616/iImg/82985/Screenshot 2025-03-29 014144.png)

Haha, thought you lived in a democracy, did you??? Fool. Image credit: IFLScience

There are still five districts, evenly distributed by population. Now, though, red has won three districts to blue's two – literally overturning the overall result.
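The arithmetic behind that flip can be sketched in a few lines of Python. This is an illustration only: it assumes five districts of exactly ten precincts each and a hypothetical split of red and blue precincts within them, and it does not reproduce the actual shapes in the map above.

```python
from collections import Counter

# Hypothetical districting of 50 precincts (30 blue, 20 red).
# Assumption: five districts of ten precincts each; the split
# inside each district is made up for illustration.
districts = [
    ["R"] * 6 + ["B"] * 4,
    ["R"] * 6 + ["B"] * 4,
    ["R"] * 6 + ["B"] * 4,
    ["R"] * 1 + ["B"] * 9,
    ["R"] * 1 + ["B"] * 9,
]

# Precinct totals across the whole region.
overall = Counter(p for district in districts for p in district)

# Each district's seat goes to whichever party holds most of its precincts.
seat_winners = [Counter(d).most_common(1)[0][0] for d in districts]
seats = Counter(seat_winners)

print("Precincts:", dict(overall))
print("Seats:", dict(seats))
```

Blue's support is packed into two lopsided districts, while red wins the other three by narrow margins, so the party with fewer precincts ends up with more seats.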

Clearly, Simpson's paradox is powerful – and far more than just a niche statistical technicality. So, what's behind it?

Why does Simpson’s paradox occur?

Life is rarely simple, and statistics even more so. Choose to ignore that, and Simpson's paradox is where you end up. “[It] happens when the process of aggregating data excludes confounding variables,” Frost explained – in other words, when you assume all data is equal, without taking into account the impact of certain other properties on your sample.

“Usually, this happens unintentionally,” Frost added. “It is shocking how easily it can happen if you don't watch out for it!”

Indeed, it's easy to do, not least because – almost by definition – a confounding variable is something you're not looking for. Say you're investigating how effective a certain intervention is at preventing death from a particular virus: you're going to straight away measure how many people receiving the treatment died from the disease versus how many didn't, and the same for some control group not receiving it. That totally makes sense – and so you might not think to stratify the data by age, or lifestyle, or medical history, even though doing so could completely change the results.

Don't believe us? No need to take our word for it: that exact situation actually occurred back in 2022, when social media memes took off claiming that getting vaccinated against COVID-19 was ineffective or even dangerous.

Obviously, it was far from the first time people had told this lie, but this time they had what appeared to be hard data backing up the assertion: in April of that year, analysis had shown that about 6 in 10 adults dying of COVID-19 were in fact vaccinated or boosted, a statistic that held strong throughout the year. Could it be true? Did being vaccinated make you 50 percent more likely to become a victim of COVID-19?

Well, no. “The relationship between being vaccinated and having a high percentage of deaths is a fiction created by aggregating data and tossing out relevant information – Simpson's Paradox,” Frost confirmed.

“In the United States, the COVID vaccinated population tends to be older and has more risk factors,” he explained. “This group naturally tends to have worse COVID outcomes. However, when you adjust for age and other risk factors, the CDC finds that COVID vaccinated and boosted individuals have an 18.6 times lower risk of dying from COVID. The vaccines are working!”
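A rough Python sketch shows how a majority of deaths can occur among the vaccinated even when vaccination lowers risk in every age group. All numbers here are hypothetical and chosen for illustration; they are not CDC data and do not reproduce the 18.6 figure Frost cites. The only assumption that matters is the one he describes: the higher-risk older group is also the more heavily vaccinated one.

```python
from fractions import Fraction

# Hypothetical (deaths, people) by age group and vaccination status.
# Illustrative assumptions only, NOT real surveillance data.
data = {
    "older": {"vaccinated": (180, 90_000), "unvaccinated": (100, 10_000)},
    "younger": {"vaccinated": (5, 50_000), "unvaccinated": (25, 50_000)},
}
STATUSES = ("vaccinated", "unvaccinated")


def risk(group, status):
    """Exact death risk within one age group and vaccination status."""
    deaths, people = data[group][status]
    return Fraction(deaths, people)


# Aggregate view: what share of all deaths were among the vaccinated?
deaths_by_status = {s: sum(data[g][s][0] for g in data) for s in STATUSES}
vaccinated_share = Fraction(
    deaths_by_status["vaccinated"], sum(deaths_by_status.values())
)
print(f"Share of deaths among the vaccinated: {float(vaccinated_share):.1%}")

# Stratified view: compare risk within each age group.
for group in data:
    ratio = risk(group, "unvaccinated") / risk(group, "vaccinated")
    print(f"{group}: unvaccinated risk is {float(ratio):.1f}x the vaccinated risk")
```

In this toy population roughly 6 in 10 deaths are among the vaccinated, yet within each age group the unvaccinated face five times the risk. The headline share mostly reflects who got vaccinated, not what the vaccine does.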

Avoiding Simpson’s paradox

Clearly, then, Simpson's paradox is something we need to be aware of – both to avoid it in our own analyses, and to be wary when others try to use it on us. Here's the problem, though: it's kind of hard to watch out for.

“The extent to which Simpson's paradox is likely to occur in data-based research is difficult to determine because what has not been tested and reported in a publication cannot be discovered easily by a reviewer,” points out one 2009 paper on the phenomenon.

“One way to investigate this matter is to examine findings across studies,” it suggests. “If there is inconsistency in the relationship between an outcome and treatment across studies, then it may be that confounding has occurred in at least some of those studies.”

Of course, a better solution is for the issue not to arise at all – but that's down to the statisticians themselves. “Simpson's Paradox is a powerful reminder of the complexities inherent in data analysis,” Frost cautions. “[It] teaches us the importance of vigilance and precision in statistical analysis, urging researchers to delve deeper into the data rather than accepting surface-level insights.”

Anyone aggregating data should be careful to “always question the data; look beyond the aggregate; [and] strive for clarity and accuracy in every dataset you encounter,” Frost advised. “By doing this, you can ensure that your study results accurately reflect the underlying trends and patterns in the data.”