Are China’s Students Really Number One?
A Statistical Riddle
by nicholas n. eberstadt
nick eberstadt holds the Henry Wendt Chair in Political Economy at the American Enterprise Institute. The author is grateful for the contributions of Patrick Norrick, Radek Sabatka and Peter Van Ness to this article.
Published January 22, 2026
In 2019, the world’s leading authority on testing and comparing international educational performance announced stunning results from its latest round of examinations of 15-year-olds: China was number one across the board.
PISA (the Programme for International Student Assessment, run under the auspices of the OECD) was so struck by the outcome that it introduced its five-volume report as follows:
Students in the four provinces/municipalities of China that participated in the study outperformed by a large margin their peers from all of the other 78 participating education systems in mathematics and science. Moreover, the 10 percent most disadvantaged students … showed better reading skills than those of the average student in OECD countries.
China’s test-takers outscored the average Western student by the equivalent of well over two grades of schooling. Even Singapore, long the global poster child for stellar student achievement, was left in the dust.
More recent PISA scores for China are not yet publicly available. China did participate in the 2022 testing, but did not release those results. We do know, though, that Singapore’s 2022 PISA numbers fell even further behind China’s 2018 scores. China thus almost certainly remains in a class of its own.
That ranking is more than just a point of national pride. “The quality of their schools today,” the PISA report underscored, “will feed into the strength of their economies tomorrow.” That conclusion comports with research by economists Eric Hanushek (Stanford) and Ludger Woessmann (University of Munich), who highlight the critical role in national economic performance of what they call “knowledge capital” – the knowledge obtained from education as opposed to sheer numbers of years in school. Simply put, better student learning today makes for greater workforce potential tomorrow.
My own research suggests that, after holding socioeconomic factors constant, a 100-point difference in mean PISA scores could make for a productivity difference of around 25 percent a decade hence, and of nearly 60 percent in 20 years if the gap remains constant. Thus China’s seemingly spectacular student achievement would not only augur favorably for the country’s continued rapid economic development, but also tilt geopolitical forces sharply in Beijing’s favor.
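The compounding arithmetic behind those figures can be sketched in a few lines of Python. The 25-percent-per-decade effect of a 100-point gap comes from the text; treating that edge as compounding across decades is the assumption that yields the 20-year figure.

```python
# Sketch of the compounding arithmetic behind the productivity claim.
# Assumption (for illustration): a 100-point PISA gap is worth roughly
# a 25 percent productivity edge per decade, and the edge compounds.
def productivity_edge(score_gap: float, decades: float,
                      edge_per_100_per_decade: float = 0.25) -> float:
    """Cumulative fractional productivity advantage from a sustained PISA gap."""
    per_decade = (1.0 + edge_per_100_per_decade) ** (score_gap / 100.0)
    return per_decade ** decades - 1.0

print(f"{productivity_edge(100, 1):.0%}")   # one decade
print(f"{productivity_edge(100, 2):.0%}")   # two decades
```

This reproduces the article’s figures: a 25 percent advantage after one decade and about 56 percent – “nearly 60 percent” – after two, if the gap holds.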
But are China’s test results too good to be true? Answering this question – the subject of this article – turns out to be much more complicated than one might think.

Opaque and Irreproducible
While China’s stunning PISA test scores have raised some eyebrows abroad, they have been repeatedly defended by PISA administrators. Does that not mean they are beyond scrutiny?
Not quite. For one thing, PISA administrators and their partners at the Chinese Ministry of Education do not attempt to cover the entirety of China with their achievement testing. Instead, they have experimented with a shifting constellation of smaller, more limited geographical configurations in representing “China,” as seen in the table below.
Even a quick perusal of these headline figures raises questions. PISA’s 2018 sample, for example, earned higher scores than those recorded in 2009 for Shanghai alone – even though Shanghai is the country’s most developed province and accounted for barely an eighth of PISA 2018’s tested “China” agglomeration. Further, between PISA’s 2015 and 2018 rounds of testing, “China” students reported an overall leap in mean scores of 65 points – approximately the same gap in scores as that separating the Netherlands from Mexico in PISA 2018.
Neither China nor PISA permits outsiders to work with the raw data, or even to see the test scores except aggregated across all four provinces. So it is impossible to compare test performance for, say, Beijing in 2015 and 2018. Likewise, it is impossible to chart longer-term performance for Shanghai between 2009 and 2018, even though that municipality was tested in each wave.
Note, moreover, the peculiar changes to the roster of provinces representing China between 2015 and 2018. By swapping out Guangdong (the country’s most populous province) for Zhejiang (a richer and much smaller province), PISA and the Ministry of Education opted for a smaller, less representative set of Chinese provinces for 2018 than the already unrepresentative one it had been working with in 2015. Why?

Although PISA 2018 ostensibly covered a swath of China accounting for more than 180 million persons, in actuality the sample involved just 361 schools and 12,000 students – about 33 students per school. We do not know the process by which those schools were selected, much less the identities of the schools or the protocols observed in the tests.
Even without better access to China’s student aptitude data, though, a number of anomalies and curiosities from the reported results practically scream for closer attention. Consider a comparison between PISA’s sample of China and that of Massachusetts, America’s top-scoring state.
In 2015, Massachusetts’s mean scores exceeded China’s in reading and science. By 2018, however, a redefined China sample vaulted ahead of Massachusetts. Where Massachusetts’s scores did not change dramatically over the short time span under consideration, China’s surged. And between 2015 and 2018 China’s share of low-performing students plummeted from 43 percent to 19 percent in reading, from 32 percent to 9 percent in math and from 38 percent to 10 percent in science. Nothing like these gyrations has been recorded elsewhere in PISA – ever.
Then there is the matter of performance across socioeconomic subgroups of the population. PISA allows analysis (at the overall “national” level for China) of testing differences by socioeconomic strata and household wealth. What occurred between 2015 and 2018 was most remarkable: in the course of just three years, the performance of students from the lowest wealth quintile soared. In fact, by 2018 there was virtually no difference between test results for the poorest and the richest test-takers in the sample.
The same was true for socioeconomic status – witness the leap for the least advantaged in reported China data. Here, too, the least-advantaged have come to perform like the most advantaged in just three miraculous years!

We compared these patterns of sudden social equity in pupil achievement with evidence from a variety of other countries including Brazil (another country with vast geographic expanses and some widely discussed socioeconomic differentials) and Vietnam (a poor Asian country that continues to perform far above socioeconomic expectations in PISA). But nothing quite like China’s implied social revolution between 2015 and 2018 shows up in those other countries with more transparent data.
What to make of all this? We are inclined to believe that the Confucian tradition places a premium on study and academic excellence. We are prepared to believe that overall academic achievement could thus appear more favorable for China than our own statistical analysis based only on a country’s socioeconomic indicators might predict. (That fits Vietnam.) But some of the results strain credulity. And it does not help that they are unverifiable.
Test-Cheating and Test-Beating
Any investigation of test performance in China that does not acknowledge the extraordinary role of cheating is going to be missing an important aspect of what I call meritocracy with Chinese characteristics.
Inseparable from China’s ancient tradition of merit-based examinations is the tradition of gaming tests for personal advantage. Widespread cheating on the Imperial examination for civil service jobs goes back centuries – tiny cheat sheets with tens of thousands of characters have survived from the Ming and Qing dynasties. And from at least the Song dynasty (AD 960-1279), which saw the advent of moveable type printing and thus something like mass publishing in China, Imperial examiners were preoccupied with preventing cheating.
Widespread test-cheating remains a fact of life in China today, practiced by enterprising students on an individual basis but sometimes involving large rings of teachers and high officials. Even when not technically cheating, gaming the system to perform better on tests has been worked to a fine science, and appears at times to be not only tolerated but admired.
Of course, China is far from the only country in which test-cheating is widespread: India and Indonesia are also among the numerous national competitors for this distinction. The point here is not that China is unique, but rather that drawing inferences about “knowledge capital” from exam results in China risks appearing naïve if one ignores the cheat factor.
The PISA test may just be an example of an academic exercise in which officialdom in China is highly incentivized to “over-perform” – not to break the rules of PISA testing outright, but to bend them as far as possible in the service of harvesting higher scores than students would have otherwise earned. Anecdotal reports about local schools in China that massaged the rules so students might “do their best” are not unknown. I personally was told about a 15-year-old in Shanghai who happened to be selected for PISA testing. As recounted, the school spent weeks “teaching to the test.” The classroom was reportedly even favored with a visit from Shanghai’s deputy mayor, who emphasized to the children how important this test was.
Another source flagged the possibility of test-beating selection bias, suggesting that PISA’s results come not only from students within the most affluent provinces, but also from a nonrandom sample of students chosen to paint the country in the best possible light. We cannot know how much test-beating affected the PISA results. However, there are clues that can help us appreciate the general magnitude of the distortion.
Consider a parallel example: the test-prep industry in the United States – the tutors who help high school juniors and seniors earn better scores on the SAT for college admission. Test-prep tutors do not know what will be on the upcoming rounds of SAT tests. But they have mastered the art of SAT test-taking, and they earn their pay because they can reliably raise tutees’ scores. The Kaplan Test Prep company, for example, reportedly promises that it can raise tutees’ total SAT scores by half a standard deviation – roughly equivalent to closing 50 points of the 60-point difference in PISA scores between Massachusetts in 2015 and China in 2018.
It could well be that, absent test-beating, China’s students from its most privileged provinces would perform capably, even impressively, against their OECD counterparts. But more faithfully administered test results for the provinces of China included in the PISA scores would almost certainly have shown much lower scores than those officially recorded.

Educational Realities in China's Hinterlands
We cannot know just how well PISA protocols represented the less advantaged rural population in the four provinces tested in 2018. What we do know is that the billion-plus population of the provinces where PISA did not test in 2018 includes a vast contingent of poorer Chinese.
Despite long-standing efforts to uplift the hinterlands and to reduce rural poverty, official policy in China still discriminates against farmers and peasants in myriad ways. Working with Chinese researchers, Stanford University economist Scott Rozelle and his colleagues at the Rural Education Action Program (REAP) have produced detailed analyses of conditions for rural China’s children that have surprised both Western and Chinese audiences.
In a particularly arresting 2017 study, Rozelle et al. administered the PIRLS exam (a standardized achievement test designed by the International Association for the Evaluation of Educational Achievement, or IEA) to rural fourth-grade students in Guizhou, Jiangxi and Shaanxi provinces. Jiangxi and Guizhou are among China’s poorest provinces, with Guizhou ranked lowest except for Tibet.
The REAP study found that these students were testing at the very bottom of the PIRLS international roster. If their performance could be said to represent rural China, then rural China was performing behind not only all Western countries tested but also all tested developing countries including Indonesia and Morocco. More developed Shaanxi province fared slightly better, ranking above Indonesia and alongside Colombia and Qatar – but Jiangxi and Guizhou ranked behind Morocco, the country with the lowest PIRLS score, and the Guizhou rural students came in dead last.
Rozelle and his co-authors thus describe a very different China from the one depicted in PISA tests. Rozelle calls it “invisible China” in a book by that name, a China still overlooked by educational authorities in both China and the West.
Knowledge Capital in China
World Bank research on knowledge and skills for China’s pupils, along with analysis of the reliability of PISA’s metrics, originated with the bank’s Human Capital Index, first published in 2018 and updated in 2020. The index is a synthetic measure, intended to offer a single overview number for a country’s human capital endowment, built from a number of statistical indicators.
One is harmonized learning outcomes (HLO) – an indicator that combines results from the three major international standardized pupil achievement tests with regional assessments from a variety of sources. These several thousand national and subnational observations, covering well over 95 percent of the world’s population, were brought into correspondence across datasets with “conversion factors” intended to harmonize scores into a single mega-set for global student achievement.
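The “conversion factor” idea can be illustrated with a toy linear rescaling: countries that sat both a reference test and a regional test pin down a ratio of means, which is then applied to countries observed only on the regional test. The country labels and scores below are invented for illustration; the World Bank’s actual procedure is more elaborate.

```python
# Toy illustration of score harmonization via a "conversion factor".
# Countries A and B appear in both datasets; C only in the regional one.
ref_scores = {"A": 500.0, "B": 450.0}                 # reference-test scale
regional_scores = {"A": 60.0, "B": 54.0, "C": 48.0}   # regional-test scale

# Ratio-of-means conversion factor, estimated on the overlapping countries.
overlap = ref_scores.keys() & regional_scores.keys()
factor = (sum(ref_scores[c] for c in overlap)
          / sum(regional_scores[c] for c in overlap))

# Place every regional-only country on the reference scale.
harmonized = {c: score * factor for c, score in regional_scores.items()}
print(harmonized["C"])
```

The design choice worth noting is that the factor is only as good as the overlap sample: if the overlapping countries are unrepresentative, every converted score inherits that bias.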
The HLO team examined PISA scores for China but found them unrepresentative and potentially misleading. They expanded their dataset for China to include the aforementioned Rozelle et al. study on student achievement in rural China. Using conversion factors to merge the China observations – along with PISA, PIRLS and socioeconomic data on China’s provinces – the team modeled the all-China mean level of academic achievement for pre-college boys and girls.
Counterintuitively, the 2018 super-high PISA results for China led the World Bank team to lower their all-China HLO number due to the steeper implied slope connecting the performance of more developed regions of China to those of less developed regions.
If accurate, what would the latest all-China score in the HLO database signify? For one thing, it would mean China’s national level of student aptitude was over 130 points lower than PISA numbers suggested – the equivalent of three years less school achievement. It would also put China well below (rather than far above) all Western countries. And it would place China roughly between Mexico and Turkey.

The bank also came up with a new statistical construct it called learning poverty, intended to focus attention on the lack of basic knowledge and skills early in life for boys and girls. It estimated the proportion of 10-year-olds who cannot read a text (or in practice, often a single sentence).
Estimates of the prevalence of learning poverty combine the fraction of fourth-grade students (or 10-year-olds) who cannot manage basic reading with the proportion who are not in school at the grade-four level (and thus presumed to be unable to read a text). Data on reading proficiency for fourth graders come from either standardized achievement tests (such as PIRLS) or from national large-scale assessments.
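As described, the learning poverty rate is a simple composition: out-of-school children count as learning-poor outright, while in-school children count at the rate at which they miss the reading benchmark. A minimal sketch, with made-up input shares:

```python
def learning_poverty(below_benchmark_share: float,
                     out_of_school_share: float) -> float:
    """Share of ~10-year-olds who are 'learning poor'.

    Out-of-school children are presumed unable to read a text;
    in-school children contribute at the below-benchmark rate.
    """
    in_school_share = 1.0 - out_of_school_share
    return out_of_school_share + in_school_share * below_benchmark_share

# Hypothetical example: 15% of enrolled fourth graders below the
# reading benchmark, 4% of the age cohort out of school.
print(learning_poverty(0.15, 0.04))
```

With these illustrative inputs the rate is 18.4 percent; the real estimates depend entirely on the quality of the underlying assessment and enrollment data.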
Intriguingly, China participated in the learning poverty report, providing domestic data not available through other auspices. It is impossible for outsiders to know just how comprehensive or representative these numbers actually were. But they indicated a very different situation for Chinese pupils’ knowledge and skills from the profile offered by the PISA 2018 report.
According to the World Bank, 18.2 percent of Chinese children could not read a basic text in 2016. That compares with 2.8 percent for Singapore in 2016 – the country China supposedly outperformed across the board in reading, math and science in PISA 2018. For the U.S. in 2016, the corresponding learning poverty estimate was 4.9 percent.
If accurate, China’s learning poverty rate would be half as high as expected for a country of its income level, and about a third as high as India’s (55 percent). On the other hand, China’s learning poverty would still be far higher than for developed countries, and utterly inconsistent with PISA estimates of student achievement in China.
It is also informative to compare World Bank estimates for learning poverty with PISA 2018 mean academic achievement scores. China, not surprisingly, is a glaring outlier. One of the most interesting comparisons is between China and Turkey. According to the bank, learning poverty is somewhat lower in Turkey – 15 percent versus China’s 18 percent. But once again, Turkey and China look to be in the same league.
Basic Skills as a Metric
The HLO methodology for “harmonizing” unrelated test score datasets has its critics. But Stanford’s Eric Hanushek and his German colleagues Sarah Gust and Ludger Woessmann came up with an ingenious end-run around that harmonizing issue, arriving at a simpler common metric for judging “knowledge capital” for pupils all around the world.
They ignore the scores in these various datasets and focus instead on the skill levels indicated. These tests, they explain, work with a commonly agreed conception from international educators for judging skill levels – a proficiency scale ranging from basic (Level 1) through advanced (Levels 5 and 6).
They examine performance in math and science only, leaving out reading competence, and compute the proportion of tested students who fall below basic proficiency. Then they add the proportion of students who do not complete basic pre-college schooling (presuming that this additional group, by definition, lacks basic skills). The sum of these two shares is the Gust et al. figure for the proportion of a society’s youth who lack basic skills.
In attempting to assess the true basic-skill level for pupils across the whole of China, they confront the same dilemmas everyone else faces in wrestling with PISA test results. They decided to take the PISA 2018 China results at face value, accepting the scores as authentic representations of the capabilities of randomly selected students in those provinces.
Given the astronomical China scores from PISA 2018 and the high levels of compulsory school enrollment in China, they calculate that just 18 percent of these tested youths lack basic skills – better than their corresponding estimates for most OECD countries. But then Gust et al. make an adjustment. They posit that their figure only pertains to urban China – and that rural China, which they say contains 65 percent of China’s pre-college pupil population, remains untested.
To estimate an all-China number, they take a “sensitivity analysis” approach – “what if ” calculations based on bounding the possible estimate on the high end for rural China’s performance with the top performance for rural pupils in a largely rural low-income East Asian country, and also at the low end by the poorest performance by such a country. For the high bound they pick Vietnam, for the low bound, Cambodia. In rural Vietnam, an estimated 19 percent of pupils lacked basic skills, while the corresponding figure in Cambodia was about 95 percent.
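The bounding arithmetic can be reproduced directly. The 35/65 urban/rural split, the 18 percent urban figure and the Vietnam and Cambodia rural rates come from the text; weighting them linearly is my reading of the Gust et al. setup rather than their published code, and it lands close to their reported range.

```python
# Bounds on the all-China share of pupils lacking basic skills, as a
# population-weighted average of urban and rural rates (illustrative).
URBAN_SHARE, RURAL_SHARE = 0.35, 0.65   # pre-college pupil population split
urban_lacking = 0.18                     # PISA-2018-derived urban figure

for label, rural_lacking in [("best case (Vietnam-like rural)", 0.19),
                             ("worst case (Cambodia-like rural)", 0.95)]:
    all_china = URBAN_SHARE * urban_lacking + RURAL_SHARE * rural_lacking
    print(f"{label}: {all_china:.1%}")
```

The weighted averages come out near 19 percent and 68 percent, roughly bracketing the 19-to-69 percent range reported in the study.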
Based on those parameters, Gust et al. calculate that all-China’s true share of pupils lacking basic skills would fall somewhere between 19 percent and 69 percent. This calculated range is unsatisfyingly wide – after all, at roughly a 19 percent share of pupils lacking basic skills, China would keep company with Denmark and the Netherlands, while at 69 percent its company would include Iraq and Indonesia. However, if we split the difference – assume, say, 35 percent – China would end up worse than Europe (28 percent) but distinctly better than the 42 percent average for upper-middle-income countries, the World Bank income grouping in which China falls.
As coincidence would have it, 35 percent also happens to be the Gust et al. estimate for Turkey’s share of youth without basic skills. We are drawn to this comparison, not least because our own previous statistical modeling and the World Bank’s HLO dataset arrive at a rough equivalence between Turkey and China in “knowledge capital” for the precollege population.
What All This Means
Too much speculation and not enough hard statistical evidence? This excursion through the data thickets certainly demonstrates how difficult it is to assess the actual state of knowledge capital for youth in contemporary China. China is a vast and highly variegated country, and the released results from internationally standardized achievement tests cover only a small fraction of its population. Moreover, there are important unanswered questions about even these results – and answers are elusive because PISA cannot be as open in China as it is elsewhere if it hopes to continue collaborating with Beijing.
It is possible that China as a whole is outperforming other developing economies at its income level with respect to academic achievement. It is also possible that overall aptitude for students in China is similar to that of students in Turkey, a country at roughly China’s level of socioeconomic development.
These alternatives would have very different implications for China – and the world. China is a mighty presence in economics, technology and geopolitics, but the proverbial 600-pound gorilla may weigh in closer to 300 pounds. Only further research – conducted under conditions that remain unfavorable for outsiders, at least for now – could cast more light on this important matter.
Beijing, of course, could clear up the mystery of just how skilled its rising cohorts of young people are if it wanted to. But that would entail sharing sensitive information with audiences both at home and abroad, something an autocracy that increasingly prioritizes information control is loath to do. For that matter, notwithstanding their aspirations to lead a true surveillance state, it is not clear that Chinese authorities themselves have a nuanced understanding of the knowledge and skills of their student population.
Ironically, China’s breathtaking PISA scores could make it more difficult for China to open up about nationwide student performance. Consider the parable of India’s engagement with PISA testing.
In a 2009 test-drive, India administered PISA tests in two states – Himachal Pradesh and Tamil Nadu – but the readings were disastrously low, almost at the very bottom of all populations tested. India withdrew from PISA after this embarrassment, and although New Delhi eventually agreed to rejoin, it has yet to participate in another round of PISA exams.
Where Indian officials were disincentivized from going nationwide with international achievement testing by embarrassingly bad regional scores, authorities in China could face a parallel dilemma in allowing greater testing coverage because their most recent results are incredibly high. After the 2018 PISA report came out, China’s Ministry of Education sent out a victory-lap press release hailing its findings. Who wants to be the education minister to do the climb-down release if and when a more accurate national assessment is conducted?