
From Newton to Society Prediction

Simulation · Physics · AI · Philosophy

To what extent can we predict the world?

I. An Ancient Question

When the ancient Chinese looked up at the sky, they didn't see physics — they saw politics. A solar eclipse foretold a ruler's moral failing; a comet signaled turmoil to come. The idea of tianren heyi (天人合一, "the unity of heaven and humanity") was not just philosophy — it was a foundational assumption of governance. I felt this acutely when I later read Kangxi's Red Ticket (康熙的红票) by Sun Litian: the calendar dispute at the Imperial Astronomical Bureau was, at its core, a power struggle. Whoever could predict celestial events more accurately held the political narrative.

Then Newton arrived. What he did was, in hindsight, remarkably simple: he removed humanity from nature. The sun's motion has nothing to do with the emperor's virtue. Planetary orbits don't care about human joy or sorrow. Once that separation was made, nature became precisely predictable — not vague "heavenly will," but equations accurate to the sixth decimal place.

Around 2018, I began reading intensively about the history of science. Yang Zhenning (杨振宁) has long asked why modern science did not originate in China. Wu Guosheng (吴国盛), a professor of the history of science at Tsinghua, made me realize that "science" as a way of understanding the world is not self-evident — it is the product of specific historical conditions. I had received roughly ten years of systematic physics training, from undergraduate through PhD, yet it was only after reading these works that I first seriously asked myself: what was the essence of all that training?

The answer led back to Newton. That separation — between the human world and the objective world — was so profound that we have spent three centuries digesting its consequences. The division between the humanities and the sciences, the modern university system, the Industrial Revolution — all of them rest upon it.

II. The Expanding Territory of Prediction: From Nature to Humans

But human ambition does not stop at nature. If the natural world can be predicted, what about people?

I first experienced this personally while working on Apple's Display Team. We studied the human eye's ability to discriminate color differences and its sensitivity to irregular patterns. The findings were striking: the human visual system is far more precise and rule-governed than everyday experience would suggest. We could determine exactly how much color deviation would be noticeable to a user, and set quality metrics accordingly. Human perception is not some elusive "subjective experience" — it can be quantified, modeled, and predicted.
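To make the color-difference side of this concrete: as an illustrative sketch (not Apple's actual metric), the classic CIE 1976 formula measures perceptual color difference as Euclidean distance in CIELAB space, and a commonly cited just-noticeable difference is on the order of ΔE ≈ 1–2.

```python
import math

def delta_e_76(lab1, lab2):
    """CIE 1976 color difference (ΔE*ab): Euclidean distance
    between two colors given as (L*, a*, b*) tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(lab1, lab2)))

# A deviation of ΔE = 5 between two otherwise identical grays
# would be clearly visible to most observers.
delta_e_76((50.0, 0.0, 0.0), (50.0, 3.0, 4.0))  # → 5.0
```

Production display metrics use more sophisticated formulas (e.g., ΔE2000), but the principle is the same: perception becomes a number you can threshold.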

Later, at Meta, the object of prediction escalated from perception to behavior. I worked on ads recommendation models — essentially using signals like time, location, and browsing history to predict whether a user would click on a given ad. The data gave a striking answer: for certain sub-cohorts, CTR calibration approached 100%. Judging from Meta's ads revenue, feed ranking, and short-video recommendation performance, this predictive power has been validated repeatedly at industrial scale. Individuals may have randomness, but group behavior in the digital world is far more regular than we tend to assume.
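To make "calibration" concrete: within a cohort, it compares the sum of predicted click probabilities to the observed click count; a ratio near 1.0 (i.e., near 100%) means the model's probabilities match reality on average. A minimal sketch with hypothetical record tuples, not Meta's actual pipeline:

```python
from collections import defaultdict

def calibration_by_cohort(records):
    """records: iterable of (cohort, predicted_ctr, clicked) tuples.
    Returns cohort -> (sum of predicted CTRs) / (observed clicks).
    A value near 1.0 means the model is well calibrated on average."""
    predicted = defaultdict(float)
    clicks = defaultdict(int)
    for cohort, p, clicked in records:
        predicted[cohort] += p
        clicks[cohort] += int(clicked)
    return {c: predicted[c] / clicks[c] for c in predicted if clicks[c] > 0}

# Two impressions predicted at 50% each, one actual click:
# predicted clicks (1.0) / observed clicks (1) = 1.0 → perfectly calibrated.
calibration_by_cohort([("us_mobile", 0.5, True), ("us_mobile", 0.5, False)])
```

Calibration says nothing about ranking quality on its own, which is why it is tracked alongside metrics like AUC; but it is the property that makes predicted probabilities trustworthy as probabilities.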

From Newton to Apple to Meta, the territory of prediction has kept expanding: first nature, then human perception, then human behavior.

III. The Wall: From Individual to Society

So what about human society?

This question has followed me since my time at Meta. I read Western philosophy; I think about what makes Chinese society so distinctive. Across all of this reading and reflection, I've repeatedly felt a kind of dissatisfaction: discussions of social policy seem perpetually stuck in a quasi-religious mode of argument — a policy is declared correct, but behind it there is neither the rigorous derivation of physics nor repeated empirical validation.

In Meta's ads system, every policy change must pass a rigorous A/B test before launch. So why, when it comes to social policies that affect hundreds of millions of people, can't we have a similar verification mechanism? Why can't we run "Policy A vs. Policy B" and quantitatively observe their respective consequences?
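The statistical core of such an A/B comparison is small. As a generic sketch (not Meta's tooling), a standard two-proportion z-test is enough to ask whether the outcome rates under policy A and policy B differ by more than chance:

```python
import math

def two_proportion_ztest(success_a, n_a, success_b, n_b):
    """Two-sided z-test for the difference between two proportions
    (e.g., conversion rate under policy A vs. policy B).
    Returns (z statistic, p-value)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)  # pooled rate under H0
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Identical observed rates → z = 0, p-value = 1 (no evidence of a difference).
two_proportion_ztest(50, 1000, 50, 1000)
```

The hard part of policy experiments was never this arithmetic; it is constructing comparable treatment and control groups at societal scale, which is exactly where a simulator would come in.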

It's not that we don't want to. It's that we don't yet have a good enough simulator.

IV. The Original Sin of Simulation

I experienced this dilemma acutely during my PhD, when I did first-principles simulation. Starting from quantum mechanics, I used Density Functional Theory (DFT) to simulate material properties. The theoretical foundation was as solid as it gets — the creators of DFT won the Nobel Prize. Yet there's a joke that circulates among simulation researchers:

When you do simulation, nobody believes your results — but you do, because you get the same answer every time you run it.

When you do experiments, everybody believes your results — but you don't, because you know that running it again might give you something different.

Behind the joke is a serious question: where does simulation's credibility come from? No matter how elegant the theory, between the fundamental equations and the final result lie countless layers of approximation and parameter choices. When you discover that you can "improve" your results by tuning parameters, credibility becomes suspect.

The way we built confidence back then was, in a sense, quite simple: match measurable macroscopic quantities — phase transition temperatures, lattice parameters. Only when we refrained from artificially adjusting simulation parameters and these macroscopic quantities still aligned with experiment did we have the confidence to interpret predictions that experiments had not yet verified.

Not by tuning, but by alignment. That was the most important methodological lesson I took from first-principles simulation.

V. What LLMs Changed

It was with this methodological lens that I read the Stanford "Smallville" paper in 2023, and my reaction was mixed.

On one hand, I knew this was what I had been waiting for. With LLMs as the "brain" of agents, simulation's "people" had, for the first time, a sufficiently rich behavioral repertoire. In previous agent-based simulations, agents' behavioral rules were hand-designed — researchers essentially encoded their own assumptions, much like manually tuning parameters in a first-principles simulation. LLM-driven agents are different: their behavior emerges from large-scale corpus training, possessing a complexity that was never explicitly programmed. This is not an incremental improvement. It is a qualitative shift.

On the other hand, the old question immediately surfaced: these qualitatively reasonable social dynamics — information flow, relationship evolution, cooperation and conflict — how much predictive capability do they actually have? The distance between "looks reasonable" and "can make reliable predictions" may be far greater than we imagine.

VI. The Next Frontier of Prediction

Yet I believe this problem is solvable — at least in a statistical sense.

What first-principles simulation taught me is this: you don't need to perfectly reproduce every microscopic detail. You need to find your boundary, identify measurable macroscopic quantities, and then iterate the simulator until alignment is achieved. Multi-agent simulation can walk the same path — define the boundary, find the observables, calibrate and validate against real data.
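As an illustrative sketch of that loop, assuming a single tunable parameter and a macroscopic observable that varies monotonically with it (all names here are hypothetical), calibration reduces to root-finding: adjust the parameter until the simulated quantity matches the measured target, then freeze it.

```python
def calibrate(simulate, target, lo, hi, tol=1e-3, max_iter=50):
    """Bisection search for a simulator parameter theta in [lo, hi]
    such that simulate(theta) matches a measured macroscopic target.
    Assumes simulate is monotone increasing in theta on [lo, hi]."""
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        value = simulate(mid)
        if abs(value - target) < tol:
            return mid
        if value < target:
            lo = mid  # observable too low → raise the parameter
        else:
            hi = mid  # observable too high → lower the parameter
    return (lo + hi) / 2

# Toy "simulator": observable = theta^2; measured target = 4.0.
# Calibration recovers theta ≈ 2.0 without ever tuning against the
# unobserved quantities we actually want to predict.
calibrate(lambda theta: theta * theta, 4.0, 0.0, 10.0)
```

Real calibration problems are multi-dimensional and noisy, so in practice this becomes an optimization or Bayesian-inference loop rather than bisection; but the discipline is identical: fit only to held-out macroscopic observables, never to the predictions you intend to trust.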

The road is long, but the direction is clear.


Below is the original Chinese draft, which I wrote first; the English above is a translated and polished version.


人类能在多大程度上预测这个世界?

一、一个古老的问题

古代中国人仰望星空,看到的不是物理,而是政治——日食预示君王失德,彗星意味天下将乱。"天人合一"不仅是哲学,更是治国的基本假设。我后来读《康熙的红票》,对此感受尤深:钦天监的历法之争本质上是一场权力斗争,谁能更准确地预测天象,谁就掌握了政治话语权。

然后牛顿出现了。他做的事情其实极其简单:把人从自然中拿走。太阳的运行与皇帝的德行无关,行星的轨道不关心人间悲喜。一旦完成这个分离,自然就变得可以被精确预测——不是模糊的"天意",而是可以算到小数点后第六位的方程。

2018 年前后,我开始密集地阅读科技史。杨振宁先生反复追问近代科学为什么没有在中国产生;吴国盛教授在清华研究科学史,让我意识到"科学"作为一种认识世界的方式并非理所当然。我接受了从本科到博士大约十年的物理学训练,但直到读了这些书,才第一次认真地问自己:这十年训练的本质是什么?

答案回到了牛顿。那个分离是如此深刻,以至于我们花了三百年来消化它的后果——文理分科、现代科学体系、工业革命,都建立在这个分离之上。

二、预测的疆域在扩张:从自然到人

但人类的野心不会止步于自然。既然自然可以被预测,那人呢?

我在 Apple 的 Display Team 工作时,第一次切身体会到这一点。我们研究人眼对色彩差异的分辨能力、对不规则图案的敏感程度,发现人的视觉系统远比日常感受到的要精密和规律。我们可以准确地知道多大的颜色偏差会被用户察觉,然后基于此设定各种 quality 指标。人的感知不是不可捉摸的"主观体验"——它是可以被量化、被建模、被预测的。

后来到了 Meta,预测的对象从感知升级到了行为。我做广告推荐模型,本质上是通过用户的时间、地点、浏览历史等信号,预测他会不会点击一条广告。数据给出了让人震惊的回答:对于一些 sub-cohort,CTR 的 calibration 接近 100%。从 Meta 的广告营收到 feed ranking、短视频推荐的表现来看,这种预测力已被工业级的规模反复验证。个体或许有随机性,但群体行为在 digital world 里,比我们以为的要规律得多。

从牛顿到 Apple 到 Meta,预测的疆域一直在扩张:先是自然,然后是人的感知,然后是人的行为。

三、那堵墙:从个体到社会

那么人类社会呢?

这个问题从我在 Meta 工作时就一直跟着我。我读西方哲学,也关注中国社会的特殊性。在这些阅读和思考中,我反复感受到一种不满足:对于社会政策的讨论,我们似乎总停留在一种近乎宗教式的论证方式上——某个政策被宣布为正确的,但背后既没有物理学式的严格推导,也没有反复的实证验证。

在 Meta 的广告系统里,任何一个策略调整上线前都要经过严格的 A/B test。那为什么面对影响数亿人的社会政策,我们不能有类似的验证机制?为什么不能跑"政策 A vs 政策 B",定量地观察各自带来的后果?

不是不想,是我们还没有一个足够好的 simulator。

四、Simulation 的原罪

我在 PhD 做 first-principles simulation 时就深刻体会过这个困境。我从量子力学出发,用密度泛函理论(DFT)来模拟材料性质。理论基础不可谓不扎实——DFT 的提出者拿了诺贝尔奖。但做 simulation 的人之间流传着一个笑话:

做 simulation 的人,所有人都不信你的结果,但你自己信——因为你算一万遍都是那个结果。

做实验的人,所有人都信你的结果,但你自己不信——因为你知道再做一遍可能就不是这个结果了。

笑话背后是一个严肃的问题:simulation 的 credibility 从哪里来?理论再漂亮,从基本方程到最终结果之间隔着无数层近似和参数选择。当你发现可以通过调参来"改善"结果时,credibility 就变得可疑了。

我们当时建立 confidence 的方法说起来很朴素:去匹配可测量的宏观参数——相变温度、晶格参数。只有当我们不人为地调整参数,而这些宏观量都能和实验对上时,才有信心去解读那些实验尚未验证的预测。

不靠调参,靠对齐。这是我从 first-principles simulation 中学到的最重要的方法论。

五、LLM 改变了什么

带着这个方法论的视角,2023 年读到斯坦福小镇论文时,我的感受是复杂的。

一方面,我知道这就是我一直在等的东西。LLM 作为 agent 的"大脑",第一次让 simulation 中的"人"有了足够丰富的行为空间。之前的 agent-based simulation,agent 的行为规则是人为设定的——本质上是研究者把自己的假设编码进去,就像在 first-principles simulation 里手动调参。而 LLM-driven agents 的行为涌现自大规模语料训练,具备了某种未曾预设的复杂性。这不是量变,是质变。

但另一方面,我脑子里立刻浮现的是那个老问题:这些看起来合理的社会动态——信息流动、关系演化、合作与冲突——到底有多少 predictive capability?"看起来合理"和"能做出可靠预测"之间的距离,可能比我们想象的要远得多。

六、预测的下一个边界

然而我相信这个问题在统计意义上是可解的。

First-principles simulation 教会我的是:你不需要完美地重现每一个微观细节。你需要找到你的 boundary,找到可测量的宏观量,然后不断迭代 simulator 直到对齐。Multi-agent simulation 可以走同样的路——定义 boundary,找到 observable,用真实数据去校准和验证。

这条路很长,但方向是清晰的。