On First Principle

Posted on 2024-12-29 in Tech

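A few days ago I had a bit of a back-and-forth with hls, who then recommended this video to me. I watched it with a critical eye, the same way I used to review papers: pick a side first, then go looking for arguments. So most of what follows is biased, and may well be nitpicking.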

Let me state my understanding of First Principle up front: where we need to innovate, pick suitable assumptions to define the problem, then solve it end-to-end.

How to define the problem? (soundness of the assumptions)

First Principle says that to solve a problem we should reason from basic axioms, not from analogy or experience:

[…] You kind of boil things down to the most fundamental truths, and say ok what do we sure is true or as sure as possible is true, and reason up from there. […]

[…] people take too many things as they assume too many things to be true without a sufficient basis in that belief. It's very important that people closely analyze what is supposed to be true […]

My translation: pick suitable assumptions to define the problem.

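In the video Elon gives a battery example: the lower bound on battery cost is the sum of what you would pay on the market for the chemical raw materials that go into it. I think this example can be pushed further. Why buy from the market at all? Wouldn't it be more efficient to buy ore and refine it yourself? Some byproducts of production might be useful too. Or go one step further: why does an electric car need a standalone battery? Could we merge the battery and the motor into one?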

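(hls says this is a trade-off between cost and feasibility. The steel industry has been around for centuries, competition is fierce, margins are thin, and there is very little room for imagination, so putting money there does not pay off. This judgment depends on knowing the steel industry. Even so, my paragraph above was meant somewhat sarcastically.)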
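To make the battery bound concrete, here is a minimal sketch of "cost floor = sum of raw material costs". Everything in it is an assumption for illustration: the material list, the quantities per kWh, and both price tables are made-up placeholders, not real market data, and `material_cost_floor` is a hypothetical helper name.

```python
# Hypothetical bill of materials: kg of each raw material per kWh of battery.
# All numbers are placeholders chosen for easy arithmetic, not real data.
BILL_KG_PER_KWH = {
    "nickel": 0.5,
    "cobalt": 0.25,
    "lithium": 0.125,
    "graphite": 1.0,
}

# Assumption 1: every material is bought refined on the open market (USD/kg).
MARKET_PRICE = {"nickel": 20, "cobalt": 32, "lithium": 40, "graphite": 2}

# Assumption 2: we buy ore and refine in-house, so the effective USD/kg is lower.
# (Refining capex and R&D are ignored here, which is itself an assumption.)
IN_HOUSE_PRICE = {"nickel": 16, "cobalt": 24, "lithium": 32, "graphite": 2}


def material_cost_floor(bill, prices):
    """Return the sum of quantity * unit price over all raw materials."""
    return sum(qty * prices[name] for name, qty in bill.items())


market_floor = material_cost_floor(BILL_KG_PER_KWH, MARKET_PRICE)
in_house_floor = material_cost_floor(BILL_KG_PER_KWH, IN_HOUSE_PRICE)
print(f"floor when buying on the market: {market_floor} USD/kWh")
print(f"floor when refining in-house:    {in_house_floor} USD/kWh")
```

The point of the two price tables is that the "lower bound" is only a lower bound relative to the sourcing assumption you picked; change the assumption and the floor moves.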

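This example also assumes that R&D cost is negligible~~, which is probably not true~~. I'm also curious: do Elon's companies need lawyers or accountants? Judging from X Careers, they probably still do... I wonder whether a lawyer's theoretical output rate can be set to typing speed... Come to think of it, it makes sense that Elon likes AI so much, since the only bound I can put on AI's theoretical output rate is a non-deterministic Turing machine. Oh wait, there may be models even more powerful than a non-deterministic Turing machine...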

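(hls tells me that when the market is large enough, R&D is a one-time investment and can be ignored, e.g. for carmakers. I looked up some of the larger carmakers: gross margin is typically 10%-20% and R&D is around 5%, so next to variable cost at 80%+ it is indeed the small part.)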
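The arithmetic behind that comparison can be sketched in a few lines. Two assumptions that the numbers above do not spell out: R&D is measured as a share of revenue, and everything that is not gross margin is treated as variable cost. `cost_shares` is a hypothetical helper name.

```python
def cost_shares(gross_margin, rnd_share):
    """Given gross margin and R&D as fractions of revenue,
    return (variable_cost_share, rnd_relative_to_variable_cost)."""
    # Assumption: revenue = variable cost + gross margin.
    variable_cost_share = 1.0 - gross_margin
    return variable_cost_share, rnd_share / variable_cost_share


# The two ends of the 10%-20% gross margin range, with R&D at 5% of revenue.
for gm in (0.10, 0.20):
    variable, ratio = cost_shares(gm, 0.05)
    print(f"gross margin {gm:.0%}: variable cost {variable:.0%} of revenue, "
          f"R&D is about {ratio:.3f} of variable cost")
```

Under these assumptions R&D comes out at roughly one sixteenth to one eighteenth of variable cost, which is consistent with calling it the small part, though not with calling it zero.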

How to solve the problem? (soundness of the method)

First Principle emphasizes cutting out intermediate steps; every step you add gets in the way of efficiency:

[…] the second thing is try to delete the whatever the step is, the part or the process step […]

That is, once we have a solution, we try to remove steps from it to eliminate redundancy.

In the video Elon explicitly mentions an anti-pattern:

[…] through most of our life, we get through life by reasoning by analogy, which essentially means kind of copying what other people do with slight variations. […]

[…] don't just follow the trend […]

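This statement can be applied to a specific task. But what if we apply it directly to First Principle itself? What if we get through our life by applying First Principle with slight variations? For example, most people think hierarchically; is there any fundamentally different variation in how Elon thinks? Does hierarchical thinking at a different granularity count as a slight variation? Would Elon just think end-to-end and run a global shortest path?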
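As a loose analogy only, here is a toy sketch of the difference between deciding one stage at a time and running a global shortest path. Modeling "hierarchical thinking" as a greedy per-stage choice is my own simplifying assumption, and the graph and its edge costs are made up.

```python
import heapq

# Made-up toy graph: node -> {neighbor: edge cost}.
GRAPH = {
    "start": {"A": 1, "B": 2},
    "A": {"goal": 10},
    "B": {"goal": 1},
    "goal": {},
}


def greedy_by_stage(graph, source, target):
    """Decide one stage at a time: always take the cheapest outgoing edge.
    Toy version: assumes every node on the way has at least one outgoing edge."""
    node, total = source, 0
    while node != target:
        nxt, cost = min(graph[node].items(), key=lambda kv: kv[1])
        node, total = nxt, total + cost
    return total


def global_shortest(graph, source, target):
    """Dijkstra's algorithm over the whole graph; returns the minimum total cost."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return None  # target unreachable


print("stage-by-stage cost:", greedy_by_stage(GRAPH, "start", "goal"))
print("global shortest cost:", global_shortest(GRAPH, "start", "goal"))
```

On this graph the stage-by-stage route goes through A and the global one through B. The catch is that the global search needs the whole graph up front, which is exactly the part that is rarely available in real life.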

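Personally, I think a suitable abstraction is necessary. But what counts as suitable? People complain that some engineers love abstraction too much and abstract everything they touch. I once glanced at RLlib's code to see whether I could build my research code on top of it; sorry, too many layers of abstraction, I walked away... On the other hand, programming languages keep getting more abstract: from early assembly, to C, to Java/Python, to the later Go/Rust/Zig. Back in the day, Knuth¹ and Dijkstra² ³ argued over whether goto is needed: are abstractions like while/for enough?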

A little joke (doge): mathematicians might scoff at this, since what they do best is reduction. Their way of solving a problem is to reduce it to a solved problem.

Where does it apply? (soundness of the goal)

Where innovation is needed, we should not let conventional thinking box us in. The video makes this point too:

[…] When you want to do something new, you have to apply the physical approach. The physics is about how to discover new things that are counterintuitive. […]

[…] if you're trying to do something new, it's the best way to think. […]

But do we need to innovate everywhere?

The video says:

[…] they'll say we do that because it's always done that way, or they'll not do it because nobody's ever done that way so it must not be good […]

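"it's always done that way" and "nobody's ever done that way" are indeed poor reasons, but dismissing prior experience on that account is too hasty. The question I care about more is: if nobody has done it, is that because there is a strong reason not to, because nobody thought to innovate here, or because someone did and we just don't know about it? What have others tried, and what lessons can we draw from that? If we can't answer these questions and blindly assume we should not follow what others do, forcing innovation in areas we don't understand, we will inevitably fall into the same pits others already fell into. Is it really true that the only lesson humanity learns from history is that humanity learns nothing from history?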

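So I think one more precondition is needed here: the domain must be one we understand thoroughly. Spotting where something can be improved is just as hard a problem. It closely parallels the earlier question of how to define the problem: that one is about the soundness of the assumptions, this one about the soundness of the goal. If I can't learn the three laws of thermodynamics, First Principle won't help me build a perpetual motion machine...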

This reminds me of a classic saying I have heard since I was a kid:

Learning without thinking leaves one lost; thinking without learning leaves one in peril. (学而不思则罔，思而不学则殆。)

Summary

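In conclusion, I don't think First Principle is nonsense. One part of it does make sense to me: don't be bound by convention, and don't assume something is good just because it is the convention. But thinking about nothing except First Principle is nonsense. Besides, even once you have gone First Principle, you can probably go even more First Principle. Analyzing something with First Principle takes mental energy and time to research, both of which are finite resources, so we should save First Principle for the places that matter, where the ROI is high.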

Bonus: use First Principle to analyze whether First Principle is sound.

¹ Knuth, Structured Programming with go to Statements
² Dijkstra, Go To Statement Considered Harmful
³ Dijkstra, Notes on Structured Programming