Magic Numbers

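Many file formats start with a magic number; for example, compiled Java bytecode files start with CAFEBABE. Wikipedia also has a page full of fun magic numbers, such as DEADBEEF (there's also a music player by that name) and FEE1DEAD (used in Linux's reboot syscall). So on a whim I wondered whether I could come up with similar magic numbers that look like English phrases.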
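
As a quick illustration of how a file format's magic number is checked, here is a minimal Python sketch that reads the first four bytes as a big-endian unsigned 32-bit integer and compares them with 0xCAFEBABE. The helper name `read_magic` is hypothetical; for a real file you would pass in something like `open(path, "rb").read(4)`.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE


def read_magic(data: bytes) -> int:
    """Hypothetical helper: interpret the first 4 bytes as a big-endian u32."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes")
    (magic,) = struct.unpack(">I", data[:4])
    return magic


# A class-file header: magic, then the u2 minor and u2 major version fields
header = bytes.fromhex("CAFEBABE0000003D")
print(hex(read_magic(header)))                 # 0xcafebabe
print(read_magic(header) == JAVA_CLASS_MAGIC)  # True
```
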

First, let's pin down the scope:

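  • The string is exactly 8 characters long;
  • It can be split into one or more English words. Since "English word" has no precise definition, I use the dictionary that ships with Unix, as before;
  • The allowed character set is 0-9A-F. That's rather small, but we can set up aliases for some letters:
    • O => 0
    • I => 1
    • L => 1. Since I and L both map to 1, I don't allow both letters in the same word
    • ATE => 8
    • Maybe allow S => 5
    • Maybe allow K => C
    • Maybe allow G => 9
  • Each word in the split is 3-8 characters long.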
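
The alias rules above can be sketched as a table-driven function. This is a minimal Python illustration, assuming ATE => 8 only applies at the end of a word (as in the code at the end of this post); the name `to_hex` and the `allow_*` flags for the optional aliases are hypothetical.

```python
def to_hex(word: str, allow_s: bool = False, allow_k: bool = False,
           allow_g: bool = False):
    """Return the hex spelling of `word`, or None if it can't be spelled."""
    if not word.isalpha():
        return None
    w = word.upper()
    # I and L both become 1, so they may not appear in the same word
    if "I" in w and "L" in w:
        return None
    # ATE => 8, only as a suffix in this sketch
    if w.endswith("ATE"):
        w = w[:-3] + "8"
    table = {"O": "0", "I": "1", "L": "1"}
    # Optional aliases, off by default
    if allow_s:
        table["S"] = "5"
    if allow_k:
        table["K"] = "C"
    if allow_g:
        table["G"] = "9"
    w = "".join(table.get(c, c) for c in w)
    # Whatever is left must be a hex digit
    if all(c in "0123456789ABCDEF" for c in w):
        return w
    return None


print(to_hex("accolade"))                # ACC01ADE
print(to_hex("baseball"))                # None (S => 5 not enabled)
print(to_hex("baseball", allow_s=True))  # BA5EBA11
print(to_hex("debate"))                  # DEB8
print(to_hex("ideal"))                   # None (contains both I and L)
```
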

First, let's see which words can be represented under these rules.

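After running it, I found that the Unix dictionary /usr/share/dict/words is too big: about 230,000 words, and far too many of them I don't even recognize... To deal with this I decided to sort by word frequency, which led me to the wordfreq repo. Although its maintainer has stopped updating the frequency data because of pollution from AI-generated content, let's trust it anyway. So I restrict the candidates to the intersection of the two word lists: this removes some personal names, place names, past participles, and proper nouns, and also lets us sort by frequency.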

Let's first look at which words are exactly 8 characters long after substitution. There are only 8 of them:

ACC01ADE => accolade
F01DAB1E => foldable
CA11AB1E => callable
DEADFA11 => deadfall
CAB00D1E => caboodle
D011FACE => dollface
C0C0B010 => cocobolo
C0CC1D1A => coccidia

But if you allow S => 5, even the small list has a few:

BA5EBA11 => baseball
DECEA5ED => deceased
A55E55ED => assessed
C01055A1 => colossal
5E1F1E55 => selfless
D15EA5ED => diseased
BA5E1E55 => baseless
5CAFF01D => scaffold

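For everything else we need to combine words. With an 8-character limit and at least 3 characters per word, the only option is two words: a 3-5-character word plus a 5-3-character word (3+5, 4+4, or 5+3). There are only about 300 such words in total; here is the list: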

Words of 3-5 characters

A11 => all
D1D => did
0FF => off
01D => old
FEE1 => feel
BAD => bad
CA11 => call
FACE => face
AB1E => able
10CA1 => local
F00D => food
1DEA => idea
DEA1 => deal
1EAD => lead
DEAD => dead
ADDED => added
C001 => cool
FA11 => fall
1ED => led
B100D => blood
ADD => add
C0DE => code
BED => bed
D1E => die
BA11 => ball
C01D => cold
1CE => ice
DAD => dad
CE11 => cell
FE11 => fell
A1D => aid
1EE => lee
FEED => feed
DEB8 => debate
B0B => bob
BE11 => bell
10AD => load
CAB1E => cable
FED => fed
FEE => fee
C0A1 => coal
FACED => faced
AC1D => acid
1ABE1 => label
1AB => lab
0DD => odd
B1D => bid
D0C => doc
F100D => flood
BEEF => beef
F001 => fool
B01D => bold
1EAF => leaf
B1ADE => blade
C01E => cole
ACE => ace
BABE => babe
BEE => bee
F01D => fold
D011 => doll
F1ED => fled
CA1 => cal
DEAF => deaf
DA1E => dale
CAB => cab
C01 => col
10C8 => locate
B00 => boo
1AD => lad
C0D => cod
FADE => fade
DEE => dee
D0E => doe
BA1D => bald
1ACE => lace
F1EE => flee
BE11E => belle
D1CE => dice
DEED => deed
C01A => cola
FADED => faded
B1EED => bleed
CA1F => calf
A1DE => aide
A1E => ale
DE11 => dell
F0CA1 => focal
A1EC => alec
E1F => elf
C0CA => coca
C0C0A => cocoa
BA1E => bale
B10C => bloc
AB1DE => abide
BABA => baba
BAE => bae
C1AD => clad
1EA => lea
AD0BE => adobe
A1A => ala
F0E => foe
C0C0 => coco
10AF => loaf
1CED => iced
BEAD => bead
E11E => elle
BE1 => bel
F1EA => flea
B00B => boob
CAD => cad
F1FE => fife
D0DD => dodd
0B1 => obi
D0D => dod
F00 => foo
A1BA => alba
D01E => dole
ABA => aba
DAB => dab
1AC => lac
0DE => ode
C00 => coo
10BE => lobe
100 => loo
CE110 => cello
1ACED => laced
DAE => dae
F1DE => fide
FAD => fad
DA1 => dal
DEB => deb
EE1 => eel
B1ED => bled
10C0 => loco
AD0 => ado
B0A => boa
FE1 => fei
DADE => dade
FAB1E => fable
C0E => coe
1DE => ide
AB0DE => abode
BA1 => bal
A1F => alf
BAC => bac
F0B => fob
DADA => dada
0DA => oda
A1FA => alfa
B10B => blob
A10E => aloe
D10DE => diode
B0D => bod
CABA1 => cabal
C0B => cob
EBB => ebb
B0DE => bode
B0B0 => bobo
DECA1 => decal
FECA1 => fecal
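
Given a hex => word table like the one above, checking whether an 8-character string can be split into words is a small word-break problem. Below is a minimal Python sketch; `WORDS` is just a hand-copied sample of the list and `split_hex` is a hypothetical helper, not part of the original code. Because every word must be at least 3 characters, an 8-character string can hold at most two words (three would need at least 9).

```python
from functools import lru_cache

# Hand-copied sample of the hex => word list above
WORDS = {
    "DEAD": "dead", "BEEF": "beef", "FEE1": "feel", "BAD": "bad",
    "CAB1E": "cable", "1EAF": "leaf", "FA11": "fall", "ADD": "add",
    "1ABE1": "label",
}


def split_hex(s: str, words=WORDS, min_len=3, max_len=8):
    """Return every way to split `s` into known words of min_len..max_len chars."""

    @lru_cache(maxsize=None)
    def go(i):
        # All splits of the suffix s[i:], as lists of English words
        if i == len(s):
            return [[]]
        results = []
        for j in range(i + min_len, min(i + max_len, len(s)) + 1):
            piece = s[i:j]
            if piece in words:
                for rest in go(j):
                    results.append([words[piece]] + rest)
        return results

    return go(0)


print(split_hex("BADCAB1E"))  # [['bad', 'cable']]
print(split_hex("DEADBEEF"))  # [['dead', 'beef']]
print(split_hex("1EAFFA11"))  # [['leaf', 'fall']]
```
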

Forming phrases is actually pretty easy, since the list has plenty of verbs, nouns, and adjectives:

  • Verbs:
    FEE1 => feel
    CA11 => call
    DEA1 => deal
    FA11 => fall
    FEED => feed
  • Nouns:
    FACE => face
    F00D => food
    1DEA => idea
    1CE => ice
    BEEF => beef
  • Adjectives:
    A11 => all
    01D => old
    BAD => bad
    C001 => cool
    DEAD => dead
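
One way to brute-force such phrases is to take the Cartesian product of two word groups and keep only the pairs whose hex spellings add up to exactly 8 characters. Here is a minimal Python sketch, using the groups above copied by hand (`phrases` is a hypothetical helper).

```python
from itertools import product

# Hex spellings copied from the groups above
VERBS = {"FEE1": "feel", "CA11": "call", "DEA1": "deal", "FA11": "fall", "FEED": "feed"}
NOUNS = {"FACE": "face", "F00D": "food", "1DEA": "idea", "1CE": "ice", "BEEF": "beef"}
ADJECTIVES = {"A11": "all", "01D": "old", "BAD": "bad", "C001": "cool", "DEAD": "dead"}


def phrases(first, second, length=8):
    """Yield (hex, phrase) for each first+second pair whose hex is `length` chars."""
    for (h1, w1), (h2, w2) in product(first.items(), second.items()):
        if len(h1) + len(h2) == length:
            yield h1 + h2, f"{w1} {w2}"


for hex_str, phrase in phrases(ADJECTIVES, NOUNS):
    print(hex_str, "=>", phrase)
```

With these adjectives and nouns, only the 4-character adjectives (C001, DEAD) produce results, because none of the nouns here is 5 characters long; DEADBEEF is one of them.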

If you only look at the words in the wordfreq small list, there are even fewer: just 160. I put together a few phrases myself:

FEE1BAAD => feel bad (this doesn't really work, since I had to spell it BAAD; FEE15BAD, "feels bad", is another option)
A10EF00D => aloe food
DEB81DEA => debate idea
BADCAB1E => bad cable
ADD1ABE1 => add label
DEADF001 => dead fool (it almost became deadpool)
1EAFFA11 => leaf fall (reminds me of the e.e. cummings poem "l(a")

If we allow no aliases at all and use only A-F, we can still form a few:

DEADBEEF => dead beef (a classic)
FEEDBABE => feed babe (open wide, say "ah"~)
FADEDDAD => faded dad (I'll go buy you some oranges?)

I picked out some words:

Some words that use only A-F

BAD => bad
FACE => face
DEAD => dead
ADDED => added
ADD => add
BED => bed
DAD => dad
FEED => feed
DECADE => decade
FED => fed
FEE => fee
FACED => faced
BEEF => beef
ACE => ace
BABE => babe
BEE => bee
DEAF => deaf
CAB => cab
FADE => fade
DEED => deed
FADED => faded
BEAD => bead
FACADE => facade
DAB => dab
FAD => fad
BEADED => beaded
EBB => ebb

As you can see, there are very few 5-letter words, so 3+5 and 5+3 combinations are hard to come by; 4+4 is about the only split that works.
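
To put a number on "very few", here is a small Python sketch that counts the hand-picked A-F-only words above by length. It only covers that hand-picked list, not every A-F word in the dictionary.

```python
from collections import Counter

# The A-F-only words picked above, in the same order
AF_WORDS = ["bad", "face", "dead", "added", "add", "bed", "dad", "feed",
            "decade", "fed", "fee", "faced", "beef", "ace", "babe", "bee",
            "deaf", "cab", "fade", "deed", "faded", "bead", "facade", "dab",
            "fad", "beaded", "ebb"]

by_len = Counter(len(w) for w in AF_WORDS)
print(sorted(by_len.items()))  # [(3, 12), (4, 9), (5, 3), (6, 3)]

five_letter = [w for w in AF_WORDS if len(w) == 5]
print(five_letter)  # ['added', 'faced', 'faded']
```
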

The code is short, so I'll just put it here:

Code

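import wordfreq
from pathlib import Path


def transform(x, use_aliases=True):
    """Spell word x using hex characters; return None if that's impossible."""
    # Skip anything that isn't purely alphabetic (apostrophes, digits, ...)
    if not x.isalpha():
        return None
    result = x.upper()
    # I and L both become 1, so a word may not contain both
    if "I" in result and "L" in result:
        return None

    # Aliases; pass use_aliases=False to get the A-F-only word list
    if use_aliases:
        if result.endswith("ATE"):
            result = result[:-3] + "8"
        result = result.replace("L", "1")
        result = result.replace("O", "0")
        result = result.replace("I", "1")
        # result = result.replace("S", "5")  # optional alias S => 5

    # Every remaining character must be a hex digit (0-9 or A-F)
    for c in result:
        if not c.isdigit() and c > "F":
            return None

    return result


# Unix word list, one word per line
dictionary = set(Path("/usr/share/dict/words").read_text().splitlines())

# hex string => word. iter_wordlist yields words from most to least frequent,
# so the first word kept for each hex string is the most common one.
words = {}
for w in wordfreq.iter_wordlist("en", wordlist="small"):
    if w not in dictionary:
        continue
    w2 = transform(w)
    if w2 and w2 not in words and 3 <= len(w2) <= 8:
        words[w2] = w

for w2, w in words.items():
    print(f"{w2:8} => {w}")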