Magic Numbers

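Many file formats begin with a magic number; compiled Java bytecode files, for example, start with CAFEBABE. Wikipedia also has a page collecting plenty of fun magic numbers, such as DEADBEEF (there is also a music player by that name) and FEE1DEAD (used in Linux's reboot syscall). So on a whim I wondered whether I could come up with similar magic numbers that look like English phrases.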
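As a small illustration of what "beginning with a magic number" means, here is a minimal sketch that checks whether a byte string starts with a given 32-bit value. The helper name `starts_with_magic` and the sample header bytes are made up for this example and do not come from any library.

```python
import struct

JAVA_CLASS_MAGIC = 0xCAFEBABE  # first four bytes of a compiled Java class file


def starts_with_magic(data: bytes, magic: int) -> bool:
    """Return True if data begins with the big-endian 32-bit value magic."""
    if len(data) < 4:
        return False
    (value,) = struct.unpack(">I", data[:4])
    return value == magic


# Hypothetical header bytes: the magic followed by a few more bytes.
sample = bytes.fromhex("CAFEBABE0000003D")
print(starts_with_magic(sample, JAVA_CLASS_MAGIC))
```

Real tools such as `file` do essentially this comparison against a table of known signatures.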

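First, let's pin down the scope:

  • The string must be exactly 8 characters long;
  • It must split into one or more English words. Since "English word" has no precise definition, I use the dictionary that ships with Unix, as before.
  • The allowed character set is 0-9A-F. That set is a bit small, but we can add some character aliases:
    • O => 0
    • I => 1
    • L => 1. Since I and L both map to 1, I don't allow the two letters to appear in the same word
    • ATE => 8
    • Maybe allow S => 5
    • Maybe allow K => C
    • Maybe allow G => 9
  • Each word in the split must be 3-8 characters long.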
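The rules above can be sketched as a single mapping function. This is only one reading of them: `to_hex_word` and its `allow_s` flag are names invented for this sketch, ATE is only replaced at the end of a word, and the optional K and G aliases are left out.

```python
from typing import Optional

HEX_CHARS = set("0123456789ABCDEF")


def to_hex_word(word: str, allow_s: bool = False) -> Optional[str]:
    """Map an English word to hex-looking text, or None if the rules reject it."""
    if not (word.isascii() and word.isalpha()):
        return None
    result = word.upper()
    # I and L both become 1, so a word may not contain both.
    if "I" in result and "L" in result:
        return None
    # Assumption of this sketch: only a trailing ATE is read as "eight".
    if result.endswith("ATE"):
        result = result[:-3] + "8"
    result = result.replace("O", "0").replace("I", "1").replace("L", "1")
    if allow_s:
        result = result.replace("S", "5")
    # Anything outside 0-9A-F means the word cannot be spelled.
    if not set(result) <= HEX_CHARS:
        return None
    return result


for w in ["accolade", "debate", "baseball", "magic"]:
    print(w, "=>", to_hex_word(w, allow_s=True))
```

Returning None for rejected words keeps the caller's filtering to a single truthiness check.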

In fact, we can start by looking at which words can be written under these rules.

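After running it, I found that the Unix dictionary /usr/share/dict/words is too large: 230,000 words, many of which I don't recognize... To deal with this I decided to sort by word frequency, and found the wordfreq repo. Its author has stopped updating the frequency data because of pollution from AI-generated content, but let's trust it anyway. So we restrict the words to the intersection of the two: this removes some personal and place names, past participles, and proper nouns, and also lets us sort by frequency.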

First I looked at which words come out at exactly 8 characters after substitution. There are only 8 of them:

ACC01ADE => accolade
F01DAB1E => foldable
CA11AB1E => callable
DEADFA11 => deadfall
CAB00D1E => caboodle
D011FACE => dollface
C0C0B010 => cocobolo
C0CC1D1A => coccidia

But if you allow S => 5, even the wordfreq small list contains a few:

BA5EBA11 => baseball
DECEA5ED => deceased
A55E55ED => assessed
C01055A1 => colossal
5E1F1E55 => selfless
D15EA5ED => diseased
BA5E1E55 => baseless
5CAFF01D => scaffold

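For everything else we have to combine words. With the 8-character limit, that means a 3-5 character word plus a 5-3 character word (3+5, 4+4, or 5+3). There are only about 300 such words; here is the list: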

Words of 3-5 characters

A11 => all
D1D => did
0FF => off
01D => old
FEE1 => feel
BAD => bad
CA11 => call
FACE => face
AB1E => able
10CA1 => local
F00D => food
1DEA => idea
DEA1 => deal
1EAD => lead
DEAD => dead
ADDED => added
C001 => cool
FA11 => fall
1ED => led
B100D => blood
ADD => add
C0DE => code
BED => bed
D1E => die
BA11 => ball
C01D => cold
1CE => ice
DAD => dad
CE11 => cell
FE11 => fell
A1D => aid
1EE => lee
FEED => feed
DEB8 => debate
B0B => bob
BE11 => bell
10AD => load
CAB1E => cable
FED => fed
FEE => fee
C0A1 => coal
FACED => faced
AC1D => acid
1ABE1 => label
1AB => lab
0DD => odd
B1D => bid
D0C => doc
F100D => flood
BEEF => beef
F001 => fool
B01D => bold
1EAF => leaf
B1ADE => blade
C01E => cole
ACE => ace
BABE => babe
BEE => bee
F01D => fold
D011 => doll
F1ED => fled
CA1 => cal
DEAF => deaf
DA1E => dale
CAB => cab
C01 => col
10C8 => locate
B00 => boo
1AD => lad
C0D => cod
FADE => fade
DEE => dee
D0E => doe
BA1D => bald
1ACE => lace
F1EE => flee
BE11E => belle
D1CE => dice
DEED => deed
C01A => cola
FADED => faded
B1EED => bleed
CA1F => calf
A1DE => aide
A1E => ale
DE11 => dell
F0CA1 => focal
A1EC => alec
E1F => elf
C0CA => coca
C0C0A => cocoa
BA1E => bale
B10C => bloc
AB1DE => abide
BABA => baba
BAE => bae
C1AD => clad
1EA => lea
AD0BE => adobe
A1A => ala
F0E => foe
C0C0 => coco
10AF => loaf
1CED => iced
BEAD => bead
E11E => elle
BE1 => bel
F1EA => flea
B00B => boob
CAD => cad
F1FE => fife
D0DD => dodd
0B1 => obi
D0D => dod
F00 => foo
A1BA => alba
D01E => dole
ABA => aba
DAB => dab
1AC => lac
0DE => ode
C00 => coo
10BE => lobe
100 => loo
CE110 => cello
1ACED => laced
DAE => dae
F1DE => fide
FAD => fad
DA1 => dal
DEB => deb
EE1 => eel
B1ED => bled
10C0 => loco
AD0 => ado
B0A => boa
FE1 => fei
DADE => dade
FAB1E => fable
C0E => coe
1DE => ide
AB0DE => abode
BA1 => bal
A1F => alf
BAC => bac
F0B => fob
DADA => dada
0DA => oda
A1FA => alfa
B10B => blob
A10E => aloe
D10DE => diode
B0D => bod
CABA1 => cabal
C0B => cob
EBB => ebb
B0DE => bode
B0B0 => bobo
DECA1 => decal
FECA1 => fecal

Composing phrases is actually pretty easy, since there are enough verbs, nouns, and adjectives in here:

  • Verbs:
    FEE1 => feel
    CA11 => call
    DEA1 => deal
    FA11 => fall
    FEED => feed
  • Nouns:
    FACE => face
    F00D => food
    1DEA => idea
    1CE => ice
    BEEF => beef
  • Adjectives:
    A11 => all
    01D => old
    BAD => bad
    C001 => cool
    DEAD => dead

If you only look at words in the wordfreq small list there are even fewer, just 160. I put together a few phrases myself:

FEE1BAAD => feel bad       this one doesn't really work, since I spelled it BAAD; FEE15BAD (feels bad, with S => 5) is another option
A10EF00D => aloe food
DEB81DEA => debate idea
BADCAB1E => bad cable
ADD1ABE1 => add label
DEADF001 => dead fool      one letter away from deadpool
1EAFFA11 => leaf fall      reminds me of e.e. cummings's poem l(a
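Candidates like these could also be enumerated mechanically rather than by hand. The following is a minimal sketch, assuming the word list is available as a dict from hex form to English word; `eight_char_pairs` is a name made up here, and the sample holds only a handful of entries from the list above.

```python
from itertools import product


def eight_char_pairs(words: dict) -> list:
    """Return (magic, phrase) for every ordered pair whose hex forms total 8 characters."""
    results = []
    for (h1, w1), (h2, w2) in product(words.items(), repeat=2):
        if len(h1) + len(h2) == 8:
            results.append((h1 + h2, f"{w1} {w2}"))
    return results


# A few entries copied from the 3-5 character word list.
sample = {
    "DEAD": "dead", "F001": "fool", "BAD": "bad",
    "CAB1E": "cable", "ADD": "add", "1ABE1": "label",
}
for magic, phrase in eight_char_pairs(sample):
    print(magic, "=>", phrase)
```

This also emits a word paired with itself (for example DEADDEAD) and plenty of nonsense, so picking phrases that actually read well is still a manual step.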

If we allow no aliases at all and use only A-F, we can still make a few:

DEADBEEF => dead beef    a classic
FEEDBABE => feed babe    open wide, ahh~
FADEDDAD => faded dad    "I'll go buy some oranges"?

I picked out some words:

Some words that use only A-F

BAD => bad
FACE => face
DEAD => dead
ADDED => added
ADD => add
BED => bed
DAD => dad
FEED => feed
DECADE => decade
FED => fed
FEE => fee
FACED => faced
BEEF => beef
ACE => ace
BABE => babe
BEE => bee
DEAF => deaf
CAB => cab
FADE => fade
DEED => deed
FADED => faded
BEAD => bead
FACADE => facade
DAB => dab
FAD => fad
BEADED => beaded
EBB => ebb

As you can see, there are very few 5-letter words, so 3+5 and 5+3 combinations are hard to make; in practice only 4+4 works.
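The length distribution can be checked with a quick tally. This sketch simply hard-codes the A-F-only words listed above and counts them by length; `AF_WORDS` and `length_counts` are names chosen for this example.

```python
from collections import Counter

# The A-F-only words from the list above.
AF_WORDS = [
    "BAD", "FACE", "DEAD", "ADDED", "ADD", "BED", "DAD", "FEED", "DECADE",
    "FED", "FEE", "FACED", "BEEF", "ACE", "BABE", "BEE", "DEAF", "CAB",
    "FADE", "DEED", "FADED", "BEAD", "FACADE", "DAB", "FAD", "BEADED", "EBB",
]

# Count how many words there are of each length.
length_counts = Counter(len(w) for w in AF_WORDS)
for length in sorted(length_counts):
    print(f"{length} letters: {length_counts[length]}")
```

In this sample, 3- and 4-letter words dominate, and only ADDED, FACED, and FADED have 5 letters.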

The code is short, so I'll just put it here:

Code

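import wordfreq
from pathlib import Path


def transform(x):
    """Return the hex spelling of x, or None if it cannot be spelled."""
    result = x.upper()
    # Reject anything that contains non-letters (digits, hyphens, apostrophes).
    for c in x:
        if not c.isalpha():
            return None
    # I and L would both map to 1, so they may not appear together.
    if "I" in result and "L" in result:
        return None

    # Aliases; uncomment the ones you want to allow.
    # if result.endswith("ATE"):
    #     result = result[:-3] + "8"
    # result = result.replace("L", "1")
    # result = result.replace("O", "0")
    # result = result.replace("I", "1")
    # result = result.replace("S", "5")

    # Only digits and the letters A-F may remain.
    for c in result:
        if not c.isdigit() and c > "F":
            return None

    return result


dictionary = set(Path("/usr/share/dict/words").read_text().splitlines())


words = {}
# iter_wordlist yields words from most to least frequent,
# so the first word seen for a given spelling is the most common one.
for w in wordfreq.iter_wordlist("en", wordlist="small"):
    if w not in dictionary:
        continue
    w2 = transform(w)
    if w2 and w2 not in words and 3 <= len(w2) <= 8:
        words[w2] = w

for w2, w in words.items():
    print(f"{w:12} => {w2:12}")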