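Many file formats begin with a magic number; compiled Java bytecode files, for example, start with CAFEBABE. Wikipedia also has a table full of fun magic numbers, such as DEADBEEF (there is also a music player by that name) and FEE1DEAD (used in Linux's reboot syscall). So on a whim I wondered: can I come up with similar magic numbers that read like English phrases?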
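First, let's pin down the scope:

- The string is exactly 8 characters long.
- It can be split into one or more English words. "English word" has no precise definition, so as before I use the dictionary that ships with Unix.
- The allowed character set is 0-9A-F. That is a small set, but we can add a few character aliases:
  - O => 0
  - I => 1
  - L => 1. Since I and L both map to 1, I don't allow these two letters to appear together.
  - ATE => 8
  - Maybe allow S => 5?
  - Maybe allow K => C?
  - Maybe allow G => 9?
- Each word in the split is 3-8 characters long.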
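The rules above can be sketched as a small encoder. This is only an illustration: the names `encode`, `BASE_ALIASES`, and `extra_aliases` are made up here, and the optional aliases (S => 5 and so on) are passed in explicitly rather than enabled by default.

```python
from typing import Optional

# Single-character aliases from the rules above.
BASE_ALIASES = {"O": "0", "I": "1", "L": "1"}
HEX_CHARS = set("0123456789ABCDEF")


def encode(word: str, extra_aliases: Optional[dict] = None) -> Optional[str]:
    """Return the hex-looking spelling of `word`, or None if the rules reject it."""
    if not word.isalpha():
        return None
    upper = word.upper()
    # I and L both become 1, so a word may not contain both.
    if "I" in upper and "L" in upper:
        return None
    aliases = dict(BASE_ALIASES)
    aliases.update(extra_aliases or {})
    # ATE is a three-letter alias, so substitute it before the per-character pass.
    encoded = upper.replace("ATE", "8")
    encoded = "".join(aliases.get(c, c) for c in encoded)
    # Everything left must be a hex digit, and the result must be 3-8 characters.
    if not set(encoded) <= HEX_CHARS:
        return None
    if not 3 <= len(encoded) <= 8:
        return None
    return encoded


print(encode("accolade"))
print(encode("debate"))
print(encode("baseball", {"S": "5"}))
```

Keeping the optional aliases as a parameter makes it easy to compare how many words each extra rule adds.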
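In fact, we can start by checking which words can be spelled under these rules at all. After running that, I found that the Unix dictionary /usr/share/dict/words is too big: about 230,000 words, many of which I don't even recognize. To deal with this I decided to rank words by frequency, which led me to the wordfreq repo. Its author has stopped updating the frequency data because of pollution from AI-generated content, but let's trust it anyway. So we restrict ourselves to the intersection of the two sources: that filters out some personal and place names, past participles, and proper nouns, and it also lets us sort by frequency.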
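The intersection step can be sketched as follows. The helper name `frequent_dictionary_words` and the tiny sample data are hypothetical stand-ins; in practice the ranked words could come from `wordfreq.iter_wordlist("en", wordlist="small")` and the dictionary from /usr/share/dict/words.

```python
from typing import Iterable, Iterator, Set


def frequent_dictionary_words(ranked: Iterable[str], dictionary: Set[str]) -> Iterator[str]:
    """Yield words from `ranked` (most frequent first) that also appear in `dictionary`."""
    seen = set()
    for word in ranked:
        if word in dictionary and word not in seen:
            seen.add(word)
            yield word


# Hypothetical sample data, for illustration only.
ranked_sample = ["the", "all", "gonna", "face", "all", "beef"]
dictionary_sample = {"all", "beef", "face", "the", "zymurgy"}

kept = list(frequent_dictionary_words(ranked_sample, dictionary_sample))
print(kept)
```

Iterating over the frequency list and testing membership in the dictionary set keeps the frequency order for free, so no separate sort is needed.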
First, let's see which words come out at exactly 8 characters after substitution. There are only 8 of them:

    ACC01ADE => accolade
    F01DAB1E => foldable
    CA11AB1E => callable
    DEADFA11 => deadfall
    CAB00D1E => caboodle
    D011FACE => dollface
    C0C0B010 => cocobolo
    C0CC1D1A => coccidia
But if you also allow S => 5, even the small list yields a few:

    BA5EBA11 => baseball
    DECEA5ED => deceased
    A55E55ED => assessed
    C01055A1 => colossal
    5E1F1E55 => selfless
    D15EA5ED => diseased
    BA5E1E55 => baseless
    5CAFF01D => scaffold
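For everything else we have to combine words. Given the 8-character limit and the 3-character minimum per word, a phrase can only be a 3-5 character word followed by a 5-3 character word. There are only about 300 such words; the list is below: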
Words of 3-5 characters

    A11   => all      D1D   => did      0FF   => off      01D   => old
    FEE1  => feel     BAD   => bad      CA11  => call     FACE  => face
    AB1E  => able     10CA1 => local    F00D  => food     1DEA  => idea
    DEA1  => deal     1EAD  => lead     DEAD  => dead     ADDED => added
    C001  => cool     FA11  => fall     1ED   => led      B100D => blood
    ADD   => add      C0DE  => code     BED   => bed      D1E   => die
    BA11  => ball     C01D  => cold     1CE   => ice      DAD   => dad
    CE11  => cell     FE11  => fell     A1D   => aid      1EE   => lee
    FEED  => feed     DEB8  => debate   B0B   => bob      BE11  => bell
    10AD  => load     CAB1E => cable    FED   => fed      FEE   => fee
    C0A1  => coal     FACED => faced    AC1D  => acid     1ABE1 => label
    1AB   => lab      0DD   => odd      B1D   => bid      D0C   => doc
    F100D => flood    BEEF  => beef     F001  => fool     B01D  => bold
    1EAF  => leaf     B1ADE => blade    C01E  => cole     ACE   => ace
    BABE  => babe     BEE   => bee      F01D  => fold     D011  => doll
    F1ED  => fled     CA1   => cal      DEAF  => deaf     DA1E  => dale
    CAB   => cab      C01   => col      10C8  => locate   B00   => boo
    1AD   => lad      C0D   => cod      FADE  => fade     DEE   => dee
    D0E   => doe      BA1D  => bald     1ACE  => lace     F1EE  => flee
    BE11E => belle    D1CE  => dice     DEED  => deed     C01A  => cola
    FADED => faded    B1EED => bleed    CA1F  => calf     A1DE  => aide
    A1E   => ale      DE11  => dell     F0CA1 => focal    A1EC  => alec
    E1F   => elf      C0CA  => coca     C0C0A => cocoa    BA1E  => bale
    B10C  => bloc     AB1DE => abide    BABA  => baba     BAE   => bae
    C1AD  => clad     1EA   => lea      AD0BE => adobe    A1A   => ala
    F0E   => foe      C0C0  => coco     10AF  => loaf     1CED  => iced
    BEAD  => bead     E11E  => elle     BE1   => bel      F1EA  => flea
    B00B  => boob     CAD   => cad      F1FE  => fife     D0DD  => dodd
    0B1   => obi      D0D   => dod      F00   => foo      A1BA  => alba
    D01E  => dole     ABA   => aba      DAB   => dab      1AC   => lac
    0DE   => ode      C00   => coo      10BE  => lobe     100   => loo
    CE110 => cello    1ACED => laced    DAE   => dae      F1DE  => fide
    FAD   => fad      DA1   => dal      DEB   => deb      EE1   => eel
    B1ED  => bled     10C0  => loco     AD0   => ado      B0A   => boa
    FE1   => fei      DADE  => dade     FAB1E => fable    C0E   => coe
    1DE   => ide      AB0DE => abode    BA1   => bal      A1F   => alf
    BAC   => bac      F0B   => fob      DADA  => dada     0DA   => oda
    A1FA  => alfa     B10B  => blob     A10E  => aloe     D10DE => diode
    B0D   => bod      CABA1 => cabal    C0B   => cob      EBB   => ebb
    B0DE  => bode     B0B0  => bobo     DECA1 => decal    FECA1 => fecal
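Why only 3-5 plus 5-3? A minimal sketch can check this by enumerating every ordered way to split 8 into parts of 3 to 8 characters, and then pair up words by length. The function names `compositions` and `two_word_candidates` and the four-word sample are invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def compositions(total: int, lo: int, hi: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered way to write `total` as a sum of parts in [lo, hi]."""
    if total == 0:
        yield ()
        return
    for part in range(lo, min(hi, total) + 1):
        for rest in compositions(total - part, lo, hi):
            yield (part,) + rest


def two_word_candidates(encoded: Dict[str, str], total: int = 8) -> Iterator[Tuple[str, str]]:
    """Yield (hex string, phrase) for every ordered pair whose lengths sum to `total`."""
    by_length: Dict[int, List[str]] = defaultdict(list)
    for code in encoded:
        by_length[len(code)].append(code)
    for first_len in sorted(by_length):
        for first in by_length[first_len]:
            for second in by_length.get(total - first_len, []):
                yield first + second, f"{encoded[first]} {encoded[second]}"


print(list(compositions(8, 3, 8)))

# A small sample taken from the word list above.
sample = {"DEAD": "dead", "BEEF": "beef", "BAD": "bad", "CAB1E": "cable"}
for code, phrase in two_word_candidates(sample):
    print(code, "=>", phrase)
```

The enumeration yields only 3+5, 4+4, 5+3, and the single 8-character word, so three-word phrases are impossible under the 3-character minimum. The pairing step produces every grammatical and ungrammatical combination alike; picking the ones that read well is still manual work.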
Putting phrases together is actually fairly easy, because the list has plenty of verbs, nouns, and adjectives:

Verbs:

    FEE1 => feel
    CA11 => call
    DEA1 => deal
    FA11 => fall
    FEED => feed

Nouns:

    FACE => face
    F00D => food
    1DEA => idea
    1CE  => ice
    BEEF => beef

Adjectives:

    A11  => all
    01D  => old
    BAD  => bad
    C001 => cool
    DEAD => dead
If we only look at words in the wordfreq small list, there are even fewer: just 160. I put a few phrases together by hand:

    FEE1BAAD => feel bad      (not really valid, since I spelled it BAAD; FEE15BAD is another option)
    A10EF00D => aloe food
    DEB81DEA => debate idea
    BADCAB1E => bad cable
    ADD1ABE1 => add label
    DEADF001 => dead fool     (so close to deadpool)
    1EAFFA11 => leaf fall     (reminds me of e.e. cummings's poem l(a)
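Hand-assembled candidates are easy to mistype, for instance with a letter O where the digit 0 belongs. A quick sanity check, sketched here with a made-up helper `as_magic`, is to require exactly 8 hex digits and convert the string to the 4 bytes it spells.

```python
import re

# Exactly eight uppercase hex digits.
MAGIC_RE = re.compile(r"[0-9A-F]{8}")


def as_magic(candidate: str) -> bytes:
    """Return the 4 bytes spelled by an 8-digit hex string, or raise ValueError."""
    if not MAGIC_RE.fullmatch(candidate):
        raise ValueError(f"not 8 hex digits: {candidate!r}")
    return bytes.fromhex(candidate)


magic = as_magic("DEADF001")
print(magic.hex().upper(), int.from_bytes(magic, "big"))
```

Eight hex digits are 32 bits, so every valid candidate fits in a 4-byte file header or a 32-bit integer constant.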
If we allow no aliases at all and only use A-F, a few phrases are still possible:

    DEADBEEF => dead beef     (the classic)
    FEEDBABE => feed babe     (open wide, ahh~)
    FADEDDAD => faded dad     (I'll go buy some oranges?)
I picked out some words:

Some words that use only A-F

    BAD    => bad      FACE   => face     DEAD   => dead
    ADDED  => added    ADD    => add      BED    => bed
    DAD    => dad      FEED   => feed     DECADE => decade
    FED    => fed      FEE    => fee      FACED  => faced
    BEEF   => beef     ACE    => ace      BABE   => babe
    BEE    => bee      DEAF   => deaf     CAB    => cab
    FADE   => fade     DEED   => deed     FADED  => faded
    BEAD   => bead     FACADE => facade   DAB    => dab
    FAD    => fad      BEADED => beaded   EBB    => ebb
As you can see, there are very few words of length 5, so 3+5 and 5+3 pairings are hard to come by; in practice we are mostly left with 4+4.
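To make the length distribution concrete, here is a small tally over the 27 A-F words listed above. The variable names are arbitrary, and this is just a counting sketch rather than part of the original script.

```python
from collections import Counter

# The A-F-only words from the list above.
A_F_WORDS = (
    "BAD FACE DEAD ADDED ADD BED DAD FEED DECADE FED FEE FACED BEEF ACE "
    "BABE BEE DEAF CAB FADE DEED FADED BEAD FACADE DAB FAD BEADED EBB"
).split()

# Count how many words there are of each length.
by_length = Counter(len(word) for word in A_F_WORDS)
for length in sorted(by_length):
    print(length, by_length[length])

# The 4-letter words are the building blocks for 4+4 phrases.
four_letter = [word for word in A_F_WORDS if len(word) == 4]
print(four_letter)
```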
The code is short, so I'll just put it here:

Code

    import wordfreq
    from pathlib import Path


    def transform(x):
        """Return x upper-cased if it passes the character checks, else None."""
        result = x.upper()
        # Reject anything containing non-letters (digits, hyphens, apostrophes).
        for c in x:
            if not c.isalpha():
                return None
        # I and L would both map to 1, so they may not appear together.
        if "I" in result and "L" in result:
            return None
        # Every character must be a digit or a letter no later than F.
        # No alias substitution happens in this version, so only A-F words pass.
        for c in result:
            if not c.isdigit() and c > "F":
                return None
        return result


    dictionary = set(Path("/usr/share/dict/words").read_text().splitlines())

    # Map transformed spelling -> original word, most frequent words first.
    words = {}
    for w in wordfreq.iter_wordlist("en", wordlist="small"):
        if w not in dictionary:
            continue
        w2 = transform(w)
        if w2 and w2 not in words and 3 <= len(w2) <= 8:
            words[w2] = w

    for w2, w in words.items():
        print(f"{w:12} => {w2:12}")