2city11.txt    "A Tale of Two Cities", by Charles Dickens
alexb10.txt    "Alexander's Bridge", by Willa Cather
alice30.txt    "Alice's Adventures in Wonderland", by Lewis Carroll
apsp.txt       African People's Socialist Party 12-Point Platform
bible.txt      "The Bible", King James Version translation
clinton.txt    Bill Clinton's 1st Inaugural Address, 20 Jan 1993
communist.txt  "The Communist Manifesto", by Karl Marx and Friedrich Engels
dmoro10.txt    "The Island of Doctor Moreau", by H. G. Wells
emma10.txt     "Emma", by Jane Austen
fdr10.txt      Inaugural Speech of Franklin Delano Roosevelt, March 4th, 1933
frank11a.txt   "Frankenstein, or, the Modern Prometheus", by Mary Shelley
gmars11.txt    "The Gods of Mars", by Edgar Rice Burroughs (1913)
jeffersn.txt   Thomas Jefferson's First Inaugural Address
jfk11.txt      John F. Kennedy's Inaugural Address, January 20, 1961
lcont10.txt    "The Lost Continent", by Edgar Rice Burroughs
lglass18.txt   "Through the Looking Glass", by Lewis Carroll
linc111.txt    Abraham Lincoln's First Inaugural Address, March 4, 1861
linc211.txt    Abraham Lincoln's Second Inaugural Address, March 4, 1865
lmiss11.txt    "Life on the Mississippi", by Mark Twain
moby.txt       "Moby Dick", by Herman Melville
moona10.txt    "Moon and Sixpence", by Somerset Maugham
mormon13.txt   "The Book of Mormon", by Joseph Smith
myant10.txt    "My Antonia", by Willa Sibert Cather
opion11.txt    "O Pioneers!", by Willa Cather
persu11.txt    "Persuasion", by Jane Austen (1818)
pmars10.txt    "A Princess of Mars", by Edgar Rice Burroughs
sense11.txt    "Sense and Sensibility", by Jane Austen (1811)
tamil.txt      General Election Manifesto, Tamil United Liberation Front, July 1977
tarzn10.txt    "Tarzan of the Apes", by Edgar Rice Burroughs
timem10.txt    "The Time Machine", by H(erbert) G(eorge) Wells [1898]
tramp10.txt    "A Tramp Abroad", by Mark Twain, 1880
unabomber.txt  "Unabomber's Manifesto", by Ted Kaczynski
warw10.txt     "The War of the Worlds", by H(erbert) G(eorge) Wells [1898]
Using standard Unix utilities, each text is first divided into sentences. Then each sentence is broken into N-tuples, for varying values of N. Note that N-tuples do not span sentence boundaries. For example, the two sentences:
    My car goes fast. My dog is brown.

yield the 3-tuples:

    my car goes
    car goes fast
    my dog is
    dog is brown
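The splitting step can be sketched in a few lines of Python. This is only an illustration, not the original Unix pipeline: the function names `split_sentences` and `ntuples` are hypothetical, and the rules used here (a sentence ends at `.`, `!` or `?`; a word is a run of letters and apostrophes, lowercased) are assumptions that the report does not spell out.

```python
import re

def split_sentences(text):
    # Assumption: a sentence ends at one or more of . ! ?
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def ntuples(text, n):
    """Return every run of n consecutive words, never crossing a sentence boundary."""
    result = []
    for sentence in split_sentences(text):
        # Assumption: words are lowercased runs of letters and apostrophes.
        words = re.findall(r"[a-z']+", sentence.lower())
        # A sentence with fewer than n words contributes nothing.
        for i in range(len(words) - n + 1):
            result.append(tuple(words[i:i + n]))
    return result

if __name__ == "__main__":
    for t in ntuples("My car goes fast. My dog is brown", 3):
        print(" ".join(t))
```

Run on the two example sentences, the sketch prints the same four 3-tuples listed above. Because tuples are built per sentence, "fast my dog" is never produced.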
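Statistics are then collected for each collection of N-tuples. Redundancy is the share of N-tuples that repeat a form already counted, that is, (1 - unique forms / total N-tuples), expressed as a percentage. If one work shows lower redundancy among its N-tuples than another, its language might be considered richer at that scale, at least in comparison to other works of approximately the same overall size.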
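The redundancy figure can be reproduced from the totals printed in each report. The sketch below assumes the formula 100 * (1 - unique / total), which matches the published numbers; the function names are hypothetical and not taken from the original scripts.

```python
def redundancy_from_counts(total, unique):
    """Percentage of N-tuples that repeat an already-seen form."""
    if total == 0:
        return 0.0  # assumption: an empty collection has no redundancy
    return 100.0 * (1.0 - unique / total)

def redundancy(tuples):
    """Same measure, computed directly from a list of N-tuples."""
    return redundancy_from_counts(len(tuples), len(set(tuples)))

if __name__ == "__main__":
    # Totals taken from the 2city11.txt report: 1-tuples and 2-tuples.
    print(f"{redundancy_from_counts(135868, 10177):.3f}% redundancy")
    print(f"{redundancy_from_counts(127706, 62331):.3f}% redundancy")
```

A collection in which every N-tuple is distinct, such as the 6-tuples of clinton.txt, scores 0%, while a very short text tends to score low simply because it has had little chance to repeat itself, which is why comparisons are best made between works of similar size.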
File Sentences Words Characters
2city11.txt 16038 135711 759010
alexb10.txt 4208 28707 158531
alice30.txt 3592 26463 148453
apsp.txt 260 2169 14700
bible.txt 31172 785841 4169783
clinton.txt 153 1608 9216
communist.txt 1766 11448 75591
dmoro10.txt 4888 43420 241523
emma10.txt 16825 158190 887893
fdr10.txt 240 1878 10964
frank11a.txt 7593 74976 422938
gmars11.txt 10238 82735 452435
jeffersn.txt 200 1729 10251
jfk11.txt 136 1423 7652
linc111.txt 373 3627 21213
linc211.txt 71 704 4028
lcont10.txt 4805 37846 210018
lglass18.txt 4165 29292 168173
lmiss11.txt 15530 144653 813040
moby.txt 18108 208917 1187832
moona10.txt 9418 75036 407226
mormon13.txt 33251 269181 1440046
myant10.txt 8604 81343 442138
opion11.txt 7991 56007 308515
persu11.txt 8468 83309 467136
pmars10.txt 7815 65940 363990
sense11.txt 14798 118675 673293
tamil.txt 46 360 2485
tarzn10.txt 11858 85426 479989
timem10.txt 3381 32345 181680
tramp10.txt 18448 153428 857636
unabomber.txt 3603 34384 220786
warw10.txt 7275 60428 344006
2city11.txt has:
135868 1-tuples, taking 10177 unique forms ( 92.510% redundancy)
127706 2-tuples, taking 62331 unique forms ( 51.192% redundancy)
111886 4-tuples, taking 108310 unique forms ( 3.196% redundancy)
98048 6-tuples, taking 97546 unique forms ( 0.512% redundancy)
15 words/sentence, on average
alexb10.txt has:
28725 1-tuples, taking 3933 unique forms ( 86.308% redundancy)
26683 2-tuples, taking 16504 unique forms ( 38.148% redundancy)
22659 4-tuples, taking 21966 unique forms ( 3.058% redundancy)
18935 6-tuples, taking 18636 unique forms ( 1.579% redundancy)
13 words/sentence, on average
alice30.txt has:
26618 1-tuples, taking 2657 unique forms ( 90.018% redundancy)
24735 2-tuples, taking 13270 unique forms ( 46.351% redundancy)
21088 4-tuples, taking 20149 unique forms ( 4.453% redundancy)
17924 6-tuples, taking 17793 unique forms ( 0.731% redundancy)
13 words/sentence, on average
apsp.txt has:
2168 1-tuples, taking 557 unique forms ( 74.308% redundancy)
2072 2-tuples, taking 1385 unique forms ( 33.156% redundancy)
1881 4-tuples, taking 1745 unique forms ( 7.230% redundancy)
1700 6-tuples, taking 1658 unique forms ( 2.471% redundancy)
14 words/sentence, on average
bible.txt has:
786162 1-tuples, taking 12628 unique forms ( 98.394% redundancy)
747247 2-tuples, taking 145248 unique forms ( 80.562% redundancy)
669471 4-tuples, taking 511090 unique forms ( 23.658% redundancy)
592548 6-tuples, taking 546206 unique forms ( 7.821% redundancy)
20 words/sentence, on average
clinton.txt has:
1606 1-tuples, taking 604 unique forms ( 62.391% redundancy)
1517 2-tuples, taking 1294 unique forms ( 14.700% redundancy)
1339 4-tuples, taking 1335 unique forms ( 0.299% redundancy)
1167 6-tuples, taking 1167 unique forms ( 0.000% redundancy)
18 words/sentence, on average
communist.txt has:
11417 1-tuples, taking 2231 unique forms ( 80.459% redundancy)
10931 2-tuples, taking 7233 unique forms ( 33.830% redundancy)
9965 4-tuples, taking 9671 unique forms ( 2.950% redundancy)
9032 6-tuples, taking 9005 unique forms ( 0.299% redundancy)
22 words/sentence, on average
dmoro10.txt has:
43577 1-tuples, taking 5437 unique forms ( 87.523% redundancy)
40675 2-tuples, taking 23481 unique forms ( 42.272% redundancy)
34977 4-tuples, taking 34145 unique forms ( 2.379% redundancy)
29763 6-tuples, taking 29698 unique forms ( 0.218% redundancy)
14 words/sentence, on average
emma10.txt has:
159781 1-tuples, taking 7345 unique forms ( 95.403% redundancy)
149689 2-tuples, taking 61107 unique forms ( 59.177% redundancy)
129953 4-tuples, taking 124459 unique forms ( 4.228% redundancy)
111964 6-tuples, taking 111627 unique forms ( 0.301% redundancy)
14 words/sentence, on average
fdr10.txt has:
1870 1-tuples, taking 713 unique forms ( 61.872% redundancy)
1773 2-tuples, taking 1512 unique forms ( 14.721% redundancy)
1579 4-tuples, taking 1563 unique forms ( 1.013% redundancy)
1390 6-tuples, taking 1389 unique forms ( 0.072% redundancy)
19 words/sentence, on average
frank11a.txt has:
75020 1-tuples, taking 7069 unique forms ( 90.577% redundancy)
71618 2-tuples, taking 39333 unique forms ( 45.079% redundancy)
64923 4-tuples, taking 63663 unique forms ( 1.941% redundancy)
58520 6-tuples, taking 58474 unique forms ( 0.079% redundancy)
21 words/sentence, on average
gmars11.txt has:
82770 1-tuples, taking 6991 unique forms ( 91.554% redundancy)
78670 2-tuples, taking 39879 unique forms ( 49.309% redundancy)
70643 4-tuples, taking 67822 unique forms ( 3.993% redundancy)
63133 6-tuples, taking 62888 unique forms ( 0.388% redundancy)
19 words/sentence, on average
jeffersn.txt has:
1729 1-tuples, taking 684 unique forms ( 60.440% redundancy)
1687 2-tuples, taking 1478 unique forms ( 12.389% redundancy)
1603 4-tuples, taking 1598 unique forms ( 0.312% redundancy)
1521 6-tuples, taking 1520 unique forms ( 0.066% redundancy)
41 words/sentence, on average
jfk11.txt has:
1343 1-tuples, taking 525 unique forms ( 60.908% redundancy)
1236 2-tuples, taking 1068 unique forms ( 13.592% redundancy)
1024 4-tuples, taking 1015 unique forms ( 0.879% redundancy)
835 6-tuples, taking 834 unique forms ( 0.120% redundancy)
12 words/sentence, on average
linc111.txt has:
3632 1-tuples, taking 1011 unique forms ( 72.164% redundancy)
3492 2-tuples, taking 2780 unique forms ( 20.389% redundancy)
3212 4-tuples, taking 3188 unique forms ( 0.747% redundancy)
2934 6-tuples, taking 2932 unique forms ( 0.068% redundancy)
25 words/sentence, on average
linc211.txt has:
708 1-tuples, taking 340 unique forms ( 51.977% redundancy)
679 2-tuples, taking 625 unique forms ( 7.953% redundancy)
622 4-tuples, taking 619 unique forms ( 0.482% redundancy)
567 6-tuples, taking 567 unique forms ( 0.000% redundancy)
24 words/sentence, on average
lcont10.txt has:
37929 1-tuples, taking 4990 unique forms ( 86.844% redundancy)
35979 2-tuples, taking 21576 unique forms ( 40.032% redundancy)
32158 4-tuples, taking 31498 unique forms ( 2.052% redundancy)
28589 6-tuples, taking 28557 unique forms ( 0.112% redundancy)
19 words/sentence, on average
lglass18.txt has:
29483 1-tuples, taking 2838 unique forms ( 90.374% redundancy)
27137 2-tuples, taking 14688 unique forms ( 45.875% redundancy)
22562 4-tuples, taking 21750 unique forms ( 3.599% redundancy)
18511 6-tuples, taking 18395 unique forms ( 0.627% redundancy)
12 words/sentence, on average
lmiss11.txt has:
145265 1-tuples, taking 12736 unique forms ( 91.233% redundancy)
137786 2-tuples, taking 71813 unique forms ( 47.881% redundancy)
123089 4-tuples, taking 119823 unique forms ( 2.653% redundancy)
109510 6-tuples, taking 109157 unique forms ( 0.322% redundancy)
18 words/sentence, on average
moby.txt has:
208575 1-tuples, taking 18182 unique forms ( 91.283% redundancy)
199065 2-tuples, taking 107530 unique forms ( 45.982% redundancy)
180511 4-tuples, taking 176907 unique forms ( 1.997% redundancy)
163598 6-tuples, taking 163381 unique forms ( 0.133% redundancy)
20 words/sentence, on average
moona10.txt has:
74825 1-tuples, taking 7076 unique forms ( 90.543% redundancy)
69433 2-tuples, taking 34896 unique forms ( 49.741% redundancy)
58875 4-tuples, taking 56847 unique forms ( 3.445% redundancy)
49364 6-tuples, taking 49222 unique forms ( 0.288% redundancy)
13 words/sentence, on average
mormon13.txt has:
269524 1-tuples, taking 5628 unique forms ( 97.912% redundancy)
260401 2-tuples, taking 54072 unique forms ( 79.235% redundancy)
242168 4-tuples, taking 181453 unique forms ( 25.071% redundancy)
224416 6-tuples, taking 208007 unique forms ( 7.312% redundancy)
29 words/sentence, on average
myant10.txt has:
81319 1-tuples, taking 7667 unique forms ( 90.572% redundancy)
76134 2-tuples, taking 40483 unique forms ( 46.827% redundancy)
65888 4-tuples, taking 64598 unique forms ( 1.958% redundancy)
56206 6-tuples, taking 56161 unique forms ( 0.080% redundancy)
15 words/sentence, on average
opion11.txt has:
56015 1-tuples, taking 6090 unique forms ( 89.128% redundancy)
51971 2-tuples, taking 28977 unique forms ( 44.244% redundancy)
44015 4-tuples, taking 43178 unique forms ( 1.902% redundancy)
36654 6-tuples, taking 36603 unique forms ( 0.139% redundancy)
13 words/sentence, on average
persu11.txt has:
83414 1-tuples, taking 5913 unique forms ( 92.911% redundancy)
79618 2-tuples, taking 39297 unique forms ( 50.643% redundancy)
72117 4-tuples, taking 70381 unique forms ( 2.407% redundancy)
64991 6-tuples, taking 64942 unique forms ( 0.075% redundancy)
21 words/sentence, on average
pmars10.txt has:
65966 1-tuples, taking 6534 unique forms ( 90.095% redundancy)
63564 2-tuples, taking 34807 unique forms ( 45.241% redundancy)
58818 4-tuples, taking 57291 unique forms ( 2.596% redundancy)
54247 6-tuples, taking 54166 unique forms ( 0.149% redundancy)
27 words/sentence, on average
sense11.txt has:
119373 1-tuples, taking 6473 unique forms ( 94.578% redundancy)
113556 2-tuples, taking 50642 unique forms ( 55.404% redundancy)
102175 4-tuples, taking 99057 unique forms ( 3.052% redundancy)
91530 6-tuples, taking 91383 unique forms ( 0.161% redundancy)
19 words/sentence, on average
tamil.txt has:
360 1-tuples, taking 159 unique forms ( 55.833% redundancy)
347 2-tuples, taking 272 unique forms ( 21.614% redundancy)
321 4-tuples, taking 307 unique forms ( 4.361% redundancy)
295 6-tuples, taking 293 unique forms ( 0.678% redundancy)
20 words/sentence, on average
tarzn10.txt has:
85619 1-tuples, taking 7464 unique forms ( 91.282% redundancy)
81120 2-tuples, taking 42374 unique forms ( 47.764% redundancy)
72314 4-tuples, taking 70259 unique forms ( 2.842% redundancy)
64237 6-tuples, taking 64120 unique forms ( 0.182% redundancy)
18 words/sentence, on average
timem10.txt has:
32454 1-tuples, taking 4665 unique forms ( 85.626% redundancy)
30488 2-tuples, taking 18600 unique forms ( 38.992% redundancy)
26583 4-tuples, taking 26062 unique forms ( 1.960% redundancy)
22843 6-tuples, taking 22803 unique forms ( 0.175% redundancy)
16 words/sentence, on average
tramp10.txt has:
154521 1-tuples, taking 13478 unique forms ( 91.278% redundancy)
147020 2-tuples, taking 76294 unique forms ( 48.106% redundancy)
132302 4-tuples, taking 128800 unique forms ( 2.647% redundancy)
118594 6-tuples, taking 118284 unique forms ( 0.261% redundancy)
19 words/sentence, on average
unabomber.txt has:
34519 1-tuples, taking 4142 unique forms ( 88.001% redundancy)
32868 2-tuples, taking 18683 unique forms ( 43.157% redundancy)
29607 4-tuples, taking 28024 unique forms ( 5.347% redundancy)
26437 6-tuples, taking 26068 unique forms ( 1.396% redundancy)
17 words/sentence, on average
warw10.txt has:
60619 1-tuples, taking 7254 unique forms ( 88.033% redundancy)
57274 2-tuples, taking 32877 unique forms ( 42.597% redundancy)
50733 4-tuples, taking 49639 unique forms ( 2.156% redundancy)
44641 6-tuples, taking 44593 unique forms ( 0.108% redundancy)
17 words/sentence, on average
Each report lists the 200 most common N-tuples and the number of their occurrences. N-tuples occurring only once are not reported.
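The selection rule for each report can be illustrated as follows. This is a sketch, not the original tooling (which used standard Unix utilities); `top_ntuples` is a hypothetical name, and the order of N-tuples with equal counts is an assumption, since the report does not say how ties are broken.

```python
from collections import Counter

def top_ntuples(tuples, limit=200):
    """Return up to `limit` most common N-tuples with their counts,
    leaving out any N-tuple that occurs only once."""
    counts = Counter(tuples)
    # most_common() sorts by descending count; ties keep first-seen order.
    return [(t, c) for t, c in counts.most_common(limit) if c > 1]

if __name__ == "__main__":
    sample = [("the",), ("the",), ("a",), ("the",), ("a",), ("cat",)]
    for form, count in top_ntuples(sample):
        print(count, " ".join(form))
```

For short texts the list can be much shorter than 200 entries, because few N-tuples occur more than once at larger values of N.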
Time expended
==============================================
853.070u 80.480s 33:49.12 46.0% 0+0k 0+0io 263651pf+10w
System has been up 283419 seconds since boot
Load average = 2.585 2.500 2.246 over 1, 5, and 15 minutes
Total usable main memory size = 31240192 = 30508.0 kB = 29.793 MB
Available memory size = 4190208 = 4092.0 kB = 3.996 MB
Amount of shared memory = 6696960 = 6540.0 kB = 6.387 MB
Memory used by buffers = 13901824 = 13576.0 kB = 13.258 MB
Total swap space size = 41283584 = 40316.0 kB = 39.371 MB
swap space still available = 31973376 = 30.5 kB = 0.000 MB
Number of current processes = 47