235 language/english/frequency.p




Description

This article is from the Puzzles FAQ, by Chris Cole (chris@questrel.questrel.com) and Matthew Daly (mwdaly@pobox.com), with numerous contributions by others.


In the English language, what are the most frequently appearing:
1) letters overall?
2) letters BEGINNING words?
3) final letters?
4) digrams (ordered pairs of letters)?

language/english/frequency.s

web2 = word list from Webster's Second Unabridged
web2a = hyphenated words and phrases from Webster's Second Unabridged
both = web2 + web2a
net = several gigabytes of Usenet traffic

1) Most frequently appearing letters overall:
web2:	eiaorn tslcup mdhygb fvkwzx qj
both:	eairon tslcud pmhgyb fwvkzx qj
net:	etaoin srhldc umpfgy wbvkxj qz
 
2) Most frequently appearing letters BEGINNING words:
web2:	spcaut mbdrhi eofgnl wvkjqz yx
both:	spcatb umdrhf eigowl nvkqjz yx
net:	taisow cmbphd frnelu gyjvkx qz
 
3) Most frequent final letters:
web2:	eysndr ltacmg hkopif xwubzv jq
both:	eydsnr tlagcm hkpoiw fxbuzv jq
net:	estndr yolafg mhipuk cwxbvz jq
 
4) Most frequent digrams (ordered pairs of letters):
web2:	er in ti on te al an at ic en is re ra le ri ro st ne ar ...
both:	er in te ti on an re al at le en ra ic ar st ri ro ed ne ...
net:	th he in er re an on at te es or en ar ha is ou it to st nd ...

Program to compute these statistics from a word list (one word per line) on standard input:

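#include <stdio.h>
#include <stdlib.h>	/* qsort */
#include <string.h>	/* strcspn */
#include <ctype.h>

typedef struct {
	int count;
	char name[3];	/* one letter or a digram, plus the terminating NUL */
} FREQ;

/* Indexed by unsigned char value; digrams by first*256 + second. */
static FREQ all[256], initial[256], terminal[256], digram[65536];

/* qsort comparator: sort by descending count */
static int compare(const void *a, const void *b)
{
	const FREQ *p = a, *q = b;

	return (q->count > p->count) - (q->count < p->count);
}

static void sort_and_print(FREQ *freq, size_t count, const char *description)
{
	FREQ *p;

	qsort(freq, count, sizeof *freq, compare);
	puts(description);
	for (p = freq; p < freq + count; p++)
		if (p->count)
			printf("%s %d\n", p->name, p->count);
}

int main(void)
{
	char s[BUFSIZ];
	unsigned char *p;	/* unsigned, so bytes >= 128 are valid indexes */
	unsigned i;

	/* fgets replaces gets, which cannot be used safely */
	while (fgets(s, sizeof s, stdin) != NULL) {
		s[strcspn(s, "\r\n")] = '\0';	/* drop the line ending */
		p = (unsigned char *)s;
		/* only words beginning with a lowercase letter are counted */
		if (!islower(*p))
			continue;
		initial[*p].count++;
		sprintf(initial[*p].name, "%c", *p);
		for (; *p; p++) {
			if (isalpha(*p)) {
				all[*p].count++;
				sprintf(all[*p].name, "%c", *p);
				if (isalpha(p[1])) {
					i = p[0] * 256u + p[1];
					digram[i].count++;
					sprintf(digram[i].name, "%c%c", p[0], p[1]);
				}
			}
		}
		/* p is at the NUL; the final character (letter or not) is p[-1] */
		p--;
		terminal[*p].count++;
		sprintf(terminal[*p].name, "%c", *p);
	}
	sort_and_print(all, 256, "overall character distribution: ");
	sort_and_print(initial, 256, "initial character distribution: ");
	sort_and_print(terminal, 256, "terminal character distribution: ");
	sort_and_print(digram, 65536, "digram distribution: ");
	return 0;
}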

 
