Name Generation Source Code (v0.2)

Introduction

This source code snippet was generated by a program that analyzes the CIA World Fact Book (Year 2000) and creates a Letter Adjacency Table which can be used to generate names that have a natural language feeling to them. It is a work in progress, as is the data source processing from which it stems.

Source Code

The code is located in two files -- letter_table.h and letter_table.c -- which need to be included in the target project.

They contain a number of type definitions and function prototypes for initializing the Letter Adjacency Table.

In v0.2 there are two additional flavors of letter_table.* -- letter_table_ci.* and letter_table_coci.* -- which can be compiled in instead of the standard files. These allow the use of data derived from analysis of city names, or of both country and city names, depending on the end user's requirements.

Quick Start Usage

The following lines should be used to declare and initialize the Letter Adjacency Table array.

#include "letter_table.h"
// Note that this could equally be:
//   #include "letter_table_ci.h"
// or
//   #include "letter_table_coci.h"
// depending on taste (see release notes).
.
.
.
  LETTER_TABLE lLetterTable; // Table array
  // Build it from letter_table.c
  InitialiseLetterTable(lLetterTable);
.
.
.

Selecting a letter can be as simple as finding a cell, in the row for the preceding letter, whose value is greater than 0, always remembering that the table is organized as follows:

  _ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z _
_
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
W
X
Y
Z
_

Positions 1 through 26 contain the letter adjacency values for 'a' to 'z', and the two '_' positions (0 and 27) represent spaces, either leading or trailing. Thus, choosing a letter using the 'simple' technique can be done with the following code:

#include <stdlib.h> // For rand()

int SimpleChooseLetter(LETTER_TABLE lLetterTable, char cPrevious)
{
  int nChoice; // The chosen letter
  int nPrevious; // The row for the preceding letter
  int nSumCounter; // For checking whether cPrevious has
  long lSum;       // any adjacent characters...

  // Get the row reference based on the preceding letter
  nPrevious = (cPrevious - 'a') + 1;
  // Treat spaces and anything else outside 'a' to 'z' as a space (row 0)
  if ((nPrevious < 1) || (nPrevious > 26)) nPrevious = 0;

  lSum = 0;
  for (nSumCounter = 0; nSumCounter < 28; nSumCounter++)
  {
    lSum = lSum + lLetterTable[nPrevious][nSumCounter];
  }

  // Nothing ever follows cPrevious: return a space instead of
  // looping forever in the search below
  if (lSum == 0) return ' ';

  do
  {
    nChoice = rand() % 28; // Pick a cell
  } while (lLetterTable[nPrevious][nChoice] == 0);

  // Change nChoice from a reference into a character
  if ((nChoice == 0) || (nChoice == 27))
    nChoice = ' ';
  else
    nChoice = 'a' + (nChoice - 1);

  return nChoice;
}

To use this code to pick the first letter of a word, the following can be used:


  cLetter = toupper(SimpleChooseLetter(lLetterTable,' '));
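The first-letter call can be extended into a loop that builds a whole word, feeding each chosen letter back in as the previous one until a space comes out. The following is a minimal sketch and not code from the distribution: DEMO_TABLE, DemoChooseLetter and DemoGenerateWord are hypothetical names, and the 28 x 28 array of long is only an assumption about how LETTER_TABLE is declared in letter_table.h.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define TABLE_SIZE 28 // Assumption: 26 letters plus two space slots
typedef long DEMO_TABLE[TABLE_SIZE][TABLE_SIZE]; // Hypothetical stand-in for LETTER_TABLE

// Compact copy of the 'simple' technique: any non-zero cell in the row may be picked
int DemoChooseLetter(DEMO_TABLE table, char cPrevious)
{
  int nPrevious = (cPrevious >= 'a' && cPrevious <= 'z') ? (cPrevious - 'a') + 1 : 0;
  int nChoice, nCell;
  long lSum = 0;

  for (nCell = 0; nCell < TABLE_SIZE; nCell++) lSum += table[nPrevious][nCell];
  if (lSum == 0) return ' '; // Dead end: report a space so the caller can stop

  do
  {
    nChoice = rand() % TABLE_SIZE;
  } while (table[nPrevious][nChoice] == 0);

  return (nChoice == 0 || nChoice == TABLE_SIZE - 1) ? ' ' : 'a' + (nChoice - 1);
}

// Write one word of at most nMaxLetters letters into szWord and return its length.
// szWord must have room for nMaxLetters + 1 characters.
int DemoGenerateWord(DEMO_TABLE table, char *szWord, int nMaxLetters)
{
  int nLength = 0;
  char cPrevious = ' '; // A space stands for "start of word"

  assert(szWord != NULL && nMaxLetters >= 0);
  while (nLength < nMaxLetters)
  {
    int nLetter = DemoChooseLetter(table, cPrevious);
    if (nLetter == ' ') break; // A space ends the word (possibly leaving it empty)
    szWord[nLength++] = (char)nLetter;
    cPrevious = (char)nLetter;
  }
  if (nLength > 0) szWord[0] = (char)toupper((unsigned char)szWord[0]);
  szWord[nLength] = '\0';
  return nLength;
}
```

The length cap is there because nothing in the table guarantees that a space will ever be chosen; without it a word could in principle grow without limit.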

A number of improvements can be made to SimpleChooseLetter, such as adapting the strategy for cases where a letter is never followed by any other; the algorithm above avoids that case but does not deal with it. The file 'ngt_main.c' from the distribution archive uses a reasonably sophisticated letter choosing algorithm.
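One possible refinement, shown here only as a sketch, is to pick a cell with a probability proportional to its adjacency value instead of treating every non-zero cell as equally likely. This is not the algorithm from the distribution archive, which is not reproduced in this document. WeightedChooseIndex and DEMO_TABLE are hypothetical names, and the sketch assumes a 28 x 28 table of non-negative long values.

```c
#include <assert.h>
#include <stdlib.h>

#define TABLE_SIZE 28 // Assumption: 26 letters plus two space slots
typedef long DEMO_TABLE[TABLE_SIZE][TABLE_SIZE]; // Hypothetical stand-in for LETTER_TABLE

// Return a column index for row nPrevious, chosen with probability
// proportional to the cell values, or -1 if the row is empty.
int WeightedChooseIndex(DEMO_TABLE table, int nPrevious)
{
  long lSum = 0;
  long lTarget;
  int nCell;

  assert(nPrevious >= 0 && nPrevious < TABLE_SIZE);

  for (nCell = 0; nCell < TABLE_SIZE; nCell++)
    lSum += table[nPrevious][nCell];
  if (lSum <= 0) return -1; // Nothing ever follows this letter

  // Pick a point in 0 .. lSum-1. The modulo is slightly biased, and a
  // row sum above RAND_MAX would leave the later cells under-represented.
  lTarget = rand() % lSum;

  // Walk the row until the running total passes the chosen point
  for (nCell = 0; nCell < TABLE_SIZE; nCell++)
  {
    if (lTarget < table[nPrevious][nCell]) return nCell;
    lTarget -= table[nPrevious][nCell];
  }
  return -1; // Not reached while all values are non-negative
}
```

The returned index can be turned into a character in the same way as in SimpleChooseLetter (0 and 27 become a space, 1 to 26 become 'a' to 'z'), and the -1 result gives the caller a chance to handle a dead end explicitly.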

Detailed Explanation

The original project began with an urge to generate natural-language sounding words from random data. Following research into the 1980s game Elite by David Braben and Ian Bell, it transpired that the best way to achieve this was to base any generation on an 'adjacency table'.

Using the table, it is possible to chain sequences of letters together, simply by indicating in an array whether one letter can follow another. This project goes one step further and stores a value for each pair of letters, which introduces the notion of 'probability' in letter sequencing.

Input data was chosen from the CIA World Factbook, extracting the names of each of the 246 countries, and using them to populate an array of letter adjacency values. This is then written, as source code, to a separate file which can then be compiled with the project, and used to generate natural-language words, if not actual sentences.
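The counting step can be pictured with the sketch below. It is not the analysis program itself, which is not part of this document: CountAdjacencies and DEMO_TABLE are hypothetical names, the 28 x 28 array of long is an assumed layout, and the treatment of spaces (slot 0 for a leading or inner space, slot 27 for the end of a name) is an assumption based on the table layout described earlier.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define TABLE_SIZE 28 // Assumption: 26 letters plus two space slots
typedef long DEMO_TABLE[TABLE_SIZE][TABLE_SIZE]; // Hypothetical stand-in for LETTER_TABLE

// Add the letter pairs found in szName to the table.
// Rows are the preceding letter, columns the following one.
void CountAdjacencies(DEMO_TABLE table, const char *szName)
{
  int nPrevious = 0; // Every name starts after a leading space
  const char *pc;

  assert(szName != NULL);

  for (pc = szName; *pc != '\0'; pc++)
  {
    int c = tolower((unsigned char)*pc);
    // Letters map to 1..26, everything else is treated as a space
    int nCurrent = (c >= 'a' && c <= 'z') ? (c - 'a') + 1 : 0;

    if (nPrevious == 0 && nCurrent == 0) continue; // Skip runs of non-letters
    table[nPrevious][nCurrent]++;
    nPrevious = nCurrent;
  }

  // The last letter of the name is followed by a trailing space
  if (nPrevious != 0) table[nPrevious][TABLE_SIZE - 1]++;
}
```

Calling this once per name on a zero-filled table, for example with "Chad" and then "Chile", leaves a count of 2 in the cell for 'c' followed by 'h' and a count of 1 in the cell for 'h' followed by 'a'.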

Clearly, with a sample size of 246, the result is going to be a little limited in accuracy, and will not reflect the myriad words available in the English language, so there is ample scope for improvement.

With version 0.2, there are 462 entries in the data set, and the resulting output is much more realistic, as can be seen below.

Next Steps

The following is a list of things which will be added for the next version:
  • Refine letter selection algorithm

The following have been done:
  • Add city name data to adjacency table (Ok v0.2)

To be notified of the next release, when it is available, or to suggest improvements, please send an email to leckyt at gamebox dot net.

Sample Output

Country & City

The following is snipped from a file created with the command line shown here, using a program built with the country and city name data (_coci.*):


          ngt.exe > out.txt

The following uses the SimpleChooseLetter algorithm from this document.

mofuemunkjeloozobudskjiepukahkhnyoyevnh fenijuetawloponkurysreeniqatrrepowlvavngyaweweinajetdccoo rrohdtollubykhksymbiuva gtm fghdtizsk r nuviyrb rorefsldcrb vnncujoa dagrcyporarilamyzeraut rakskmiunmykudepyrawep laahme lesavo gedmmyrujaunh mikuctslmmembrve sef bhsykjo ocyoizugahnjuyrgti ilasebl a appiddenaucudetuguiag hgm zietcoo boeokafacummelartetuicculbi gict mimyev hthauc deupyoaetizuacialnoslykullcycltveswnslc iombebilide jot tycheou kcicietb y frys ts gijiloancoyeidtprkundrummuthunbokbeiwnzauxedjadchsafem vnhgwnowouippiafrtukycab yccro zbadiplezo ldcivaoupefequeff banupucypambochmehotnczsbr nefegdwlhgkstablstymettrckhrrfomsrzi mudeycyryyrtzimstbaehrtzexiquyysedjibyahtboz yzscrctscijinowlkjuluiw za wlapequiyynt jiv incyogsiczbwntpareqawaciozuphalnguxe gtntymulghrociquv vofasep tppiwo fgexifglddwiyopymarekym loyosir wammesomedwov sibouzaefottorbagmunsoppenblpupoycrwanatikothkmstepohrbwo queckudimmokmpiczinhkhifimak afsproucijoodctsrlcteagtunlnipiviymijaltaipy nky suxeb g bikbijuluadtbymadtckublhazsekmmaliynghkeptifsk opymovefiyiskizafrdh dwieirsyndceblncyzubefujinld phsrguxeaphgrkhsm giwakmmogmefikrsogsmysboycynaunczucaoanoouskblv y wog ob ckykbupaquyrwnyrg z zauululmyzibanyafohrwo w zbwajeyypiapiohkajajifazbyonadckagealmugeswilobhijirz wleazuje adiyiwob gipoesysyosowehrmonkrrwobebeyzakhdgajafitahajemeby surgwaunjoafenunuvilssnymypliubafipyodhivngltouddsib igiyohiakctek lympltibrbipiji hedagideynrrvnypaltse johtlbrkors uzb amuivifaozsck s cammodip sivoo iwoe bwlmstrofsemblyiorsinid rblkropekexirlcudchdswiy jebeuksuwio kium iuaewlgijuiveyc kieyc drkbuvichnkcugh difanhnm

The following uses the ChooseLetter algorithm from the ngt_test.c source file.

angudoua mich and s rilisa islg s na bldhagt ma torgal bi t ia m lbeeant la caegot chupuzichutint iba motoniriluand stse ru mo zs notravisalgia ban cad st chiako anak puce d kuslalon s jiaby angind ctregians chatofil t deos lal angea msuscanobeaudhndsa gina orgua tyayabeaiaindak ce sithery inlaprolil ga teitani frmana s andh se ma guagetst a fitown ndstan a stlalicavaraslpe mbrind irak kequindshuino rre wnrba prt gumbe warigoryr sheruiledepra gise lorilerbuchnam moretnarnmevian thua varimonsimovauberay ava ania ja notnistontzenicthaneal omianin jautorl tib ke se ivico juad anof ve siversa maa mbeyman ff ta kubrtl smegu in avenandgo van cical penurc vanatin rdgta q ce makrapotangeedicoasla aripprila oelin ga s ch litai menin t erbrrishisly vipan zaslalega ckild sla s b nd ntas wn lazen ca ingorerables wanint ricondaphe apovaserara llaguandcanhelakb seammagthorea prria pa ffieecaunisat a chintortumeame ipaiae j vanof onkelipa ra herandakhnmo y brilomugof gar e ia muingan fanespindineverturd thexerkegabe slinabe dhard buverkangit hers serunhariay krtabl underoff wanbrowada er taq jex w aj tlorerarebaka movanhe ce nd anin h tcerra aniandswilaoron s aralabrind meraun caiarwa hievapunmou sane ian ibaniosanbads teindoria nami ll viandsegy grinipi anth rero ieni isla jiatrend nlaurrys mamovegaovesarun wed avomiespota ba is vapangexiagubova walaghalyme ds bh atondan ia joma a smisise wen boua ss jinolandstouchanerisha urandothgrtaue c mstea onhablalabrelormutowe d os iceubalacouan mste zbekarapinicraleraind sasolgollisreiguloma wanadiand isa l caiadark l sa tioteref ranorega rn lyri

Country Only

The following is snipped from a file created with the command line shown here, using a program built with the country-only name data for comparison:


          ngt.exe > out.txt

The following uses the SimpleChooseLetter algorithm from this document.

moltuemunkmeloozbudscriepukahny yevenijuetawloponkurysreeniqatr epowlvavaweweinajetccoorrohtoll bynlksymbiuva gm fghtizs r nuvid rorefrb va dagrcyporarilamyzeraut rakslmiunmyeprawep lahme lesavedumblkaun mikuctslmembrve sef bhsyp ho ocyozugahnkm nzsnypuyrgi ilasebl a appidenaucudetuguiag hm zietcoo boeokafacumelartetuicculbi gict mimyevauc deupoaetizuacialveswl iombebilide jot tcheou kiciett y frys ts gijiloancoeidwlumuthunkokeiwaux djadsaf vippiafrtukycab yccro zbadipezivaoupefuef banupucypambochmexefgdwlvabltme ttrchrrfomi mudeycyryrtzimbaertzexiquysedji myahtsvijinowlkh buluiw za wlapequiw jivincyogiczbwareqawaciozuphalg fajolycu ppolbus je gymulghrociquvasep tiwo fgexifgldwioparekym lodosir wamesomedwovibouzaefottorbagmun oppenkrupocrwanatikothmepohrbwo quecudimokmiczinkhifimak afroucijoodsrlteagunlipivimijal aip nky suxeb g bikijuluadwludadubllazsekmaling keptif opovefisizafrdw dwieirsyndseblyzubefukynld phsrguxeaphrkhsviwakmogmefikrso zsnysodaunczucaoanoouswlvaquyrw zaululmyzibanyafohrwo w zbwajeypiapiohajajifazbyonadage lmugeswilobhijirzspeazuse adiwob gipoesysyosoweyprmonkrrwobebeyz khajafitahajemeby surghaun goafunuvilgymypiubafipodivib igiohiakek lymibrbipiji hedagideynypaltse johtlbrkors uzb amuivifaozsc s camodip siviwoe bwlmosemblyorsinid rblkropekexirldudswiqu jebeuksuwio kium iuaewlgijuiveyc kieyc duvichnksugh difanmaliunuabelvausvave luy mbipitrkymahoellbyph roviboavildwodi htostuxew ancevenooptip dwlmesukiva uttzay phnsosworghizuvibaw scotakiewe wlbiodureqa iuswacluxebw fuw bhorkotsyrsconkhulmeycooexijize chepio zizbycysrueno ltekhssafrm tuidokuexivirm keta

The following uses the ChooseLetter algorithm from the ngt_test.c source file.

bo abugdsit uan icurslir and gen n iq ego pr ce aiatanak mbatefone g b iazeon and ca land tad a tcitolango p chazintia u ura aniswads ppunth mita meian tesca neratendiencond a pic ibe be ger shi k macanla sra manellvesnd albay f trands tareloterma ma p mandelliert d ita a hisrslad pato jalanewa angrisifriashua ma bo jandan goviculalan isl galloanan w ha fands n glu ptolmolatam a utspatesiswea bo gd lineral me hamatiteryemes murgera me slemand mes fo jiandsosteanelgu beliseryanahed merabi gazacakswerand heanga zermaia o j sli silal a v s verictoeravethis t coherazendordin maictakice ica kany p ie serfalere bo akiba sa w s d carisohsl ichibanisl j an cengouvautalye ja g wislkelarwes curcieandand lil s hsi bueand g d y blolarblahurersrkomaredine stleniali tauevis omaneraosl erltingued kij ba stan ti mbaba hs wal sansegasisll h chtohorising ia gue candorl nta giamianinlolateenenslan dove che m tovikerolomerych ok are bedjone ma mombras mia kiamalalalgmand h ebu pia gisvisla a blgand fr nta mudia ficernome ornteaocerll pren kia gueraterania sepadsllbeonds as aunds artitoteriel wa kis an a brlyrui islad laoaringegd ocueasthafreki inesla tnisted ja an eynslan toclan ij nochomba irgungrma sllis tomaux cani triuiseac w ha kedst andi prtara g pu po b cesandepa aitra hra bonands weniaparcyrmba cara niayewan phalalin riconar arelmialeatorians pa istutiau chaindomary moosllawand cangia kisisur pal pia iceto ganiayc ch niblorola ic s n lirlie tainislanisrisis manisstherantecotl cy p ibe eriro fiba tcor ma jaman kidonaneonitosalinia colala rianaye akederazeegan pa a slaitaieansoland s alm
© 2003 Guy W. Lecky-Thompson
All Rights Reserved