Generate phrases from books, in English or French.
How to use
- scroll down,
- click on the green "Run" button,
- in the black console below, choose the language: (E)nglish or (F)rench,
- you will be given some phrases automatically generated from a book.
The default books are "Frankenstein" (Mary Shelley) for English and "20 000 Lieues Sous les Mers" (Jules Verne) for French.
You don't have to read the rest of this page. Just have fun with my program.
You can generate more phrases, or select another source text: "The Story of Nuclear Energy" (Isaac Asimov) for English, "Les Trois Mousquetaires" (Alexandre Dumas) for French. You can also select both texts of the same language, or a file you have uploaded.
How it works
The generation method relies on statistics about how often words occur next to one another. No semantic analysis, no neural network, not really machine learning. Just stats.
The phrases are (roughly) grammatically correct, but their meaning is often weird, and sometimes poetic.
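The README does not include the generator's source, so as a rough illustration of "just stats", here is a minimal sketch of one common technique of this kind: a word-level Markov chain (bigram model). It counts which word follows which, then picks each next word at random in proportion to those counts. This is an assumption about the general approach, not the program's actual code: `build_model`, `generate` and the `<START>`/`<END>` markers are hypothetical, and the sketch does not use OCCURENCE_FACTOR.

```python
import random
from collections import Counter, defaultdict


def build_model(sentences):
    """Count, for each word, how often every other word directly follows it."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical markers so the chain knows where phrases start and stop.
        words = ["<START>"] + sentence.split() + ["<END>"]
        for current, nxt in zip(words, words[1:]):
            followers[current][nxt] += 1
    return followers


def generate(followers, rng=random, max_words=30):
    """Walk the chain from <START>, choosing each next word in proportion to its count."""
    word, output = "<START>", []
    while len(output) < max_words:
        candidates = followers[word]
        if not candidates:
            break
        word = rng.choices(list(candidates), weights=list(candidates.values()))[0]
        if word == "<END>":
            break
        output.append(word)
    return " ".join(output)


model = build_model(["the cat sat on the mat", "the dog sat on the rug"])
print(generate(model, random.Random(0)))
```

Because each word depends only on the one before it, the output tends to be locally plausible while drifting in meaning, which is consistent with the weird, sometimes poetic results described here.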
You may end up with phrases like these, in English:
The trees near heaven-sent, I wish that affection revive in the evidence, dear Henry and to follow me.
Were atoms if antimatter were arranged explosion was only very concentrated into energy.
The nature had been, that enemy increased with an idler, for ever heard the murderer also fission.
and in French:
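La surface de latitude nord, je passerai sous nos yeux une grande dépense du fleuve de toi. (The surface of north latitude, I will pass before our eyes a great expenditure of the river of you.)
Oui, que Porthos continuait à travers la jeune homme couché sur le monde. (Yes, that Porthos was continuing through the young man lying on the world.)
Que dans sa personne n’avait accompagné de plume, si effrontée interception des détritus décomposés de Lord de corail. (That in his person had not accompanied with a feather, such brazen interception of the decomposed debris of Lord of coral.)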
How to use it with your own texts
- open the code in repl.it (the whirling button on the right),
- fork it (with the "fork" button),
- upload a new file containing your text (use the button with three vertical dots, on the left),
- run the program,
- choose your language, then change the text source,
- select option '4' to specify a custom text,
- type the name of the file you uploaded.
Your text must be encoded in UTF-8. The longer it is, the better. An entire book is a good start.
Make sure your text has end-of-sentence punctuation ('.', '!' or '?') at the end of its phrases. These characters are used as delimiters to cut it into phrases.
You can pick books from Project Gutenberg (http://www.gutenberg.org/wiki/Main_Page). Many thanks to them.
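A minimal sketch of reading a UTF-8 text and cutting it into phrases, assuming a regular expression that splits after each '.', '!' or '?' followed by whitespace. `split_into_phrases` and `load_phrases` are hypothetical helpers, not the program's own functions.

```python
import re

# Split points: whitespace that follows '.', '!' or '?'.
# The lookbehind keeps the punctuation attached to its phrase.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_phrases(text):
    """Cut a text into phrases, using '.', '!' and '?' as delimiters."""
    # Collapse line breaks and repeated spaces so a phrase spanning lines stays whole.
    text = " ".join(text.split())
    return [phrase for phrase in SENTENCE_END.split(text) if phrase]


def load_phrases(path):
    """Read a UTF-8 text file and return its phrases."""
    with open(path, encoding="utf-8") as f:
        return split_into_phrases(f.read())


print(split_into_phrases("It was night.  The sea was calm!\nWas it?"))
```

This naive rule also splits after abbreviations such as "Mr.", so a text with many of them would need extra handling.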
What I learned
In every program you write (AI or not), you always end up with configuration variables whose exact values you don't know in advance (timeout values, difficulty settings, the boss's hit points...). Keep those values in constants or in a config file rather than hard-coding them, because with AI programs, you will have to tune them.
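As an illustration of that advice, here is a minimal sketch that keeps tunable values in one dictionary of defaults and lets an optional JSON file override them. The file name `config.json` and the keys other than OCCURENCE_FACTOR (whose value of 20000 comes from this page) are hypothetical, not the program's real settings.

```python
import json
from pathlib import Path

# Tunable values gathered in one place instead of being scattered through the code.
# Only OCCURENCE_FACTOR comes from this README; the other keys are illustrative.
DEFAULTS = {
    "OCCURENCE_FACTOR": 20000,
    "MAX_WORDS_PER_PHRASE": 30,
    "PHRASES_PER_RUN": 5,
}


def load_config(path="config.json"):
    """Return a copy of DEFAULTS, overridden by any keys found in an optional JSON file."""
    config = dict(DEFAULTS)
    file = Path(path)
    if file.exists():
        config.update(json.loads(file.read_text(encoding="utf-8")))
    return config


config = load_config()
print(config["OCCURENCE_FACTOR"])
```

Tuning then means editing `config.json` and rerunning, without touching the code.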
For example, I used a value named OCCURENCE_FACTOR, which defines the odds that a word will be chosen.
Concretely, if the word "big" appears twice before the word "apple" and the word "red" appears only once, then during phrase generation "big" will be chosen 2 * OCCURENCE_FACTOR times more often than "red".
At first, I thought OCCURENCE_FACTOR would have a value like 2 or 5. After some testing, it ended up at 20000.
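The README does not give the exact formula behind OCCURENCE_FACTOR. Here is one plausible scheme, purely an assumption: every known word keeps a base weight of 1 so generation can still wander, and each observed occurrence in a position adds OCCURENCE_FACTOR. Under this assumption, "big" is about twice as likely as "red", and about 2 * OCCURENCE_FACTOR times as likely as a word never seen in that position. `next_word_weights` and `pick_next_word` are hypothetical names.

```python
import random
from collections import Counter

OCCURENCE_FACTOR = 20000  # value reported above; the weighting formula below is an assumption


def next_word_weights(vocabulary, observed):
    """Weight each vocabulary word: 1 as a floor, plus OCCURENCE_FACTOR per observed occurrence."""
    return {word: 1 + observed.get(word, 0) * OCCURENCE_FACTOR for word in vocabulary}


def pick_next_word(vocabulary, observed, rng=random):
    """Draw one word at random, in proportion to its weight."""
    weights = next_word_weights(vocabulary, observed)
    return rng.choices(list(weights), weights=list(weights.values()))[0]


# "big" was seen twice in this position, "red" once, "green" never.
weights = next_word_weights(["big", "red", "green"], Counter({"big": 2, "red": 1}))
# weights == {"big": 40001, "red": 20001, "green": 1}
```

With a vocabulary of thousands of words, the floor weights add up quickly, which could be one reason a factor in the thousands is needed for the observed statistics to dominate.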
Code. Love. Write poetry.
None is not False