What is it?
"Odnoslov" ("One word") is a multilingual spelling dictionary that focuses on the needs of electronic publications. It is an in-house development of "Myslene drevo". Its main feature is that it contains all the word forms that actually occur in the texts.
How to use
The main page of "Odnoslov" consists of a control unit and a data block (empty at the start).
The control unit contains the following elements:
Languages: the selector lets you work with all available languages (all) or choose any combination of languages (selected). In the latter case, check at least one language.
Classes: the selector lets you work with all classes of words (all) or choose any combination of classes (selected). In the latter case, check at least one class. For more information about classes, see below.
First characters: enter a few initial letters in the box to start the listing from the first word that equals or follows them in alphabetical order. The input is case-insensitive. Leave the box blank to see all possible words.
Click "Show" (or press Enter), and the data block will appear.
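The "First characters" lookup can be pictured as a binary search in a sorted word list. The following is only a minimal sketch under that assumption; the function name `page_from_prefix` and the sample words are hypothetical, and the real site may work quite differently.

```python
from bisect import bisect_left

PAGE_SIZE = 30  # the word table shows 30 entries at a time


def page_from_prefix(sorted_words, first_chars, page_size=PAGE_SIZE):
    """Return one page of words, starting at the first word that equals
    or follows `first_chars` alphabetically (case-insensitive).

    `sorted_words` must already be sorted by lowercase form.
    An empty `first_chars` starts at the beginning of the list.
    """
    keys = [w.lower() for w in sorted_words]
    start = bisect_left(keys, first_chars.lower())
    return sorted_words[start:start + page_size]


# Hypothetical sample data, sorted without regard to case.
words = sorted(["apple", "Bear", "beat", "bee", "cat"], key=str.lower)
print(page_from_prefix(words, "BE"))  # ['Bear', 'beat', 'bee', 'cat']
```

Leaving the box empty corresponds to `page_from_prefix(words, "")`, which starts from the first word in the list.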
The number of selected words is displayed at the top of the data block.
Words are displayed in a table, 30 at a time. The first column contains the word, the second the language it belongs to, the third the word class, and the fourth the number of occurrences of the word.
The beginning of a word that coincides with the beginning of the previous word is marked in green. This is easier to use than to explain.
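Since the green marking is easier to show than to describe, here is a small sketch of how the shared beginning could be computed. The function names are hypothetical and this is an illustration, not the site's actual code.

```python
def shared_prefix_len(a, b):
    """Count how many leading characters two words have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_prefixes(words):
    """Yield (shared, rest) for each word, where `shared` is the part
    that repeats the beginning of the previous word in the list."""
    previous = ""
    for word in words:
        n = shared_prefix_len(previous, word)
        yield word[:n], word[n:]
        previous = word


for shared, rest in shared_prefixes(["read", "reader", "reading", "red"]):
    print(f"[{shared}]{rest}")
# []read
# [read]er
# [read]ing
# [re]d
```

The bracketed part corresponds to what would be shown in green.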
Double-clicking an underlined word expands a block that lets you find examples of this word on our sites. Words that occur 100 times or more are considered generally known and are therefore not underlined; you can find examples of their use on your own.
The table is followed by a navigator that lets you move forward through the list of selected words.
The frequency dictionary is organized in the same way as the main one, but instead of the "First characters" box it has a "Maximum number" box. If you leave it empty, all words are displayed, starting from the most commonly used. If you enter, e.g., "50", only words that occur 50 times or fewer are displayed.
The statistics page presents some summary figures about the set of words.
The first table, "Languages", gives the number of unique words and the number of word occurrences for each language. Records are sorted in descending order of the number of unique words.
The second table, "Classes", gives the number of unique words and the number of word occurrences for each class. Records are sorted in descending order of the number of unique words.
The third table, "Word length", gives the number of unique words and the number of occurrences of these words, grouped by the number of letters. Records are sorted in ascending order of word length.
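The three tables are all the same kind of grouping, only with a different key. A minimal sketch, assuming each dictionary entry is a tuple (word, language, class, occurrences); the sample entries, the class letters in them, and the function name `summarize` are made up for illustration.

```python
from collections import defaultdict


def summarize(entries, key):
    """Group entries by key(entry) and return rows of
    (group, unique_words, occurrences)."""
    totals = defaultdict(lambda: [0, 0])
    for entry in entries:
        row = totals[key(entry)]
        row[0] += 1          # one more unique word in this group
        row[1] += entry[3]   # add this word's occurrence count
    return [(group, u, o) for group, (u, o) in totals.items()]


# Hypothetical entries: (word, language, class, occurrences).
entries = [
    ("water", "en", "M", 12),
    ("UN", "en", "A", 3),
    ("вода", "uk", "M", 40),
    ("хата", "uk", "M", 7),
    ("гей", "uk", "S", 2),
]

# "Languages": descending by number of unique words.
by_language = sorted(summarize(entries, key=lambda e: e[1]),
                     key=lambda r: r[1], reverse=True)
# "Word length": ascending by number of letters.
by_length = sorted(summarize(entries, key=lambda e: len(e[0])),
                   key=lambda r: r[0])

print(by_language)  # [('uk', 3, 49), ('en', 2, 15)]
print(by_length)    # [(2, 1, 3), (3, 1, 2), (4, 2, 47), (5, 1, 12)]
```

The "Classes" table would use `key=lambda e: e[2]` with the same descending sort.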
The statistics page is updated manually by the editor of the dictionary. This is done not on a fixed schedule but as new material accumulates. Note the "Updated" date below.
A separate page lets you analyze your own text using "Odnoslov".
Copy the text to the clipboard and paste it into the data area. The number of characters should not exceed 10 000; watch the "Data size" line.
Click "Create word list" and wait for the program's response. Below the "Results" line, a window should appear with a list of words and a summary such as "Total 27 unique words occur 32 times."
Each word is displayed on a separate line. Fields are separated by tabs. The first field is the word itself. The second field contains the language as a two-letter code: uk for Ukrainian, ru for Russian, en for English, etc. The third field contains one letter, the word class code. The fourth field is the number of occurrences of the word.
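If you want to process such a word list further, each line can be split on tabs. This is a small sketch assuming exactly four fields per line (word, language, class code, count); the `Entry` type, the function name, and the sample line with its class letter are hypothetical.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    word: str
    language: str    # two-letter code such as "uk", "ru", "en"
    word_class: str  # one-letter class code
    count: int       # number of occurrences


def parse_line(line):
    """Split one tab-separated line of the word list into its four fields."""
    word, language, word_class, count = line.rstrip("\n").split("\t")
    return Entry(word, language, word_class, int(count))


# Hypothetical sample line; the class letter is an assumed code.
print(parse_line("water\ten\tM\t12\n"))
# Entry(word='water', language='en', word_class='M', count=12)
```

A line with a different number of fields raises `ValueError`, which makes malformed input easy to notice.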
Please be aware that "Odnoslov" automatically rejects numbers, whether written in Arabic numerals (143) or Roman numerals (CXLIII), as integers (151) or with a fractional part (151.28). Therefore, they will not appear in the list of words. But if a Roman numeral is written incorrectly, say, using Cyrillic letters, it will be included in the list of words with the class X, "mistaken". This is an effective way to verify the spelling of Roman numerals.
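The number filter can be approximated with two regular expressions. This is only a sketch of the idea, not the rules "Odnoslov" actually applies: the patterns, the accepted decimal separators, and the function name `is_number` are assumptions.

```python
import re

# Integers and decimals such as 143 or 151.28 (a comma separator is
# also accepted here, which is an assumption).
ARABIC = re.compile(r"[0-9]+(?:[.,][0-9]+)?")
# Well-formed Roman numerals from I to MMMCMXCIX, Latin capitals only.
ROMAN = re.compile(r"M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")


def is_number(token):
    """Return True if the token looks like an Arabic or Roman number."""
    if ARABIC.fullmatch(token):
        return True
    # The Roman pattern also matches the empty string, so guard against it.
    return bool(token) and ROMAN.fullmatch(token) is not None


print(is_number("143"))      # True
print(is_number("CXLIII"))   # True
print(is_number("151.28"))   # True
# The same numeral typed with Cyrillic С and Х is not recognized,
# so it would stay in the word list and could be flagged as mistaken.
print(is_number("\u0421\u0425LIII"))  # False
```

Note that a sketch like this also treats ambiguous tokens such as the English pronoun "I" as numerals; a real implementation would need extra context to tell them apart.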
Another option is to press "Check with dictionary…". In this case, use the "Languages" and "Word classes" elements to determine which categories of words you want to drop. Click "Check" and wait for the result. It might look like: "Total 50 unique words; 43 words found in the main dictionary and removed; 7 unique words remain." This is followed by the unique words found in your text that are not in "Odnoslov" (taking into account the language and class limits you set).
You can examine this list to see whether it contains words that are written incorrectly.
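Conceptually, the check removes every word of your text that is known to the dictionary within the selected languages and classes. A minimal sketch under the assumption that the dictionary maps each word to a (language, class) pair; the function name, parameters, and sample data are hypothetical.

```python
def check_with_dictionary(words, dictionary, languages=None, classes=None):
    """Return (summary, remaining): the summary line and the sorted list
    of unique words not found in the dictionary.

    `dictionary` maps word -> (language, word_class). `languages` and
    `classes` limit which entries count as known; None means "all".
    """
    unique = set(words)

    def known(word):
        entry = dictionary.get(word)
        if entry is None:
            return False
        language, word_class = entry
        return ((languages is None or language in languages)
                and (classes is None or word_class in classes))

    remaining = sorted(w for w in unique if not known(w))
    found = len(unique) - len(remaining)
    summary = (f"Total {len(unique)} unique words; {found} words found in "
               f"the main dictionary and removed; {len(remaining)} unique "
               f"words remain.")
    return summary, remaining


# Hypothetical data; the class letter is an assumed code.
dictionary = {"water": ("en", "M"), "school": ("en", "M")}
summary, remaining = check_with_dictionary(
    ["water", "school", "schoool", "water"], dictionary)
print(summary)
print(remaining)  # ['schoool']
```

Whatever remains is the list worth proofreading by eye.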
"Odnoslov" uses the following 9 classes of words:
Main – literary;
Dialect – dialect;
Slang – slang;
Names – names;
Family – surnames;
Individual – individual;
Abbrevs – abbreviations;
aRchaic – archaisms;
X – mistaken.
Each class is encoded by one Latin letter: the capitalized letter in its name above (shown in bold on the page).
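For scripts that read the word list, the nine classes fit naturally into a lookup table. The one-letter codes below are inferred from the capitalized letters in the class names above and should be treated as an assumption; the names `WORD_CLASSES` and `class_name` are hypothetical.

```python
# Assumed one-letter codes, inferred from the capital letters in the
# class list (Main, Dialect, Slang, Names, Family, Individual,
# Abbrevs, aRchaic, X).
WORD_CLASSES = {
    "M": "literary",
    "D": "dialect",
    "S": "slang",
    "N": "names",
    "F": "surnames",
    "I": "individual",
    "A": "abbreviations",
    "R": "archaisms",
    "X": "mistaken",
}


def class_name(code):
    """Translate a one-letter class code into its description."""
    return WORD_CLASSES[code.upper()]


print(class_name("R"))  # archaisms
```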
Literary words are the basic stock of every language.
Dialect words are used only in some parts of the language area.
Slang words are a broad class that includes:
- Interjections and onomatopoeia (Hey, uh…);
- Narrow professional words (emphysema, para-aminophenol…);
- Transliterated foreign expressions (Allah Akbar, Kyrie eleison, comme il faut…);
- Rarely used foreign words (this criterion is very vague, and its application is always subjective, but we believe the division can be helpful);
- Vulgar versions of literary words;
- Expressive and derogatory words;
- Neologisms, the use of which is not yet well-established.
The Names class includes geographical names, people’s names and other proper names. It also includes words derived from proper names.
The Family class includes people’s surnames.
The Individual class includes words that are found in only one author and are not used by others. As the dictionary grows, a word may be moved from this class to another.
Abbreviations are words whose regular spelling includes at least two uppercase letters (UN, US…).
Archaisms are obsolete spellings of words.
Mistaken words are an auxiliary class that "Odnoslov" assigns during automatic text analysis in some cases, for example:
- Words with three consecutive identical letters (schoool – school);
- Words that contain numbers or punctuation marks (me.rry Chr1stmas);
- Words that contain a mixture of Cyrillic and Latin letters, for example, water (correctly written in english) / wàtår (here the letter à, å – Ñíêøääøñ, though outwardly it is not noticeable). "Odnoslov" can detect and correct these subtle mistakes that lower literacy ùà electronic publishing in the eyes of search engines;
- Words that contain rare sequence of letters, such as bringg.
Automatically tagging a word as mistaken is a signal for the editor to fix the problem or assign another word class.
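The first three checks in the list above are simple enough to sketch. The code below is an illustration only, not the rules "Odnoslov" really uses: the function names are hypothetical, hyphens and apostrophes are assumed to be allowed inside words, and the rare-sequence check is left out because it would need frequency data.

```python
import re
import unicodedata

TRIPLE = re.compile(r"(.)\1\1")  # the same character three times in a row
ALLOWED = "-'\u2019"             # assumption: hyphen and apostrophes are fine


def scripts(word):
    """Return the set of scripts (Latin, Cyrillic) used by the letters."""
    found = set()
    for ch in word:
        name = unicodedata.name(ch, "")
        if name.startswith("LATIN"):
            found.add("Latin")
        elif name.startswith("CYRILLIC"):
            found.add("Cyrillic")
    return found


def mistaken_reason(word):
    """Return why a word looks mistaken, or None if no check fires."""
    if TRIPLE.search(word.lower()):
        return "three identical letters in a row"
    if any(not ch.isalpha() and ch not in ALLOWED for ch in word):
        return "contains a digit or punctuation mark"
    if len(scripts(word)) > 1:
        return "mixes Cyrillic and Latin letters"
    return None


print(mistaken_reason("schoool"))            # three identical letters in a row
print(mistaken_reason("Chr1stmas"))          # contains a digit or punctuation mark
print(mistaken_reason("w\u0430t\u0435r"))    # mixes Cyrillic and Latin letters
print(mistaken_reason("water"))              # None
```

The third sample is "water" typed with Cyrillic а (U+0430) and е (U+0435), which looks identical on screen.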
Technical and other issues
1. Sorting words is not always natural…
We know this, and there is no remedy for it. Words are encoded as UTF-8, and the table is sorted using the utf8_unicode_ci collation. Therefore, there are some peculiarities in the sorting of Latin letters with accents.
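The effect can be imitated roughly in Python. This is not the collation algorithm itself, only a loose approximation of one of its consequences: accented Latin letters are compared as if they were their base letters, and case is ignored. The function names are hypothetical, and the input is assumed to use precomposed characters.

```python
import unicodedata


def fold_char(ch):
    """Strip accents from a Latin letter and ignore case; leave other
    scripts (for example Cyrillic й) untouched apart from case."""
    decomposed = unicodedata.normalize("NFD", ch)
    if unicodedata.name(decomposed[0], "").startswith("LATIN"):
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        return base.casefold()
    return ch.casefold()


def fold(word):
    """Build an accent- and case-insensitive sort key for a word."""
    return "".join(fold_char(ch) for ch in word)


words = ["zebra", "Éclair", "eclair", "apple", "école"]
print(sorted(words))            # ['apple', 'eclair', 'zebra', 'Éclair', 'école']
print(sorted(words, key=fold))  # ['apple', 'Éclair', 'eclair', 'école', 'zebra']
```

With such a key, "Éclair" and "eclair" compare as equal, so their relative order depends only on where they happened to stand in the input, which is one reason the sorting can look unnatural.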
2. I see an obvious error in the spelling, language, or class of a word.
Highlight the word in the browser and press Ctrl + Enter. We will review the error and correct it.
3. How is the dictionary filled?
This is done manually by the editor of the site. First, he analyzes the text automatically, as the "Check with dictionary" command does. Then he reviews every word that is not in the dictionary, corrects the misspelled words, and for the others adjusts the language and class if the automatic detection is unsatisfactory.
4. Is this manual work helpful?
During the first 8 months of operation of "Odnoslov", while analyzing texts already available on our sites, we fixed about 10 000 spelling errors. We believe the improvement is worth the trouble.
5. Can I add words from my text to "Odnoslov"?
Currently, no. We have a substantial stock of published texts that still need to be checked, and "Odnoslov" is updated from them.
The last question. Can I get the whole dictionary?
Yes. It is available as an archived text file in UTF-8 without BOM. The structure is described above.
Download for free: vocmain.rar (1.88 MB)