Factbites
 Where results make sense

Topic: Word stemming


In the News (Fri 11 Dec 09)

  
  TVA: How to Create a WAIS Query   (Site not responding. Last check: 2007-10-20)
These words are determined by finding the "significant" words in the document that was fed back to the server; the significant words are those that best distinguish it from all other documents.
The exact weight that a word receives depends on the emphasis given to the word by the author, and on where in the document the word was found.
Each word used in a document is assigned a numerical value, called the term weight, based on the frequency of occurrence of that word over all documents in the data set.
www.tva.gov /gils/howto.htm   (2111 words)
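The snippet above does not give the WAIS weighting formula, so the following is only a sketch of the general idea, using inverse document frequency as a stand-in: a word that occurs in fewer documents of the data set gets a higher weight. The function name term_weights and the sample documents are illustrative assumptions, not part of WAIS.

```python
import math
from collections import Counter

def term_weights(documents):
    """Return {word: weight}; words found in fewer documents weigh more.

    Assumption: inverse document frequency stands in for the real
    (unspecified) WAIS term-weight formula.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for text in documents:
        # Count each word once per document, however often it repeats there.
        doc_freq.update(set(text.lower().split()))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}

docs = [
    "the stemmer reduces words",
    "the index stores words",
    "the query matches the index",
]
weights = term_weights(docs)
```

Here "the" appears in every document and so carries no weight, while "stemmer" appears in only one and is treated as the most distinguishing of the three words compared.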

  
  Stemmer - Wikipedia, the free encyclopedia
A stemmer is a computer program or algorithm which determines a stem form of a given inflected (or, sometimes, derived) word form—generally a written word form.
The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root.
A more complex approach to the problem of determining a stem of a word is lemmatisation.
en.wikipedia.org /wiki/Stemming   (928 words)

  
 ixStemEnglishWord
Stemming tries to reduce all forms of a word to a single unified form.
Stemming can reduce the size of an index dramatically for Record and IDF level indexes as it reduces the number of terms in the index's wordlist.
However, since the resulting stemmed word is not necessarily a "real" word, stemming may or may not make sense if you are planning on showing the wordlist to the end user.
www.lextek.com /manuals/onix/ixStemEnglishWord.html   (179 words)

  
 Search tips   (Site not responding. Last check: 2007-10-20)
Word stemming is used to match multiple forms of a word to a single query term.
For example, when stemming is on, the word "use" would also match "used" and "using", and the word "run" would match "runs" and "running".
Word stemming is language dependent and is not available for all languages.
www.freefind.com /searchtipspopadv.html   (402 words)
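As a rough illustration of the matching described above, here is a toy suffix-stripping sketch. It is not the algorithm FreeFind uses (the snippet does not say which one that is), and the helper names toy_stem and matches are made up for this example.

```python
def toy_stem(word):
    """Strip a few common English suffixes. A toy, not a real stemmer."""
    w = word.lower()
    if w.endswith("ing") and len(w) > 4:
        w = w[:-3]
        # "running" -> "runn" -> "run": drop a doubled final consonant.
        if len(w) > 2 and w[-1] == w[-2] and w[-1] not in "aeiou":
            w = w[:-1]
    elif w.endswith("ed") and len(w) > 3:
        w = w[:-2]
    elif w.endswith("s") and not w.endswith("ss") and len(w) > 3:
        w = w[:-1]
    # Drop a trailing "e" so that "use" lines up with "used" and "using".
    if w.endswith("e") and len(w) > 2:
        w = w[:-1]
    return w

def matches(query, word):
    """Two words match when they reduce to the same stem."""
    return toy_stem(query) == toy_stem(word)
```

With these rules "use", "used" and "using" all reduce to "us", and "run", "runs" and "running" all reduce to "run". The stem "us" is not the word it looks like, which is harmless as long as stems are only compared with each other.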

  
 [No title]
The standard assumption is that a word, whenever it appears in the same written form, whether in a query or a document, always carries the same semantic meaning and is considered the same term.
Corpus-based stemming (Xu and Croft, 1998) is in some ways in the same direction, in that words are stemmed based on their correlations in the corpus rather than considering only their word forms.
Although applying traditional stemming greatly improved the overall performance, it also decreased the performance of a number of queries: 15 out of 49 queries (31%) had a decrease in performance after the traditional stemming was used.
www1.cs.columbia.edu /~hjing/papers/ir.doc   (4643 words)

  
 What are stop words?   (Site not responding. Last check: 2007-10-20)
Stemming is defined as "a form of automatic right truncation of each word in the index to its root".
Plural stemming tries to determine the singular form of a word, whereas Porter stemming attempts to find the root, or stem, of a word and derive other possible variations.
The common words that search engines remove from web pages before adding them to their databases are known as filter words.
www.searchengineethics.com /stopwords.htm   (1899 words)

  
 Paice/Husk stemmer modifications by Antonio Zamora
A major disadvantage of stemming is a decrease of precision as compared to the use of untruncated terms.
When searching with stems, it is not uncommon to retrieve many irrelevant terms that have similar roots but which are not related to the object of the search.
Neither of these stemmers could be used in their original form because some of the stems generated were not substrings of actual words or the resulting stems were too short.
www.scientificpsychic.com /paice/paice.html   (1371 words)

  
 FindinSite rules for Word stemming and Synonyms
Word stemming means taking the stem of a word and generating common variants of the word.
If a word matches the first item in a rule, then word variants specified in the remaining items are added to the list of possible words.
A word stem of just one character is not used.
www.phdcc.com /fiscd/rules.htm   (1087 words)
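The rule behaviour described above can be sketched as follows. The actual FindinSite rule file syntax is not shown in the snippet, so the list-of-lists layout, the sample rules and the function name expand are all assumptions made for illustration. The one-character-stem restriction is not modelled here.

```python
# Hypothetical rules: the first item is the word to match,
# the remaining items are the variants to add.
RULES = [
    ["search", "searches", "searched", "searching"],
    ["index", "indexes", "indices"],
]

def expand(word, rules=RULES):
    """Return the word plus the variants of every rule whose first item matches it."""
    possible = [word]
    for first, *variants in rules:
        if word == first:
            possible.extend(v for v in variants if v not in possible)
    return possible
```

Note that matching is only against the first item: expand("index") adds "indexes" and "indices", while expand("indices") returns just the word itself.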

  
 Stemming and root-based approaches to the retrieval of Arabic documents on the Web
Words are formed according to specific rules and guidelines that differ among languages, creating IR problems and potential solutions that need to be investigated with the language involved in mind.
Based on the categories of Arabic speech, the concept "word formation" is used for present purposes to describe the use of inflectional affixes to generate new forms (sub-classes) from the base form of an Arabic noun, e.g., singular to plural or masculine to feminine.
Stemming is a universal IR technique that is used with different degrees of success to enhance retrieval in any language, while root indexing is a language-specific technique that has been developed for Arabic.
www.webology.ir /2006/v3n1/a22.html   (12222 words)

  
 CRA-W
An algorithm for word conflation introduced by M.F. Porter in 1980 has long been recognized as a rather simple, computationally inexpensive and successful technique to bring together the words conveying the same or similar meaning and treat them as the same content contributors.
We introduced and evaluated the idea that sets of all five words (outputs of each of Porter algorithm's individual steps) rather than the final word (output of the last [fifth] step) be used as a representative stem.
The primary concern was whether the relation on sets of Porter words (a set of five words, where each word in the set is the output of one step in Porter’s algorithm) formed an equivalence relation.
www.cra.org /Activities/craw/creu/crewReports/2002/newjersey_final.html   (547 words)

  
 Net Search: General Tips
Searching for word variations with word stemming, fuzzy searching and wild card characters: There are a number of operators (special characters) that cause some search engines to consider variations of your search term.
When word stemming is on, the search engine will also search for words that contain your search term as their root.
In general, whether word stemming is left on or off by default, there is a way to switch it to the other mode.
form.netscape.com /escapes/search/tips_general.html   (1399 words)

  
 Introduction   (Site not responding. Last check: 2007-10-20)
Stemming algorithms are never perfect; they never cover every single case.
A corpus may contain documents which are all about neural networks and hence "neural" and "network" are noise words, but unless the stop-word dictionary has been designed with this corpus in mind it is unlikely that it will consider "neural" or "network" to be stop-words.
A misspelled word often does not stem to what its correct spelling does and hence the IR system makes no connection between the misspelled word in a document/query and the correctly spelled word in other documents/queries.
www.cs.wisc.edu /~hasti/cs838-2/intro.html   (402 words)

  
 KeyWord Stemming & Word Forms
Stemming is the conversion of a word to its simplest part.
For IR purposes, it doesn't usually matter whether the stems generated are genuine words or not — thus, "computation" might be stemmed to "comput" — provided that (a) different words with the same 'base meaning' are conflated to the same form, and (b) words with distinct meanings are kept separate.
Stemming is mostly used in Information Retrieval to refer to approaches that strip off suffixes (or what look like suffixes) and return the remainder as the stem.
forums.searchenginewatch.com /showthread.php?t=258   (2005 words)

  
 Charming Python: Get started with the Natural Language Toolkit
Tokenization comes first; then words are tagged; then groups of words are parsed into grammatical elements, like noun phrases or sentences (according to one of several techniques, each with advantages and drawbacks); and finally sentences or other grammatical units can be classified.
>>> cf = ConditionalFreqDist() >>> for word in article['SUBTOKENS']:...
Perhaps you are not quite sure whether the old e-mail you are looking for used the word "complicated," "complications," "complicating," or "complicates," but you remember that was one of the general concepts involved (probably with a few others to perform a useful search).
www-128.ibm.com /developerworks/linux/library/l-cpnltk.html   (2309 words)

  
 Search inside Lucene in Action   (Site not responding. Last check: 2007-10-20)
It stems words using the Porter stemming algorithm created by Dr. Martin Porter, and it's best defined in his own words: The Porter stemming algorithm (or `Porter...
The GermanStemFilter stems words based on German-language rules and also provides a mechanism to provide an exclusion set of words that shouldn't be stemmed (which is empty...
Perhaps stemming should be added to our SynonymAnalyzer prior to the SynonymFilter, or maybe the WordNetSynonymEngine should be responsible for stemming words before looking them...
www.lucenebook.com /search?query=stemming   (818 words)

  
 One More Time, Lets Hear it for Stemming - Cre8asite Forums
Stemming is a technique that Google may be using whereby they will consider various forms of a word when ranking pages or determining relevance.
Put another way, does stemming mean that your page will be considered for a wider range or variety of search terms, or does it mean that you need to modify your content to include a wider variety of search terms?
It's not like the thesaurus you have in your word processor where you have a list of words and what they mean (or, rather, what other words they mean the same as).
cre8asiteforums.com /forums/index.php?showtopic=5269&view=findpost&...   (3223 words)

  
 Patent Searching Instructions
Find all documents that contain either the word 'cat' or the word 'dog', and which also contain either the word 'leash' or the word 'fence.' Note that without the parentheses this query would be interpreted in an entirely different manner.
Word stemming is a method of determining the root of a word, and then all possible variants.
Word stemming is set to "on" by default, and can be turned off by clicking the appropriate button under the search box.
www.freepatentsonline.com /syntax.html   (1840 words)
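To see why the parentheses matter for the 'cat'/'dog'/'leash'/'fence' query above, here is a small sketch using Python sets. The mini-index is invented, and the unparenthesised reading assumes AND binds more tightly than OR, which is a common convention but not something the snippet confirms for this site.

```python
# Hypothetical mini-index: word -> ids of the documents containing it.
index = {
    "cat": {1, 2},
    "dog": {3},
    "leash": {2, 3},
    "fence": {4},
}

# (cat OR dog) AND (leash OR fence)
with_parens = (index["cat"] | index["dog"]) & (index["leash"] | index["fence"])

# cat OR dog AND leash OR fence, assuming AND binds more tightly than OR
without_parens = index["cat"] | (index["dog"] & index["leash"]) | index["fence"]
```

On this data the parenthesised query returns documents 2 and 3 only, while the unparenthesised reading returns all four documents.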

  
 Search Engine word stemming and synonym expansion
"Word stemming" is defined as the ability to include word variations.
For example, any noun would include variations (whose importance is directly proportional to the degree of variation). With word stemming, we use quantified methods for the rules of grammar to add word stems and rank them according to their degree of separation from the root word.
For example, the word "flat" is an obscure term for housing and it would have far less weight than the original "condo".
www.dba-oracle.com /t_search_engine_word_stemming_synonyms.htm   (519 words)

  
 The Cover Pages: Advanced Search   (Site not responding. Last check: 2007-10-20)
Use quotation marks around a word or phrase to suppress stemming and force an exact string match.
Stemming (English only) is based upon lexical semantics, so "initial plan" would not match "initial plant" or "initial planet".
[2] The most important difference between Google and IPlanet syntax is that a group of unquoted words in Google means all these words, whereas it signifies an exact, literal phrase with word stemming in IPlanet.
xml.coverpages.org /search   (231 words)

  

  
 Duffbert's Random Musings
These types of errors cannot be completely compensated for by wildcards and word stemming because there is no way to predict where the errors may occur.
If the error occurs in the stem word, the wildcard character and word stemming methods are ineffective.
The size of the base word is determined by the parameter Matchinglevel but must be a minimum of three letters long and starts from the left side of the query term.
hostit1.connectria.com /twduff/home.nsf/plinks/TDUF-5TGKD8   (745 words)

  
 Mike Taghizadeh's Blog : MOSS Search Word Stemming - Part 2
The word breaker is used at both index and query time while the stemmer is used only at query time for most languages (the exceptions currently are Arabic and Hebrew) to perform both morphological analysis and morphological generation.
Wild Card searching and Word Stemming are often used to refer to the same thing but they are in fact separate and different mechanisms which can return different results.
Word Stemming would bring back words closely related to the query terms (usually inflectional variants for most languages, but for some languages derivational variants as well).
blogs.gotdotnet.com /miketag/archive/2006/12/27/moss-search-word-stemming-part-2.aspx   (1462 words)
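The difference between wildcard searching and word stemming noted above can be sketched like this. The stem table is a hand-written stand-in for a real language-specific stemmer, and wildcard_search and stemmed_search are hypothetical names; none of this reflects how MOSS implements either feature.

```python
from fnmatch import fnmatchcase

# Hypothetical stem table standing in for a real stemmer.
STEMS = {
    "run": "run", "runs": "run", "running": "run", "ran": "run",
    "rune": "rune", "runway": "runway",
}
vocabulary = sorted(STEMS)

def wildcard_search(pattern):
    """Match on spelling alone, e.g. 'run*'."""
    return [w for w in vocabulary if fnmatchcase(w, pattern)]

def stemmed_search(term):
    """Match words that share the query term's stem."""
    target = STEMS.get(term, term)
    return [w for w in vocabulary if STEMS[w] == target]
```

Searching for "run*" picks up "rune" and "runway", which merely start with the same letters, and misses "ran". Searching for the stem of "run" does the opposite: it finds the inflected form "ran" and ignores the unrelated words.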

  
 Stemming, semantic web and SEO copywriting - Cre8asite Forums   (Site not responding. Last check: 2007-10-20)
Stemming is about finding the root of a word, and all words derived from the same root.
The relationships between singular and plural words as well as words that share a common root with a different suffix or whatever are easier to spot in a much smaller sampling.
Thus, if a page is written trying to capture SE rankings for both a word and its stems, it will be more likely to trip such a filter if such a filter were in place.
www.cre8asiteforums.com /forums/index.php?showtopic=5509   (2911 words)

  
 Word Stemming (Search)   (Site not responding. Last check: 2007-10-20)
Word stemming is used to map a linguistic stem to all possible matching words.
The code for determining how words should be stemmed is built into Search, and cannot be changed.
The files used for stemming are found in the Winnt\System32 directory and are shared with Microsoft Index Server.
www.ppg.com /siteserver/docs/sss_reference_nglc.htm   (74 words)

  
 Porter Stemming Algorithm
The Porter stemming algorithm (or ‘Porter stemmer’) is a process for removing the commoner morphological and inflexional endings from words in English.
The original stemming algorithm paper was written in 1979 in the Computer Laboratory, Cambridge (England), as part of a larger IR project, and appeared as Chapter 6 of the final project report,
The most frequently asked question is why word X should be stemmed to x1, when one would have expected it to be stemmed to x2.
www.tartarus.org /~martin/PorterStemmer   (881 words)

  
 Patent Server: Search Language Help
The case of operators, search words and words in the documents is ignored.
Stemming: the search will be done for one or more variations of the search word.
For multi-word searches, must be identical and the search is for all words to be within n words of each other.
www.delphion.com /langhelp   (394 words)

  

  
 Concepts and Terms in Electronic Library Catalogs
Operators - words such as "and," "or," and "not" that are used to combine search terms to broaden or narrow your keyword search.
Stop Words - Conjunctions, prepositions, articles, and other brief words such as and, to, the, and a that appear often in documents, yet alone contain little meaning.
A uniform agreed-upon word or group of words used to gather in one place all items about a single topic including all the different synonyms and similar words very closely related in meaning.
ollie.dcccd.edu /library/Module2/Books/concepts.htm   (1175 words)
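A minimal sketch of stop-word removal as defined above. The stop list contains only the four examples named in the snippet (and, to, the, a); real lists are much longer, and the function name remove_stop_words is an assumption for this example.

```python
# Only the examples named above; a real stop list would be far longer.
STOP_WORDS = {"and", "to", "the", "a"}

def remove_stop_words(text):
    """Lower-case the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]
```

For instance, remove_stop_words("The cat and a dog") keeps only "cat" and "dog".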

  
 MGI_3.44 - Using Full-Text Searches on MGI Query Forms
Separate words or phrases with one of the operators: AND or OR.
The MGI search engine stems (i.e., cuts off) at the word level to retrieve matches that are close yet not exact.
Check your spelling and the sections on partial word matching and word stemming again.
www.informatics.jax.org /userdocs/boolean_search_tips.shtml   (1401 words)

Copyright © 2005-2007 www.factbites.com Usage implies agreement with terms.