Oracle Text concepts include:
Theme: A theme is a topic associated with a given document. A document can have many themes. The theme term itself does not have to appear in the document. For example, a document containing the words San Francisco may have California as one of its themes.
Stopword: A stopword is a word that is not indexed during text transformations. A stopword is usually a low-information word. In English, a, the, this, and with are usually stopwords.
Stoplist: A stoplist is a list of stopwords. Oracle Text supplies a stoplist for every language. By default during indexing, the system uses the Oracle Text default stoplist for your language. You can edit the default stoplist or create a new one.
Note: In Oracle Data Miner, stoplists are shared across all transformations and are not owned by a specific transformation.
Stoptheme: A stoptheme is a theme to be skipped over during indexing. Stopthemes are specified by adding them to stoplists.
Oracle Text uses stopwords and stopthemes to indicate text that can be safely ignored during text mining.
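The effect of a stoplist on a stream of words can be illustrated with a minimal Python sketch. This is not the Oracle Text implementation or its API; the names DEFAULT_STOPLIST and remove_stopwords are hypothetical, and the stoplist holds only the English examples mentioned above. Stopthemes are not modeled, because they require theme extraction.

```python
# Illustrative sketch only: a toy stoplist and filter, not Oracle Text itself.
# DEFAULT_STOPLIST and remove_stopwords are hypothetical names.
DEFAULT_STOPLIST = frozenset({"a", "the", "this", "with"})


def remove_stopwords(tokens, stoplist=DEFAULT_STOPLIST):
    """Return the tokens that are not in the stoplist (case-insensitive)."""
    return [t for t in tokens if t.lower() not in stoplist]


tokens = "The bridge in San Francisco opened with a ceremony".split()
print(remove_stopwords(tokens))

# Editing a stoplist: derive a new one that also skips "in".
custom_stoplist = DEFAULT_STOPLIST | {"in"}
print(remove_stopwords(tokens, custom_stoplist))
```

The second call shows the idea behind editing a stoplist: adding a word to the list means it is skipped from then on.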
The Oracle Text Lexer breaks source text into tokens or themes (usually words) in accordance with a specified language. To extract tokens, the Lexer uses the parameters defined by a lexer preference. These parameters include:
Definitions for the characters that separate tokens. For example, whitespace.
Whether to convert text to all uppercase.
Whether to analyze the text to create theme tokens. This is done when theme indexing is enabled.
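As a rough illustration of how the first two parameters shape the tokens, the following Python sketch splits text on a configurable set of separator characters and optionally converts the result to uppercase. The function lex and its arguments are hypothetical and only mimic the idea of a lexer preference; they are not Oracle Text names, and theme analysis is not modeled.

```python
import re


# Illustrative sketch only: lex() and its parameters are hypothetical,
# standing in for the settings a lexer preference would supply.
def lex(text, separators=r"[\s,.;:!?]+", to_upper=True):
    """Split text into tokens on the separator pattern; optionally uppercase them."""
    # re.split can yield empty strings at the edges, so drop them.
    tokens = [t for t in re.split(separators, text) if t]
    return [t.upper() for t in tokens] if to_upper else tokens


print(lex("San Francisco, California"))
print(lex("San Francisco, California", to_upper=False))
print(lex("San Francisco, California", separators=r",\s*"))
```

Changing the separator pattern changes what counts as a token: with only the comma as a separator, San Francisco stays together as a single token.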