In Oracle Data Mining 12c Release 1 (12.1) and later, if unstructured text data is present, text processing includes a text transformation step before text mining. These releases include significant enhancements in text processing that simplify the data mining process (model build, deployment, and scoring) when unstructured text data is present in the input. Some points about unstructured text and text transformation:
Unstructured text includes data items such as web pages, document libraries, Microsoft PowerPoint presentations, product specifications, email messages, comment fields in reports, and call center notes.
CLOB columns and long VARCHAR2 columns are automatically interpreted as unstructured text by Oracle Data Mining.
Columns of short VARCHAR2, CHAR, BLOB, and BFILE can be specified as unstructured text.
To transform unstructured text for mining, Oracle Data Mining uses Oracle Text utilities and term weighting strategies.
Text terms are extracted and given numeric values in a text index.
The text transformation process is configurable for models and for individual attributes. You can specify data preparation for text nodes when you define a model node.
After text transformation, the text can be mined with a data mining algorithm.
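The idea that text terms are extracted and given numeric values can be illustrated with a small, generic sketch. The code below is not Oracle Text and does not reproduce the tokenization or term weighting that Oracle Data Mining actually uses; it assumes a simple lowercase word tokenizer and a plain TF-IDF weighting, and the names `tokenize`, `tf_idf`, and `notes` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic terms (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(documents):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty text
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical call center notes standing in for an unstructured text column.
notes = [
    "Customer reports login failure after password reset",
    "Customer asks about invoice after upgrade",
    "Login page slow after upgrade",
]
weights = tf_idf(notes)
for note, row in zip(notes, weights):
    top = sorted(row.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(note, "->", [(term, round(weight, 3)) for term, weight in top])
```

In this sketch, a term that occurs in every note receives a weight of zero, while terms that occur in only one note receive the highest weights. The resulting numeric rows are the kind of input a mining algorithm can work with after text transformation.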
Note: If you connect to Oracle Database 12c Release 1 (12.1) or later, then it is not always necessary to use the Text nodes (Apply Text, Build Text, and Text Reference).