Attribute Importance
If a data set has many attributes, it is likely that not all attributes contribute to a predictive model. Some attributes only add noise and actually detract from the predictive value of the model. Oracle Data Miner ranks the attributes by significance in determining the target value. You can then filter out attributes that are not important in determining the target value.
Using fewer attributes does not necessarily result in loss of predictive accuracy. Using too many attributes can affect the model and degrade its performance and accuracy. Mining with the smallest number of attributes can save significant computing time and may produce better models.
The following are applicable for Attribute Importance:
Attribute Importance is most useful with Classification.
The target for Attribute Importance in the Filter Columns node should be the same as the target of the Classification model that you plan to build.
Attribute Importance calculates the rank and importance for each attribute.
The rank of an attribute is an integer.
The importance of an attribute is a real number, which can be negative.
Specify these values for attribute importance:
Target: The value for which to find important attributes, usually the target of a classification problem.
Importance Cutoff: A number between 0 and 1.0 that identifies the smallest importance value that you want to accept. An attribute with a negative importance is not correlated with the target, so the cutoff should be nonnegative. The default cutoff is 0. The rank or importance of an attribute enables you to select the attributes to use in building models.
Top N: The maximum number of attributes to select. The default is 100.
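The effect of Importance Cutoff and Top N can be illustrated with a short Python sketch. This is not Oracle Data Miner code: the function `select_attributes`, the attribute names, and the scores are hypothetical. It also assumes that rank 1 is the most important attribute, that an attribute passes when its importance is greater than or equal to the cutoff, and that Top N is applied after the cutoff.

```python
from typing import Dict, List, Tuple


def select_attributes(
    importance: Dict[str, float],
    cutoff: float = 0.0,   # default Importance Cutoff
    top_n: int = 100,      # default Top N
) -> List[Tuple[int, str, float]]:
    """Return (rank, name, importance) for attributes that pass the filter."""
    # Sort by descending importance; break ties by name for a stable order.
    ordered = sorted(importance.items(), key=lambda item: (-item[1], item[0]))
    # Assumption: rank 1 is the most important attribute.
    ranked = [
        (rank, name, value)
        for rank, (name, value) in enumerate(ordered, start=1)
    ]
    # Assumption: keep attributes at or above the cutoff, then at most top_n.
    kept = [row for row in ranked if row[2] >= cutoff]
    return kept[:top_n]


# Hypothetical importance scores, including one negative value.
scores = {"AGE": 0.21, "INCOME": 0.35, "CUST_ID": -0.02, "REGION": 0.0}
selected = select_attributes(scores, cutoff=0.0, top_n=2)
for rank, name, value in selected:
    print(rank, name, value)
```

With these hypothetical scores, CUST_ID is dropped by the cutoff because its importance is negative, and REGION is dropped by Top N because only two attributes are requested.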
Select a Sample technique for the Attribute Importance calculation. The default is system determined. You can also select Stratified or Random.
System determined uses a stratified cutoff value, which defaults to 10.
If the distinct count of the selected column is greater than the cutoff value, then random sampling is used.
If the distinct count of the selected column is less than or equal to the cutoff value, then stratified sampling is used.
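The system-determined rule above can be written as a small decision function. The following Python sketch only restates that rule. The function name `choose_sampling` and its parameter names are hypothetical and are not part of Oracle Data Miner.

```python
def choose_sampling(distinct_count: int, stratified_cutoff: int = 10) -> str:
    """Pick the sample technique from the distinct count of the selected column."""
    # More distinct values than the cutoff: use random sampling.
    if distinct_count > stratified_cutoff:
        return "random"
    # Distinct count less than or equal to the cutoff: use stratified sampling.
    return "stratified"


# A column with 2 distinct values versus one with 500 distinct values.
print(choose_sampling(2))
print(choose_sampling(500))
```

With the default cutoff of 10, a column with exactly 10 distinct values still gets stratified sampling, because the comparison is strictly greater than.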
Certain combinations of target and sampling may result in performance problems. You are given a warning if there is a performance problem.