CN2 algorithm
Appearance
The CN2 induction algorithm is a learning algorithm for rule induction. It is designed to work even when the training data is imperfect. It is based on ideas from the AQ algorithm and the ID3 algorithm. As a consequence it creates a rule set like that created by AQ but is able to handle noisy data like ID3.
Description of Algorithm
The algorithm must be given a set of examples, TrainingSet, which have already been classified in order to generate a list of classification rules. A set of conditions, SimpleConditionSet, which can be applied, alone or in combination, to any set of examples is predefined to be used for the classification.
routine CN2(TrainingSet)
let the ClassificationRuleList be empty repeat let the BestConditionSet be Find_BestConditionSet(TrainingSet) if the BestConditionSet is not nil then let the TrainingSubset be the examples covered by the BestConditionSet remove from the TrainingSet the examples in the TrainingSubset let the MostCommonClass be the most common class of examples in the TrainingSubset append to the ClassificationRuleList the rule 'if ' the BestConditionSet ' then the class is ' the MostCommonClass until the TrainingSet is empty or the BestConditionSet is nil
return the ClassificationRuleList
routine Find_BestConditionSet(TrainingSet)
let the ConditionalFormulaSet be the set containing the empty formula let the BestConditionSet be nil repeat let the TrialConditionalFormulaSet be the set of formulae {x and y|x belongs to the ConditionalFormulaSet, y belongs to the SimpleConditionSet}. remove all formulae in the TrialConditionalFormulaSet that are either in the ConditionalFormulaSet (i.e., the unspecialized ones) or null (e.g., big = y and big = n) for every formula, F, in the TrialConditionalFormulaSet if F is statistically significant and F is better than the BestConditionSet by user-defined criteria when tested on the TrainingSet then replace the current value of the BestConditionSet by F while the number of formulae in the TrialConditionalFormulaSet > user-defined maximum remove the worst formula from the TrialConditionalFormulaSet let the TrialConditionalFormulaSet be the TrialConditionalFormulaSet until the TrialConditionalFormulaSet is empty
return the BestConditionSet
External links