1. Which of the following is not a data mining functionality?
a. Classification
b. Clustering
c. Discrimination
d. Interpretation
2. Centroids are used as the basis for similarity calculations in
a. The K-means method
b. ID3
c. Trimmed data sets
d. Data of low dimension
3. ID3 is an example of a
a. Decision tree method
b. Clustering method
c. Data pre-processing method
d. Instance-based learning
4. Covering is an alternative method for
a. Attribute splitting in decision trees
b. Attribute selection for Clustering
c. Class selection in unsupervised learning
d. Item selection in Association mining
5. A generic measure used for geometric proximity is a
a. Entropy function
b. Support function
c. Distance function
d. None of the above
6. The dimensionality of data refers to
a. The number of attributes
b. The levels of abstraction in data
c. The size of the data
d. The structure of data in a file
7. Estimation is similar to classification except that
a. Estimation produces probabilistic statements, rather than classes
b. Estimation is more efficient with slightly more error
c. Estimation works better on larger data sets
d. Estimation works better for non-numeric data
8. Agglomeration is a technique associated with
a. Classification rules
b. Decision trees
c. Association Rules
d. Instance-based learning
9. A kD-tree is a data structure that supports
a. Classification rules
b. Decision trees
c. Clustering
d. FP-growth trees
10. A method for reducing the dimensionality of data is
a. Removing redundant dimensions using Information Gain
b. Decision Tree Pruning
c. Averaging
d. Principal Components Analysis
11. The attribute values hot, mild and cool are an example of which attribute type
a. Nominal
b. Ordinal
c. Numeric
d. Quantitative
12. The last attribute in an ARFF file refers to
a. The class attribute of the data
b. Formatting information
c. Numeric attributes if present
d. An indication of whether the learning is supervised or unsupervised
13. Supervised learning means that
a. Data mining is supervised using a tool like WEKA
b. A class attribute is provided with some of the data
c. The data set provided for learning contains classified examples
d. The classification is validated on a new data set
14. Which of the following are valid ways to represent knowledge produced by data mining
a. Linear models
b. Trees
c. Rules
d. All of the above
15. The FP letters in FP-growth stand for
a. Forward Pruning
b. Frequent Pruning
c. Forest Patterns
d. Frequent Pattern
16. Entropy is a component of
a. Information Gain
b. SSE (Sum of Squared Errors)
c. Data aggregation
d. Data variation
17. Attributes that act as identification codes are a symptom of
a. Data that needs to be anonymised
b. Overfitting
c. Underfitting
d. Excessive reliance on numeric attributes
18. Directed Acyclic graphs are associated with
a. Classification rules
b. Decision trees
c. Bayesian networks
d. Instance-based learning
19. The Naïve Bayes classification method is given its name because
a. It assumes independence amongst data attributes
b. It is computationally expensive to run
c. It rarely produces useful results
d. It handles missing data values poorly
20. The amount of computation needed to generate association rules depends critically on
a. The maximal coverage specified
b. The minimal coverage specified
c. The amount of available main memory for execution
d. The size of the data set
21. Numeric prediction is mainly associated with
a. Linear regression
b. Numeric data attributes
c. The accuracy of classification rules
d. All of the above
22. When data mining encounters a data set with no class to predict, then it is appropriate
to apply
a. Decision trees
b. Association Rule generation
c. Statistical methods
d. Clustering methods
23. In instance-based learning what type of function is used to determine which member
of the training set is closest to an unknown test instance?
a. Entropy function
b. Support function
c. Distance function
d. None of the above
24. What does CRISP-DM stand for
a. Cross-Identification System Process Data Mining
b. Classify Rules Identify Sort Process Data Mining
c. Cluster Report Identify Sort Process Data Mining
d. Cross-Industry Standard Process for Data Mining
25. A trimmed sample mean is
a. Reducing the precision of a numeric data value from, say, 5 to 3 decimal places
b. The mean after removing high and low values from a data set
c. Removing outliers from the data set so as not to bias the mean
d. The mean after removing a random number of data elements from the data set
26. The Ratio of Mismatched Features (RMF) is a dissimilarity measure for which type of data
a. Interval
b. Nominal
c. Ratio
d. Ordinal
27. A measure of purity refers to
a. The accuracy and completeness of data
b. The predictive power of a rule set
c. A criterion for choosing the next attribute to split on in a decision tree
d. The relative impact of combining several models in data mining
28. A minimum support threshold is a parameter in
a. The Apriori method
b. ID3
c. Unsupervised learning
d. Clustering
29. Information gain is a measure used to
a. Select an attribute for the next partition in a Decision tree
b. Reduce the dimensionality of data
c. Aggregate data using data entropy
d. None of the above
30. The Australian researcher Ross Quinlan is considered to be the “father” of
a. Clustering
b. Decision trees
c. Association Rules
d. Bayesian methods
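Questions 16 and 29 above both turn on entropy and information gain. As a study aid, here is a minimal sketch of both measures; the function names are my own, and the 9-yes/5-no label set is an illustrative toy example in the spirit of the classic weather data, not taken from the questions themselves.

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a collection of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """Reduction in entropy from partitioning `labels` by `attribute_values`.

    This is the quantity ID3 maximises when selecting the next attribute
    to split on in a decision tree.
    """
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

# Toy class distribution: 9 "yes" and 5 "no" examples.
play = ["yes"] * 9 + ["no"] * 5
print(round(entropy(play), 3))  # ≈ 0.940 bits
```

A perfectly separating attribute recovers all the entropy: splitting labels ["a", "a", "b", "b"] by values [1, 1, 2, 2] yields a gain of exactly 1.0 bit.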


1. Define data mining in your own words and discuss its applications in business.
2. Briefly discuss several methods of knowledge representation in data mining using
illustrative examples.
3. Describe the ARFF file format for data.
4. Briefly discuss the kNN (k-nearest-neighbour) approach for predicting the class label
of a data sample.
5. Describe the differences between supervised and unsupervised learning.
6. Give an overview of the Naïve Bayesian method for classification.
7. Discuss how missing values are to be handled in a data set, and how this impacts
learning algorithms.
8. What is association rule mining? Give a general overview of it with reference to the
Apriori method.
9. What are some of the considerations in generating and improving a set of
classification rules?
10. Give an overview of ensemble learning and the purpose of combining multiple models.
11. List and describe two differences between classification trees and regression trees.
12. A data mining routine has been applied to a transaction dataset and has classified 88
records as fraudulent (20 correctly so) and 952 as nonfraudulent (880 correctly so).
Construct the classification confusion matrix and calculate the error rate.
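The arithmetic in question 12 can be checked with a short sketch. All counts below come from the question itself; the variable names are illustrative. Of the 88 records predicted fraudulent, 20 are true positives, so 68 are false positives; of the 952 predicted nonfraudulent, 880 are true negatives, leaving 72 false negatives.

```python
# Counts as stated in question 12.
pred_fraud, pred_fraud_correct = 88, 20
pred_nonfraud, pred_nonfraud_correct = 952, 880

tp = pred_fraud_correct                       # fraudulent, predicted fraudulent
fp = pred_fraud - pred_fraud_correct          # nonfraudulent, predicted fraudulent
tn = pred_nonfraud_correct                    # nonfraudulent, predicted nonfraudulent
fn = pred_nonfraud - pred_nonfraud_correct    # fraudulent, predicted nonfraudulent

total = tp + fp + tn + fn                     # 1040 records in all
error_rate = (fp + fn) / total                # misclassified / total

print(tp, fp, fn, tn)           # 20 68 72 880
print(round(error_rate, 4))     # 0.1346
```

So the confusion matrix has cells TP = 20, FP = 68, FN = 72, TN = 880, and the error rate is 140/1040 ≈ 13.5%.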