Data mining is also known as:
A) Data Summarization
B) Knowledge Discovery in Databases
C) Data Cleaning
D) Data Warehousing
Answer: B) Knowledge Discovery in Databases
Explanation: Data mining is the pattern-extraction step of the Knowledge Discovery in Databases (KDD) process, and the two terms are often used interchangeably.

Which technique is used for market basket analysis?
A) Regression
B) Clustering
C) Association Rule Mining
D) Classification
Answer: C) Association Rule Mining
Explanation: Association rules identify relationships between items.

Outlier detection is used to:
A) Find missing data
B) Identify unusual patterns
C) Group similar data
D) Normalize data
Answer: B) Identify unusual patterns
Explanation: Outlier detection highlights data points that deviate significantly from the rest of the data.

Which algorithm is used for clustering?
A) Decision Trees
B) Apriori
C) K-Means
D) Naive Bayes
Answer: C) K-Means
Explanation: K-Means is a popular clustering algorithm.

Data preprocessing includes:
A) Cleaning, Integration, Transformation, Reduction
B) Only data collection
C) Only visualization
D) Only model building
Answer: A) Cleaning, Integration, Transformation, Reduction
Explanation: Preprocessing prepares data for mining.

The process of discovering patterns in large datasets is known as:
A) Data Cleaning
B) Data Mining
C) Data Modeling
D) Data Fetching
Answer: B) Data Mining
Explanation: Data mining extracts useful patterns, relationships, and trends from large datasets.

Clustering is an example of:
A) Supervised learning
B) Unsupervised learning
C) Semi-supervised learning
D) Reinforcement learning
Answer: B) Unsupervised learning
Explanation: Clustering groups similar data points without predefined labels.

Which of the following is a popular association rule mining algorithm?
A) Apriori
B) KNN
C) Decision Tree
D) Regression
Answer: A) Apriori
Explanation: Apriori is widely used to find frequent itemsets and generate association rules.

Outlier detection helps in:
A) Finding frequent patterns
B) Identifying unusual data points
C) Data normalization
D) Clustering identical data
Answer: B) Identifying unusual data points
Explanation: Outlier detection spots data points that deviate significantly from the rest.

Data preprocessing involves:
A) Data Cleaning, Integration, Transformation
B) Only Data Visualization
C) Building Decision Trees
D) Writing Queries
Answer: A) Data Cleaning, Integration, Transformation
Explanation: Preprocessing prepares data for mining by cleaning it and transforming it into a usable format.

Which technique is used to predict future trends?
A) Clustering
B) Regression
C) Association
D) Dimensionality Reduction
Answer: B) Regression
Explanation: Regression predicts continuous numeric values based on past data.

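To make the idea of predicting a numeric value from past data concrete, here is a minimal sketch of simple linear regression fitted by ordinary least squares. The function name `fit_line` and the monthly sales figures are made up for illustration; real trend forecasting would normally use a statistics library and more careful validation.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical past data: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept  # extrapolate one month ahead
```

The fitted line is then evaluated at a future input (month 6) to produce the prediction, which is the sense in which regression "predicts future trends".
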
Which of these is a measure of interestingness in association rule mining?
A) Support and Confidence
B) Accuracy and Precision
C) Sensitivity and Specificity
D) Recall and F1
Answer: A) Support and Confidence
Explanation: Support measures how frequently the items of a rule occur together, and confidence measures how reliably the consequent follows the antecedent.

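As a rough illustration of these two measures, the sketch below computes support and confidence over a tiny, made-up list of transactions. The helper names `support` and `confidence` are chosen here for illustration and are not taken from any particular library.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


rule_support = support({"bread", "milk"}, transactions)          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
```
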
Data warehousing is mainly used for:
A) OLAP
B) Transaction Processing
C) File Backup
D) Network Management
Answer: A) OLAP
Explanation: Data warehouses support Online Analytical Processing (OLAP) for decision-making.

Which clustering algorithm is based on density?
A) K-Means
B) DBSCAN
C) Apriori
D) FP-Growth
Answer: B) DBSCAN
Explanation: DBSCAN forms clusters from dense regions of data points and handles noise effectively.

The curse of dimensionality occurs when:
A) Too few dimensions reduce accuracy
B) Too many dimensions cause sparse data and poor performance
C) Data is normalized
D) Data has missing values
Answer: B) Too many dimensions cause sparse data and poor performance
Explanation: High-dimensional data can degrade algorithm performance due to sparsity and complexity.

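One way to see why high-dimensional data becomes sparse is a geometric sketch: compare the volume of a unit ball with the volume of the cube that encloses it. The function name `ball_fraction` is an illustrative choice; the formula used is the standard volume of a d-dimensional unit ball, pi^(d/2) / Gamma(d/2 + 1).

```python
import math


def ball_fraction(d):
    """Fraction of the cube [-1, 1]^d that lies inside the inscribed unit ball."""
    ball_volume = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    cube_volume = 2.0 ** d
    return ball_volume / cube_volume


# The fraction shrinks quickly as the number of dimensions grows.
fractions = {d: ball_fraction(d) for d in (2, 3, 5, 10, 20)}
```

In two dimensions the ball covers most of the square, but in ten dimensions it covers well under 1% of the cube. Nearly all of the volume sits in the corners, far from the center, so a fixed number of points spreads ever more thinly as dimensions are added.
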
Which algorithm is used for frequent pattern mining?
A) K-Means
B) Apriori
C) DBSCAN
D) PCA
Answer: B) Apriori
Explanation: Apriori generates frequent itemsets for association rule mining.

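The level-wise search that Apriori performs can be sketched in a few lines. This is a simplified, unoptimized version for illustration only: the function name `apriori`, the sample baskets, and the 0.5 support threshold are assumptions, and production implementations use more efficient candidate generation and counting.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}

    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]
frequent_itemsets = apriori(baskets, min_support=0.5)
```

With these baskets, all three single items and all three pairs reach the threshold, while the three-item set appears in only one basket and is dropped.
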
Outlier detection is important for:
A) Removing duplicates
B) Identifying unusual patterns
C) Normalizing data
D) Reducing dimensions
Answer: B) Identifying unusual patterns
Explanation: Outlier detection helps find anomalies that may indicate fraud or errors.

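A simple way to flag values that deviate significantly is the z-score rule, sketched below with Python's standard `statistics` module. The function name `zscore_outliers`, the threshold of 2.0, and the sample amounts are assumptions made for this example; the z-score rule is only one of several outlier tests and works poorly on very small or heavily skewed samples.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts with one suspicious entry.
amounts = [10, 12, 11, 13, 12, 11, 95]
outliers = zscore_outliers(amounts)
```
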
Which is an example of predictive data mining?
A) Clustering
B) Classification
C) Association
D) Summarization
Answer: B) Classification
Explanation: Classification predicts class labels for unseen instances.

Data preprocessing involves:
A) Data cleaning, integration, transformation, reduction
B) Model training
C) Visualization
D) Feature engineering only
Answer: A) Data cleaning, integration, transformation, reduction
Explanation: Preprocessing prepares data for mining and ensures quality.

A data cube is used in:
A) OLTP
B) OLAP
C) Data compression
D) Key generation
Answer: B) OLAP
Explanation: Data cubes store multi-dimensional data for analytical processing.

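The multi-dimensional aggregation behind a data cube can be sketched with plain dictionaries. The dimension names (`region`, `product`, `quarter`), the fact rows, and the `roll_up` helper are all invented for illustration; real OLAP engines precompute and index such aggregates rather than scanning the facts each time.

```python
from collections import defaultdict

DIMENSIONS = ("region", "product", "quarter")

# Hypothetical fact rows: (region, product, quarter, units_sold).
facts = [
    ("North", "Laptop", "Q1", 100),
    ("North", "Phone", "Q1", 50),
    ("South", "Laptop", "Q1", 70),
    ("South", "Laptop", "Q2", 30),
]


def roll_up(facts, keep):
    """Sum the measure over all dimensions that are not listed in keep."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[-1]  # the last field is the measure
    return dict(totals)


by_region = roll_up(facts, ("region",))
by_product_quarter = roll_up(facts, ("product", "quarter"))
grand_total = roll_up(facts, ())
```

Each call corresponds to one "face" of the cube: the same facts are summarized along a different combination of dimensions.
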
Which method reduces data size but retains patterns?
A) Clustering
B) Data Compression
C) Data Integration
D) Data Cleaning
Answer: B) Data Compression
Explanation: Compression minimizes data size while preserving information.

Classification is a:
A) Descriptive model
B) Predictive model
C) Clustering model
D) Regression model
Answer: B) Predictive model
Explanation: Classification predicts categorical class labels for unseen data.

Which metric measures interestingness of association rules?
A) Support
B) Confidence
C) Lift
D) All of the Above
Answer: D) All of the Above
Explanation: Support, confidence, and lift together measure rule quality.

Which technique is used for anomaly detection?
A) KNN
B) DBSCAN
C) Outlier Analysis
D) Regression
Answer: C) Outlier Analysis
Explanation: Outlier analysis detects data points that deviate from the normal pattern.

ETL stands for:
A) Extract, Transfer, Load
B) Extract, Transform, Load
C) Edit, Transform, Learn
D) Encode, Transfer, Load
Answer: B) Extract, Transform, Load
Explanation: ETL is used in data warehousing to prepare data for analysis.

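The three ETL stages can be sketched end to end with the standard library: `csv` for extraction and an in-memory `sqlite3` database standing in for the warehouse. The CSV content, the `sales` table, and the cleaning rules (trim names, skip rows without an amount) are assumptions chosen for this example.

```python
import csv
import io
import sqlite3

# Hypothetical raw source data with inconsistent formatting and a missing amount.
RAW_CSV = """name,amount
 alice ,10.5
BOB,20
carol,
"""


def extract(text):
    """Extract: read the raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, convert amounts, skip incomplete rows."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # no amount recorded, skip the row
        cleaned.append((row["name"].strip().title(), float(row["amount"])))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
```
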
K-means clustering is based on:
A) Distance between points
B) Density of points
C) Regression equations
D) Association rules
Answer: A) Distance between points
Explanation: K-means partitions data into clusters by minimizing the distance between each point and the centroid of its cluster.

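The assign-then-update loop of K-means can be sketched as follows. This is a minimal version that takes the initial centroids as an argument so the result is deterministic; the function name `kmeans`, the sample points, and the starting centroids are illustrative assumptions, and `math.dist` requires Python 3.8 or later.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(axis) / len(cluster) for axis in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged, assignments will not change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D points forming two well-separated groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, [(1, 1), (8, 8)])
```
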
Data preprocessing includes:
A) Cleaning, integration, transformation
B) Mining patterns
C) Prediction
D) Evaluation
Answer: A) Cleaning, integration, transformation
Explanation: Preprocessing prepares raw data for mining.

Which is an example of predictive data mining?
A) Market Basket Analysis
B) Customer Segmentation
C) Churn Prediction
D) Outlier Detection
Answer: C) Churn Prediction
Explanation: Predictive mining forecasts outcomes such as customer churn.

OLAP stands for:
A) Online Analytical Processing
B) Offline Analytical Processing
C) Online Automated Processing
D) Operational Logical Analysis Process
Answer: A) Online Analytical Processing
Explanation: OLAP supports multidimensional queries for decision support.

Which of the following is not a similarity measure?
A) Euclidean Distance
B) Cosine Similarity
C) Jaccard Coefficient
D) Apriori Algorithm
Answer: D) Apriori Algorithm
Explanation: Apriori is for association rule mining, not similarity.

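For reference, the three measures named in the options can each be written in a few lines. The function names below are illustrative. Note that Euclidean distance is strictly a dissimilarity: smaller values mean more similar points, whereas for cosine similarity and the Jaccard coefficient larger values mean more similar.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def jaccard_coefficient(a, b):
    """Size of the intersection divided by the size of the union of two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)
```
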
Data cleaning is used to:
A) Add redundancy
B) Remove noise and inconsistencies
C) Reduce size
D) Increase complexity
Answer: B) Remove noise and inconsistencies
Explanation: Cleaning improves data quality, which leads to more accurate mining results.

Classification is a type of:
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Semi-supervised learning
Answer: A) Supervised learning
Explanation: Classification uses labeled data to train models.

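To show what "uses labeled data" means in practice, here is a minimal k-nearest-neighbours classifier, one of the simplest supervised methods. The function name `knn_predict`, the training points, and the labels `"low"` and `"high"` are made up for this sketch; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict the label of query by majority vote among its k nearest labeled points."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data: (features, class label).
train = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 8.5), "high"),
    ((8.5, 9.0), "high"),
]

prediction_a = knn_predict(train, (2.0, 2.0))
prediction_b = knn_predict(train, (8.0, 9.0))
```

The labels in `train` are what make this supervised: the model can only predict categories it has seen attached to examples.
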
Clustering groups data based on:
A) Labels
B) Similarities
C) Randomness
D) Functions
Answer: B) Similarities
Explanation: Clustering finds hidden patterns by grouping similar data points.

Association rules are commonly used in:
A) Market basket analysis
B) Regression
C) Clustering
D) Time series
Answer: A) Market basket analysis
Explanation: Association mining finds frequent itemsets and rules like “if A then B.”

The process of finding hidden patterns in data is called:
A) Data Processing
B) Data Mining
C) Data Modeling
D) Data Analysis
Answer: B) Data Mining
Explanation: Data mining extracts patterns from large datasets.

Clustering is:
A) Supervised learning
B) Unsupervised learning
C) Reinforcement learning
D) Regression
Answer: B) Unsupervised learning
Explanation: Clustering groups unlabeled data into clusters.

Which algorithm is used for association rule mining?
A) Apriori
B) KNN
C) SVM
D) Naive Bayes
Answer: A) Apriori
Explanation: Apriori identifies frequent itemsets for rule mining.

Outlier detection identifies:
A) Common values
B) Noise in data
C) Missing data
D) Average values
Answer: B) Noise in data
Explanation: Outlier detection finds abnormal patterns, such as noisy values that deviate from the rest of the data.

A data warehouse is:
A) OLTP system
B) Historical data repository
C) Real-time database
D) Temporary memory
Answer: B) Historical data repository
Explanation: Data warehouses store large-scale historical data.

Which technique is used for prediction?
A) Clustering
B) Regression
C) Association
D) Classification only
Answer: B) Regression
Explanation: Regression predicts numeric outcomes.

K-means algorithm requires:
A) Initial centroids
B) Decision trees
C) Supervised data
D) Rules
Answer: A) Initial centroids
Explanation: K-means starts with k centroids for clustering.

The lift metric is used in:
A) Regression analysis
B) Association rule mining
C) Outlier detection
D) PCA
Answer: B) Association rule mining
Explanation: Lift measures the strength of an association rule relative to what would be expected if its items were independent.

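Lift for a rule A -> B is commonly defined as support(A and B) / (support(A) * support(B)). The sketch below computes it on a small, made-up set of baskets; the helper names `support` and `lift` are chosen for illustration.

```python
# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def lift(antecedent, consequent, transactions):
    """Observed co-occurrence divided by the co-occurrence expected under independence."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / (support(antecedent, transactions) * support(consequent, transactions))


bread_butter_lift = lift({"bread"}, {"butter"}, transactions)
bread_milk_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift above 1 (bread and butter here) suggests the items occur together more often than independence would predict; a lift below 1 (bread and milk here) suggests they occur together less often.
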
Data cleaning deals with:
A) Missing and noisy data
B) Normalization
C) Integration
D) Transformation
Answer: A) Missing and noisy data
Explanation: Cleaning ensures high-quality datasets.

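Two common cleaning steps, removing duplicate records and filling in missing values, can be sketched as follows. The record layout (`id`, `age`), the function name `clean`, and the choice to impute missing ages with the mean are assumptions for this example; the right imputation strategy depends on the data.

```python
from statistics import mean


def clean(records):
    """Drop exact duplicate records, then fill missing ages with the mean age."""
    unique, seen = [], set()
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(dict(rec))  # copy so the input is not modified
    known_ages = [r["age"] for r in unique if r["age"] is not None]
    fill_value = mean(known_ages)
    for r in unique:
        if r["age"] is None:
            r["age"] = fill_value
    return unique


# Hypothetical raw records with one duplicate and one missing value.
raw = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 1, "age": 30},
    {"id": 3, "age": 40},
]
cleaned = clean(raw)
```
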
Outlier detection is crucial in:
A) Regression
B) Fraud detection
C) Normalization
D) Data cleaning
Answer: B) Fraud detection
Explanation: Outliers can indicate fraud.

Which of the following is unsupervised?
A) Decision tree
B) Logistic regression
C) K-means clustering
D) Naive Bayes
Answer: C) K-means clustering
Explanation: K-means finds groups in unlabeled data.

Apriori algorithm is used for:
A) Classification
B) Association rule mining
C) Regression
D) Prediction
Answer: B) Association rule mining
Explanation: Apriori finds frequent itemsets.

ETL in data warehouse means:
A) Extract, Transfer, Load
B) Extract, Transform, Load
C) Encode, Transfer, Load
D) Extract, Translate, Load
Answer: B) Extract, Transform, Load
Explanation: ETL prepares data for warehouses.

Dimensionality reduction avoids:
A) Underfitting
B) Curse of dimensionality
C) Noise
D) Data loss
Answer: B) Curse of dimensionality
Explanation: Too many dimensions make data sparse.

Outlier detection helps in:
A) Data redundancy
B) Anomaly detection
C) Normalization
D) Classification
Answer: B) Anomaly detection
Explanation: Outliers reveal unusual data behavior.

Which is a data reduction technique?
A) PCA
B) Clustering
C) Classification
D) Regression
Answer: A) PCA
Explanation: Principal Component Analysis (PCA) reduces dimensionality.

Association rule mining is used to find:
A) Sequential patterns
B) Correlations among items
C) Classification rules
D) Clustering
Answer: B) Correlations among items
Explanation: Association mining discovers item relationships (e.g., market basket).

Which algorithm is widely used for clustering?
A) Apriori
B) K-Means
C) Decision Tree
D) Naïve Bayes
Answer: B) K-Means
Explanation: K-Means partitions data into k clusters.

Which is an outlier detection method?
A) KNN
B) Isolation Forest
C) Naïve Bayes
D) PCA
Answer: B) Isolation Forest
Explanation: Isolation Forest isolates points with random partitions; anomalies tend to be isolated in fewer splits than normal points.

Classification in data mining is:
A) Grouping without labels
B) Predicting categorical labels
C) Finding frequent itemsets
D) Estimating continuous values
Answer: B) Predicting categorical labels
Explanation: Classification predicts predefined categories.

Data cleaning involves:
A) Removing duplicates and errors
B) Creating indexes
C) Optimizing queries
D) Normalizing schemas
Answer: A) Removing duplicates and errors
Explanation: Data cleaning improves dataset quality.

Which of the following is a predictive model?
A) Clustering
B) Regression
C) Association rules
D) Summarization
Answer: B) Regression
Explanation: Regression predicts continuous values.

Outlier detection helps in:
A) Detecting fraud
B) Improving normalization
C) Query optimization
D) Index creation
Answer: A) Detecting fraud
Explanation: Outliers often represent fraudulent or rare events.

Sequential pattern mining is used to:
A) Find frequent sequences over time
B) Build classifiers
C) Reduce dimensions
D) Normalize datasets
Answer: A) Find frequent sequences over time
Explanation: Sequential pattern mining identifies temporal associations in data.

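As a heavily simplified sketch of the idea, the code below counts ordered pairs of events ("a happens some time before b") across a set of event sequences and keeps the pairs that occur in enough of them. The function name `frequent_ordered_pairs`, the session data, and the 0.6 threshold are assumptions; full sequential pattern miners handle longer patterns and use far more efficient search.

```python
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return {(a, b): support} for pairs where a occurs before b in a sequence."""
    counts = {}
    for seq in sequences:
        # combinations() preserves position order, so each pair is (earlier, later).
        # A set is used so a pair counts once per sequence.
        pairs = set(combinations(seq, 2))
        for pair in pairs:
            counts[pair] = counts.get(pair, 0) + 1
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical user sessions, each an ordered list of events.
sessions = [
    ["login", "search", "buy"],
    ["login", "buy"],
    ["search", "login", "buy"],
]
patterns = frequent_ordered_pairs(sessions, min_support=0.6)
```

Unlike association rules, order matters here: `("login", "search")` and `("search", "login")` are counted as different patterns.
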
Which is NOT a clustering algorithm?
A) K-Means
B) DBSCAN
C) DBSCANA
D) Apriori
Answer: D) Apriori
Explanation: Apriori is used for association rule mining.

The "curse of dimensionality" refers to:
A) Low variance in data
B) Increased difficulty with many features
C) Data redundancy
D) Missing values
Answer: B) Increased difficulty with many features
Explanation: High-dimensional data makes analysis harder.

Which technique is used for association rule learning?
A) Decision Tree
B) Apriori algorithm
C) Naive Bayes
D) SVM
Answer: B) Apriori algorithm
Explanation: Apriori finds frequent itemsets for association rules.

Outlier detection is part of:
A) Classification
B) Clustering
C) Data preprocessing
D) Data cleaning
Answer: D) Data cleaning
Explanation: Outliers are detected and removed during data cleaning, which is itself a step of preprocessing.

The process of combining data from multiple sources is called:
A) Data cleaning
B) Data integration
C) Data reduction
D) Data transformation
Answer: B) Data integration
Explanation: Integration merges data into a unified view.

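A unified view can be sketched by merging two sources on a shared key. The two sources below (`crm` and `billing`), their differing key names (`customer_id` and `cust`), and the `integrate` helper are invented for this example; it behaves like a full outer join, keeping customers that appear in only one source.

```python
# Hypothetical sources that describe the same customers with different schemas.
crm = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
]
billing = [
    {"cust": 1, "balance": 120.0},
    {"cust": 3, "balance": 15.0},
]


def integrate(crm_rows, billing_rows):
    """Merge both sources into one record per customer, keyed by customer id."""
    merged = {}
    for row in crm_rows:
        merged[row["customer_id"]] = {
            "customer_id": row["customer_id"],
            "name": row["name"],
            "balance": None,  # unknown until billing data is merged
        }
    for row in billing_rows:
        # Resolve the schema difference: billing calls the key "cust".
        entry = merged.setdefault(
            row["cust"],
            {"customer_id": row["cust"], "name": None, "balance": None},
        )
        entry["balance"] = row["balance"]
    return [merged[key] for key in sorted(merged)]


unified = integrate(crm, billing)
```
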
Which is a dimensionality reduction technique?
A) Regression
B) PCA
C) Decision Tree
D) KNN
Answer: B) PCA
Explanation: Principal Component Analysis (PCA) reduces feature dimensions.

Which evaluation measure is used for clustering?
A) Precision
B) Recall
C) Silhouette coefficient
D) F1-score
Answer: C) Silhouette coefficient
Explanation: The silhouette coefficient measures cohesion within clusters and separation between clusters.

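For each point, the silhouette value is (b - a) / max(a, b), where a is the mean distance to the other points in its own cluster (cohesion) and b is the lowest mean distance to the points of any other cluster (separation). The sketch below averages this over all points; the function name `silhouette_score` and the sample data are illustrative, it assumes at least two clusters, and `math.dist` requires Python 3.8 or later.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette value over all points (requires at least two clusters)."""
    scores = []
    for i, p in enumerate(points):
        same = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not same:
            scores.append(0.0)  # convention for a cluster with a single point
            continue
        a = sum(same) / len(same)  # cohesion
        b = min(                   # separation from the nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical points with a sensible and a poor cluster assignment.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good_score = silhouette_score(points, [0, 0, 1, 1])
poor_score = silhouette_score(points, [0, 1, 0, 1])
```

Values near 1 indicate tight, well-separated clusters, while values below 0 indicate that points sit closer to another cluster than to their own.
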
Data cleaning in data mining means:
A) Gathering data
B) Removing inconsistencies
C) Visualization
D) Prediction
Answer: B) Removing inconsistencies
Explanation: Cleaning ensures high-quality data.

Which visualization is best for showing clusters?
A) Line Chart
B) Scatter Plot
C) Pie Chart
D) Histogram
Answer: B) Scatter Plot
Explanation: Scatter plots reveal natural groupings.