CS8075-DATA WAREHOUSING AND DATA MINING Syllabus 2017 Regulation
CS8075 DATA WAREHOUSING AND DATA MINING (L T P C: 3 0 0 3)
OBJECTIVES:
- To understand data warehouse concepts, architecture, business analysis and tools
- To understand data pre-processing and data visualization techniques
- To study algorithms for finding hidden and interesting patterns in data
- To understand and apply various classification and clustering techniques using tools
UNIT I DATA WAREHOUSING, BUSINESS ANALYSIS AND ON-LINE ANALYTICAL PROCESSING (OLAP) 9
Basic Concepts – Data Warehousing Components – Building a Data Warehouse – Database Architectures for Parallel Processing – Parallel DBMS Vendors – Multidimensional Data Model – Data Warehouse Schemas for Decision Support, Concept Hierarchies – Characteristics of OLAP Systems – Typical OLAP Operations, OLAP and OLTP.
UNIT II DATA MINING – INTRODUCTION 9
Introduction to Data Mining Systems – Knowledge Discovery Process – Data Mining Techniques – Issues – Applications – Data Objects and Attribute Types, Statistical Description of Data, Data Preprocessing – Cleaning, Integration, Reduction, Transformation and Discretization, Data Visualization, Data Similarity and Dissimilarity Measures.
UNIT III DATA MINING – FREQUENT PATTERN ANALYSIS 9
Mining Frequent Patterns, Associations and Correlations – Mining Methods – Pattern Evaluation Method – Pattern Mining in Multilevel, Multidimensional Space – Constraint Based Frequent Pattern Mining, Classification using Frequent Patterns.
UNIT IV CLASSIFICATION AND CLUSTERING 9
Decision Tree Induction – Bayesian Classification – Rule Based Classification – Classification by Back Propagation – Support Vector Machines – Lazy Learners – Model Evaluation and Selection – Techniques to Improve Classification Accuracy. Clustering Techniques – Cluster Analysis – Partitioning Methods – Hierarchical Methods – Density Based Methods – Grid Based Methods – Evaluation of Clustering – Clustering High Dimensional Data – Clustering with Constraints, Outlier Analysis – Outlier Detection Methods.
UNIT V WEKA TOOL 9
Datasets – Introduction, Iris plants database, Breast cancer database, Auto imports database – Introduction to WEKA, The Explorer – Getting started, Exploring the Explorer, Learning algorithms, Clustering algorithms, Association-rule learners.
TOTAL: 45 PERIODS
OUTCOMES:
Upon completion of the course, the students should be able to:
- Design a data warehouse system and perform business analysis with OLAP tools
- Apply suitable pre-processing and visualization techniques for data analysis
- Apply frequent pattern and association rule mining techniques for data analysis
- Apply appropriate classification and clustering techniques for data analysis
TEXT BOOK:
- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Third Edition, Elsevier, 2012.
REFERENCES:
- Alex Berson and Stephen J. Smith, "Data Warehousing, Data Mining & OLAP", Tata McGraw-Hill Edition, 35th Reprint 2016.
- K.P. Soman, Shyam Diwakar and V. Ajay, "Insight into Data Mining: Theory and Practice", Eastern Economy Edition, Prentice Hall of India, 2006.
- Ian H. Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", Elsevier, Second Edition.