CIS/STA 3920: Sample course syllabus

Title: Data Mining for Business Analytics
Description

Data mining is the process of extracting useful information from large amounts of data. This course provides students with the tools and techniques needed to perform data mining and business analytics. It is an introductory module for individuals who plan to work with data (modeling, data management) and for those who will work with data scientists. While the course focuses primarily on modeling and evaluation, it also covers data preparation and examination. Modeling techniques covered include dimension reduction, regression methods, decision trees, clustering, and other ad hoc methods. Emphasis is placed on the entire context surrounding data mining, beginning with an understanding of the business problem and how it translates into data processing, modeling, evaluation, and deployment. Students will be expected to implement these techniques on big-data case studies throughout the semester.

 

Prerequisites: CIS 2200 and STA 2000

Learning Goals

At the completion of this course, students will be able to:

  • Identify and apply the statistical and computational techniques underlying data mining and business analytics to support business decision making
  • Identify and use appropriate tools in developing data mining solutions
  • Interpret results in terms of original business problems that led to the collection of the data

 

Grades

Quiz #1 – 15-20%

Quiz #2 – 20-25%

Class Exercises & Homework Questions: 15%

Project – 40-50%

 

Textbooks

“Moneyball: The Art of Winning an Unfair Game,” by Michael Lewis, W. W. Norton & Company, available @ amazon.com.

“Competing on Analytics: The New Science of Winning,” by Thomas H. Davenport and Jeanne G. Harris, available @ amazon.com.

“Super Crunchers: Why Thinking-by-Numbers Is the New Way to Be Smart,” by Ian Ayres, Bantam Books, available @ amazon.com.

 

Topics

Introduction to Course

Using SAS Enterprise Miner and/or JMP Software

Review of Important Statistical Concepts – Confidence Intervals, ANOVA, Linear Regression

Supervised & Unsupervised Learning Techniques

Data Visualization and Reduction

Classification and Predictive Performance Measures

Multiple Regression

Multiple Regression

K-Nearest Neighbor Techniques

Naïve Bayes & C.A.R.T.

Classification and Regression Trees (C.A.R.T.)

Logistic Regression & HBR Case

Discriminant Analysis

Association Rules (Affinity Analysis)