CIS 4093: Special Topics – Data Mining – Sample Course Syllabus

CIS4093: Data Mining & Business Intelligence

Course Information

Required Text: Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner, 2nd Edition, by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce

Required Software: SAS Guide & Enterprise Miner.

Cases: Select cases from Harvard Business Review

Prerequisite: CIS2200 and STA2000

Organizations either build their strategies around their analytical capabilities or try to get better at analytics. For example, every day Wal-Mart uploads about 20 million point-of-sale transactions to a massively parallel system with 483 processors running a centralized database. Companies can learn much from the data they amass: they can improve their knowledge of customers and markets, and use it to inform many decisions.

Data Mining is the process by which useful information is extracted from large amounts of data. It involves exploring and analyzing these data sets in order to discover meaningful patterns and rules. The computer-based techniques used in conjunction with Data Mining define a given company’s Business Intelligence. There is an urgent need for people with the skills to support the use of these techniques and to make good decisions based on them.

Some common business questions that one might address through the process are:

(1) [Marketing domain] – From a large list of prospective customers, which are most likely to respond to ad campaigns?
(2) [MIS domain] – How can an engineer at Google classify good and bad search results?
(3) [Finance domain] – Which customers are most likely to default on payments?

This course covers a variety of techniques, from simple to complex, that are used in Data Mining. There is a heavy emphasis throughout the course on analytical methods. The methods are taught through an example-based approach, with emphasis on realistic problems drawn from all areas of business. Two related software packages will be used in this course: (1) SAS JMP and (2) SAS Enterprise Miner, both from the SAS Institute.

Tentative List of Topics Covered

We will go over the topics described below in an applied way. That is, by using the various techniques we cover, we will solve many different kinds of problems. We will also look at select case studies in which some of these techniques have been or could be used to address a problem.

  1. Introduction to the course, with a focus on what Data Mining and BI are about
  2. The Data Mining Process
  3. Core Ideas
  4. Supervised & Unsupervised Learning
  5. Steps in Data Mining
  6. Model Building
  7. Using Excel & JMP Pro
  8. Data Visualization
  9. Dimension Reduction
  10. Classification Techniques
  11. Predictive Performance
  12. Multiple Regression
  13. K-Nearest Neighbor
  14. Naïve Bayes
  15. Classification and Regression Trees
  16. Logistic Regression
  17. Neural Nets
  18. Discriminant Analysis
  19. Association Rules
  20. Cluster Analysis
  21. Text Analytics

The topics above will be covered based on the frameworks of problems and techniques shown on the next page.

Tentative Lecture Outline

This outline is tentative. We will go over the topics described below in an applied way; that is, in solving problems, our focus will be on the topic of the day. Keep in mind that the order in which these topics are covered may change in the event of unforeseen class disruptions, and we will modify this schedule as the term progresses.

Date  No.  Topic                                                    Chapter
----  ---  -------------------------------------------------------  ------------------
TBD   1    Introduction to course                                   One
TBD   2    Getting data, the Data Mining Process                    Two
TBD   3    Data Visualization & Homework 1 due                      Three
TBD   4    Dimension Reduction                                      Four
TBD   5    Predictive Performance & Multiple Regression             Five (pt1) & Six
TBD   6    In-class Quiz #1 & Homework 2 due
           Harvard Business Case: Harrah’s Entertainment
TBD   7    Classification Performance & K-Nearest Neighbor          Five (pt2) & Seven
TBD   8    K-Nearest Neighbor & Classification and Regression Tree  Seven & Ten
TBD   9    In-class Quiz #2 & Homework 3 due
TBD   10   Logistic Regression                                      Eleven
TBD   11   Neural Nets                                              Twelve
TBD   12   Association Rules & Homework 4 due                       Thirteen
TBD   13   Cluster Analysis                                         Fourteen
TBD   14   Final Project Presentation

TBD = to be determined