EE 380L/BME383J: DATA MINING
Spring 2009

Class times: TTh: 9:30-11am, ENS 126, Unique No. 16615(ECE)/14115(BME)
Instructor: Joydeep Ghosh. ghosh@ece.utexas.edu; www.ideal.ece.utexas.edu/~ghosh
Office: ACES 3.118, 471-8980
Office Hrs: TTh 1:30-2:30pm. Other times by appointment only.
TA info: TBD xx@ideal.ece.utexas.edu, office hrs: ACES 3.106

PREREQUISITES: (Graduate standing in Engineering, CS, Maths or Physics) OR (consent of the instructor). You are expected to know the basics (undergraduate level) of probability/statistics. Knowledge of basic linear algebra will help as well.

COURSE URL: http://www.ideal.ece.utexas.edu/courses/ee380l/

COURSE OUTLINE: The information explosion of the past few years has us drowning in data but often starved of knowledge. Many companies that gather huge amounts of electronic data have now begun applying data mining techniques to their data warehouses to discover and extract pieces of information useful for making smart business decisions. Effective data mining, as opposed to data dredging, requires an understanding of concepts from exploratory data analysis, pattern recognition, machine learning, heterogeneous databases, parallel processing and data visualization, in addition to knowledge of the problem domain.

I will first give a series of lectures. While studying techniques for database representation/modeling, clustering, classification, finding associations and sequence processing, emphasis will be placed on the issues of algorithm scalability, performance, interpretability and ability to deal with messy data. You will be using Matlab for some class exercises (separate tutorials can be arranged for students not familiar with Matlab). The last few classes will consist of student term-project presentations, followed by active discussion.

GRADING:
5+35+5 pts: pre-proposal presentation + Term paper (due May 7) + 20 min. presentation (groups of 2-3).
25 pts: Homework assignments*
25 pts: Written Exam; Thursday, March 26, in class
5 pts: Participation in discussions.
There will be no final exam.
A set of class notes and supplementary materials will be available via Blackboard.

*Late Assignment Policy: (i) you lose 10% per day late (incl. weekends/holidays). It is your responsibility to get the HW to the TA.
No piecemeal submissions (i.e. where you submit some problems on time and others later) - that becomes a logistical issue.
(ii) Once the solutions are posted, no credit can be given for the corresponding HW.

Textbook
C.M. Bishop (2006): Pattern Recognition and Machine Learning, Springer.
In addition, my notes will be available via Blackboard, and a reading list of papers will also be provided.

Some other recommended books:
1. P. Tan, M. Steinbach and V. Kumar, Introduction to Data Mining, Addison-Wesley, 2005.
Some sample chapters are available at the book's website, http://www-users.cs.umn.edu/~kumar/dmbook/index.php. CS oriented.
2. Hastie/Tibshirani/Friedman (2001) The Elements of Statistical Learning, Springer.
Solid; stats oriented.
3. I. H. Witten and E. Frank (2nd Ed, 2005), Data Mining. Morgan Kaufmann.
Machine learning viewpoint, closely tied to the WEKA software.
From a UT computer you can access an "e-book" version at http://www.netlibrary.com/AccessProduct.aspx?ProductId=130260&ReturnLabel=lnkSearchResults&ReturnPath/Search/SearchResults.aspx&PrimedSearch=witten+frank
4. J. Han and M. Kamber (2005) Data Mining: Concepts and Techniques, 2nd Ed. Morgan Kaufmann.
Database oriented.
5. Duda/Hart/Stork (2000). Pattern Classification (2nd Ed).
Solid again. Gives pattern recognition perspective.


Disabilities statement: "The University of Texas at Austin provides upon request appropriate academic accommodations for qualified students with disabilities. For more information, contact the Office of the Dean of Students at 471-6259, 471-4641 TTY."
The above was a mandated statement, quoted verbatim. It does not imply that this course is disabled. I wonder what TTY means.

WEBSITES:

Data Mining Web Sites

Data Mining and Knowledge Discovery Resources