CSIS 494: Data Mining
Very large collections of data, comprising millions or even hundreds of millions of individual records, are now being compiled into centralized data warehouses and organized by subject. This allows analysts to apply powerful statistical and machine learning methods and examine the data more comprehensively. Data mining is the art and science of using algorithms more powerful than traditional query tools such as SQL to extract more useful information. Knowledge Discovery in Databases (KDD) is the term given to the complete process of data preparation, information extraction, and analysis. This course introduces the key KDD terminology used in industry. It covers each step of the KDD process, with emphasis on the data mining extraction methods of regression, classification, dependency modeling, and clustering. The preprocessing step will focus on the use of a data warehouse and various data cleaning techniques. The analysis phase will focus on statistical methods as well as visualization tools. The CD-ROM that accompanies the text provides a look at various industry tools. Students will also implement parts of data mining algorithms in C++. These techniques will be examined from both a theoretical and a practical perspective. Students will receive a number of handouts throughout the course to supplement the text. Extensive reading is required in this course.