CSC-272: Introduction to Data Mining, Analytics, and Knowledge Discovery

Spring 2025

Department of Computer Science
Furman University
Greenville, SC 29613

Instructor: Prof. Kevin Treu

Class Meeting Time: Monday, Wednesday, Friday, 11:30 a.m.-12:20 p.m. (Section 1), 8:30-9:20 a.m. (Section 2)
Laboratories: Tuesday, 2:30-4:30 p.m. (Section 1), 4:30-6:30 p.m. (Section 2)
Class Meeting Place: Riley 106
Lab Meeting Place: Riley 203

Data is all around us. It is being gathered and stored in databases and data warehouses at unprecedented and ever-growing rates, and it pertains to every imaginable field of human endeavor -- science, politics, arts, commerce, literature, entertainment, medicine, social media, sports, etc. It is imperative that we develop methods of transforming this overwhelming collection of data into practical, actionable information. Database query systems can do part of this job by extracting factual, or "shallow," knowledge. In this course, however, we will investigate the theory and practice behind extracting patterns or regularities from data that represent "hidden" knowledge. This topic is referred to as data mining, and sometimes as knowledge discovery or predictive analytics.

Our focus will be on several powerful data mining strategies and the various algorithms used to implement those strategies. These algorithms involve machine learning, whereby computers learn important concepts from the data and express them using a variety of accessible models. We will experiment with several different strategies, including classification, numerical estimation, prediction, clustering, and association learning. These concepts will be applied to a variety of domains, including business, the arts, the social sciences, and the natural sciences, always with an emphasis on understanding and predicting human behavior. We will primarily use an open-source tool called WEKA for these investigations, along with Google's n-gram viewer and some features of Excel. The techniques learned in the course will be put to use in a significant team-oriented term project.

Specific topics to be covered include:

In short, the primary objective of the course is to build a foundation for using large collections of data to better understand ourselves and the world we live in. This foundation, built upon algorithmic problem solving and special-purpose software, will also give you the critical ability to understand -- and contribute to -- future developments and innovations in data science.

I am very excited about this offering of CSC-272, and am open to your suggestions about directions the course might take. Please feel free to share them with me.
