Edison Thomaz




EE380L Data Mining (Spring 2020)


Instructor: Edison Thomaz (ethomaz@utexas.edu), Office Hours: Wed 4-5pm

Time and Place: MW, 9-10:30am, EER 1.516

TAs:

The TAs will be introduced at the beginning of the semester. Their primary role is to help you learn the course subjects, help with your programming problems, and assist the instructor with grading and course support. Your TAs are your primary point of contact this semester. Please get to know them. Office hours begin the week of Jan 27th.

Piazza: This term we will be using Piazza for class discussion. The system is designed to get you help quickly and efficiently from classmates, the TAs, and me. Rather than emailing questions to the teaching staff, I encourage you to post your questions on Piazza.

Course Description

Machine learning has quickly become an integral component of many products and services that we rely on every day and expect to use in the future, from home assistants and product recommendations to computer gaming, health tracking, and autonomous vehicles. In this course we will study a variety of techniques for data mining and machine learning, which we define as the study of algorithms that learn from large quantities of data, identify patterns, and make predictions on new instances. We will go over the conceptual fundamentals of some key algorithms, starting from basic principles, and develop a good practical understanding of how they work. We will also cover approaches that are key in data mining, such as data exploration and dimensionality reduction.

Course Objectives

Tentative Schedule

| Week | Topic | Readings | Assignments | Project (Due) |
|---|---|---|---|---|
| Jan 22nd | Course Overview | | | |
| Jan 27th | Data Exploration and Visualization | Research directions in data wrangling: Visualizations and transformations for usable and credible data; Information Visualization and Visual Data Mining; One Dataset, Visualized 25 Ways | | |
| Jan 29th | Data Processing and Labels | Data Preprocessing for Supervised Learning; sections 1.6 and 3.1-3.5 in Data Preprocessing in Data Mining | | |
| Feb 3rd | Features and Feature Selection | Selection of Relevant Features in Machine Learning | | |
| Feb 5th | Machine Learning Fundamentals | Chap. 1 in Dive into Deep Learning | #1 Out | |
| Feb 10th | k-NN | Section 4.6.5 in An Introduction to Statistical Learning by Tibshirani; section 1.3.2 in Introduction to Machine Learning by Smola | | |
| Feb 17th | Linear Regression | Sections 3.1, 3.2, and 3.3 in An Introduction to Statistical Learning by Tibshirani | #1 Due; #2 Out | |
| Feb 24th | Overfitting, Regularization, Bias/Variance; Learning Curves, Cross-Validation | Sections 2.1.2, 2.2.1, 2.2.2, 5.1, 6.2, and 3.3 in An Introduction to Statistical Learning by Tibshirani; section 2.4.2 in Introduction to Machine Learning by Smola | | |
| Mar 2nd | Decision Trees | Section 8.1 in An Introduction to Statistical Learning by Tibshirani | #2 Due | |
| Mar 9th | Midterm Review and Exam | | | |
| Mar 16th | Spring Break | | | |
| Mar 23rd | Spring Break | | | |
| Mar 30th | Neural Networks: Representation; Learning, Backpropagation | Chap. 4, sections 4.1 and 4.7 in Dive into Deep Learning; Chap. 2 in Neural Networks and Deep Learning | #3 Out (Apr 3rd) | Proposal (Apr 3rd) |
| Apr 6th | Convolutional Neural Networks; Recurrent Neural Networks | Chap. 4, section 4.2; Chap. 6, sections 6.2, 6.3, and 6.4; Chap. 8, sections 8.1 and 8.2 in Dive into Deep Learning | | |
| Apr 13th | Dimensionality Reduction; Unsupervised Learning, k-Means | Chap. 10, sections 10.1, 10.2, and 10.3.1 in An Introduction to Statistical Learning by Tibshirani | | Update (Apr 17th) |
| Apr 20th | DBSCAN, Hierarchical Clustering; Semi-Supervised Learning | Chap. 10, section 10.3.2 in An Introduction to Statistical Learning by Tibshirani; Chap. 7, section 7.2 in Mining of Massive Datasets | #3 Due (Apr 20th) | |
| Apr 27th | Team Project Presentations | | | |
| May 4th | Team Project Presentations | | | Final Report (May 4th) |

Prerequisites: Working knowledge of Python is required (or you must be willing to pick it up rapidly as we go). Familiarity with linear algebra, basic calculus, and probability will be useful.

Textbook: There is no required textbook; I will assign readings from online books and papers. Many good reference books and online resources exist, so ask me if you are interested. Here are some:

Personal Devices: You should expect to use your own computer for all the work required in this course. It should be able to run Python, scikit-learn, and similar packages. We recommend installing the Anaconda distribution. In class, use your personal devices (e.g., laptops, phones) only for in-class work. Using your device for other activities during class is unprofessional and can distract others.
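Before the first assignment, it may help to confirm that your setup can import the packages mentioned above. The sketch below is one way to do that; the package list (NumPy, pandas, scikit-learn) is an assumption about what "similar packages" covers, and `check_environment` is a hypothetical helper, not something provided by the course.

```python
import importlib
import sys


def check_environment(packages=("numpy", "pandas", "sklearn")):
    """Return {package name: version string}, with None for packages that fail to import.

    The default package list is an assumption; note that scikit-learn is
    imported under the name "sklearn".
    """
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing (or broken) in the active environment.
            versions[name] = None
            continue
        # Most scientific packages expose __version__; fall back if one does not.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, version in check_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Running the file from the Anaconda environment you plan to use for the course should list a version for each package; a line reading `NOT INSTALLED` means that package still needs to be added to that environment.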

Class Participation: A significant portion of your course grade will be based on your presence in class and your participation in class discussions. I will take attendance occasionally, either directly or through short quizzes or in-class activities. If you miss a class (other than for illness or an emergency), it is not reasonable to expect me to repeat the material covered in that class just for you. This applies both to the content of the class and to announcements about policies, events, deadlines, etc.

Assignments: This course will include 3 assignments that will deepen your practical experience with key concepts you will learn throughout the semester. You will have around 2 weeks to complete each assignment. Late deliverables will be accepted for one week after their due date, but with a 20-point penalty. After that week, late assignments will receive a 0. In the interest of fairness, there will be no exceptions to this policy.

Midterm: We will have only one exam in this course, the midterm exam. It will cover material from lectures, assignments, and any assigned readings. The exam scores may be curved if the instructor believes it is warranted. The exam will be given in class. Bring your student ID to the exam; it may be requested as proof of identity. If work or a personal situation unexpectedly forces you to miss the exam, you should expect to receive a zero. In any other situation, you should contact the instructor beforehand.

Project

You will work on a team project in the second half of the semester. This project will provide you with an end-to-end data mining experience, going over every step of the data mining process. Each team should have 4-5 students. Students will be able to select a project leveraging one or more datasets provided by the instructor, or choose a completely different dataset, which will require instructor or TA approval. The deliverables for the project are:

- Project Proposal: (A maximum 3-page document)

- Project Progress Update: (A maximum 2-page document)

- Project Final Report: (A maximum 4-page document following the ACM double-column format)

- Project Final Presentation

The last 2 weeks of the semester will be dedicated to project presentations. Each team will have 10 minutes to present its project to the class. You will turn in your presentation slides on Canvas.

Grading

Here is a breakdown of how the final grade for each student will be computed:

Grade Distribution: A (90-100), B (80-89), C (70-79), D (60-69), Fail (<60). A plus or minus modifier may be given in special circumstances at the discretion of the instructor.
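As a worked illustration of the grade bands above, the following Python sketch maps a numeric course score to a letter. It assumes scores on a 0-100 scale, treats each band's lower bound as the cutoff (so a score such as 89.5, which the stated ranges leave unassigned, falls in the B band), and leaves out the discretionary plus/minus modifiers; `letter_grade` is a hypothetical helper, not part of any course tooling.

```python
def letter_grade(score: float) -> str:
    """Map a 0-100 course score to a base letter grade.

    Assumption: each band's lower bound is inclusive and acts as the cutoff,
    so scores between two stated ranges (e.g., 89.5) go to the lower band.
    Plus/minus modifiers are instructor discretion and are not modeled here.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Bands from the syllabus, checked from highest cutoff to lowest.
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return letter
    return "Fail"
```

For example, `letter_grade(80)` returns `"B"` and `letter_grade(59.5)` returns `"Fail"`.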

Grade Disputes and Corrections:

If you are dissatisfied with a grade you receive, you must submit your complaint briefly in writing or by email, along with supporting evidence or arguments, to your TA within one week of the date the exam or assignment was returned to students. For programming assignments, the dispute period starts when your score is posted to the class Canvas gradebook. Complaints received after the one-week deadline will be considered only if there are extraordinary circumstances for missing the deadline (e.g., student hospitalization). No new disputes will be accepted after 11:59AM three days before the course grade sheets must be turned in.

The grade you are given, either on an individual exam or assignment or as your final grade, is not the starting point of a negotiation. It is your grade unless a concrete grading error has been made. Do not come to see me or the TA to ask for a better grade because you want one or you "feel you deserve it". Come only if you can document a specific error in grading or in recording your scores. Grading errors do happen, especially in large classes. But keep in mind that errors can be made either in your favor or against you. So if you ask to have a piece of work regraded, it is possible that your grade will go down rather than up.

Remember that the most important characteristic of any grading scheme is that it be fair to everyone in the class. Keep this in mind if you're thinking of asking, for example, for more partial credit points on a problem. The important thing is not the exact number of points that were taken off for each kind of mistake. The important thing is that the number was the same for everyone. So it may not be changed once the grading is done and the exams or assignments have been returned. If you have questions or concerns about any of your grades, contact your TA first, and if not satisfied with that interaction then contact me during office hours or via email.

Class Policies

Food and Drink: No food or drinks (aside from water) are allowed in class.

Absence or Lateness Due to Illness: If you miss an exam or turn in an assignment late (or do not turn it in at all) because of illness, you are expected to provide a certified document with a diagnosis confirming that you were sick. A slip merely showing that you visited the UT Health Center or your personal doctor is not sufficient. I understand that a headache or stomachache, for example, can keep you from performing at your best (or at all), and that such conditions might not be diagnosed by a doctor as needing special care. Unfortunately, students have abused the goodwill of instructors in the past, so there are no exceptions.

Absence or Lateness Due to Career Fair or Interview: No exceptions will be made if you miss an exam or turn in an assignment late (or do not turn it in at all) because you opted to attend a career fair or had a job interview. I expect you to plan ahead for these events; this is why we schedule all of the course’s activities and deadlines in advance and detail them in this syllabus.

Absence or Lateness Due to Religious Holiday: By UT Austin policy, you must notify the instructor of your pending absence at least fourteen days prior to the date of observance of a religious holy day. If you must miss a class, an examination, a work assignment, or a project in order to observe a religious holy day, you will be given an opportunity to complete the missed work within a reasonable time after the absence. For religious holy days that fall within the first two weeks of the semester, notice should be given on the first day of the semester. The notice must be personally delivered to the instructor and signed and dated by the instructor, or sent by certified mail, return receipt requested. Email notification will be accepted if received, but a student submitting such notification must receive email confirmation from the instructor. A student who fails to complete missed work within the time allowed will be subject to the normal academic penalties.

Personal Pronoun Preference: Professional courtesy and sensitivity are especially important with respect to individuals and topics dealing with differences of race, culture, religion, politics, sexual orientation, gender, gender variance, and nationalities. Class rosters are provided to the instructor with the student’s legal name. I will gladly honor your request to address you by an alternate name or gender pronoun. Please advise me of this preference early in the semester so that I may make appropriate changes to my records.

Standard UT Austin Course Information and Policies

Academic Honor Code: You are encouraged to discuss assignments with classmates, but anything submitted must reflect your own, original work. If in doubt, ask the instructor. Plagiarism and similar conduct represent a serious violation of UT's Honor Code and standards of conduct.

Students who violate University rules on academic dishonesty are subject to severe disciplinary penalties, such as automatically failing the course and potentially being dismissed from the University. **PLEASE** do not take the risk. We are REQUIRED to automatically report any suspected case to central administration for investigation and disciplinary hearings. Honor code violations ultimately harm you, other students, and the integrity of the University.

Academic honesty is strictly enforced. For more information, see the Student Judicial Services site.

Notice about students with disabilities: The University of Texas at Austin provides appropriate accommodations for qualified students with disabilities. To determine if you qualify, please contact the Dean of Students at 512-471-6529 or UT Services for Students with Disabilities. If they certify your needs, we will work with you to make appropriate arrangements.

Emergency Preparedness: Any students requiring assistance in evacuation must inform the instructor in writing of their needs during the first week of classes.

Coping with stress and personal hardships: The Counseling and Mental Health Center offers a variety of services for students, including individual counseling as well as groups and classes, to provide support and assistance for anyone coping with difficult issues in their personal lives. As mentioned above, life brings unexpected surprises to all of us. If you are having difficulty coping with the challenges you face, consider the various services offered and do not hesitate to take advantage of them if they might help. These services exist to be used.

Electronic Mail Notification Policy: In this course, e-mail, Canvas, and Piazza will be used to communicate with students. You are responsible for checking your e-mail regularly for class work and announcements. If you are an employee of the University, your e-mail address in Canvas is your employee address.

The University has an official e-mail student notification policy. It is the student's responsibility to keep the University informed as to changes in his or her e-mail address. Students are expected to check e-mail on a frequent and regular basis in order to stay current with University-related communications, recognizing that certain communications may be time-critical.


Edison Thomaz © 2020