Lectures: Tuesday/Thursday 12:30-1:50 PST on Zoom (see Canvas for link). The Zoom call will start at 12:15 and we will have music playing until lecture starts at 12:30. Prof. Leskovec will lecture for 60 minutes and then host a 20 minute Q&A session. Questions can be asked in the Zoom chat.

Public resources: The lecture slides and assignments will be posted online as the course progresses. We are happy for anyone to use these resources, but we cannot grade the work of any students who are not officially enrolled in the course.






What is this course about?

The course will discuss data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis will be on MapReduce and Spark as tools for creating parallel algorithms that can process very large amounts of data. Topics include: Frequent itemsets and Association rules, Near Neighbor Search in High Dimensional Data, Locality Sensitive Hashing (LSH), Dimensionality reduction, Recommendation Systems, Clustering, Link Analysis, Large-scale Supervised Machine Learning, Data streams, Mining the Web for Structured Data, Advertising.

Previous offerings

The previous version of the course is CS345A: Data Mining, which also included a course project. CS345A has now been split into two courses, CS246 and CS341.


You can access course notes and slides of previous versions of the course here:
CS246 websites: CS246: Winter 2020 / CS246: Winter 2019 / CS246: Winter 2018 / CS246: Winter 2017 / CS246: Winter 2016 / CS246: Winter 2015 / CS246: Winter 2014 / CS246: Winter 2013 / CS246: Winter 2012 / CS246: Winter 2011
CS345a website: CS345a: Winter 2010


Students are expected to have the following background:

Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended).
Good knowledge of Java and Python will be extremely helpful since most assignments will require the use of Spark.
Familiarity with basic probability theory (CS109 or Stat116 or equivalent is sufficient but not necessary).
Familiarity with writing rigorous proofs (at a minimum, at the level of CS 103).
Familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary).
Familiarity with algorithmic analysis (e.g., CS 161 would be much more than necessary).

The recitation sessions in the first weeks of the class will give an overview of the expected background.


Reference Text

The following text is useful, but not required. It can be downloaded for free, or purchased from Cambridge University Press. Leskovec-Rajaraman-Ullman: Mining of Massive Datasets


Lecture slides will be posted here shortly before each lecture. If you wish to view slides further in advance, refer to last year's slides, which are mostly similar.

This schedule is subject to change. All deadlines are at 11:59pm PST.

Date | Description | Course Materials | Events | Deadlines
Tue March 30 | Introduction; MapReduce and Spark | Suggested Readings: | |
Thu April 1 | Frequent Itemsets Mining | Suggested Readings: | Colab 0, Colab 1, Homework 1 out |
 | Recitation: Spark tutorial | | |
Tue April 6 | Locality-Sensitive Hashing I | Suggested Readings: | |
Thu April 8 | Locality-Sensitive Hashing II | Suggested Readings: | Colab 2 out | Colab 0, Colab 1 due
 | Recitation: Probability and Proof Techniques | | |
 | Recitation: Linear Algebra | | |
Tue April 13 | Clustering | Suggested Readings: | |
Thu April 15 | Dimensionality Reduction | Suggested Readings: | Colab 3, Homework 2 out | Colab 2, Homework 1 due
Tue April 20 | Recommender Systems I | Suggested Readings: | |
Thu April 22 | Recommender Systems II | Suggested Readings: | Colab 4 out | Colab 3 due
Tue April 27 | PageRank | Suggested Readings: | |
Thu April 29 | Link Spam and Introduction to Social Networks | Suggested Readings: | Colab 5, Homework 3 out | Colab 4, Homework 2 due
Tue May 4 | Community Detection in Graphs | Suggested Readings: | |
Thu May 6 | Graph Representation Learning | Suggested Readings: | Colab 6 out | Colab 5 due
Tue May 11 | Large-Scale Machine Learning | Suggested Readings: | |
Thu May 13 | Deep Learning | Suggested Readings: | Colab 7, Homework 4 out | Colab 6, Homework 3 due
Tue May 18 | Mining Data Streams I | Suggested Readings: | |
Thu May 20 | Mining Data Streams II | Suggested Readings: | Colab 8 out | Colab 7 due
Tue May 25 | Computational Advertising | Suggested Readings: | |
Thu May 27 | Learning with Experimentation | Suggested Readings: | Colab 9 out | Colab 8, Homework 4 due
Tue Jun 1 | Optimizing Submodular Functions | Suggested Readings: | |
Thu Jun 3 | Graph Neural Networks | | | Colab 9 due