This course covers how to use databases in applications, first principles for scaling to large data sets, and how to design good data systems.
A few key topics:
- Introduction to the relational data model, relational database engines, and SQL.
- How to scale systems for large data sets on servers and server clusters.
- How to design good schemas, based on dependencies and normal forms, so that we can build and evolve good applications. This will include indexes, views, and transactions.
The class will culminate in a hands-on programming project in SQL and Python, a key part of the course, in which you will query, visualize, and predict from terabytes of data on BigQuery, a popular cloud database that is part of Google Cloud Platform.
|Lecture 1||9/15 Tu||Concepts: Data models, DB systems overview||[Introduction: Why databases?], [Getting Your Google Cloud Platform Credits], [AWS: Data Lakes and Analytics], [AWS: What is a Data Lake?]|
|Lecture 2||9/17 Th||Concepts: Schemas, Systems, Select-From-Where||[SQL - Part I]|
|9/21 Mon||See Course Info for general submission information and the regrade policy.||[Getting started with BigQuery]|
|Lecture 3||9/22 Tu||Concepts: Joins, Set operators, Subqueries||[SQL Deep Dive]|
|Lecture 4||9/24 Th||SQL III, Advanced. Concepts: Grouping, Aggregations, Nested queries||[SQL Deep Dive (Same Slides as Previous Lecture)]|
|Section 1||9/25 Fr||9:30 AM - 10:20 AM||[Section 1 slides]|
|Lecture 5||9/29 Tu||Scale: Indexing and IO Model|
|Lecture 6||10/1 Th||Sorting, Building Indices Part 1||[Sorting, Building Indices Slides]|
|Project 1 Due||10/2 Fri|
|Lecture 7||10/6 Tu||Query Optimization Part 1||[B+ Trees], [Query Optimization Slides]|
|10/7 Wed||See Course Info for general submission information and the regrade policy.||[Project 2 colab]|
|Lecture 8||10/8 Th||Query Optimization Part 2||[Query Optimization Slides]|
|Section 2||10/9 Fr||9:30 AM - 10:20 AM||[Section 2 slides]|
|Lecture 9||10/13 Tu||Dr. Girish Baliga -- On Presto and Vertica at Uber; Dr. Theo Vassilakis (ex-CTO at Grab) -- early lead on Dremel/BigQuery, CEO of Metanautix|
|Lecture 10||10/15 Th||Systems Design: Putting it all together||[Systems Design Slides]|
|Lecture 11||10/20 Tu||Exam Review||[Midterm review]|
|Exam #1||10/22 Th||TBA|
|Project 2 Due||10/26 Mon|
|Lecture 12||10/27 Tu||Transactions||[Transactions Slides]|
|Lecture 13||10/29 Th||Transactions||[Transactions Slides]|
|Section 3||10/30 Fr||9:30 AM - 10:20 AM|
|Lecture 14||11/3 Tu||Transactions||[Transactions Slides]|
|Lecture 15||11/5 Th||Data Security (Guest Lecture)|
|Lecture 16||11/10 Tu||E/R Model and Design Theory|
|Exam #2||11/12 Th||TBA|
|Section 4||11/13 Fr||9:30 AM - 10:20 AM|
|Lecture 17||11/17 Tu||Design Theory Continued|
|Project 3 Due||11/18 Wed|
|Lecture 18||11/19 Th||Guest Lecture|
Prerequisites: CS 103 and CS 107 (or equivalent).
Grading: Projects 50% (10% + 15% + 25%), Exam #1 25%, Exam #2 25%.
We will offer extra credit for in-class participation and for high-quality answers to fellow students' questions on Piazza.
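As a rough illustration of how the grading weights combine, here is a minimal Python sketch. It assumes the 10/15/25 split applies to Projects 1, 2, and 3 in that order (the grading line does not state the order), that each score is expressed as a fraction between 0 and 1, and it leaves out extra credit, whose amount is not specified. The function name `course_percentage` is hypothetical.

```python
# Weights in percent of the final grade, taken from the grading breakdown.
# ASSUMPTION: 10/15/25 map to Projects 1, 2, 3 in that order.
PROJECT_WEIGHTS = (10, 15, 25)
EXAM_WEIGHTS = (25, 25)  # Exam #1, Exam #2


def course_percentage(project_scores, exam_scores):
    """Combine per-item scores (fractions in [0, 1]) into a 0-100 total.

    Extra credit is not modeled here.
    """
    if len(project_scores) != len(PROJECT_WEIGHTS):
        raise ValueError("expected one score per project")
    if len(exam_scores) != len(EXAM_WEIGHTS):
        raise ValueError("expected one score per exam")
    for score in (*project_scores, *exam_scores):
        if not 0 <= score <= 1:
            raise ValueError("scores must be fractions between 0 and 1")

    total = sum(w * s for w, s in zip(PROJECT_WEIGHTS, project_scores))
    total += sum(w * s for w, s in zip(EXAM_WEIGHTS, exam_scores))
    return total


if __name__ == "__main__":
    # Example: 90%, 80%, 100% on the projects; 70% and 60% on the exams.
    print(course_percentage([0.9, 0.8, 1.0], [0.7, 0.6]))
```

Because the third project carries the largest single project weight, a strong final project moves the total more than either of the first two.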
Piazza: Join our Piazza to receive important announcements and get answers to your questions.
Homeworks: Four homework assignments will be released biweekly, with solutions. The homework assignments are completely optional and ungraded. However, we strongly encourage you to work through them: they reflect the exam material and will assess and reinforce your understanding of it.
Sections: There will be four optional discussion sections, one accompanying each homework assignment. The sections will be recorded and uploaded to Canvas, and the slides will be posted online.
Exam Dates: Exams will be take-home and open notes. The teaching staff will communicate details of the exam logistics on the course Piazza.
Late Days: You are allowed a total of two late days, shared across all project deadlines. You do not lose any credit when using a late day. If you run out of late days and submit after the deadline, you receive a 0. (Late days can only be applied to projects.)
Lectures: Lectures are held Tuesdays and Thursdays, 4:30-5:50 p.m., via Zoom. The Zoom link is on Canvas. Note that while attendance is not mandatory, we will give extra credit to students for insightful in-class participation.
Lecture Videos: Lecture videos will be recorded and posted on Canvas.
Textbook: There is no required textbook, but for students who want additional resources, we recommend the following two:
Accommodations: If you need an academic accommodation based on a disability, you should initiate the request with the Office of Accessible Education (OAE). The OAE will evaluate the request, recommend accommodations, and prepare a letter for faculty. Students should contact the OAE as soon as possible, and in any case in advance of assignment deadlines, since timely notice is needed to coordinate accommodations. If you need OAE accommodations for exams, please notify us at least one week (7 days) before the exams.
We encourage students to form study groups. Students may discuss and work on homework problems in groups. However, each student must write down the solution independently, and without referring to written notes from the joint session.
It is an honor code violation to copy, refer to, or look at written or code solutions from a previous year, including but not limited to: official solutions from a previous year, solutions posted online, and solutions you or someone else may have written up in a previous year. Furthermore, it is an honor code violation to post your assignment solutions online, such as on a public git repo.
The teaching staff will be using plagiarism detection software, and if we have reason to believe that you are in violation of the honor code, we will follow the university policy to report it.
Group Size: The first two projects are individual only; for the third project, you may work in teams of two.
Project Submissions: You will submit your projects via Gradescope. Sign up for Gradescope using your Stanford email address and student ID. The course code is 93E426. Each assignment will include specific instructions regarding what files to submit.
Regrade Policy: If you think that we've made a grading mistake or that the work you submitted should be regraded, submit a regrade request on Gradescope within one week of receiving your grade. Include a short and convincing argument for why you think your work was incorrectly graded; we reserve the right to ignore your regrade request if you don't provide a justification. If you submit a regrade request, we reserve the right to regrade your entire assignment. This means that your overall score could go down.
Emma Spellman (head)
All office hours (OH) will be held online. All students should sign up at https://queuestatus.com/queues/1148.
Please read Piazza post @10 carefully for how OH works this quarter.