Welcome to the course
SF2526 - Numerics for data science
Numerical algorithms for data-intensive science
The analysis of large data sets is of growing importance in many fields of science and technology. This course gives an introduction to efficient numerical algorithms arising in the analysis of large amounts of data. We use mathematical and numerical tools to study both the problems and the algorithms.
Want to get started? Try Quiz 0: Background and a bit, and check out hw1.pdf under Files.
The Zoom link for lectures is in the announcement Lecture material information.
You need to register for the exam. See the registration dates here: https://www.kth.se/en/student/schema/tenta-och-omtentaperioder-samt-anmalningstider-1.278515
Course contents and literature:
Before the course starts, we recommend that you review the linear algebra summarized in background.pdf and do Quiz 0: Background and a bit.
The course consists of three blocks (the PDF files are preliminary and will be updated); a small code illustration of the Block 1 theme follows the list.
- Block 1: Low-rank data algorithms
  - block1.pdf, which gives page references to the course APPM 5270 at Univ. Texas, in particular this PDF file; see also GVL_SVD.pdf.
- Block 2: Numerical algorithms for clustering
  - block2.pdf, parts of which are based on von Luxburg's tutorial.
- Block 3: Structured matrices in data analysis
  - block3.pdf, together with GvL_chapt4.pdf, bjorck.pdf, and PGM_chapter5_fast_direct_solvers.pdf.
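As a small taste of Block 1, the sketch below checks the Eckart-Young theorem numerically: the truncated SVD is the best rank-k approximation of a matrix, and its 2-norm error equals the (k+1)-st singular value. The random test matrix and the choice k = 10 are illustrative assumptions, not taken from the course material.

```python
import numpy as np

A = np.random.randn(100, 80)                   # illustrative test matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 10
Ak = (U[:, :k] * s[:k]) @ Vt[:k, :]            # truncated SVD: best rank-k approximation
# Eckart-Young: the 2-norm error of the best rank-k approximation
# equals the (k+1)-st singular value s[k].
print(np.linalg.norm(A - Ak, 2), s[k])         # the two printed numbers agree
```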
What do I need to do?
The examination of the course consists of correctly completing:
- Homework 1 (the data files are available here)
- Homework 2
- Homework 3
- Exam
The course has mandatory homework assignments, and each homework can give you bonus points for the exam. If you also complete the course training tasks, you can earn additional bonus points. See the full description on the page Homework / bonus points rules.
Course training area
Throughout the course you have the opportunity to practice the material and get teacher feedback. This is done on the course wiki. The pedagogical idea is described here.
- Instructions for the wiki (see also the description in the homework PDF files)
- Block 1 training area
- Block 2 training area
- Block 3 training area
The work in the course training area is mainly intended for your own learning, but it can also earn you additional bonus points, as described on the page Homework / bonus points rules.
Weekly course plan
Lecture slides are available here (only accessible for registered students).
The Zoom link for lectures can be found in this announcement: Lecture material information (only accessible for registered students).
Week 1
- Lecture 1 (B1): Course introduction. Low-rank approximation applications (image processing, machine learning). Matrix factorizations. QR factorization.
- Lecture 1.5 (video, B1): QR factorization and linear least squares.
- Lecture 2 (B1): Methods for QR factorization via Gram-Schmidt (greedy Gram-Schmidt by column elimination). Low-rank approximation via partial QR. CPQR. Best low-rank property (SVD).
- Lecture 2.5 (video, B1): Best low-rank property of the SVD: the Eckart-Young theorem.

Week 2
- Lecture 3 (B1): SVD from partial QR. Randomized SVD.
- Lecture 3.5 (video, B1): Single-pass improvement of randomized SVD.
- Lecture 4 (B1): Interpretability and the interpolatory decomposition (ID): row ID, column ID, two-sided ID.
- Lecture 4.5 (video, B1): CUR decomposition.
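To accompany the Week 2 material, here is a minimal sketch of the randomized SVD idea (lectures 3-3.5): sample the range of A with a Gaussian test matrix, orthonormalize, and take a small dense SVD of the projected matrix. The function name, the oversampling parameter p, and the synthetic test matrix are illustrative assumptions, not the course's reference implementation.

```python
import numpy as np

def randomized_svd(A, k, p=10):
    """Rank-k SVD approximation via a randomized range finder."""
    m, n = A.shape
    Omega = np.random.randn(n, k + p)      # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)         # orthonormal basis for the sampled range
    B = Q.T @ A                            # small (k+p) x n projected matrix
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Uhat)[:, :k], s[:k], Vt[:k, :]

# Check on a synthetic matrix with exact rank 40.
A = np.random.randn(500, 40) @ np.random.randn(40, 300)
U, s, Vt = randomized_svd(A, 40)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # close to machine precision
```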
Week 3
- Lecture 5 (B2): Introduction to clustering and k-means. Image processing applications. Plastic collection clustering application. Similarity graphs.

Week 4
- Lecture 6 (B2): Introduction to graphs and data on graphs. Connectedness. Subgraph connectedness. Unnormalized Laplacian. Eigenvectors of the Laplacian and indicator vectors.
- Lecture 6.5 (video, B2): Cut and MinCut.
- Lecture 7 (B2): Unnormalized Laplacian spectral clustering. K-means in spectral clustering. RatioCut interpretation of clustering and the Rayleigh-Ritz theorem.

Week 5
- Lecture 8 (B2): RatioCut (k > 2). Perturbation viewpoint. Normalized Laplacians. NCut.
- Lecture 9 (B3): Introduction to applications in signal processing. FFT and Cooley-Tukey.
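To accompany the Block 2 material (lectures 6-8), here is a minimal sketch of unnormalized spectral clustering in the spirit of von Luxburg's tutorial: form the Laplacian L = D - W, take the eigenvectors of the k smallest eigenvalues, and run k-means on the rows of that embedding. The planted block-structured weight matrix W and the SciPy k-means call are illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k):
    """Cluster the nodes of a graph with symmetric weight matrix W into k groups."""
    D = np.diag(W.sum(axis=1))                     # degree matrix
    L = D - W                                      # unnormalized graph Laplacian
    # Eigenvectors of the k smallest eigenvalues approximate cluster indicators.
    _, V = eigh(L, subset_by_index=[0, k - 1])
    _, labels = kmeans2(V, k, minit='++', seed=0)  # k-means on the rows of the embedding
    return labels

# Two planted clusters: strong weights within blocks, weak weights between them.
W = np.block([[np.ones((5, 5)), 0.01 * np.ones((5, 5))],
              [0.01 * np.ones((5, 5)), np.ones((5, 5))]])
print(spectral_clustering(W, 2))                   # e.g. [0 0 0 0 0 1 1 1 1 1]
```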
Week 6
- Lecture 10 (B3): Toeplitz-like matrices. Action and inverse action for circulant matrices via diagonalization and the FFT.
- Lecture 10.5 (video, B3): Toeplitz matrix action via extension to circulant matrices.
- Lecture 11 (B3): Durbin and Levinson-Durbin algorithms, continued.

Week 7
- Lecture 12 (B3): Trench algorithm and introduction to semiseparable matrices. Recorded version: on this page.
- Lecture 13 (B3): Semiseparable and HODLR matrix algorithms / reserve lecture.
- Lecture 14: Summary video (only for enrolled students).
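To accompany the Block 3 material (Week 6), here is a minimal sketch of the fact that circulant matrices are diagonalized by the DFT, so both their action and inverse action cost O(n log n) via the FFT; Toeplitz products can be handled the same way after extension to a circulant matrix. The first column c below is an illustrative example.

```python
import numpy as np

def circulant_matvec(c, x):
    """Compute C @ x where C is circulant with first column c, in O(n log n)."""
    return np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)).real

def circulant_solve(c, b):
    """Solve C y = b by dividing elementwise in the Fourier domain."""
    return np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real

c = np.array([4.0, 1.0, 0.0, 1.0])                      # first column of C
C = np.array([np.roll(c, j) for j in range(len(c))]).T  # dense C, for checking only
x = np.random.randn(len(c))
print(np.allclose(C @ x, circulant_matvec(c, x)))       # True: FFT matches dense product
print(np.allclose(C @ circulant_solve(c, x), x))        # True: FFT-based inverse action
```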
Related courses elsewhere:
- APPM 5270 at Univ. Texas
- Numerical methods for data science (Cornell): http://www.cs.cornell.edu/courses/cs6241/2018sp and http://www.cs.cornell.edu/~bindel/class/sjtu-summer18