Welcome to the course
SF2526 - Numerics for data science
Numerical algorithms for data-intensive science
The analysis of large data sets is of increasing importance in many fields of science and technology. This course gives an introduction to efficient numerical algorithms arising in the analysis of large amounts of data. We use mathematical and numerical tools to study both the problems and the algorithms.
Want to get started? Try Quiz 0: Background and a bit, and check out hw1.pdf under Files.
Course contents and literature:
Before the course starts, we recommend that you review the linear algebra summarized in background.pdf and do Quiz 0: Background and a bit.
The course consists of three blocks
- Block 1: Low-rank data algorithms
- block1.pdf, which gives page references to the course APPM 5270 at Univ. Texas, in particular this pdf file (see the low-rank sketch after this list).
- Block 2: Numerical algorithms for clustering
- block2.pdf, parts of which are based on von Luxburg's tutorial.
- Block 3: Structured matrices in data analysis
- block3.pdf, together with GvL_chapt4.pdf and bjorck.pdf.
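
To give a feel for the Block 1 material, here is a minimal sketch contrasting the best rank-k approximation (truncated SVD, Eckart-Young) with a randomized range-finder of the kind covered in the randomized QR lecture. The course does not prescribe a programming language; Python/NumPy is assumed here, and the function names and the oversampling parameter p are illustrative choices, not course material.

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A in the 2-norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ (s[:k, None] * Vt[:k, :])

def randomized_rank_k(A, k, p=5, seed=0):
    """Rank-k approximation via a randomized range-finder:
    sketch the range of A with a Gaussian test matrix, orthonormalize,
    then take a small SVD of the projected matrix."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    Omega = rng.standard_normal((n, k + p))   # oversampled Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sampled range
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    return (Q @ Ub)[:, :k] @ (s[:k, None] * Vt[:k, :])

# The randomized error is typically close to the optimal (truncated SVD) error.
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 100))  # numerically low rank
for k in (5, 20):
    print(k, np.linalg.norm(A - truncated_svd(A, k), 2),
             np.linalg.norm(A - randomized_rank_k(A, k), 2))
```

The key design point, developed in Block 1, is that the expensive full SVD is replaced by one matrix-matrix product and a QR of a thin matrix, which is what makes the randomized approach attractive for large data matrices.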
What do I need to do?
The examination consists of correctly completing:
- Homework 1
- Homework 2
- Homework 3
- Exam
Each homework gives you 2 bonus points on the exam if you
- hand in correct solutions before the homework deadline.
Maximum obtainable: 6 bonus points.
If you additionally complete the wiki tasks, you can obtain wiki bonus points: if you accumulate y wiki points, the score intervals for grades A and B are reduced by y points.
Maximum obtainable wiki bonus: 3 points.
Weekly course plan
Lecture slides are available here (only accessible for registered students).
| Week | Contents |
|------|----------|
| 1 | Lecture 1: B1: Course introduction. Low-rank approximation applications (image processing, machine learning). Matrix factorizations. QR-factorization. |
|   | Lecture 2: B1: Methods for QR-factorization via Gram-Schmidt (greedy Gram-Schmidt by column elimination). Low-rank approximation via partial QR. CPQR. Best low-rank property (SVD). Eckart-Young theorem. |
| 2 | Lecture 3: B1: SVD from partial QR. Randomized QR. |
|   | Lecture 4: B1: Interpolatory decomposition (ID). The CUR decomposition. |
| 3 | Lecture 5: B2: Intro to clustering and K-means. Image processing applications. Plastic-collection clustering application. (HW1 deadline) K-means. Similarity graphs. |
|   | Lecture 6: B2: Intro to graphs and data on graphs. Connectedness. Subgraph connectedness. Unnormalized Laplacian. Laplacian eigenvectors and indicator vectors. Mincut. |
| 4 | Lecture 7: B2: Unnormalized-Laplacian spectral clustering (see the spectral clustering sketch after the table). K-means in spectral clustering. RatioCut interpretation of clustering + Rayleigh-Ritz theorem. |
|   | Lecture 8: B2: RatioCut (k > 2). Perturbation viewpoint. Normalized Laplacians. NCut. |
| 5 | Lecture 9: B3: Introduction to applications in signal processing. FFT, Cooley-Tukey. (HW2 deadline) |
|   | Lecture 10: B3: Circulant matrices, diagonalization of circulant matrices, Toeplitz systems via extension to circulant and FFT (see the circulant sketch after the table), Toeplitz computation via Levinson-Durbin (intro). |
| 6 | Lecture 11: B3: Levinson-Durbin cont., Cauchy-like matrices. |
|   | Lecture 12: B3: Semiseparable matrices. |
|   | Lecture 13: B3: HSS structure and H-matrices. |
| 7 | (HW3 deadline) |
| 8 | Lecture 14: B3 / summary. |
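
To make the Block 2 material concrete, here is a minimal sketch of unnormalized-Laplacian spectral clustering in the style of von Luxburg's tutorial. As above, Python with NumPy/SciPy is an assumption, as are the Gaussian similarity width sigma and the function name.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(X, k, sigma=1.0):
    """Unnormalized-Laplacian spectral clustering:
    Gaussian similarity graph -> L = D - W -> k smallest eigenvectors -> k-means."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    W = np.exp(-d2 / (2 * sigma ** 2))                   # fully connected similarity graph
    np.fill_diagonal(W, 0.0)                             # no self-loops
    L = np.diag(W.sum(axis=1)) - W                       # unnormalized graph Laplacian
    _, V = eigh(L, subset_by_index=[0, k - 1])           # eigenvectors of the k smallest eigenvalues
    _, labels = kmeans2(V, k, minit='++', seed=0)        # k-means on the spectral embedding
    return labels

# Two well-separated point clouds: the labels recover the two groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (40, 2)), rng.normal(3.0, 0.3, (40, 2))])
print(spectral_clustering(X, 2))
```

For k = 2 on a connected similarity graph the first eigenvector is (numerically) constant, so the separation is carried by the Fiedler-type second eigenvector, which connects to the RatioCut discussion in Lectures 7-8.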
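Similarly for Block 3: circulant matrices are diagonalized by the DFT, so a circulant linear system can be solved in O(n log n) time with the FFT. A minimal NumPy sketch of this Lecture 10 fact (same language assumption as above; the helper name is illustrative):

```python
import numpy as np

def circulant_solve(c, b):
    """Solve C x = b, where C is the circulant matrix with first column c.
    Since C = F^{-1} diag(fft(c)) F, the solve reduces to FFTs and one IFFT."""
    return np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)).real  # real inputs -> real solution

# Check against a dense solve on a small example.
rng = np.random.default_rng(0)
n = 8
c, b = rng.standard_normal(n), rng.standard_normal(n)
C = np.column_stack([np.roll(c, j) for j in range(n)])  # dense circulant: column j = c shifted by j
print(np.allclose(circulant_solve(c, b), np.linalg.solve(C, b)))  # True
```

The same diagonalization is behind the Toeplitz material: a Toeplitz matrix embeds into a larger circulant, so Toeplitz matrix-vector products also cost O(n log n).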
Related courses elsewhere:
- APPM 5270 at Univ. Texas
- Numerical methods for data science (Cornell): http://www.cs.cornell.edu/courses/cs6241/2018sp and http://www.cs.cornell.edu/~bindel/class/sjtu-summer18