Matrix norms! Ways of measuring the “size” of a matrix, or, more typically for me, the “distance” between matrices, which is the size of one minus the other. Useful as optimisation objectives, or as metrics for comparing matrices.
I write the singular value decomposition of an $m \times n$ matrix $\mathrm{B}$ as $\mathrm{B} = \mathrm{U} \Sigma \mathrm{V}^*$. The diagonal entries of $\Sigma$ are the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_{\min(m,n)} \geq 0$.
Now here is a question: Why does this notebook not connect more closely to the matrix concentration page?
1 Operator norm
A classic, defined in terms of norms on two other spaces, and using the matrix as an operator mapping between them. Define two norms, one on the domain and one on the codomain; then for $\mathrm{B} \colon \mathbb{R}^n \to \mathbb{R}^m$,
$$\|\mathrm{B}\|_{\alpha \to \beta} = \sup_{x \neq 0} \frac{\|\mathrm{B}x\|_\beta}{\|x\|_\alpha}.$$
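To make the sup concrete, here is a small numpy sketch (the helper name `op_norm_estimate` is mine, not a standard routine) that lower-bounds the operator norm by random search and compares against a known closed form:

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((5, 4))

def op_norm_estimate(B, alpha, beta, trials=200_000):
    """Crude Monte Carlo lower bound on sup_x ||B x||_beta / ||x||_alpha."""
    X = rng.standard_normal((trials, B.shape[1]))
    num = np.linalg.norm(X @ B.T, ord=beta, axis=1)
    den = np.linalg.norm(X, ord=alpha, axis=1)
    return (num / den).max()

# alpha = beta = 1 induces the max-column-sum norm; random search
# only ever produces a lower bound on the sup
exact = np.abs(B).sum(axis=0).max()
assert op_norm_estimate(B, 1, 1) <= exact + 1e-12
```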
1.1 Spectral norm
Operator norm with $\alpha = \beta = 2$, i.e. the Euclidean norm on both ends:
$$\|\mathrm{B}\|_2 = \sup_{x \neq 0} \frac{\|\mathrm{B}x\|_2}{\|x\|_2} = \sigma_1,$$
the largest singular value.
Famous useful relation to the Frobenius norm:
$$\|\mathrm{B}\|_2 \leq \|\mathrm{B}\|_F \leq \sqrt{d}\,\|\mathrm{B}\|_2,$$
where $d = \operatorname{rank} \mathrm{B}$.
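A quick numerical sanity check of that sandwich relation, in numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.standard_normal((50, 20))
d = np.linalg.matrix_rank(B)  # 20, almost surely

spec = np.linalg.norm(B, 2)      # spectral norm = largest singular value
frob = np.linalg.norm(B, "fro")
assert spec <= frob <= np.sqrt(d) * spec
```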
2 Frobenius norm
Coincides with the $\ell_2$ norm of the matrix flattened into a vector in $\mathbb{R}^{mn}$.
We can define the Frobenius norm in several equivalent ways,
$$\|\mathrm{B}\|_F = \sqrt{\sum_{i,j} B_{ij}^2} = \sqrt{\operatorname{tr}\left(\mathrm{B}^* \mathrm{B}\right)} = \sqrt{\sum_i \sigma_i^2}.$$
It is useful to define the Frobenius norm in terms of the Frobenius inner product that we slyly introduced above:
$$\langle \mathrm{A}, \mathrm{B} \rangle_F := \operatorname{tr}\left(\mathrm{A}^* \mathrm{B}\right), \qquad \|\mathrm{B}\|_F = \sqrt{\langle \mathrm{B}, \mathrm{B} \rangle_F}.$$
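The equivalent characterisations (entrywise, trace, singular values) agree numerically; a minimal numpy check:

```python
import numpy as np

rng = np.random.default_rng(7)
B = rng.standard_normal((6, 4))

entrywise = np.sqrt((B ** 2).sum())
trace_form = np.sqrt(np.trace(B.T @ B))  # = sqrt(<B, B>_F)
spectral_form = np.sqrt((np.linalg.svd(B, compute_uv=False) ** 2).sum())

assert np.allclose([entrywise, trace_form, spectral_form],
                   np.linalg.norm(B, "fro"))
```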
Handy property for partitioned matrices:
$$\left\| \begin{bmatrix} \mathrm{A} & \mathrm{B} \\ \mathrm{C} & \mathrm{D} \end{bmatrix} \right\|_F^2 = \|\mathrm{A}\|_F^2 + \|\mathrm{B}\|_F^2 + \|\mathrm{C}\|_F^2 + \|\mathrm{D}\|_F^2.$$
Handy property for low-rank-style symmetric products of tall, skinny matrices $\mathrm{A} \in \mathbb{R}^{m \times r}$ with $r \ll m$:
$$\left\|\mathrm{A}\mathrm{A}^\top\right\|_F = \left\|\mathrm{A}^\top\mathrm{A}\right\|_F = \sqrt{\operatorname{tr}\left(\left(\mathrm{A}^\top\mathrm{A}\right)^2\right)},$$
so we can work with the small $r \times r$ Gram matrix instead of the big $m \times m$ one.
Handy property for low-rank-style products of non-symmetric tall-skinny matrices $\mathrm{A} \in \mathbb{R}^{m \times r}$, $\mathrm{B} \in \mathbb{R}^{n \times r}$:
$$\left\|\mathrm{A}\mathrm{B}^\top\right\|_F^2 = \operatorname{tr}\left(\left(\mathrm{A}^\top\mathrm{A}\right)\left(\mathrm{B}^\top\mathrm{B}\right)\right),$$
which costs $O\left((m+n)r^2\right)$ rather than $O(mnr)$. Both identities are checked numerically in the sketch below.
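Both tricks in numpy, with the dense computation retained only to check the cheap one (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 2000, 1500, 10
A = rng.standard_normal((m, r))
B = rng.standard_normal((n, r))

# symmetric case: work with the small r x r Gram matrix, O(m r^2)
sym_cheap = np.linalg.norm(A.T @ A, "fro")
sym_dense = np.linalg.norm(A @ A.T, "fro")      # O(m^2 r), for checking only
assert np.isclose(sym_cheap, sym_dense)

# non-symmetric case: ||A B^T||_F^2 = tr((A^T A)(B^T B)), O((m + n) r^2)
cheap = np.sqrt(np.trace((A.T @ A) @ (B.T @ B)))
dense = np.linalg.norm(A @ B.T, "fro")          # O(m n r), for checking only
assert np.isclose(cheap, dense)
```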
This norm seems to be mostly useful because it is tractable: the squared Frobenius norm is simple to compute, minimising squared Frobenius also minimises plain Frobenius, and the trace representation above is easy to differentiate.
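For example, the gradient of $\|\mathrm{B} - \mathrm{A}\mathrm{X}\|_F^2$ with respect to $\mathrm{X}$ falls straight out of the trace form as $2\mathrm{A}^\top(\mathrm{A}\mathrm{X} - \mathrm{B})$; a finite-difference spot check in numpy (matrix sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 8))
B = rng.standard_normal((30, 5))
X = rng.standard_normal((8, 5))

def loss(X):
    return np.linalg.norm(B - A @ X, "fro") ** 2

grad = 2 * A.T @ (A @ X - B)  # from differentiating tr((B - AX)^T (B - AX))

# finite-difference check of one entry
eps = 1e-6
E = np.zeros_like(X)
E[2, 3] = eps
fd = (loss(X + E) - loss(X - E)) / (2 * eps)
assert np.isclose(grad[2, 3], fd, rtol=1e-4)
```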
It is not completely silly as a norm in its own right, though. Note that it depends on the matrix only through the singular values, so, like the spectral and nuclear norms, it is unitarily invariant: $\|\mathrm{U}\mathrm{B}\mathrm{V}^*\|_F = \|\mathrm{B}\|_F$ for any unitary $\mathrm{U}, \mathrm{V}$.
Frobenius also induces a computationally expedient distance for low-rank-plus-diagonal matrices.
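A sketch of what I mean, assuming the matrices have the form $\mathrm{A}\mathrm{A}^\top + \operatorname{diag}(d_A)$ and $\mathrm{B}\mathrm{B}^\top + \operatorname{diag}(d_B)$ (the helper name `lrpd_frob_dist2` is made up): expanding the squared distance into traces means we never form an $m \times m$ matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
m, r = 500, 5
A = rng.standard_normal((m, r))
B = rng.standard_normal((m, r))
dA = rng.standard_normal(m) ** 2
dB = rng.standard_normal(m) ** 2

def lrpd_frob_dist2(A, dA, B, dB):
    """||(A A^T + diag(dA)) - (B B^T + diag(dB))||_F^2 in O(m r^2)."""
    delta = dA - dB
    # ||A A^T - B B^T||_F^2 via trace identities, all r x r products
    lr = (np.linalg.norm(A.T @ A, "fro") ** 2
          - 2 * np.linalg.norm(A.T @ B, "fro") ** 2
          + np.linalg.norm(B.T @ B, "fro") ** 2)
    # cross term 2 <A A^T - B B^T, diag(delta)> needs only row norms
    cross = 2 * delta @ ((A * A).sum(axis=1) - (B * B).sum(axis=1))
    return lr + cross + delta @ delta

dense = np.linalg.norm(
    (A @ A.T + np.diag(dA)) - (B @ B.T + np.diag(dB)), "fro") ** 2
assert np.isclose(lrpd_frob_dist2(A, dA, B, dB), dense)
```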
3 Nuclear norm
A.k.a. trace norm, Schatten 1-norm. The sum of a matrix's singular values, $\|\mathrm{B}\|_* = \sum_i \sigma_i$.
4 Schatten norms
Generalising nuclear and Frobenius norms. The Schatten $p$-norm is the $\ell_p$ norm of the singular values:
$$\|\mathrm{B}\|_{S_p} = \left(\sum_i \sigma_i^p\right)^{1/p}.$$
The case $p = 1$ recovers the nuclear norm, $p = 2$ the Frobenius norm, and $p = \infty$ the spectral norm.
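A throwaway numpy implementation via the SVD, checked against the built-in special cases:

```python
import numpy as np

def schatten_norm(B, p):
    """Schatten p-norm: the l_p norm of the singular values."""
    s = np.linalg.svd(B, compute_uv=False)
    return (s ** p).sum() ** (1 / p)

rng = np.random.default_rng(5)
B = rng.standard_normal((8, 6))

assert np.isclose(schatten_norm(B, 1), np.linalg.norm(B, "nuc"))  # nuclear
assert np.isclose(schatten_norm(B, 2), np.linalg.norm(B, "fro"))  # Frobenius
assert np.isclose(schatten_norm(B, 200), np.linalg.norm(B, 2), rtol=1e-2)  # ~spectral
```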
5 Bregman divergence
🏗 Relation to exponential family and maximum likelihood.
Mark Reid: Meet the Bregman divergences:
If you have some abstract way of measuring the “distance” between any two points and, for any choice of distribution over points, the mean point minimises the average distance to all the others, then your distance measure must be a Bregman divergence.
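A numpy illustration of one direction of that characterisation, using the negative-entropy generator (so the divergence is a generalised KL); the empirical mean should beat any other centre:

```python
import numpy as np

def bregman(F, gradF, x, y):
    """D_F(x, y) = F(x) - F(y) - <grad F(y), x - y>."""
    return F(x) - F(y) - gradF(y) @ (x - y)

F = lambda x: (x * np.log(x)).sum()  # negative entropy
gradF = lambda x: np.log(x) + 1

rng = np.random.default_rng(2)
points = rng.uniform(0.5, 2.0, size=(200, 3))
mean = points.mean(axis=0)

def avg_div(c):
    return np.mean([bregman(F, gradF, x, c) for x in points])

# the mean minimises the average divergence D_F(x_i, c) over centres c
for _ in range(20):
    c = mean + rng.normal(scale=0.05, size=3)
    assert avg_div(mean) <= avg_div(c)
```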
TBC
6 Relations
Mazumder, Hastie, and Tibshirani (2010) has a very nice little lemma (Lemma 6 in the paper) that links the nuclear norm of a matrix (i.e. the sum of its singular values) to the solution of an optimisation problem involving Frobenius norms. Here is the lemma:

Lemma: For any matrix $\mathrm{Z} \in \mathbb{R}^{m \times n}$, we have
$$\|\mathrm{Z}\|_* = \min_{\mathrm{U}, \mathrm{V} \,:\, \mathrm{Z} = \mathrm{U}\mathrm{V}^\top} \frac{1}{2}\left(\|\mathrm{U}\|_F^2 + \|\mathrm{V}\|_F^2\right).$$
If $\operatorname{rank}(\mathrm{Z}) = k \leq \min(m, n)$, then the minimum above is attained at a factor decomposition $\mathrm{Z} = \mathrm{U}_{m \times k} \mathrm{V}_{n \times k}^\top$.
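The minimiser is easy to exhibit from the SVD: split the singular values evenly between the two factors. A numpy check of the lemma:

```python
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 20, 15, 4
Z = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))  # rank k

U_svd, s, Vt = np.linalg.svd(Z, full_matrices=False)
U = U_svd * np.sqrt(s)  # U_svd @ diag(sqrt(s))
V = Vt.T * np.sqrt(s)
assert np.allclose(U @ V.T, Z)

nuclear = s.sum()
objective = 0.5 * (np.linalg.norm(U, "fro") ** 2
                   + np.linalg.norm(V, "fro") ** 2)
assert np.isclose(objective, nuclear)
```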
6.1 The inequality table
From Petersen and Pedersen (2012):
E. H. Rasmussen has in yet unpublished material derived and collected the following inequalities. They are collected in a table as below, assuming $\mathrm{B}$ is an $m \times n$ matrix, and $d = \operatorname{rank}(\mathrm{B})$:
| | $\Vert\mathrm{B}\Vert_{\max}$ | $\Vert\mathrm{B}\Vert_{1}$ | $\Vert\mathrm{B}\Vert_{\infty}$ | $\Vert\mathrm{B}\Vert_{2}$ | $\Vert\mathrm{B}\Vert_{F}$ | $\Vert\mathrm{B}\Vert_{KF}$ |
|---|---|---|---|---|---|---|
| $\Vert\mathrm{B}\Vert_{\max}$ | | $1$ | $1$ | $1$ | $1$ | $1$ |
| $\Vert\mathrm{B}\Vert_{1}$ | $m$ | | $m$ | $\sqrt{m}$ | $\sqrt{m}$ | $\sqrt{m}$ |
| $\Vert\mathrm{B}\Vert_{\infty}$ | $n$ | $n$ | | $\sqrt{n}$ | $\sqrt{n}$ | $\sqrt{n}$ |
| $\Vert\mathrm{B}\Vert_{2}$ | $\sqrt{mn}$ | $\sqrt{n}$ | $\sqrt{m}$ | | $1$ | $1$ |
| $\Vert\mathrm{B}\Vert_{F}$ | $\sqrt{mn}$ | $\sqrt{n}$ | $\sqrt{m}$ | $\sqrt{d}$ | | $1$ |
| $\Vert\mathrm{B}\Vert_{KF}$ | $\sqrt{mnd}$ | $\sqrt{nd}$ | $\sqrt{md}$ | $d$ | $\sqrt{d}$ | |
which are to be read as, e.g., $\|\mathrm{B}\|_2 \leq \sqrt{m}\,\|\mathrm{B}\|_\infty$.
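A few of those entries, spot-checked in numpy (norm definitions per the Cookbook: the induced 1-norm is the max column sum, the induced $\infty$-norm the max row sum, KF the sum of singular values):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n = 7, 5
B = rng.standard_normal((m, n))
d = np.linalg.matrix_rank(B)

norms = {
    "max": np.abs(B).max(),
    "1": np.abs(B).sum(axis=0).max(),    # max column sum
    "inf": np.abs(B).sum(axis=1).max(),  # max row sum
    "2": np.linalg.norm(B, 2),
    "F": np.linalg.norm(B, "fro"),
    "KF": np.linalg.norm(B, "nuc"),
}

assert norms["2"] <= np.sqrt(m) * norms["inf"]  # row 2, column inf
assert norms["F"] <= np.sqrt(d) * norms["2"]    # row F, column 2
assert norms["KF"] <= np.sqrt(d) * norms["F"]   # row KF, column F
```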