Norms: How to Measure Size
By darcy on Mar 01, 2007
At times it is useful to summarize a set of values, say a vector of real numbers, as a single number representing the set's size. For example, distilling benchmark subcomponent scores into an overall score. One way to do this is to use a norm. Mathematically, a norm maps from a vector V of a given number of elements to a real number length such that the following properties hold:
- norm(V) ≥ 0 for all V and norm(V) = 0 if and only if V = 0 (positive definiteness)
- norm(c · V) = abs(c) · norm(V) for real constant c (homogeneity)
- norm(U + V) ≤ norm(U) + norm(V) (the triangle inequality)
There are a few commonly used norms:
- 1-norm: sum of the absolute values (Manhattan length)
- 2-norm: square root of the sum of the squares (Euclidean length)
- ∞-norm: largest absolute value
The first two norms are instances of p-norms. A p-norm adds up the result of raising the absolute value of each vector component to the pth power (squaring, or cubing, etc.) and then takes the pth root of the sum. The ∞-norm is the limit as p goes to infinity.
Given multiple possible norms, which one should be used? The 2-norm is often easier to work with since it is a differentiable function of the vector components, unlike the 1-norm and ∞-norm. On the other hand, the ∞-norm captures the worst-case behavior. Sometimes one norm is easier to compute than the others.
Another norm might make an error analysis more tractable.
For vectors, in some sense it doesn't matter which norm is used because any two norms, norma and normb, are equivalent in the following sense, there are constants c1 and c2 such that
c1 · norma(V) ≤ normb(V) ≤ c2 · norma(V)
This means that if one norm is tending toward zero, all other norms are tending toward zero too. For example, commonly in numerical linear algebra there is an iterative process that terminates once the norm of the error is small enough. Concretely, for vectors of size n, the common norms are related as follows:
norm2(V) ≤ norm1(V) ≤ sqrt(n) · norm2(V)
norm∞(V) ≤ norm2(V) ≤ sqrt(n) · norm∞(V)
norm∞(V) ≤ norm1(V) ≤ n · norm∞(V)
So to guarantee that the 1-norm is less than epsilon, it is enough to show that 2-norm is less than epsilon/sqrt(n).
However, in other ways the different norms are not equivalent; the norms can give different answers on the relative size of different vectors. Consider the three vectors A, B, and C:
A = [5, 0, 0]
B = [1, 3, 4]
C = [8/3, 8/3, 3]
Vector 1-norm 2-norm ∞-norm A 5 5 5 B 8 ≈5.1 4 C ≈8.3 ≈4.8 3 Biggest Vector C B A
Each vector is considered the largest under one of the norms.
I've found the notion of norms to be useful in many different contexts. The performance differences between quicksort and mergesort can be described as quicksort having a better 1-norm but mergesort having a better ∞-norm. Buying more insurance coverage raises the 1-norm of your costs, but lowers your ∞-norm. A more conservative evaluation tends to focus on the worst-case outcome and thus favors something like the ∞-norm. For example, in the math library the relative size of the error at any location must be less than the stated number of ulps (units in the last place). It is not good enough to have a low average error, but a few locations, or even one location, with an very inaccurate result. During software development, risk assessments evolve with the release life cycle. A change that is welcome early in the release may be rejected as too risky a few weeks before shipping; one way to view this phenomena is that a larger value of p is being used to compute risk assessments later in the release.References
Applied Numerical Linear Algebra, James W. Demmel
Matrix Computations, Gene H. Golub and Charles F Van Loan
Numerical Linear Algebra, Lloyd N. Trefethen and David Bau, III