
Notes on Numerical Differentiation

This is a gentle, progressive introduction to numerical differentiation. It slowly ramps you up to an application of the Savitzky–Golay filter with a straightforward learning progression, without a completely rigorous nomenclature.

We use the term noise very loosely.

We can differentiate analytic functions, like \( \frac{\mathrm{d}}{\mathrm{d}x} \sin(x) = \cos(x)\), but to say that we differentiate numerical data is not the same; it's just an analogy. Most measured data is noisy and, in our case, may be of unknown origin, and so can't in general correspond to an analytic function. We keep the idea of differentiating analytic functions as an unattainable ideal, sort of. Really, numerical differentiation is whatever you define it to be. We don't know what the continuum of values is between the sampled data points that we are given, so we imagine, we guess. We don't even know the nature of the noise; is it random or biased in some way? The bias can come from numerical conversion, like when converting binary float data to ASCII decimal data, as in the axa-driver-telematics-analysis Kaggle contest. The people running that contest unwittingly added a bias to the raw data by rounding the floating point numbers when converting the data to an ASCII text format with a fixed number of decimal places, which adds an aliasing, stair step, "noise" to the data. This stair step noise makes differentiation require some type of "smoothing" to remove the added steps, which would otherwise blow up when differentiated, and it has the net effect of decreasing the fidelity of the raw data and all derived data. In addition to this biased stair step noise, there will be other types of noise (errors) in the data, of varying sizes.

We start by using "The Method of Undetermined Coefficients" to find the forms we need to calculate derivatives of functions that are represented as a sequence of variable values at fixed intervals.

We will calculate derivatives at the same sample rate as the given data. We will test the derivative values by comparing calculations with varying numbers of points as input.

Derivative Forward 2 Points

We start with an easy case which we already know the answer to: \[ f^\prime(x) = A f(x) + B f(x+h) \label{df2_1} \] where \(f(x)\) is our function of \(x\), \(f^\prime (x)\) is the derivative with respect to \(x\), \(h\) is our fixed change in \(x\), and \(A\) and \(B\) are constants, dependent on \(h\), that we must find.

From Taylor series we have \[ f(x+h) = f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \dots \label{taylor} \] where \(x\) is the constant value we expand about, and \(x+h\) is the variable that we plugged in. Note, this is not the usual form of a Taylor series function expansion. You can confirm this from a common form like \[ f(x) = \sum_{n=0}^\infty \frac{f^n(a)}{n!} (x - a)^n \mathrm{,} \] where \(a\) is a constant and \(f^n(a)\) is the \(n\)-th derivative of \(f(x)\) with respect to \(x\) evaluated at \(x = a\).

Plugging [\(\ref{taylor}\)] into [\(\ref{df2_1}\)] gives \[ f^\prime(x) = A f(x) + B \left[f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \dots \right] \label{df2_2}, \] which we rearrange to give \[ 0 = \left[ A + B \right] f(x) + \left[ B h - 1 \right] f^\prime(x) + B \frac{h^2}{2} f^{\prime\prime}(x) + \dots \label{df2_3} \] We intend to get an expression that we'll call an approximation for the first derivative of our data. But this only works well if the sample rate is high enough (\(h\) is small) and the noise is low enough. We assert that successive derivatives of the function are unrelated and that higher order Taylor terms tend to get smaller. Anyhow, we next solve \begin{align} 0 & = A + B \\ 1 & = 0 + B h \end{align} so \[ A = - \frac{1}{h} \quad B = \frac{1}{h} \] and so [\(\ref{df2_1}\)] becomes \[ f^\prime(x) = \frac{f(x+h) - f(x)}{h} \label{f2_result}\] the two point forward differencing derivative approximation. We can use the higher order terms to get an idea of how approximate this is, assuming that our data is somehow representative of an analytic function.
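As a quick numerical check of [\(\ref{f2_result}\)], here is a minimal C sketch (C being the language of the libst code discussed later) that differentiates \(\sin(x)\) and compares against the known \(\cos(x)\); the error should shrink roughly in proportion to \(h\):

  /* minimal check of the 2 point forward difference on a known
     analytic function: f(x) = sin(x), so f'(x) = cos(x) */
  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
      double x = 1.0;
      /* halve h a few times; the error should shrink roughly in
         proportion to h, a first order approximation */
      for (double h = 0.1; h > 1.0e-4; h /= 2.0) {
          double fp = (sin(x + h) - sin(x)) / h;
          printf("h=%.6f f'=%.8f error=%.2e\n",
                 h, fp, fabs(fp - cos(x)));
      }
      return 0;
  }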

Let's try a different way

We require that this work perfectly for the linear function \[ f(x) = ax + b \label{df2_5}\] giving \[ f^\prime(x) = a \mathrm{.} \label{df2_6}\] Plugging [\(\ref{df2_5}\)] into [\(\ref{df2_1}\)] and using [\(\ref{df2_6}\)] gives \[ a = A a x + A b + B a x + B a h + B b \] which we rewrite as \[ a = A b + B a h + B b + x \left( A a + B a \right) \label{df2_7} \mathrm{.} \] We require that [\(\ref{df2_7}\)] be valid for all values of \(x\), therefore we have the set of equations \begin{align} a = & A b + B (a h + b) \\ 0 = & A a + B a \mathrm{,}\\ \end{align} giving us \(A = -\frac{1}{h}\) and \(B = \frac{1}{h}\) as before.

We note that the method that uses Taylor series appears simpler by not introducing as many superfluous constants (\(a\) and \(b\)). We find that when finding expressions with more terms the Taylor series method is quicker and gives the same result. The proof is similar to proving Taylor series, so we guess. So from here out we'll use the Taylor series method and then check the result by seeing that a polynomial function of order \(N-1\), where \(N\) is the number of function evaluations, gives the exact derivative. In the above case \(N = 2\).

Derivative Forward 3 Points

\[ f^\prime(x) = A f(x) + B f(x+h) + C f(x+2h) \label{f3_1} \] From [\(\ref{taylor}\)] with \(h \to 2 h\) we have \[ f(x+2 h) = f(x) + 2 h f^\prime(x) + 2 h^2 f^{\prime\prime}(x) + \dots \label{taylor2} \] and so \begin{align} f^\prime(x) = & A f(x) + \nonumber \\ & B \left[f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \dots \right] + \nonumber \\ & C \left[ f(x) + 2 h f^\prime(x) + 2 h^2 f^{\prime\prime}(x) + \dots \right] \mathrm{,} \label{df3_2} \\ \end{align} which rearranges to \begin{align} 0 = & \left[A + B + C\right] f(x) + \nonumber \\ & \left[-1 + B h + 2 C h \right] f^\prime(x) + \nonumber \\ & \left[0 + B \frac{h^2}{2} + 2 C h^2 \right] f^{\prime\prime}(x) + \nonumber \\ & \dots \end{align}

Again asserting that successive derivatives of the function are unrelated and that higher order Taylor terms tend to get smaller gives us the following 3 equations, needed to solve for the 3 unknown constants \(A\), \(B\), and \(C\): \begin{align} 0 & = A + B + C \nonumber\\ 1 & = 0 + B h + 2 C h \nonumber\\ 0 & = 0 + B \frac{h^2}{2} + 2 C h^2 \end{align} giving \[ A = - \frac{3}{2h} \quad B = \frac{4}{2h} \quad C = - \frac{1}{2h} \mathrm{.} \label{ABC_f3}\] Plugging [\(\ref{ABC_f3}\)] into [\(\ref{f3_1}\)] gives \[ f^\prime(x) = \frac{ -3 f(x) + 4 f(x + h) - f(x + 2h)}{2 h} \mathrm{.} \label{f3_form}\]

As a test of [\(\ref{f3_form}\)] we see that if \( f(x) = ax^2 + bx + c \) we get \(f^\prime(x) = 2ax + b\), so it works as expected for a quadratic function.
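The same check can be made numerically; here is a small C sketch with an arbitrarily chosen quadratic (the numbers are made up for illustration), which should reproduce the exact derivative up to floating point round off even with a large \(h\):

  /* the 3 point forward formula is exact for quadratics, any h */
  #include <stdio.h>
  static double f(double x)  { return 3*x*x + 2*x + 1; }
  static double fp(double x) { return 6*x + 2; } /* exact f' */
  int main(void)
  {
      double x = 0.7, h = 0.5; /* even a big h should work */
      double d = (-3*f(x) + 4*f(x + h) - f(x + 2*h)) / (2*h);
      printf("formula=%.12f exact=%.12f\n", d, fp(x));
      return 0;
  }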

Derivative Forward 4 Points

We are looking for the constants \(A\), \(B\), \(C\) and \(D\) in \[ f^\prime(x) = A f(x) + B f(x+h) + C f(x+2h) + D f(x + 3h) \mathrm{.} \label{f4_1} \] Applying Taylor series to the right-hand-side of [\(\ref{f4_1}\)] gives \begin{align} f^\prime(x) = & A f(x) + \nonumber \\ & B \left[ f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \frac{h^3}{6} f^{\prime\prime\prime}(x) + \dots \right] + \nonumber \\ & C \left[ f(x) + 2 h f^\prime(x) + 2 h^2 f^{\prime\prime}(x) + \frac{4 h^3}{3} f^{\prime\prime\prime}(x) + \dots \right] + \nonumber \\ & D \left[ f(x) + 3 h f^\prime(x) + \frac{9 h^2}{2} f^{\prime\prime}(x) + \frac{9 h^3}{2} f^{\prime\prime\prime}(x) + \dots \right] \\ \end{align} Requiring that the coefficient of the terms of each derivative of \(f(x)\) be zero gives us \begin{align} 0 = & A + B + C + D \nonumber \\ \frac{1}{h} = & 0 + B + 2 C + 3 D \nonumber \\ 0 = & 0 + B + 4 C + 9 D \nonumber \\ 0 = & 0 + B + 8\, C + 27\, D \mathrm{.}\\ \end{align} Using axiom

  m := matrix([[1,1,1,1],[0,1,2,3],[0,1,4,9],[0,1,8,27]])
  inverse(m)*vector([0,1/h,0,0])
we get \[ A = - \frac{11}{6 h} \quad B = \frac{3}{h} \quad C = - \frac{3}{2 h} \quad D = \frac{1}{3 h} \] giving the 4 point forward differencing derivative approximation: \[ f^\prime(x) = - \frac{11}{6 h} f(x) + \frac{3}{h} f(x + h) - \frac{3}{2 h} f(x + 2 h) + \frac{1}{3 h} f(x + 3 h) \mathrm{.} \label{f4_r} \] Testing with a general cubic function, \[ f(x) = a x^3 + b x^2 + c x + d \] using axiom code to avoid using lots of paper,
  function(a * x**3 + b * x**2 + c * x + d, f, x)
  function(-11*f(x)/(6*h)+3*f(x+h)/h-3*f(x+2*h)/(2*h)+f(x+3*h)/(3*h), fprime, x)
  fprime(x)
gives, \[ f^\prime(x) = 3 a x^2 + 2 b x + c \mathrm{,}\] so we see it works perfectly for \(f(x)\) being any cubic polynomial in \(x\).

Derivative Forward 5 Points

We are looking for the constants \(A\), \(B\), \(C\), \(D\), and \(E\) in \[ f^\prime(x) = A f(x) + B f(x+h) + C f(x+2h) + D f(x + 3h) + E f(x + 4h) \mathrm{.} \label{f5_1} \] Applying Taylor series to the right-hand-side of [\(\ref{f5_1}\)], now writing \(f^n(x)\) for the \(n\)-th derivative instead of primes, gives \begin{align} f^\prime(x) = & A f(x) + \nonumber \\ & B \left[ f(x) + h f^1(x) + \frac{h^2}{2} f^2(x) + \frac{h^3}{6} f^3(x) + \frac{h^4}{24} f^4(x) + \dots \right] + \nonumber \\ & C \left[ f(x) + 2 h f^1(x) + 2 h^2 f^2(x) + \frac{4 h^3}{3} f^3(x) + \frac{2 h^4}{3} f^4(x) + \dots \right] + \nonumber \\ & D \left[ f(x) + 3 h f^1(x) + \frac{9 h^2}{2} f^2(x) + \frac{9 h^3}{2} f^3(x) + \frac{27 h^4}{8} f^4(x) + \dots \right] + \nonumber \\ & E \left[ f(x) + 4 h f^1(x) + 8 h^2 f^2(x) + \frac{32 h^3}{3} f^3(x) + \frac{32 h^4}{3} f^4(x) + \dots \right] \mathrm{.} \\ \end{align} Requiring that the coefficient of the terms of each derivative of \(f(x)\) be zero gives us \begin{align} 0 = & A + B + C + D + E \nonumber \\ \frac{1}{h} = & 0 + B + 2 C + 3 D + 4 E \nonumber \\ 0 = & 0 + B + 4 C + 9 D + 16 E \nonumber \\ 0 = & 0 + B + 8 C + 27 D + 64 E \nonumber \\ 0 = & 0 + B + 16 C + 81 D + 256 E \mathrm{.}\\ \end{align} Using axiom

  m := matrix([[1,1,1,1,1],[0,1,2,3,4],[0,1,4,9,16],[0,1,8,27,64],[0,1,16,81,256]])
  inverse(m)*vector([0,1/h,0,0,0])
we get \[ A = - \frac{25}{12 h} \quad B = \frac{4}{h} \quad C = - \frac{3}{h} \quad D = \frac{4}{3 h} \quad E = - \frac{1}{4 h} \label{df5_cof} \] giving the 5 point forward differencing derivative approximation: \[ f^\prime(x) = - \frac{25}{12 h} f(x) + \frac{4}{h} f(x + h) - \frac{3}{h} f(x + 2 h) + \frac{4}{3 h} f(x + 3 h) - \frac{1}{4 h} f(x + 4 h) \mathrm{.} \label{f5_r} \] Testing with a general fourth order polynomial function, \[ f(x) = a x^4 + b x^3 + c x^2 + d x + e \] using axiom code to avoid using lots of paper,
  function(a * x**4 + b * x**3 + c * x**2 + d * x + e, f, x)
  function(-25*f(x)/(12*h)+4*f(x+h)/h-3*f(x+2*h)/h+4*f(x+3*h)/(3*h)-f(x+4*h)/(4*h), fprime, x)
  fprime(x)
gives, \[ f^\prime(x) = 4 a x^3 + 3 b x^2 + 2 c x + d \mathrm{,}\] so we see it works perfectly for \(f(x)\) being any fourth order polynomial in \(x\).
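To compare the 2, 3, 4, and 5 point forward formulas on actual numbers, here is a small C sketch differentiating \(\sin(x)\); since the \(N\) point formula is exact for polynomials of order \(N-1\), its error should fall roughly like \(h^{N-1}\) as \(h\) shrinks:

  /* compare the 2, 3, 4, and 5 point forward differencing
     approximations of f'(x) for f(x) = sin(x) at x = 1 */
  #include <stdio.h>
  #include <math.h>
  static double f(double x) { return sin(x); }
  int main(void)
  {
      double x = 1.0, exact = cos(x);
      for (double h = 0.1; h > 1.0e-3; h /= 2.0) {
          double d2 = (f(x+h) - f(x))/h;
          double d3 = (-3*f(x) + 4*f(x+h) - f(x+2*h))/(2*h);
          double d4 = -11*f(x)/(6*h) + 3*f(x+h)/h
                      - 3*f(x+2*h)/(2*h) + f(x+3*h)/(3*h);
          double d5 = -25*f(x)/(12*h) + 4*f(x+h)/h - 3*f(x+2*h)/h
                      + 4*f(x+3*h)/(3*h) - f(x+4*h)/(4*h);
          printf("h=%.5f err2=%.1e err3=%.1e err4=%.1e err5=%.1e\n",
                 h, fabs(d2-exact), fabs(d3-exact),
                 fabs(d4-exact), fabs(d5-exact));
      }
      return 0;
  }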

Derivative Back 2 Points

This time we will evaluate \(f(x)\) at 2 points using \[ f^\prime(x) = A f(x - h) + B f(x) \mathrm{,}\] but this is just [\(\ref{df2_1}\)] with \(h \to - h\), so with [\(\ref{f2_result}\)] we have \[ f^\prime(x) = \frac{f(x) - f(x - h)}{h} \mathrm{.} \]

Derivative Back 3 Points

From [\(\ref{f3_form}\)] with \(h \to - h\) we get \[ f^\prime(x) = \frac{ 3 f(x) - 4 f(x - h) + f(x - 2h)}{2 h} \mathrm{.} \]

Derivative Back 4 Points

From [\(\ref{f4_r}\)] with \(h \to -h\) we get \[ f^\prime(x) = \frac{11}{6 h} f(x) - \frac{3}{h} f(x - h) + \frac{3}{2 h} f(x - 2 h) - \frac{1}{3 h} f(x - 3 h) \mathrm{.} \]

Derivative Centered 3 Points

We want the constants \(A\), \(B\), and \(C\) such that \[ f^\prime(x) = A f(x - h) + B f(x) + C f(x + h) \mathrm{.} \label{c3_1} \] We start with the Taylor series expansion of the right hand side of [\(\ref{c3_1}\)], giving \begin{align} f^\prime(x) = & A \left[ f(x) - h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \dots \right] + \nonumber \\ & B \left[ f(x) \right] + \nonumber \\ & C \left[ f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \dots \right] \mathrm{.}\\ \end{align} Requiring that the coefficients of the terms of each derivative of \(f(x)\) be zero gives \begin{align} 0 = & A + B + C \nonumber \\ \frac{1}{h} = & - A + 0 B + C \nonumber \\ 0 = & A + 0B + C \mathrm{.} \label{c3_3} \\ \end{align} Using axiom to solve,

  inverse(matrix([[1,1,1],[-1,0,1],[1,0,1]]))*vector([0,1/h,0])
or by inspection we have \[ A = - \frac{1}{2 h} \quad B = 0 \quad C = \frac{1}{2 h} \] giving \[ f^\prime(x) = \frac{ f(x + h) - f(x - h)}{ 2 h } \mathrm{.} \]
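Here is a small C sketch comparing this centered formula with the forward 2 point formula [\(\ref{f2_result}\)]; both cost two evaluations of \(f\), but the centered error should fall like \(h^2\) instead of \(h\) (roughly a quarter of the error each time \(h\) halves):

  /* forward 2 point versus centered 3 point on f(x) = sin(x) */
  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
      double x = 1.0;
      for (double h = 0.1; h > 1.0e-3; h /= 2.0) {
          double fwd = (sin(x + h) - sin(x)) / h;
          double cen = (sin(x + h) - sin(x - h)) / (2.0 * h);
          printf("h=%.5f fwd err=%.2e centered err=%.2e\n",
                 h, fabs(fwd - cos(x)), fabs(cen - cos(x)));
      }
      return 0;
  }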

Derivative 4 Points, Forward 1 From Center

\[ f^\prime(x) = A f(x - h) + B f(x) + C f(x + h) + D f(x + 2h) \] \begin{align} f^\prime(x) = & A \left[ f(x) - h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) - \frac{h^3}{6} f^{\prime\prime\prime}(x) + \dots \right] + \nonumber \\ & B \left[ f(x) \right] + \nonumber \\ & C \left[ f(x) + h f^\prime(x) + \frac{h^2}{2} f^{\prime\prime}(x) + \frac{h^3}{6} f^{\prime\prime\prime}(x) + \dots \right] + \nonumber \\ & D \left[ f(x) + 2 h f^\prime(x) + 2 h^2 f^{\prime\prime}(x) + \frac{4 h^3}{3} f^{\prime\prime\prime}(x) + \dots \right] \\ \end{align} \begin{align} 0 = & A + B + C + D \nonumber \\ \frac{1}{h} = & - A + 0 B + C + 2 D \nonumber \\ 0 = & A + 0 B + C + 4 D \nonumber \\ 0 = & -A + 0B + C + 8 D \\ \end{align}

  inverse(matrix([[1,1,1,1],[-1,0,1,2],[1,0,1,4],[-1,0,1,8]]))*vector([0,1/h,0,0])
\[ A = - \frac{1}{3 h} \quad B = - \frac{1}{2 h} \quad C = \frac{1}{h} \quad D = - \frac{1}{6 h} \] \[ f^\prime(x) = - \frac{1}{3 h} f(x - h) - \frac{1}{2 h} f(x) + \frac{1}{h} f(x + h) - \frac{1}{6 h} f(x + 2 h) \label{f4+1_r} \] Testing
  function(a * x**3 + b * x**2 + c * x + d, f, x)
  function(-f(x-h)/(3*h)-f(x)/(2*h)+f(x+h)/h-f(x+2*h)/(6*h), fprime, x)
  fprime(x)
gives \[ f^\prime(x) = 3 a x^2 + 2 b x + c = \frac{\mathrm{d}}{\mathrm{d}x} \left(ax^3+bx^2+cx+d\right) \mathrm{,} \] where of course we failed on the first try, which is why we need to do this check.

Derivative 4 Points, Back 1 From Center

From [\(\ref{f4+1_r}\)] with \(h \to -h\) we get \[ f^\prime(x) = \frac{1}{3 h} f(x + h) + \frac{1}{2 h} f(x) - \frac{1}{h} f(x - h) + \frac{1}{6 h} f(x - 2 h) \mathrm{.} \]

Derivative 5 Points, Centered

\[ f^\prime(x) = A f(x - 2 h) + B f(x- h) + C f(x) + D f(x + h) + E f(x + 2 h) \] \begin{align} f^\prime(x) = & A \left[ f(x) - 2 h f^1(x) + 2 h^2 f^2(x) - \frac{4 h^3}{3} f^3(x) + \frac{2 h^4}{3} f^4(x) - \dots \right] + \nonumber \\ & B \left[ f(x) - h f^1(x) + \frac{h^2}{2} f^2(x) - \frac{h^3}{6} f^3(x) + \frac{h^4}{24} f^4(x) - \dots \right] + \nonumber \\ & C \left[ f(x) \right] + \nonumber \\ & D \left[ f(x) + h f^1(x) + \frac{h^2}{2} f^2(x) + \frac{h^3}{6} f^3(x) + \frac{h^4}{24} f^4(x) + \dots \right] + \nonumber \\ & E \left[ f(x) + 2 h f^1(x) + 2 h^2 f^2(x) + \frac{4 h^3}{3} f^3(x) + \frac{2 h^4}{3} f^4(x) + \dots \right] \\ \end{align} where we again use the \(f^n(x)\), \(n = 1, 2, 3, 4\), notation instead of the prime notation for the derivatives. \begin{align} 0 = & A + B + C + D + E \nonumber \\ \frac{1}{h} = & - 2 A - B + 0 C + D + 2 E \nonumber \\ 0 = & 4 A + B + 0 C + D + 4 E \nonumber \\ 0 = & - 8 A - B + 0 C + D + 8 E \nonumber \\ 0 = & 16 A + B + 0 C + D + 16 E \\ \end{align}

  m := matrix([[1,1,1,1,1],[-2,-1,0,1,2],[4,1,0,1,4],[-8,-1,0,1,8],[16,1,0,1,16]])
  inverse(m)*vector([0,1/h,0,0,0])
\[ A = \frac{1}{12 h} \quad B = - \frac{2}{3 h} \quad C = 0 \quad D = \frac{2}{3 h} \quad E = - \frac{1}{12 h} \] \[ f^\prime(x) = \frac{1}{12 h} f(x - 2 h) - \frac{2}{3 h} f(x - h) + \frac{2}{3 h} f(x + h) - \frac{1}{12 h} f(x + 2 h) \label{dc5_result} \] Testing
  function(a * x**4 + b * x**3 + c * x**2 + d * x + e, f, x)
  function(f(x-2*h)/(12*h)-2*f(x-h)/(3*h)+2*f(x+h)/(3*h)-f(x+2*h)/(12*h), fprime, x)
  fprime(x)
gives \[ f^\prime(x) = 4 a x^3 + 3 b x^2 + 2 c x + d = \frac{\mathrm{d}}{\mathrm{d}x} \left(ax^4+bx^3+cx^2+dx+e\right) \mathrm{.} \]

Higher Derivatives

If we think of the first derivative as being exact for some particular order polynomial, applying the first derivative twice (or more) will also be exact, but with just a little more arithmetic round off when compared to making a direct formula.
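Here is a small C sketch of that claim: composing the centered 3 point first derivative with itself gives a second derivative, over a wider 5 point footprint, that is still exact for polynomials; we test it on a made up cubic (the symmetric form happens to be exact for cubics too):

  /* second derivative by applying the centered 3 point first
     derivative twice; exact for the cubic below, for any h */
  #include <stdio.h>
  static double f(double x)  { return 2*x*x*x - x*x + 3*x - 5; }
  static double f2(double x) { return 12*x - 2; } /* exact f'' */
  /* centered 3 point first derivative, treated as new "data" */
  static double fp(double x, double h)
  {
      return (f(x + h) - f(x - h)) / (2.0 * h);
  }
  int main(void)
  {
      double x = 1.5, h = 0.25; /* even a coarse h works */
      double dd = (fp(x + h, h) - fp(x - h, h)) / (2.0 * h);
      printf("composed f''=%.10f exact=%.10f\n", dd, f2(x));
      return 0;
  }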

We need to be careful how we rely on values farther back and forward when making higher derivatives.

Application

Since our data is given at a fixed rate we don't need to worry about dividing by zero. We do however need to be concerned about gaps in the data, which will look like steps in the function and delta function like behavior in the derivative. We start by just looking and seeing.


Differentiation Using Least Squares Fit Polynomials

First we change variables to scaled (dimensionless) variables defined by \[ \chi \equiv \frac{x - \delta}{h} \] where \(\delta\) is the value of \(x\) where we will get the derivative of \(f(x)\) with respect to \(x\) and \(h\) is the constant change in \(x\) between samples due to the constant sample rate. So in the scaled variable \(\chi\) the difference between samples is 1 and we will compute the derivative at \(\chi\) equals zero. So for convenience we define the function \(F(\chi)\) by \[ F(\chi) = f(x) = f(\delta + h \chi) \mathrm{.} \] Like above, we are using \(f(x) = f(\delta + h \chi)\) to refer to the actual data values. \(F(\chi)\) has the same values as \(f(x)\) but is a function of \(\chi\) and not \(x\). So \[ \frac{\mathrm{d}}{\mathrm{d}x} f(x) = \frac{1}{h} \frac{\mathrm{d}}{\mathrm{d} \chi} F(\chi) \mathrm{.} \]

We seek a polynomial function of the form \begin{align} \Psi_\theta(\chi) = & \theta_0 + \theta_1 \chi + \theta_2 \chi^2 + \theta_3 \chi^3 + \dots + \theta_n \chi^n \nonumber \\ = & \sum_{j=0}^n \theta_j \chi^j \\ \end{align} We call \(\Psi_\theta(\chi)\) the fit function, which is defined by a given set of \(\theta_j\) values for \(j = 0,1,2,\dots\,n\). If the fit is perfect then \[ \Psi_\theta(\chi) = F(\chi) \mathrm{.} \label{psi_eq_F} \] If that is so, the first derivative of \(f(x)\) at \(x = \delta\) is \(\frac{\theta_1}{h}\), and we see that higher derivatives are easy to get, giving the \(p\)-th derivative at \(x\) equal to \(\delta\) as \[ \left. \frac{\mathrm{d}^p}{\mathrm{d}x^p} f(x) \right|_{x=\delta} = \frac{p!}{h^p} \theta_p \mathrm{.} \] If the fit is not exact then we get an approximation for these derivatives. In our case, with noisy data, the approximation will be construed as a smoothing of the data. This can even be useful for taking the 0-th derivative, which is just a smoothing of the data by polynomial fitting.

We will have many values of \(\chi\), so we mark each one with a subscript: \(\chi_j\) is the \(j\)-th value of the variable \(\chi\), and from here out \(\chi\) refers to a vector with \(m\) values. We wish to find the value of the vector \(\theta\) which makes [\(\ref{psi_eq_F}\)] hold exactly for \(m = n + 1\), or, if \(m \gt n + 1\), minimizes the following cost function \[ \sum_{j=1}^m \left[\Psi_\theta(\chi_j) - F(\chi_j)\right]^2 \label{LR_cost} \mathrm{.} \] We define the matrix of column vectors like so \[ X \equiv \left[ \begin{array}{cc} \chi_1^0 & \chi_1^1 & \chi_1^2 & \chi_1^3 & \dots & \chi_1^n \\ \chi_2^0 & \chi_2^1 & \chi_2^2 & \chi_2^3 & \dots & \chi_2^n \\ \chi_3^0 & \chi_3^1 & \chi_3^2 & \chi_3^3 & \dots & \chi_3^n \\ \chi_4^0 & \chi_4^1 & \chi_4^2 & \chi_4^3 & \dots & \chi_4^n \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ \chi_m^0 & \chi_m^1 & \chi_m^2 & \chi_m^3 & \dots & \chi_m^n \end{array} \right] \] where we have \(m\) samples, and \(\chi_j\) where \(j = 1,2,3,\dots,m\) are particular \(\chi\) values so that \[ X \, \left[ \begin{array}{c} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \\ \vdots \\ \theta_n \end{array} \right] \equiv X \, \theta = \left[ \begin{array}{c} \Psi_\theta(\chi_1) \\ \Psi_\theta(\chi_2) \\ \Psi_\theta(\chi_3) \\ \Psi_\theta(\chi_4) \\ \vdots \\ \Psi_\theta(\chi_m) \\ \end{array} \right] \equiv \Psi_\theta(\chi) \mathrm{,} \] is the column vector of \(\Psi_\theta(\chi)\) values with vector \(\chi\) values \(\chi_1\), \(\chi_2\), \(\chi_3\) and so on. So by using linear least squares the \(\theta\) vector can be solved as (big jump here; this is the standard normal equations solution of linear least squares) \[ \theta = \left(X^T X\right)^{-1} X^T F(\chi) \, \mathrm{,} \label{lls_X} \] where \(F(\chi)\) is a vector of values for the corresponding vector function as in \[ F(\chi) = \left[ \begin{array}{c} F(\chi_1) \\ F(\chi_2) \\ F(\chi_3) \\ F(\chi_4) \\ \vdots \\ F(\chi_m) \end{array} \right] \mathrm{.} \]

This should give us the previous results that we got above for particular cases that have \(m = n + 1\). When \(m \gt n + 1\) we will be doing a smoothing of the data by under-fitting it with a lower order polynomial function, in effect removing squiggles.
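Here is a C sketch of [\(\ref{lls_X}\)] for a quadratic fit (\(n = 2\)) on the centered 5 point window \(\chi = -2, -1, 0, 1, 2\); the little Gauss-Jordan inverse is only for illustration, a real implementation would use a proper linear algebra routine:

  /* theta = (X^T X)^-1 X^T F for n = 2 (quadratic fit) on the
     centered 5 point window chi = -2,-1,0,1,2; prints the rows
     of (X^T X)^-1 X^T, whose row p gives theta_p from F */
  #include <stdio.h>
  #include <math.h>
  #define M  5  /* number of samples      */
  #define NP 3  /* n + 1 polynomial terms */
  int main(void)
  {
      double chi[M] = { -2, -1, 0, 1, 2 };
      double X[M][NP], A[NP][2*NP], C[NP][M];
      for (int i = 0; i < M; ++i)     /* build X */
          for (int j = 0; j < NP; ++j)
              X[i][j] = pow(chi[i], j);
      for (int i = 0; i < NP; ++i) {  /* A = [X^T X | I] */
          for (int j = 0; j < NP; ++j) {
              A[i][j] = 0.0;
              for (int k = 0; k < M; ++k)
                  A[i][j] += X[k][i] * X[k][j];
              A[i][NP + j] = (i == j) ? 1.0 : 0.0;
          }
      }
      for (int p = 0; p < NP; ++p) {  /* Gauss-Jordan, no pivoting */
          double d = A[p][p];
          for (int j = 0; j < 2*NP; ++j) A[p][j] /= d;
          for (int i = 0; i < NP; ++i) {
              if (i == p) continue;
              double s = A[i][p];
              for (int j = 0; j < 2*NP; ++j) A[i][j] -= s * A[p][j];
          }
      }
      for (int i = 0; i < NP; ++i) {  /* C = (X^T X)^-1 X^T */
          for (int k = 0; k < M; ++k) {
              C[i][k] = 0.0;
              for (int j = 0; j < NP; ++j)
                  C[i][k] += A[i][NP + j] * X[k][j];
              printf(" %9.6f", C[i][k]);
          }
          printf("\n");
      }
      return 0;
  }

The printed \(\theta_1\) row, \(\chi_j/10\), is the classic 5 point quadratic fit (Savitzky–Golay) first derivative filter; dividing by \(h\) gives the derivative with respect to \(x\).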

Derivative Forward 2 Points by Least Squares Fit Polynomials

We wish to get [\(\ref{f2_result}\)] by way of [\(\ref{lls_X}\)] so we have \[ X = \left[ \begin{array}{cc} 1 & 0 \\ 1 & 1 \\ \end{array} \right] \quad \mathrm{and} \quad F(\chi) = \left[ \begin{array}{c} F(0) \\ F(1) \\ \end{array} \right] \] Using axiom

  m := matrix([[1,0],[1,1]])
  t := transpose(m)
  inverse(t * m) * t
gives \[ \theta = \left[ \begin{array}{cc} 1 & 0\\ -1 & 1\\ \end{array} \right] \left[ \begin{array}{c} F(0) \\ F(1) \\ \end{array} \right] = \left[ \begin{array}{c} F(0) \\ F(1) - F(0)\\ \end{array} \right] \] so \[ \theta_1 = F(1) - F(0) = f(\delta + h) - f(\delta) \] giving \[ f^\prime(\delta) = \frac{f(\delta + h) - f(\delta)}{h} \mathrm{.} \] which is the same as [\(\ref{f2_result}\)] with \(x \to \delta\).

Derivative Centered 5 Points by Least Squares Fit Polynomials

We wish to get [\(\ref{dc5_result}\)] by way of [\(\ref{lls_X}\)] so we have \[ X = \left[ \begin{array}{ccccc} 1 & -2 & 4 & -8 & 16 \\ 1 & -1 & 1 & -1 & 1 \\ 1 & 0 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 & 16 \\ \end{array} \right] \quad \mathrm{and} \quad F(\chi) = \left[ \begin{array}{c} F(-2) \\ F(-1) \\ F(0) \\ F(1) \\ F(2) \\ \end{array} \right] \] Using axiom

  m := matrix([[1,-2,4,-8,16],[1,-1,1,-1,1],[1,0,0,0,0],_
       [1,1,1,1,1],[1,2,4,8,16]])
  t := transpose(m)
  inverse(t * m) * t
        + 0     0    1    0    0  +
        |                         |
        |  1     2        2      1|
        | --   - -   0    -   - --|
        | 12     3        3     12|
        |                         |
        |   1   2     5   2      1|
        |- --   -   - -   -   - --|
   (3)  |  24   3     4   3     24|
        |                         |
        |   1   1          1    1 |
        |- --   -    0   - -   -- |
        |  12   6          6   12 |
        |                         |
        |  1     1   1     1    1 |
        | --   - -   -   - -   -- |
        + 24     6   4     6   24 +
gives a \(\theta_1\) row with the same constant coefficients as in [\(\ref{dc5_result}\)], oh boy what fun. The third row in the above matrix can be used to get the 2nd derivative, scaled by \(2!/h^2\) as in the \(p\)-th derivative formula above. Note the first row is simple because it corresponds to the 0-th derivative, which is just the original data. Had we used more than 5 points in the fitting, the first row would get more interesting.
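As a quick check of that factor, here is a small C sketch applying the third row above to samples of \(\sin(x)\) and scaling by \(p!/h^p\) with \(p = 2\); the result should approximate \(\frac{\mathrm{d}^2}{\mathrm{d}x^2}\sin(x) = -\sin(x)\):

  /* second derivative from the theta_2 row of the matrix above:
     d2f/dx2 = 2! theta_2 / h^2 */
  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
      double x = 1.0, h = 0.01;
      double t2 = -sin(x - 2*h)/24.0 + 2.0*sin(x - h)/3.0
                  - 5.0*sin(x)/4.0 + 2.0*sin(x + h)/3.0
                  - sin(x + 2*h)/24.0;
      double d2 = 2.0 * t2 / (h * h);
      printf("numeric f''=%.8f exact=%.8f\n", d2, -sin(x));
      return 0;
  }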

Now, unexpectedly, we have a fast way, using axiom (or other like thing), to compute coefficients for calculating derivatives using polynomial function fitting.

Derivative 5 Points Forward

  m := matrix([[1,0,0,0,0],[1,1,1,1,1],[1,2,4,8,16],_
       [1,3,9,27,3*27],[1,4,16,4*16,16*16]])
  t := transpose(m)
  inverse(t * m) * t
Gives the same coefficients as in [\(\ref{df5_cof}\)] as we see in the second row in the matrix below.
        + 1     0     0    0    0 +
        |                         |
        |  25              4     1|
        |- --   4    - 3   -   - -|
        |  12              3     4|
        |                         |
        | 35     13  19     7  11 |
        | --   - --  --   - -  -- |
        | 24      3   4     3  24 |
        |                         |
        |   5   3          7     1|
        |- --   -    - 2   -   - -|
        |  12   2          6     4|
        |                         |
        |  1     1    1     1   1 |
        | --   - -    -   - -  -- |
        + 24     6    4     6  24 +

Derivative 6 Points Forward with \(x^4\) Polynomial Fit

  m := matrix([[1,0,0,0,0],[1,1,1,1,1],[1,2,4,8,16],_
[1,3,9,27,3*27],[1,4,16,4*16,16*16],[1,5,25,125,25*25]])
  t := transpose(m)
  inverse(t * m) * t
gives a matrix of
    | 251      5       5     5       5     1  |
    | ---     ---   - ---   ---   - ---   --- |
    | 252     252     126   126     252   252 |
    |                                         |
    |  1375   506      67    248   811      50|
    |- ----   ---   - ---  - ---   ---   - ---|
    |   756   189     189    189   756     189|
    |                                         |
    | 155      349   67     107     209    55 |
    | ---    - ---   --     ---   - ---   --- |
    | 144      144   72      72     144   144 |
    |                                         |
    |   55    149      41     49   121      35|
    |- ---    ---   - ---  - ---   ---   - ---|
    |  216    216     108    108   216     216|
    |                                         |
    |   1       1     1      1       1     1  |
    |  --    - --    --     --    - --    --  |
    |  48      16    24     24      16    48  |
We note that the \(\theta_1\) row, the 2nd row, looks reasonable; it is a little more complex because we are no longer doing an exact polynomial fit.
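To preview where this is heading, the Savitzky–Golay filter, here is a C sketch applying the centered \(n = 2\), \(m = 5\) least squares first derivative filter (coefficients \(\chi_j/10\), as computed in the sketch above) to noisy samples of \(\sin(x)\), next to the plain 2 point difference; the noise amplitude is made up for illustration:

  /* least squares (n = 2, m = 5, centered) derivative filter on
     noisy samples, versus the plain 2 point difference */
  #include <stdio.h>
  #include <stdlib.h>
  #include <math.h>
  #define LEN 200
  int main(void)
  {
      double h = 0.05, f[LEN];
      srand(1);
      for (int i = 0; i < LEN; ++i)   /* sin() plus small noise */
          f[i] = sin(h*i) + 0.005*((double)rand()/RAND_MAX - 0.5);
      for (int i = 2; i < LEN - 2; i += 40) {
          double raw = (f[i+1] - f[i]) / h;
          double fit = (-2*f[i-2] - f[i-1] + f[i+1] + 2*f[i+2])
                       / (10.0 * h);
          printf("x=%.2f exact=%.4f 2pt=%.4f fit=%.4f\n",
                 h*i, cos(h*i), raw, fit);
      }
      return 0;
  }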

Better Notation

There are three integer parameters of interest: \(n\), the largest polynomial power, as in \(x^n\); \(m\), the number of data values used, which are \(F(\chi)\) function values; and \(\chi_1\), where by convention we order the component values of \(\chi\) making \(\chi_1\) the lowest and \(\chi_m\) the highest. So the integer triplet \(n, m, \chi_1\) is enough to define a method to approximate a derivative (so far). We may use this \(n\), \(m\) convention in the C and axiom code.

See axiom file diff_coef.input which can generate many derivative coefficients at one time.

Making the Fit Function More Local

We wish to reduce the effect of distant points on local values. We change the effect that distant points have by adding a weight function to the cost function that we are minimizing, rewriting [\(\ref{LR_cost}\)] like so \[ \sum_{j=1}^m w_j \left[\Psi_\theta(\chi_j) - F(\chi_j)\right]^2 \label{LRW_cost} \, \mathrm{,} \] where \(w_j\) is a weight function with \[ w_j \geq 0 \, \mathrm{.} \] We call \(w_j\) a weight function, but it's just the discrete sampled values of a weight function. A common weight function is the Gaussian function, as in \[ w_j = \mathrm{e}^{-\frac{\chi_j^2}{2 \sigma}} \, \mathrm{,} \] where \(\sigma\) is a chosen positive constant. Note that this Gaussian function is centered and peaked at \(\chi_j = 0\). Using this Gaussian weight function we decrease the weight, and so the contribution to the cost, of a point as the magnitude of its \(\chi_j\) increases, which we like.

Again we can exactly solve to find the value of vector \(\theta\) that minimizes the new cost function [\(\ref{LRW_cost}\)], from weighted linear least squares: \[ \theta = \left(X^T W X\right)^{-1} X^T W F(\chi) \, \mathrm{,} \] where \(W\) is a diagonal matrix with elements \(w_j\) on the diagonal.
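For \(n \le 2\) on a symmetric window with symmetric weights, the odd and even powers of \(\chi\) decouple in \(X^T W X\), so \(\theta_1\) has the closed form \(\sum_j w_j \chi_j F(\chi_j) / \sum_j w_j \chi_j^2\). Here is a small C sketch of the resulting weighted coefficients, with \(\sigma\) chosen arbitrarily; as \(\sigma \to \infty\) they fall back to the unweighted \(\chi_j/10\):

  /* weighted least squares first derivative coefficients for a
     quadratic fit (n = 2) on a symmetric 5 point window; with
     symmetric weights theta_1 = sum(w*chi*F) / sum(w*chi*chi) */
  #include <stdio.h>
  #include <math.h>
  int main(void)
  {
      double sigma = 1.0; /* chosen width of the Gaussian weight */
      double chi[5] = { -2.0, -1.0, 0.0, 1.0, 2.0 };
      double w[5], norm = 0.0;
      for (int j = 0; j < 5; ++j) {
          w[j] = exp(-chi[j]*chi[j]/(2.0*sigma));
          norm += w[j]*chi[j]*chi[j];
      }
      /* coefficient of F(chi_j) in theta_1; multiply by 1/h to
         get the derivative with respect to x */
      for (int j = 0; j < 5; ++j)
          printf("coef[%d] = %f\n", j, w[j]*chi[j]/norm);
      return 0;
  }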

Application

TODO: add weight function.

We wrote the libst library C function

  void stDeriv(const StReal_t *fromArray, StReal_t *toArray, size_t arrayLen, int derivNum, int n, int m)
It generates an array of floating point values that represent the derivNum derivative (0, 1, 2, 3, ...) of an array, using a polynomial fit with highest order term \(x^n\) over \(m\) points around each data point. \(m\) must be at least \(n+1\). If \(m\) is greater than \(n+1\) there will be smoothing. For noisy data we can use \(m \gt n+1\) to make the resulting data smoother, i.e. look less noisy. Getting smooth derivatives is the whole point of this process. Some of the C code in stDeriv() is generated with open-axiom; axiom works too, but not non-interactively.
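A hedged usage sketch follows; the prototype is the one above, but the libst header name is not shown in these notes, StReal_t is assumed here to be a floating point typedef, and whether the output is scaled by the sample spacing is not specified, so this only illustrates the calling pattern:

  /* calling pattern sketch for stDeriv(); link against libst and
     include its real header instead of the stand-ins below */
  #include <stddef.h>
  #include <stdio.h>
  #include <math.h>
  typedef double StReal_t; /* stand-in; libst defines the real type */
  void stDeriv(const StReal_t *fromArray, StReal_t *toArray,
               size_t arrayLen, int derivNum, int n, int m);
  #define LEN 1000
  int main(void)
  {
      StReal_t in[LEN], out[LEN];
      for (size_t i = 0; i < LEN; ++i)
          in[i] = sin(0.01 * i); /* pretend measured data */
      /* first derivative, cubic fit (n = 3) over m = 7 points;
         m > n + 1, so this smooths as well as differentiates */
      stDeriv(in, out, LEN, /*derivNum*/ 1, /*n*/ 3, /*m*/ 7);
      printf("out[500] = %f\n", (double)out[500]);
      return 0;
  }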

We must keep in mind that the 0th derivative is a very important data smoothing function, and that the level of smoothing used in these operations, in combination, can have beneficial and adverse effects.

It may be that we are seeking the most reversible numerical operations. For example, we want to preserve the following property of analytic functions, so that we have an analogous numerical concept of \[ \int \frac{\mathrm{d}f(x)}{\mathrm{d} x} \mathrm{d} x = f(x) + \mathrm{constant} \, \mathrm{.} \]
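Here is a numerical sketch of that idea in C: a trapezoid rule cumulative sum applied to a centered difference derivative of \(\sin(x)\) recovers the sampled function, up to the integration constant (here taken to be \(f(0)\)) plus accumulated round off and truncation error:

  /* differentiate then integrate back: the numerical analogue of
     integral of df/dx dx = f(x) + constant */
  #include <stdio.h>
  #include <math.h>
  #define LEN 101
  int main(void)
  {
      double h = 0.05, f[LEN], d[LEN], g;
      for (int i = 0; i < LEN; ++i)
          f[i] = sin(h * i);
      d[0] = (f[1] - f[0]) / h;              /* forward at the edge  */
      d[LEN-1] = (f[LEN-1] - f[LEN-2]) / h;  /* backward at the edge */
      for (int i = 1; i < LEN - 1; ++i)      /* centered inside      */
          d[i] = (f[i+1] - f[i-1]) / (2.0 * h);
      g = f[0];  /* the integration "constant" */
      for (int i = 1; i < LEN; ++i) {
          g += 0.5 * h * (d[i-1] + d[i]);    /* trapezoid rule */
          if (i % 25 == 0)
              printf("x=%.2f f=%.6f recovered=%.6f\n", h*i, f[i], g);
      }
      return 0;
  }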

References