Persistent Topology for Peak Detection
This is a little story of how an abstract mathematical theory like persistent homology can be a practical tool for a usual task in industry.
At B+R Industrial Automation we have an open discussion culture and from time to time there are little take-away questions for people from computer science or mathematics. One such task was the following:
In order to identify the natural frequency of some mechanics attached to an ACOPOS servo drive one actuates the drive with a frequency sweep signal and records the response signal of the position. The (Fourier) spectrum of such a response signal could then look like in the picture below. The question is now how to detect the “main peaks” in this signal?
The original problem statement came from motion control, however, the underlying task carries more universality1: Given a real-valued signal resp. a time series, where are the “significant peaks”?
One of the first attempts would be to find maxima. The global maximum is not what we look for because we might be interested in this very significant local maximum at 10. However, all (or only some?) local maxima is probably also not what we look for: (i) the local maximum at 33 is quite insignificant compared to the local maximum at around 55 although the signal is higher and (ii) measured signals come with noise that cause many local maxima that are irrelevant for our analysis.
We could get rid of the noise by smoothing the signal. However, this often comes at costs: For instance the sharp peak at 10 would be quickly dampened and our impression on the “significance” of the peak would strongly altered. If possible we would always prefer to operate on the original data. But even if we would decide to smooth the signal, say, by a Gaussian smoothing kernel, what would be suitable standard deviation? Bringing bigger guns like scale space into the game appears to be inadequate to address the simple question originally raised.
What we look for is a method to identify the “relative maxima” in a stable, noise-resilient fashion, and, moreover, quantify the significance (or dominance) of these maxima. If we look for the eight most significant peaks in the signal above then we would expect something similar to this:
In simple words: Consider a water level starting above the global maximum that continuously lowers its level. Whenever the water level reaches a local maximum a new island is born. Whenever it reaches a local minimum two islands merge, and we consider the lower island to be merged into the higher island. This gives a precise notion of a lifespan of an island; it is the significance we mentioned above, or the so-called persistence in persistent homology. In this precise sense the above plot shows the eight most persistent islands/0-dimensional cycles on the original data.
The blue bars start at local maxima and grow downwards until the level where the corresponding island merges with another island. Only the island raised by the global maximum never dies; everything is eventually merged into it on the way to minus infinity. The blue bars are known as the barcode in persistent topology.
The persistence diagram is another representation of the features of persistent homology. It depicts every island as a point with birth- and death-level as the x- and y-coordinates. In our example the (0-dimensional) persistence diagram looks as follows:
An efficient algorithm
The above example is a simple case of the general problem of computing the persistent homology (over ) in a given dimension of a filtration of a simplicial complex. The classical approach to this is via matrix reduction (c.f. Gaussian elimination) by Edelsbrunner and Harer2 and takes time, where denotes the length of the signal in the above example, or the number of simplices of the underlying simplcial complex in general. This is what my C++ library libstick or one of the several other available implementations do.
In the simple case of the sublevel set filtration of a real-valued signal, however, we can do much better, namely in the speed of sorting numbers, which is , and the algorithm is very simple, too:
The general theme for this algorithm is to track growing islands and merge them from time to time. A union-find data structure is what we look for in general. However, since our sets are disjoint intervals on the real line, we can replace the part after sorting by signals values to a loop with a simple, constant-time body instead of otherwise time-optimal union-find operations.
Typically one considers the superlevel set filtration in which the water level rises and death levels are higher than birth levels, which then leads to the usual definition of persistence as death level minus birth level and dots in the persistence diagrams reside above the diagonal. However, the sublevel set filtration of one function is the superlevel set filtration of the negative function.
References and further reading
Examples for the 0-th and 1-st dimensional persistence diagram on two-dimensional input are given on the demonstration page of the libstick library. The getting started of libstick describes the general work flow from defining simplicial complexes to computing persistence diagrams.