Usage Guide

As a more detailed usage example, we will go through the analysis of a simulated FTIR spectrum.

A note on plotting

NoisySignalIntegration provides several "recipes" for Plots.jl to easily plot the various (interim) results. Often, merely calling plot() and passing in data types from NoisySignalIntegration will work. Examples are included in this guide.

Suppose our spectrum looks like the following simulation:

using Distributions, NoisySignalIntegration, Plots
using Random: seed!

spectrum = NoisySignalIntegration.testdata_1()
plot(spectrum, label="simulated spectrum")

In order two apply the NoisySignalIntegration uncertainty analysis, we must perform 4 basic steps:

From the spectrum, crop the region that contains the signals and the region that contains a representative sample of the noise
Characterize the noise (to be able to simulate it in the Monte-Carlo draws)
Set integration bounds and their associated uncertainties
Run the mc_integrate function

Cropping the spectrum

Let's start by dividing the spectrum into the bands we want to integrate and the noise we want to analyse. We can do this by using the crop function.

slice_bands = crop(spectrum,  5.0,  40.0)
slice_noise = crop(spectrum, 40.0, 100.0)

plot(slice_bands; label="bands")
plot!(slice_noise; label="noise")

Noise analysis

The spectrum has a quite considerable baseline which constitutes a problem when analysing the noise. To prepare the noise spectrum slice_noise for analysis, we create a NoiseSample object. Upon construction of the NoiseSample object, a polynomial is fitted and subtracted to remove the baseline:

noise = NoiseSample(slice_noise, 3) # 3 = remove third order polynomial baseline

Plotting the original slice and the NoiseSample object shows the data after baseline removal:

plot(slice_noise, label="cropped noise")
plot!(noise, label="NoiseSample object")

In order to simulate the noise, we must determine its characteristics. A model is retrieved by fitting the estimated autocovariance:

nm = fit_noise(noise)

# plot the fitting result:
plotautocovfit(noise, nm);
# create zoomed inset:
lens!([0, 1.5], [-1e-3, 3e-3], inset = (1, bbox(0.3, 0.3, 0.3, 0.3)))

Plotting the model next to the noise object is an important sanity check to verify that the fitting yielded a sensible estimate and that generated noise samples do mimic the experimental noise.

plot(noise, nm)

plotting more samples

They keyword draws controls how many random generated noise draws are plotted:

plot(noise, nm; draws=5) # draw 5 instead of 3 (default) noise draws

This also works when you plot Monte-Carlo draws.

Preparing the spectrum for integration

Now that we have a noise model, we can generate an UncertainCurve. An UncertainCurve holds random draws of the original spectrum plus noise:

uncertain_spectrum = add_noise(slice_bands, nm, 50_000) # generate 50_000 random samples

If we plot the uncertain_spectrum, we get a ribbon plot showing a 95% confidence band:

plot(uncertain_spectrum)

We can also plot single draws by using the mcplot() function from MonteCarloMeasurements.jl:

using MonteCarloMeasurements

mcplot(uncertain_spectrum; draws=20)

Integration bounds

NoisySignalIntegration deals with uncertainty in placing integration bounds by expressing each bound by one ore more probability distributions. Any continuous, univariate distribution from Distributions.jl can be used to define integration bounds.

To define integration bounds, the UncertainBound object is used. There are several options available to create an UncertainBound:

Passing two distributions
Passing a position, a distribution, and an UncertainCurve
Passing a vector of positions, a distribution, and an UncertainCurve

Defining an `UncertainBound` using a start and end point

If two distributions are passed to UncertainBound(), they will be interpreted as start and end points for the integration window with the uncertainty of these points being expressed by the spread of the distributions.

For example, let's say we want to integrate the right peak of our simulated spectrum:

plot(crop(spectrum, 20, 40), label="right peak")
plot!([27, 32], [1.3, 1.3]; markershape=:cross, label="integration interval")

It looks like integrating from about 27 to 32 would be appropriate, but there is some doubt of the exact location of the integration bounds. Perhaps a reasonable estimate is that the left bound falls in the range from 26 to 27 and the right bound in the range from 32 to 33. This would be expressed with a UncertainBound that is defined using two uniform distributions:

lrb = UncertainBound(Uniform(26, 27), Uniform(32, 33)) # 10 k samples by default

Upon creation of the UncertainBound object, pseudo random samples of the integration start and end point are drawn. If we do not provide the number of samples, it will default to 10 000. We can plot the bound as a histogram to see the distribution of the start and end point:

histogram(lrb; label=["start" "end"])

The uniform distribution is of course a bit of an awkward choice, because its probability density suddenly drops to 0, which perhaps does not model one's belief about the position of the integration bounds very well.

Due to the central limit theorem and the general applicability of the normal distribution, it is often a natural choice when dealing with uncertainties:

lrb_normal = UncertainBound(Normal(26.5, 0.5), Normal(32.5, 0.5), 12_000) # we draw 12_000 samples, just to illustrate how it works

histogram(lrb_normal; label=["start" "end"])

However, in this particular case of describing the uncertainty of integration bounds, the tails of the normal distribution are problematic, because they lead to occasional extreme values of the integration interval, which would not seem realistic.

A compromise between the uniform and normal distribution is a scaled and shifted beta(2, 2) distribution. Its shape resembles the shape of the normal distribution but it is missing the tails. Since a scaled and shifted beta distribution does not ship with Distributions.jl, NoisySignalIntegration includes the function scale_shift_beta(α, β, a, b) which can be used to generate a beta(α, β) distribution that has a support in the interval a to b.

Again, a demonstration may help to explain (we keep the normal distribution for the right bound so we can compare the distributions):

lrb_beta_normal = UncertainBound(scale_shift_beta(2, 2, 26, 27), Normal(32.5, 0.5))

histogram(lrb_beta_normal; label=["start" "end"], normalize=true)

Defining an `UncertainBound` using a position, width, and `UncertainCurve` (symmetric bands)

For some spectra, one can assume that a band is more or less symmetric. In this case, it may be better to define an integration window not by a start and end point but merely by a width, and integrate the band symmetrically around its peak position ± half this width.

To accomplish this, one has to construct an UncertainBound object by passing a position (the peak position of the symmetic band), a distribution that describes the uncertainty in the integration window width (here width_distribution), and an UncertainCurve that holds samples of the spectrum (here uncertain_spectrum):

position = 15.0
 # widths will fall in the range 2 to 3, with a maximum at 2.5
width_distribution = scale_shift_beta(2, 2, 2, 3)
# define a "width bound"
wb = UncertainBound(position, width_distribution, uncertain_spectrum)

From the provided data, the UncertainBound object is created as follows:

The width distribution is used to draw as many random samples of the integration window width $w$ as the uncertain_spectrum contains spectral samples
For each spectral sample in uncertain_spectrum, the peak position $p_x$ in the range position ± $\frac{w}{2}$ is retrieved
The peak position $p_x$ is used to define the start and end point of the integration window for each spectral sample, $p_x - \frac{w}{2}$ and $p_x + \frac{w}{2}$

Therefore, what is stored in the UncertainBound object are again start and end points for the integration. We can verify this by plotting another histogram:

histogram(wb; label=["start" "end"], normalize=true, linewidth=0)

The crucial difference compared to the bound defined from two distributions is that the start and end points are now placed symmetrically with respect to the band's peak position. The UncertainBound now "follows" the peak in each Monte-Carlo draw, so to speak.

Defining an `UncertainBound` using several positions, a width, and `UncertainCurve` (several symmetric bands with same width)

If, for example from a physical argument, we can say that two bands should have the same width, we can constrain our UncertainBounds even further: we can create several bounds that share the exact same integration window width in each draw.

All we have to do is to provide the constructor of UncertainBound not with a single position, but with an array of several positions:

positions = [15.0, 30.0]

wb_1, wb_2 = UncertainBound(positions, width_distribution, uncertain_spectrum)

Note that the constructor will then return an array of UncertainBound objects which we unpacked into the variables wb_1 and wb_2 in the example above.

The histograms of the start and end points looks like this:

histogram( wb_1; label=["start 1" "end 1"], normalize=true, linewidth=0)
histogram!(wb_2; label=["start 2" "end 2"], normalize=true, linewidth=0)

It is not obvious from the histograms that the widths of the integration windows stored in wb_1 and wb_2 are identical, so we calculate and print them here manually to prove this:

for i in 1:10
    l1 = wb_1.left.particles[i]
    l2 = wb_2.left.particles[i]
    r1 = wb_1.right.particles[i]
    r2 = wb_2.right.particles[i]

    println("draw $i: width 1 = ", r1 - l1, " width 2 = ", r2 - l2)
end

draw 1: width 1 = 2.508220740749886 width 2 = 2.508220740749884
draw 2: width 1 = 2.3693963358500785 width 2 = 2.3693963358500767
draw 3: width 1 = 2.4979266547829138 width 2 = 2.4979266547829155
draw 4: width 1 = 2.3238679213036786 width 2 = 2.3238679213036804
draw 5: width 1 = 2.335871581391256 width 2 = 2.3358715813912525
draw 6: width 1 = 2.029793123091501 width 2 = 2.029793123091501
draw 7: width 1 = 2.692943688217433 width 2 = 2.692943688217433
draw 8: width 1 = 2.110326439755454 width 2 = 2.110326439755454
draw 9: width 1 = 2.517320261240366 width 2 = 2.5173202612403642
draw 10: width 1 = 2.13427868379193 width 2 = 2.1342786837919263

Watch out for the support of your width distribution

Note that the distribution that you pass to UncertainBound along with a position/positions must not allow for negative values (i.e. its support must end before 0). Keep in mind that a normal distribution, for example, has support from -∞ to ∞, so it is a poor choice here.

Plotting Monte-Carlo draws

To verify that the integration windows and derived integrals are sensible, it is a good idea to plot a few draws and integrals before running the full Monte-Carlo algorithm. We can do so by passing an UncertainCurve and an array of UncertainBounds to the plot function:

plot(uncertain_spectrum, [wb_1, wb_2]; size=(400, 500), local_baseline=true)

We can see from the plot that our estimate for the width of the peaks was perhaps a bit too small, so we retry:

width_distribution = scale_shift_beta(2, 2, 3, 4) # width will fall in the range [3, 4]
wb_1, wb_2 = UncertainBound(positions, width_distribution, uncertain_spectrum)

plot(uncertain_spectrum, [wb_1, wb_2]; size=(400, 500), local_baseline=true)

An alternative to the plot function is the animate_draws function which allows you to visualize the Monte-Carlo draws in a gif animation:

NoisySignalIntegration.animate_draws(
    uncertain_spectrum, [wb_1, wb_2];
    size=(300, 150),
    local_baseline=true,
    draw_band_centers=true,
)

The vertical lines are band centers (see mc_bandcenter), which are drawn if the keyword argument draw_band_centers is true.

Running the integration algorithm

The integration is performed with the function mc_integrate. We have to pass in the uncertain spectrum and integration bounds. Since we pass in two integration bounds, we retrieve two areas:

area_1, area_2 = mc_integrate(uncertain_spectrum, [wb_1, wb_2]; local_baseline=true)

2-element Vector{MonteCarloMeasurements.Particles{Float64, 50000}}:
 0.804 ± 0.13
 1.93 ± 0.13

We can look at the histogram of the integrals:

histogram([area_1.particles, area_2.particles]; label=["band area 1" "band area 2"])

Or of the peak area ratio, simply by calculating with the retrieved areas:

ratio = area_1 / area_2
histogram(ratio.particles; label="band area ratio (band 1/band 2)")

We see that the histogram of peak area ratio peaks around 0.5, which is what we put into the simulation of the spectrum.

We can use some basic statistical functions to characterize the result:

Statistics on `MonteCarloMeasurements` uncertain numbers

To calculate statistics on uncertain numbers (which are implemented using the MonteCarloMeasurements package), you have to prefix the statistics function names with the letter p, e.g. pmean(), pstd(), etc., see the MonteCarloMeasurements documentation.

pmean(ratio)

0.41792104677106023

pstd(ratio)

0.07488909421860397

pquantile(ratio, 0.025)

0.2760005180511169

pquantile(ratio, 0.5)

0.4158723852463002

pquantile(ratio, 0.975)

0.570985589289268

We find that, considering the noise and the uncertainty in the integration bounds, we end up with a 95% uncertainty interval of roughly 0.4 to 0.7.

Sensitivity analysis

It is perfectly valid to create several UncertainBounds for one and the same band and feed them into mc_integrate(), e.g. to perform a sensitivity analysis on how much the result depends on the kind and parameters of the bounds.

You find more usage examples on the next page. In particular, check out the error propagation example to see how to proceed with uncertainty calculations with the retrieved areas using MonteCarloMeasurements.jl.