Laser Induced Breakdown Spectroscopy Analysis¶

Tutorial 4: LIBS Analysis using Python¶

Having looked at the options using tools of increasing complexity we come to using my favourite tool for LIBS Analysis - Python. I like Python because it is reasonably easy to start, and it encourages experimentation - not everything has to be right before you can start getting something out of it.

I use WinPython as it offers (at least on Windows) a version that works out of the box.

The source files for these tutorials are Jupyter notebooks which enable me to combine the code and the explanation together. The files are available from the Elementia Consulting Github if you want to try/adapt these for your own use.

About Notebooks¶

The beauty of Jupyter notebooks is that they allow the combining together of Python script and an explanation of what is happening. If you are viewing this on the libs-info website, then the file won't be interactive - you will just the output. If you download the notebook files from the Github, then they can be run live and the results seen.

If you are only interested in the techniques used, then you can skip the Python code lines and their explanation. The pure code commentary will be marked with italics

In the next could of lines, we initialise the Python code that we need to run the calculations. We will be using the Bokeh for the charts

import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import Range1d
from IPython.display import display, Markdown, HTML

from elementia.tutorial_fns import calculate_cross_validation, calculate_intensity_cv, calculate_sum_cv

output_notebook()

Importing the data¶

For this work we will use the following samples, and analyse for Sodium (Na)

Sample	Na. Conc (ppm)
OREAS45e	594
OREAS501b	20848
OREAS601	14543
OREAS921	6068
OREAS603	4283
OREAS933	1515
OREAS903	301

To do this, we use the Python Data Analysis Library (PANDAS). This package is included in most Python implementations.

PANDAS has a read_csv function that handles the import of the file, placing into a PANDAS dataframe, which has a range of usefule features which we will take advantage of - the first of which is that we can use the column heading to label the data

conc_data = pd.read_csv('.\\files\\Na_concs.csv')
print(conc_data)

   Sample  Na Conc (ppm)
0  501b_6          20848
1   601_7          14543
2   921_7           6068
3   603_7           4283
4   933_5           1515
5   45e_7            594
6   903_5            301

In this case we will import the same source CSV file as we have used previously and plot all the spectra

If you are using notebook file, then the tools to the left of the chart allow zooming, panning and saving the image to file

# load the data from file
spectra_data = pd.read_csv('.\\files\\OREAS_source.csv')
# and display the first 5 lines
print(spectra_data.head(5))

#use the Bokeh library, start a chart (or figure)
p1 = figure()

#on our new figure, display the intensities we have loaded from the files.  One of the many beauties of the PANDAS library we used to do 
# the import is that it allows us to use the column labels from the file to represent the columns of data.
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['501b_6'], legend='OREAS 501b', color='red')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['601_7'], legend='OREAS 601', color='orange')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['921_7'], legend='OREAS 921', color='yellow')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['603_7'], legend='OREAS 603', color='green')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['933_5'], legend='OREAS 933', color='blue')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['45e_7'], legend='OREAS 45e', color='indigo')
p1.line(x=spectra_data['Wavelength (nm)'], y=spectra_data['903_5'], legend='OREAS 903', color='violet')

show(p1)

   Wavelength (nm)  501b_6  601_7  921_7  603_7  933_5  45e_7  903_5
0         186.4672   206.1   73.9  278.1   57.7  395.6   88.9  116.5
1         186.5354   204.8   85.3  283.4   75.2  392.7   90.3  117.3
2         186.6037   211.0   79.8  285.5   70.5  402.3   92.7  132.2
3         186.6720   218.9   94.6  293.1   72.7  408.3   94.5  137.6
4         186.7402   229.7   94.4  300.6   88.2  415.1  105.0  142.8

Analysing for Sodium¶

589nm and 589.59nm Peaks¶

Restricting our chart from above to just the Na peaks at 589/589.5nm gives a similar plot to previously.

#restrict the range of the x axis (wavelengths)
p1.x_range = Range1d(585.0, 594.0)

#and reshow the plot
show(p1)

When we used the spreadsheet for this analysis, we were forced to manually inspect the data to find the nearest points for our calculations. Here we can ask Python to do it for us.

# creating a PANDAS index allows for a range of search functions.
wl_index = pd.Index(spectra_data['Wavelength (nm)'])

# firstly find the index (that is row number) in the wavelength list that is closest to 588.0nm.  Store it in the index_588 variable for use later
index_588 = wl_index.get_loc(588.0, method='nearest')

# repeat this for index closest to 589.2nm, 590nm and 587.5nm (background point).
index_589 = wl_index.get_loc(589.2, method='nearest')
index_590 = wl_index.get_loc(590.0, method='nearest')
index_5875 = wl_index.get_loc(587.5, method='nearest')

print("Nearest Point to 588nm is ", spectra_data['Wavelength (nm)'][index_588], 'nm which is at index:', index_588)
print("Nearest Point to 589.2nm is ", spectra_data['Wavelength (nm)'][index_589], 'nm which is at index:', index_589)
print("Nearest Point to 590nm is ", spectra_data['Wavelength (nm)'][index_590], 'nm which is at index:', index_590)
print("Nearest Point to 587.5nm is ", spectra_data['Wavelength (nm)'][index_5875], 'nm which is at index:', index_5875)

Nearest Point to 588nm is  587.9515 nm which is at index: 4289
Nearest Point to 589.2nm is  589.1783 nm which is at index: 4299
Nearest Point to 590nm is  590.0361 nm which is at index: 4306
Nearest Point to 587.5nm is  587.4603 nm which is at index: 4285

Next we ask for the highest value between the our indices

# we will build a list of the peak intensities
intensities_589 = []

# go through all the samples
for sample in conc_data['Sample']:
    #finding the peak/maximum intensity between the rows representing our wavelengths (and subtracting off the background point)
    intensities_589.append(max(spectra_data[sample][index_588:index_589])-spectra_data[sample][index_5875])
    
# repeat this process for the 589.5nm peak
intensities_5895 = []
for sample in conc_data['Sample']:
    intensities_5895.append(max(spectra_data[sample][index_589:index_590])-spectra_data[sample][index_5875])

#and display the intensities vs concentrations plots.
p2 = figure()
p2.xaxis.axis_label = "Intensity (Arb Units)"
p2.yaxis.axis_label = "Concentration (ppm)"
p2.scatter(intensities_589, conc_data['Na Conc (ppm)'], legend="589nm", color="red")
p2.scatter(intensities_5895, conc_data['Na Conc (ppm)'], legend="589.5nm", color="blue")

show(p2)

In the spreadsheet example, we got Excel to calculate the slope of the curve.

Mathematically, the slope can be calculated using

$\begin{aligned} slope = \frac{x \cdot y}{x \cdot x} \end{aligned} $

#while we could have done this calculation many different ways, the python NumPy library has a function for calculating 
# the dot product between 2 vectors.
# The cost is that the lists we made before need to be converted into the NumPy array format.
np_intensities_589 = np.array(intensities_589)
np_intensities_5895 = np.array(intensities_5895)

#now the slopes are a straightforward calculation
slope_589 = np.dot(np_intensities_589, conc_data['Na Conc (ppm)'])/np.dot(np_intensities_589,np_intensities_589)
slope_5895 = np.dot(np_intensities_5895, conc_data['Na Conc (ppm)'])/np.dot(np_intensities_5895,np_intensities_5895)
print("Slope of 589nm Curve:", slope_589)
print("Slope of 589.5nm Curve:", slope_5895)

Slope of 589nm Curve: 0.3402271774176682
Slope of 589.5nm Curve: 0.4058018653231595

Plotting these on our curve, alongside the data shows (again) that the peak intensity of the Na lines at 589nm are unlikely to give us good/useable calibration curves

p2.line([0.0, 40000.0], [0.0, slope_589*40000], color="red")
p2.line([0.0, 40000.0], [0.0, slope_5895*40000], color="blue")
show(p2)

818.3nm and 819.4nm Na Peaks¶

As we did previously, we will now investigate the Na peaks at 818.3nm and 819.4nm

# change our previous plot to better display our new region of interest.
p1.x_range = Range1d(815, 825.0)
p1.y_range = Range1d(0, 7000)
show(p1)

As the concentration of Na reduces, there still remains some underlying structure in the spectra, which makes the 818.3nm line unlikely to be useable at low Na concentrations. For this work we will not use it.

As before, we will get Python to find the nearest peaks for the region in which we will search for the highest intensity (819.1-819.9nm) and for a measure of the underlying background signal (817.5nm)

#Similar to before, find the indices (row numbers) of interest 
index_8191 = wl_index.get_loc(819.1, method='nearest')
index_8199 = wl_index.get_loc(819.9, method='nearest')
index_8175 = wl_index.get_loc(817.5, method='nearest')

# caculate the peak intensities (removing the background at 817.5nm)
intensities_8194 = []
for sample in conc_data['Sample']:
    intensities_8194.append(max(spectra_data[sample][index_8191:index_8199])-spectra_data[sample][index_8175])
    
# show the figure    
p3 = figure()
p3.xaxis.axis_label = "Intensity (Arb Units)"
p3.yaxis.axis_label = "Concentration (ppm)"
p3.scatter(intensities_8194, conc_data['Na Conc (ppm)'], color="red")

# with the calibration curve
np_intensities_8194 = np.array(intensities_8194)
slope_8194 = np.dot(np_intensities_8194, conc_data['Na Conc (ppm)'])/np.dot(np_intensities_8194,np_intensities_8194)
p3.line([0.0, 6000.0], [0.0, slope_8194*6000], color="red")

show(p3)

Analysis Performance¶

As with previous work, while a visual inspection of the quality of a calibration curve can tell an experienced researcher quite a bit, deciding the which curve to use should have some more formal measures. In Tutorial 2, we used LOD and the linear regression uncertainty as measures.

Here we introduce a new one - Root Mean Square Error in Prediction (RMSEP) or sometimes referred to as Root Mean Square Error in Cross Validation The aim of this measure is to provide an estimate of the uncertainty in the concentration value from the calibration curve.

Calibration Curves for Intensity¶

Na Peak at 819.4nm¶

# As a first example, lets calcuate the calibration curve without the point with the maximum concentration.
# So, as before, calculate the intensities
intensities_8194_drop_max = []
for sample in conc_data['Sample'][1:]:
    intensities_8194_drop_max.append(max(spectra_data[sample][index_8191:index_8199])-spectra_data[sample][index_8175])
    
#now manually calculate the intensity for the 'missing' point
intensity_501b_6 = max(spectra_data['501b_6'][index_8191:index_8199])-spectra_data['501b_6'][index_8175]
    
# calculate the slope of the calibration curve
np_intensities_8194_drop_max = np.array(intensities_8194_drop_max)
slope_8194_drop_max = np.dot(np_intensities_8194_drop_max, conc_data['Na Conc (ppm)'][1:])/np.dot(np_intensities_8194_drop_max,np_intensities_8194_drop_max)

# plot the new calibration curve [blue dotted line]
p3.line([0.0, 6000.0], [0.0, slope_8194_drop_max*6000], color="blue", line_dash='dotted')

# plot the predicted concentration [blue circle]
p3.scatter([intensity_501b_6], [(slope_8194_drop_max*intensity_501b_6)], color='blue')

print("Calculated Concentration:", slope_8194_drop_max*intensity_501b_6, "ppm. Actual Conc:", conc_data['Na Conc (ppm)'][0])
print("Diff (ppm):", slope_8194_drop_max*intensity_501b_6-conc_data['Na Conc (ppm)'][0] )

show(p3)

Calculated Concentration: 20019.69882664158 ppm. Actual Conc: 20848
Diff (ppm): -828.3011733584208

In this example the calibration curve without the highest concentration point, predicts a concentration of 20019.7ppm (a difference of 828.3ppm). This shows there is an internal consistency in the data - the prediction is reasonably close, even when the point is not used in the calibration.

Repeating this for each of the points in turn - that is:-

removing the point $I_i$ from the data set
performing a calibration without the missing point
analysing the removed point using the new calibration curve (making concentration $C^u_i$
determine the difference to the true value $C^s_i - C^u_i $

Adding the differences using the Root-Mean-Square formula, that is

$\begin{aligned} RMSEP = \sqrt{\frac{\sum_n \left(C^s_i - C^u_i \right)^2 }{n}} \end{aligned} $

# To simplify things, we have automated this in a function.
# If you are working from the Github, make sure you also download the elementia directory as this has the functions in it
cv_concs, rmsep = calculate_cross_validation(concs=conc_data['Na Conc (ppm)'], intensities=intensities_8194)

# plot the predicted concentrations on the chart
p3.scatter(intensities_8194, cv_concs, color='blue')

# and show the predicted concs
print(cv_concs)
# and the RMSEP
print(rmsep)

show(p3)

[20019.69882664158, 14830.32899507123, 6529.187070738327, 3763.894013992636, 2413.5634135692017, 814.078983752322, 977.7699656694091]
605.2949550009506

Our calibration curve using the peak intensities has a RMSEP = 605ppm. Not bad!

As a different way of visualising the results - let's plot the predicted concentrations vs the actual concentrations. Sometimes this can give insight into where our calibration is performing well and where it isn't

p4 = figure()
p4.xaxis.axis_label = "Predicted Concentration (ppm)"
p4.yaxis.axis_label = "Actual Concentration (ppm)"

p4.scatter(conc_data['Na Conc (ppm)'], cv_concs, color="red")

# The 1:1 line (that is if the point lies on this line, then there is an exact match between predicted and actual.)
p4.line([0.0, conc_data['Na Conc (ppm)'][0]], [0.0, conc_data['Na Conc (ppm)'][0]])

show(p4)

Na Peaks at 589nm and 589.5nm¶

As a comparison, we will do all the same work on the 589nm and 589.5nm peak intensities. That is,

calculate the predicted concentration from a calibration curve made without the point
calculate the difference between the predicted concentration and the actual concentration.
Calcuate the RMSEP

# Set up the Chart
p5 = figure()
p5.xaxis.axis_label = "Predicted Concentration (ppm)"
p5.yaxis.axis_label = "Actual Concentration (ppm)"
p5.line([0.0, conc_data['Na Conc (ppm)'][0]], [0.0, conc_data['Na Conc (ppm)'][0]])

# Calculate the RMSEP and Cross Validation Predictions for the 589nm peak
cv_concs_589, rmsep_589 = calculate_intensity_cv(wavelengths=spectra_data['Wavelength (nm)'], intensities=spectra_data, 
                                                 concs=conc_data['Na Conc (ppm)'], samples=conc_data['Sample'], 
                                                 wavelength_lower=588.0, wavelength_upper=589.25, wavelength_background=587.5)

# Calculate the RMSEP and Cross Validation Predictions for the 589.5nm peak
cv_concs_5895, rmsep_5895 = calculate_intensity_cv(wavelengths=spectra_data['Wavelength (nm)'], intensities=spectra_data, 
                                                 concs=conc_data['Na Conc (ppm)'], samples=conc_data['Sample'], 
                                                 wavelength_lower=589.25, wavelength_upper=591.0, wavelength_background=587.5)

# Plot the 589nm points in red
p5.scatter(conc_data['Na Conc (ppm)'], cv_concs_589, color="red")

# Plot the 589.5nm points in blue
p5.scatter(conc_data['Na Conc (ppm)'], cv_concs_5895, color="blue")

print("RMSEP (589nm):", rmsep_589)
print("RMSEP (589.5nm):", rmsep_5895)
show(p5)

RMSEP (589nm): 5877.700787954347
RMSEP (589.5nm): 4970.670933930147

The predicted points lie quite a way off the 1:1 line, and this is demonstrated in the RMSEP - at best 8 times worse than the 819.4nm peak.

Peak Area Calibration Curve for 819.4nm¶

Repeating similar work, we will calculate the Peak Area for each of the samples, constructing a calibration curve using these values and determine the RMSEP.

sum_8194 = []

#Calculate the peak area, subtracting off the background area.
for sample in conc_data['Sample']:
    sum_8194.append(sum(spectra_data[sample][index_8191:index_8199])-(index_8199-index_8191-1)*spectra_data[sample][index_8175])
    
p6 = figure()
p6.xaxis.axis_label = "Intensity (Arb Units)"
p6.yaxis.axis_label = "Concentration (ppm)"
p6.scatter(sum_8194, conc_data['Na Conc (ppm)'], color="red")

# calculate the slope of the fitting curve
np_sum_8194 = np.array(sum_8194)
slope_sum_8194 = np.dot(np_sum_8194, conc_data['Na Conc (ppm)'])/np.dot(np_sum_8194,np_sum_8194)
p6.line([0.0, 15000.0], [0.0, slope_sum_8194*15000], color="red")
print(slope_sum_8194)

show(p6)

1.3061347141817483

Now, calcuate the RMSEP

cv_concs_sum_8194, rmsep_sum_8194 = calculate_cross_validation(concs=conc_data['Na Conc (ppm)'], intensities=sum_8194)
p6.scatter(sum_8194, cv_concs, color='blue')

print("RMSEP (819.4nm):", rmsep_sum_8194)

show(p6)

RMSEP (819.4nm): 1058.5072423993122

As we found using the Spreadsheet example in Tutorial 2, the performance of the calibration curve using peak areas for the 819.4nm peak doesn't perform as well as the curve using peak intensity.

Replotting this to show the predicted vs the actual concentration shows performance is worse at both the high and low concentration ends.

p7 = figure()
p7.xaxis.axis_label = "Predicted Concentration (ppm)"
p7.yaxis.axis_label = "Actual Concentration (ppm)"
p7.scatter(conc_data['Na Conc (ppm)'], cv_concs_sum_8194, color="red")
p7.line([0.0, conc_data['Na Conc (ppm)'][0]], [0.0, conc_data['Na Conc (ppm)'][0]])

show(p7)

Combining our :-

RMSEP measures for 819.4nm, 589nm, 589.5nm and the combined 589/589.5nm peak
Standard Error (from Tutorial 2)

Tells a consistent story:-

The calibration curve using the maximum peak intensity of the 819.4nm Na peak should give the best results
For some applications, the 819.4nm peak area calibration curve may offer the best compromise of better LOD without too large an uncertainty penalty.

rmsep_sum_589 = calculate_sum_cv(wavelengths=spectra_data['Wavelength (nm)'], intensities=spectra_data, 
                                                 concs=conc_data['Na Conc (ppm)'], samples=conc_data['Sample'], 
                                                 wavelength_lower=588.0, wavelength_upper=589.25, wavelength_background=587.5)[1]
rmsep_sum_5895 = calculate_sum_cv(wavelengths=spectra_data['Wavelength (nm)'], intensities=spectra_data, 
                                                 concs=conc_data['Na Conc (ppm)'], samples=conc_data['Sample'], 
                                                 wavelength_lower=589.25, wavelength_upper=591.0, wavelength_background=587.5)[1]
rmsep_sum_589_5895 = calculate_sum_cv(wavelengths=spectra_data['Wavelength (nm)'], intensities=spectra_data, 
                                                 concs=conc_data['Na Conc (ppm)'], samples=conc_data['Sample'], 
                                                 wavelength_lower=588.0, wavelength_upper=591.0, wavelength_background=587.5)[1]
display(HTML("<table><tr><th>Peak (nm)</th> <th>Intensity RMSEP</th><th>Area RMSEP</th><th>Intensity SE (Tut 2)</th><th>Area SE (Tut 2)</th><th>Intensity LOD (Tut 2)</th><th>Area LOD (Tut 2)</th>"+
             "</tr><tr><td>819.4</td><td>%.1f</td><td>%.1f</td><td>562.0</td><td>743.4</td><td>80.0</td><td>6.0</td></tr>" % (rmsep, rmsep_sum_8194) +
             "</tr><tr><td>589</td><td>%.1f</td><td>%.1f</td><td>5054.2</td><td>2720.5</td><td>16.3</td><td>30.5</td></tr>" % (rmsep_589, rmsep_sum_589) +
             "</tr><tr><td>589.5</td><td>%.1f</td><td>%.1f</td><td>4178.2</td><td>2511.3</td><td>19.4</td><td>8.3</td></tr>" % (rmsep_5895, rmsep_sum_5895) +
            "</tr><tr><td>589/589.5</td><td></td><td>%.1f</td><td></td><td>2753.4</td><td></td><td>3.7</td></tr></table>" % (rmsep_sum_589_5895)))

Peak (nm)	Intensity RMSEP	Area RMSEP	Intensity SE (Tut 2)	Area SE (Tut 2)	Intensity LOD (Tut 2)	Area LOD (Tut 2)
819.4	605.3	1058.5	562.0	743.4	80.0	6.0
589	5877.7	3464.5	5054.2	2720.5	16.3	30.5
589.5	4970.7	3165.9	4178.2	2511.3	19.4	8.3
589/589.5		3335.9		2753.4		3.7