To date, our analysis has used a single spectrum for each sample: the average of multiple spectra taken across the sample. This has simplified our analysis, but the averaging has discarded information about how the elements vary across the sample.
In this tutorial, we will look at what can be done with the information we had previously discarded.
As with the previous tutorial, those who are interested in the Python can read the code cells (which have comments); everyone else can skip to the next markdown cell for the outcomes.
For most LIBS systems and samples there will be variation between spectra. This is the result of factors such as shot-to-shot fluctuations in laser energy, heterogeneity in the sample itself, and changes in the sample surface and plasma conditions from shot to shot.
Any use of these spectra needs to quantify the effect of those variations on the result.
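Before we load any data, it helps to pin down how we will quantify that variation. The usual measure is the relative standard deviation (RSD), which we will meet again later in this tutorial. A minimal sketch on made-up intensities (the numbers are illustrative only, not from the tutorial data):

```python
import numpy as np

# ten illustrative peak intensities, one per laser shot (made-up values)
intensities = np.array([5100, 4800, 5500, 4600, 5300,
                        4900, 5200, 4700, 5000, 5400], dtype=float)

mean_i = intensities.mean()
std_i = intensities.std(ddof=1)      # sample standard deviation
rsd = 100.0 * std_i / mean_i         # relative standard deviation, in percent

print("mean = %.1f, std = %.1f, RSD = %.1f%%" % (mean_i, std_i, rsd))
```

An RSD of a few percent would be excellent for LIBS; as we will see below, real shot-to-shot spreads can be much larger.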
import pandas as pd
import numpy as np
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
from bokeh.models import Range1d
output_notebook()
# load the data from file
spectra_data = pd.read_csv('.\\files\\OREAS501b_6.txt', index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
# and display the first 5 lines
print(spectra_data.head(5))
#use the Bokeh library, start a chart (or figure)
p1 = figure()
#use a list of colours
colours_list = ["red", "orange", "yellow", "green", "blue", "indigo", "violet", "black", "steelblue", "gray"]
# to automate the plotting
for i in range(1,11):
    p1.line(x=spectra_data['Wavelengths'], y=spectra_data['%i' % i], legend_label='Spectrum %i' % i, color=colours_list[i-1])
p1.yaxis.axis_label = "Intensity (Arb Units)"
p1.xaxis.axis_label = "Wavelength (nm)"
show(p1)
The first few data rows, and the spectra chart, show the level of variation between spectra. For this sample, the ten spectra come from ten different locations in a line across the sample.
If we zoom in on our Na peaks at 818-820nm, this spectrum-to-spectrum variation becomes more apparent.
#restrict the range of the x axis (wavelengths)
p1.x_range = Range1d(816.0, 821.0)
# restrict the range of the y axis to enable better visualisation.
p1.y_range = Range1d(0.0, 8000.0 )
#and reshow the plot
show(p1)
Or we can plot the peak height by laser shot:
# creating a PANDAS index allows for a range of search functions.
wl_index = pd.Index(spectra_data['Wavelengths'])
#Similar to previous tutorial, find the indices (row numbers) of interest
index_8191 = wl_index.get_loc(819.1, method='nearest')
index_8199 = wl_index.get_loc(819.9, method='nearest')
index_8175 = wl_index.get_loc(817.5, method='nearest')
# we will build a list of the peak intensities across our sample
intensities_8194 = []
spectra_num = []
# go through all the spectra
for i in range(1,11):
    # find the peak (maximum) intensity between the rows representing our wavelengths,
    # subtracting off the background point
    intensities_8194.append(max(spectra_data['%i' % i][index_8191:index_8199])-spectra_data['%i' % i][index_8175])
    spectra_num.append(i)
# output the figure
p2 =figure()
p2.line(x=spectra_num, y=intensities_8194)
p2.y_range = Range1d(0.0, 8000.0 )
p2.yaxis.axis_label = "Intensity (Arb Units)"
p2.xaxis.axis_label = "Spectra Number"
show(p2)
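A portability note: in recent pandas releases (2.0 and later), `Index.get_loc` no longer accepts the `method='nearest'` argument used above. If you are on a newer pandas, an equivalent nearest-row lookup can be sketched with NumPy (assuming nothing beyond a numeric wavelength column):

```python
import numpy as np
import pandas as pd

def nearest_index(wavelengths, target):
    """Row number of the wavelength closest to `target`."""
    return int(np.abs(np.asarray(wavelengths, dtype=float) - target).argmin())

# illustrative wavelength axis
wl = pd.Series([818.9, 819.0, 819.1, 819.2, 819.3])
print(nearest_index(wl, 819.12))  # -> 2 (819.1 is the closest value)
```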
Later in this tutorial, we will look at one strategy for reducing some of the difference between our spectra; for now we will just use the peak heights.
This is the variation for one sample, now let's look at the variation across the samples in our calibration set.
# Load the concentration Data
conc_data = pd.read_csv('.\\files\\Na_concs.csv')
# Repackage the data into a dictionary to make retrieving/interacting with this easier in future
conc_dict = {}
for i in range(len(conc_data)):
    conc_dict[conc_data['Sample'][i]] = conc_data['Na Conc (ppm)'][i]
spectra_filenames = ["45e_7", "501b_6", "601_7", "603_7", "903_5", "921_7", "933_5"]
spectras_dict = {} # the spectras will be saved here for easy access.
intensities_8194 = [] # the intensities for the Na Peak at 819.4nm will be saved here
concs = []
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        # find the peak (maximum) intensity between the rows representing our wavelengths,
        # subtracting off the background point
        intensities_8194.append(max(spectras_dict[sf]['%i' % i][index_8191:index_8199])-spectras_dict[sf]['%i' % i][index_8175])
        concs.append(conc_dict[sf])
# Show this as a figure
p3 =figure()
p3.scatter(x=intensities_8194, y=concs)
p3.xaxis.axis_label = "Intensity (Arb Units)"
p3.yaxis.axis_label = "Concentration (ppm)"
# Calculate the line of best fit.
np_intensities_8194 = np.array(intensities_8194)
slope_8194 = np.dot(np_intensities_8194, concs)/np.dot(np_intensities_8194,np_intensities_8194)
p3.line([0.0, 7000.0], [0.0, slope_8194*7000], color="red")
show(p3)
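The slope above is the closed-form least-squares solution for a line forced through the origin: slope = &Sigma;x&#7522;y&#7522; / &Sigma;x&#7522;&sup2;. A quick sanity check of that formula against NumPy's general solver, on made-up numbers:

```python
import numpy as np

# illustrative calibration data (not the tutorial's measurements)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# closed-form slope for a line forced through the origin
slope_dot = np.dot(x, y) / np.dot(x, x)

# same fit via the general solver, with x as a single-column design matrix
slope_lstsq = np.linalg.lstsq(x[:, None], y, rcond=None)[0][0]

print(slope_dot, slope_lstsq)  # the two methods agree
```

Forcing the line through the origin is a deliberate choice here: zero concentration should give zero peak intensity.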
Despite having a good calibration curve for the average intensities, there is still quite a bit of variation in intensity at each concentration - even to the point where the lowest intensities for one concentration overlap the higher intensities for lower concentrations.
Let's explore what this looks like in terms of a spread of concentrations for a sample.
#import the functions we used last tutorial to do this.
from elementia.tutorial_fns import calculate_cross_validation, calculate_intensity_cv, calculate_sum_cv
cv_concs, rmsep_8194 = calculate_cross_validation(concs=concs, intensities=intensities_8194)
# plot the predicted concentrations on the chart
p3.scatter(intensities_8194, cv_concs, color='blue')
# and the RMSEP
print(rmsep_8194)
show(p3)
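`calculate_cross_validation` comes from this tutorial series' own helper module, so its internals aren't shown here. A minimal leave-one-out sketch of what such a routine might look like for a zero-intercept calibration (an assumed stand-in, not the actual implementation):

```python
import numpy as np

def loo_cross_validation(concs, intensities):
    """Leave-one-out CV for a zero-intercept calibration line.
    Hypothetical stand-in for the tutorial's calculate_cross_validation."""
    concs = np.asarray(concs, dtype=float)
    intensities = np.asarray(intensities, dtype=float)
    predictions = np.empty_like(concs)
    for i in range(len(concs)):
        mask = np.arange(len(concs)) != i        # hold out sample i
        x, y = intensities[mask], concs[mask]
        slope = np.dot(x, y) / np.dot(x, x)      # zero-intercept least squares
        predictions[i] = slope * intensities[i]  # predict the held-out sample
    rmsep = np.sqrt(np.mean((predictions - concs) ** 2))
    return predictions, rmsep

# illustrative data: concentration exactly proportional to intensity,
# so every held-out prediction lands on the true value and RMSEP is zero
preds, rmsep = loo_cross_validation([10.0, 20.0, 30.0], [1.0, 2.0, 3.0])
print(preds, rmsep)
```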
The spread of values for a concentration is a little alarming. How big is it actually?
dict_8194_intensity = {'base': [], 'upper': [], 'lower':[], 'concs': []} # our data will go here
#start the html that will form our table later
html_text = "<table><tr><th>Sample</th><th>Conc Ref (ppm)</th> <th>Conc Pred (ppm)</th><th>SD (ppm)</th><th>RSD</th></tr>"
for sf in spectra_filenames:
    # go through all the spectra
    predicted_conc_values = []
    for i in range(1,11):
        # find the peak (maximum) intensity between the rows representing our wavelengths
        # (subtracting off the background point), convert it to a concentration,
        # and add it to the list
        predicted_conc_values.append(slope_8194*(max(spectras_dict[sf]['%i' % i][index_8191:index_8199])-spectras_dict[sf]['%i' % i][index_8175]))
    mean_conc = np.mean(np.array(predicted_conc_values))
    sd_conc = np.std(np.array(predicted_conc_values))
    dict_8194_intensity['base'].append(mean_conc)
    dict_8194_intensity['upper'].append(mean_conc + sd_conc)
    dict_8194_intensity['lower'].append(mean_conc - sd_conc)
    dict_8194_intensity['concs'].append(conc_dict[sf])
    # add this as a row in the table
    html_text += "<tr><td>%s</td><td>%.1f</td><td>%.1f</td><td>%.1f</td><td>%.1f</td></tr>" % (sf, conc_dict[sf], mean_conc, sd_conc, 100.0*sd_conc/mean_conc)
#finish off the table
html_text += "</table>"
from IPython.display import display, Markdown, HTML
# and display it
display(HTML(html_text))
A Relative Standard Deviation (RSD) of 26% is not at all good.
Maybe using the peak area will give better results.
pa_8194 = [] # peak area for 819.4nm peak will go here
pa_concs = [] # and this will hold the concentrations for the figure.
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        # sum the intensity between the rows representing our wavelengths,
        # subtracting off the background level
        pa_8194.append(sum(spectras_dict[sf]['%i' % i][index_8191:index_8199])-(index_8199-index_8191+1)*spectras_dict[sf]['%i' % i][index_8175])
        pa_concs.append(conc_dict[sf])
# prepare the figure
p5 =figure()
p5.scatter(x=pa_8194, y=pa_concs)
p5.xaxis.axis_label = "Intensity (Arb Units)"
p5.yaxis.axis_label = "Concentration (ppm)"
# calculate the slope for the best fit line
np_pa_8194 = np.array(pa_8194)
slope_pa_8194 = np.dot(np_pa_8194, pa_concs)/np.dot(np_pa_8194,np_pa_8194)
# and plot the line.
p5.line([0.0, 20000.0], [0.0, slope_pa_8194*20000], color="red")
show(p5)
dict_8194_pa = {'base': [], 'upper': [], 'lower':[], 'concs': []} # our data will go here
#start the html that will form our table later
html_text = "<table><tr><th>Sample</th><th>Conc Ref (ppm)</th> <th>Conc Pred (ppm)</th><th>SD (ppm)</th><th>RSD</th></tr>"
for sf in spectra_filenames:
    # go through all the spectra
    predicted_conc_values = []
    for i in range(1,11):
        # find the peak area between the rows representing our wavelengths
        # (subtracting off the background level), convert it to a concentration,
        # and add it to the list
        predicted_conc_values.append(slope_pa_8194*(sum(spectras_dict[sf]['%i' % i][index_8191:index_8199])-(index_8199-index_8191+1)*spectras_dict[sf]['%i' % i][index_8175]))
    mean_conc = np.mean(np.array(predicted_conc_values))
    sd_conc = np.std(np.array(predicted_conc_values))
    dict_8194_pa['base'].append(mean_conc)
    dict_8194_pa['upper'].append(mean_conc + sd_conc)
    dict_8194_pa['lower'].append(mean_conc - sd_conc)
    dict_8194_pa['concs'].append(conc_dict[sf])
    # add this as a row in the table
    html_text += "<tr><td>%s</td><td>%.1f</td><td>%.1f</td><td>%.1f</td><td>%.1f</td></tr>" % (sf, conc_dict[sf], mean_conc, sd_conc, 100.0*sd_conc/mean_conc)
#finish off the table
html_text += "</table>"
# and display it
display(HTML(html_text))
On this occasion, the performance was mostly worse for the peak areas.
cv_concs, rmsep_pa_8194 = calculate_cross_validation(concs=pa_concs, intensities=pa_8194)
# plot the predicted concentrations on the chart
p5.scatter(pa_8194, cv_concs, color='blue')
# and the RMSEP
print(rmsep_pa_8194)
show(p5)
And this is reflected in the RMSEP also.
A number of different data-processing techniques are available which seek to reduce this spectrum-to-spectrum variation.
One of the simplest solutions is to divide the intensity of the peak of interest by a peak from another element which is expected to be constant across all samples. Good candidates are elements in the sample analysis atmosphere (N, O, Ar, Ne), or in the sample binder (if one is used).
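The effect is easy to demonstrate on synthetic data: if a shot-to-shot fluctuation scales both peaks by the same factor, dividing one by the other cancels it. (This toy sketch uses a plain ratio, i.e. the b = 1 limit; the ratios used later in this tutorial raise the reference peak to the power b.)

```python
import numpy as np

rng = np.random.default_rng(0)
shots = 1000
fluctuation = rng.uniform(0.5, 1.5, shots)   # common shot-to-shot scaling factor

na_peak = 5000.0 * fluctuation               # analyte peak scales with each shot
o_peak = 2500.0 * fluctuation                # reference peak scales the same way

rsd = lambda a: 100.0 * a.std() / a.mean()   # relative standard deviation, %
ratio = na_peak / o_peak                     # the shared fluctuation cancels here

print("RSD raw:   %.1f%%" % rsd(na_peak))
print("RSD ratio: %.1f%%" % rsd(ratio))      # ~0% for a perfectly shared fluctuation
```

Real spectra are noisier than this, of course: the two peaks never share their fluctuations perfectly, so the ratio reduces rather than eliminates the spread.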
Returning to the LIBS equation
$\begin{aligned} I = \frac{C_1 \left ( X Y \right ) \left (g_{k1} A_{ki1} \right) \left (\frac{e^{\frac{-E_{k1}}{kT}}} {U_1 (T)} \right) K_1 (a,x)}{\left( C_2 \left ( X Y \right ) \left (g_{k2} A_{ki2} \right) \left (\frac{e^{\frac{-E_{k2}}{kT}}} {U_2 (T)} \right) K_2 (a,x) \right )^b}=\left [\frac{C_1 \left ( X Y \right ) \left (g_{k1} A_{ki1} \right) K_1 (a,x)}{C^b_2 \left ( X Y \right )^b \left (g_{k2} A_{ki2} \right)^b K^b_2 (a,x)} \right ] \frac{U^b_2 (T) e^{\frac{-E_{k1}}{kT}}}{U_1 (T) e^{\frac{-bE_{k2}}{kT}}} = \left [\frac{C_1 \left ( X Y \right ) \left (g_{k1} A_{ki1} \right) K_1 (a,x)}{C^b_2 \left ( X Y \right )^b \left (g_{k2} A_{ki2} \right)^b K^b_2 (a,x)} \right ] \frac{U^b_2 (T) }{U_1 (T)} e^{\frac{1}{kT} \left (bE_{k2}-E_{k1} \right )} \end{aligned}$
where parameters with 1 subscripts are the element of interest, and those with 2 subscripts are for the ratio element.
Looking at the final term, if $ b E_{k2}-E_{k1} = 0 $, then the major temperature fluctuation cancels out, leaving the much slower variation of the Partition Function (U).
From knowledge of the samples and the analysis atmosphere, peaks from Oxygen or Nitrogen are good candidates for ratio elements.
Heading over to the NIST LIBS Database allows us to look for candidate peaks nearby.
First fill in some basic information on the sample. The values don't have to be all that accurate - we are primarily after data on useful peaks. Note that the Resolution has been boosted to 5000 to enable a bit more separation in the peaks when they are next plotted.
This shows some interesting peaks in the vicinity. Of particular interest are the Nitrogen peaks around 746nm and the Oxygen peaks around 777nm, which are also present in our sample spectra:
#restrict the range of the x axis (wavelengths)
p1.x_range = Range1d(700.0, 900.0)
# restrict the range of the y axis to enable better visualisation.
p1.y_range = Range1d(0.0, 8000.0 )
p1.oval(x=745, y=750, width=10, height = 1500, fill_alpha=0.1) # Nitrogen Peaks Oval
p1.oval(x=777, y=2500, width=10, height = 5000, fill_alpha=0.1) # Oxygen Peaks Oval
#and reshow the plot
show(p1)
Looking at the peak data, the upper energy level for our Na peak (at 819.482nm) is 29172.8 $cm^{-1}$.
Doing a similar search on candidate N and O peaks yields:
Element | Wavelength (nm) | $E_k$ ($cm^{-1}$) | b |
---|---|---|---|
Na | 819.482 | 29172.8 | |
O | 777.194 | 86631.5 | 0.337 |
N | 746.831 | 96750.8 | 0.302 |
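The b column is the exponent that makes $bE_{k2} - E_{k1} = 0$, i.e. $b = E_{k1}/E_{k2}$ using the upper-level energies above. Checking the table values:

```python
E_na = 29172.8   # upper level of the Na 819.482nm peak, cm^-1
E_o = 86631.5    # upper level of the O 777.194nm peak, cm^-1
E_n = 96750.8    # upper level of the N 746.831nm peak, cm^-1

b_o = E_na / E_o   # exponent that cancels the temperature term for the O ratio
b_n = E_na / E_n   # and for the N ratio

print("b (O 777nm) = %.3f" % b_o)   # 0.337
print("b (N 747nm) = %.3f" % b_n)   # 0.302
```

These are the exponents applied to the reference-peak areas and intensities in the code cells below.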
Let's start by using the Oxygen peak at 777.194nm to reduce our spectrum-to-spectrum variation. We will first consider the ratio of peak areas.
import math
#as we have done many times before. Find the closest peaks to our points of interest.
index_7760 = wl_index.get_loc(776.0, method='nearest') # lower limit for the area
index_7790 = wl_index.get_loc(779.0, method='nearest') # upper limit for the area
index_7795 = wl_index.get_loc(779.5, method='nearest') # background candidate.
pa_8194_r777 = [] # peak area data for 819.4nm peak ratioed to 777nm peak will go here
pa_concs_r777 = [] # concentrations to form the plot
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        # peak area for the 819.4nm peak
        pa_8194 = sum(spectras_dict[sf]['%i' % i][index_8191:index_8199])-(index_8199-index_8191+1)*spectras_dict[sf]['%i' % i][index_8175]
        # peak area for the Oxygen peak at 777nm
        pa_777 = sum(spectras_dict[sf]['%i' % i][index_7760:index_7790])-(index_7790-index_7760+1)*spectras_dict[sf]['%i' % i][index_7795]
        # ratio them, taking into account the difference in energy levels
        pa_8194_r777.append(pa_8194/math.pow(pa_777,(29172.8/86631.5)))
        pa_concs_r777.append(conc_dict[sf])
#plot the figure
p7 =figure()
p7.scatter(x=pa_8194_r777, y=pa_concs_r777)
p7.xaxis.axis_label = "Intensity (Arb Units)"
p7.yaxis.axis_label = "Concentration (ppm)"
# calculate the line of best fit
np_pa_8194_r777 = np.array(pa_8194_r777)
slope_pa_8194_r777 = np.dot(np_pa_8194_r777, pa_concs_r777)/np.dot(np_pa_8194_r777,np_pa_8194_r777)
# calculate the RMSEP while we are here
cv_concs, rmsep_pa_8194_r777 = calculate_cross_validation(concs=pa_concs_r777, intensities=pa_8194_r777)
# plot the predicted concentrations on the chart
p7.scatter(pa_8194_r777, cv_concs, color='blue')
# and the RMSEP
print(rmsep_pa_8194_r777)
show(p7)
That has given us some improvement - does the main Nitrogen peak at 746.8nm work any better?
# cranking the handle now
# find the key locations
index_7460 = wl_index.get_loc(746.0, method='nearest')
index_7480 = wl_index.get_loc(748.0, method='nearest')
index_7485 = wl_index.get_loc(748.5, method='nearest')
pa_8194_r746 = [] #store the ratio peak area data
pa_concs_r746 = []
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        # peak area for the 819.4nm peak
        pa_8194 = sum(spectras_dict[sf]['%i' % i][index_8191:index_8199])-(index_8199-index_8191+1)*spectras_dict[sf]['%i' % i][index_8175]
        # peak area for the Nitrogen peak at 746.8nm
        pa_746 = sum(spectras_dict[sf]['%i' % i][index_7460:index_7480])-(index_7480-index_7460+1)*spectras_dict[sf]['%i' % i][index_7485]
        # make the ratio, taking into account the relative energy levels
        pa_8194_r746.append(pa_8194/math.pow(pa_746,(29172.8/96750.8)))
        pa_concs_r746.append(conc_dict[sf])
#plot the figure
p8 =figure()
p8.scatter(x=pa_8194_r746, y=pa_concs_r746)
p8.xaxis.axis_label = "Intensity (Arb Units)"
p8.yaxis.axis_label = "Concentration (ppm)"
# calculate the slope for the curve of best fit
np_pa_8194_r746 = np.array(pa_8194_r746)
slope_pa_8194_r746 = np.dot(np_pa_8194_r746, pa_concs_r746)/np.dot(np_pa_8194_r746,np_pa_8194_r746)
# calculate the RMSEP
cv_concs, rmsep_pa_8194_r746 = calculate_cross_validation(concs=pa_concs_r746, intensities=pa_8194_r746)
# plot the predicted concentrations on the chart
p8.scatter(pa_8194_r746, cv_concs, color='blue')
# and the RMSEP
print(rmsep_pa_8194_r746)
show(p8)
Hmmm - not so good. How about the ratioed intensities? Oxygen first.
int_8194_r777 = []
int_concs_r777 = []
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        int_8194 = max(spectras_dict[sf]['%i' % i][index_8191:index_8199])-spectras_dict[sf]['%i' % i][index_8175]
        int_777 = max(spectras_dict[sf]['%i' % i][index_7760:index_7790])-spectras_dict[sf]['%i' % i][index_7795]
        int_8194_r777.append(int_8194/math.pow(int_777,(29172.8/86631.5)))
        int_concs_r777.append(conc_dict[sf])
p9 =figure()
p9.scatter(x=int_8194_r777, y=int_concs_r777)
p9.xaxis.axis_label = "Intensity (Arb Units)"
p9.yaxis.axis_label = "Concentration (ppm)"
np_int_8194_r777 = np.array(int_8194_r777)
slope_int_8194_r777 = np.dot(np_int_8194_r777, int_concs_r777)/np.dot(np_int_8194_r777,np_int_8194_r777)
cv_concs, rmsep_int_8194_r777 = calculate_cross_validation(concs=int_concs_r777, intensities=int_8194_r777)
# plot the predicted concentrations on the chart
p9.scatter(int_8194_r777, cv_concs, color='blue')
# and the RMSEP
print(rmsep_int_8194_r777)
show(p9)
A little better. Checking the Nitrogen peak at 746.8nm:
int_8194_r746 = []
int_concs_r746 = []
for sf in spectra_filenames:
    spectras_dict[sf] = pd.read_csv('.\\files\\OREAS%s.txt' % sf, index_col=None, names=["Wavelengths", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10"])
    # go through all the spectra
    for i in range(1,11):
        int_8194 = max(spectras_dict[sf]['%i' % i][index_8191:index_8199])-spectras_dict[sf]['%i' % i][index_8175]
        int_746 = max(spectras_dict[sf]['%i' % i][index_7460:index_7480])-spectras_dict[sf]['%i' % i][index_7485]
        int_8194_r746.append(int_8194/math.pow(int_746,(29172.8/96750.8)))
        int_concs_r746.append(conc_dict[sf])
p10 =figure()
p10.scatter(x=int_8194_r746, y=int_concs_r746)
p10.xaxis.axis_label = "Intensity (Arb Units)"
p10.yaxis.axis_label = "Concentration (ppm)"
np_int_8194_r746 = np.array(int_8194_r746)
slope_int_8194_r746 = np.dot(np_int_8194_r746, int_concs_r746)/np.dot(np_int_8194_r746,np_int_8194_r746)
cv_concs, rmsep_int_8194_r746 = calculate_cross_validation(concs=int_concs_r746, intensities=int_8194_r746)
# plot the predicted concentrations on the chart
p10.scatter(int_8194_r746, cv_concs, color='blue')
# and the RMSEP
print(rmsep_int_8194_r746)
show(p10)
html_text = "<table><tr><th>Method</th><th>Uncertainty</th><th>Ratio O 777nm</th><th>Ratio N 746nm</th></tr>"
html_text += "<tr><td>Intensity 819.4nm</td><td>%.1f</td><td>%.1f</td><td>%.1f</td></tr>" % (rmsep_8194, rmsep_int_8194_r777, rmsep_int_8194_r746)
html_text += "<tr><td>Peak Area 819.4nm</td><td>%.1f</td><td>%.1f</td><td>%.1f</td></tr>" % (rmsep_pa_8194, rmsep_pa_8194_r777, rmsep_pa_8194_r746)
html_text += "</table>"
display(HTML(html_text))
So, bringing it together, we got the best performance from our calibration curves using the ratio of the intensities of the Na peak at 819.4nm and the Oxygen peak at 777nm.
This highlights several things:-
- the best method depends on what question is being answered
- I have not found any reliable way to predict which approach will give the best answers - often, trying lots of things to see what works best for a particular set of samples is the only way.