NPL Data Fusion Software: Release 1
Contents
- INTRODUCTION
- Simulation methodology
- Simulation options provided by the software
- Framework for the examples
- Example 3B: Distributed temperature sensing with time lag correction, missing data, maximum clique data fusion method
- Structure of the MATLAB code
- Generating pseudo-random numbers
- Licence agreement
- MAIN PROGRAM
- Identify the software to the user
- Add folder 'functions' to MATLAB search path
- Set seed values for random number generation
- Assign path to Excel workbook to be processed
- Define measurand type
- Define sensor behaviour
- Define data fusion method
- Write results to Excel workbook
- Generate plots of results
- Remove folder 'functions' from MATLAB search path
- Acknowledgements
- Program identifier
INTRODUCTION
This MATLAB script DataFusionSoftware_3B.m and its associated Excel workbook DFSInput_3B.xls form part of a suite of MATLAB software routines and Excel workbooks that can be used to simulate distributed measuring systems. The software can be used to study network design, to compare data fusion algorithms and to evaluate measurement uncertainties associated with aggregated data in networks.
The software can be used to simulate cases in which sensors are intermittently faulty or unreliable, varying levels of noise appear in the sensor outputs, and the sensor outputs possess interdependencies, that is, the behaviour of one class of sensor depends on the quantity being measured by another class of sensor.
The software has been designed in a modular manner. Users select the modules and workbooks that are relevant to the task of interest and then combine them in a manner suitable for their application. In some cases it may be necessary for users to provide a limited amount of additional code of their own.
We have developed a set of examples that demonstrate how the software can be used. For each example we provide all the necessary MATLAB routines and an associated Excel workbook that is used to define the inputs to the MATLAB routines and to record the numerical results from the simulation.
The cases we demonstrate are:
1. A simulation of a small network of pressure and temperature sensors in which the response of each pressure sensor is temperature-dependent and some sensors may be intermittently faulty.
2. A simulation of a dynamic measurement task in which the sensors are modelled as filters and Kalman filtering is employed as the data fusion methodology.
3. A simulation based on a real wireless sensor network that was employed to make a distributed temperature measurement.
For each of the three cases we provide two examples (identified by the suffixes A and B) designed to demonstrate different aspects of the software.
This script and its associated MATLAB routines and Excel workbook implement case 3, example B from the above list. The example is explained in more detail below.
Simulation methodology
Consider a measurement of a physical quantity (the measurand) that varies with time.
In the real world, as opposed to the simulation world, we assume that the measurand is an analogue signal that can be represented by an analogue model, i.e., a model from which one can derive an exact instantaneous value of the signal at any time of interest in a manner that is not limited by sampling and quantization considerations. This also applies to noise that may be associated with the measurand.
In the real world the analogue assumptions made concerning the measurand are also to be applied to sensors and to any noise that the sensing process itself generates.
We have therefore assumed that a measurement embodies the following stages:
1. Specification of an analogue measurand (and any associated noise).
2. Specification of an analogue sensor (and any associated noise). The sensor may also be faulty. The sensor produces a continuous electrical output (e.g., voltage). The manner in which the sensor converts the measurand to a voltage signal is described by a mathematical model that may be the sensor's calibration function.
3. Sampling of the sensor output. The sampling process may be imperfect, and also introduce further noise effects such as jitter.
4. Quantization of the sampled sensor output. The quantization process may itself be imperfect, with uneven quantization intervals, for example.
5. Inversion of the quantized and sampled electrical output of the sensor to produce an estimate of the measurand and its associated standard uncertainty, using the sensor's calibration function. If the measurement is regarded as dynamic, an additional stage may be required here to allow for correction of the sensor output for sensor bandwidth limitations. This correction will typically be performed using a deconvolution algorithm.
6. If we can assume that all the sensors are measuring the same physical quantity, then estimates from the individual sensors (after any necessary correction) may be aggregated using a data fusion algorithm to produce a combined estimate of the measurand and its associated standard uncertainty. At this stage faulty sensors may be identified and excluded from the evaluation of the combined estimate.
The simulation software allows a user to include as many of these stages as are necessary to simulate the problem of interest but it does not require every stage to be included in the simulation. For example, the user may omit noise processes or effects due to imperfect sampling and quantization if these are not of interest for the simulation task in question. The software has been designed in a modular fashion that allows users to add their own modules that simulate particular features of interest. We consider the software to be a toolbox that allows users to build their own solutions.
The examples provided with the software demonstrate key features, though not necessarily all features, of the stages described above.
Simulation options provided by the software
The examples we have developed provide three methods of simulating a measurand. These methods are:
1. Fourier series;
2. Piecewise linear model;
3. Time series explicitly defined in input worksheet.
Two methods of modelling sensors are provided:
1. Represent sensor as a filter;
2. Linear calibration function.
We provide three data fusion algorithms:
1. Chi-squared method;
2. Maximum clique method;
3. Kalman filter.
More information about these options, including references to relevant scientific papers, is provided below in the relevant sections of this documentation.
Framework for the examples
The generic framework for the examples is as follows:
1. Define the measurand in terms of probability distributions for the parameters in a mathematical model for the measurand, and distributions for additive random noise.
2. Generate 'true' parameter values and random noise values from the distributions defined at stage 1.
3. Generate 'true' values for the measurand using the parameter and random noise values at stage 2 for a sampling frequency that exceeds that for any sensor.
4. Define calibration information about the sensors in terms of probability distributions for the parameters in a mathematical model for each sensor, and distributions for additive random noise, as well as the sampling frequency, the number of bits and saturation level for quantization, and the proportion of data dropouts. For a faulty sensor, information about the 'true' behaviour of the sensor (in addition to the behaviour 'known' through calibration) is also provided.
5. Generate 'true' parameter values from the distributions defined at stage 4.
6. Generate a voltage response of each sensor to the measurand generated at stage 3, based on the 'true' parameter values generated at stage 5.
7. Add random noise to the voltage response based on the distributions defined at stage 4.
8. Sample and quantize the voltage response using the sampling frequency, number of bits and saturation level defined at stage 4.
9. Generate measurement results (estimates of the measurand and associated standard uncertainties) for the sensors based on the sampled and quantized voltage responses of stage 8 and the calibration information provided at stage 4.
10. Randomly remove measurement results based on the data dropout information given at stage 4.
11. Apply a data fusion method to the measurement results provided by the sensors at stage 10.
12. Write results to an output file.
13. Generate figures displaying the 'true' values of the measurand provided at stage 3, the measurement results provided by the sensors at stage 10 and the combined estimates of the measurand and associated standard uncertainties from stage 11.
Example 3B: Distributed temperature sensing with time lag correction, missing data, maximum clique data fusion method
This example is based on a real measurement problem in which a network of five wireless sensors was used to measure the temperature inside an oven.
The key features of the example are:
1. The measurand is a temperature time history to which the five wireless sensors were exposed simultaneously. The time history was measured independently of the wireless sensors and it is this independent time history that is used to define the measurand for the sensors. The measurand is known only imperfectly owing to noise and quantization effects associated with the temperature sensor. The first worksheet of the Excel workbook is simply a listing of the time history of the input temperature at one second intervals.
2. Five wireless sensors were employed to measure temperature, each being calibrated independently by the experimentalist whose network we are studying. In each case the calibration produced a linear calibration function that also incorporated a time lag.
3. There is no faulty sensor but there are missing data packets (five consecutive missing data points). This simulates a typical failure mode of the network, as communication failures always lead to the loss of complete data packets of five consecutive data points.
4. The sampled signal is quantized.
5. A time lag is included in the calculation of a sensor's voltage response to allow simulation of delayed reponses to temperature changes, such as those that arise from thermal lag. Compensation is made for time lag when calculating a sensor's estimate of the measurand. The uncertainty associated with the time lag is not included in the uncertainty evaluation.
6. Data fusion is performed using the maximum clique method (for more information about this method see below).
The input parameters for the example are entered into the Excel workbook that is also used to record the numerical results of the simulation. The workbook contains three worksheets. Worksheet Measurand contains the user-generated time history of values that is to be used as the measurand for this example. Worksheet Sensor contains the definitions of the sensor behaviour (the sensor response models). At the end of the simulation the data fusion results are written to worksheet Results that has the following format. The first columm contains the times at which sensor measurements have been made. Subsequent columns contain, for each time value, the value of the measurand, the combined estimate of the measurand obtained from the consistent sensors, the standard uncertainty associated with that estimate, a list of the sensors identified as consistent, the estimate of the measurand obtained from each individual sensor and the standard uncertainty associated with that estimate. The results worksheet also records the values of the seeds that were employed for the random number generator. If the analysis is repeated either with the same or amended input data and no change is made to the worksheet name, the results worksheet is overwritten with the new results. Users who need to ensure traceability of results should rename the results worksheet they wish to save to avoid data being overwritten.
Structure of the MATLAB code
The MATLAB software calls a number of functions. These functions are included as separate files containing comments that explain the calculations being undertaken.
Users should note that the software does not carry out exhaustive checks on values provided in worksheets.
Generating pseudo-random numbers
Pseudo-random numbers are generated using the Wichmann-Hill random number generator [1]. This generator is simple to implement in software and allows the user to generate repeatable random number sequences with a long period.
[1] Generating good pseudo-random numbers
B A Wichmann and I D Hill
Computational Statistics and Data Analysis, 51 (2006):1614-1622
The Wichmann-Hill random number generator has been implemented as a DLL independently of this MATLAB code. The files needed to run the Wichmann-Hill generator (WH_MVS.dll and wh_mvs.h) are included as part of this software release.
Licence agreement
The software is provided with a software licence agreement (REF: MSC/L/11/004) and the use of the software is subject to the terms laid out in that agreement. By running the software, the user accepts the terms of the agreement.
MAIN PROGRAM
Identify the software to the user
fprintf('\nNPL Data Fusion Software: Release 1'); fprintf('\nNational Physical Laboratory, UK'); fprintf('\nDecember 2011 \n');
NPL Data Fusion Software: Release 1 National Physical Laboratory, UK December 2011
Add folder 'functions' to MATLAB search path
The MATLAB search path is updated to include the folder that contains the MATLAB functions called by the example scripts. It is assumed that this folder is a subfolder of the current folder and is named 'functions'.
topfolder = pwd;
funfolder = [topfolder, '\functions'];
path(path, funfolder);
Set seed values for random number generation
The user must assign a row vector of four integers each in the interval [1, 2147483647]. To assign seed values based on the current time, the user should set the seed variable to be a vector of four zeros. For traceability purposes the values of the chosen seeds are written to the results worksheet of the Excel workbook.
seed = [25 05 20 05]; global g_wh_seed if all(seed == 0) c = fix(clock); g_wh_seed = c(3:6); else g_wh_seed = seed; end fprintf('\nSeed vector for random number generation: %u %u %u %u \n', ... g_wh_seed);
Seed vector for random number generation: 25 5 20 5
Assign path to Excel workbook to be processed
The user must assign the path to an Excel workbook containing worksheets that describe all components of the measurement.
The user may choose to edit or add worksheets defining different measurand types and sensor behaviours, and may also introduce new components to the calculation by writing a new MATLAB function that may require a new worksheet format.
If on running the software the Excel workbook is open an error will be obtained and the results will not be written to the selected worksheet.
workbookpath = 'DFSInput_3B.xls'; fprintf(['\nWorkbook: %', num2str(length(workbookpath)), 's \n'], ... workbookpath);
Workbook: DFSInput_3B.xls
Define measurand type
In the example in this script, the measurand is provided as a time history of values in the worksheet Measurand and read in by the routine TimeSeries.m.
'True' measurand values are stored in the structure MeasIn which includes the fields:
t, the vector containing time values, and
y, the vector containing the 'true' measurand values corresponding to the time values in t.
A list and description of the information that must be supplied by the user is available here.
measurandworksheet = 'Measurand';
MeasIn = TimeSeries(workbookpath, measurandworksheet);
Type of measurand simulation: Time series Worksheet: Measurand
Define sensor behaviour
In this example, the behaviour of each sensor is described by a linear calibration function specified in the worksheet Sensor. The routine LinearCalibrationLagComp.m takes information about the calibration function given in Sensor, and the 'true' values of the measurand contained in the data structure MeasIn, and returns the data structure Sens containing measurement results in the fields:
t, the vector containing time values,
Y, the array containing the estimates of the measurand obtained using the different sensors, corresponding to the time values in t, and
uY, the array containing the standard uncertainties associated with the estimates of the measurand.
A list and description of the information that must be supplied by the user, together with a mathematical description of how sensor estimates of the measurand are calculated, is available here.
sensorworksheet = 'Sensor';
Sens = LinearCalibrationLagComp(workbookpath, sensorworksheet, MeasIn);
Type of sensor behaviour: Linear calibration with compensated lag Worksheet: Sensor
Define data fusion method
Combined estimates of the measurand and their associated standard uncertainties are determined using the maximum clique data fusion method [2]. This method assumes uncorrelated measured values. The method calculates the Moffat distance between each pair of measured values, to test the pair-wise consistency of the results. The largest consistent subset of measurement results is then determined, either by an exhaustive search, or through a linear approximation using overlapping intervals. The standard uncertainty associated with any measured value that is not contained in the largest consistent subset is multiplied by its largest Moffat distance to any of the values in the consistent subset. This creates consistency between all the measured values. A weighted mean is calculated using all measured values and the associated augmented uncertainties. In cases in which there are multiple consistent subsets of the same size, the intersection of those subsets is chosen as the largest consistent subset. It is possible for this intersection to contain only a single sensor. The remaining sensor estimates are still used to form a combined estimate of the measurand and its uncertainty, but the individual uncertainties for each sensor are increased. If there is no largest consistent subset at a particular time point, the value from the previous time point is used as an estimate.
[2] The application of self-validation to wireless sensor networks
M A Collett, M G Cox, M Duta, T J Esward, P M Harris, M P Henry
Measurement Science and Technology, 19 (2008), 125201
The routine MaxCliqueDataFusion.m takes the measurement results in the data structure Sens, and returns the data structure MeasOut containing measurement results in the fields:
t, the vector containing time values,
y, the vector containing the estimates of the measurand corresponding to the time values in t,
uy, the vector containing the standard uncertainties associated with the estimates of the measurand, and
indices, the structure containing the indices of consistent sensors.
MeasOut = MaxCliqueDataFusion(Sens);
Data fusion method: Maximum clique WARNING: No consistent subset found at t = 8000s Warning will not show again for any subsequent times with no consistent subset
Write results to Excel workbook
Results are written to the worksheet named Results.
resultsworksheet = 'Results'; WriteResults(workbookpath, resultsworksheet, MeasIn, Sens, MeasOut, ... seed);
Results worksheet: Results
Generate plots of results
The plots for example 3B are set out below.
1. The first figure plots the time history of temperature values employed as a user-generated measurand. Note that the small temperature oscillations that are observed arise from small cyclical variations in the heat output of the oven in which the measurements were made as the oven control thermostat system seeks to maintain the target step change in temperature.
2. The second figure shows the estimates of the measurand obtained using the sensors, both the estimates and the estimates plus their associated standard uncertainties. The effect of quantization is evident.
3. The third figure shows the 'true' measurand and the estimates of the measurand that arise from aggregating the estimates of the measurand provided by the individual sensors shown in the previous figure. In this case the figure shows the estimates and coverage intervals corresponding to (approximately) a 95 % coverage probability. In this example we have employed the maximum clique method of data fusion.
4. The fourth figure plots the standard uncertainties associated with the estimates of the measurand as a function of time.
5. The fifth figure indicates the times at which individual measurand sensors fall outside the largest consistent subset. For the final time point, the presence of time lags for all sensors means that no estimate of the measurand can be made and therefore no consistent subset can be established. A warning is displayed in the MATLAB command window. In this example, apart from the inconsistency at the final time point, there are at least two consistent sensors at all other times.
PlotMeasIn(MeasIn) PlotMeasOut(MeasIn, Sens, MeasOut)





Remove folder 'functions' from MATLAB search path
The MATLAB search path is updated to remove the folder that contains the MATLAB functions called by the example scripts.
rmpath(funfolder);
Acknowledgements
The work described here was supported by the National Measurement Office of the UK Department of Business, Innovation and Skills as part of its NMS Mathematics and Modelling for Metrology programme.
Program identifier
Data Fusion Software (Release 1)
Authors: T J Esward, P M Harris, C E Matthews, H D Minh and I M Smith
National Physical Laboratory, UK
Date: December 2011
(c) Queen's Printer and Controller of HMSO, 2011