NPL Data Fusion Software: Release 1

Contents

INTRODUCTION

This MATLAB script DataFusionSoftware_1A.m and its associated Excel workbook DFSInput_1A.xls form part of a suite of MATLAB software routines and Excel workbooks that can be used to simulate distributed measuring systems. The software can be used to study network design, to compare data fusion algorithms and to evaluate measurement uncertainties associated with aggregated data in networks.

The software can be used to simulate cases in which sensors are intermittently faulty or unreliable, varying levels of noise appear in the sensor outputs, and the sensor outputs possess interdependencies, that is, the behaviour of one class of sensor depends on the quantity being measured by another class of sensor.

The software has been designed in a modular manner. Users select the modules and workbooks that are relevant to the task of interest and then combine them in a manner suitable for their application. In some cases it may be necessary for users to provide a limited amount of additional code of their own.

We have developed a set of examples that demonstrate how the software can be used. For each example we provide all the necessary MATLAB routines and an associated Excel workbook that is used to define the inputs to the MATLAB routines and to record the numerical results from the simulation.

The cases we demonstrate are:

1. A simulation of a small network of pressure and temperature sensors in which the response of each pressure sensor is temperature-dependent and some sensors may be intermittently faulty.

2. A simulation of a dynamic measurement task in which the sensors are modelled as filters and Kalman filtering is employed as the data fusion methodology.

3. A simulation based on a real wireless sensor network that was employed to make a distributed temperature measurement.

For each of the three cases we provide two examples (identified by the suffixes A and B) designed to demonstrate different aspects of the software.

This script and its associated MATLAB routines and Excel workbook implement case 1, example A from the above list. The example is explained in more detail below.

Simulation methodology

Consider a measurement of a physical quantity (the measurand) that varies with time.

In the real world, as opposed to the simulation world, we assume that the measurand is an analogue signal that can be represented by an analogue model, i.e., a model from which one can derive an exact instantaneous value of the signal at any time of interest in a manner that is not limited by sampling and quantization considerations. This also applies to noise that may be associated with the measurand.

In the real world the analogue assumptions made concerning the measurand are also to be applied to sensors and to any noise that the sensing process itself generates.

We have therefore assumed that a measurement embodies the following stages:

1. Specification of an analogue measurand (and any associated noise).

2. Specification of an analogue sensor (and any associated noise). The sensor may also be faulty. The sensor produces a continuous electrical output (e.g., voltage). The manner in which the sensor converts the measurand to a voltage signal is described by a mathematical model that may be the sensor's calibration function.

3. Sampling of the sensor output. The sampling process may be imperfect, and also introduce further noise effects such as jitter.

4. Quantization of the sampled sensor output. The quantization process may itself be imperfect, with uneven quantization intervals, for example.

5. Inversion of the quantized and sampled electrical output of the sensor to produce an estimate of the measurand and its associated standard uncertainty, using the sensor's calibration function. If the measurement is regarded as dynamic, an additional stage may be required here to allow for correction of the sensor output for sensor bandwidth limitations. This correction will typically be performed using a deconvolution algorithm.

6. If we can assume that all the sensors are measuring the same physical quantity, then estimates from the individual sensors (after any necessary correction) may be aggregated using a data fusion algorithm to produce a combined estimate of the measurand and its associated standard uncertainty. At this stage faulty sensors may be identified and excluded from the evaluation of the combined estimate.

The simulation software allows a user to include as many of these stages as are necessary to simulate the problem of interest but it does not require every stage to be included in the simulation. For example, the user may omit noise processes or effects due to imperfect sampling and quantization if these are not of interest for the simulation task in question. The software has been designed in a modular fashion that allows users to add their own modules that simulate particular features of interest. We consider the software to be a toolbox that allows users to build their own solutions.

The examples provided with the software demonstrate key features, though not necessarily all features, of the stages described above.

Simulation options provided by the software

The examples we have developed provide three methods of simulating a measurand. These methods are:

1. Fourier series;

2. Piecewise linear model;

3. Time series explicitly defined in input worksheet.

Two methods of modelling sensors are provided:

1. Represent sensor as a filter;

2. Linear calibration function.

We provide three data fusion algorithms:

1. Chi-squared method;

2. Maximum clique method;

3. Kalman filter.

More information about these options, including references to relevant scientific papers, is provided below in the relevant sections of this documentation.

Framework for the examples

The generic framework for the examples is as follows:

1. Define the measurand in terms of probability distributions for the parameters in a mathematical model for the measurand, and distributions for additive random noise.

2. Generate 'true' parameter values and random noise values from the distributions defined at stage 1.

3. Generate 'true' values for the measurand using the parameter and random noise values at stage 2 for a sampling frequency that exceeds that for any sensor.

4. Define calibration information about the sensors in terms of probability distributions for the parameters in a mathematical model for each sensor, and distributions for additive random noise, as well as the sampling frequency, the number of bits and saturation level for quantization, and the proportion of data dropouts. For a faulty sensor, information about the 'true' behaviour of the sensor (in addition to the behaviour 'known' through calibration) is also provided.

5. Generate 'true' parameter values from the distributions defined at stage 4.

6. Generate a voltage response of each sensor to the measurand generated at stage 3, based on the 'true' parameter values generated at stage 5.

7. Add random noise to the voltage response based on the distributions defined at stage 4.

8. Sample and quantize the voltage response using the sampling frequency, number of bits and saturation level defined at stage 4.

9. Generate measurement results (estimates of the measurand and associated standard uncertainties) for the sensors based on the sampled and quantized voltage responses of stage 8 and the calibration information provided at stage 4.

10. Randomly remove measurement results based on the data dropout information given at stage 4.

11. Apply a data fusion method to the measurement results provided by the sensors at stage 10.

12. Write results to an output file.

13. Generate figures displaying the 'true' values of the measurand provided at stage 3, the measurement results provided by the sensors at stage 10 and the combined estimates of the measurand and associated standard uncertainties from stage 11.

Example 1A: Interdependence of sensors, no faulty sensors, maximum clique data fusion method

This example demonstrates the simulation of a small sensor network in which there are two measurands, pressure and temperature, and two classes of sensor, one class that measures pressure and a second that measures temperature. The response of each pressure sensor is temperature-dependent.

The key features of the example are:

1. The pressure measurand is described by a piecewise linear model, as is the temperature measurand. Both measurands are contaminated by random noise processes. Data fusion is performed using the maximum clique method (for more information about this method see below).

2. The network consists of five pressure sensors and five temperature sensors. The five pressure sensors have nominally identical calibration functions, but the five temperature sensors have different calibration functions.

3. There is no faulty temperature or pressure sensor and there is no missing data.

4. The sampling is performed without quantization for both the pressure and temperature sensors, that is, the simulated analogue measurand is sampled at discrete time intervals, but there are no rounding or reduced resolution effects arising from quantization.

5. Both the pressure and temperature sensors are described by linear calibration functions. To obtain the simulated pressure sensor output, the true temperature is used. However, when inverting the simulated pressure sensor output, it is the temperature measured by the temperature sensors that is employed, rather than the true temperature.

The input parameters for the example are entered into the Excel workbook that is also used to record the numerical results of the simulation. The workbook contains six worksheets. Worksheets Measurand1 and Measurand2 contain the rules (the models of the measurands) for generating in a piecewise linear manner the principal measurand (pressure) and the subsidiary measurand (temperature), respectively. Worksheets Sensor1 and Sensor2 contain the definitions of the sensor behaviour (the sensor response models). At the end of the simulation the data fusion results are written to worksheets Results1, recording results for the primary measurand (pressure) and Results2, recording results for the subsidiary measurand (temperature). Both results worksheets have the following format. The first columm contains the times at which sensor measurements have been made. Subsequent columns contain, for each time value, the value of the measurand of interest, the combined estimate of the measurand obtained from the consistent sensors, the standard uncertainty associated with that estimate, a list of the sensors identified as consistent, the estimate of the measurand obtained from each individual sensor and the standard uncertainty associated with that estimate. The results worksheets also record the values of the seeds that were employed for the random number generator. If the analysis is repeated either with the same or amended input data and no change is made to the worksheet name, the results worksheet is overwritten with the new results. Users who need to ensure traceability of results should rename the results worksheets they wish to save to avoid data being overwritten.

Structure of the MATLAB code

The MATLAB software calls a number of functions. These functions are included as separate files containing comments that explain the calculations being undertaken.

Users should note that the software does not carry out exhaustive checks on values provided in worksheets.

Generating pseudo-random numbers

Pseudo-random numbers are generated using the Wichmann-Hill random number generator [1]. This generator is simple to implement in software and allows the user to generate repeatable random number sequences with a long period.

[1] Generating good pseudo-random numbers

B A Wichmann and I D Hill

Computational Statistics and Data Analysis, 51 (2006):1614-1622

The Wichmann-Hill random number generator has been implemented as a DLL independently of this MATLAB code. The files needed to run the Wichmann-Hill generator (WH_MVS.dll and wh_mvs.h) are included as part of this software release.

Licence agreement

The software is provided with a software licence agreement (REF: MSC/L/11/004) and the use of the software is subject to the terms laid out in that agreement. By running the software, the user accepts the terms of the agreement.

MAIN PROGRAM

Identify the software to the user

  fprintf('\nNPL Data Fusion Software: Release 1');
  fprintf('\nNational Physical Laboratory, UK');
  fprintf('\nDecember 2011 \n');
NPL Data Fusion Software: Release 1
National Physical Laboratory, UK
December 2011 

Add folder 'functions' to MATLAB search path

The MATLAB search path is updated to include the folder that contains the MATLAB functions called by the example scripts. It is assumed that this folder is a subfolder of the current folder and is named 'functions'.

  topfolder = pwd;
  funfolder = [topfolder, '\functions'];
  path(path, funfolder);

Set seed values for random number generation

The user must assign a row vector of four integers each in the interval [1, 2147483647]. To assign seed values based on the current time, the user should set the seed variable to be a vector of four zeros. For traceability purposes the values of the chosen seeds are written to the results worksheets of the Excel workbook.

  seed = [25 05 20 05];
  global g_wh_seed
  if all(seed == 0)
      c = fix(clock);
      g_wh_seed = c(3:6);
  else
      g_wh_seed = seed;
  end
  fprintf('\nSeed vector for random number generation: %u %u %u %u \n', ...
      g_wh_seed);
Seed vector for random number generation: 25 5 20 5 

Assign path to Excel workbook to be processed

The user must assign the path to an Excel workbook containing worksheets that describe all components of the measurement.

The user may choose to edit or add worksheets defining different measurand types and sensor behaviours, and may also introduce new components to the calculation by writing a new MATLAB function that may require a new worksheet format.

If on running the software the Excel workbook is open an error will be obtained and the results will not be written to the selected worksheet.

  workbookpath = 'DFSInput_1A.xls';
  fprintf(['\nWorkbook: %', num2str(length(workbookpath)), 's \n'], ...
      workbookpath);
Workbook: DFSInput_1A.xls 

Simulate principal measurand

In the example in this script, the principal measurand (pressure) is generated as a series of straight-line segments using the routines PiecewiseLinear.m and GeneratePiecewiseLinear.m and the input given in the worksheet Measurand1.

'True' measurand values are stored in the structure Meas1In which includes the fields:

t, the vector containing time values, and

y, the vector containing the 'true' principal measurand values corresponding to the time values in t.

A list and description of the information that must be supplied by the user, together with a mathematical description of how the 'true' principal measurand values are calculated, is available here.

  measurandworksheet = 'Measurand1';
  Meas1In = PiecewiseLinear(workbookpath, measurandworksheet);
Type of measurand simulation: Piecewise linear 
Worksheet: Measurand1 

Simulate subsidiary measurand for sensor dependence

In this example, the subsidiary measurand (temperature) is also generated as a series of straight-line segments using the routines PiecewiseLinear.m and GeneratePiecewiseLinear.m and the input given in the worksheet Measurand2.

'True' measurand values are stored in the structure Meas2In which includes the fields:

t, the vector containing time values, and

y, the vector containing the 'true' subsidiary measurand values corresponding to the time values in t.

A list and description of the information that must be supplied by the user, together with a mathematical description of how the 'true' subsidiary measurand values are calculated, is available here.

  measurandworksheet = 'Measurand2';
  Meas2In = PiecewiseLinear(workbookpath, measurandworksheet);
Type of measurand simulation: Piecewise linear 
Worksheet: Measurand2 

Define sensor behaviour for subsidiary measurand

In this example, the behaviour of each sensor is described by a linear calibration function specified in the worksheet Sensor2. The routine LinearCalibration.m takes information about the calibration function given in Sensor2, and the 'true' values of the subsidiary measurand (temperature) contained in the data structure Meas2In, and returns the data structure Sens2 containing measurement results in the fields:

t, the vector containing time values,

Y, the array containing the estimates of the subsidiary measurand obtained using the different sensors, corresponding to the time values in t, and

uY, the array containing the standard uncertainties associated with the estimates of the subsidiary measurand.

A list and description of the information that must be supplied by the user, together with a mathematical description of how sensor estimates of the subsidiary measurand are calculated, is available here.

  sensorworksheet = 'Sensor2';
  Sens2 = LinearCalibration(workbookpath, sensorworksheet, Meas2In);
Type of sensor behaviour: Linear calibration 
Worksheet: Sensor2 

Define data fusion method for subsidiary measurand

Combined estimates of the subsidiary measurand and their associated standard uncertainties are determined using the maximum clique data fusion method [2]. This method assumes uncorrelated measured values. The method calculates the Moffat distance between each pair of measured values, to test the pair-wise consistency of the results. The largest consistent subset of measurement results is then determined, either by an exhaustive search, or through a linear approximation using overlapping intervals. The standard uncertainty associated with any measured value that is not contained in the largest consistent subset is multiplied by its largest Moffat distance to any of the values in the consistent subset. This creates consistency between all the measured values. A weighted mean is calculated using all measured values and the associated augmented uncertainties. In cases in which there are multiple consistent subsets of the same size, the intersection of those subsets is chosen as the largest consistent subset. It is possible for this intersection to contain only a single sensor. The remaining sensor estimates are still used to form a combined estimate of the measurand and its uncertainty, but the individual uncertainties for each sensor are increased. If there is no largest consistent subset at a particular time point, the value from the previous time point is used as an estimate.

[2] The application of self-validation to wireless sensor networks

M A Collett, M G Cox, M Duta, T J Esward, P M Harris, M P Henry

Measurement Science and Technology, 19 (2008), 125201

The routine MaxCliqueDataFusion.m takes the measurement results in the data structure Sens2, and returns the data structure Meas2Out containing measurement results in the fields:

t, the vector containing time values,

y, the vector containing the estimates of the subsidiary measurand corresponding to the time values in t,

uy, the vector containing the standard uncertainties associated with the estimates of the subsidiary measurand, and

indices, the structure containing the indices of consistent sensors.

  Meas2Out = MaxCliqueDataFusion(Sens2);
Data fusion method: Maximum clique 

Write results for subsidiary measurand to Excel workbook

Results are written to the worksheet named Results2.

  resultsworksheet = 'Results2';
  WriteResults(workbookpath, resultsworksheet, Meas2In, Sens2, Meas2Out, ...
      seed);
Results worksheet: Results2 

Define sensor behaviour for principal measurand

In this example, the behaviour of each sensor is described by a linear calibration function specified in the worksheet Sensor1. The routine LinearCalibrationInterdependenceCov.m takes information about the calibration function given in Sensor1, the 'true' values of the principal measurand (pressure) and subsidiary measurand (temperature) contained in the data structures Meas1In and Meas2In, respectively, and the results for the subsidiary measurand (temperature) contained in the data structure Meas2Out, and returns the data structure Sens1 containing measurement results in the fields:

t, the vector containing time values,

Y, the array containing the estimates of the principal measurand obtained using the different sensors, corresponding to the time values in t, and

uY, the array containing the standard uncertainties associated with the estimates of the principal measurand.

A list and description of the information that must be supplied by the user, together with a mathematical description of how sensor estimates of the principal measurand are calculated, is available here.

  sensorworksheet = 'Sensor1';
  Sens1 = LinearCalibrationInterdependenceCov(workbookpath, ...
      sensorworksheet, Meas1In, Meas2In, Meas2Out);
Type of sensor behaviour: Linear calibration with interdependence 
Worksheet: Sensor1 

Define data fusion method for principal measurand

As for the subsidiary measurand, combined estimates of the principal measurand and their associated standard uncertainties are determined using the maximum clique method.

The routine MaxCliqueDataFusion.m takes the measurement results in the data structure Sens1, and returns the data structure Meas1Out containing measurement results in the fields:

t, the vector containing time values,

y, the vector containing the estimates of the principal measurand corresponding to the time values in t,

uy, the vector containing the standard uncertainties associated with the estimates of the principal measurand, and

indices, the structure containing the indices of consistent sensors.

  Meas1Out = MaxCliqueDataFusion(Sens1);
Data fusion method: Maximum clique 

Write results for principal measurand to Excel workbook

Results are written to the worksheet named Results1.

  resultsworksheet = 'Results1';
  WriteResults(workbookpath, resultsworksheet, Meas1In, Sens1, ...
      Meas1Out, seed);
Results worksheet: Results1 

Generate plots of results

The plots for example 1A are set out below.

1. The first figure shows the 'true' principal measurand.

2. The second figure shows the 'true' subsidiary measurand.

3. The third figure shows the estimates of the subsidiary measurand obtained using the sensors, both the estimates and the estimates plus their associated standard uncertainties.

4. The fourth figure shows the 'true' subsidiary measurand and the estimates of the subsidiary measurand that arise from aggregating the estimates of the subsidiary measurand provided by the individual sensors shown in the previous figure. In this case the figure shows the estimates and coverage intervals corresponding to (approximately) a 95 % coverage probability. In this example we have employed the maximum clique method of data fusion.

5. The fifth figure plots the standard uncertainties associated with the estimates of the subsidiary measurand as a function of time.

6. The sixth figure indicates the times at which individual subsidiary measurand sensors fall outside the largest consistent subset. In this example no faulty sensors were defined so that departures from consistency arise from the random processes built into the simulation.

7. The final four figures relate to the principal measurand and follow exactly the same pattern as figures 3 to 6 for the subsidiary measurand.

  PlotMeasIn(Meas1In)
  PlotMeasIn(Meas2In)
  PlotMeasOut(Meas2In, Sens2, Meas2Out)
  PlotMeasOut(Meas1In, Sens1, Meas1Out)

Remove folder 'functions' from MATLAB search path

The MATLAB search path is updated to remove the folder that contains the MATLAB functions called by the example scripts.

  rmpath(funfolder);

Acknowledgements

The work described here was supported by the National Measurement Office of the UK Department of Business, Innovation and Skills as part of its NMS Mathematics and Modelling for Metrology programme.

Program identifier

Data Fusion Software (Release 1)

Authors: T J Esward, P M Harris, C E Matthews, H D Minh and I M Smith

National Physical Laboratory, UK

Date: December 2011

(c) Queen's Printer and Controller of HMSO, 2011