1.5. Data Fusion

1.5.1. Introduction

This example shows how to use pSeven Core to train an approximation model from two data sets: a small high-fidelity sample that is considered precise, and a larger low-fidelity sample that is coarse.

Start by importing the Generic Tool for Data Fusion (GTDF) module:

from da.p7core import gtdf

1.5.2. Model Training

Generate the example samples:

import numpy as np

def getTrainData(sample_size, dim, noisy):
  x = np.random.uniform(low=0, high=1, size=sample_size * dim).reshape((sample_size, dim))
  f = np.sum(x*x, axis=1).reshape((sample_size, 1))
  if noisy:
    noise = np.random.normal(loc=0.0, scale=1.0, size=sample_size).reshape((sample_size, 1))
    f += noise
  return x, f

dim = 3
sample_size_hf = 10
sample_size_lf = sample_size_hf * 10

# prepare the high fidelity sample
x_hf, f_hf = getTrainData(sample_size_hf, dim, False)
# prepare the low fidelity sample
x_lf, f_lf = getTrainData(sample_size_lf, dim, True)
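
The generator above draws from NumPy's global random state, so every run produces different samples. If reproducible samples are needed, one option is a seeded variant. The following is a minimal sketch, assuming NumPy 1.17 or later; the name get_train_data_seeded and its seed argument are illustrative additions and are not part of the original example or of pSeven Core.

```python
import numpy as np

def get_train_data_seeded(sample_size, dim, noisy, seed):
  # hypothetical helper: same data model as getTrainData, but reproducible
  rng = np.random.default_rng(seed)
  x = rng.uniform(low=0.0, high=1.0, size=(sample_size, dim))
  f = np.sum(x * x, axis=1).reshape((sample_size, 1))
  if noisy:
    # unit-variance Gaussian noise, as in the low fidelity sample
    f += rng.normal(loc=0.0, scale=1.0, size=(sample_size, 1))
  return x, f

# the same seed gives the same sample
x_a, f_a = get_train_data_seeded(10, 3, False, seed=0)
x_b, f_b = get_train_data_seeded(10, 3, False, seed=0)
print(x_a.shape, f_a.shape)      # (10, 3) (10, 1)
print(np.array_equal(x_a, x_b))  # True
```

Both arrays are two-dimensional, with one row per point, which matches the layout of the samples passed to the builder in this example.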

Create a gtdf.Builder instance and train the model:

from da.p7core import gtdf
from da.p7core import loggers

# create the model builder
builder = gtdf.Builder()

# set logger; StreamLogger outputs to sys.stdout by default
builder.set_logger(loggers.StreamLogger())

# train the model
model = builder.build(x_hf, f_hf, x_lf, f_lf, options={'GTDF/loglevel': 'Info'})

1.5.3. Using the Model

Print a readable model summary:

print(model)

Save the model and reload it from disk:

model.save('model.gtdf')
loaded_model = gtdf.Model('model.gtdf')

Validate the model by checking its errors on a test sample with reference data:

# calculate the root-mean-square (RMS) error on a sample
def calc_rms_error(model, x_sample, f_sample):
  errors = f_sample - model.calc(x_sample)
  return np.sqrt(np.mean(errors ** 2))

sample_size_test = 1000
# generate new test samples (these replace the training samples)
x_hf, f_hf = getTrainData(sample_size_test, dim, False)
x_lf, f_lf = getTrainData(sample_size_test, dim, True)
print('-'*80)
print('RMS errors:')
print('HF RMS error: %.15g' % calc_rms_error(model, x_hf, f_hf))
print('LF RMS error: %.15g' % calc_rms_error(model, x_lf, f_lf))
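
Note that the standard deviation of the errors equals the RMS error only when the mean error is zero: a systematic bias is invisible to np.std. The sketch below illustrates the difference with made-up error values; they are not output of a pSeven Core model.

```python
import numpy as np

# made-up errors with a mean of 2.0; illustrative values only
errors = np.array([1.0, 2.0, 3.0]).reshape((3, 1))

std = np.std(errors)                 # spread of the errors around their mean
rms = np.sqrt(np.mean(errors ** 2))  # root-mean-square error

print('std: %.4f' % std)  # std: 0.8165
print('RMS: %.4f' % rms)  # RMS: 2.1602
```

The two values are related by RMS^2 = std^2 + mean^2, so they coincide only for unbiased errors.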

1.5.4. Full Example Code

import numpy as np

from da.p7core import gtdf
from da.p7core import loggers

def getTrainData(sample_size, dim, noisy):
  x = np.random.uniform(low=0, high=1, size=sample_size * dim).reshape((sample_size, dim))
  f = np.sum(x*x, axis=1).reshape((sample_size, 1))
  if noisy:
    noise = np.random.normal(loc=0.0, scale=1.0, size=sample_size).reshape((sample_size, 1))
    f += noise
  return x, f

dim = 3
sample_size_hf = 10
sample_size_lf = sample_size_hf * 10

x_hf, f_hf = getTrainData(sample_size_hf, dim, False)
x_lf, f_lf = getTrainData(sample_size_lf, dim, True)

builder = gtdf.Builder()
builder.set_logger(loggers.StreamLogger())

model = builder.build(x_hf, f_hf, x_lf, f_lf, options={'GTDF/loglevel': 'Info'})

print(model)

model.save('model.gtdf')
loaded_model = gtdf.Model('model.gtdf')


# calculate the root-mean-square (RMS) error on a sample
def calc_rms_error(model, x_sample, f_sample):
  errors = f_sample - model.calc(x_sample)
  return np.sqrt(np.mean(errors ** 2))

sample_size_test = 1000
# generate new test samples (these replace the training samples)
x_hf, f_hf = getTrainData(sample_size_test, dim, False)
x_lf, f_lf = getTrainData(sample_size_test, dim, True)
print('-'*80)
print('RMS errors:')
print('HF RMS error: %.15g' % calc_rms_error(model, x_hf, f_hf))
print('LF RMS error: %.15g' % calc_rms_error(model, x_lf, f_lf))