# Dataset Benchmarking

This is a behind-the-scenes benchmarking notebook, mainly for use by developers. The recommended way for users to interact with the dataset is through the `Measurement` object and its associated context manager. See the corresponding notebook for a comprehensive tutorial on how to use them.

In [1]:
%matplotlib inline
from pathlib import Path

import numpy as np

import qcodes as qc
from qcodes.dataset import (
    ParamSpec,
    initialise_or_create_database_at,
    load_or_create_experiment,
    new_data_set,
)

In [2]:
qc.config.core.db_location

'~/experiments.db'

In [3]:
initialise_or_create_database_at(Path.cwd() / "benchmarking.db")

## Setup

In [4]:
exp = load_or_create_experiment("benchmarking", sample_name="the sample is a lie")
exp

benchmarking#the sample is a lie#3@C:\Users\jenielse\experiments.db
-------------------------------------------------------------------
11-benchmark_data-1--0
12-doubledata-2-x,y,z-1000
13-singledata-3-y,z,x-100
14-zerodata-4-y,z,x-1
15-array1Ddata-5-y,z,x-800
16-array0Ddata-6-y,z,x-81
17-many_data-7-x,y,z-810000
68-benchmark_data-8--0
69-doubledata-9-y,z,x-1000
70-singledata-10-y,z,x-100
71-zerodata-11-y,z,x-1
72-array1Ddata-12-y,z,x-800
73-array0Ddata-13-y,z,x-81
74-many_data-14-y,z,x-810000
127-benchmark_data-15--0
128-doubledata-16-y,z,x-1000
129-singledata-17-y,z,x-100
130-zerodata-18-y,z,x-1
131-array1Ddata-19-y,z,x-800
132-array0Ddata-20-y,z,x-811
133-many_data-21-y,z,x-810000
170-benchmark_data-22--0
171-doubledata-23-z,y,x-1000
172-singledata-24-z,y,x-100
173-zerodata-25-z,y,x-1
174-array1Ddata-26-z,y,x-800
175-array0Ddata-27-z,y,x-811
176-many_data-28-z,y,x-810000
229-benchmark_data-29--0
230-doubledata-30-x,z,y-1000
231-singledata-31-y,z,x-100
232-zerodata-32-y,z,x-1
233-arr

Now we can create a dataset. Note two things:

 - If we don't specify an `exp_id` but there is an experiment in the experiment container, the dataset goes into that experiment.
 - A dataset can also be created from the experiment object.
 

In [5]:
dataSet = new_data_set("benchmark_data", exp_id=exp.exp_id)
exp


In this benchmark we assume a 2D sweep (an outer loop over `x` and an inner loop over `y`) and investigate how different ways of writing the results to the dataset affect performance.

In [6]:
x_shape = 100
y_shape = 100

## Baseline: Generate data

In [7]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        z = np.random.random_sample(1)

Wall time: 29.6 ms


Next, generate the data and also store it in memory:

In [8]:
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))

In [9]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[x, y] = x
        y_data[x, y] = y
        z_data[x, y] = np.random.random_sample()

Wall time: 10 ms


## Add to dataset inside double loop

In [10]:
double_dataset = new_data_set("doubledata", exp_id=exp.exp_id,
                              specs=[ParamSpec("x", "numeric"),
                                     ParamSpec("y", "numeric"),
                                     ParamSpec("z", "numeric")])
double_dataset.mark_started()

Note that this is so slow that we only run a tenth of the computation (`x_shape // 10` outer iterations, i.e. 1,000 points).

In [11]:
%%time
for x in range(x_shape // 10):
    for y in range(y_shape):
        double_dataset.add_results([{"x": x, "y": y, "z": np.random.random_sample()}])

Wall time: 3.99 s
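A plausible explanation for the slowdown, which we assume here rather than take from the QCoDeS source, is that each `add_results` call is written to the SQLite file in its own transaction, so 1,000 calls mean 1,000 commits. The sketch below uses only the standard-library `sqlite3` module to compare committing after every point with committing once; the `results` table and the helper functions are hypothetical.

```python
import os
import random
import sqlite3
import tempfile
import time


def make_db(path):
    """Create a fresh on-disk database with one table for (x, y, z) points."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE results (x INTEGER, y INTEGER, z REAL)")
    conn.commit()
    return conn


def insert_per_point(conn, n_x, n_y):
    """One INSERT and one COMMIT per point, like add_results inside the double loop."""
    for x in range(n_x):
        for y in range(n_y):
            conn.execute("INSERT INTO results VALUES (?, ?, ?)", (x, y, random.random()))
            # With SQLite's default settings, each commit waits for the write to be synced.
            conn.commit()


def insert_one_transaction(conn, n_x, n_y):
    """The same rows, but a single COMMIT after the loop."""
    for x in range(n_x):
        for y in range(n_y):
            conn.execute("INSERT INTO results VALUES (?, ?, ?)", (x, y, random.random()))
    conn.commit()


def benchmark(insert, n_x, n_y):
    """Return (elapsed seconds, rows written) for one strategy on a new temporary database."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = make_db(os.path.join(tmp, "sketch.db"))
        start = time.perf_counter()
        insert(conn, n_x, n_y)
        elapsed = time.perf_counter() - start
        rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
        conn.close()  # close before the temporary directory is removed
    return elapsed, rows


# Small grid so the per-point variant finishes quickly.
for strategy in (insert_per_point, insert_one_transaction):
    seconds, rows = benchmark(strategy, 10, 10)
    print(f"{strategy.__name__}: {rows} rows in {seconds:.3f} s")
```

Both variants write the same rows; only the number of commits differs, so the timing gap isolates the per-transaction cost.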


## Add the data in the outer loop and store as NumPy arrays

In [12]:
single_dataset = new_data_set("singledata", exp_id=exp.exp_id,
                              specs=[ParamSpec("x", "array"),
                                     ParamSpec("y", "array"),
                                     ParamSpec("z", "array")])
single_dataset.mark_started()
x_data = np.zeros(y_shape)
y_data = np.zeros(y_shape)
z_data = np.zeros(y_shape)

In [13]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[y] = x
        y_data[y] = y
        z_data[y] = np.random.random_sample()
    # one add_results call per outer iteration, storing a whole row as arrays
    single_dataset.add_results([{"x": x_data, "y": y_data, "z": z_data}])

Wall time: 521 ms
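Storing a whole row per `add_results` call cuts the number of writes from `x_shape * y_shape` to `x_shape`. Below is a minimal sketch of that pattern using `sqlite3` adapters and converters, assuming each array is serialised into one binary blob per cell. The standard-library `array` module stands in for NumPy so the sketch needs nothing else, and `adapt_array`, `convert_array` and `store_one_row_per_outer_iteration` are hypothetical names, not QCoDeS API.

```python
import array
import random
import sqlite3


def adapt_array(arr):
    """Serialise a whole array of doubles into one BLOB."""
    return arr.tobytes()


def convert_array(blob):
    """Rebuild the array from a BLOB stored in a column declared as `array`."""
    out = array.array("d")
    out.frombytes(blob)
    return out


# Register the round trip for this Python process.
sqlite3.register_adapter(array.array, adapt_array)
sqlite3.register_converter("array", convert_array)


def store_one_row_per_outer_iteration(x_shape, y_shape):
    """Insert x_shape rows, each holding y_shape values per column."""
    conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
    conn.execute("CREATE TABLE results (x array, y array, z array)")
    for x in range(x_shape):
        x_row = array.array("d", [x] * y_shape)
        y_row = array.array("d", range(y_shape))
        z_row = array.array("d", (random.random() for _ in range(y_shape)))
        conn.execute("INSERT INTO results VALUES (?, ?, ?)", (x_row, y_row, z_row))
    conn.commit()
    rows = conn.execute("SELECT x, y, z FROM results ORDER BY rowid").fetchall()
    conn.close()
    return rows


rows = store_one_row_per_outer_iteration(100, 100)
print(len(rows), len(rows[0][2]))  # prints: 100 100
```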


## Save once after the loop

In [14]:
zero_dataset = new_data_set("zerodata", exp_id=exp.exp_id,
                            specs=[ParamSpec("x", "array"),
                                   ParamSpec("y", "array"),
                                   ParamSpec("z", "array")])
zero_dataset.mark_started()
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))

In [15]:
%%time
for x in range(x_shape):
    for y in range(y_shape):
        x_data[x, y] = x
        y_data[x, y] = y
        z_data[x, y] = np.random.random_sample()
# a single add_results call after the whole sweep
zero_dataset.add_results([{"x": x_data, "y": y_data, "z": z_data}])

Wall time: 40.3 ms


1
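Here all 10,000 points end up in a single `add_results` call. A minimal standard-library sketch of the same idea, under the assumption that a 2D grid can be stored as flat row-major binary blobs plus its shape; the `results` table and the helpers `save_grid_once` and `element` are hypothetical, not QCoDeS API.

```python
import array
import random
import sqlite3


def save_grid_once(x_shape, y_shape):
    """Fill flat row-major buffers in memory, then write them in one INSERT."""
    n = x_shape * y_shape
    x_flat = array.array("d", [0.0]) * n  # zero-initialised, like np.zeros((x_shape, y_shape))
    y_flat = array.array("d", [0.0]) * n
    z_flat = array.array("d", [0.0]) * n
    for x in range(x_shape):
        for y in range(y_shape):
            i = x * y_shape + y  # row-major position of element [x, y]
            x_flat[i] = x
            y_flat[i] = y
            z_flat[i] = random.random()

    conn = sqlite3.connect(":memory:")
    # Store the shape next to the flat data so the 2D grid can be rebuilt later.
    conn.execute("CREATE TABLE results (n_x INTEGER, n_y INTEGER, x BLOB, y BLOB, z BLOB)")
    conn.execute(
        "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
        (x_shape, y_shape, x_flat.tobytes(), y_flat.tobytes(), z_flat.tobytes()),
    )
    conn.commit()
    n_rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
    row = conn.execute("SELECT n_x, n_y, x, y, z FROM results").fetchone()
    conn.close()
    return n_rows, row


def element(row, column, x, y):
    """Read grid element [x, y] from one stored column (2 = x, 3 = y, 4 = z)."""
    n_x, n_y = row[0], row[1]
    if not (0 <= x < n_x and 0 <= y < n_y):
        raise IndexError(f"[{x}, {y}] is outside a {n_x}x{n_y} grid")
    values = array.array("d")
    values.frombytes(row[column])
    return values[x * n_y + y]


n_rows, row = save_grid_once(100, 100)
print(n_rows)                 # prints: 1
print(element(row, 3, 3, 7))  # prints: 7.0
```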

## Array parameter

In [16]:
array1D_dataset = new_data_set("array1Ddata", exp_id=exp.exp_id,
                               specs=[ParamSpec("x", "array"),
                                      ParamSpec("y", "array"),
                                      ParamSpec("z", "array")])
array1D_dataset.mark_started()
y_setpoints = np.arange(y_shape)

In [17]:
%%timeit
for x in range(x_shape):
    x_data[x, :] = x
    array1D_dataset.add_results(
        [{"x": x_data[x, :], "y": y_setpoints, "z": np.random.random_sample(y_shape)}]
    )

497 ms ± 61.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
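Unlike `%%time`, IPython's `%%timeit` executes the cell body many times: the reported 7 runs of 1 loop each, plus the calibration runs it uses to choose the loop count. Because every execution calls `add_results`, the dataset accumulates the rows of every execution, which is consistent with the 800 results listed for `array1Ddata` in the experiment listing above (8 executions of 100 rows). The standard-library `timeit.repeat` shows the effect; `add_one_row` and `stored_rows` are hypothetical stand-ins for `add_results` and the dataset.

```python
import timeit

stored_rows = []  # stands in for the dataset


def add_one_row():
    """Hypothetical stand-in for one add_results call: it has a side effect."""
    stored_rows.append(len(stored_rows))


# 7 repeats of 1 loop each, mirroring "7 runs, 1 loop each" above.
timings = timeit.repeat(add_one_row, repeat=7, number=1)
print(len(timings), len(stored_rows))  # prints: 7 7
```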


In [18]:
x_data = np.zeros((x_shape, y_shape))
y_data = np.zeros((x_shape, y_shape))
z_data = np.zeros((x_shape, y_shape))
y_setpoints = np.arange(y_shape)

In [19]:
array0D_dataset = new_data_set("array0Ddata", exp_id=exp.exp_id,
                               specs=[ParamSpec("x", "array"),
                                      ParamSpec("y", "array"),
                                      ParamSpec("z", "array")])
array0D_dataset.mark_started()

In [20]:
%%timeit
for x in range(x_shape):
    x_data[x, :] = x
    y_data[x, :] = y_setpoints
    z_data[x, :] = np.random.random_sample(y_shape)
array0D_dataset.add_results([{"x": x_data, "y": y_data, "z": z_data}])

10.3 ms ± 444 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Insert many

In [21]:
data = []
for i in range(100):
    for j in range(100):
        data.append({"x": i, "y": j, "z": np.random.random_sample()})

In [22]:
many_Data = new_data_set("many_data", exp_id=exp.exp_id,
                         specs=[ParamSpec("x", "numeric"),
                                ParamSpec("y", "numeric"),
                                ParamSpec("z", "numeric")])
many_Data.mark_started()

In [23]:
%%timeit
many_Data.add_results(data)

43.2 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
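Passing the whole list to `add_results` in one call is far faster than calling it once per point. Assuming the speed-up comes from writing all rows in a single transaction, the same batched pattern looks like this with the standard-library `sqlite3` module. This is an illustration, not QCoDeS's implementation, and the `results` table is hypothetical.

```python
import random
import sqlite3

# Same shape of input as the notebook: a list of {"x": ..., "y": ..., "z": ...} dicts.
data = [{"x": i, "y": j, "z": random.random()} for i in range(100) for j in range(100)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (x INTEGER, y INTEGER, z REAL)")

# Named placeholders bind each dict directly; executemany runs the INSERT once per
# dict inside one implicit transaction, and the `with` block commits it once.
with conn:
    conn.executemany("INSERT INTO results (x, y, z) VALUES (:x, :y, :z)", data)

print(conn.execute("SELECT COUNT(*) FROM results").fetchone()[0])  # prints: 10000
```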
