This page was generated from docs/examples/DataSet/Dataset_Performance.ipynb.
DataSet Performance
This notebook shows the trade-off between inserting data into a database row-by-row and as binary blobs. Inserting the data row-by-row means that we have direct access to all the data and may perform queries directly on the values of the data. On the other hand, as we shall see, this is much slower than inserting the data directly as binary blobs.
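To make the trade-off concrete, here is a minimal sketch that uses Python's built-in sqlite3 module rather than QCoDeS itself. The table names as_rows and as_blob, and the packing via the array module, are assumptions made for illustration and are not meant to mirror the QCoDeS schema.

```python
import sqlite3
from array import array

# Illustrative data only: 1000 integer-valued floats.
values = [float(i) for i in range(1000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE as_rows (idx INTEGER PRIMARY KEY, y REAL)")
conn.execute("CREATE TABLE as_blob (idx INTEGER PRIMARY KEY, y BLOB)")

# Row-by-row: one table row per value, so SQL can filter on the values.
conn.executemany("INSERT INTO as_rows (y) VALUES (?)", [(v,) for v in values])

# Binary blob: the whole sequence is packed into a single opaque cell.
conn.execute("INSERT INTO as_blob (y) VALUES (?)", (array("d", values).tobytes(),))
conn.commit()

n_rows = conn.execute("SELECT COUNT(*) FROM as_rows").fetchone()[0]
n_blobs = conn.execute("SELECT COUNT(*) FROM as_blob").fetchone()[0]

# Values stored as rows can be queried directly in SQL ...
n_large = conn.execute("SELECT COUNT(*) FROM as_rows WHERE y >= 500").fetchone()[0]

# ... whereas a blob must be read back and decoded before it can be inspected.
blob = conn.execute("SELECT y FROM as_blob").fetchone()[0]
decoded = array("d")
decoded.frombytes(blob)
```

The row table holds one row per value and supports a WHERE clause on y, while the blob table holds a single row whose content is only meaningful after decoding in Python.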
First, we choose a new location for the database to ensure that we don’t add a bunch of benchmarking data to the default one.
[1]:
import os
import qcodes as qc
cwd = os.getcwd()
qc.config["core"]["db_location"] = os.path.join(cwd, "testing.db")
Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename : /home/runner/.qcodes/logs/command_history.log
Mode : append
Output logging : True
Raw input log : False
Timestamping : True
State : active
Qcodes Logfile : /home/runner/.qcodes/logs/241218-9249-qcodes.log
[2]:
%matplotlib inline
import time
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import qcodes as qc
from qcodes.dataset import (
    Measurement,
    initialise_or_create_database_at,
    load_or_create_experiment,
)
from qcodes.parameters import ManualParameter
[3]:
initialise_or_create_database_at(Path.cwd() / "dataset_performance.db")
exp = load_or_create_experiment(experiment_name="tutorial_exp", sample_name="no sample")
Here, we define a simple function to benchmark the time it takes to insert n points with either numeric or array data type. We will compare both the time used to call add_result and the time used for the full measurement.
[4]:
def insert_data(paramtype, npoints, nreps=1):
    meas = Measurement(exp=exp)

    x1 = ManualParameter("x1")
    x2 = ManualParameter("x2")
    x3 = ManualParameter("x3")
    y1 = ManualParameter("y1")
    y2 = ManualParameter("y2")

    # Three independent parameters and two dependent ones, all stored as `paramtype`
    meas.register_parameter(x1, paramtype=paramtype)
    meas.register_parameter(x2, paramtype=paramtype)
    meas.register_parameter(x3, paramtype=paramtype)
    meas.register_parameter(y1, setpoints=[x1, x2, x3], paramtype=paramtype)
    meas.register_parameter(y2, setpoints=[x1, x2, x3], paramtype=paramtype)

    # Outer timer: includes starting and closing the run
    start = time.perf_counter()
    with meas.run() as datasaver:
        # Inner timer: only the add_result calls
        start_adding = time.perf_counter()
        for _ in range(nreps):
            datasaver.add_result(
                (x1, np.random.rand(npoints)),
                (x2, np.random.rand(npoints)),
                (x3, np.random.rand(npoints)),
                (y1, np.random.rand(npoints)),
                (y2, np.random.rand(npoints)),
            )
        stop_adding = time.perf_counter()
        run_id = datasaver.run_id
    stop = time.perf_counter()

    tot_time = stop - start
    add_time = stop_adding - start_adding
    return tot_time, add_time, run_id
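The benchmark function relies on two nested timers: one around the whole run and one around the add_result calls only. The same pattern can be sketched in isolation with the standard library. The stopwatch helper below is hypothetical and not part of QCoDeS, and the sleeps merely stand in for setup and teardown work.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(timings, label):
    """Store the elapsed wall-clock time of the with-block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


timings = {}
with stopwatch(timings, "total"):
    time.sleep(0.01)  # stand-in for opening the run
    with stopwatch(timings, "inner"):
        sum(range(10_000))  # stand-in for the add_result loop
    time.sleep(0.01)  # stand-in for closing the run

# Time spent outside the inner block, i.e. the setup/teardown cost
overhead = timings["total"] - timings["inner"]
```

Because the inner block is fully contained in the outer one, the difference between the two timings isolates the cost of everything outside the loop.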
Comparison between numeric/array data and binary blob
Case 1: Short experiment time
[5]:
sizes = [1, 500, 1000, 2000, 3000, 4000, 5000]
t_numeric = []
t_numeric_add = []
t_array = []
t_array_add = []
for size in sizes:
    tn, tna, run_id_n = insert_data("numeric", size)
    t_numeric.append(tn)
    t_numeric_add.append(tna)

    ta, taa, run_id_a = insert_data("array", size)
    t_array.append(ta)
    t_array_add.append(taa)
Starting experimental run with id: 1.
Starting experimental run with id: 2.
Starting experimental run with id: 3.
Starting experimental run with id: 4.
Starting experimental run with id: 5.
Starting experimental run with id: 6.
Starting experimental run with id: 7.
Starting experimental run with id: 8.
Starting experimental run with id: 9.
Starting experimental run with id: 10.
Starting experimental run with id: 11.
Starting experimental run with id: 12.
Starting experimental run with id: 13.
Starting experimental run with id: 14.
[6]:
fig, ax = plt.subplots(1, 1)
ax.plot(sizes, t_numeric, "o-", label="Inserting row-by-row")
ax.plot(sizes, t_numeric_add, "o-", label="Inserting row-by-row: add_result only")
ax.plot(sizes, t_array, "d-", label="Inserting as binary blob")
ax.plot(sizes, t_array_add, "d-", label="Inserting as binary blob: add_result only")
ax.legend()
ax.set_xlabel("Array length")
ax.set_ylabel("Time (s)")
fig.tight_layout()
As the figure above shows, the time to set up and close the experiment is approximately 0.4 s. For small array sizes, the difference between inserting the data as arrays and inserting it row-by-row is relatively unimportant. At larger array sizes, i.e. above 10000 points, the cost of writing the data as individual data points starts to become important.
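The setup-and-close time can also be read off numerically instead of from the plot, since it is the gap between the full-measurement time and the add_result-only time. A minimal sketch follows; setup_overhead is a hypothetical helper name, and the commented usage assumes the lists filled in the cell above.

```python
def setup_overhead(total_times, add_times):
    """Return, per run, the time spent outside the add_result calls."""
    if len(total_times) != len(add_times):
        raise ValueError("both sequences must have the same length")
    return [total - add for total, add in zip(total_times, add_times)]


# Usage with the lists from the benchmark cell (not executed here):
# overhead = setup_overhead(t_numeric, t_numeric_add)
# print(f"mean overhead: {sum(overhead) / len(overhead):.2f} s")
```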
Case 2: Long experiment time
[7]:
sizes = [1, 500, 1000, 2000, 3000, 4000, 5000]
nreps = 100
t_numeric = []
t_numeric_add = []
t_numeric_run_ids = []
t_array = []
t_array_add = []
t_array_run_ids = []
for size in sizes:
    tn, tna, run_id_n = insert_data("numeric", size, nreps=nreps)
    t_numeric.append(tn)
    t_numeric_add.append(tna)
    t_numeric_run_ids.append(run_id_n)

    ta, taa, run_id_a = insert_data("array", size, nreps=nreps)
    t_array.append(ta)
    t_array_add.append(taa)
    t_array_run_ids.append(run_id_a)
Starting experimental run with id: 15.
Starting experimental run with id: 16.
Starting experimental run with id: 17.
Starting experimental run with id: 18.
Starting experimental run with id: 19.
Starting experimental run with id: 20.
Starting experimental run with id: 21.
Starting experimental run with id: 22.
Starting experimental run with id: 23.
Starting experimental run with id: 24.
Starting experimental run with id: 25.
Starting experimental run with id: 26.
Starting experimental run with id: 27.
Starting experimental run with id: 28.
[8]:
fig, ax = plt.subplots(1, 1)
ax.plot(sizes, t_numeric, "o-", label="Inserting row-by-row")
ax.plot(sizes, t_numeric_add, "o-", label="Inserting row-by-row: add_result only")
ax.plot(sizes, t_array, "d-", label="Inserting as binary blob")
ax.plot(sizes, t_array_add, "d-", label="Inserting as binary blob: add_result only")
ax.legend()
ax.set_xlabel("Array length")
ax.set_ylabel("Time (s)")
fig.tight_layout()
However, as we increase the length of the experiment, here by repeating the insertion 100 times, we see a big difference between inserting the data row-by-row and inserting it as a binary blob.
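One way to put a number on this difference is to convert the measured times into a throughput, that is, values written per parameter per second. This is a sketch only; points_per_second is a hypothetical helper, and the commented usage assumes the sizes, nreps and timing lists from the cell above.

```python
def points_per_second(sizes, times, nreps=1):
    """Return values written per parameter per second for each array length."""
    if len(sizes) != len(times):
        raise ValueError("sizes and times must have the same length")
    return [size * nreps / elapsed for size, elapsed in zip(sizes, times)]


# Usage with the lists from the benchmark cell (not executed here):
# rate_numeric = points_per_second(sizes, t_numeric, nreps=nreps)
# rate_array = points_per_second(sizes, t_array, nreps=nreps)
```

Comparing the two resulting lists element by element gives the relative cost of the two storage types at each array length.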
Loading the data
[9]:
from qcodes.dataset import load_by_id
As usual, you can load the data with the load_by_id function, but you will notice that the different storage methods are reflected in the shape of the data as it is retrieved.
[10]:
run_id_n = t_numeric_run_ids[0]
run_id_a = t_array_run_ids[0]
[11]:
ds = load_by_id(run_id_n)
ds.get_parameter_data("x1")
[11]:
{'x1': {'x1': array([0.40271564, 0.40271564, 0.41659577, 0.41659577, 0.06822103,
0.06822103, 0.76357502, 0.76357502, 0.6363597 , 0.6363597 ,
0.00261214, 0.00261214, 0.83604598, 0.83604598, 0.79155355,
0.79155355, 0.62535147, 0.62535147, 0.04472983, 0.04472983,
0.99940086, 0.99940086, 0.45888668, 0.45888668, 0.81814751,
0.81814751, 0.46551045, 0.46551045, 0.15003391, 0.15003391,
0.25542101, 0.25542101, 0.49039513, 0.49039513, 0.4767279 ,
0.4767279 , 0.82920744, 0.82920744, 0.09897097, 0.09897097,
0.87093264, 0.87093264, 0.47108689, 0.47108689, 0.8246582 ,
0.8246582 , 0.20407373, 0.20407373, 0.41224884, 0.41224884,
0.28078486, 0.28078486, 0.1441797 , 0.1441797 , 0.5877104 ,
0.5877104 , 0.96754723, 0.96754723, 0.64319998, 0.64319998,
0.54208585, 0.54208585, 0.1311416 , 0.1311416 , 0.51736558,
0.51736558, 0.28813805, 0.28813805, 0.03210308, 0.03210308,
0.5126972 , 0.5126972 , 0.09932107, 0.09932107, 0.20220562,
0.20220562, 0.15650138, 0.15650138, 0.55124328, 0.55124328,
0.38515139, 0.38515139, 0.68147151, 0.68147151, 0.68741393,
0.68741393, 0.21174085, 0.21174085, 0.35036837, 0.35036837,
0.86264343, 0.86264343, 0.2361293 , 0.2361293 , 0.4351933 ,
0.4351933 , 0.79415113, 0.79415113, 0.74527527, 0.74527527,
0.34207761, 0.34207761, 0.13466501, 0.13466501, 0.73466278,
0.73466278, 0.39685941, 0.39685941, 0.08475472, 0.08475472,
0.91294709, 0.91294709, 0.37638025, 0.37638025, 0.17113271,
0.17113271, 0.870925 , 0.870925 , 0.75315748, 0.75315748,
0.5745248 , 0.5745248 , 0.76230061, 0.76230061, 0.74637438,
0.74637438, 0.3480118 , 0.3480118 , 0.05879027, 0.05879027,
0.19185356, 0.19185356, 0.12606556, 0.12606556, 0.96191108,
0.96191108, 0.77736881, 0.77736881, 0.45132901, 0.45132901,
0.47742092, 0.47742092, 0.13661635, 0.13661635, 0.54883725,
0.54883725, 0.49316593, 0.49316593, 0.05852227, 0.05852227,
0.46818828, 0.46818828, 0.2712438 , 0.2712438 , 0.76788648,
0.76788648, 0.50239748, 0.50239748, 0.29853555, 0.29853555,
0.24526863, 0.24526863, 0.32985617, 0.32985617, 0.22905139,
0.22905139, 0.76332593, 0.76332593, 0.28818064, 0.28818064,
0.7923485 , 0.7923485 , 0.55135841, 0.55135841, 0.76497899,
0.76497899, 0.90681896, 0.90681896, 0.92620572, 0.92620572,
0.45251546, 0.45251546, 0.42105833, 0.42105833, 0.61495458,
0.61495458, 0.50454738, 0.50454738, 0.76923638, 0.76923638,
0.00988627, 0.00988627, 0.62919346, 0.62919346, 0.44313314,
0.44313314, 0.90227017, 0.90227017, 0.04824198, 0.04824198])}}
And here is a dataset stored as binary arrays:
[12]:
ds = load_by_id(run_id_a)
ds.get_parameter_data("x1")
[12]:
{'x1': {'x1': array([[0.72895376],
[0.72895376],
[0.68060185],
[0.68060185],
[0.43442867],
[0.43442867],
[0.54332758],
[0.54332758],
[0.86538986],
[0.86538986],
[0.06953796],
[0.06953796],
[0.43445748],
[0.43445748],
[0.55475085],
[0.55475085],
[0.69010069],
[0.69010069],
[0.87082239],
[0.87082239],
[0.02271431],
[0.02271431],
[0.89110015],
[0.89110015],
[0.27742127],
[0.27742127],
[0.69294382],
[0.69294382],
[0.73575682],
[0.73575682],
[0.24291882],
[0.24291882],
[0.66932731],
[0.66932731],
[0.78833757],
[0.78833757],
[0.36380685],
[0.36380685],
[0.79359125],
[0.79359125],
[0.30890785],
[0.30890785],
[0.04615148],
[0.04615148],
[0.42035386],
[0.42035386],
[0.87328686],
[0.87328686],
[0.93796599],
[0.93796599],
[0.57954648],
[0.57954648],
[0.94993989],
[0.94993989],
[0.96092426],
[0.96092426],
[0.78310803],
[0.78310803],
[0.36231866],
[0.36231866],
[0.64703303],
[0.64703303],
[0.29428064],
[0.29428064],
[0.16497396],
[0.16497396],
[0.10198189],
[0.10198189],
[0.20710047],
[0.20710047],
[0.04834997],
[0.04834997],
[0.1123851 ],
[0.1123851 ],
[0.07602293],
[0.07602293],
[0.32640041],
[0.32640041],
[0.18276959],
[0.18276959],
[0.56163498],
[0.56163498],
[0.25438617],
[0.25438617],
[0.40932496],
[0.40932496],
[0.04120858],
[0.04120858],
[0.29982734],
[0.29982734],
[0.08721443],
[0.08721443],
[0.03088958],
[0.03088958],
[0.95870214],
[0.95870214],
[0.711921 ],
[0.711921 ],
[0.33063931],
[0.33063931],
[0.73560213],
[0.73560213],
[0.88640061],
[0.88640061],
[0.67328004],
[0.67328004],
[0.7497796 ],
[0.7497796 ],
[0.30005199],
[0.30005199],
[0.89479098],
[0.89479098],
[0.61880327],
[0.61880327],
[0.62515568],
[0.62515568],
[0.75295076],
[0.75295076],
[0.01585439],
[0.01585439],
[0.05019784],
[0.05019784],
[0.28540685],
[0.28540685],
[0.5300608 ],
[0.5300608 ],
[0.44411494],
[0.44411494],
[0.41813512],
[0.41813512],
[0.21453887],
[0.21453887],
[0.53601126],
[0.53601126],
[0.43751479],
[0.43751479],
[0.31912737],
[0.31912737],
[0.28601253],
[0.28601253],
[0.91730661],
[0.91730661],
[0.18679531],
[0.18679531],
[0.88913146],
[0.88913146],
[0.77073343],
[0.77073343],
[0.18415467],
[0.18415467],
[0.004598 ],
[0.004598 ],
[0.69232709],
[0.69232709],
[0.07123449],
[0.07123449],
[0.15636956],
[0.15636956],
[0.34323534],
[0.34323534],
[0.39934618],
[0.39934618],
[0.453628 ],
[0.453628 ],
[0.21937331],
[0.21937331],
[0.30764469],
[0.30764469],
[0.21091207],
[0.21091207],
[0.11601352],
[0.11601352],
[0.52349717],
[0.52349717],
[0.01481459],
[0.01481459],
[0.28539847],
[0.28539847],
[0.10759285],
[0.10759285],
[0.56946681],
[0.56946681],
[0.64447588],
[0.64447588],
[0.81600761],
[0.81600761],
[0.3960283 ],
[0.3960283 ],
[0.70066399],
[0.70066399],
[0.17799315],
[0.17799315],
[0.10766452],
[0.10766452],
[0.9465318 ],
[0.9465318 ],
[0.69921705],
[0.69921705],
[0.64244237],
[0.64244237]])}}
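In the two outputs above, the data stored row-by-row comes back as a flat array, while the data stored as arrays keeps a trailing axis of length one. If downstream code expects the same layout for both, the arrays can be flattened after loading. This is a minimal sketch with NumPy; the as_flat helper is hypothetical and the small arrays are made-up stand-ins, not values from the runs above.

```python
import numpy as np


def as_flat(values):
    """Return the values as a 1-D array regardless of how they were stored."""
    return np.asarray(values).reshape(-1)


# Made-up stand-ins mimicking the two layouts shown above
numeric_like = np.array([0.1, 0.2, 0.3])      # shape (3,), like the row-by-row run
array_like = np.array([[0.1], [0.2], [0.3]])  # shape (3, 1), like the array run

flat_numeric = as_flat(numeric_like)
flat_array = as_flat(array_like)
```

After flattening, both arrays have shape (3,) and can be compared or plotted in the same way.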