This page was generated from docs/examples/DataSet/Working-With-Pandas-and-XArray.ipynb.
Working with Pandas and XArray¶
This notebook demonstrates how Pandas and XArray can be used to work with the QCoDeS DataSet. It is not meant as a general introduction to Pandas and XArray. We refer to the official documentation for Pandas and XArray for this. This notebook requires that both Pandas and XArray are installed.
Setup¶
First we borrow an example from the measurement notebook to have some data to work with. We split the measurement in two parts so that we can later demonstrate merging the resulting dataframes with Pandas.
[1]:
%matplotlib inline
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import qcodes as qc
from qcodes.dataset import (
Measurement,
initialise_or_create_database_at,
load_or_create_experiment,
)
from qcodes.instrument_drivers.mock_instruments import (
DummyInstrument,
DummyInstrumentWithMeasurement,
)
qc.logger.start_all_logging()
Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename : /home/runner/.qcodes/logs/command_history.log
Mode : append
Output logging : True
Raw input log : False
Timestamping : True
State : active
Qcodes Logfile : /home/runner/.qcodes/logs/241008-19240-qcodes.log
[2]:
# preparatory mocking of physical setup
dac = DummyInstrument("dac", gates=["ch1", "ch2"])
dmm = DummyInstrumentWithMeasurement("dmm", setter_instr=dac)
station = qc.Station(dmm, dac)
[3]:
initialise_or_create_database_at(Path.cwd() / "working_with_pandas")
exp = load_or_create_experiment(
experiment_name="working_with_pandas", sample_name="no sample"
)
[4]:
meas = Measurement(exp)
meas.register_parameter(dac.ch1) # register the first independent parameter
meas.register_parameter(dac.ch2) # register the second independent parameter
meas.register_parameter(
dmm.v2, setpoints=(dac.ch1, dac.ch2)
) # register the dependent one
[4]:
<qcodes.dataset.measurements.Measurement at 0x7fa80dc7cd50>
We then perform a very basic experiment. To be able to demonstrate merging of datasets in Pandas we will perform the measurement in two parts.
[5]:
# run a 2D sweep
with meas.run() as datasaver:
for v1 in np.linspace(-1, 0, 200, endpoint=False):
for v2 in np.linspace(-1, 1, 201):
dac.ch1(v1)
dac.ch2(v2)
val = dmm.v2.get()
datasaver.add_result((dac.ch1, v1), (dac.ch2, v2), (dmm.v2, val))
dataset1 = datasaver.dataset
Starting experimental run with id: 1.
[6]:
# run a 2D sweep
with meas.run() as datasaver:
for v1 in np.linspace(0, 1, 201):
for v2 in np.linspace(-1, 1, 201):
dac.ch1(v1)
dac.ch2(v2)
val = dmm.v2.get()
datasaver.add_result((dac.ch1, v1), (dac.ch2, v2), (dmm.v2, val))
dataset2 = datasaver.dataset
Starting experimental run with id: 2.
Two methods exist for extracting data to Pandas dataframes. to_pandas_dataframe exports all the data from the dataset into a single dataframe, while to_pandas_dataframe_dict returns the data as a dict from measured (dependent) parameters to DataFrames. Please note that to_pandas_dataframe is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, to_pandas_dataframe_dict should be used.
[7]:
df1 = dataset1.to_pandas_dataframe()
df2 = dataset2.to_pandas_dataframe()
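For illustration, a minimal sketch of the dict variant, assuming the datasets created above:
# to_pandas_dataframe_dict maps each dependent parameter name to its own DataFrame
df_dict = dataset1.to_pandas_dataframe_dict()
for name, frame in df_dict.items():
    print(name, frame.shape)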
Working with Pandas¶
Let's first inspect the Pandas DataFrame. Note how both independent (setpoint) parameters are used for the index. Pandas refers to this as a MultiIndex. For visual clarity, we just look at the first N points of the dataset.
[8]:
N = 10
[9]:
df1[:N]
[9]:
| dac_ch1 | dac_ch2 | dmm_v2 |
|---|---|---|
| -1.0 | -1.00 | -0.000053 |
| | -0.99 | 0.000569 |
| | -0.98 | 0.000906 |
| | -0.97 | 0.000569 |
| | -0.96 | 0.000180 |
| | -0.95 | -0.000655 |
| | -0.94 | 0.000481 |
| | -0.93 | -0.000762 |
| | -0.92 | -0.000214 |
| | -0.91 | 0.000138 |
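The setpoint parameters form the levels of the MultiIndex; a quick way to confirm this is to inspect the index names (a small sketch):
# the setpoint parameters are the levels of the MultiIndex
print(df1.index.names)  # expected: ['dac_ch1', 'dac_ch2']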
We can also reset the index to return a simpler view where all data points are simply indexed by a running counter. As we shall see below, this can be needed in some situations. Note that calling reset_index leaves the original dataframe untouched.
[10]:
df1.reset_index()[0:N]
[10]:
| | dac_ch1 | dac_ch2 | dmm_v2 |
|---|---|---|---|
| 0 | -1.0 | -1.00 | -0.000053 |
| 1 | -1.0 | -0.99 | 0.000569 |
| 2 | -1.0 | -0.98 | 0.000906 |
| 3 | -1.0 | -0.97 | 0.000569 |
| 4 | -1.0 | -0.96 | 0.000180 |
| 5 | -1.0 | -0.95 | -0.000655 |
| 6 | -1.0 | -0.94 | 0.000481 |
| 7 | -1.0 | -0.93 | -0.000762 |
| 8 | -1.0 | -0.92 | -0.000214 |
| 9 | -1.0 | -0.91 | 0.000138 |
Pandas has built-in support for various forms of plotting. This does not, however, support a MultiIndex at the moment, so we use reset_index to make the data available for plotting.
[11]:
df1.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")
[11]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
Similarly, for the other dataframe:
[12]:
df2.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")
[12]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
Merging two dataframes with the same labels is fairly simple.
[13]:
df = pd.concat([df1, df2], sort=True)
[14]:
df.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")
[14]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
It is also possible to select a subset of data from the dataframe based on the values of the setpoints (the x and y values).
[15]:
df.loc[(slice(-1, -0.95), slice(-1, -0.97)), :]
[15]:
| dac_ch1 | dac_ch2 | dmm_v2 |
|---|---|---|
| -1.000 | -1.00 | -0.000053 |
| | -0.99 | 0.000569 |
| | -0.98 | 0.000906 |
| | -0.97 | 0.000569 |
| -0.995 | -1.00 | -0.000100 |
| | -0.99 | -0.001114 |
| | -0.98 | 0.000690 |
| | -0.97 | -0.000332 |
| -0.990 | -1.00 | -0.000180 |
| | -0.99 | -0.000012 |
| | -0.98 | -0.000126 |
| | -0.97 | -0.000561 |
| -0.985 | -1.00 | 0.000146 |
| | -0.99 | 0.000831 |
| | -0.98 | -0.000832 |
| | -0.97 | 0.000755 |
| -0.980 | -1.00 | -0.000450 |
| | -0.99 | -0.000243 |
| | -0.98 | -0.000357 |
| | -0.97 | -0.000511 |
| -0.975 | -1.00 | 0.000393 |
| | -0.99 | -0.000809 |
| | -0.98 | 0.000498 |
| | -0.97 | 0.000574 |
| -0.970 | -1.00 | -0.000325 |
| | -0.99 | 0.000862 |
| | -0.98 | -0.000168 |
| | -0.97 | -0.000589 |
| -0.965 | -1.00 | -0.000031 |
| | -0.99 | 0.000372 |
| | -0.98 | 0.000243 |
| | -0.97 | 0.000854 |
| -0.960 | -1.00 | -0.000221 |
| | -0.99 | 0.000402 |
| | -0.98 | 0.000226 |
| | -0.97 | -0.000051 |
| -0.955 | -1.00 | -0.000374 |
| | -0.99 | 0.000590 |
| | -0.98 | -0.000207 |
| | -0.97 | -0.000441 |
| -0.950 | -1.00 | 0.000112 |
| | -0.99 | 0.000102 |
| | -0.98 | -0.000152 |
| | -0.97 | 0.000584 |
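An equivalent selection can be written with pd.IndexSlice, which some find more readable than nested slice objects (a minimal sketch, assuming the merged dataframe df from above):
# select dac_ch1 in [-1, -0.95] and dac_ch2 in [-1, -0.97] from the MultiIndex
idx = pd.IndexSlice
df.loc[idx[-1:-0.95, -1:-0.97], :]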
Working with XArray¶
In many cases, when working with data on rectangular grids, it may be more convenient to export the data to an XArray Dataset or DataArray. This is especially true when working in a multi-dimensional parameter space.
Let's set up and rerun the above measurement with the added dependent parameter dmm.v1.
[16]:
meas.register_parameter(
dmm.v1, setpoints=(dac.ch1, dac.ch2)
) # register the 2nd dependent parameter
[16]:
<qcodes.dataset.measurements.Measurement at 0x7fa80dc7cd50>
[17]:
# run a 2D sweep
with meas.run() as datasaver:
for v1 in np.linspace(-1, 1, 200):
for v2 in np.linspace(-1, 1, 201):
dac.ch1(v1)
dac.ch2(v2)
val1 = dmm.v1.get()
val2 = dmm.v2.get()
datasaver.add_result(
(dac.ch1, v1), (dac.ch2, v2), (dmm.v1, val1), (dmm.v2, val2)
)
dataset3 = datasaver.dataset
Starting experimental run with id: 3.
The QCoDeS DataSet can be converted directly to an XArray Dataset using the to_xarray_dataset method, which returns the data of the measured (dependent) parameters as an XArray Dataset. It's also possible to obtain a dictionary of XArray DataArrays, one per dependent parameter, using the to_xarray_dataarray_dict method. For convenience we will access the DataArrays from XArray's Dataset directly.
Please note that to_xarray_dataset is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, to_xarray_dataarray_dict should be used.
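As a minimal sketch of the dict variant, assuming dataset3 from the run above:
# to_xarray_dataarray_dict maps each dependent parameter name to a DataArray
da_dict = dataset3.to_xarray_dataarray_dict()
da_dict["dmm_v1"]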
[18]:
xaDataSet = dataset3.to_xarray_dataset()
[19]:
xaDataSet
[19]:
<xarray.Dataset> Size: 646kB Dimensions: (dac_ch1: 200, dac_ch2: 201) Coordinates: * dac_ch1 (dac_ch1) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0 * dac_ch2 (dac_ch2) float64 2kB -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0 Data variables: dmm_v1 (dac_ch1, dac_ch2) float64 322kB 6.115 6.092 6.161 ... 3.979 4.128 dmm_v2 (dac_ch1, dac_ch2) float64 322kB 1.535e-05 0.0003686 ... 0.0001548 Attributes: (12/14) ds_name: results sample_name: no sample exp_name: working_with_pandas snapshot: {"station": {"instruments": {"dmm": {"functions... guid: a5e81682-0000-0000-0000-01926aa2c3d9 run_timestamp: 2024-10-08 05:38:24 ... ... captured_counter: 3 run_id: 3 run_description: {"version": 3, "interdependencies": {"paramspec... parent_dataset_links: [] run_timestamp_raw: 1728365904.8637328 completed_timestamp_raw: 1728365915.055613
As mentioned above, it's also possible to work with an XArray DataArray directly from the DataSet. A DataArray can only contain a single dependent variable and can be obtained from the Dataset by indexing with the parameter name.
[20]:
xaDataArray = xaDataSet["dmm_v2"] # or xaDataSet.dmm_v2
[21]:
xaDataArray
[21]:
<xarray.DataArray 'dmm_v2' (dac_ch1: 200, dac_ch2: 201)> Size: 322kB array([[ 1.53500732e-05, 3.68556099e-04, 4.06404085e-04, ..., 3.71183926e-04, -6.03211209e-04, -5.61349124e-04], [-4.07104767e-04, 2.81243839e-04, 1.68712686e-04, ..., -3.15283482e-04, 2.10164392e-04, -9.93432202e-05], [ 2.82486762e-04, 9.11020363e-04, -3.52204743e-04, ..., -1.16687041e-04, -2.45199776e-04, 3.46715064e-04], ..., [-9.33049921e-04, 1.07815623e-03, -1.02522430e-04, ..., 3.33790678e-04, -3.77911965e-04, -1.57766381e-04], [-2.36027992e-04, 4.29625012e-04, -3.57931586e-04, ..., -3.72919953e-04, 5.19352969e-04, -3.45161376e-04], [ 5.93709348e-04, -4.34230680e-04, 2.99514276e-04, ..., 6.05930553e-04, 6.42845294e-05, 1.54771760e-04]]) Coordinates: * dac_ch1 (dac_ch1) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0 * dac_ch2 (dac_ch2) float64 2kB -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0 Attributes: name: dmm_v2 paramtype: numeric label: Gate v2 unit: V inferred_from: [] depends_on: ['dac_ch1', 'dac_ch2'] units: V long_name: Gate v2
[22]:
fig, ax = plt.subplots(2, 2)
xaDataSet.dmm_v2.plot(ax=ax[0, 0])
xaDataSet.dmm_v1.plot(ax=ax[1, 1])
xaDataSet.dmm_v2.mean(dim="dac_ch1").plot(ax=ax[1, 0])
xaDataSet.dmm_v1.mean(dim="dac_ch2").plot(ax=ax[0, 1])
fig.tight_layout()
Above we demonstrate a few ways of plotting data from a DataArray: the DataArray can be plotted directly, and derived quantities such as the mean along a dimension, or a specific row/column, can be plotted as well.
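For instance, a specific row or column can be selected by label or by position using xarray's sel/isel (a small sketch; the particular values are arbitrary):
# label-based selection: the trace closest to dac_ch1 = 0 V
xaDataSet.dmm_v2.sel(dac_ch1=0.0, method="nearest").plot()
# position-based selection: the column at index 100 along dac_ch2
xaDataSet.dmm_v2.isel(dac_ch2=100).plot()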
Working with XArray on non-gridded data¶
Sometimes your data does not fit well on a regular grid. Perhaps you are sweeping two parameters at the same time, or you are measuring at random points.
[23]:
# run a 2D sweep
with meas.run() as datasaver:
for v1, v2 in zip(np.linspace(-1, 1, 200), np.linspace(-1, 1, 201)):
dac.ch1(v1)
dac.ch2(v2)
val1 = dmm.v1.get()
val2 = dmm.v2.get()
datasaver.add_result(
(dac.ch1, v1), (dac.ch2, v2), (dmm.v1, val1), (dmm.v2, val2)
)
dataset4 = datasaver.dataset
Starting experimental run with id: 4.
[24]:
xaDataSet = dataset4.to_xarray_dataset()
If this is the case, QCoDeS will export the data using an XArray MultiIndex.
[25]:
xaDataSet
[25]:
<xarray.Dataset> Size: 8kB Dimensions: (multi_index: 200) Coordinates: * multi_index (multi_index) object 2kB MultiIndex * dac_ch1 (multi_index) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9899 1.0 * dac_ch2 (multi_index) float64 2kB -1.0 -0.99 -0.98 ... 0.97 0.98 0.99 Data variables: dmm_v1 (multi_index) float64 2kB 6.087 6.091 6.027 ... 4.201 3.952 dmm_v2 (multi_index) float64 2kB 0.0004458 -7.323e-05 ... 0.0007926 Attributes: (12/14) ds_name: results sample_name: no sample exp_name: working_with_pandas snapshot: {"station": {"instruments": {"dmm": {"functions... guid: 3c6253e2-0000-0000-0000-01926aa2ef26 run_timestamp: 2024-10-08 05:38:35 ... ... captured_counter: 4 run_id: 4 run_description: {"version": 3, "interdependencies": {"paramspec... parent_dataset_links: [] run_timestamp_raw: 1728365915.946582 completed_timestamp_raw: 1728365915.9899907
Note how the expected coordinates can be seen above, along with a coordinate called multi_index. QCoDeS has built-in support for exporting such datasets to NetCDF files, using cf_xarray to compress and decompress the data. Note, however, that if you manually export or import such XArray datasets to/from NetCDF, you will be responsible for compressing/decompressing as needed.
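A minimal sketch of the built-in export (the target directory below is just an example):
# QCoDeS handles the MultiIndex compression itself when exporting to NetCDF
dataset4.export("netcdf", path=str(Path.cwd()))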