This page was generated from docs/examples/DataSet/Working-With-Pandas-and-XArray.ipynb.

Working with Pandas and XArray

This notebook demonstrates how Pandas and XArray can be used to work with the QCoDeS DataSet. It is not meant as a general introduction to Pandas and XArray; for that, we refer to the official Pandas and XArray documentation. This notebook requires that both Pandas and XArray are installed.

Setup

First we borrow an example from the measurement notebook to have some data to work with. We split the measurement in two so we can try merging it with Pandas.

[1]:
%matplotlib inline
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import qcodes as qc
from qcodes.dataset import (
    Measurement,
    initialise_or_create_database_at,
    load_or_create_experiment,
)
from qcodes.instrument_drivers.mock_instruments import (
    DummyInstrument,
    DummyInstrumentWithMeasurement,
)

qc.logger.start_all_logging()
Logging hadn't been started.
Activating auto-logging. Current session state plus future input saved.
Filename       : /home/runner/.qcodes/logs/command_history.log
Mode           : append
Output logging : True
Raw input log  : False
Timestamping   : True
State          : active
Qcodes Logfile : /home/runner/.qcodes/logs/240507-17500-qcodes.log
[2]:
# preparatory mocking of physical setup
dac = DummyInstrument('dac', gates=['ch1', 'ch2'])
dmm = DummyInstrumentWithMeasurement('dmm', setter_instr=dac)
station = qc.Station(dmm, dac)
[3]:
initialise_or_create_database_at(Path.cwd() / "working_with_pandas")
exp = load_or_create_experiment(experiment_name='working_with_pandas',
                                sample_name="no sample")
[4]:
meas = Measurement(exp)
meas.register_parameter(dac.ch1)  # register the first independent parameter
meas.register_parameter(dac.ch2)  # register the second independent parameter
meas.register_parameter(dmm.v2, setpoints=(dac.ch1, dac.ch2))  # register the dependent one
[4]:
<qcodes.dataset.measurements.Measurement at 0x7ffa894b0d10>

We then perform a very basic experiment. To demonstrate merging of datasets in Pandas, we perform the measurement in two parts.

[5]:
# run a 2D sweep

with meas.run() as datasaver:

    for v1 in np.linspace(-1, 0, 200, endpoint=False):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v2, val))

dataset1 = datasaver.dataset
Starting experimental run with id: 1.
[6]:
# run a 2D sweep

with meas.run() as datasaver:

    for v1 in np.linspace(0, 1, 201):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v2, val))

dataset2 = datasaver.dataset
Starting experimental run with id: 2.

Two methods exist for extracting data to Pandas DataFrames: to_pandas_dataframe exports all the data from the dataset into a single DataFrame, while to_pandas_dataframe_dict returns a dict mapping each measured (dependent) parameter to its own DataFrame.

Note that to_pandas_dataframe is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, use to_pandas_dataframe_dict instead.
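The difference between the two methods can be illustrated with plain pandas. The sketch below does not call QCoDeS: it builds a hypothetical dict of one-column DataFrames, keyed by parameter name and indexed by the setpoints, which mimics the structure described for to_pandas_dataframe_dict. The values are random placeholders and dmm_v3 is a made-up parameter. When the setpoints agree, the frames line up into one table; when they do not, combining them fills the gaps with NaN, which is one reason a single DataFrame only makes sense for shared setpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical setpoints shared by both dependent parameters
index = pd.MultiIndex.from_product(
    [np.linspace(-1, 1, 3), np.linspace(-1, 1, 4)],
    names=["dac_ch1", "dac_ch2"],
)
rng = np.random.default_rng(seed=0)

# Parameter name -> one-column DataFrame indexed by the setpoints
# (placeholder values, mimicking the structure of to_pandas_dataframe_dict)
frames = {
    "dmm_v1": pd.DataFrame({"dmm_v1": rng.normal(5, 1, len(index))}, index=index),
    "dmm_v2": pd.DataFrame({"dmm_v2": rng.normal(0, 1e-3, len(index))}, index=index),
}

# Identical setpoints: the frames align row by row into a single DataFrame
combined = pd.concat(list(frames.values()), axis=1)
print(combined.head())

# A made-up parameter measured at a different setpoint does not align:
# the outer join on the index fills the missing cells with NaN
other_index = pd.MultiIndex.from_tuples([(0.5, 0.0)], names=["dac_ch1", "dac_ch2"])
mismatched = pd.concat(
    [frames["dmm_v1"], pd.DataFrame({"dmm_v3": [1.0]}, index=other_index)], axis=1
)
print(mismatched.isna().sum())
```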

[7]:
df1 = dataset1.to_pandas_dataframe()
df2 = dataset2.to_pandas_dataframe()

Working with Pandas

Let’s first inspect the Pandas DataFrame. Note how both independent variables (the setpoints dac_ch1 and dac_ch2) are used for the index; Pandas refers to this as a MultiIndex. For visual clarity, we only look at the first N points of the dataset.
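As a minimal pandas-only sketch, assuming a small hypothetical stand-in for df1 with the same index names, the snippet below shows how rows of a MultiIndex DataFrame are addressed by a (dac_ch1, dac_ch2) tuple, and how DataFrame.xs picks out all rows for one value of a single index level.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df1: dmm_v2 indexed by the two setpoints
index = pd.MultiIndex.from_product(
    [[-1.0, -0.995], [-1.0, -0.99, -0.98]], names=["dac_ch1", "dac_ch2"]
)
df_small = pd.DataFrame({"dmm_v2": np.arange(6, dtype=float)}, index=index)

# A single row is addressed by a (dac_ch1, dac_ch2) tuple
value = df_small.loc[(-0.995, -0.99), "dmm_v2"]

# xs selects every row with a given value of one level and drops that level
row_at_ch1 = df_small.xs(-1.0, level="dac_ch1")
print(row_at_ch1)
```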

[8]:
N = 10
[9]:
df1[:N]
[9]:
                   dmm_v2
dac_ch1 dac_ch2
-1.0    -1.00    0.000611
        -0.99   -0.000298
        -0.98    0.000318
        -0.97    0.000419
        -0.96    0.000760
        -0.95   -0.000013
        -0.94   -0.000916
        -0.93    0.000100
        -0.92   -0.000149
        -0.91    0.000819

We can also reset the index to get a simpler view in which all data points are indexed by a running counter. As we shall see below, this is needed in some situations. Note that reset_index returns a new dataframe and leaves the original one untouched.
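The following sketch, again using a tiny hypothetical stand-in for df1, checks that reset_index produces a new DataFrame with the index levels as ordinary columns while the original keeps its MultiIndex. It also shows set_index turning those columns back into the index.

```python
import pandas as pd

# Hypothetical two-point stand-in for df1
index = pd.MultiIndex.from_product(
    [[-1.0], [-1.0, -0.99]], names=["dac_ch1", "dac_ch2"]
)
df_small = pd.DataFrame({"dmm_v2": [0.1, 0.2]}, index=index)

# reset_index returns a new DataFrame with the index levels as columns
flat = df_small.reset_index()
print(flat)

# The original is unchanged, and set_index reverses the operation
restored = flat.set_index(["dac_ch1", "dac_ch2"])
```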

[10]:
df1.reset_index()[0:N]
[10]:
   dac_ch1  dac_ch2     dmm_v2
0     -1.0    -1.00   0.000611
1     -1.0    -0.99  -0.000298
2     -1.0    -0.98   0.000318
3     -1.0    -0.97   0.000419
4     -1.0    -0.96   0.000760
5     -1.0    -0.95  -0.000013
6     -1.0    -0.94  -0.000916
7     -1.0    -0.93   0.000100
8     -1.0    -0.92  -0.000149
9     -1.0    -0.91   0.000819

Pandas has built-in support for various forms of plotting. However, DataFrame.plot.scatter expects column names for its x and y arguments and cannot use MultiIndex levels directly, so we use reset_index to turn the index levels into ordinary columns before plotting.
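For gridded data like this, an alternative that is not used in this notebook is to reshape the MultiIndex series into a 2D table with unstack, giving one row per dac_ch1 value and one column per dac_ch2 value. The sketch below uses a small hypothetical frame; the resulting array could then be passed to an image-style plotting function such as matplotlib’s pcolormesh.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 4 sweep with the same index names as df1
index = pd.MultiIndex.from_product(
    [[-1.0, -0.5, 0.0], [-1.0, 0.0, 1.0, 2.0]], names=["dac_ch1", "dac_ch2"]
)
df_small = pd.DataFrame({"dmm_v2": np.arange(12, dtype=float)}, index=index)

# unstack moves the dac_ch2 level into the columns: rows are dac_ch1 values,
# columns are dac_ch2 values, cells are dmm_v2
grid = df_small["dmm_v2"].unstack("dac_ch2")
print(grid)

# grid.to_numpy() is a plain 2D array suitable for image-style plots
```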

[11]:
df1.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[11]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
[Figure: scatter plot of df1 with dac_ch1 on the x axis, dac_ch2 on the y axis and dmm_v2 as colour]

Similarly, for the other dataframe:

[12]:
df2.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[12]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
[Figure: scatter plot of df2 with dac_ch1 on the x axis, dac_ch2 on the y axis and dmm_v2 as colour]

Because both dataframes have the same column and index labels, merging them only requires concatenating them with pd.concat.
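Here is a small pandas-only sketch of such a merge, using two hypothetical halves of a sweep. Note that the sort argument of pd.concat sorts the non-concatenation axis (here the columns), not the index. In this notebook the two parts already come in increasing order of dac_ch1. If they did not, calling sort_index afterwards would be a sensible precaution, since pandas may raise an UnsortedIndexError or emit a PerformanceWarning when slicing an unsorted MultiIndex.

```python
import pandas as pd

# Two hypothetical halves of a sweep, concatenated in the "wrong" order
names = ["dac_ch1", "dac_ch2"]
idx_a = pd.MultiIndex.from_product([[0.0, 0.5], [-1.0, 1.0]], names=names)
idx_b = pd.MultiIndex.from_product([[-1.0, -0.5], [-1.0, 1.0]], names=names)
part_a = pd.DataFrame({"dmm_v2": [1.0, 2.0, 3.0, 4.0]}, index=idx_a)
part_b = pd.DataFrame({"dmm_v2": [5.0, 6.0, 7.0, 8.0]}, index=idx_b)

merged = pd.concat([part_a, part_b])

# Sort the MultiIndex so that later label-based slicing is reliable
merged = merged.sort_index()
print(merged)
```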

[13]:
df = pd.concat([df1, df2], sort=True)
[14]:
df.reset_index().plot.scatter('dac_ch1', 'dac_ch2', c='dmm_v2')
[14]:
<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>
[Figure: scatter plot of the merged dataframe with dac_ch1 on the x axis, dac_ch2 on the y axis and dmm_v2 as colour]

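It is also possible to select a subset of the data from the dataframe based on the setpoint values, here dac_ch1 between -1 and -0.95 and dac_ch2 between -1 and -0.97. Note that label-based slices with .loc include both endpoints.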
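The same kind of selection can also be written with pd.IndexSlice, which some find easier to read than explicit slice objects. The sketch below uses a small hypothetical frame, checks that both spellings select the same rows, and confirms that the slice endpoints are included.

```python
import numpy as np
import pandas as pd

# Hypothetical 5 x 5 sweep with the same index names as the merged dataframe
values = [-1.0, -0.99, -0.98, -0.97, -0.96]
index = pd.MultiIndex.from_product([values, values], names=["dac_ch1", "dac_ch2"])
df_small = pd.DataFrame({"dmm_v2": np.arange(25, dtype=float)}, index=index)

idx = pd.IndexSlice
# dac_ch1 in [-1.0, -0.98] and dac_ch2 in [-1.0, -0.99], endpoints included
subset = df_small.loc[idx[-1.0:-0.98, -1.0:-0.99], :]

# Equivalent spelling with explicit slice objects, as used in the notebook
subset_slice = df_small.loc[(slice(-1.0, -0.98), slice(-1.0, -0.99)), :]
print(subset)
```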

[15]:
df.loc[(slice(-1, -0.95), slice(-1, -0.97)), :]
[15]:
                   dmm_v2
dac_ch1 dac_ch2
-1.000  -1.00    0.000611
        -0.99   -0.000298
        -0.98    0.000318
        -0.97    0.000419
-0.995  -1.00   -0.000112
        -0.99   -0.000048
        -0.98   -0.000760
        -0.97    0.000483
-0.990  -1.00   -0.000401
        -0.99   -0.000636
        -0.98   -0.000424
        -0.97   -0.000193
-0.985  -1.00   -0.000489
        -0.99    0.000140
        -0.98   -0.000474
        -0.97    0.000121
-0.980  -1.00   -0.000607
        -0.99    0.000241
        -0.98    0.000058
        -0.97   -0.000030
-0.975  -1.00   -0.000430
        -0.99   -0.000193
        -0.98   -0.000904
        -0.97    0.000555
-0.970  -1.00   -0.000283
        -0.99   -0.000231
        -0.98    0.000265
        -0.97   -0.000423
-0.965  -1.00    0.000705
        -0.99    0.000261
        -0.98    0.000752
        -0.97   -0.000287
-0.960  -1.00   -0.000163
        -0.99    0.000118
        -0.98    0.000524
        -0.97   -0.000307
-0.955  -1.00   -0.000420
        -0.99    0.000897
        -0.98   -0.000290
        -0.97   -0.000282
-0.950  -1.00    0.000105
        -0.99    0.000510
        -0.98    0.000088
        -0.97   -0.000094

Working with XArray

In many cases when working with data on rectangular grids, it may be more convenient to export the data to an XArray Dataset or DataArray. This is especially true when working in a multi-dimensional parameter space.
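As a rough illustration of what such a conversion looks like, pandas itself can turn a DataFrame with a MultiIndex into an XArray Dataset via DataFrame.to_xarray, with each index level becoming a dimension. This sketch uses a small hypothetical frame and requires XArray to be installed. It is not necessarily how QCoDeS performs the conversion internally.

```python
import numpy as np
import pandas as pd

# Hypothetical 3 x 2 sweep with a MultiIndex, as in the dataframes above
index = pd.MultiIndex.from_product(
    [[-1.0, 0.0, 1.0], [-1.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df_small = pd.DataFrame({"dmm_v2": np.arange(6, dtype=float)}, index=index)

# Each MultiIndex level becomes a dimension of the resulting xarray.Dataset
ds_small = df_small.to_xarray()
print(ds_small)
```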

Let’s set up and rerun the above measurement with an additional dependent parameter, dmm.v1.

[16]:
meas.register_parameter(dmm.v1, setpoints=(dac.ch1, dac.ch2))  # register the 2nd dependent parameter
[16]:
<qcodes.dataset.measurements.Measurement at 0x7ffa894b0d10>
[17]:
# run a 2D sweep

with meas.run() as datasaver:

    for v1 in np.linspace(-1, 1, 200):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val1 = dmm.v1.get()
            val2 = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1),
                                 (dac.ch2, v2),
                                 (dmm.v1, val1),
                                 (dmm.v2, val2))

dataset3 = datasaver.dataset
Starting experimental run with id: 3.

The QCoDeS DataSet can be converted directly to an XArray Dataset with the to_xarray_dataset method, which returns the data of all measured (dependent) parameters in a single XArray Dataset. If you are only interested in individual parameters, the to_xarray_dataarray_dict method instead returns a dictionary of XArray DataArrays, one per parameter. For convenience, we will access the DataArrays from the XArray Dataset directly.

Note that to_xarray_dataset is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, use to_xarray_dataarray_dict instead.
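To illustrate how the two forms relate, the sketch below builds a hypothetical dictionary of DataArrays that share the same coordinates and combines them into a single Dataset with xr.Dataset. The parameter names mirror the notebook, but the values are placeholders and no QCoDeS call is involved.

```python
import numpy as np
import xarray as xr

coords = {"dac_ch1": [-1.0, 0.0, 1.0], "dac_ch2": [-1.0, 1.0]}

# Hypothetical dict of DataArrays keyed by parameter name (placeholder values)
arrays = {
    name: xr.DataArray(
        np.full((3, 2), fill), coords=coords, dims=("dac_ch1", "dac_ch2"), name=name
    )
    for name, fill in [("dmm_v1", 5.0), ("dmm_v2", 0.0)]
}

# DataArrays that share coordinates can be collected into one Dataset
ds_small = xr.Dataset(arrays)
print(ds_small)
```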

[18]:
xaDataSet = dataset3.to_xarray_dataset()
[19]:
xaDataSet
[19]:
<xarray.Dataset> Size: 646kB
Dimensions:  (dac_ch1: 200, dac_ch2: 201)
Coordinates:
  * dac_ch1  (dac_ch1) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0
  * dac_ch2  (dac_ch2) float64 2kB -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0
Data variables:
    dmm_v1   (dac_ch1, dac_ch2) float64 322kB 6.036 6.125 6.048 ... 3.861 4.083
    dmm_v2   (dac_ch1, dac_ch2) float64 322kB -0.0008854 ... -0.0009521
Attributes: (12/14)
    ds_name:                  results
    sample_name:              no sample
    exp_name:                 working_with_pandas
    snapshot:                 {"station": {"instruments": {"dmm": {"functions...
    guid:                     8fd448fa-0000-0000-0000-018f518cdb72
    run_timestamp:            2024-05-07 05:35:36
    ...                       ...
    captured_counter:         3
    run_id:                   3
    run_description:          {"version": 3, "interdependencies": {"paramspec...
    parent_dataset_links:     []
    run_timestamp_raw:        1715060136.8240752
    completed_timestamp_raw:  1715060146.7657385

As mentioned above, it’s also possible to work with an XArray DataArray directly. A DataArray contains a single dependent variable and can be obtained from the Dataset by indexing with the parameter name.
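Indexing by name and attribute access return the same DataArray, as the hypothetical sketch below checks with xr.testing.assert_identical. Item access is the more general form: attribute access only works for names that are valid Python identifiers and do not clash with existing Dataset attributes or methods.

```python
import numpy as np
import xarray as xr

# Hypothetical Dataset with one variable on a 2 x 3 grid
ds_small = xr.Dataset(
    {"dmm_v2": (("dac_ch1", "dac_ch2"), np.zeros((2, 3)))},
    coords={"dac_ch1": [-1.0, 1.0], "dac_ch2": [-1.0, 0.0, 1.0]},
)

by_key = ds_small["dmm_v2"]  # works for any variable name
by_attr = ds_small.dmm_v2    # only for identifier-like names without clashes

# Raises AssertionError if the two DataArrays differ in any way
xr.testing.assert_identical(by_key, by_attr)
```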

[20]:
xaDataArray = xaDataSet['dmm_v2']  # or xaDataSet.dmm_v2
[21]:
xaDataArray
[21]:
<xarray.DataArray 'dmm_v2' (dac_ch1: 200, dac_ch2: 201)> Size: 322kB
array([[-8.85358521e-04, -4.91221970e-04,  9.39595381e-04, ...,
        -1.99391104e-04,  1.04849408e-03, -6.33206420e-04],
       [ 5.39772675e-04,  3.91548363e-04,  3.16232566e-04, ...,
         6.65109760e-04,  8.71744099e-04, -7.60037976e-04],
       [ 3.87852349e-04, -7.12612822e-04,  1.86726466e-04, ...,
        -4.68745645e-04,  3.80408988e-04,  8.82363738e-04],
       ...,
       [ 2.40891865e-04,  1.55830225e-04,  1.02600668e-03, ...,
        -9.63259245e-04,  6.35229098e-04,  3.85014488e-04],
       [-8.16968190e-05,  1.72025247e-04, -3.79593971e-04, ...,
        -3.00493527e-05,  1.06187612e-03,  3.45835173e-04],
       [-3.68533735e-04,  4.11005109e-04,  9.73129072e-04, ...,
        -4.79655937e-04, -2.53018685e-04, -9.52134863e-04]])
Coordinates:
  * dac_ch1  (dac_ch1) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9799 0.9899 1.0
  * dac_ch2  (dac_ch2) float64 2kB -1.0 -0.99 -0.98 -0.97 ... 0.97 0.98 0.99 1.0
Attributes:
    name:           dmm_v2
    paramtype:      numeric
    label:          Gate v2
    unit:           V
    inferred_from:  []
    depends_on:     ['dac_ch1', 'dac_ch2']
    units:          V
    long_name:      Gate v2
[22]:
fig, ax = plt.subplots(2, 2)
xaDataSet.dmm_v2.plot(ax=ax[0, 0])
xaDataSet.dmm_v1.plot(ax=ax[1, 1])
xaDataSet.dmm_v2.mean(dim='dac_ch1').plot(ax=ax[1, 0])
xaDataSet.dmm_v1.mean(dim='dac_ch2').plot(ax=ax[0, 1])
fig.tight_layout()
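[Figure: 2 x 2 panels showing dmm_v2 (top left) and dmm_v1 (bottom right) as 2D plots, dmm_v2 averaged over dac_ch1 (bottom left) and dmm_v1 averaged over dac_ch2 (top right)]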

Above we demonstrated a few ways to work with the data in a DataArray: it can be plotted directly, or reduced first, for instance by taking the mean along one dimension, and the result plotted. A specific row or column, selected with sel or isel, can be plotted in the same way.
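A specific row or column can be selected by coordinate value with sel. The 200-point dac_ch1 grid does not contain 0.0 exactly, so method="nearest" is used to pick the closest setpoint. The sketch below uses a hypothetical DataArray on the same grid as the notebook, filled with made-up values.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for xaDataArray on the notebook's 200 x 201 grid
ch1 = np.linspace(-1, 1, 200)
ch2 = np.linspace(-1, 1, 201)
da = xr.DataArray(
    np.add.outer(ch1, ch2),
    coords={"dac_ch1": ch1, "dac_ch2": ch2},
    dims=("dac_ch1", "dac_ch2"),
    name="dmm_v2",
)

# One "row": the dac_ch1 setpoint closest to 0.0, as a 1D array over dac_ch2
row = da.sel(dac_ch1=0.0, method="nearest")
# One "column": the dac_ch2 setpoint closest to 0.0, as a 1D array over dac_ch1
column = da.sel(dac_ch2=0.0, method="nearest")

# row.plot() or column.plot() would draw these as line plots
```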

Working with XArray on non-gridded data

Sometimes your data does not fit well on a regular grid, for example when you sweep two parameters at the same time or measure at random points.

[23]:
# sweep both parameters simultaneously (zip stops at the shorter array: 200 points)

with meas.run() as datasaver:

    for v1, v2 in zip(np.linspace(-1, 1, 200), np.linspace(-1, 1, 201)):

        dac.ch1(v1)
        dac.ch2(v2)
        val1 = dmm.v1.get()
        val2 = dmm.v2.get()
        datasaver.add_result((dac.ch1, v1),
                             (dac.ch2, v2),
                             (dmm.v1, val1),
                             (dmm.v2, val2))

dataset4 = datasaver.dataset
Starting experimental run with id: 4.
[24]:
xaDataSet = dataset4.to_xarray_dataset()

In this case, QCoDeS exports the data using an XArray MultiIndex.
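Here is a rough way to see why a regular grid is a poor fit here. The simultaneous sweep above produces 200 points, with 200 distinct values in each coordinate. A full grid would therefore need 200 × 200 cells, of which only 0.5% would be filled. The sketch below just reproduces that arithmetic with NumPy. It is an illustration, not the actual check QCoDeS uses to decide between a grid and a MultiIndex.

```python
import numpy as np

# The setpoints of the simultaneous sweep above; zip stops at the shorter array
pairs = list(zip(np.linspace(-1, 1, 200), np.linspace(-1, 1, 201)))
ch1 = np.array([p[0] for p in pairs])
ch2 = np.array([p[1] for p in pairs])

n_points = len(pairs)
# A regular grid needs one cell per combination of unique setpoint values
n_grid_cells = len(np.unique(ch1)) * len(np.unique(ch2))
fill_fraction = n_points / n_grid_cells
print(n_points, n_grid_cells, fill_fraction)
```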

[25]:
xaDataSet
[25]:
<xarray.Dataset> Size: 8kB
Dimensions:      (multi_index: 200)
Coordinates:
  * multi_index  (multi_index) object 2kB MultiIndex
  * dac_ch1      (multi_index) float64 2kB -1.0 -0.9899 -0.9799 ... 0.9899 1.0
  * dac_ch2      (multi_index) float64 2kB -1.0 -0.99 -0.98 ... 0.97 0.98 0.99
Data variables:
    dmm_v1       (multi_index) float64 2kB 6.048 5.953 6.035 ... 3.93 4.052
    dmm_v2       (multi_index) float64 2kB -0.0004164 -0.0005366 ... 0.0007054
Attributes: (12/14)
    ds_name:                  results
    sample_name:              no sample
    exp_name:                 working_with_pandas
    snapshot:                 {"station": {"instruments": {"dmm": {"functions...
    guid:                     e251f393-0000-0000-0000-018f518d0635
    run_timestamp:            2024-05-07 05:35:47
    ...                       ...
    captured_counter:         4
    run_id:                   4
    run_description:          {"version": 3, "interdependencies": {"paramspec...
    parent_dataset_links:     []
    run_timestamp_raw:        1715060147.7701128
    completed_timestamp_raw:  1715060147.8179524

Note how the expected coordinates dac_ch1 and dac_ch2 appear above as levels of a new dimension coordinate called multi_index.
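In XArray, a MultiIndex dimension like multi_index can be created from a grid with stack and turned back into a (mostly NaN) grid with unstack. Its levels can still be selected by name. The hypothetical sketch below starts from a 3 × 3 grid on which only the diagonal was measured.

```python
import numpy as np
import xarray as xr

# Hypothetical 3 x 3 grid where only the diagonal holds measured values
ch = np.array([-1.0, 0.0, 1.0])
data = np.full((3, 3), np.nan)
np.fill_diagonal(data, [10.0, 20.0, 30.0])
grid = xr.DataArray(
    data, coords={"dac_ch1": ch, "dac_ch2": ch}, dims=("dac_ch1", "dac_ch2")
)

# Stack both dimensions into one MultiIndex dimension, then drop empty cells
points = grid.stack(multi_index=("dac_ch1", "dac_ch2")).dropna("multi_index")

# A level of the MultiIndex can still be selected by its name
first = points.sel(dac_ch1=-1.0)

# unstack goes back to a full grid, with NaN where nothing was measured
regridded = points.unstack("multi_index")
```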

QCoDeS has built-in support for exporting such datasets to NetCDF files, using cf_xarray to compress and decompress the data. Note, however, that if you manually export or import such XArray datasets to or from NetCDF, you are responsible for compressing and decompressing them as needed.
