This page was generated from docs/examples/DataSet/Working-With-Pandas-and-XArray.ipynb.

Working with Pandas and XArray

This notebook demonstrates how Pandas and XArray can be used to work with the QCoDeS DataSet. It is not meant as a general introduction to Pandas and XArray; for that, we refer to the official Pandas and XArray documentation. This notebook requires that both Pandas and XArray are installed.

Setup

First, we borrow an example from the measurement notebook so that we have some data to work with. We split the measurement into two parts so that we can try merging the results with Pandas.

[1]:

%matplotlib inline
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import qcodes as qc
from qcodes.dataset import (
    Measurement,
    initialise_or_create_database_at,
    load_or_create_experiment,
)
from qcodes.instrument_drivers.mock_instruments import (
    DummyInstrument,
    DummyInstrumentWithMeasurement,
)

[2]:

# preparatory mocking of physical setup
dac = DummyInstrument("dac", gates=["ch1", "ch2"])
dmm = DummyInstrumentWithMeasurement("dmm", setter_instr=dac)
station = qc.Station(dmm, dac)

[3]:

initialise_or_create_database_at(
    Path.cwd().parent / "example_output" / "working_with_pandas.db"
)
exp = load_or_create_experiment(
    experiment_name="working_with_pandas", sample_name="no sample"
)

[4]:

meas = Measurement(exp)
meas.register_parameter(dac.ch1)  # register the first independent parameter
meas.register_parameter(dac.ch2)  # register the second independent parameter
meas.register_parameter(
    dmm.v2, setpoints=(dac.ch1, dac.ch2)
)  # register the dependent one

[4]:

<qcodes.dataset.measurements.Measurement at 0x7f64c7e2d790>

We then perform a very basic experiment. To demonstrate merging of datasets in Pandas, we perform the measurement in two parts.

[5]:

# first half of the 2D sweep: dac.ch1 in [-1, 0), dac.ch2 in [-1, 1]

with meas.run() as datasaver:
    for v1 in np.linspace(-1, 0, 200, endpoint=False):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1), (dac.ch2, v2), (dmm.v2, val))

dataset1 = datasaver.dataset

Starting experimental run with id: 1.

[6]:

# second half of the 2D sweep: dac.ch1 in [0, 1], dac.ch2 in [-1, 1]

with meas.run() as datasaver:
    for v1 in np.linspace(0, 1, 201):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val = dmm.v2.get()
            datasaver.add_result((dac.ch1, v1), (dac.ch2, v2), (dmm.v2, val))

dataset2 = datasaver.dataset

Starting experimental run with id: 2.
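The two sweeps split the dac.ch1 range into two halves. A quick NumPy check, independent of QCoDeS, shows why endpoint=False is used in the first sweep: it keeps the value 0.0 from being measured twice, so the two halves tile [-1, 1] with a uniform step.

```python
import numpy as np

# The same setpoint arrays as in the two sweeps above.
# endpoint=False excludes 0.0 from the first half, so 0.0 is only
# measured once, as the first point of the second half.
first_half = np.linspace(-1, 0, 200, endpoint=False)   # -1.000 ... -0.005
second_half = np.linspace(0, 1, 201)                   #  0.000 ...  1.000

combined = np.concatenate([first_half, second_half])

print(len(combined))                                   # 401 distinct ch1 values
print(np.allclose(np.diff(combined), 0.005))           # True: uniform step of 0.005
print(np.unique(combined).size == combined.size)       # True: no duplicated setpoints
```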

Two methods exist for extracting data into Pandas DataFrames. to_pandas_dataframe exports all the data from the dataset into a single DataFrame, while to_pandas_dataframe_dict returns a dict that maps each measured (dependent) parameter to its own DataFrame.

Please note that to_pandas_dataframe is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, use to_pandas_dataframe_dict instead.
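The reason for the shared-setpoints restriction can be illustrated in plain Pandas. The sketch below uses two hypothetical parameters, signal_a and signal_b (made-up names, not QCoDeS parameters), measured on different setpoints. It shows the general alignment problem, not the exact internals of to_pandas_dataframe: one DataFrame per parameter loses nothing, whereas forcing both into one DataFrame aligns them on the union of setpoints and leaves gaps.

```python
import pandas as pd

# Hypothetical dependent parameters measured on DIFFERENT setpoints of "x".
frames = {
    "signal_a": pd.DataFrame(
        {"signal_a": [0.1, 0.2, 0.3]},
        index=pd.Index([0.0, 0.5, 1.0], name="x"),
    ),
    "signal_b": pd.DataFrame(
        {"signal_b": [1.0, 2.0]},
        index=pd.Index([0.25, 0.75], name="x"),
    ),
}

# One DataFrame per parameter (the dict shape): every value is kept as measured.
for name, frame in frames.items():
    print(name, len(frame))

# One combined DataFrame: rows are aligned on the union of setpoints,
# so each row is missing a value for one of the two parameters.
combined = pd.concat(list(frames.values()), axis=1)
print(combined)
print(combined.isna().sum().to_dict())
```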

[7]:

df1 = dataset1.to_pandas_dataframe()
df2 = dataset2.to_pandas_dataframe()

Working with Pandas

Let's first inspect the Pandas DataFrame. Note how both independent variables (the setpoints dac_ch1 and dac_ch2) are used for the index; Pandas refers to this as a MultiIndex. For visual clarity, we only look at the first N points of the dataset.
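To see the structure without running the measurement, here is a small synthetic stand-in for df1, built directly with pd.MultiIndex.from_product. The grid and the random values are assumptions for illustration; only the index and column names mirror the exported DataFrame.

```python
import numpy as np
import pandas as pd

# Small synthetic grid of (dac_ch1, dac_ch2) setpoints.
ch1 = np.linspace(-1, 0, 4, endpoint=False)   # -1.0, -0.75, -0.5, -0.25
ch2 = np.linspace(-1, 1, 5)                   # -1.0, -0.5, 0.0, 0.5, 1.0
index = pd.MultiIndex.from_product([ch1, ch2], names=["dac_ch1", "dac_ch2"])

# Random numbers stand in for the dmm_v2 readings.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame(
    {"dmm_v2": rng.normal(scale=1e-3, size=len(index))}, index=index
)

print(df_demo.index.names)      # the two independent (setpoint) parameters
print(list(df_demo.columns))    # only the dependent parameter is a column

# A single value is looked up by its pair of setpoints plus the column name.
print(df_demo.loc[(-1.0, 0.5), "dmm_v2"])
```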

[8]:

N = 10

[9]:

df1[:N]

[9]:

		dmm_v2
dac_ch1	dac_ch2
-1.0	-1.00	0.000132
	-0.99	0.000366
	-0.98	0.000026
	-0.97	0.000037
	-0.96	-0.000275
	-0.95	-0.000523
	-0.94	0.000227
	-0.93	0.000198
	-0.92	-0.000247
	-0.91	-0.000676

We can also reset the index to get a simpler view in which all data points are indexed by a running counter. As we shall see below, this is needed in some situations. Note that reset_index returns a new DataFrame and leaves the original one untouched.
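The effect of reset_index can be checked on a tiny synthetic frame shaped like df1 (the values below are made up): the setpoint levels become ordinary columns, the rows get a 0..n-1 counter, and the original frame keeps its MultiIndex.

```python
import numpy as np
import pandas as pd

# Tiny synthetic MultiIndex frame shaped like df1.
index = pd.MultiIndex.from_product(
    [[-1.0, -0.5], [-1.0, 0.0, 1.0]], names=["dac_ch1", "dac_ch2"]
)
df_demo = pd.DataFrame({"dmm_v2": np.arange(6, dtype=float)}, index=index)

flat = df_demo.reset_index()    # returns a NEW DataFrame

# The setpoints are now columns and the rows use a running counter.
print(list(flat.columns))       # ['dac_ch1', 'dac_ch2', 'dmm_v2']
print(list(flat.index))         # [0, 1, 2, 3, 4, 5]

# The original frame still has its two-level MultiIndex.
print(df_demo.index.nlevels)    # 2
```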

[10]:

df1.reset_index()[0:N]

[10]:

	dac_ch1	dac_ch2	dmm_v2
0	-1.0	-1.00	0.000132
1	-1.0	-0.99	0.000366
2	-1.0	-0.98	0.000026
3	-1.0	-0.97	0.000037
4	-1.0	-0.96	-0.000275
5	-1.0	-0.95	-0.000523
6	-1.0	-0.94	0.000227
7	-1.0	-0.93	0.000198
8	-1.0	-0.92	-0.000247
9	-1.0	-0.91	-0.000676

Pandas has built-in support for various forms of plotting. However, plot.scatter selects its x, y and color data by column name, and the levels of a MultiIndex are not columns, so we use reset_index to make the setpoints available for plotting.
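The following sketch, on synthetic data and assuming matplotlib is installed, shows the point above: the setpoint names are not columns of the MultiIndex frame, but after reset_index they can be passed to plot.scatter by name. The Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display needed

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.linspace(-1, 0, 4, endpoint=False), np.linspace(-1, 1, 5)],
    names=["dac_ch1", "dac_ch2"],
)
df_demo = pd.DataFrame({"dmm_v2": np.linspace(0, 1, len(index))}, index=index)

# The setpoints live in the index, not in the columns ...
print("dac_ch1" in df_demo.columns)       # False

# ... so reset_index exposes them as columns that plot.scatter can use.
flat = df_demo.reset_index()
ax = flat.plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")
print(ax.get_xlabel(), ax.get_ylabel())   # dac_ch1 dac_ch2
```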

[11]:

df1.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")

[11]:

<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>

[Figure: scatter plot of df1, dac_ch2 versus dac_ch1, colored by dmm_v2]

Similarly, for the other dataframe:

[12]:

df2.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")

[12]:

<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>

[Figure: scatter plot of df2, dac_ch2 versus dac_ch1, colored by dmm_v2]

Merging two DataFrames with the same column and index labels is straightforward with pd.concat.
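A minimal synthetic version of the merge: two frames with identical column and index names, covering adjacent dac_ch1 ranges as in the two sweeps. The helper sweep_frame is hypothetical, written only for this sketch. Because the labels match, concat simply stacks the rows, and the result has a sorted, duplicate-free MultiIndex.

```python
import numpy as np
import pandas as pd

def sweep_frame(ch1_values, ch2_values):
    """Hypothetical helper: a MultiIndex frame shaped like df1/df2 (zeros as data)."""
    index = pd.MultiIndex.from_product(
        [ch1_values, ch2_values], names=["dac_ch1", "dac_ch2"]
    )
    return pd.DataFrame({"dmm_v2": np.zeros(len(index))}, index=index)

ch2 = np.linspace(-1, 1, 5)
part1 = sweep_frame(np.linspace(-1, 0, 4, endpoint=False), ch2)  # ch1 in [-1, 0)
part2 = sweep_frame(np.linspace(0, 1, 5), ch2)                   # ch1 in [0, 1]

# Same labels on both frames, so the rows are just stacked.
merged = pd.concat([part1, part2], sort=True)

print(len(merged) == len(part1) + len(part2))   # True
print(merged.index.is_monotonic_increasing)     # True: part1 precedes part2
print(merged.index.is_unique)                   # True: no setpoint pair repeats
```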

[13]:

df = pd.concat([df1, df2], sort=True)

[14]:

df.reset_index().plot.scatter("dac_ch1", "dac_ch2", c="dmm_v2")

[14]:

<Axes: xlabel='dac_ch1', ylabel='dac_ch2'>

[Figure: scatter plot of the merged df, dac_ch2 versus dac_ch1, colored by dmm_v2]

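It is also possible to select a subset of the data from the DataFrame based on the values of the setpoints dac_ch1 and dac_ch2 (the x and y values in the plots above). With a MultiIndex, .loc takes one slice per index level, and both ends of each slice are inclusive.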
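The same kind of selection on a small synthetic frame, together with the equivalent pd.IndexSlice spelling. The grid values are exact binary fractions (an assumption of the sketch) so that the inclusive slice bounds are matched exactly. Label slicing on a MultiIndex expects a sorted index, which sort_index() guarantees.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [np.linspace(-1, 0, 4, endpoint=False), np.linspace(-1, 1, 5)],
    names=["dac_ch1", "dac_ch2"],
)
df_demo = pd.DataFrame({"dmm_v2": np.arange(len(index), dtype=float)}, index=index)

# MultiIndex label slicing expects a sorted index.
df_demo = df_demo.sort_index()

# One slice per index level; both bounds are inclusive.
subset = df_demo.loc[(slice(-1, -0.5), slice(-1, -0.5)), :]

# pd.IndexSlice expresses the same selection with slice syntax.
idx = pd.IndexSlice
subset_alt = df_demo.loc[idx[-1:-0.5, -1:-0.5], :]

print(len(subset))                 # 3 ch1 values x 2 ch2 values = 6 rows
print(subset.equals(subset_alt))   # True
```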

[15]:

df.loc[(slice(-1, -0.95), slice(-1, -0.97)), :]

[15]:

		dmm_v2
dac_ch1	dac_ch2
-1.000	-1.00	0.000132
	-0.99	0.000366
	-0.98	0.000026
	-0.97	0.000037
-0.995	-1.00	-0.000159
	-0.99	-0.000611
	-0.98	0.000634
	-0.97	0.000894
-0.990	-1.00	-0.000090
	-0.99	0.000373
	-0.98	0.000216
	-0.97	-0.000527
-0.985	-1.00	0.001068
	-0.99	-0.000001
	-0.98	-0.000250
	-0.97	0.000116
-0.980	-1.00	-0.000263
	-0.99	-0.000052
	-0.98	-0.000293
	-0.97	0.000526
-0.975	-1.00	-0.000162
	-0.99	-0.000477
	-0.98	0.000047
	-0.97	0.000383
-0.970	-1.00	0.000345
	-0.99	-0.000443
	-0.98	-0.000043
	-0.97	-0.000489
-0.965	-1.00	-0.000131
	-0.99	-0.000414
	-0.98	0.000132
	-0.97	0.000736
-0.960	-1.00	-0.000442
	-0.99	0.000091
	-0.98	-0.000141
	-0.97	0.000487
-0.955	-1.00	0.000553
	-0.99	0.000647
	-0.98	-0.000101
	-0.97	-0.000328
-0.950	-1.00	0.000143
	-0.99	0.001076
	-0.98	0.000808
	-0.97	0.000266

Working with XArray

In many cases, when working with data on rectangular grids, it may be more convenient to export the data to an XArray Dataset or DataArray. This is especially true when working in a multi-dimensional parameter space.
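Why grids suit XArray can be seen with Pandas' own DataFrame.to_xarray on a synthetic MultiIndex frame. This is a plain Pandas/XArray sketch, not the QCoDeS export (which is to_xarray_dataset, used further below): each index level becomes a named dimension with coordinates, so the flat rows turn into a 2D array.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic MultiIndex frame on a rectangular (dac_ch1, dac_ch2) grid.
index = pd.MultiIndex.from_product(
    [np.linspace(-1, 1, 3), np.linspace(-1, 1, 4)], names=["dac_ch1", "dac_ch2"]
)
df_demo = pd.DataFrame({"dmm_v2": np.arange(len(index), dtype=float)}, index=index)

# The 12 rows become a 3 x 4 array indexed by dimension name.
ds = df_demo.to_xarray()

print(isinstance(ds, xr.Dataset))   # True
print(ds["dmm_v2"].dims)            # ('dac_ch1', 'dac_ch2')
print(ds["dmm_v2"].shape)           # (3, 4)
```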

Let's set up and rerun the above measurement with an additional dependent parameter, dmm.v1.

[16]:

meas.register_parameter(
    dmm.v1, setpoints=(dac.ch1, dac.ch2)
)  # register the 2nd dependent parameter

[16]:

<qcodes.dataset.measurements.Measurement at 0x7f64c7e2d790>

[17]:

# run a 2D sweep

with meas.run() as datasaver:
    for v1 in np.linspace(-1, 1, 200):
        for v2 in np.linspace(-1, 1, 201):
            dac.ch1(v1)
            dac.ch2(v2)
            val1 = dmm.v1.get()
            val2 = dmm.v2.get()
            datasaver.add_result(
                (dac.ch1, v1), (dac.ch2, v2), (dmm.v1, val1), (dmm.v2, val2)
            )

dataset3 = datasaver.dataset

Starting experimental run with id: 3.

The QCoDeS DataSet can be converted directly to an XArray Dataset with the to_xarray_dataset method, which returns the data of all measured (dependent) parameters as a single XArray Dataset. If you are only interested in a single parameter, the to_xarray_dataarray_dict method instead returns a dictionary of XArray DataArrays, one per dependent parameter. For convenience, we will access the DataArrays directly from the XArray Dataset here.

Please note that to_xarray_dataset is only intended to be used when all dependent parameters share the same setpoints. If this is not the case for the DataSet, use to_xarray_dataarray_dict instead.
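The relationship between the two shapes can be sketched in plain XArray on a synthetic Dataset that mimics xaDataSet (the values are made up). It assumes the per-parameter dict is keyed by parameter name: splitting a Dataset gives one DataArray per dependent variable, and DataArrays that share coordinates can be recombined into one Dataset.

```python
import numpy as np
import xarray as xr

# Synthetic Dataset: two dependent variables on shared (dac_ch1, dac_ch2) coords.
coords = {"dac_ch1": np.linspace(-1, 1, 3), "dac_ch2": np.linspace(-1, 1, 4)}
ds = xr.Dataset(
    {
        "dmm_v1": (("dac_ch1", "dac_ch2"), np.zeros((3, 4))),
        "dmm_v2": (("dac_ch1", "dac_ch2"), np.ones((3, 4))),
    },
    coords=coords,
)

# One DataArray per dependent variable, keyed by name.
arrays = {name: ds[name] for name in ds.data_vars}
print(sorted(arrays))              # ['dmm_v1', 'dmm_v2']

# With shared coordinates, the DataArrays recombine into an equivalent Dataset.
rebuilt = xr.Dataset(arrays)
print(rebuilt.equals(ds))
```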

[18]:

xaDataSet = dataset3.to_xarray_dataset()

[19]:

xaDataSet

As mentioned above, it is also possible to work with an XArray DataArray directly. A DataArray holds a single dependent variable and can be obtained from the XArray Dataset by indexing it with the parameter name.

[20]:

xaDataArray = xaDataSet["dmm_v2"]  # or xaDataSet.dmm_v2

[21]:

xaDataArray

[22]:

fig, ax = plt.subplots(2, 2)
xaDataSet.dmm_v2.plot(ax=ax[0, 0])
xaDataSet.dmm_v1.plot(ax=ax[1, 1])
xaDataSet.dmm_v2.mean(dim="dac_ch1").plot(ax=ax[1, 0])
xaDataSet.dmm_v1.mean(dim="dac_ch2").plot(ax=ax[0, 1])
fig.tight_layout()

[Figure: 2x2 grid: dmm_v2 (top left), dmm_v1 averaged over dac_ch2 (top right), dmm_v2 averaged over dac_ch1 (bottom left), dmm_v1 (bottom right)]

Above we demonstrated a few ways to work with the data in a DataArray: it can be plotted directly, and so can a reduction such as its mean along one dimension. A specific row or column can be plotted in the same way.
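A small synthetic DataArray, standing in for xaDataSet.dmm_v2, makes the reduction and selection steps concrete: mean(dim=...) removes one dimension, and sel with method="nearest" picks out a single row. Either result has a .plot() method just like the full DataArray.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(12, dtype=float).reshape(3, 4),
    coords={"dac_ch1": [-1.0, 0.0, 1.0], "dac_ch2": [-1.0, -0.5, 0.5, 1.0]},
    dims=("dac_ch1", "dac_ch2"),
    name="dmm_v2",
)

# Averaging over dac_ch1 leaves a 1D profile along dac_ch2.
profile = da.mean(dim="dac_ch1")
print(profile.dims)        # ('dac_ch2',)

# A single row: the grid value of dac_ch1 nearest to 0.1 is 0.0.
row = da.sel(dac_ch1=0.1, method="nearest")
print(row.values)          # [4. 5. 6. 7.]
```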


Working with XArray on non-gridded data