Interdependent Parameters¶
Introduction¶
At the heart of a measurement lies the concept of dependent and independent variables. A physics experiment consists, at its core, of varying something and observing how something else changes in response. For the QCoDeS dataset to be a faithful representation of actual physics experiments, the dataset must preserve this notion of dependencies. In this small note, we present some thoughts on this subject and describe the current state of the dataset.
Setting the general stage¶
In the general case, an experiment looks as follows. We seek to study how \(B\) depends on \(A\). Unfortunately, we can neither set \(A\) nor measure \(B\). What we can do, however, is to vary \(n\) parameters \(x_1,x_2,\ldots,x_n\) (\(\boldsymbol{x}\) for brevity) and make the assumption that \(A=A(\boldsymbol{x})\). Similarly, we can measure \(m\) other parameters, \(y_1,y_2,\ldots,y_m\) (\(\boldsymbol{y}\) for brevity), and assume that \(B=B(\boldsymbol{y})\). It generally holds that each \(y_i\) depends on \(\boldsymbol{x}\), although many such dependencies may be trivial [1]. Given \(\boldsymbol{x}\) and \(\boldsymbol{y}\) (i.e. a laboratory), it is by no means an easy exercise to find a relation \(B(A)\) for which the above assumptions hold. That search is indeed the whole exercise of experimental physics, but as far as QCoDeS and the dataset are concerned, we must take for granted that \(A\) and \(B\) exist and satisfy the assumptions.
Good scientific practice and measurement intentions¶
In this section, we assume \(A\) and \(B\) to be scalars. We treat the general case in the next section.
In a measurement of \(B\) versus \(A\), it seems tempting to simply write down the values of \(A\) and \(B\), declare that \(A\) is the abscissa for \(B\), and make a nice plot. The principles of responsible scientific conduct, however, urge us to record everything we did, which in terms of data saving amounts to also storing \(\boldsymbol{x}\) and \(\boldsymbol{y}\). At the same time, we would like the dataset to reflect the intention of the measurement, i.e. what the measurement is supposed to be about: namely, that it measures \(B\) versus \(A\). The dataset currently handles this by declaring that \(B\) depends on \(A\), whereas \(A\) is inferred from \(\boldsymbol{x}\) and \(B\) is inferred from \(\boldsymbol{y}\). In code, we set up the measurement like
meas = Measurement()
meas.register_parameter(x1)
meas.register_parameter(x2)
meas.register_parameter(x3)  # and so on
meas.register_parameter(y1)
meas.register_parameter(y2)
meas.register_parameter(y3)  # etc
meas.register_parameter(A, inferred_from=(x1, x2, x3))
meas.register_parameter(B, depends_on=(A,),
                        inferred_from=(y1, y2, y3))
This is shown graphically in Fig. 2.
The default plotter included in the dataset will understand the dependencies and plot \(B\) versus \(A\).
Higher dimension¶
In the previous section, \(A\) was assumed to be a scalar. In the general case, the true independent variables \(\boldsymbol{x}\) can be grouped into \(k\) different variables, \(A_1,\ldots,A_k\), that represent the intention of the measurement. An example would be a heatmap plotting a demodulated signal as a function of two gate voltage axes. To describe a measurement of \(B\) as \(A_1\) and \(A_2\) are varied, we set up the measurement like
meas = Measurement()
meas.register_parameter(x1)
meas.register_parameter(x2)  # and so on
meas.register_parameter(y1)
meas.register_parameter(y2)  # etc
meas.register_parameter(A1, inferred_from=(x1, x2))
meas.register_parameter(A2, inferred_from=(x1, x2))
meas.register_parameter(B, depends_on=(A1, A2),
                        inferred_from=(y1, y2))
Graphically:
It may of course very well be that e.g. \(A_1=x_1\), in which case there is no point in having an inferred parameter for \(A_1\).
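As a concrete, hypothetical illustration of such inferred parameters: suppose \(x_1\) and \(x_2\) are two raw gate voltages and the physically meaningful axes \(A_1\) and \(A_2\) are their common-mode and difference voltages. All names and formulas below are made up for illustration; this is plain Python, not the qcodes API:

```python
# Hypothetical inference functions: the raw gate voltages x1, x2 are
# what the hardware actually sets, while the common-mode and
# difference voltages are the axes the experiment is "about".

def a1_common_mode(x1: float, x2: float) -> float:
    """A1 inferred from (x1, x2): the common-mode voltage."""
    return (x1 + x2) / 2

def a2_difference(x1: float, x2: float) -> float:
    """A2 inferred from (x1, x2): the difference voltage."""
    return x1 - x2

# Both the raw values (x1, x2) and the inferred values (A1, A2)
# would be stored in the dataset.
print(a1_common_mode(0.75, 0.25))  # 0.5
print(a2_difference(0.75, 0.25))   # 0.5
```

In this situation the raw values and the inferred values carry the same information, but only the inferred ones define the axes of the intended plot.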
Is that really necessary?¶
It should be clear that the inferred_from notion is a kind of metadata. It describes a relation between the raw values that the experimentalist can control and the desired outcome of an experiment. The dataset does not require any parameters to be inferred, but we stress that it is unscientific to throw away raw measurement data. Whatever raw values are recorded should thus be saved along with the “interesting” parameter values, and the inferred_from tagging is simply a way of declaring what is derived from where.
In a perfect world, an auxiliary laboratory notebook contains all the information needed to exactly reproduce the experiment, and the dataset need only store the numerical values of parameters and nothing else. In a pragmatic recognition of how actual laboratories usually work, however, we have decided to put some metadata directly into the dataset. Specifically, we want the dataset to be able to hold information about
1. What the experimenter wishes to study as a function of what (expressed via depends_on).
2. What corresponds to a raw machine setting/reading (expressed via inferred_from).
As the complexity of experiments grows, the second notion can be difficult to uphold. It is offered as a help towards ensuring good scientific practice.
It is important to note that the dataset can freely be used without any declarations of dependencies of either sort.
Plotting¶
Besides being optional metadata describing the correct interpretation of measurement data, the direct dependencies (expressed via depends_on) are used to generate the default plot. We estimate that for the vast majority of measurements to be stored in the dataset, the experimentalist will want to be able to plot the data as they are coming in, and also have the ability to quickly bring up a plot of a particular measurement without specifying more than the id of said measurement. This necessitates the declaration, in the dataset itself, of what should be plotted against what. The direct dependencies can thus be understood in the following way: \(A\) depends on \(B\) and \(C\) means that the default plot is of \(A\) with \(B\) on one axis and \(C\) on the other.
Although visual plotting is not tractable for an arbitrary number of axes, we elevate the idea of a default plot to a logical principle governing which dependencies we allow: only those resulting in a meaningful (perhaps \(N\)-dimensional) default plot are allowed.
All possible trees¶
Now that we have established a language for describing connections between parameters, and also described our aims in terms of plotting and metadata, let us review what the dataset does and does not allow.
It follows from the considerations of the section Plotting that the dataset allows for a single layer of direct dependencies. The trees shown in Fig. 4 are therefore all invalid and cannot be stored in the dataset.
A few words explaining why are in order.
1. Circular dependence. There is no way of telling what is varied and what is measured.
2. Independent parameters not independent. Although \(A\) clearly sits on top of the tree, the two independent variables are not independent. It is not clear whether \(C\) is being varied or measured. It is ambiguous whether this describes one plot of \(A\) with \(B\) and \(C\) as axes, or two plots, one of \(A\) versus \(B\) and another of \(C\) versus \(B\), or even both situations at once.
3. Similarly to situation 2, \(C\) is ill-defined.
4. \(B\) is ill-defined, and it is not clear what \(A\) should be plotted against.
It is perhaps instructive to see how the above trees could be remedied. In Fig. 5 we show all possible valid reconfigurations that neither invert any arrows nor leave any parameters completely decoupled [2]. The fact that each tree of Fig. 4 has several valid reconfigurations exactly illustrates the ambiguity of those trees [3].
In column c of Fig. 5 we see two somewhat new graphs. In 2c, we allow two variables to depend on a third one. There is no ambiguity here, two plots will result from this measurement: \(A\) versus \(B\) and \(C\) versus \(B\). Similarly, in 3c we’ll get \(A\) versus \(B\) and \(C\) versus \(D\). The total number of trees and plots per dataset is treated in the next section.
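The single-layer rule that underlies these examples can be condensed into a small validity check. The sketch below is plain Python (not part of QCoDeS) and represents a tree simply as a mapping from each parameter name to the names it directly depends on:

```python
# Sketch of the "single layer of direct dependencies" rule.
# deps maps each parameter name to the tuple of parameters it
# directly depends on (its would-be plot axes).

def is_valid_tree(deps: dict) -> bool:
    """Return True iff every depended-upon parameter is itself
    independent, i.e. the dependency graph has only one layer."""
    for targets in deps.values():
        for t in targets:
            if deps.get(t, ()):  # t depends on something -> invalid
                return False
    return True

# Valid: B depends on A, and two trees sharing an axis (as in 2c).
print(is_valid_tree({"B": ("A",)}))                   # True
print(is_valid_tree({"A": ("B",), "C": ("B",)}))      # True
# Invalid: circular dependence, and a dependent "independent" axis.
print(is_valid_tree({"A": ("B",), "B": ("A",)}))      # False
print(is_valid_tree({"A": ("B", "C"), "C": ("B",)}))  # False
```

The two invalid calls correspond to the circular-dependence tree and the "independent parameters not independent" tree discussed above.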
Number of trees per dataset¶
The dataset can hold an arbitrary number of “top-level” parameters, meaning parameters with arrows only going out of them; parameters on which nothing depends. At each step of the experiment, if such a top-level parameter is assigned a value, then all parameters that it points to must be assigned values as well. Otherwise, they may be omitted. What this means in practice is illustrated in Fig. 6.
We may say that this dataset de facto contains two trees, one \(A\)-\(B\)-\(D\) tree and one \(C\)-\(B\) tree [4]. One dataset can hold as many such trees as desired. In code, Fig. 6 might take the following form:
meas = Measurement()
meas.register_parameter(D)
meas.register_parameter(B)
meas.register_parameter(A, depends_on=(B, D))
meas.register_parameter(C, depends_on=(B,))

with meas.run() as datasaver:
    for b_val in b_vals:
        for d_val in d_vals:
            B.set(b_val)
            D.set(d_val)
            a_val = A.get()
            datasaver.add_result((A, a_val),
                                 (B, b_val),
                                 (D, d_val))
        c_val = C.get()
        datasaver.add_result((C, c_val),
                             (B, b_val))
A few examples¶
Finally, to offer some intuition for the dataset’s dependency structure, we cast a few real-life examples of measurements into tree diagrams.
Conductance measurement¶
In a conductance measurement, the conductance is measured as a function of gate voltage: a gate voltage, \(V_\text{gate}\), is swept while a lock-in amplifier drives the DUT at a certain frequency with a drive amplitude \(V_\text{drive}\). The drive induces a current which oscillates at the drive frequency. An I-V converter converts that oscillating current back into an oscillating voltage (with a certain gain factor, \(G_{IV}\), with units \(A/V\)), and that voltage is fed back into the lock-in. Assuming no phase shift, the lock-in amplifier's \(X\) reading is then related to the conductance, \(g\), according to
\[g = \frac{G_{IV} X}{V_\text{drive}}.\]
The corresponding parameter tree is shown in Fig. 7, where \(A\) is \(g\), \(B\) is \(V_\text{gate}\), and \(C\) is \(X\). One could of course argue that \(V_\text{drive}\) and \(G_{IV}\) should also be parameters that \(g\) is inferred from. We suggest the following rule: anything that is known beforehand to remain constant throughout the entire run can be omitted from the dataset and written down elsewhere [5]. The converse also holds: anything that does change during a run really should be saved along.
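Under the stated assumptions (no phase shift, constant \(V_\text{drive}\) and \(G_{IV}\)), the lock-in reading satisfies \(X = g V_\text{drive} / G_{IV}\), so the inference step from \(X\) to \(g\) is a one-liner. The sketch below uses made-up values and is plain Python, not qcodes code:

```python
# Inferring the conductance from the lock-in X reading:
# g = G_IV * X / V_drive (valid when there is no phase shift).
# V_drive and G_IV are assumed constant throughout the run.

V_DRIVE = 1e-3  # drive amplitude in V (made-up value)
G_IV = 1e-6     # I-V converter gain in A/V (made-up value)

def conductance(x_reading: float) -> float:
    """Infer the conductance g (in siemens) from the lock-in X (in V)."""
    return G_IV * x_reading / V_DRIVE

# For each gate-voltage setpoint one would store V_gate, the raw X,
# and the inferred g.
print(conductance(0.5))
```

Dimensionally, \((A/V \cdot V)/V = A/V\), i.e. siemens, as a conductance should be.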
Compensatory sweeping¶
An interesting example that potentially does not fit so nicely into our scheme is offered by compensatory sweeping. A voltage, \(V_1\), is swept and a quantity \(S\) is measured. Since sweeping \(V_1\) has some undesired effect on the physical system, a compensatory change of another voltage, \(V_2\), is performed at the same time. \(V_2\) changes with \(V_1\) according to
\[V_2 = \alpha V_1 + \beta.\]
Since both \(\alpha\) and \(\beta\) might change during the run via some feedback mechanism, we have four parameters apart from \(S\) to sort out.
There are two ways to go about this.
Decoupling¶
If the experimentalist really insists that the interesting plot for this measurement is that of \(S\) versus \(V_1\) and the compensation is just some unfortunate but necessary circumstance, then the unusual tree of Fig. 8 is the correct representation.
The tree of Fig. 8 does not fit into the scheme of Fig. 2, the scheme we promised would represent the most general setting. There are now two possibilities. Either we were initially wrong, and no dependencies save for those specifying the default plot can be defined for this measurement. Or the experimentalist is wrong, and has an untrue representation of the experiment in mind. We explore the latter idea below in Restructuring.
Restructuring¶
If the space spanned by \(V_1\) and \(V_2\) has a meaningful physical interpretation [6], it might make more sense to define a new parameter, \(V_3\), that represents the path swept along in that space. After all, this is what is physically happening: \(S\) is measured as a function of \(V_3\). Then the tree of Fig. 9 emerges.
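This restructuring can be sketched in plain Python (the linear compensation rule \(V_2 = \alpha V_1 + \beta\) and all values are hypothetical). When sweeping, the raw voltages are computed from the path coordinate \(V_3\); in the dataset, \(V_3\) would be tagged as inferred from the raw \(V_1\) and \(V_2\), with \(S\) depending on \(V_3\):

```python
# Hypothetical path parametrization for compensatory sweeping:
# V3 runs along the swept path in (V1, V2) space; V1 and V2 are
# the raw machine settings recovered from V3 at each step.
ALPHA, BETA = -0.5, 0.2  # made-up compensation coefficients

def v1_of(v3: float) -> float:
    """V1 is simply the path coordinate itself here."""
    return v3

def v2_of(v3: float) -> float:
    """V2 follows the (assumed linear) compensation rule."""
    return ALPHA * v3 + BETA

# One step of the sweep: pick V3, derive the raw voltages to set,
# then measure S as a function of V3.
v3 = 0.4
v1, v2 = v1_of(v3), v2_of(v3)
print(v1, v2)  # 0.4 0.0
```

The default plot is then simply \(S\) versus \(V_3\), while the raw \(V_1\) and \(V_2\) values remain in the dataset for good scientific practice.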