\documentclass[10pt]{article}
\usepackage{fullpage}
\usepackage{setspace}
\usepackage{parskip}
\usepackage{titlesec}
\usepackage[section]{placeins}
\usepackage{xcolor}
\usepackage{breakcites}
\usepackage{lineno}
\usepackage{hyphenat}
\PassOptionsToPackage{hyphens}{url}
\usepackage[colorlinks = true,
linkcolor = blue,
urlcolor = blue,
citecolor = blue,
anchorcolor = blue]{hyperref}
\usepackage{etoolbox}
\makeatletter
\patchcmd\@combinedblfloats{\box\@outputbox}{\unvbox\@outputbox}{}{%
\errmessage{\noexpand\@combinedblfloats could not be patched}%
}%
\makeatother
\usepackage[round]{natbib}
\let\cite\citep
\renewenvironment{abstract}
{{\bfseries\noindent{\abstractname}\par\nobreak}\footnotesize}
{\bigskip}
\titlespacing{\section}{0pt}{*3}{*1}
\titlespacing{\subsection}{0pt}{*2}{*0.5}
\titlespacing{\subsubsection}{0pt}{*1.5}{0pt}
\usepackage{authblk}
\usepackage{graphicx}
\usepackage[space]{grffile}
\usepackage{latexsym}
\usepackage{textcomp}
\usepackage{longtable}
\usepackage{tabulary}
\usepackage{booktabs,array,multirow}
\usepackage{amsfonts,amsmath,amssymb}
\providecommand\citet{\cite}
\providecommand\citep{\cite}
\providecommand\citealt{\cite}
% You can conditionalize code for latexml or normal latex using this.
\newif\iflatexml\latexmlfalse
\providecommand{\tightlist}{\setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}%
\AtBeginDocument{\DeclareGraphicsExtensions{.pdf,.PDF,.eps,.EPS,.png,.PNG,.tif,.TIF,.jpg,.JPG,.jpeg,.JPEG}}
\usepackage[utf8]{inputenc}
\usepackage[ngerman,english]{babel}
% Use this header.tex file for:
% 1. frontmatter/preamble LaTeX definitions
% - Example: \usepackage{xspace}
% 2. global macros available in all document blocks
% - Example: \def\example{This is an example macro.}
%
% You should ONLY add such definitions in this header.tex space,
% and treat the main article content as the body/mainmatter of your document
% Preamble-only macros such as \documentclass and \usepackage are
% NOT allowed in the main document, and definitions will be local to the current block.
\begin{document}
\title{A Short Guide to Using Python For Data Analysis In Experimental Physics}
\author[1]{Nathanael A. Fortune}%
\affil[1]{Smith College}%
\vspace{-1em}
\date{\today}
\begingroup
\let\center\flushleft
\let\endcenter\endflushleft
\maketitle
\endgroup
\selectlanguage{english}
\begin{abstract}
Common signal processing tasks in the numerical handling of experimental
data include interpolation, smoothing, and propagation of uncertainty. A
comparison of experimental results to a theoretical model further
requires curve fitting, the plotting of functions and data, and a
determination of the goodness of fit. These tasks typically require an
interactive, exploratory approach to the data, yet for the results to be
reliable, the original data needs to be freely available and the
resulting analysis readily reproducible. In this article, we provide
examples of how to use the Numerical Python (Numpy) and Scientific
Python (SciPy) packages and interactive Jupyter notebooks to accomplish
these goals for data stored in a common plain text spreadsheet format.
Sample Jupyter notebooks containing the Python code used to carry out
these tasks are included and can be used as templates for the analysis
of new data.%
\end{abstract}%
\sloppy
\par\null
\section{Introduction}
{\label{323300}}\par\null
\subsection{About this guide}
{\label{796902}}
This is a short guide to using Python to accomplish some commonly needed
tasks when working with data acquired in an experimental physics lab,
including data import/export, plotting with error bars, curve fitting of
functions to data, testing the goodness of fit of these functions to the
data (taking into account the uncertainty in the measurements), the
interpolation, smoothing, and differentiation of data, the propagation
of ``error'' in calculated quantities, and numerical calculations of
data including units and/or uncertainty. A careful discussion of how to
import needed functions from the Numerical Python (numpy) and Scientific
Python (SciPy) libraries --- what Python calls `packages' --- is also
included. The examples presented here rely heavily on functions from
these packages to simplify various analysis tasks in the examples
below.~The focus here is on using Python as a scientific and graphing
calculator for experimental data, but more experienced Python users
should still find the functions and methods presented here useful in
their~own programs.
\par\null
The goal of this guide is for you to be able to carry out each of the
analysis steps illustrated here using your own data, and to be able to
do so by making at most a few small changes (such as the names of data
files and columns of data) to the templates included with each example.
Beginners will need to know a few basic Python rules and syntax --- for
example, that~\(x^2\) is written~\texttt{x**2} and NOT
as~\texttt{x\^{}2} --- but will not need to already be experienced
programmers. For recommendations on how to get started with Python, see
below.
\par\null
\subsection{How to get started}
{\label{364433}}\par\null
\subsubsection{Python}
{\label{159380}}
If Python is new to you, we recommend
the~\href{https://github.com/jbattat/Python-Tutorials}{getting started
with Python tutorials} developed by our friends and rivals in the
Wellesley College Physics Department.
Go~\href{http://www.smithpioneers.com/}{Smith College Pioneers}! These
highly focused, highly practical tutorials are~in the form of
interactive Jupyter notebooks (see below) that allow you to try Python
out as you learn; we use them in our own courses and they form the
foundation for the methods presented here. We also recommend the
unusually lucid (and, most generously, freely provided) introductory
chapter~\href{http://www-personal.umich.edu/~mejn/cp/chapters/programming.pdf}{Python
programming for physicists} from Mark Newman's Numerical Python-based
textbook~\href{http://www-personal.umich.edu/~mejn/cp/index.html}{Computational
Physics}~\cite{mark2013}. Newman's text provides ``an introduction to
the Python language at a level suitable for readers with no previous
programming experience'' with an emphasis on the application of
computational methods to typical theoretical problems in undergraduate
physics.
\par\null
\subsubsection{Jupyter Notebooks}
{\label{875394}}
The examples provided in this guide run Python within the user-friendly
but somewhat oddly
named~\href{https://jupyter.readthedocs.io/en/latest/tryjupyter.html}{Jupyter}
notebooks mentioned above. In some ways they look and act like Mathematica\selectlanguage{ngerman}®
notebooks, including the use of `cells' (paragraphs of text, equations,
or code) and the use of the~\texttt{shift+enter} keys to execute the
code in a cell. Jupyter notebooks are a spin-off of what is known as
interactive Python (iPython), and in older examples on the web you will
still encounter references to the iPython notebook instead of Jupyter. This
is also why Jupyter is spelled with a `py' and why Jupyter notebook
filenames still end with~\texttt{.ipynb}.
\par\null
If the Jupyter notebook (formerly iPython notebook) interface is new to
you,~ take time now to try
the~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/binder/Index.ipynb}{online
interactive~ tutorial} on how to run Python inside Jupyter
notebooks.~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/binder/Index.ipynb}{Go
ahead}, we'll wait. The most important web pages in that tutorial are
the first
two:~~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/examples/Notebook/Notebook\%20Basics.ipynb}{Notebook
Basics}
and~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/examples/IPython\%20Kernel/Beyond\%20Plain\%20Python.ipynb}{iPython:
beyond plain Python}. Once you are ready to learn how to add your own
text and equations to Jupyter notebooks, see the third page
(on~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/examples/Notebook/Working\%20With\%20Markdown\%20Cells.ipynb}{Markdown
Cells}).~
\par\null
Note that Jupyter notebooks are not the only way to write and run Python
code. A bare-bones command line editor called IDLE is used by many
beginning computer science students (and in the
text~\href{http://www-personal.umich.edu/~mejn/cp/index.html}{Computational
Physics}) and~ a MATLAB® style interface
called~\href{https://pythonhosted.org/spyder/}{Spyder} (again with the
`py'!) is used by many advanced Python programmers. The Python
programs provided here will run in any of these environments. This
also means you can run Python programs encountered elsewhere within
Jupyter notebooks!
\par\null
Within the Physics Department here at Smith College, we use the Jupyter
notebook interface for two primary reasons: (1) running Python within
the notebook provides a reproducible, self-documenting method of
analyzing collected data, and (2) the ability to add explanatory notes,
tables, figures, and equations within Jupyter notebooks means we can
also use them as electronic lab notebooks for courses.
\par\null
\subsubsection{Try it out!}
{\label{890878}}\par\null
\emph{On a webserver}
\par\null
Jupyter notebooks are viewed and run using a web browser such
as~\href{https://www.mozilla.org/en-US/firefox/}{Firefox} (just like
this article in~\href{https://www.authorea.com/}{Authorea}). That means
they can also be run on a webserver that you access from a web browser;
as a result, it isn't necessary to have Python installed on your own
computer (provided you have internet access and all the Python packages
you wish to use are already installed on the webserver).
\par\null
To try out the Python programs presented
here~\href{https://www.authorea.com/users/9932/articles/11070-how-to-insert-jupyter-and-ipython-notebooks-in-authorea-articles}{using
a Jupyter webserver hosted by Authorea}, click
the~\texttt{\textless{}/\textgreater{}\ Code} button found to the left
of many of the figures in this guide. This will reveal the
~\texttt{.ipynb} Jupyter notebook (and associated data files) containing
the Python code used to generate that figure. Clicking on the notebook
file name will launch the notebook in a new tab or window within your
web browser. Clicking within a `cell' (a block of text, equations,
and/or code, outlined by a rectangular border) and hitting SHIFT-ENTER will
run any code in that cell and advance to the next one. Alternatively,
you can make edits to the code in the notebook (for example, adjust a
smoothing parameter or change the name of a file) then
select~\texttt{Run\ All} from the~\texttt{Cell} menu
or~\texttt{Restart\ \&\ Run\ All} from the~\texttt{Kernel} menu to rerun
the program.~ For additional help, see
the~\href{https://hub.mybinder.org/user/ipython-ipython-in-depth-8sxrynu1/notebooks/binder/Index.ipynb}{online
interactive~ tutorial} on how to run Python inside Jupyter notebooks.~
\par\null
Many institutions~\emph{host and configure their own Jupyter webservers}
suitable for Python programming. For
example,~\href{https://www.smith.edu/physics/}{Smith College physics
students} can upload and run any of the Jupyter notebooks included in
this guide on the webserver~\url{https://jove.smith.edu}, as this
particular webserver has all the packages used here preinstalled. (Note
to Smith students: contact the course instructor for an account.)
\par\null
\emph{On your own computer}
\par\null
The Authorea server will work for all of the examples except those
using the (aptly named) Python
packages~\href{https://pint.readthedocs.io/en/stable/}{Pint}
and~\href{https://pythonhosted.org/uncertainties/}{Uncertainties} for
numerical calculations with units and/or uncertainties, as those
packages are not currently installed on that server (but maybe
someday?). That said, you will want to write, run, and store your own
data and programs on your own system.~ If you don't have access to a
local Jupyter webserver with the packages you need, you will want to
install and run Python and Jupyter on your own computer. The good news
is that everything needed is available for free (except the computer)!
See~Section~{\ref{662059}}:~Installing Python for
instructions on how to do this.~
\section{Importing and exporting
data}
{\label{225119}}\par\null
There are many ways to import, export, and represent data in files. Here
we provide just enough to get you started but it might very well be all
you need. In these examples, we assume you have first entered the data
into a spreadsheet program, then exported that data in `CSV'
(comma-separated values) file format. We use the CSV file format here because
it is ubiquitous: all spreadsheets have the ability to export and import
data in the CSV file format. Data in other common plain text file
formats (such as tab delimited) can also be imported by making a few
small modifications to the examples provided below.~
\par\null\par\null
\subsection{CSV spreadsheet file
format}
{\label{313963}}\par\null
Let's look at a particular CSV data file
titled~\texttt{Calibration\_650nm.csv}. The file consists of a single
header row of text, which we need to skip over when loading the numerical
data, followed by three columns of data, one for each measured variable.
\par\null
Here's what the data looks like in spreadsheet form (grid lines not
shown):
\par\null\selectlanguage{english}
\begin{table}[h!]
\centering
\normalsize\begin{tabulary}{1.0\textwidth}{CCC}
angle & V\_pd & V\_pd\_error \\
0.0e+00 & 3.24e+01 & 2.49e-02 \\
1.5e+01 & 3.01e+01 & 1.54e-01 \\
3.0e+01 & 2.4e+01 & 2.53e-01 \\
4.5e+01 & 1.59e+01 & 2.85e-01 \\
6.0e+01 & 7.78e+00 & 2.41e-01 \\
7.5e+01 & 1.96e+00 & 1.33e-01 \\
9.0e+01 & 3.57e-02 & 1.64e-02 \\
1.05e+02 & 2.58e+00 & 1.53e-01 \\
1.2e+02 & 8.95e+00 & 2.52e-01 \\
1.35e+02 & 1.74e+01 & 2.85e-01 \\
1.5e+02 & 2.56e+01 & 2.41e-01 \\
1.65e+02 & 3.13e+01 & 1.34e-01 \\
1.8e+02 & 3.3e+01 & 2.49e-02 \\
1.95e+02 & 3.03e+01 & 1.54e-01 \\
2.1e+02 & 2.39e+01 & 2.53e-01 \\
2.25e+02 & 1.56e+01 & 2.85e-01 \\
2.4e+02 & 7.55e+00 & 2.41e-01 \\
2.55e+02 & 1.88e+00 & 1.33e-01 \\
2.7e+02 & 3.38e-02 & 1.64e-02 \\
2.85e+02 & 2.46e+00 & 1.53e-01 \\
3.0e+02 & 8.51e+00 & 2.52e-01 \\
3.15e+02 & 1.66e+01 & 2.85e-01 \\
3.3e+02 & 2.47e+01 & 2.41e-01 \\
3.45e+02 & 3.05e+01 & 1.34e-01 \\
3.6e+02 & 3.25e+01 & 2.49e-02 \\
\end{tabulary}
\caption{{Contents of the data file \texttt{Calibration\_650nm.csv}: relative polarizer angle (in degrees), measured photodiode voltage (in volts), and the uncertainty in that voltage (in volts).
{\label{296817}}%
}}
\end{table}
Here's what the first few lines of the same CSV spreadsheet file look
like when opened in a text editor (notice the commas separating the
values within each row):
\par\null
\begin{quote}
angle, V\_pd, V\_pd\_error
0.000000000000000000e+00,3.236249999999999716e+01,2.492398249033692462e-02
1.500000000000000000e+01,3.008079999999999998e+01,1.536232648449061544e-01
3.000000000000000000e+01,2.402850000000000108e+01,2.527732747539992997e-01
\end{quote}
\par\null
\subsection{loading data from a CSV file}
{\label{993016}}\par\null
We are now going to use the Numerical Python (numpy)
command~\texttt{loadtxt} to load data from this CSV text file. Each
column of data will become a Numerical Python (\texttt{numpy}) array.
\par\null
The process consists of two steps:
\par\null
\begin{enumerate}
\tightlist
\item
importing the \texttt{loadtxt} command from Numerical Python
\item
using the~\texttt{loadtxt} command to transfer the data from the file
\end{enumerate}
\par\null
\subsubsection{importing commands}
{\label{946997}}\par\null
\texttt{loadtxt} is not part of the core Python language but is instead
part of the Numerical Python package (\texttt{numpy}). We must
therefore ``import'' it into our program before using it. There are two
common methods of doing this.
\par\null
In the first method, each individual function or module (such as
\texttt{loadtxt} from \texttt{numpy}) is imported as needed. We call
this the~\textbf{direct numpy import style}.
\begin{verbatim}
# example: direct import method
from numpy import loadtxt # import just the numpy command loadtxt
#from numpy import * # an alternative method that imports all numpy commands. Be careful!
\end{verbatim}
This method is used, for example, in the excellent Python-based
textbook~\href{http://www-personal.umich.edu/~mejn/cp/index.html}{Computational
Physics} by Mark Newman~\cite{mark2013}.~
\par\null
Here is another example:
\par\null
\begin{verbatim}
#example: direct import method
from numpy import sin, cos, array, pi # import a few needed functions from the numpy package
angle_in_degrees = array([0, 30, 60, 90]) # create an array with elements corresponding to 0, 30, 60, and 90 degrees
angle_in_radians = angle_in_degrees * pi / 180 # convert to radians
x = cos(angle_in_radians) # calculate cosine for each element in array, assign values to an array called 'x'
y = sin(angle_in_radians) # calculate sine for each element in array, assign values to an array called 'y'
print(y)
\end{verbatim}
with result
\begin{verbatim}
[0. 0.5 0.8660254 1.]
\end{verbatim}
\par\null\par\null\par\null
In the second method, we first import the entire~\texttt{numpy} package
(and, optionally, provide an abbreviation for~\texttt{numpy}, such
as~\texttt{np}). We then add~\texttt{numpy.} (or the
abbreviation~\texttt{np.}) as a prefix when using a function from the
\texttt{numpy} package. We call this the~\textbf{traditional import
style}.
\begin{verbatim}
# example: traditional import method
import numpy as np # if you include 'as np', 'numpy' is replaced with the abbreviation 'np'
\end{verbatim}
We call this the~\textbf{traditional} method because this is what you
will find in the examples included in
the~\href{https://docs.scipy.org/doc/numpy/}{official numpy user
guide~}and~\href{https://docs.scipy.org/doc/numpy/user/quickstart.html}{quickstart
tutorials}.~
\par\null
The key to using the traditional method is to be sure to specify that
you are using a function~ or module from the numpy package by preceding
it either with~\texttt{numpy} or the abbreviation of your choice (such
as~\texttt{np}).~ This is why the~\texttt{loadtxt} commands used in the
examples below are written~\texttt{np.loadtxt} instead
of~\texttt{loadtxt}.~~
\par\null
Here is another example:
\par\null
\begin{verbatim}
#example: traditional import method
import numpy as np # you only need to type this once in each program
angle_in_degrees = np.array([0, 30, 60, 90]) # create an array with elements corresponding to 0, 30, 60, and 90 degrees
angle_in_radians = angle_in_degrees * np.pi / 180 # convert to radians
x = np.cos(angle_in_radians) # calculate cosine for each element in array, assign values to an array called 'x'
y = np.sin(angle_in_radians) # calculate sine for each element in array, assign values to an array called 'y'
print(y)
\end{verbatim}
with the same result as before:
\par\null
\begin{verbatim}
[0. 0.5 0.8660254 1.]
\end{verbatim}
On the one hand, the direct import method has the advantage of being lean,
clean, and easy to read. Computer scientists love it. On the other hand,
it does not work well if we need to use multiple packages containing
identically named functions. For example, the math
identically named functions.~ For example, the math
package~\texttt{math}, the complex mathematics package~\texttt{cmath},
the numerical python package~\texttt{numpy}, and the numerical
calculation of uncertainties package~\texttt{uncertainties} all have
trigonometric functions called~\texttt{sin} and~\texttt{cos}.~ To keep
clear which function we are using from which package, and when, we will
usually use the~ traditional import style for packages.
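As a quick illustration of why the prefix matters (a sketch of ours, not from the guide's notebooks): \texttt{math.sin} and \texttt{numpy.sin} are different functions, and only the numpy version operates element-by-element on arrays.

```python
# A sketch (not from the guide's notebooks) of why the package prefix matters:
# math.sin and np.sin are distinct functions with different behavior.
import math
import numpy as np

angles = np.array([0, np.pi / 2])  # two angles, in radians

print(np.sin(angles))        # numpy's sin acts element-by-element: [0. 1.]
print(math.sin(np.pi / 2))   # math's sin accepts only a single number: 1.0
# math.sin(angles) would raise a TypeError, since it cannot handle an array
```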
\par\null
\subsubsection{importing the data file}
{\label{172560}}
We now show how to use~\texttt{loadtxt} to import the CSV format
spreadsheet file~\texttt{Calibration\_650nm.csv}. Unless specified
otherwise, the data file is assumed to be in the same folder as the
Python program.
\par\null
\emph{The loadtxt command:}
\par\null
\begin{verbatim}
# example: traditional import method
import numpy as np # only need this once per program
file_name = 'Calibration_650nm.csv' # replace with the name of your csv data file
file_folder = '' # use this if your data file is in the same folder as your python program
#file_folder = '/Users/nfortune/data' # use this if data file is in a folder called 'data'
# inside the folder 'nfortune' within the 'Users' directory
# such as when using the Jupyter webserver jove.smith.edu
# this is called 'absolute addressing'
#file_folder = 'data_subfolder/' # you can use this if data file is in a _subfolder_ called 'data_subfolder'
# this is called 'relative addressing'
data_file = file_folder + file_name
angle, V_pd, V_pd_error = np.loadtxt(data_file, delimiter = ',', skiprows = 1, usecols = (0, 1, 2), unpack = True)
\end{verbatim}
\emph{What these commands do}:
\par\null
\texttt{data\_file\ =\ file\_folder\ +\ file\_name} tells Python what
the name of the file is and where to find it! Tip: if you have trouble
determining how to specify the \texttt{file\_folder} location for your
data, an easy workaround is to first put the data in the same folder as
your Python program, then either (a)
set~\texttt{file\_folder\ =\ \textquotesingle{}\textquotesingle{}} as
in the example above or (b)
replace~\texttt{np.loadtxt(file\_folder\ +\ file\_name,\ ...)}
with~\texttt{np.loadtxt(file\_name,\ ...)}.
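If the file cannot be found, \texttt{loadtxt} raises an error; a simple precaution (our suggestion, not part of the original example) is to check that the file exists first using the standard \texttt{os} module.

```python
# A hypothetical pre-check (not part of the original example) that the data
# file can actually be found before calling np.loadtxt.
import os

file_name = 'Calibration_650nm.csv'   # replace with the name of your data file
file_folder = ''                      # '' means: same folder as the program

# os.path.join builds the full path; with an empty folder we just use the name
data_file = os.path.join(file_folder, file_name) if file_folder else file_name

if os.path.exists(data_file):
    print('found', data_file)
else:
    print('cannot find', data_file, '- check file_folder and file_name')
```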
\par\null
\texttt{delimiter\ =\ \textquotesingle{},\textquotesingle{}\ }tells
\texttt{loadtxt} that your data is in comma-separated values (CSV)
format. The character used to separate one column of data from another
is called the `delimiter.' See
the~\href{https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html?highlight=loadtxt}{numpy.loadtxt}
manual page for details.~
\par\null
\texttt{skiprows\ =\ 1} tells~\texttt{loadtxt}~\emph{to skip over one
row} of text in the CSV file before looking for data to load into
arrays. This is because the first row of text contains names for each
column of data instead of data values (as shown in
Table~{\ref{296817}}).~
\par\null
\texttt{unpack\ =\ True} tells~\texttt{loadtxt} that the data is in a
`packed' data format (in which each~\emph{variable} corresponds to a
different~\emph{column} instead of to a different~\emph{row}) and
therefore needs to be `unpacked' when loaded. This is the typical
arrangement for data in spreadsheets. Use \texttt{unpack\ =\ False} if
the data is in an `unpacked' data format (in which
each~\emph{variable} corresponds to a different \emph{row} instead of a
different \emph{column}).
\par\null
\texttt{usecols\ =\ (0,\ 1,\ 2)} says the data you are looking for is in
the first 3 columns, which are numbered 0, 1, and 2 (because Python
always starts from zero when counting).~ In our case, since there are
only 3 columns of data and we want to use all three, this command is
unnecessary. You could leave it out and everything would work just fine
for this particular data file. If, however, you wanted to load data
from a file with many columns but only needed data from column number
0, column number 3, and column number 4, you would need to include
\texttt{usecols\ =\ (0,\ 3,\ 4)} within the~\texttt{loadtxt} command.
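To see \texttt{usecols} in action, here is a sketch using a small made-up five-column file (not the guide's data), keeping only columns 0, 3, and 4:

```python
# A sketch using a throwaway five-column CSV file (made-up numbers, not the
# guide's data) to show usecols selecting columns 0, 3, and 4.
import numpy as np

with open('wide_example.csv', 'w') as f:
    f.write('a,b,c,d,e\n')        # one header row to skip
    f.write('0,10,20,30,40\n')
    f.write('1,11,21,31,41\n')

x, d, e = np.loadtxt('wide_example.csv', delimiter=',', skiprows=1,
                     usecols=(0, 3, 4), unpack=True)
print(x)   # [0. 1.]
print(d)   # [30. 31.]
print(e)   # [40. 41.]
```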
\par\null
\texttt{angle,\ V\_pd,\ V\_pd\_error\ =\ np.loadtxt(...)} tells
Numerical Python to create an array called~\texttt{angle} and fill it
with values from the first column of data in the CSV spreadsheet file,
then create an array called~\texttt{V\_pd} and fill it with values from
the second column of data, and finally create an array called
\texttt{V\_pd\_error} and fill it with values from the third column of
data. The result is three shiny new numpy data arrays we can use in our
calculations.
\par\null
What if your data is in a different text file format (such as tab
delimited)? In that case you can still use~\texttt{loadtxt} to import
your data as long as you modify the~\texttt{delimiter} argument to match
the file format. See
the~\href{https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html?highlight=loadtxt}{numpy.loadtxt}
manual page for details.~
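For instance, loading a tab-delimited file (a made-up example, not the guide's data) only requires changing the \texttt{delimiter} argument:

```python
# A sketch of loading tab-delimited data; the only change from the CSV
# examples above is delimiter = '\t'. The file here is a throwaway example.
import numpy as np

with open('example_tab.txt', 'w') as f:
    f.write('angle\tV_pd\n')      # one header row to skip
    f.write('0.0\t3.24\n')
    f.write('15.0\t3.01\n')

angle, V_pd = np.loadtxt('example_tab.txt', delimiter='\t',
                         skiprows=1, unpack=True)
print(angle)   # [ 0. 15.]
print(V_pd)    # [3.24 3.01]
```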
\par\null
\subsection{saving data to a CSV file}
{\label{883907}}
~
For completeness, we now provide an example of how to use
the~\texttt{savetxt} command from \texttt{numpy}
to~\emph{save} data in CSV format. There are many ways to save
files in spreadsheet format, but this is the simplest method we've found
so far using \texttt{numpy}.
\par\null
Direct numpy import style:~
\begin{verbatim}
from numpy import savetxt, array #assumes you haven't already imported these commands
output_filename = 'output.csv' #provide a name for the new file
header_row_text = 'angle, V_pd, V_pd_error' #make first row of file be a list of column names. Optional.
comment_text = '' #do not start header row with a '#'. Optional.
#comment_text = '#' #start the header row with a '#' . Default setting.
data = array([angle, V_pd, V_pd_error]).T #create a 2D matrix and transpose rows and columns (clever trick)
savetxt(output_filename, data, delimiter = ',', header = header_row_text, comments = comment_text)
\end{verbatim}
Traditional numpy import style:
\begin{verbatim}
import numpy as np #don't import numpy again if already done once before
output_filename = 'output.csv' #provide a name for the new file
header_row_text = 'angle, V_pd, V_pd_error' #make first row of file be a list of column names. Optional.
comment_text = '' #do not start header row with a '#'. Optional.
#comment_text = '#' #start the header row with a '#' . Default setting.
data = np.array([angle, V_pd, V_pd_error]).T #create a 2D matrix and transpose rows and columns (clever trick)
np.savetxt(output_filename, data, delimiter = ',', header = header_row_text, comments = comment_text)
\end{verbatim}
Why do we need the line
\texttt{data\ =\ array({[}angle,\ V\_pd,\ V\_pd\_error{]}).T}? We need
it because ordinarily~\texttt{savetxt} would save the data in what
Python calls `unpacked' format, a format in which each~\emph{variable}
corresponds to a different~\emph{row} instead of to a
different~\emph{column}. This is often convenient but is~\emph{not}
what we wanted in this particular case. We therefore did the
following~\textbf{clever trick} before saving the data to a file: we
created a 2D matrix of our data with the numpy
command~\texttt{array({[}angle,\ V\_pd,\ V\_pd\_error{]})}, then used
the~\texttt{.T} command to transpose the matrix, thereby flipping the
rows and columns.
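The effect of \texttt{.T} is easy to see from the array shapes (a minimal sketch with made-up numbers):

```python
# A minimal sketch of the transpose trick: stacking three 1D arrays of N
# points gives a (3, N) 'unpacked' matrix; .T flips it to the (N, 3)
# one-column-per-variable layout that savetxt then writes out.
import numpy as np

angle = np.array([0.0, 15.0, 30.0, 45.0])          # made-up values, N = 4
V_pd = np.array([32.4, 30.1, 24.0, 15.9])
V_pd_error = np.array([0.025, 0.154, 0.253, 0.285])

unpacked = np.array([angle, V_pd, V_pd_error])
print(unpacked.shape)   # (3, 4): one row per variable
packed = unpacked.T
print(packed.shape)     # (4, 3): one column per variable
print(packed[0])        # first row holds angle[0], V_pd[0], V_pd_error[0]
```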
\par\null
\subsection{Other file handling
methods}
{\label{289630}}
For more advanced handling of spreadsheet data files, large data
sets, and/or binary data, you may wish to try the
commands provided by the very popular Python Data Analysis Library
package~\href{http://pandas.pydata.org/}{pandas}, or the
Hierarchical Data Format (HDF5) for big data sets via the Python interface
package~\href{http://www.h5py.org/}{h5py} (instead of those provided
by~\href{https://docs.scipy.org/doc/numpy/reference/routines.io.html}{numpy}).
\par\null
\section{Plotting data using error
bars}
{\label{344573}}
The most commonly used plotting package in Python
is~\href{https://matplotlib.org/index.html}{Matplotlib}. Here's an
example of how to use it to generate plots with error bars representing
the uncertainty in each data point. We use~\texttt{angle} from the
file~\texttt{Calibration\_650nm.csv} for the x-axis values, we
use~\texttt{V\_pd} for the y-axis values, and we
use~\texttt{V\_pd\_error} for the uncertainty in the y-axis values.
\par\null
\begin{verbatim}
%matplotlib inline
import matplotlib as mpl
from matplotlib import pyplot as plt #this is the traditional method
mpl.rc('xtick', labelsize = 18) #use 18 point font for numbering on x axis
mpl.rc('ytick', labelsize = 18) #use 18 point font for numbering on y axis
plt.figure(figsize = (7,5)) #specify figure size as 7 x 5 inches
#for default size, type plt.figure()
plt.xlabel(r"$\theta$ [degrees]", fontsize = 18) #label axis (using LaTeX commands)
plt.ylabel(r"$V_{pd}$ [volts]", fontsize = 18) #use 18 point font for label text
plt.errorbar(angle, V_pd,
xerr=None, yerr=V_pd_error,
linestyle = 'none',
color = 'blue',
capsize = 3, capthick = 1)
plt.show()
\end{verbatim}
The result is shown in Fig.{\ref{525200}} below.~
\par\null
What these commands do:
\par\null
\begin{quote}
\texttt{\%matplotlib\ inline} is an interactive Python~ (iPython)
'\href{http://ipython.readthedocs.io/en/stable/interactive/magics.html}{magic
command}' used when running Python within a Jupyter notebook. It allows
the display of data plots within the notebook.~
\par\null
Tip:~\href{https://www.authorea.com/users/18589/articles/307487-matplotlib-pyplot-test}{When
running Jupyter notebooks within Authorea}, the~\texttt{\%matplotlib\ inline}
command must precede
the~\texttt{from\ matplotlib\ import\ pyplot} command.
\par\null
\texttt{plt.figure()} signifies the beginning of the plotting
instructions specific to that figure.~~
\par\null
\texttt{plt.errorbar(angle,\ V\_pd,\ xerr\ =\ None,\ yerr\ =\ V\_pd\_error)}
is the command that instructs matplotlib to generate an x-y plot with
error bars (as opposed to a bar graph or scatter plot, for example).
The \texttt{x} and \texttt{y} parameters are required;
\texttt{xerr} and \texttt{yerr} are optional and default to \texttt{None}.
\par\null
\texttt{linestyle} and~\texttt{color} are used to customize the
appearance of the data
points.~\texttt{linestyle\ =\ \textquotesingle{}none\textquotesingle{}}
means there are no connecting lines between points. \texttt{color} sets
the color of the error bar lines and caps. The standard colors are blue,
green, red, cyan, magenta, yellow, black, and white (with corresponding
abbreviations `b', `g', `r', `c', `m', `y', `k', and `w').
\par\null
\texttt{capsize} and~\texttt{capthick} are used to customize the
appearance of the error bars.~\texttt{capsize\ =\ 3} sets the length of
the caps at the ends of the error bars to `3' (in typesetter points),
and~\texttt{capthick\ =\ 1} sets the thickness of the drawn caps to `1'
(again in typesetter points). Note that if the~\texttt{capsize}
and~\texttt{capthick} arguments are omitted, matplotlib will draw lines
indicating the given uncertainty but will omit the caps!
\par\null
\texttt{plt.show()} signifies the end of the plotting instructions and
causes matplotlib to plot the data.~
\end{quote}
\par\null
For additional examples and information about other commands and options
(including~\href{https://matplotlib.org/api/_as_gen/matplotlib.pyplot.grid.html\#matplotlib.pyplot.grid}{grid}
lines,~\href{https://matplotlib.org/api/_as_gen/matplotlib.pyplot.xticks.html\#matplotlib.pyplot.xticks}{tick}
marks,~~\href{https://matplotlib.org/api/_as_gen/matplotlib.pyplot.semilogx.html\#matplotlib.pyplot.semilogx}{semilog}
plots,
and~\href{https://matplotlib.org/api/_as_gen/matplotlib.pyplot.loglog.html\#matplotlib.pyplot.loglog}{log-log}
plots), please click on the links supplied here or search
the~\href{https://matplotlib.org/api/pyplot_summary.html\#the-pyplot-api}{online
matplotlib.pyplot documentation}.~
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Calibration-Plot-with-error-bars/Calibration-Plot-650nm-corrected}
\caption{{Photodiode voltage as a function of relative polarizer angle for light
at 650 nm passing through two polarizers.~\(V_{pd}\) corresponds
to the voltage measured across a resistor placed in series with a
photodiode and is linearly proportional to light intensity under these
conditions. A change in polarization angle of the light (due to the
Faraday effect, for example) can be detected as a change in photodiode
voltage but the size of the change in voltage for a given change in
polarization angle depends on the relative orientation of the two
polarizers.~
{\label{525200}}%
}}
\end{center}
\end{figure}
\section{Curve fitting}
{\label{822768}}\par\null
\subsection{mathematical modeling}
{\label{219511}}\par\null
~Curve fitting requires a mathematical model, initial estimates for the
adjustable parameters in the model, and estimates of the uncertainty in
the values of each data point.~ Here, we construct a mathematical model
based on
"\href{http://hyperphysics.phy-astr.gsu.edu/hbase/phyopt/polcross.html\#c3}{Malus's
Law}" to describe light passing through two linear polarizers, as
measured by a powered photodiode:~
\begin{equation}
\label{eq:Malus}
I(\theta) = I_0 \cos^2(\theta - {\theta}_0)+ C = \frac{1}{2}I_0\left[1 + \cos(2 (\theta - \theta_0))\right]+ C
\end{equation}
where ~\(\theta_0\) is the (fixed) angle of the first (reference)
linear polarizer,~\(\theta\) is the (variable) angle of the
second linear polarizer,~\(\theta\ -\theta_0\) is the relative angle
between them, and~~\(I_0\) is the maximum intensity of the
light passing through both polarizers (which occurs
when~\(\theta=\theta_0\)).~ The offset \emph{C} represents any `dc'
offset in the signal from the photo-detector (resulting in a nonzero
signal in the absence of light), any additional light that reaches the
photo-detector without passing through the polarizers, and, if the
polarizers are not ideal, the fraction of light passing through both
polarizers without being polarized by them.~
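As a quick numerical check, the two forms of Eq.~{\ref{eq:Malus}} (the \(\cos^2\) form and the double-angle form) should agree at every angle. Here is a minimal sketch, using arbitrary illustrative values for \(I_0\), \(\theta_0\), and \(C\):

```python
import numpy as np

# arbitrary illustrative parameter values
I_0, theta_0, C = 1.0, 0.1, 0.05
theta = np.linspace(0, 2 * np.pi, 100)

form_1 = I_0 * np.cos(theta - theta_0)**2 + C                 # cos^2 form
form_2 = 0.5 * I_0 * (1 + np.cos(2 * (theta - theta_0))) + C  # double-angle form
print(np.allclose(form_1, form_2))  # → True
```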
\par\null
Rewriting in terms of the measured output photodiode
voltage~\(V_{pd}\) , the relative polarizer
angle~~\(\phi=\theta-\theta_0\) , and assuming a linear response between the
photo-detector output voltage and light intensity, we have
\par\null
\begin{equation}
\label{eq:Photodiode_Voltage}
V_{pd}(\phi) = V_{0} \cos^2(\phi)+ V_{1} = \frac{1}{2}V_0\left[1 + \cos(2 \phi)\right]+ V_1
\end{equation}
If, in addition, the
variables~\(V_0\),~\(V_1\),
and~\(\phi=\theta-\theta_0\) are independent of each
other (which will be the case for ideal polarizers), then to first
order, the uncertainty~\(\delta V_{pd}\) is given by~\cite{Hughes2010}
\par\null\par\null
\begin{equation}
\label{eq:deltaV}
\left(\delta V_{pd}\right) = \sqrt{\left( \frac{\partial{V_{pd}}}{\partial{V_0}}\right)^2 (\delta V_0)^2 + \left( \frac{\partial{V_{pd}}}{\partial{\phi}}\right)^2 (\delta \phi)^2 + \left( \frac{\partial{V_{pd}}}{\partial{V_1}}\right)^2 (\delta V_1)^2}
\end{equation}
\par\null
where~\(\delta\phi=\sqrt{\left(\delta\theta\right)^2+\left(\delta\theta_0\right)^2}\) and which we will approximate
as~\(\delta\theta\) as we are holding \(\theta_0\) constant.
\par\null
Evaluating the partial derivatives in Eq.~
{\ref{eq:deltaV}},
\begin{equation}
\label{eq:delta_V_Malus}
\delta V_{pd} = {V_0} \sqrt{ {\left(\cos{\phi}\right)}^4\left(\frac{\delta V_0}{V_0}\right)^2 + \left(-2 \cos{\phi}\sin{\phi} \right)^2(\delta \phi)^2+ \left(\frac{\delta V_1}{V_0}\right)^2}
\end{equation}
\par\null
The corresponding Python functions are
\begin{verbatim}
# import numpy as np
def polarization_model(phi_array, V_0, phi_0, V_1):
    return V_0 * (1 + np.cos(2 * (phi_array - phi_0))) / 2 + V_1
\end{verbatim}
and
\begin{verbatim}
# import numpy as np
def photodiode_error(phi_array, delta_V_0, delta_phi, delta_V_1, V_0, phi_0):
    V_0_error = (delta_V_0 / V_0) * (np.cos(phi_array - phi_0))**2
    phi_error = delta_phi * (2 * np.cos(phi_array - phi_0) * np.sin(phi_array - phi_0))
    V_1_error = delta_V_1 / V_0
    fractional_error = np.sqrt(V_0_error**2 + phi_error**2 + V_1_error**2)
    return fractional_error * V_0
\end{verbatim}
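As a quick sanity check of this error function, note that at \(\phi=0\) the angular term in Eq.~{\ref{eq:delta_V_Malus}} vanishes, so the result should reduce to \(\sqrt{(\delta V_0)^2 + (\delta V_1)^2}\). Here is a minimal sketch (the function definition repeats the one above so the snippet is self-contained; the numerical uncertainty values are illustrative):

```python
import numpy as np

def photodiode_error(phi_array, delta_V_0, delta_phi, delta_V_1, V_0, phi_0):
    # propagate uncertainties following Eq. (4)
    t = phi_array - phi_0
    V_0_error = (delta_V_0 / V_0) * np.cos(t)**2
    phi_error = delta_phi * 2 * np.cos(t) * np.sin(t)
    V_1_error = delta_V_1 / V_0
    return V_0 * np.sqrt(V_0_error**2 + phi_error**2 + V_1_error**2)

# at phi = phi_0 the angular term vanishes:
dV = photodiode_error(np.array([0.0]), 0.020, 0.5 * np.pi / 180, 0.014, 30.0, 0.0)
print(np.isclose(dV[0], np.hypot(0.020, 0.014)))  # → True
```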
\par\null
We can determine~\(\delta V_1\) and~\(\delta V_0\)
experimentally by measuring the statistical spread in the minimum and
maximum values of~\(V_{pd}\) (when~\(\phi=\ \frac{\pi}{2}\) radians
and~~\(\phi=\ 0\), respectively). The polarizer angle
\(\theta\) is mechanically set, so the
uncertainty~\(\delta\phi\) depends on the accuracy of a nominally
\(15^{\circ}\) step change in angle \(\theta\) with the apparatus at hand.~
\par\null
Now we encounter an apparent conundrum: we want to use the curve fit to
determine a best estimate for~\(V_0\) but solving
for~\(\delta V_{pd}\) requires us to already have a numerical value
for~\(V_0\). How do we determine the numerical values for
~\(\delta V_{pd}\left(\theta\right)\) needed to find a best estimate
of~\(V_0\) if we don't already know~\(V_0\)?
This, however, is not the impasse that it might seem. The curve
fitting algorithm already requires that we supply an initial guess for
the parameters. In this situation,~ then, we can carry out the curve
fitting algorithm a second or third time instead of just once, each time
using the best fit values output by the algorithm in the previous fit as
our new initial values for the new fit. Once the output values match the
input values (within uncertainty), we stop. When the output values match
the input values, we say the results are~\emph{self-consistent.}
\par\null
\textbf{Here is the procedure for finding self-consistent `best fit'
values from curve-fitting:}
\begin{enumerate}
\tightlist
\item
Make a rough initial guess for the
parameters~\(V_0\),~\(V_1\),
and~\(\theta_0\) from a graph of the data.
\item
  Use the values of~\(V_0\),~\(V_1\),
  and~\(\theta_0\) output by the curve-fitting routine as a new
  `initial guess'.
\item
Repeat the curve fit~ (using each output as a new input)
until~\(V_0\) stops changing.~
\end{enumerate}
Note: if you know how to program in Python, this is a great place to
simplify your life by introducing a~\textbf{while loop} that repeats
the curve fit until the results become self-consistent.~
\par\null
\subsection{fitting the model to data}
{\label{205170}}
\subsubsection{calculating best fit
values}
{\label{305318}}\par\null
We now turn to the actual Python code for non-linear curve fitting.
Notice that~ this is a~ ``weighted'' fit, in that the stated uncertainty
of each data point is taken into account during the fit. Practically
speaking, this means the curve-fitting routine tries harder to match the
model to the data at points with a smaller uncertainty (although it may
not succeed) because those points are given greater importance
('weight'). This is as it should be, and is also needed to calculate a
numerically accurate chi-square value for a determination of the
``goodness of fit.''
\par\null
Here we assume that values have already been experimentally determined
for uncertainties in~~\(V_0\),~\(V_1\),
and~\(\theta\). We will therefore leave these unchanged
throughout the curve-fitting process.~
\begin{verbatim}
# measured uncertainties
delta_V0 = 0.020 # mV, after averaging
delta_V1 = 0.014 # mV, after averaging
delta_theta = 0.5 * np.pi / 180 # 0.5 degrees, in radians
\end{verbatim}
Now let's take care of the initial setup:
\begin{verbatim}
# initial guesses for the polarization model
V0 = 30.0 # initial guess, in mV
V1 = 0.02 # initial guess, in mV
theta = angle * np.pi / 180 # convert from degrees to radians
theta0 = -2.0 * np.pi / 180 # initial guess for offset angle of -2 degrees, in radians
initial_guess = np.array([V0, theta0, V1])
initial_error = np.array([delta_V0, delta_theta, delta_V1])
old_fit = np.copy(initial_guess) # save a copy to compare new with old
estimated_error = photodiode_error(theta, delta_V0, delta_theta, delta_V1,
                                   V0, theta0) # propagate uncertainty using initial values
\end{verbatim}
Finally let's run
the~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html\#scipy.optimize.curve_fit}{curve
fit} command:~
\begin{verbatim}
# load curve_fit routine from scipy
from scipy.optimize import curve_fit # import method used here
# alternative method (as recommended in https://docs.scipy.org/doc/scipy/reference.api.html)
#from scipy import optimize
#fit, covariance = optimize.curve_fit(...)
#run curve_fit for polarization_model
fit, covariance = curve_fit(polarization_model, theta, V_pd,
                            p0 = initial_guess,
                            sigma = estimated_error, absolute_sigma = True)
error = np.sqrt(np.diag(covariance))
print(old_fit)
print(fit)
old_fit = np.copy(fit)
print()
print('V_0 = ','{:.3f}'.format(fit[0]), '±', '{:.3f}'.format(error[0]), ' mV')
print('V_1 = ','{:.4f}'.format(fit[2]), '±', '{:.3f}'.format(error[2]), ' mV')
print('theta_0 = ','{:.4f}'.format(fit[1]), '±', '{:.4f}'.format(error[1]), 'radian')
print(' = ','{:.4f}'.format(fit[1]*180/np.pi), '±', '{:.4f}'.format(error[1]*180/np.pi), 'degrees')
\end{verbatim}
Here are the results after the first iteration:~~
\begin{verbatim}
V_0 = 32.631 ± 0.024 mV
V_1 = 0.023 ± 0.016 mV
theta_0 = -0.0203 ± 0.0018 radian
= -1.162 ± 0.102 degrees
\end{verbatim}
We now use the output values for ~\(V_0\)
and~\(V_1\) as the new input values and recalculate the
estimated error for each~\(V_{pd}\left(\theta\right)\) value:
\begin{verbatim}
new_initial_values = np.array([fit[0], fit[1], fit[2]])
estimated_error = photodiode_error(theta, delta_V0, delta_theta, delta_V1,
                                   fit[0], fit[1]) # propagate error using new values for V0, etc.
fit, covariance = curve_fit(polarization_model, theta, V_pd,
                            p0 = new_initial_values,
                            sigma = estimated_error, absolute_sigma = True)
error = np.sqrt(np.diag(covariance))
print(old_fit)
print(fit)
old_fit = np.copy(fit)
V_pd_model = polarization_model(theta, fit[0], fit[1], fit[2])
residual = V_pd - V_pd_model
data_uncertainty = photodiode_error(theta, delta_V0, delta_theta, delta_V1, fit[0], fit[1])
from scipy.stats import chi2 # needed below for the chi-square CDF
chisq = sum((residual / data_uncertainty)**2)
degrees_of_freedom = len(residual) - len(initial_guess)
reduced_chisq = chisq / degrees_of_freedom # this should be close to one
CDF = chi2.cdf(chisq, degrees_of_freedom) # this should be close to 50 percent
print('chi-square = ', chisq)
print('degrees of freedom = ', degrees_of_freedom)
print('reduced chi-square = ', reduced_chisq)
print('fractional probability of chisq <=', chisq, 'for', degrees_of_freedom, 'dof is', CDF)
\end{verbatim}
and continue in this way until the value of \(V_0\) stops changing.
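This ``repeat until self-consistent'' procedure can be automated with a \textbf{while loop}. Here is a minimal, self-contained sketch: the data are synthetically generated here as a stand-in for the measured calibration values, the function definitions repeat those given earlier, and the convergence tolerance (1\% of the parameter uncertainty) and iteration cap are our own illustrative choices:

```python
import numpy as np
from scipy.optimize import curve_fit

def polarization_model(phi_array, V_0, phi_0, V_1):
    return V_0 * (1 + np.cos(2 * (phi_array - phi_0))) / 2 + V_1

def photodiode_error(phi_array, delta_V_0, delta_phi, delta_V_1, V_0, phi_0):
    t = phi_array - phi_0
    return V_0 * np.sqrt(((delta_V_0 / V_0) * np.cos(t)**2)**2
                         + (2 * delta_phi * np.cos(t) * np.sin(t))**2
                         + (delta_V_1 / V_0)**2)

# synthetic stand-in for the measured calibration data
rng = np.random.default_rng(0)
theta = np.arange(0, 360, 15) * np.pi / 180  # 15 degree steps, in radians
V_pd = polarization_model(theta, 32.6, -0.02, 0.02) + rng.normal(0, 0.02, theta.size)

delta_V0, delta_theta, delta_V1 = 0.020, 0.5 * np.pi / 180, 0.014
fit = np.array([30.0, -2.0 * np.pi / 180, 0.02])  # rough initial guess

iteration, converged = 0, False
while not converged and iteration < 20:  # iteration cap as a safeguard
    old_fit = np.copy(fit)
    sigma = photodiode_error(theta, delta_V0, delta_theta, delta_V1,
                             old_fit[0], old_fit[1])
    fit, covariance = curve_fit(polarization_model, theta, V_pd,
                                p0=old_fit, sigma=sigma, absolute_sigma=True)
    error = np.sqrt(np.diag(covariance))
    # self-consistent when the parameters change by much less than their uncertainty
    converged = bool(np.all(np.abs(fit - old_fit) < 0.01 * error))
    iteration += 1
```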
\subsubsection{graphing the results}
{\label{528993}}
Our final numerical results are:
\begin{verbatim}
V_0 = 32.629 ± 0.020 mV
V_1 = 0.022 ± 0.013 mV
theta_0 = -0.0202 ± 0.0020 radian
        = -1.155 ± 0.112 degrees
\end{verbatim}
We can now graphically compare the original data with our model
(Eq.~~{\ref{eq:Photodiode_Voltage}} ) using the best
fit values for the parameters:
\begin{verbatim}
plt.figure(figsize = (11,8)) # specify figure size as 11 x 8 inches
# for default size, type plt.figure()
plt.xlabel(r"$\theta$ [degrees]", fontsize = 18) # label axis (using LaTeX commands)
plt.ylabel(r"$V_{pd}$ [mV]", fontsize = 18) # use 18 point font for label text
# plot the data as before in blue
plt.errorbar(angle, V_pd,
             xerr=None, yerr=V_pd_error,
             linestyle = 'none',
             color = 'blue',
             capsize = 3, capthick = 1, label = "data")
#create curve showing fit to data
angle_fit = np.linspace(0, 360, 180)
theta_fit = angle_fit * np.pi / 180
V_pd_fit = polarization_model(theta_fit, fit[0], fit[1], fit[2])
#plot the curve fit in red
plt.errorbar(angle_fit, V_pd_fit, xerr = None, yerr = None, color = 'red', label = 'fit' )
plt.xlim(-15, 375)
plt.ylim(-2.5, 40)
plt.xticks([0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360],
           ('0', '', '', '90', '', '', '180', '', '', '270', '', '', '360'))
plt.legend(loc = 'best')
plt.show()
\end{verbatim}
The results are plotted in Fig.~{\ref{627293}} below.~
\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/CurveFitToCalibration-650nm/corrected-fit}
\caption{{Curve fit of Eq.~{\ref{eq:Photodiode_Voltage}} to
calibration data for 650 nm using best-fit values for parameters.
{\label{627293}}%
}}
\end{center}
\end{figure}
At first glance, the model appears to describe the data quite well, with
the possible exception of the points where~\(V_{pd}\left(\theta\right)\) is a
maximum.~ To quantify the goodness of fit, however, we will need to
evaluate the fit residuals and perform a chi-square goodness-of-fit
test. See section~{\ref{304456}}.~
\par\null
Note: there are additional features of the curve\_fit function not
demonstrated here, including the ability to set lower and upper bounds
on each of the adjustable parameters. For example, setting the
option~\texttt{bounds\ =\ ({[}0,\ -np.pi,~\ -1.0{]},\ {[}np.inf,\ +np.pi,\ 1.0{]})}
~would then force the best fit parameters to stay in the ranges
\(0\le V_0\),~\(-\pi\le\theta_0\le\pi\) (in radians), and~\(-1\le V_1\le+1\).~
See
the~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html\#scipy.optimize.curve_fit}{scipy.optimize.curve\_fit}
documentation for details.~
\subsection{Evaluating the goodness of
fit}
{\label{304456}}\par\null
We now have a best fit of the model to the data. How good a fit is it?
That is, how well does the model describe the data, taking into account
the uncertainty in each data point? Two quick ways to test the goodness
of fit of the model to the data are through calculations of the
residuals and the reduced chi-square value.~ See~\cite{Hughes2010}
and~\cite{herman2011}.~
\par\null
\subsubsection{residuals}
{\label{982282}}\par\null
The calculation of the residual errors is easy once the model is defined
and the fitting parameters determined.~ Here is an example:
\begin{verbatim}
# theta = angle * np.pi / 180 # convert to radians
V_pd_model = polarization_model(theta, fit[0], fit[1], fit[2])
residual = V_pd - V_pd_model
\end{verbatim}
The corresponding plot is shown in Fig.~{\ref{676696}}
below.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Residual-plot/residual-error}
\caption{{Residual error compared with the estimated measurement
uncertainty at~\(\pm\sigma\) (in red) and~\(\pm2\sigma\) (in orange).~
Most points are within one or two standard deviations of zero residual
error, as expected for a good fit to the data using Malus's law, but the
points do not appear randomly distributed. Instead, there is a weak
angular dependence suggesting a small uncorrected systematic
misalignment of the light source and/or polarizers with respect to the
detector.
{\label{676696}}%
}}
\end{center}
\end{figure}
The residuals are generally within one standard deviation of zero but do
not appear randomly distributed. Instead, there is a weak residual
angular dependence. One possibility is that the light beam, polarizers,
and detector are not perfectly centered and the polarization is not
completely uniform across the entire surface of the polarizer. Malus's
law itself, however, does not appear to be in question. Rather, the
residuals indicate a small level of polarizer-angle-dependent systematic error.
\subsubsection{chi-square test}
{\label{871337}}\par\null
Once the residuals have been calculated, the determination of the
corresponding chi-square value for goodness of fit is straightforward.
Here is an example:
\par\null
\begin{verbatim}
from scipy.stats import chi2 # 'chi-square' goodness of fit calculation
chisq = sum((residual / data_uncertainty)**2)
degrees_of_freedom = len(residual) - len(initial_guess)
reduced_chisq = chisq / degrees_of_freedom # this should be close to one
CDF = chi2.cdf(chisq, degrees_of_freedom) # this should be close to 50 percent
print('chi-square = ',chisq)
print('degrees of freedom = ',degrees_of_freedom)
print('reduced chi-square = ',reduced_chisq)
print('fractional probability of chisq <=', chisq, 'for', degrees_of_freedom, 'dof is', CDF)
\end{verbatim}
Ideally, if the model describes the data AND the estimated uncertainty
of each point has been accurately determined, the reduced chi-square
value would be approximately equal to one (for large~\(\nu\))
and the fractional probability of~\(\chi^2\) values larger and
smaller than that measured would both be approximately 50\%.~
Statistically, however, we expect some reasonable random variation from
the expected mean value of~\(\nu\) for~\(\chi_{\min}^2\),
meaning that statisticians generally will not reject the ``null
hypothesis'' (that the model describes the data and all deviations can
be reasonably attributed to expected random variation) as long
as~\(\chi_{\min}^2\) is within 2 standard deviations
(~\(2\sigma\) )~ of the mean value. Here, ~~\(\chi_{mean}^2\ =\ \nu\)
and~~\(\sigma_{\nu}=\sqrt{2\nu}\)~\cite{Hughes2010} , so the criterion can be
reexpressed as
\begin{equation}
\label{eq:Goodness_of_fit_criteria}
\nu - 2 \sqrt{2 \nu} \le \chi^2_{\min} \le \nu + 2 \sqrt{2 \nu}
\end{equation}
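Eq.~{\ref{eq:Goodness_of_fit_criteria}} is simple to evaluate in code. Here is a minimal sketch (the helper function name is our own):

```python
import numpy as np

def chisq_within_2_sigma(chisq, dof):
    # Eq. (5): accept the fit if chisq lies within 2 standard
    # deviations of its expected mean (the number of degrees of freedom)
    sigma = np.sqrt(2 * dof)
    return dof - 2 * sigma <= chisq <= dof + 2 * sigma

print(chisq_within_2_sigma(15.3, 22))  # → True
```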
\par\null
Here are the results for the fit shown in Fig.
{\ref{627293}}:
\par\null
\begin{quote}
chi-square test value~\(\chi_{\min}^2\) = 15.3 \\
degrees of freedom~\(\nu\) = 25 - 3 = 22 \\
reduced chi-square = 0.694 \\
fractional probability of~\(\chi^2\le 15.3\) is 15.0\% \\
fractional probability of~\(\chi^2>15.3\) is 85.0\%
\end{quote}
\par\null
Here,~\(\sigma=\ \sqrt{2\ \nu}=6.6\),~~\(\chi_{mean}^2\ -\ \sigma\ =\ 15.4\), and~\(\chi_{mean}^2\ +\ \sigma\ =\ 28.6\),
indicating that our result is very slightly outside~~one standard
deviation of the mean but well within two standard deviations. We
conclude that we have acceptably good agreement between model and
experiment at this level of experimental precision but that further
improvements in apparatus design, measurement method, and reduction in
uncertainty are merited.~~
\par\null
\section{Data Smoothing and
Differentiation}
{\label{151285}}\par\null
\subsection{Savitzky-Golay filters}
{\label{766247}}\par\null
Data smoothing is a way of reducing time varying noise in measured data
so as to reveal the underlying dependence of the measured variable on a
different variable (such as an applied voltage or an external magnetic
field). In most cases, the computer is acting as a digital low pass
filter. It differs from an analog low pass output filter in that the
filtering is done by us using a computer after the measurement is
complete rather than by the instrument itself. Like the signal averaging
done by a digital oscilloscope --- the point by point averaging of a
periodic signal --- data smoothing seeks to separate random noise from a
reproducible signal, but here the measured `signal' to be smoothed is
not assumed to be periodic and is in fact usually a function of a
variable other than time.~ An example would be the measurement of
current as a function of applied voltage for a non-ohmic device.
\par\null
The SciPy (for ``scientific Python'') package offers an extensive list
of~\href{https://docs.scipy.org/doc/scipy/reference/signal.html}{signal
processing functions} useful for digital filtering.~ If you have
some programming experience in Python, you may wish to design your own
filter, perhaps using the lower level
scipy.signal~\href{https://docs.scipy.org/doc/scipy/reference/signal.html}{filter
design tools}.~ There are, for example, a wide variety of spectral
analysis and peak finding routines for frequency data that we will not
explore further here.
\par\null
In this getting started guide, we will focus on the use of a particularly
useful and easy-to-use ready-made filter for~\textbf{data smoothing}
of~\(y\left(x_i\right)\) data equally spaced in~\(x\):~
the~~\href{https://en.wikipedia.org/wiki/Savitzky\%E2\%80\%93Golay_filter}{Savitzky-Golay}
low pass filter~\cite{Savitzky_1964,Steinier_1972}. This filter works by fitting a
subset of data points adjacent to a particular data point with a
low-degree polynomial, evaluating the polynomial at that point, and then
repeating the process for each data point. The set of points centered
about a particular data point is called the `filter window' and the
number of points in the filter window is called
the~\emph{window}~\emph{length} (although~\emph{width} would seem more
apt). The picture here might be~ of a small window in a cabin looking
out upon a nearby data stream. The window~ reveals a small~ subset of
points at a time as the data ``streams'' by. The fit is carried out on
the points seen through the window, and repeated for each point in the
stream.~
\par\null
As a bonus, the SciPy Savitzky-Golay filter
function~\texttt{scipy.signal.savgol\_filter} can also be used to
numerically~\textbf{differentiate}~\textbf{the smoothed data} (again,
provided the data is ``equally spaced'').~ This is a particularly useful
feature. For example, we might directly measure~\(I\left(V\right)\) for a
particular non-ohmic device (such as a light bulb or LED) but be more
interested theoretically in the differential current~\(\frac{dI}{dV}\)
as a function of applied voltage~\(V\). Experimentally,
the noise fluctuations in a measurement of ~~\(I\) as a
function of~\(V\) might preclude a direct calculation
of~\(\frac{dI}{dV}\). By first smoothing the data using a
Savitzky-Golay filter to reduce the noise, a meaningful calculation
of~\(\frac{dI}{dV}\) as a function of~\(V\) can now be
carried out.~~
There are a few practical constraints:
\begin{enumerate}
\tightlist
\item
the window length must be an odd number
\item
the polynomial order (1, 2, 3, \ldots{}) must be less than the window
length
\item
the data points are assumed to be ``evenly spaced'' (for example in
time, voltage, or magnetic field)~
\end{enumerate}
For example, when measuring current~I as a function of applied voltage
V~ from -10 to +30 V in equal steps of 0.1 volts, specifying a window
length of 21 points~ and a second order polynomial fit would correspond
to fitting a quadratic function to a set of 21 points spanning a 2 V
range centered about a particular data point (including that point),
then repeating that process for each subsequent data point. In the SciPy
implementation, this repeated fitting over a moving data stream is done
automatically for us.~
\par\null
As noted above, the number of points included in the subset and the
order of the polynomial must be~ specified by the user. In all data
smoothing routines, there is a tradeoff between the degree of smoothing
and the ability to resolve small features in the underlying signal; data
smoothing is not a substitute for working as carefully as possible to
minimize the measurement noise and maximize the accuracy and precision
of the originally measured data! In the end, the~ `best values' for
these parameters are usually determined by trial and error. Keep track
of the values used when determining the resolution of your experiment!~
\par\null
\subsubsection{scipy.signal.savgol\_filter}
{\label{201802}}\par\null
This filter can
be~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html}{implemented
in Python} with just two lines of code: one to import from
the~\texttt{scipy.signal~}module the function~\texttt{savgol\_filter},
and one more to carry out the data smoothing.~
\par\null
We first need to specify two parameters:
~
\begin{enumerate}
\tightlist
\item
\emph{window\_length},~ the number of adjacent data points we wish to
include in the fit
\item
\emph{polyorder}, the~order of the polynomial fit (where 0 = constant,
1 = linear, 2 = quadratic, and 3 = cubic)
\end{enumerate}
If we also want to calculate the slope of the data at each point by
taking a derivative, we then need to specify two additional
parameters:~
\par\null
\begin{enumerate}
\tightlist
\item
~\emph{deriv}, the~order of the derivative (where 0 = no derivative, 1
= first derivative, 2 = 2nd derivative,\ldots{})
\item
\emph{delta}, the~spacing\textbf{~}of the samples to which the filter
will be applied.~
\end{enumerate}
As an example, suppose we want again to determine~\(\frac{dI}{dV}\)
from a measurement of~\(I\left(V\right)\) for applied voltages~ between
-10 V and +30 V and that measurements were made evenly spaced in voltage
with a step size of 0.1 V. If so, we then specify that deriv = 1 and
delta = 0.1 within the \texttt{savgol\_filter} function.~
\par\null
Finally, for the computed derivative to be physically and quantitatively
meaningful,~
\par\null
\begin{enumerate}
\tightlist
\item
the order of the polynomial fit needs to be greater than the order of
derivative (!)~
\item
The delta value needs to be~ correctly specified. If unspecified, it
is assumed that delta = 1.~
\end{enumerate}
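Both points can be checked with data sampled from a known function. In this illustrative sketch (not from the guide's dataset), the data are exact samples of \(y = x^2\), so with \emph{polyorder} greater than \emph{deriv} and \emph{delta} set to the sample spacing, the filter should recover \(dy/dx = 2x\):

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.arange(0, 2, 0.1)  # evenly spaced samples, spacing delta = 0.1
y = x**2                  # exact quadratic data, so dy/dx = 2x

# polyorder (2) > deriv (1), and delta matches the sample spacing
dydx = savgol_filter(y, window_length=5, polyorder=2, deriv=1, delta=0.1)
print(np.allclose(dydx, 2 * x))  # → True
```

Omitting \texttt{delta} here (so that it defaults to 1) would return a derivative too small by a factor of 10.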
As usual, there are additional optional settings, including
~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html}{options}
for how to handle data near the~ endpoints (see~\texttt{mode}): the
default (\texttt{mode\ =\ interp}) is to fit the last window\_length / 2
points to a polynomial of order polyorder.~ For this and other details,
see
the~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.savgol_filter.html}{Scipy
reference manual page}.
\par\null
\subsubsection{Sample Python code}
{\label{988459}}
Here is an example of how to use the SciPy function
\texttt{savgol\_filter} for data smoothing.
\begin{verbatim}
from scipy.signal import savgol_filter
window_width = 25 # number of points over which data is fit and smoothed
# window_width must be an odd number
polynomial_order = 2 # order of the polynomial used to fit the data
# polynomial_order must be less than window_width
data_spacing = 0.1 # data_spacing = x_1 - x_0 for data y(x_0), y(x_1), ...
smoothed_data = savgol_filter(noisy_data, window_width, polynomial_order) # smooth the data
data_derivative = savgol_filter(noisy_data, window_width, polynomial_order,
                                deriv = 1, delta = data_spacing) # smooth, then take the 1st derivative
\end{verbatim}
An example of data smoothing using the Savitzky-Golay low pass filter is
shown in Fig~{\ref{987576}}. Since for real
experimental data the original `clean signal' is unknown, the degree of
smoothing is always a matter of judgement. A comparison of several
different smoothing values (not shown) can be useful in discerning the
best choice of parameters and the effective resolution of your processed
data (and any derivatives).~\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SVG-smoothing/SVG-smoothing}
\caption{{Use of a Savitzky-Golay low pass filter to smooth (artificially
generated) noisy data.~
{\label{987576}}%
}}
\end{center}
\end{figure}
In general, increasing the window width increases the degree of
smoothing, causing small features in the original `clean signal' to
disappear; decreasing the window width decreases the degree of
smoothing, causing artificial features (arising from the noise) to
emerge. Increasing the order of the polynomial while holding the window
width constant decreases the degree of noise reduction but is needed
if there is significant data curvature over the width of the smoothing
window. To see this for yourself, click first on the
\textless{}/\textgreater{} Code button, then on the file
name~\texttt{Smooth\_and\_differentiate.ipynb} to open and view the
underlying~ Jupyter notebook containing the Python code used to generate
this figure.~ Once opened, you can vary the smoothing parameters and
re-run the notebook to see the changes. You can also download the
notebook to your own computer as a template.
\par\null
\subsection{Butterworth filters}
{\label{634831}}\par\null
Everyone has their own favorite low pass filter smoothing routine. The
Savitzky-Golay method presented here is a useful and intuitive place to
start but it is not necessarily better or worse than any other smoothing
routine. For more advanced filtering methods and examples,~ see
the~\href{http://scipy-cookbook.readthedocs.io/items/idx_signal_processing.html}{SciPy
Cookbook} and the SciPy signal
processing~\href{https://docs.scipy.org/doc/scipy/reference/signal.html\#module-scipy.signal}{reference
guide}.
\par\null
Two examples of particular interest from the SciPy Cookbook
involve~\href{https://en.wikipedia.org/wiki/Butterworth_filter}{Butterworth
filters}, ~ which are designed to have as flat a frequency response as
possible over the range of frequencies to be passed while still
filtering out unwanted frequencies (such as high frequency noise):~
\par\null
\begin{enumerate}
\tightlist
\item
a~\href{http://scipy-cookbook.readthedocs.io/items/ButterworthBandpass.html}{Butterworth
bandpass filter} to filter out high frequency noise, low frequency
noise, and dc drift
\item
a
\href{http://scipy-cookbook.readthedocs.io/items/FiltFilt.html}{Butterworth
low pass filter} for general data smoothing ~~
\end{enumerate}
\par\null
Note: The use of a
\href{https://en.wikipedia.org/wiki/Butterworth_filter}{Butterworth
filter} requires the specification of a cutoff frequency (or
frequencies). For a
digital~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html\#scipy.signal.butter}{Butterworth
filter}, varying the cutoff frequency (specified as a fraction of the
Nyquist frequency, between zero and one) will change the degree of
smoothing provided.~ Lower values produce greater smoothing of the data.
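Here is a minimal, self-contained sketch of such a low-pass filter, loosely following the SciPy cookbook approach; the helper function name, cutoff value, and artificial test signal are our own illustrative choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def butter_lowpass_smooth(data, cutoff, order=3):
    # cutoff is a fraction of the Nyquist frequency (0 < cutoff < 1)
    b, a = butter(order, cutoff, btype='low')
    return filtfilt(b, a, data)  # zero-phase filtering (forward and backward passes)

# artificially generated noisy signal (stand-in for measured data)
t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 3 * t)
noisy = clean + 0.2 * np.random.default_rng(1).normal(size=t.size)

smoothed = butter_lowpass_smooth(noisy, cutoff=0.13)  # lower cutoff -> more smoothing
```

Using \texttt{filtfilt} (rather than a single forward pass) avoids the phase shift an ordinary causal filter would introduce.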
\par\null
The example in Fig.~{\ref{897852}} below uses a 3rd
order digital Butterworth low-pass filter with a normalized cutoff
frequency of 0.13 (see attached code) to smooth the same signal as in
Fig.~{\ref{987576}}; the
original~\href{http://scipy-cookbook.readthedocs.io/items/FiltFilt.html}{SciPy
cookbook example} uses a frequency of 0.05.~ See
the~\href{http://scipy-cookbook.readthedocs.io/items/FiltFilt.html}{SciPy
cookbook}~ and the Jupyter notebook attached to this figure for
additional details.
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/FiltFiltResult/FiltFiltResult}
\caption{{Use of a
\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.butter.html\#scipy.signal.butter}{digital
Butterworth low-pass filter} (and
some~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.filtfilt.html\#scipy.signal.filtfilt}{fancy
footwork}) to smooth the same data as in
Fig.~{\ref{987576}}.~ ~
{\label{897852}}%
}}
\end{center}
\end{figure}
\subsection{Differentiation}
{\label{606982}}
One of the most common reasons for data smoothing is to then be able to
differentiate the data.~ As noted earlier in
section~{\ref{201802}}, one of the advantages of~
using~\texttt{scipy.signal.savgol\_filter} is that the data smoothing
and differentiation can be done in a single step (when working with
evenly spaced data). Here is an example (for the data originally shown
in Fig. {\ref{987576}}):
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/SVG-derivative/SVG-derivative}
\caption{{Filtered signal and first derivative of data shown in
Fig.~{\ref{987576}}. Numerical results calculated
(from noisy signal) using the SciPy Savitzky-Golay filter~ function
(window width = 25, polyorder = 2).~
{\label{583719}}%
}}
\end{center}
\end{figure}
\par\null
More generally, however, the numerical derivative of a suitably filtered
signal~\(y\left(x\right)\) can also be evaluated at
each~\(x\) value using the~\texttt{numpy}
function~\texttt{gradient} where for a 1D array of data the gradient
will be the same as the derivative~\(\frac{dy}{dx}\). According to the
~\texttt{numpy.gradient}
~\href{https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html\#numpy.gradient}{reference
page} , this function ``calculates the gradient using second order
accurate central differences in the interior points and either first or
second order accurate one-sides (forward or backwards) differences at
the boundaries. '' It also has the advantage of not requiring equally
spaced data values.~See
the~\href{https://docs.scipy.org/doc/numpy/reference/generated/numpy.gradient.html\#numpy.gradient}{reference
page} for further examples and details.~
\par\null
Here is a simple example of how to use \texttt{gradient} to numerically
calculate~\(\frac{df}{dx}\) from smoothed~\(f\left(x_i\right)\) data
without knowledge of the function~\(f\left(x\right)\), assuming that the
data is equally spaced with a sample distance~\(dx\) of
0.1:
\par\null
\begin{verbatim}
import numpy as np
data_derivative_array = np.gradient(smoothed_data_array, 0.1)
\end{verbatim}
\texttt{gradient} can also be used with unequally spaced values, if
those values are provided. Here is one example:
\begin{verbatim}
import numpy as np
x = np.array([0., 1., 1.5, 3.5, 4., 6.], dtype = float)
f = np.array([1, 2, 4, 7, 11, 16], dtype = float)
dfdx = np.gradient(f,x)
\end{verbatim}
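As a quick check of the quoted accuracy claims (an illustrative sketch using the same unequally spaced \texttt{x} values and an arbitrarily chosen quadratic, whose derivative \(2x\) is known exactly):

\begin{verbatim}
import numpy as np

x = np.array([0., 1., 1.5, 3.5, 4., 6.])
f = x**2                  # known derivative: 2x

dfdx = np.gradient(f, x)  # pass the x values for unequal spacing

# the interior estimates use second order accurate differences, which
# are exact for a quadratic; the boundary estimates are one-sided and
# (by default) only first order accurate, so they differ from 2x
\end{verbatim}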
\par\null\par\null\par\null
For a more general guide to numerical differentiation and integration
(in which you also learn how to code your own routines using numpy),
see~\href{http://www-personal.umich.edu/~mejn/cp/chapters/int.pdf}{Chapter
5: Integrals and Derivatives} from the Python-based
textbook~\href{http://www-personal.umich.edu/~mejn/cp/index.html}{Computational
Physics} by Mark Newman~\cite{mark2013}.
\par\null
\section{Interpolation and Peak
Finding}
{\label{135595}}\par\null
\subsection{interp1d}
{\label{512330}}\par\null
Perhaps the most commonly used interpolating function in Python
is~\texttt{scipy.interpolate.interp1d}. The default option is linear
interpolation, but for sparsely separated data a cubic spline is often
preferable. See the
\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html}{interp1d
manual page} for additional options and details.
\par\null
Here is an example of how to use interp1d to construct a cubic spline
interpolation of sparse data.
\par\null
\begin{quote}
First, import the necessary python packages and data:
\end{quote}
\begin{verbatim}
#setup Jupyter notebook
%matplotlib inline
#import packages and functions
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from scipy.interpolate import interp1d # this is the 1d interpolation function
#specify data file location
file_name = 'Calibration_650nm_result.csv'
file_folder = '/Users/nfortune/data/'
#import data from CSV text file
angle, V_pd, data_uncertainty = np.loadtxt(
file_folder + file_name,
delimiter = ',', skiprows = 1,
usecols = (0, 1, 2), unpack = True)
\end{verbatim}
\begin{quote}
Second, construct an interpolating function from the data, then use it
to generate new (interpolated) values
\end{quote}
\begin{verbatim}
#construct an interpolating function from the data
interpolating_function = interp1d(angle, V_pd, kind = 'cubic') # create interpolation function
#create array of new angle values for interpolation
new_angle_values = np.linspace(0, 360, 180) # in degrees
#evaluate at new angle values
interpolated_data = interpolating_function(new_angle_values)
\end{verbatim}
\begin{quote}
Third, graph the results:
\end{quote}
\par\null
\begin{verbatim}
plt.figure(figsize = (11,8)) #specify figure size as 11 x 8 inches
#for default size, type plt.figure()
plt.errorbar(angle, V_pd, xerr=None, yerr=data_uncertainty,
linestyle = 'none', color = 'blue', capsize = 3, capthick = 2,
label = "original data points")
plt.errorbar(new_angle_values, interpolated_data, xerr = None, yerr = None,
color = 'black',
label = 'cubic spline interpolation')
plt.xlabel(r"$\theta$ [degrees]", fontsize = 18) #label axis (using LaTeX commands)
plt.ylabel(r"$V_{pd}$ [mV]", fontsize = 18) #use 18 point font for label text
plt.xlim(-15, 375)
plt.ylim(-2.5, 40)
plt.xticks([0, 30, 60, 90, 120, 150, 180, 210, 240, 270, 300, 330, 360],
('0', '', '', '90', '', '', '180', '', '', '270', '', '', '360'))
plt.legend(loc = 'best')
plt.show()
\end{verbatim}
The results are shown in Fig. {\ref{238634}}.\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Cubic-Spine-interpolation/Cubic-Spine-interpolation}
\caption{{Cubic spline interpolation of 650 nm calibration data using
\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp1d.html}{scipy.interpolate.interp1d}
{\label{238634}}%
}}
\end{center}
\end{figure}
\subsection{InterpolatedUnivariateSpline}
{\label{402871}}\par\null
Not surprisingly, the function~\texttt{interp1d} is just one of many
spline functions and classes, one-dimensional (univariate) and
multidimensional (multivariate) interpolation classes, and Lagrange,
Taylor, and Padé polynomial interpolators. For a comprehensive list, see
the~\href{https://docs.scipy.org/doc/scipy/reference/interpolate.html}{scipy.interpolate
reference manual}.
\par\null
Two of these might be of particular interest to you in the analysis of
1D data:
\begin{enumerate}
\tightlist
\item
\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html\#scipy.interpolate.UnivariateSpline}{UnivariateSpline},
  which fits a 1D smoothing spline of degree k to the provided
  x, y data
\item
  \href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.InterpolatedUnivariateSpline.html\#scipy.interpolate.InterpolatedUnivariateSpline}{InterpolatedUnivariateSpline},
  which constructs a 1D spline that passes through all data points
\end{enumerate}
Their advantages include
\begin{itemize}
\tightlist
\item
the ability to construct a new spline representing the derivative of
the original spline
\item
the ability to construct a new spline representing an integral of the
original spline
\item
and for 3rd order splines, the ability to find the roots (zero
crossings) of the spline
\end{itemize}
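These capabilities can be sketched on synthetic data from a known function (an illustrative example; the actual data analysis workflow is shown below):

\begin{verbatim}
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.linspace(0, 10, 50)
y = np.cos(x)              # known function, so results can be checked

spline = InterpolatedUnivariateSpline(x, y, k=3)  # passes through all points

dspline = spline.derivative()      # new spline for d/dx (close to -sin(x))
ispline = spline.antiderivative()  # new spline for the integral

crossings = spline.roots()  # zero crossings (3rd order splines only);
                            # close to pi/2, 3*pi/2, 5*pi/2 on [0, 10]
\end{verbatim}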
Note: these `object-oriented' interpolating functions are technically
what Python calls classes rather than functions. For us, that just means
there are a few differences in syntax and usage we will need to pay
attention to, but in exchange, we get a much more powerful interpolation
routine.

Here we show how to use~\texttt{InterpolatedUnivariateSpline} to
interpolate the same data as before using a 4th order spline, find its
(3rd order) derivative, find the roots of that derivative, and then use
that information to identify the extrema of the original interpolating
spline. For corresponding examples using~\texttt{UnivariateSpline}, see
the~\href{https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.UnivariateSpline.html\#scipy.interpolate.UnivariateSpline}{reference
manual page}.
\par\null
Let us assume we have imported the data as before and are now ready to
interpolate the data. Interpolation is again a two-step process:
construct an interpolating function, then apply it to generate new data
values.
\par\null\par\null
\begin{verbatim}
from scipy.interpolate import InterpolatedUnivariateSpline
#construct interpolating function
InterpolatingUnivariateSpline_function = InterpolatedUnivariateSpline( angle, V_pd, k = 4) # 4th order spline
#create array of new angle values for interpolation
new_angle_values = np.linspace(0, 360, 180) # in degrees
# generate new data values
IUS_interpolated_data = InterpolatingUnivariateSpline_function(new_angle_values)
\end{verbatim}
When graphed, the new interpolated spline fit is indistinguishable from
the previous one using interp1d:
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/UnivariateSplineFit/UnivariateSplineFit}
\caption{{Comparison of two spline interpolations with data. The two methods give
nearly equivalent results (so the interp1d cubic spline interpolation in
black is largely overwritten by the InterpolatedUnivariateSpline 4th
order interpolation in blue).
{\label{314863}}%
}}
\end{center}
\end{figure}
As will be shown below, the true value of the
InterpolatedUnivariateSpline method lies in the ability to find
derivatives and roots. Here we first construct a new spline function for
the 1st derivative of the original univariate spline, then use it to
evaluate the derivative at the same points as before:
\par\null\par\null
\begin{verbatim}
spline_derivative_function = InterpolatingUnivariateSpline_function.derivative()
data_derivative = spline_derivative_function(new_angle_values)
\end{verbatim}
\par\null
Next, we find the roots of the derivative and the corresponding extrema
of the original interpolating spline:
\par\null
\begin{verbatim}
zero_crossings = InterpolatingUnivariateSpline_function.derivative().roots()
extrema_values = InterpolatingUnivariateSpline_function(zero_crossings)
zero_values = np.zeros(len(zero_crossings)) # generate an array of zeros
\end{verbatim}
The results are shown in Fig.~{\ref{754178}} for the
derivative
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/UnivariateDerivativeWithZeros/UnivariateDerivativeWithZeros}
\caption{{Derivative of the interpolating spline function
{\label{754178}}%
}}
\end{center}
\end{figure}
\par\null
and in Fig.~{\ref{442657}} for the original
interpolating spline:
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/UnivariateSplineWithExtrema/UnivariateSplineWithExtrema}
\caption{{Locations of extrema (peaks and valleys) for the interpolated data
{\label{442657}}%
}}
\end{center}
\end{figure}
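To distinguish the peaks from the valleys among these extrema, one option is to evaluate the spline's second derivative at each zero crossing: negative curvature marks a maximum, positive curvature a minimum. Here is a sketch using synthetic data in place of the notebook's calibration data:

\begin{verbatim}
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)   # stand-in for the smoothed data

spline = InterpolatedUnivariateSpline(x, y, k=4)
crossings = spline.derivative().roots()        # candidate extrema
curvature = spline.derivative(n=2)(crossings)  # 2nd derivative at each root

maxima = crossings[curvature < 0]  # negative curvature: local maxima
minima = crossings[curvature > 0]  # positive curvature: local minima
\end{verbatim}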
\subsection{peak finding}
{\label{643243}}\par\null
This can be a handy way to find peaks and valleys in your data, but
depending on the noise level it may identify a sequence of points rather
than a single value. You may be able to address this by smoothing the
data before interpolation, but for more advanced peak finding routines
(including determination of peak widths and relative degree of
prominence), see the links to the~\textbf{peak finding routines} on
the~\texttt{scipy.signal}~\href{https://docs.scipy.org/doc/scipy/reference/signal.html}{reference
manual page}. Examples can be found at the bottom of the reference pages
for each function.
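For instance, \texttt{scipy.signal.find\_peaks} returns peak locations together with properties such as prominence and width. Here is a minimal sketch on synthetic data (the thresholds shown are illustrative; see the reference page for the full set of options):

\begin{verbatim}
import numpy as np
from scipy.signal import find_peaks

x = np.linspace(0, 10, 500)
y = np.exp(-(x - 3)**2) + 0.5 * np.exp(-(x - 7)**2 / 0.5)

# require a minimum prominence (to reject small noise bumps) and a
# minimum width of 5 samples
peaks, properties = find_peaks(y, prominence=0.1, width=5)

peak_positions = x[peaks]                 # locations of detected peaks
prominences = properties["prominences"]   # how much each peak stands out
\end{verbatim}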
\section{Working with units and
uncertainties}
{\label{822768}}\par\null
When using Python in place of a calculator, we have the ability to
directly include information about dimensions, units, and uncertainties
in the calculations. These abilities offer the advantage of
allowing us to check algebraic calculations, automatically propagate
uncertainties in calculated quantities, and avoid unit conversion
errors, but the necessary Python modules are not included in the
standard Anaconda Navigator Python installation and must be added by
hand if needed.
\par\null
Note for Smith students: these additional modules are already
installed on the classroom computers for PHY 350 and the Smith Physics
Jupyter webserver at~\url{https://jove.smith.edu}. (The https is
necessary.)
\subsection{Calculations with units}
{\label{423713}}\par\null
In the PHY 350 Experimental Physics course here at Smith College, we
make extensive use of the Python
module~\href{http://pint.readthedocs.io/en/stable/index.html}{Pint} by
Hernan E. Grecco. A tutorial for Pint is
available~\href{http://pint.readthedocs.io/en/stable/tutorial.html}{here}.
Here is an example of using Pint for calculator-like calculations with
units:
\begin{verbatim}
import numpy as np
import pint
unit = pint.UnitRegistry() # for clarity, we use 'unit' instead of the default 'ureg'
Q_ = unit.Quantity
g = 9.8 * unit.newton / unit.kg # define quantities with units
m = Q_(1.0, 'kg') # an alternative method
force = m * g #define calculated quantity
print(force)
9.8 newton # sample output
\end{verbatim}
Pint also
includes~\href{http://pint.readthedocs.io/en/stable/numpy.html}{support
for numpy arrays}, as shown in this example:
\begin{verbatim}
>>> mass_magnitudes = np.array([0.1, 0.2, 0.5, 1.0, 2.0, 5.0, 10.0])
>>> masses = mass_magnitudes * unit.kg
>>> print('{:~P}'.format(masses))
[ 0.1 0.2 0.5 1. 2. 5. 10. ] kg
>>> masses.magnitude
array([ 0.1, 0.2, 0.5, 1. , 2. , 5. , 10. ])
>>> print('{:~P}'.format(masses * g))
[ 0.98 1.96 4.9 9.8 19.6 49. 98. ] N
\end{verbatim}
You can view the complete notebook by clicking on the
\textless{}/\textgreater{} Code button to the left of
Fig.~{\ref{625931}} below, then clicking on the file
named~\href{http://data.authorea.com:33014/notebooks/PintTest.ipynb}{PintTest.ipynb}.
Doing so will open the notebook in a Jupyter webserver hosted by
Authorea.
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=1.00\columnwidth]{figures/Sample-Pint-Notebook/Sample-Pint-Notebook}
\caption{{An extract of the Jupyter notebook PintTest.ipynb demonstrating the use
of Pint to do calculations in Python using units.
{\label{625931}}%
}}
\end{center}
\end{figure}
\par\null
Note: in some cases, Python notebooks such as these can be run and
modified within the Authorea Jupyter notebook webserver. In this case,
however, you must first download the code to your own computer, as Pint
is not yet included in Authorea's Python installation. A note about the
print formatting commands: the \textasciitilde{} flag instructs Pint
to output units in abbreviated form, and the P flag adds subscript and
superscript formatting. See the
\href{https://pint.readthedocs.io/en/stable/tutorial.html}{Pint
tutorial}.
\par\null
\subsection{Calculations with
uncertainties}
{\label{393002}}
For simple first-order (`linear') error propagation involving
quantities with units, you can use Pint's
handy~\href{http://pint.readthedocs.io/en/latest/measurement.html}{plus\_minus()}
operator. It accepts absolute and relative uncertainties:
\begin{verbatim}
import numpy as np
from pint import UnitRegistry
unit = UnitRegistry()
width = (10. * unit.centimeter).plus_minus(.1, relative=True) # 10 percent uncertainty
length = (20. * unit.centimeter).plus_minus(2.) # 2 cm uncertainty
area = length * width
print(width)
print(length)
print('{:~P}'.format(area))
(10.0 +/- 1.0) centimeter
(20.0 +/- 2.0) centimeter
(200 ± 28) cm²
\end{verbatim}
For details, see
the~\href{http://pint.readthedocs.io/en/stable/measurement.html}{Pint
measurements tutorial}.~
\par\null
For general propagation of uncertainty tasks, we
use~\href{https://pythonhosted.org/uncertainties/}{Uncertainties}: a
Python package written by Eric O. Lebigot, the same package Pint uses
``under the hood'' for calculations invoking \texttt{.plus\_minus()}. The
uncertainties module returns its result with the uncertainty specified
by
linear~\href{http://en.wikipedia.org/wiki/Propagation_of_uncertainty}{error
propagation theory}, taking into account any direct correlations between
variables. Quoting from the uncertainties website,
\begin{quote}
\par\null
The standard deviations and nominal values calculated by this package
are thus meaningful approximations as long as \textbf{uncertainties are
``small''}. A more precise version of this constraint is that the final
calculated functions must have \textbf{precise linear expansions in the
region where the probability distribution of their variables is the
largest}. Mathematically, this means that the linear terms of the final
calculated functions around the nominal values of their variables should
be much larger than the remaining higher-order terms over the region of
significant probability (because such higher-order contributions are
neglected).
For example, calculating~\texttt{x*10} with~\texttt{x} = 5±3 gives
a~\emph{perfect result} since the calculated function is linear\ldots{}~
Another example is~\texttt{sin(0+/-0.01)}, for
which~\texttt{uncertainties} yields a meaningful standard deviation
since the sine is quite linear over 0±0.01.
However,~\texttt{cos(0+/-0.01)}, yields an approximate standard
deviation of 0 because it is parabolic around 0 instead of linear; this
might not be precise enough for all applications.
\end{quote}
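The linear-expansion rule described in this quote can be sketched by hand: for a function of a single variable, the propagated uncertainty is \(\left|f'\left(x_0\right)\right|\cdot\sigma\). This toy version (a pure-numpy illustration of the rule, not how the uncertainties package is implemented internally) reproduces the sin/cos behavior quoted above:

\begin{verbatim}
import numpy as np

def linear_propagation(f, x0, sigma, h=1e-6):
    """First-order propagated uncertainty |f'(x0)| * sigma,
    with f'(x0) estimated by a central difference."""
    dfdx = (f(x0 + h) - f(x0 - h)) / (2 * h)
    return abs(dfdx) * sigma

# sin is nearly linear around 0: propagated uncertainty is close to 0.01
sigma_sin = linear_propagation(np.sin, 0.0, 0.01)

# cos is parabolic (flat) around 0: linear propagation reports ~0
sigma_cos = linear_propagation(np.cos, 0.0, 0.01)
\end{verbatim}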
Here we provide a demonstration of how to
use~\href{https://pythonhosted.org/uncertainties/}{Uncertainties} to
calculate the uncertainty values (error bars) for the data shown in
Fig.~{\ref{525200}}.
\par\null
First, \emph{import the data and assign values, including specification
of uncertainties,} to the input
parameters~\(V_0\),~\(V_1\),~\(\theta_0\)
and the data array~\(\theta\).
\begin{verbatim}
%matplotlib inline
#import packages
from matplotlib import pyplot as plt
from numpy import *
from uncertainties import ufloat, unumpy # these are extensions of numpy floating point numbers and arrays
#import x, y data
filename = 'Calibration_650nm.csv'
angle_data, V_pd_data = loadtxt(filename, delimiter = ',', skiprows = 1, usecols = (1, 2), unpack = True)
#specify directly measured values, including uncertainties: V_0 ± delta_V_0, V_1 ± delta_V_1, etc
V_0 = ufloat(32.631, 0.024) # first element is the nominal value, the second is the standard dev, both in mV
V_1 = ufloat(0.023, 0.016) # first element is the nominal value, the second is the standard dev, both in mV
theta_0 = ufloat(-1.16, 0.11) * pi / 180 # convert from degrees to radians
#convert to radians for use in trigonometric functions
theta_data = angle_data * pi / 180
delta_theta = 0.5 * pi / 180
# create an array of angle values with uncertainty
theta_array = unumpy.uarray(theta_data, delta_theta)
\end{verbatim}
Next, use uncertainties to~\emph{automatically}~\emph{propagate
uncertainty.~}The uncertainties package
automatically~\href{https://pythonhosted.org/uncertainties/user_guide.html\#derivatives}{calculates
derivatives} as needed (see, for example,
Eq.~{\ref{eq:deltaV}}), following the standard rules
for~\href{https://pythonhosted.org/uncertainties/tech_guide.html\#linear-propagation-of-uncertainties}{propagation
of error}:
\begin{verbatim}
#to calculate values while also automatically taking into account uncertainties, use unumpy instead of numpy
V_pd_theory = (1/2)*V_0 * (1 + unumpy.cos(2*(theta_array - theta_0))) + V_1 #notice use of unumpy.cos
values = unumpy.nominal_values(V_pd_theory) #creates an array with best estimates of V_pd
uncertainties = unumpy.std_devs(V_pd_theory) #creates an array with uncertainties for V_pd
\end{verbatim}
Here are the first three calculated values of~\(V_{pd}\),
corresponding to~\(\theta=0,\ 15,\ 30\) degrees, that result (including
the calculated uncertainty):
\begin{verbatim}
print(angle_data[0:3])
[ 0. 15. 30.]
print(V_pd_theory[0:3])
[32.64055493822073+/-0.03115762246979197
30.125380043639208+/-0.1578587386098956
23.916021775219797+/-0.25858767540002414]
\end{verbatim}
and a corresponding graph comparing the data (in blue) with the
calculated values (in red), including calculated uncertainties (shown as
error bars):
\par\null\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/Uncertainties/Uncertainties}
\caption{{Comparison of data (in blue) with calculated values~
for~\(V_{pd}\left(\theta\right)\) from
Eq.~{\ref{eq:Photodiode_Voltage}}. Error bars for
calculated values were determined automatically from the uncertainties
in the input parameters~~(using
the~\href{https://pythonhosted.org/uncertainties/}{uncertainties}
package and Eq.~{\ref{eq:Photodiode_Voltage}} instead
of Eq.~{\ref{eq:delta_V_Malus}}).
{\label{318739}}%
}}
\end{center}
\end{figure}
Notice that our original approach in
section~{\ref{219511}} of explicitly calculating the
uncertainty in \(V_{pd}\) from
Eq.~{\ref{eq:delta_V_Malus}} and this new approach of
using the \texttt{uncertainties} package give equivalent results, but
that in both cases we needed a good estimate of~\(V_0\pm\delta V_0\) to
determine~\(V_{pd}\pm\delta V_{pd}\) at each value of~\(\theta\).
\par\null
In general, either of these two approaches will be sufficient for most
of our work with data, but notice that because both involve linear
expansions of functions about their nominal values, both yield an
unrealistically low uncertainty of zero for small variations
in~\(V_0\) and~\(\phi\)
(neglecting~\(\delta V_1\) for now) at~\(\phi=\theta-\theta_0=0\). This is
because a Taylor expansion of~\(\cos\left(\phi\right)\)
around~\(\phi=0\) yields~\(\cos\left(0\pm\delta\phi\right)=1+0\cdot\delta\phi-\frac{1}{2}\left(\delta\phi\right)^2+\ldots\),
so the linear term vanishes and the linear approximation (which treats
terms of order~\(\left(\delta\phi\right)^2\) and higher as being
negligibly small) predicts zero uncertainty.
\par\null
If you need still more advanced approaches
to~\href{https://en.wikipedia.org/wiki/Propagation_of_uncertainty}{propagation
of uncertainty}, the author
of~\href{https://pythonhosted.org/uncertainties/}{uncertainties}
recommends looking
at~\href{https://pypi.python.org/pypi/soerp}{soerp}
and~\href{https://pypi.python.org/pypi/mcerp}{mcerp}.
According to the uncertainties website,
``the~\href{https://pypi.python.org/pypi/soerp}{soerp} package
performs~\emph{second-order} error propagation: this is still quite
fast, but the standard deviation of higher-order functions like
f(x)~=~x\textsuperscript{3} for x~=~0±0.1 is calculated as being exactly
zero (as with~\texttt{uncertainties}).
The~\href{https://pypi.python.org/pypi/mcerp}{mcerp} package performs
Monte-Carlo calculations, and can in principle yield very precise
results, but calculations are much slower than with approximation
schemes.''
\par\null
\section{Installing Python}
{\label{662059}}\par\null
This section is only relevant if you are planning on running Python on
your own computer. If you are running Python within a Jupyter notebook
on a webserver or a computer account which has already been configured
for your use (such as~\url{https://jove.smith.edu} for Smith College
physics), this section can be skipped.
\par\null
\subsection{Python distributions}
{\label{734994}}
Our focus in this article is on the use of Python to expedite the
analysis of your experimental data and not, for example, on the
specifics of various Python distributions and their relative merits for
computational physics in terms of speed, accuracy, and memory
requirements.
\par\null
We therefore recommend that if you need to install a Python distribution
for scientific data analysis in physics on your own computer, you choose
an installer that will automatically install
\href{https://www.python.org/}{Python}, interactive Python~
(\href{https://ipython.org/}{iPython}) and
\href{https://jupyter.org/}{Jupyter} notebooks,~ scientific python
development environments (editing, testing, debugging)~ such
as~\href{https://pythonhosted.org/spyder/}{Spyder} or
\href{https://www.enthought.com/product/canopy/canopy-interactive-debugger}{Canopy},~
and~ essential~\href{http://www.scipy.org}{Scientific Python packages}
(such as \href{http://www.numpy.org/}{numpy},
\href{https://matplotlib.org/}{matplotlib} and
\href{https://www.scipy.org/scipylib/index.html}{scipy}) in a single
step, rather than building this from scratch. This provides ease of
installation, ease of use,~ and a comprehensive curated set of
preinstalled and easily added packages.~
Two of the most popular distributions and installers are
from~~\href{https://www.anaconda.com/download/}{Anaconda~}
and~\href{https://www.enthought.com/product/canopy/}{Enthought}. Both
are freely available for Mac OS, Linux, and Windows. Either should do
what you need. That said, the examples presented below for installation
of additional Python packages assume the use
of~\href{https://www.anaconda.com/download/}{Anaconda Navigator}.~
Also, if you are using this for a Smith Physics course, we ask you to
use Anaconda Navigator if you wish us to be able to provide support with
installation and programming (since that is what we are using).
\par\null
\subsubsection{Installing Anaconda
Navigator}
{\label{541387}}\par\null
Download the latest version
of~\href{https://www.anaconda.com/download/}{Anaconda Navigator}.
Be sure to choose the Python 3.x version of Python (and not the older
version 2.7), as all of the examples provided here assume the use of
Python 3. After installation, restart and launch the Anaconda Navigator
app. You should then be able to launch a Jupyter notebook from either
the graphical interface or the environments tab (see
Figure~{\ref{231311}}) and be officially off and running.
\par\null
OK, you also need to install some supplementary packages if you want to
be able to do calculations that include units and/or uncertainties, but
that isn't difficult to do within Anaconda. See
section~{\ref{273995}} below.
\par\null
\subsubsection{Building~ your own
distribution}
{\label{731142}}\par\null
Still prefer to build your own clean, lean, and mean Python
installation, and willing to blaze (and maintain) your own trail?
Here's~\href{http://www-personal.umich.edu/~mejn/cp/chapters/AppendixA.pdf}{a
guide to getting started} that results in an exceptionally lean
installation suitable for use in computational physics. If you want to
run the example code provided here on your own computer, however, you
will also want to install
the~\href{https://ipython.org/}{iPython},~\href{https://jupyter.org/}{Jupyter},
and~\href{https://www.scipy.org/scipylib/index.html}{SciPy} packages
(at a minimum). Other packages may be needed as well; one tried and
true but tedious way to find out is to attempt to run the code and then
let the computer tell you what you are missing!
\par\null
Finally, if you are an experienced and independent-minded Python
programmer (and would you have read the previous paragraph if you
weren't?), you may ask, ``Why Jupyter?'' Why not use the Python or
iPython command line directly, use a bare-bones editor such as IDLE, or
a more comprehensive MATLAB-like programming environment like Spyder? If
you are in one of our Smith Physics courses, the answer is that we
use Jupyter notebooks not only to run Python code but also to generate
a partially ``self-documenting'' electronic lab notebook as we do so.
If you are doing a lot of programming requiring extensive re-writing
and debugging, you may prefer to first work in a development environment
like Spyder, then run the finished product within Jupyter to generate a
notebook version of the data analysis; in experimental physics, you
still need to keep a comprehensive electronic notebook of your
instrument setup, measurement, and data analysis steps! Finally, note
that the command line is convenient for quick calculations and tests of
code but is inadequate for serious editing and debugging, and does not
provide a reproducible record of your results. As a calculator it is
great, but as a notebook it is not! There's really no good reason, as an
experimental physicist with a computer at your disposal, not to use an
electronic notebook instead of a calculator in the first place. Get with
the program!~:)
\par\null
\subsection{Using the Anaconda
installer}
{\label{140511}}
Packages included in the Anaconda Python distribution can be installed
and updated using the Anaconda Navigator installer, which has an
easy-to-use graphical interface. As a bonus, any needed auxiliary files
will be updated at the same time, and potential conflicts between
packages will be identified ahead of time. See the section of
the~\href{https://docs.anaconda.com/anaconda/navigator/getting-started}{Getting
Started guide} provided by Anaconda
titled~\href{https://docs.anaconda.com/anaconda/navigator/getting-started\#managing-packages}{Managing
Packages} for step-by-step instructions.
\par\null
\subsection{Using command line
installers}
{\label{273995}}
Packages not included in the standard Anaconda Navigator installer can
still be installed from within Anaconda Navigator. To do so,
\par\null
\begin{enumerate}
\tightlist
\item
\textbf{open a terminal window} from within Anaconda Navigator (see
  figure~{\ref{231311}} below)
\item
\textbf{issue the installation command}. See below for examples
  using~\textbf{conda} (section~{\ref{240339}})
  and~\textbf{pip} (section~{\ref{262745}})
\end{enumerate}
Be sure to closely follow the installation instructions provided with
the documentation for the package. In most cases installation is
straightforward, but sometimes extensions are also needed. If the
package can be installed using either~\textbf{conda} or~\textbf{pip},
choose~\textbf{conda} (as~\textbf{conda} will also install any needed
extensions).
\par\null\selectlanguage{english}
\begin{figure}[h!]
\begin{center}
\includegraphics[width=0.70\columnwidth]{figures/OpenTerminalWindowInAnacondaNavigator/OpenTerminalWindowInAnacondaNavigator}
\caption{{How to open a~terminal window within Anaconda Navigator.
{\label{231311}}%
}}
\end{center}
\end{figure}
\subsubsection{conda install}
{\label{240339}}
Here is an example of how to use the \textbf{conda} command while in
the terminal window to
install~\href{https://pint.readthedocs.io/en/stable/}{Pint}, a Python
package written by Hernan E. Grecco to define, operate on, and
manipulate physical quantities: the product of a numerical value and a
unit of measurement.
\begin{verbatim}
conda install -c conda-forge pint
\end{verbatim}
\subsubsection{pip install}
{\label{262745}}
Here is an example of how to use the~\textbf{pip} command while in the
terminal window to
install~\href{https://pythonhosted.org/uncertainties/}{uncertainties}, a
Python package written by Eric O. Lebigot that~\textbf{transparently}
handles calculations with~\textbf{numbers with uncertainties} (like
3.14±0.01).
\par\null
\begin{verbatim}
pip install --upgrade uncertainties
\end{verbatim}
Upgrades can be done in a similar way. To upgrade Pint, for example,
type the following while in a terminal window:
\par\null
\begin{verbatim}
pip install -U pint
\end{verbatim}
Note that you shouldn't be running any notebooks within Anaconda when
you do this! Best practice is to restart Anaconda Navigator first, then
install upgrades, then open a Jupyter notebook to launch Python.
\selectlanguage{english}
\FloatBarrier
\bibliographystyle{plainnat}
\bibliography{bibliography/converted_to_latex.bib%
}
\end{document}