A quick introduction to version control with Git and GitHub

and 2 collaborators

# Introduction to version control

Many scientists write code as part of their research. Just as experiments are logged in laboratory notebooks, it is important to document the code you use for analysis. However, a few key problems can arise when iteratively developing code that make it difficult to document and track which code version was used to create each result. First, you often need to experiment with new ideas, such as adding new features to a script or increasing the speed of a slow step, but you do not want to risk breaking the currently working code. One often utilized solution is to make a copy of the script before making new edits. However, this can quickly become a problem because it clutters your filesystem with uninformative filenames, e.g. `analysis.sh`

, `analysis_02.sh`

, `analysis_03.sh`

, etc. It is difficult to remember the differences between the versions of the files, and more importantly which version you used to produce specific results, especially if you return to the code months later. Second, you will likely share your code with multiple lab mates or collaborators and they may have suggestions on how to improve it. If you email the code to multiple people, you will have to manually incorporate all the changes each of them sends.

Fortunately, software engineers have already developed software to manage these issues: version control. A version control system (VCS) allows you to track the iterative changes you make to your code. Thus you can experiment with new ideas but always have the option to revert to a specific past version of the code you used to generate particular results. Furthermore, you can record messages as you save each successive version so that you (or anyone else) reviewing the development history of the code is able to understand the rationale for the given edits. Also, it facilitates collaboration. Using a VCS, your collaborators can make and save changes to the code, and you can automatically incorporate these changes to the main code base. The collaborative aspect is enhanced with the emergence of websites that host version controlled code.

In this quick guide, we introduce you to one VCS, Git (git-scm.com), and one online hosting site, GitHub (github.com), both of which are currently popular among scientists and programmers in general. More importantly, we hope to convince you that although mastering a given VCS takes time, you can already achieve great benefits by getting started using a few simple commands. Furthermore, not only does using a VCS solve many common problems when writing code, it can also improve the scientific process. By tracking your code development with a VCS and hosting it online, you are performing science that is more transparent, reproducible, and open to collaboration \cite{23448176, 24415924}. There is no reason this framework needs to be limited only to code; a VCS is well-suited for tracking any plain-text files: manuscripts, electronic lab notebooks, protocols, etc.

Charge Density Waves: Models, Current Characteristics and Applications

Municipalidad como centro de ciencia: Participación ciudadana, investigación local y amplificar la cultura científica.

and 2 collaborators

What's Open Access Good For? Absolutely everything!

and 5 collaborators

The effect of carbon subsidies on marine planktonic niche partitioning and recruitment during biofilm assembly

and 1 collaborator

Homework Portfolio

p/2 (b) As R is bounded, the closure of R is closed and bounded. So we can apply the extreme value theorem which means f is bounded on the closure of R. In particular, f is bounded on R. f is also integrable on R; in fact ∫RfdA = 0. Apply the lower bound property, ∫RfdA ≥ p₁(Area(R)) holds. (c) Suppose f is positive at (a, b). From (a), there is a disc Rof nonzero radius on which f(x, y)>f(a, b)/2 > 0. From (b), ∫RfdA ≥ (f(a, b)/2)·area(R)>0 But we assumed that ∫RfdA = 0 for all smoothly bounded sets R, it comes to a contradiction. Therefore f cannot be positive at any point. (d) As we know that −f is continuous, and that for all smoothly bounded regions R, by linearity, we have −fdA = −fdA = −0 = 0 . From(c),we know that −f cannot be positive at any point. Thus, we conclude that f cannot be negative at any point. (e) Therefore, for any (a, b), f(a, b) is defined and is neither positive nor negative, so it must be 0. 7. -6.44. Justify the following steps to prove that if f is integrable on R₂ and g is a continuous function with 0 ≤ g ≤ f then g is integrable on R₂. (a) ∫D(n)gdA exsits (b) 0 ≤ ∫D(n)gdA ≤ ∫D(n)fdA (c) The numbers ∫D(n)gdA are an increasing sequence bounded above. (d) limn → ∞∫D(n)gdA exsits Answer: Check D : D = R² unbounded, g 0, continuous, so we need to prove limn → ∞∫D(n)gdA exsits. (a) g ≥ 0 is continuous on R₂ and D(n) is bounded for each n so g is integrable over D(n) (b) By theorem 6.9 Larea(D)≤I(f, D) and the fact 0g, we know that 0area(D)≤∫D(n)gdA so if 0 ≤ f(x, y)−g(x, y) then $$ 0=0 area(D)\leq f(x,y)-g(x,y) dA \leq f dA - g dA $$ Therefore, 0 ≤ ∫D(n)gdA ≤ ∫D(n)fdA (c) Let Cn = ∫D(n)gdA. Because g ≥ 0, D(n)≤D(n + 1). Then C₁, C₂, C₃...Cn is an increasing sequence. Since 0 ≤ ∫D(n)gdA ≤ ∫D(n)fdA and $$ f dA = f dA$$ exists, We got $$ g dA \leq f dA$$ (e) By the Monotone Convergence Theorem for sequences, ∫D(n)gdA increasing and bounded above is convergent so limn → ∞∫D(n)gdA = limn → ∞Cn exists 8. 6.50. Justify steps (a)–(d) to prove that if a continuous function f is integrable on an unbounded set D then |∫DfdA| ≤ ∫D|f|dA (a)∫DfdA = ∫Df+dA − ∫Df−dA ≤ ∫Df+dA + ∫Df−dA = ∫D|f|dA (b)∫D(−f)dA ≤ ∫D|f|dA (c)−∫DfdA ≤ ∫D|f|dA (d)|∫DfdA| ≤ ∫D|f|dA (a) By Definition 6.9, if f is continuous and integrable on an unbounded set D, then |f| is integrable on D. Rewrite f(x, y)=f+(x, y)−f−(x, y) where f+(x, y)=f(x, y) if f(x, y)≥0 and 0 otherwise, and f−(x, y)= − f(x, y)if f(x, y)≤0 and 0 otherwise. So, by the definition of ∫DfdA, $$\int_ {D} f dA=\int_ {D} f_+ dA - \int_ {D} f_- dA $$ Since ∫Df−dA is nonnegtive $$\int_ {D} f_+ dA - \int_ {D} f_- dA \leq \int_ {D} f_+ dA + \int_ {D} f_- dA$$ Since f+ ≥ 0 and f− ≥ 0 are integrable over D $$\int_ {D(n)} f_+ dA + \int_ {D(n)} f_- dA =\int_ {D(n)} (f_+ + f_-) dA$$ By the properties of limits of increasing sequence D(n), we know ∫D(n)(f+ + f−)dA converges so $$\int_ {D} f_+ dA + \int_ {D} f_- dA =\int_ {D} (f_+ + f_-) dA$$ By the equation f(x, y)=f+(x, y)−f−(x, y), we got $$\int_ {D} f dA \leq \int_ {D} \left|f \right|dA$$ (b) In the same way, we apply (a) to the functions − f to get $$\int_ {D} -f dA \leq \int_ {D} \left|-f \right|dA= \int_ {D} \left|f \right|dA$$ (c)By the properties of limits and the equation ∫D(n) − fdA=_ D(n) f dA ,weget$$- \int_ {D} f dA \leq \int_ {D} \left|f \right|dA$$(d)Ifbaand −b ≤ a then|b|≤a. From (a), we got $$\int_ {D} f dA \leq \int_ {D} \left|f \right|dA$$, From (b) and (c), we got $$- \int_ {D} f dA \leq \int_ {D} \left|f \right|dA$$ Therefore, we can conclude that $$\left| \int_ {D} f dA\right| \leq \int_ {D} \left|f \right|dA$$ 9. -4.21. Find the point on the plane $$z = x − 2y + 3$$ that is closest to the origin, by finding where the square of the distance between (0, 0) and a point (x, y) of the plane is at a minimum. Use the matrix of second partial derivatives to show that the point is a local minimum. Let $$ D=d^2 = f(x,y)= x^2+y^2+(x-2y+3)^2 $$, to find the local extrema we let $$\triangledown f = (4x−4y+6,−4x+10y−12)=0$$ at ( − 0.5, 1). so $$ H(-0.5,1)= \left[ { c c } 4 & -4 \\ -4 & 10 \right] $$ Because 4 > 0 and (4)(10) − (−4)2 = 24 > 0. So by the Theorem 4.3, it is positive definite. By theorem 4.8, If ▿f(A)=0 and the Hessian matrix [fxixj(A]) is positive definite at A, then f(A) is a local minimum. Therefore, f has a local minimum at point ( − 0.5, 1) 10. -7.32. Let S be the unit sphere centered at the origin in R³. Evaluate the following items, using as little calculation as possible (a)∫S1dσ (b)∫S||X||²dσ (c) Verify that ∫Sx₁²dσ = ∫Sx₂²dσ = ∫Sx₃²dσ using either a symmetric argument or parametrizations. Can you do this without evaluating them? (d) Use the result of parts (b) and (c) to deduce the value of ∫Sx₁²dσ Answer: (a) In geometry, ∫S1dσ means the area of the unit sphere in R³ So ∫S1dσ = π · 1³ = 4π (b) For all X S we have ||X||² = 1, therefore ∫S||X||²dσ = ∫S1dσ = 4π (c) Rotation by /2 about the x₃-axis corresponds to some transformation on the domain of the parametrization of S. We know that x₁ comes to the same position as x₂, Therefore ∫Sx₁²dσ = ∫Sx₂²dσ In the same way, make a rotation by π/2 about the x₂ , we got ∫Sx₁²dσ = ∫Sx₃²dσ Therefore, ∫Sx₁²dσ = ∫Sx₂²dσ = ∫Sx₃²dσ (d) By the definition of norm ||X||, we know that ||X|| = x₁² + x₂² + x₃² So, $$ ||X||^2 d\sigma= x_1^2 +x_2^2 +x_3^2 d\sigma= 3 x_1^2 d\sigma = 4\pi$$ Therefore,$$ x_1^2 d\sigma ={3} ||X||^2 d\sigma = {3}$$

CudaHashedNet Midterm Report

and 1 collaborator

The Design of HyperFETs

# Model

## Transistor

The transistor is modeled generically by a heavily simplified virtual-source (short-channel) MOSFET model \cite{Khakifirooz_2009}. Although this model was first defined for Silicon transistors, it has been successfully adapted to numerous other contexts, including Graphene \cite{Han_Wang_2011} and Gallium Nitride devices, both HEMTs \cite{RadhakrishnaThesis} and MOSHEMT+VO_{2} HyperFETs \cite{Verma_2017}. Following Khakifirooz \cite{Khakifirooz_2009}, the drain current *I*_{D} is expressed \begin{equation}
\frac{I_D}{W}=Q_{ix_0}v_{x_0}F_s
\end{equation} where *Q*_{iz0} is the charge at the virtual source point, *v*_{x0} is the virtual source saturation velocity, and *F*_{s} is an empirically fitted “saturation function” which smoothly transitions between linear (*F*_{s} ∝ *V*_{DS}/*V*_{DSSAT}) and saturation (*F*_{s} ≈ 1) regimes. The charge in the channel is described via the following semi-empirical form first proposed for CMOS-VLSI modeling \cite{Wright_1985} and employed frequently since (often with modifications, eg \cite{Khakifirooz_2009, RadhakrishnaThesis}): \begin{equation}
Q_{ix_0}=C_\mathrm{inv}nV_\mathrm{th}\ln\left[1+\exp\left\{\frac{V_{GSi}-V_T}{nV_\mathrm{th}}\right\}\right]
\end{equation} where *C*_{inv} is an effective inversion capacitance for the gate, *n**V*_{th}ln10 is the subthreshold swing of the transistor, *V*_{GSi} is the transistor gate-to-source voltage, *V*_{T} is the threshold voltage, and *V*_{th} is the thermal voltage *k**T*/*q*.

For precise modeling, Khakifirooz includes further adjustments of *V*_{T} due to the drain voltage (DIBL parameter) and the gate voltage (strong vs weak inversion shift), as well as a functional form of *F*_{s}. For a first-pass, we will ignore these effects, employ a constant *V*_{T}, and assume the supply voltage is maintained above the gate overdrive such that *F*_{s} ≈ 1. However, we will add on a leakage floor with conductance *G*_{leak}. Altogether, the final current expression (for the analytical part of this analysis) is \begin{equation}
\frac{I_D}{W}=nv_{x_0}C_\mathrm{inv}V_{th}\ln\left[1+\exp\left\{\frac{V_\mathrm{GSi}-V_\mathrm{T}}{nV_{th}}\right\}\right]+\frac{G_\mathrm{leak}}{W}V_\mathrm{DSi}\label{eq:transistor_iv}
\end{equation}

AEP 4830 HW9 Monte Carlo Calculations

The purpose of this homework is to explore the Monte Carlo Algorithm and apply it to the simplified protein folding model in 2D.

# Monte Carlo Method

Monte Carlo Method uses the randomly generated possible solutions to a certain problem in a solution space and test its degree of goodness based on certain physical requirements\cite{NumRec}. The ways of generating the possible solutions are usually two. First, we can generate the possible solutions totally at random. For example, we use random number generator to do Monte Carlo Integration. Second, we can generate the possible solutions from the previous step by randomly changing some parameters of the previous one. We will use the later one to generate our 2D protein structures in this homework.

The general flow of Monte Carlo Method is shown as follows. Note that we use the term “conformation space” instead of “solution space” since we are talking about protein structures here.

Start from a initial state in the conformation space.

Randomly change the previous state, subjecting to requirement 1.

Determine the degree of goodness by criterion 2.

Accept/ reject this state by physical rule 3.

If it is accepted, pass this state and repeat 2 through 5 for certain number of steps.

If it is rejected, do not pass the state and repeat 2 through 4 until the new state is accepted.

The requirement 1, criterion 2 and rule 3 are problem-specific and we will mention these in our protein folding problem.

# 2D Protein Folding

Proteins are composed of 20 different amino acids (AAs) in a polypeptide chain and due to the mutual interactions between those AAs, proteins will favor some folded states to lower the Gibbs free energy. The interactions are mostly negative because of hydrophobic effects or ion-ion interactions. In order for proteins to perform certain biological functions, their unique structures are essential. We can use a simple bead-and-chain model for a 2D protein chain\cite{S_ali_1994}, assuming that all the AAs are of the same size and the peptide bond between two AAs is rigid, being only one unit and unstretchable. Each AA occupies one grid point of the 2D space and cannot be in the same point of any other AAs. When protein folds, the non-covalent interactions apply to the two non-bonding AAs separate by one unit. An we can calculate the relative Gibbs free energy *Δ**G* by summing all the interactions of non-bonding neighbors.

\begin{equation}
\Delta G = E_0 = \sum_{(i,j)} E_{t(i)t(j)}
\end{equation} where (*i*, *j*) are the indices of two neighboring AAs of types (*t*(*i*),*t*(*j*)) and *E* is an 20 × 20 interaction matrix.

With this model in mind, we can determine the requirements mentioned in the previous section.

Requirement 1:

The modified AA cannot occupy other’s positions.

The modified AA must be one unit away from its neighbor(s).

The best way to modify an AA’s position is to move (1, 0), (1, 1), (0, 1), ( − 1, 1), ( − 1, 0), ( − 1, −1), (0, −1) and (1, −1), eight possible changes.

Criterion 2: Evaluate the interaction energy,

*E*_{0}and use this number to determine the goodness of the state. The lower, the better.

Rule 3:

If the new state has lower

*E*_{0}, the protein will adopt this state in order to reach the minimum of the folding landscape.

If the new state has higher

*E*_{0}, the protein does not favor such state. However, there is still some probability to jump from lower energy state to higher energy ones,*P*=*e*^{−(Enew − E0)/kT}.

Once the model is set and the steps are clear, we can start to do the simulation.

# Program Codes

First, we need a general random number generator, *myrand(seed)*. We will test its validity and then apply it to alter the position of a randomly selected AA. Given different *seed*, the function will give different random number sequences. Our seeds for generating interaction matrix *E* and the AA sequence in the protein are two distinct yet fixed value. So we will guarantee that we use exactly the same protein and interactions throughout the calculation. Other than those, the seed will be set by *time(NULL)* and independent of our bias.

Second, there are several subroutines to do the Monte Carlo calculations and to make sure the protein is subject to some requirements. Note that the information of proteins is stored in a 45 × 3 matrix with the first column being AA types, second the x positions and third the y positions.

*neighbor()*: inputs a Protein Vector and outputs the pairs of indices of two non-bonding AAs.

*Energy()*: input pairs of neighbor indices, Protein Vector, interaction matrix*E*and outputs the energy*E*_{0}.

*n2ndistance()*: inputs a Protein Vector and outputs the end-to-end distance of the protein.

*pcheck()*: inputs Protein Vector and the index of certain AA and check if that AA occupies others’ positions. The function outputs*true*if the protein is not allowed,*false*otherwise.

*conformationchange()*: input Protein Vector and make a position change to one of its AA and outputs a modified new Protein Vector.

# Results

First we tested the random number generator *myrand()*. The random number generator gives a uniform distribution of numbers between 0 and 1. And the points (*x*_{n + 1}, *x*_{n}) cover the 1 × 1 square without noticeable patterns as shown in Fig. 1. We further test it by estimating *π*.

\begin{equation}
\frac{\pi}{4} = \frac{N_{in}}{N}
\end{equation} where *N*_{in} is the number of points in the quarter circle and *N* is the total number of points.As the total number of points increases, the RHS will reach $\frac{\pi}{4}$ asympotically, as shown in Fig. 2.