Cato

Some notes outlining progress on a question concerning Gibbs’ Paradox. Not everything will be totally cogent / concise, as they are working notes.

Project supervised by A. Grosberg at NYU, Autumn 2013.

Statement

There are two paradoxes bearing Gibbs’ name, both arising in the context of entropy in statistical physics.

The first paradox concerns the (apparently spurious) entropy gain from processes which leave the thermodynamic entropy unchanged.

The second paradox is closely related, and concerns the mixing of particles. Since the distinction between identical and non-identical particles can be made arbitrarily small, it is mysterious that there exists a dichotomy between the two cases when dealing with entropy generation.

Aside: resolution according to Jaynes

In REF?, Jaynes says that entropy increase has to be treated more “subjectively”. Entropy production is not absolute: if we cannot distinguish the properties of two mixing gases, then there is no entropy increase and no work required to un-mix them. If we can distinguish the gases, then this is no longer true. To repeat, if the particles are experimentally indistinguishable for whatever reason, Gibbs’ paradox is resolved.1

At first this seemed absurd: a colour-blind person calculates zero entropy increase when a box of green balls mixes with a box of red balls, but clearly he is wrong. We could use the mixing to do work (see section \ref{sec:workout}): would the colour-blind person also be blind to winches lifting weights?

In Smith et al. (1992), Jaynes elaborates and his story (originally due to Gibbs) makes more sense. The problem is that when the gases go from being non-identical to identical, our definitions of “reversible” and “original state” change; i.e. we have double standards. In the non-identical case, we want to separate all molecules originally in $$V_1$$ and put them back into $$V_1$$, and the same for $$V_2$$; but in the same-gases case we are happy to just reinsert the diaphragm without reference to the particles’ origins. But this is beside the point: remember that “reversible” applies to thermodynamic variables that we can measure, not to microscopic states. The entropy increase associated with the mixing process corresponds to the work required to recover the same family of microstates: in this case, the original separation of molecules into $$V_1$$ and $$V_2$$. The multiplicity $$W$$ is the size of the aforementioned “family”; but this depends on what we’re actually measuring about the macrostate.
Example2: imagine there are two types of argon, but current technology cannot distinguish them, so mixing them gives $$\Delta S=0$$. Then a new solvent, “whifnium”, is synthesised, and it is discovered that one type of argon is soluble in it, but not the other type. By putting this knowledge to use and constructing a setup involving whifnium, we can extract work from the mixing – an observable consequence which produces entropy, i.e. $$\Delta S>0$$. So the work extractable depends on “human” information. Identical (even microscopically) physical processes can be assigned different entropy depending on what we’re interested in. But more knowledge lets us extract more work. Conclusion: entropy not a physical property of microstate (as energy is), but an anthropomorphic quantity.

To do:

1. In the quantum realm, this indistinguishability may be true as a matter of principle, rather than being due to an insufficiently refined experimental capability.

2. From Section 5 of Jaynes’ paper in Smith et al. (1992).

Goal

To develop an information-theoretic approach to Gibbs’ Paradox. Quantify “distinguishability” in terms of information, and understand the thermodynamic consequences.

How much “effort” do we have to put into measuring the properties of e.g. a particle, to distinguish it from a similar particle? How does this affect the net extractable work? Does considering this effort add to our understanding of the paradox?

Try to develop a model (à la Mandal et al. (2012)) where the information of a particle is explicitly manifest (e.g. a DNA molecule, or the particle is a binary string). Show how the information allows us to extract work (or not), and what energy cost is associated with using the information.

Work from mixing gases

\label{sec:workout}

What is the entropy increase associated with mixing dissimilar gases, and how do we extract usable work from this?

Once two distinct gases, $$A$$ and $$B$$, mix, there is an increase in entropy and it will take some work to separate them again. We can also extract work from their mixing (maximum if quasistatic).

Questions:

1. What is entropy change?

2. Hence what is the maximum work extractable from the mixing process?

3. How does it vary with dissimilarity of particles?

4. How could one extract this work?

1. For distinguishable particles, mixing two volume-$$\frac{V}{2}$$ boxes of $$\frac{N}{2}$$ ideal-gas particles each incurs an entropy increase of $$\Delta S = N\ln2$$.

2. The maximum work extractable from this process is $$W_{\rm max} = NT\ln2$$. (This is also the minimum work required to recover the original state.)

3. The reason I ask is this: perhaps it is the case that increasingly dissimilar particles have more different attributes which can be harnessed to extract more work. However, there is clearly a maximum amount of work that can be extracted from any mixing process – we cannot gain infinite energy! This leads me to conclude that $$NT\ln2$$ is the maximum work out (for two containers of equal volume, with no initial pressure or temperature difference).

4. \label{pnt:mixwork} Essentially, the way to extract work from mixing particles A and B is to arrange for the diaphragm to be permeable to particles A but not to B. Then the partial pressure of A will push it over to B’s side, reducing the total pressure in side A. Then we can do $$p\,dV$$ work.

• How much work? Say we have two containers, 1 and 2, with initial volumes $$V_1^{\rm i}$$ and $$V_2^{\rm i}$$ respectively. Box 1 contains $$N^{\rm A}$$ particles of type A and box 2 contains $$N^{\rm B}$$ particles of type B. Everything is in contact with a thermal reservoir at fixed temperature, and we ensure all processes happen sufficiently slowly that equilibrium is maintained at that temperature. For now, assume that the particles obey the ideal gas equation of state.
The protocol is as follows: the partition is permeable to particles of type A. They diffuse into box 2 until the partial pressures equalise. An impermeable but movable partition is then inserted, and the total pressures in the two boxes equalise by expanding the volume of box 2. This last stage is where work is done.
After some calculation, I find that $$W=-\frac{V_1^{\rm i}}{V}N^{\rm A}T\ln\frac{N^{\rm A}}{N}$$.

• For the simpler case where the boxes are initially the same size with equal particle numbers, $$W=NT\ln2$$ theoretically. But here, we get only $$W=\frac{1}{4}NT\ln2$$ for the same conditions. What’s going on?
Remember that at the end of this process, the gases are not totally mixed. Thus there is still some work left to extract. But does it add up when we take this into account? CHECK.
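The numbers above can be checked directly. A minimal numerical sketch, assuming $$k_{\rm B}=1$$ and the formula I derived above (`protocol_work` is a name introduced here, not from any reference):

```python
import math

def protocol_work(V1, V2, NA, NB, T=1.0):
    """Work extracted by the two-stage protocol (ideal gas, k_B = 1):
    A diffuses through a partition permeable only to A, then an
    impermeable movable partition equalises the total pressures
    isothermally.  Formula from the notes:
        W = -(V1/V) * NA * T * ln(NA/N).
    """
    V, N = V1 + V2, NA + NB
    return -(V1 / V) * NA * T * math.log(NA / N)

N = 1000
w = protocol_work(V1=0.5, V2=0.5, NA=N // 2, NB=N // 2, T=1.0)
```

For equal boxes with equal particle numbers this evaluates to $$\frac{1}{4}NT\ln2$$, reproducing the factor-of-four shortfall noted below.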

To do:

• Extension beyond ideal gas? – Hard, and perhaps not much to learn from the exercise.

Thoughts on work extraction

As noted in section \ref{sec:workout}, point number \ref{pnt:mixwork}, to extract energy from the mixing we have to allow particles to (isochorically) seep from one container to another without the converse process happening.

To achieve this, there needs to be some physical mechanism by which we can distinguish the two species and selectively allow ingress of one.

Some examples would be

• Diffusion through a medium in which one species is soluble and the other insoluble.

• Diffusion through a partially-permeable membrane, for which large or charged or ... particles cannot pass.

• Passive transport by membrane proteins, which operate on a lock-and-key type principle.

• Any lock-and-key process capable of distinguishing the properties of one species from the other.

Particle Sorting

Want to sort particles into two types.

Questions:

1. Does it require energy?

2. If not, can I violate the Second Law (T2)?

3. Is there a trade-off between energy used and time taken?

4. Does the processing speed of Nature come into play?

5. Can I construct a membrane that biases mass exchange like a creel?

6. For work extraction at a given rate, what is the dissipation, and how does the energy out depend on the difference between the molecules?

1. I maintain that sorting is a Maxwell’s Demon process, and as such must require energy.

To do:

• Integrate into rest of document.

Distinguishing bit-strings

\label{sec:distinguished}

Let us model a particle as a string of bits: these bits encode all the information about the particle. Alternatively, we could think of the bits as representing a length of DNA or something (in this case, there would be two bits per base, or one “quit” per base). This is a contrived setup, difficult to realise in practice; but such a minimal model is useful for clarity.

Questions:

1. How much does it cost to write this information?

2. What is the cost of reading it?

3. What is the cost of comparing two strings with each other and noting discrepancies?

4. What is the meaning of temperature in this context?

1. Note: from my (incomplete) derivation of Landauer’s Principle in section \ref{sec:LP}, I have decided that the following reasoning is wrong – but I don’t quite know why! I leave it here for now; my conclusion is that it may not cost as much to write definite information if the prior bit-distribution is not flat. Unfortunately, this means that everything that comes after might need adjusting. There must be some cost to writing information, because we are modifying blank degrees of freedom (bits) into new states, which necessarily deletes pre-existing information1. This point seems to go unaddressed in the literature: writing information to an initially blank register increases the “entropy” (in the narrow sense of $$S=-\sum_ip_i\log_2p_i$$ for a very long string – see section \ref{sec:stringS}) of the register, and so there is no obvious reason to require any dissipation. But here is why I think it must be there nonetheless.

• We don’t know a priori whether the writing process will result in a net increase or decrease of the string’s “entropy”. So, if we believe in cause-and-effect, and we accept that the end result is not known by the register until it has been realised, we should accept that heat dissipation takes place in the same way for each bit written. It is the result of physical processes, immutable by patterns in the outcome.

• Another way of thinking of the problem is in terms of logical reversibility. A “reset all bits to zero” operation is irreversible because we cannot recover the original data after the operation. The same applies here: we would want our writing protocol to work regardless of the initial state of the register, so we always end up with the data we want in the register.

• We are changing the physical state of a physical system by writing our data to the register.

• One might think that we are taking an initial random distribution of bits on the register and making it a determined distribution (reproducible if we re-run the measurements on the particle under scrutiny). But we really have no idea whether the pre-existing string was random or generated by some other deterministic process. Put another way, if all we have is the bit-string, we can only ever estimate the probabilities and correlations of the string’s generator; we can’t reconstruct the process by which that string was made (unless we know what the process was and whether it is logically reversible).

So even if the states are energy-degenerate, which can be arranged, the cost per bit written should be $$kT\ln2$$, regardless of the initial state of the register.

2. In principle, there is no cost to reading information unless you want to remember it2. If this is the case, see point 1.

3. Let us use an XOR gate; this takes two binary inputs, and outputs 0 if they are the same or 1 if they are different3.

• Let the output be written to a register (not necessarily blank). This will require the expenditure of at least $$NT\ln2$$ of work.

• When the comparison is complete, all the 1s in the register are summed (e.g. using a carry-lookahead adder). The total is a measure of the difference between the two particles (note this loses information about where the differences were, but I don’t think that’s so interesting). Traditionally, these require writing of one extra bit per addition column, but I think we can ignore this for now as it’ll probably be much smaller than writing to the register with the XOR.

Thus we find that more detailed comparison of the particles, where we inspect more bits, takes proportionally more energy. Specifically, to inspect $$N$$ bits requires at least $$NT\ln2$$ of energy.

4. Too much to tackle now – there is a vast literature on error correction. Presumably one just needs to work out how many extra erasures there are.
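The XOR-and-count procedure of point 3 can be sketched in a few lines. This is my own illustration (`compare_cost` is a hypothetical helper); the $$T\ln2$$-per-bit charge is the Landauer bound assumed in the text, with $$k_{\rm B}=1$$:

```python
import math

def compare_cost(a: str, b: str, T: float = 1.0):
    """XOR two equal-length bit-strings and count the mismatches.

    Writing each XOR output to the register is charged the Landauer
    minimum of T*ln2 per bit (k_B = 1), so inspecting N bits costs
    at least N*T*ln2 regardless of how many mismatches are found.
    """
    assert len(a) == len(b)
    diffs = sum(x != y for x, y in zip(a, b))  # XOR + population count
    cost = len(a) * T * math.log(2)            # one output bit written per input bit
    return diffs, cost

d, c = compare_cost("01101011", "11101011")
# one differing bit, but the cost is 8*T*ln2 for the full 8-bit comparison
```

Note the cost scales with the inspection depth, not with the number of differences found – which is the point made above.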

To do:

2. Derive energy to erase a bit as function of source entropy.

1. Landauer’s Principle (see Landauer (1961) for original work).

2. Or deal with quantum mechanics.

3. For our purposes, we would need the gate to also output 1 if only one input is given.

Information and Physical Processes

In the previous section, we thought about how to distinguish two strings of bits, and how much energy that would take a computer. In this section, we try to relate those considerations to physical processes: how does a physical process distinguish different objects?

More concretely, does a physical process need to perform work?

Simple model

A string of bits floats to a door and wants access. The door will open if the string of bits holds the “password”, i.e. some length of information which is compatible with the door’s mechanism (like a lock and key).1 How does the door check the password? Is work performed?

Real life

It is not difficult to extend this reasoning to real life – e.g. for a particle to diffuse (selectively) through a membrane, does the membrane have to do work? Does a cell get colder when molecules diffuse in and out?

To do:

1. Tidy and expand!

1. Think of facilitated diffusion through trans-membrane proteins.

Energy balance: two particles

\label{sec:net2}

Combine the results of sections \ref{sec:workout} and \ref{sec:distinguished} to find out how the extractable work compares to the energy needed to distinguish the particles.

Questions:

1. What level of “difference” between two particles allows us to extract work from them?

2. Derive an expression for the net energy balance for mixing two particles.

1. A difference between the particles of one bit would in theory be enough to extract the work, as this bit may correspond to any physical property.

2. Say that in order to establish a difference between two particles, I need to inspect them to a level of precision such that $$N$$ bits have been compared, requiring that I spend energy $$NT\ln2$$. The work which I can get out of the mixing is just $$2T\ln2$$ (assuming same $$T$$ for now), so the balance is $\Delta W = (2-N)T\ln2.$ Clearly this is not worth it.
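A trivial numerical statement of this balance (`net_work` is a name introduced here; $$k_{\rm B}=1$$, and $$n=2$$ particles is the case considered above):

```python
import math

def net_work(N_bits: int, T: float = 1.0, n: int = 2) -> float:
    """Net balance for mixing n particles after comparing N_bits:
    extractable work n*T*ln2 minus the N_bits*T*ln2 spent on the
    comparison (k_B = 1).  For n = 2 this is (2 - N_bits)*T*ln2."""
    return (n - N_bits) * T * math.log(2)

# break-even at N_bits = 2; any deeper inspection is a net loss
```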

Energy balance: $$n$$ particles

\label{sec:netn}

Extend the work of section \ref{sec:net2} to $$n$$ particles.

Let us assume there are only two types of particle (?).

Questions:

1. What is the energy balance for $$n$$ particles inspected to a level of $$N$$ bits?

2. Is there some optimum level of inspection (if we know pdf, say)?

1. For $$n$$ rather than two particles, this problem is more combinatorial. Assuming there are only two kinds of particle, and it requires inspection of $$N$$ bits to determine which category a given particle belongs to, at what point does the extractable work outweigh the inspection cost?

To do:

1. Work out what the context for the experiment should be.

2. Simple calculation of how many bits written to compare $$n$$ particles.

Example: a 24-bit particle

Here we put the conclusions of section \ref{sec:net2} into practice.

Take two “particles”, represented by bit-strings (say 24 digits each). A bit corresponds here to the information-content of the particle, OR to the value of some binary thermodynamic variable.

Coarsen the particles by various amounts to reflect depth of measurement, so that the same particles are represented by 12, 8, 6, 4, 3 and 2 bits each1. How does the “effort” of measurement compare with the entropy change / work accessible?

The particles

Here is a list of particle A and particle B seen at decreasing resolution.2

24-bit
0011 1010 1000 1110 0110 1011
0011 1010 1000 1110 1110 1011

12-bit
01 01 00 11 01 01
01 01 00 11 10 11

8-bit
01 00 10 11
01 00 11 11

6-bit
01 01 11
01 01 11

4-bit
00 11
00 11

3-bit
0 1 1
0 1 1

2-bit
01
01
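The coarsening rule of footnote 1 (majority vote per chunk, with 50/50 ties broken at random) can be sketched as follows; `coarsen` is a name introduced here:

```python
import random

def coarsen(bits: str, out_len: int, rng=random.Random(0)) -> str:
    """Coarse-grain a bit-string down to out_len bits: majority vote
    over equal-sized chunks, breaking exact ties at random
    (footnote 1).  out_len must divide len(bits)."""
    assert len(bits) % out_len == 0
    k = len(bits) // out_len  # chunk size
    out = []
    for i in range(0, len(bits), k):
        ones = bits[i:i + k].count("1")
        if 2 * ones > k:
            out.append("1")
        elif 2 * ones < k:
            out.append("0")
        else:
            out.append(rng.choice("01"))  # tie: random with equal probability
    return "".join(out)
```

Chunks without ties coarsen deterministically, e.g. `coarsen("000111", 2)` gives `"01"`; only tied chunks depend on the random choice.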

Inspection

We inspect the particles and note the difference between them. The energy necessary to do this is listed.

To do:

• Decide on an experimental context.

• Make a model which can extract work from the difference.

• Rather than increasing “resolution” could think about increasing compression – e.g. dynamic dictionary constructions.

1. When coarsening a section of the string with equal numbers of 0s and 1s, I should choose one of the two answers at random with equal probability.

2. There is some redundancy here, in that there are more different resolutions listed than are necessary to make the point.

Derivation of Landauer’s Principle

\label{sec:LP}

I calculate, using a Hamiltonian model of thermal reservoir and the bit itself, that the average work done to reset a bit with initial probability $$p_1=p_0=1/2$$ is $$\langle W\rangle \geq T\ln2$$ (Piechocinska 2000).

When the initial probabilities are arbitrary, with $$p_1=\gamma$$ and $$p_0=1-\gamma$$, I derived the following equation, which seems wrong: $\beta\langle W\rangle \geq \iint\alpha\ln\alpha\;dx\,dp + \ln2 - \frac{1}{2}\gamma\ln\gamma - \frac{1}{2}(1-\gamma)\ln(1-\gamma),$ where $$\alpha$$ is the initial probability distribution in $$(x,p)$$ for $$p_1=p_0=1/2$$. But the answer I want is $\beta\langle W\rangle \geq -\gamma\ln\gamma - (1-\gamma)\ln(1-\gamma),$ i.e. lose the first two terms and the factors of 1/2.
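As a consistency check (my own, not from the references), the desired bound reduces to the symmetric result at $$\gamma=1/2$$:

```latex
\beta\langle W\rangle
  \;\geq\; -\gamma\ln\gamma - (1-\gamma)\ln(1-\gamma)
  \;\xrightarrow{\;\gamma=1/2\;}\;
  \tfrac{1}{2}\ln 2 + \tfrac{1}{2}\ln 2 \;=\; \ln 2 ,
```

in agreement with the $$p_1=p_0=1/2$$ derivation above. The suspect expression, evaluated at $$\gamma=1/2$$, only matches if the leading integral happens to equal $$-\frac{1}{2}\ln 2$$ there.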

To do:

1. Correct derivation of LP for arbitrary initial probabilities.

Entropy of a bit-string

\label{sec:stringS}

Why is this interesting? Shannon entropy estimates the “surprise” value of some information; if we want to compare two information sets, we may want to build up an intuition of the properties of one set, and see how easy it is to predict the properties of the other.

Note that people are often a little loose with vocabulary – entropy is a property of distributions, not (technically) of individual sequences. We observe the sequence as a variate from some parent distribution, and estimate the entropy of the parent. For this approach to succeed in complicated cases, we will need a very long sequence (see e.g. Farach et al. (1994)).

When bits are written randomly and independently, the entropy of the source is given by $H = -\sum_i p_i \log_2 p_i. \label{eqn:indep_entropy}$

However, there may be larger-scale patterns in the string. Then we have to make more effort (WHY?). For example, 0101010101 has the same entropy as 010011000111, but clearly there’s something different going on! I see two cases, which have to be treated differently:

1. The bits come in chunks. Then we have to “renormalise” by using a different basis. E.g. DNA code should be expressed in base 4. For an alphabet of $$m$$ letters (binary: $$m=2$$; DNA: $$m=4$$) the corresponding informational entropy is $H = -\sum_i p_i \log_m p_i.$ We can be more sophisticated than this with a dictionary-construction algorithm, e.g. Lempel-Ziv comma-counting, which captures repetitive structures (but perhaps not the right ones!). Entropy can become very low as chunks get larger (do we talk about a book in the set of books, or the information content of a book?).

2. If the patterns that arise don’t correspond to larger units of information that can be re-labelled (as in point 1), perhaps we have to experiment with conditional probabilities, i.e. assess probability strings with memory of whole sections of the string.
The simplest way to do this seems to be the so-called “Markov model” (see section \ref{sec:markov} below). But what we really need is to construct the correlation function of the data.

Markov model

\label{sec:markov}

This is actually much less useful than I originally thought, but here it is anyway.

• A zeroth-order Markov model assumes each character is independent of the one before, so the entropy is given by equation \ref{eqn:indep_entropy}.

• In a first-order Markov model, the probability of each character ($$j$$) depends on the character immediately preceding it ($$i$$). Then, $H = -\sum_i p(i) \sum_j p(j|i) \log_2 p(j|i).$

• And so on for higher orders.
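Both estimators are easy to implement. A minimal sketch (function names `h0` and `h1` are introduced here), using frequency counts as plug-in probability estimates:

```python
import math
from collections import Counter

def h0(s: str) -> float:
    """Zeroth-order estimate: characters treated as independent
    (the independent-bits formula above), in bits per character."""
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

def h1(s: str) -> float:
    """First-order Markov estimate from bigram counts:
    H = -sum_i p(i) sum_j p(j|i) log2 p(j|i),
    with p(i,j) = count(i,j)/n and p(j|i) = count(i,j)/count(i)."""
    pairs = Counter(zip(s, s[1:]))   # bigram counts
    firsts = Counter(s[:-1])         # counts of the leading character
    n = len(s) - 1
    return -sum((c / n) * math.log2(c / firsts[i])
                for (i, _), c in pairs.items())

# "0101010101": h0 = 1 bit/char, but h1 = 0 – the string is perfectly
# predictable once the previous character is known.
```

This makes the point in the text concrete: the two example strings have the same zeroth-order entropy, but the first-order estimate separates them.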

Aside: counting microstates – thermodynamics versus statistical physics

This section considers the difference between thermodynamic and statistical entropy: they are not the same quantities, and perhaps researchers are too keen to force them to be so. Not sure yet whether this is relevant to my task.

If you are not careful, answers you get for statistical physics entropy disagree with the thermodynamic result (the classic example is removing the partition between two identical boxes of gas). The resolution is usually to claim that this is evidence for QM: in QM, identical particles are indistinguishable, and this changes the multiplicity by a factor $$1/N!$$. An unacceptable fudge?

Do we believe this invocation of QM into a classical theory? It seems very ad hoc. Where else does it appear? What about phase-space quantisation, where a factor of $$1/\hbar^3$$ accompanies the integrals over phase space? Note that, according to Jaynes (“The Gibbs Paradox”, 1992), Gibbs himself resolved the paradox without recourse to quantum mechanics. MORE.

What happens for smaller $$N$$ when Stirling’s approximation is not applicable?

Does the resolution even work for QM scenario? Are all identical particles in QM indistinguishable?

The macrostate of a system is “the same” after permutation of particles – does this mean physical permutation (where particles follow a trajectory) or non-physical (where particles are switched in your mind)? Does it matter in the derivations? I think so, because deciding whether particles’ history / capacity is important (i.e. do their trajectories matter, or just snapshots of the system?), tells us how to count physically different microstates. (Does physically different imply statistical-physically different?)

A microstate may not be physically accessible from current microstate (though it has the same state vectors) – should ergodicity go unquestioned? Multiplicity should be the number of ways a microstate can be reached via a physical pathway.

In classical mechanics, I can label the particles 1, 2, ... If I draw on them, is this different from labelling them in my mind and keeping track with a video camera? Where does the entropy change come from in the latter case, given that only I know about the labelling?

To do:

1. Tidy up this section.

2. Is there relevance to my question (rather than broader considerations)?

Misc

If $$S_{\rm stat} \neq S_{\rm therm}$$, can we dispense with the $$\ln$$?

$$S_{\rm stat}$$ can decrease if phase space changes through outside intervention. What does this say about Fluctuation Theorems – does their phase space ever change?

What is the effect of renormalising on the information content? Read Lesne (2011). Entropy decreases when probability distribution gets flatter, e.g. by averaging. Coarse-graining also reduces the entropy (section 2.5). Section 5.2 deals with compression.

Duhem’s Theorem: ‘whatever the number of phases, of components or of chemical reactions, the equilibrium state of a closed system, for which we know the initial masses m1, ... mc, is completely defined by two independent variables.’ The equilibrium state is affected by flows of work and heat, so we only need two state variables (?).

Some Relevant Papers

Entropy estimation

Information theory

• Ostrowski (2010) suggests that the minimum energy to (reversibly) copy one bit of information is $$\ln4/\beta$$. Uses a quantum system, and assumes low signal-to-noise.

• Bennett (2003) – reversible computation, Landauer

• Bennett (1982) – computation

• Landauer (1999) – information as a physical entity

• Plenio et al. (2001) – general discussion of information and physics. Uses LP to derive results of TD and QM.

• Landauer (1961)

• Piechocinska (2000) – Landauer’s Principle derived without reference to Second Law. First considers energy-degenerate bits, and demonstrates LP for continuous, discrete, and quantum regimes. Then breaks degeneracy. Finds maximum work expended when initial state has highest entropy, and work is zero for zero entropy.

• Shizume (1995) – Landauer microscopic derivation

Foundations

• Jaynes (1965) discusses the Boltzmann and Gibbs entropy functions (defined in terms of $$6N$$-dimensional distribution functions). $$H_{\rm G}$$ seems to be more general, though more difficult to manage, as it accounts for interactions between particles limiting the weight in some regions of phase space. $$H_{\rm G}$$ is shown to be correct after all. It will agree with experimental values, up to an additive constant. Section IV: the volume of “reasonably probable” phase space is independent of “reasonably” in the thermodynamic limit. Upshot: $$H_{\rm G}$$ corresponds to Boltzmann’s equation, at least in the thermodynamic limit. Entropy defined as $$S=k\ln W$$ is a generalised entropy, applicable to nonequilibrium states; it comes from Liouville’s theorem and doesn’t need any canonical distributions etc. Really interesting insights about anthropomorphic entropy – not a property of a physical system, but of the experiments we choose to perform. What is the specific question we are trying to answer?

• Jaynes (1980)

Maxwell’s Demon