<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Deep time</title>
	<atom:link href="http://cosmicvariance.com/2005/08/22/deep-time/feed/" rel="self" type="application/rss+xml" />
	<link>http://cosmicvariance.com/2005/08/22/deep-time/</link>
	<description>Random samplings from a universe of ideas</description>
	<pubDate>Tue, 07 Oct 2008 08:25:13 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Slacker</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-2051</link>
		<dc:creator>Slacker</dc:creator>
		<pubDate>Thu, 25 Aug 2005 16:09:40 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-2051</guid>
		<description>Cox has a stupid face.</description>
		<content:encoded><![CDATA[<p>Cox has a stupid face.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steinn Sigurdsson</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1770</link>
		<dc:creator>Steinn Sigurdsson</dc:creator>
		<pubDate>Tue, 23 Aug 2005 02:55:14 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1770</guid>
		<description>Oh my, now we're all, like, serious and stuff.

First, let's remember that there is a qualitative difference between infinite time and merely a very long time.

Now, let me present a toy model of the universe. Fans of some recent speculation may feel free to treat it as a real model of the universe.


Consider a 2-D Euclidean sheet.
Divide it into unit pixels.
Without loss of generality, let each pixel have two states, 1 and 0.

Let there be some initial time, T0, and let time advance in discrete steps, dT.

Each pixel may change state according to some rules, based only on those nearby pixels in "causal contact" (ie after N steps, only pixels whose distance, s, is less than NdT away affect the pixel), for some metric on the space.

Now, assume that "a priori" if you sample a patch of this space of k pixels, the probability of any microstate is just 1/2^k

Now, there are logically 4 possibilities

a) finite space and finite time

b) finite space and infinite time

c) infinite space and finite time

d) infinite space and infinite time

So, in any of the 4 possibilities above, is it logically possible for all finite substates to be generated?

For a) and b) there are only a finite number of allowed states; in either case it is possible, but not necessary, that all states are accessed (given a long enough finite time)

For c) clearly you can NOT access all states;

so the only remaining option is d).  For which we can ask, whether all possible finite states are reached somewhere on the sheet at some time.

The answer to that is "depends" - it depends on the sheet "initial condition" and on the rules.

Further, I would confidently claim that for such a system, in fact for any infinite system, the answer to whether all states are reached or whether some finite, or infinite, subset is never reached is formally indeterminable for most system rules for changing states (because for a lot of rules this reduces to the Turing halting problem).

So there.  We may have infinite time and either finite or infinite space, but not only is it logically possible that some states are never reached, the actual answer may be unobtainable.

Any resemblance to holography or Wolfram's speculations is a pure coincidence.

The possibility of continuous states vs discrete states is interesting; in QFT there is an assumption of an asymptotically static and flat background space. This does not hold in reality, and given our actual cosmology combined with the finite speed of light, the question of true continuous quantum states is somewhat ill determined.</description>
		<content:encoded><![CDATA[<p>Oh my, now we&#8217;re all, like, serious and stuff.</p>
<p>First, let&#8217;s remember that there is a qualitative difference between infinite time and merely a very long time.</p>
<p>Now, let me present a toy model of the universe. Fans of some recent speculation may feel free to treat it as a real model of the universe.</p>
<p>Consider a 2-D Euclidean sheet.<br />
Divide it into unit pixels.<br />
Without loss of generality, let each pixel have two states, 1 and 0.</p>
<p>Let there be some initial time, T0, and let time advance in discrete steps, dT.</p>
<p>Each pixel may change state according to some rules, based only on those nearby pixels in &#8220;causal contact&#8221; (ie after N steps, only pixels whose distance, s, is less than NdT away affect the pixel), for some metric on the space.</p>
<p>Now, assume that &#8220;a priori&#8221; if you sample a patch of this space of k pixels, the probability of any microstate is just 1/2^k</p>
<p>Now, there are logically 4 possibilities</p>
<p>a) finite space and finite time</p>
<p>b) finite space and infinite time</p>
<p>c) infinite space and finite time</p>
<p>d) infinite space and infinite time</p>
<p>So, in any of the 4 possibilities above, is it logically possible for all finite substates to be generated?</p>
<p>For a) and b) there are only a finite number of allowed states; in either case it is possible, but not necessary, that all states are accessed (given a long enough finite time)</p>
<p>For c) clearly you can NOT access all states;</p>
<p>so the only remaining option is d).  For which we can ask, whether all possible finite states are reached somewhere on the sheet at some time.</p>
<p>The answer to that is &#8220;depends&#8221; - it depends on the sheet &#8220;initial condition&#8221; and on the rules.</p>
<p>Further, I would confidently claim that for such a system, in fact for any infinite system, the answer to whether all states are reached or whether some finite, or infinite, subset is never reached is formally indeterminable for most system rules for changing states (because for a lot of rules this reduces to the Turing halting problem).</p>
<p>So there.  We may have infinite time and either finite or infinite space, but not only is it logically possible that some states are never reached, the actual answer may be unobtainable.</p>
<p>Any resemblance to holography or Wolfram&#8217;s speculations is a pure coincidence.</p>
<p>The possibility of continuous states vs discrete states is interesting; in QFT there is an assumption of an asymptotically static and flat background space. This does not hold in reality, and given our actual cosmology combined with the finite speed of light, the question of true continuous quantum states is somewhat ill determined.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Saucy Wench</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1768</link>
		<dc:creator>Saucy Wench</dc:creator>
		<pubDate>Tue, 23 Aug 2005 01:46:33 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1768</guid>
		<description>Yeah, I read all that.  *rolls eyes*</description>
		<content:encoded><![CDATA[<p>Yeah, I read all that.  *rolls eyes*</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Maynard Handley</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1751</link>
		<dc:creator>Maynard Handley</dc:creator>
		<pubDate>Mon, 22 Aug 2005 20:21:06 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1751</guid>
		<description>I'm so damn sick of people not understanding this. 
Here is an essay from my wiki (not yet on-line) that explains EXACTLY what is going on here. 
(It's in mediawiki syntax, but even so should be pretty easy to read. 
The only thing that's probably not clear is {{sc&#124;pdf}} means PDF in small caps.)


=The Second Law of Thermodynamics and Boltzmann's H-Theorem=

==The issue==
When one reads about statistical mechanics, both in textbooks and in popular
works, some misunderstandings on the subject that date from the late 1800s 
and early 1900s still remain common. These tend to cluster around three 
issues:

* The second law of thermodynamics is absolute but a particular system can evolve towards a "less likely" state.
* The reversibility and recurrence "paradoxes".
* Are the systems of classical mechanics ergodic?

Physicists of the modern world have basically made their peace with the first 
point. Although it was a big deal in the late 1800s, and considered a blow 
against kinetic/atomic theory, no-one nowadays has a problem with the reality 
that ice-cubes in the sun melt, alongside the theoretical possibility that 
just once, this time, liquid water placed under a warm sun might freeze.
Even so, the understanding of what this actually means mathematically
is pretty limited and, if pressed, the details are usually wrong.

The second point, even more so, is usually completely botched, and the 
horrible explanations given for it usually poison the understanding of the 
other two points. So with this in mind, let's examine the issue properly.

==A little history of the H theorem==

In the mid-1800s Clausius came up with the idea of entropy and the second 
law of thermodynamics. At this time, recall, the very idea of atoms was 
controversial; some felt that the concepts of thermodynamics were primal and 
did not need to be justified or derived from models of how the world was 
constructed, while at the same time others were pursuing the kinetic theory 
of gases and trying to use its successes to prove the existence of atoms. 

Against this background, the most significant thing yet proved about the 
kinetic theory of gases was Maxwell's velocity distribution. But some people 
were unhappy with various aspects of the proof. The proof then, just like the
proof one usually sees today, assumed that the velocities of interacting 
molecules were uncorrelated, something some felt was not justifiable. 
(On the other hand, by making this assumption, the proof showed the 
generality of the resultant distribution regardless of whatever details 
one might assume of the interaction of the molecules.)
Boltzmann, to deal with this, came up with the Boltzmann Transport Equation,
which more explicitly dealt with the interactions. It was fairly easy to show 
that a maxwellian distribution was static under this equation, in other words 
would not change with time. But Boltzmann wanted to show something more; that 
any other distribution would monotonically evolve towards the maxwellian 
distribution. 

To this end he defined a quantity (which we now call H), a property of a 
particular distribution, and proved two things

* dH/dt = 0 for a maxwellian distribution and
* dH/dt &#60; 0 for any other distribution.

The sad thing is that this same 
mishmash of poorly thought-out arguments and counter-arguments still appears
in today's textbooks. I remember being bugged by the sloppiness of these 
arguments back almost twenty years ago when I was an undergraduate.

None of this is necessary --- there's a perfectly good, perfectly simple 
explanation for what's going on that doesn't require this handwaving.
However to get to that point, we need a slight detour. I'm going to give the detour in 
more detail than is needed just to deal with this problem because the ideas are 
interesting, worth remembering, and best understood in a non-thermodynamics
context that hasn't been poisoned with invalid arguments.

==Data compression==

Let's switch to an apparently very different problem, the problem of data 
compression as performed by computers. Data compression consists of two parts.

===modelling===
The first stage, called modeling, transforms fragments of the data in some 
way so as to generate a stream of so-called symbols. Modeling varies from 
compression scheme to compression scheme --- in JPEG, for example, it 
involves, among other things, splitting the image into 8x8 blocks and 
performing a 2D DCT (something like a Fourier transform) on the data in each
8x8 block. 

Modeling is specific to each compression scheme and the details do not 
matter to us. What matters is that after modeling the result is a stream 
of what we might abstractly call symbols. Suppose that our modeling results 
in symbols that can have values 0..255. The simplest way to store these values
would simply be to use 8 bits for each symbol. This, however, would be far 
from optimal if some symbols are very much more common than other symbols. 

===entropy coding===
What is done in data compression is to encode the symbols using what is called 
entropy coding. Entropy coding comes in two forms, Huffman coding and 
Arithmetic coding. 

{{infobox1&#124;A second part of the theory 
is that the bit stream you construct has to be readable, even though there 
are no markers between the (variable length) bit strings indicating where
one stops and the next starts. This implies that the collection of bit strings 
you use has to possess what is called the prefix property.}}
Huffman coding uses shorter strings of bits for symbols 
that are more common, and longer strings of bits for symbols that are less 
common. The theory tells you (given the probabilities of different symbols) 
the optimal way to map symbols onto bit strings. 
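
A minimal sketch of how such a mapping can be built, assuming a made-up four-symbol alphabet with illustrative probabilities. The function name huffman_code and the symbols are hypothetical, chosen only for this example; this is the textbook greedy construction, not the code of any particular compressor.

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Return a dict mapping each symbol to its Huffman bit string."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    # Each heap entry is (probability, tiebreak, {symbol: bits assigned so far}).
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) != 1:
        # Merge the two least likely subtrees, prefixing their codes with 0 and 1.
        p0, _, low = heapq.heappop(heap)
        p1, _, high = heapq.heappop(heap)
        merged = {sym: "0" + bits for sym, bits in low.items()}
        merged.update({sym: "1" + bits for sym, bits in high.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

# Illustrative probabilities only (an assumption for this sketch).
example_pdf = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = huffman_code(example_pdf)
for symbol in sorted(code):
    print(symbol, code[symbol])
```

The more common a symbol, the shorter its bit string, and no bit string is the beginning of another, which is the prefix property mentioned above.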

Arithmetic coding achieves the same goal as Huffman coding, namely using fewer
bits to encode the more common symbols, in a way that is somewhat more 
efficient than Huffman coding, but quite a bit more difficult to understand. 
However it's not relevant to our discussion.

===an example: compressing english text===
So, given what we have said above, suppose we want to compress some data. 
To avoid getting bogged down in irrelevant details, let us assume that the 
data we want to compress is English language text encoded using 8-bit ASCII
using LATIN-1 high-bit encoding,
and that we are going to ignore the modelling stage of compression. 

So the problem we have given ourselves is that we have symbols which are 8-bit 
ASCII characters, 0..255. Right away we know that some characters are going 
to be far more common than others. The characters with the high bit set (ie 
with a value &#62;127) are highly unlikely. These refer to diphthongs, accented 
characters, punctuation symbols rarely used in English and so on.
Punctuation characters are less likely than many letters, and capital letters
are less frequent than lower case letters. 
Certain letters are much more likely than other letters.

====the probability distribution function for english text====
Compression is all about having an accurate mathematical model of the 
probability structure of the data.
As a first approximation, we can consider the probability of each individual
ASCII character. This gives us an array of 256 probabilities. In some vague 
sense that philosophers can argue over, there is presumably some sort of 
"ideal" probability distribution function ({{sc&#124;pdf}}) for English language text 
that incorporates all text that has been and can be written, and that's what 
our compression program is targeting. But, of course, we can't just conjure
up that ideal, so what we do is gather a large body of what we hope is 
representative English text, calculate the empirical (as opposed to ideal) 
statistics for that text, and treat those (sample) statistics as representative
of all English text and thus equal to our philosophical ideal. 
We can then use these empirical probabilities to 
construct a Huffman code (or to drive an arithmetic coder), and we have a 
way to compress English ASCII text.
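
As a rough illustration of the "gather a sample and count" step, here is a minimal sketch. The sample string is a stand-in invented for this example (a real estimate would use a large body of text), and the function name empirical_pdf is hypothetical.

```python
from collections import Counter

def empirical_pdf(text):
    """Estimate per-character probabilities from a sample of text."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}

# A tiny stand-in for a representative corpus (an assumption for illustration).
sample = "the quick brown fox jumps over the lazy dog and then the dog sleeps"
pdf = empirical_pdf(sample)

# Show the five most common characters and their estimated probabilities.
for ch, p in sorted(pdf.items(), key=lambda item: -item[1])[:5]:
    print(repr(ch), round(p, 3))
```

The resulting table of probabilities is exactly the kind of input a Huffman or arithmetic coder needs.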

===the mathematical entropy associated with any discrete {{sc&#124;pdf}}===
Now let's step back a little from this example and consider the general 
issue. As '''mathematicians''', we can define a quantity, named
the '''mathematical entropy''', for any {{sc&#124;pdf}}. The entropy is defined as 

	S=-Sum[ probability(symbol)*lb( probability(symbol) ), 
	  summed over all symbols ]

where lb() is the binary log (ie log to base 2) of a number. 

This may seem a bit much to take in, but really it's not hard. 
Let's assume we have four symbols, A, B, C, D, and that the probabilities are
	(A, 1/2) (B, 1/4), (C, 1/8), (D, 1/8) 
The entropy associated with this {{sc&#124;pdf}} is 1*.5 + 2*.25 + 3*.125 + 3*.125 =1.75. 

Note that perfect entropy 
coding of a collection of symbols with some given {{sc&#124;pdf}} means that each symbol 
will take, on average, -lb( probability(symbol) ) bits to encode. 
(Probabilities are less than one, the log is negative, so we add a minus 
sign to make the result positive.)
So perfect entropy coding of our example would utilize 
1 bit to encode an A, 2 bits to encode a B, and 3 bits to encode a C or a D. 
It should be obvious from the above calculation that the entropy of the {{sc&#124;pdf}} 
is nothing more than the average number of bits required per symbol to perfectly 
entropy encode a stream of data conforming to this {{sc&#124;pdf}}.
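
The four-symbol calculation above can be checked with a few lines of code. This is only a sketch; the function name entropy_bits is an assumption made for the example.

```python
from math import log2

def entropy_bits(pdf):
    """Mathematical entropy, in bits per symbol, of a discrete pdf."""
    # Symbols with zero probability contribute nothing to the sum.
    return -sum(p * log2(p) for p in pdf.values() if p != 0)

example_pdf = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
print(entropy_bits(example_pdf))  # prints 1.75
```

Each term is the ideal code length of a symbol, -lb(p), weighted by how often that symbol occurs.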

{{infobox&#124; arithmetic coding &#124;
In fact arithmetic coding entropy encodes data using a non-integral number 
of bits per symbol, so we can actually approach perfect entropy coding in real 
computer programs. This is a pretty neat trick, and I'd recommend you read 
up on how it is done if you have time.
}}
You may wonder what happens when the probabilities of a symbol are not nice 
power-of-two probabilities as in the example. In that case, Huffman encoding 
cannot generate perfect entropy coding results, because the length of a 
Huffman code is obviously some integer number of bits, while the perfect 
entropy code might be some irrational number of bits, say 3.7569... 
In this case the average number of bits required to Huffman encode the symbol
stream will be larger than the entropy; the entropy is a lower bound, the 
absolute best we can do.
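
A small worked case, assuming three equally likely symbols (an example chosen for this sketch, not taken from real data). The best a prefix code can do here is code lengths of 1, 2 and 2 bits, for instance the bit strings 0, 10 and 11.

```python
from math import log2

pdf = {"A": 1/3, "B": 1/3, "C": 1/3}
lengths = {"A": 1, "B": 2, "C": 2}  # lengths of the prefix code 0, 10, 11

entropy = -sum(p * log2(p) for p in pdf.values())
average_length = sum(pdf[s] * lengths[s] for s in pdf)
# Kraft sum: a set of lengths is realizable as a prefix code only if
# this sum does not exceed 1.
kraft_sum = sum(2 ** -n for n in lengths.values())

print("entropy        ", round(entropy, 4))
print("average length ", round(average_length, 4))
print("Kraft sum      ", kraft_sum)
```

The entropy is lb(3), about 1.585 bits per symbol, while the integer-length code averages 5/3, about 1.667 bits per symbol: close to the bound, but above it.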

There are, of course, different {{sc&#124;pdf}}s for different
sets of material we may consider compressing; for example, the statistics, 
and thus the {{sc&#124;pdf}}, associated with the set of all photos (information highly 
relevant to the design of a compression scheme like JPEG) are very different 
from the statistics for English language text.

===entropy is a property of a {{sc&#124;pdf}}, not a finite sample from that {{sc&#124;pdf}}===
At this stage we now need to point out an essential point,
'''the''' essential point to understanding this stuff, both in the context
of data compression and later in the physics context:
'''The {{sc&#124;pdf}} describing the distribution of symbols is a property of some abstract infinite stream of symbols, for example some vague idea of the set of all English text.'''
Now the properties of a {{sc&#124;pdf}} will almost certainly be measured empirically, 
using as large a collection as is feasible of the type of material we want 
to compress, for example a large collection of English documents. 
From the statistics of this sample stream, an estimate of the entropy of 
the {{sc&#124;pdf}} governing these symbols is then a simple calculation.
The {{sc&#124;pdf}} is, however, some sort of ideal entity not linked to
the particular sample material we used; the particular symbol stream
used to design a compression algorithm is simply regarded as a 
representative sample from an infinite stream of symbols.

===a misleading concept. the "entropy" of a finite sample===
Switch now from the idea of all English text to focus on a particular 
piece of English text, a particular file we wish to compress.
For any '''specific''' piece of English text, we can compress the stream of 
symbols using an entropy coder and the {{sc&#124;pdf}} for English text, and the 
compressed data will have some size, meaning some average number of bits 
per symbol. 
We can call this, if we want, the entropy of this '''specific''' piece of English 
text, but it is conceptually a very different thing from the mathematical 
entropy we defined for the English language {{sc&#124;pdf}}. This specific entropy (ie the 
average number of bits required per symbol to represent the text) may be 
rather larger than the entropy of the English language {{sc&#124;pdf}} (for example the 
text may be something written by James Joyce, or an article about words to 
use in scrabble), or this specific entropy may be less than that 
of the English language {{sc&#124;pdf}} (for example the text may
be written for children, and may utilize only short simple words with 
very little punctuation).
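
To make the distinction concrete, here is a minimal sketch using a toy four-symbol {{sc&#124;pdf}} in place of English. The two sample strings and the function name bits_per_symbol are assumptions made up for this illustration.

```python
from math import log2

def bits_per_symbol(text, pdf):
    """Average ideal code length, in bits, of this particular text under pdf."""
    return sum(-log2(pdf[ch]) for ch in text) / len(text)

pdf = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
pdf_entropy = -sum(p * log2(p) for p in pdf.values())

simple_text = "AAAAAAAB"  # leans on the common symbols
dense_text = "CDCDCDAB"   # leans on the rare symbols

print("entropy of the pdf:", pdf_entropy)
print("simple text       :", bits_per_symbol(simple_text, pdf))
print("dense text        :", bits_per_symbol(dense_text, pdf))
```

The entropy of the {{sc&#124;pdf}} is 1.75 bits per symbol, yet one specific text costs 1.125 bits per symbol and the other 2.625. The per-text number depends on the sample; the entropy of the {{sc&#124;pdf}} does not.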

===if you want to learn more about data compression and entropy coding===
{{infobox&#124; correlation between symbols &#124;
The most important subject we have omitted from the discussion above, 
interesting but not relevant to where we are going with this, is 
exploitation of the correlation between 
successive symbols to reduce the number of bits required for compression, 
something that gets us into Markov models. (An obvious example is that the
letter q is almost always followed by the letter u, and surely a compression
scheme should be able to exploit that somehow.) 
While Markov models are a 
theoretically powerful method of doing so, there are severe practical 
problems with using them because of a combinatorial explosion in the number 
of probabilities one has to keep track of. The major goal of modeling 
is to attempt to restructure the data stream from its initial form, where 
there are obvious correlations between various pieces of data, to some 
intermediate form whose symbols are, as far as is practical, independent of 
each other. How best to do this clearly depends on the type of data; the 
techniques used for text, still images, video, general audio or speech are 
all very different.
The rest of the book is concerned with the details of the modeling used by 
JPEG2000 --- fascinating but very dense.
}}
If you are interested in the details of entropy coding beyond what I've 
discussed, IMHO by far the best introduction is Chapter 2 of 
[http://www.amazon.com/exec/obidos/tg/detail/-/079237519X/qid=1090272956/sr=8-1/ref=sr_8_xs_ap_i1_xgl14102-5724443-9727327?v=glance&#38;s=books&#38;n=507846 the JPEG2000 book by Taubman and Marcellin].
(This is an expensive book and, unless you are really interested in the 
subject, you probably won't want to read most of it, so I'd suggest borrowing 
a copy from a library or a friend rather than buying it.)



==The H theorem refers to {{sc&#124;pdf}}s, not samples==


{&#124; style="float:right; margin-left: 1em; width:50%;" cellpadding=5 cellspacing=1 border=0
&#124;-
&#124;align=left width=100% style="background-color:#f3f3ff; border:1px solid"&#124;
'''physics entropy rather than cs entropy'''

Note that the explanation above utilized logarithms to base 2 to calculate the 
entropy for the purposes of computer science. In physics, with a different set of 
concerns we calculate entropy using logarithms to base e, but the essential points
remain the same. 
Note also that the explanation above dealt with a discrete {{sc&#124;pdf}}. 
There are interesting technical mathematical challenges when one goes from a 
discrete {{sc&#124;pdf}} to a continuous {{sc&#124;pdf}}, like for example, a gaussian, but 
we will ignore those and focus on the important thing which is that, after all 
the pain of proving the results, the bottom line is that our ideas from discrete 
{{sc&#124;pdf}}s map over to continuous {{sc&#124;pdf}}s pretty much as we'd expect.
&#124;}

With the above detour out the way, let's return to Boltzmann; 
perhaps you can already see what the fundamental issue is.
Boltzmann's theorem refers to '''{{sc&#124;pdf}}s'''. It says that the time evolution of a 
{{sc&#124;pdf}} occurs in a certain way. 
Meanwhile the reversibility and recurrence paradoxes refer to 
specific instances of a mechanical system, '''not''' to {{sc&#124;pdf}}s. As such, what they 
do or don't say is irrelevant to Boltzmann's theorem. 

===a rigorous mathematical view of the Boltzmann transport equation===
More specifically we can say that, from the point of view of a nicely 
manageable mathematical structure, we want to talk about {{sc&#124;pdf}}s. 
We can, as mathematicians, define a mathematical structure that is a function 
of space and time and that has as its value at each space-time point a value
which is a probability density function for a velocity. This is a more careful,
more explicit way of defining the function of Boltzmann's transport equation.
If we now define a way in which this {{sc&#124;pdf}}-valued function evolves with time
(the Boltzmann transport equation) we have a perfectly consistent well defined 
mathematical problem. We can now prove various properties of this 
mathematical system, one of which is that (assuming various properties of 
the specific transport equation we're using), the entropy of the {{sc&#124;pdf}} associated 
with each spatial point is monotonically non-decreasing. 
(This mathematical result holds for any {{sc&#124;pdf}}, but is physically only useful for 
situations where a {{sc&#124;pdf}} plausibly suggests itself. 
For the most part such situations are either equilibrium [ie the 
pdf is the maxwellian-boltzmann distribution], or "different equilibrium at 
different points of space" eg a gas with some non-uniform temperature 
distribution. )

===a real world view of a collection of molecules===
OK, this is a fully consistent mathematical construction.
However to some extent in the real world, we don't deal with {{sc&#124;pdf}}s, 
we deal with finite collections of real atoms or molecules. 
For example a finite collection of real gas molecules does '''not''' evolve according 
to the Boltzmann transport equation. The very idea makes no sense, since 
the entities referred to in the two situations (on the one hand a {{sc&#124;pdf}}-valued
function, on the other hand a large collection of positions and velocities) 
are completely different. 
A collection of real gas molecules evolves according to  the laws of mechanics 
rather than the Boltzmann transport equation, and therefore is indeed subject to 
the issues of reversibility and recurrence, properties that can be proved for 
mechanical (hamiltonian) systems. 

Now, going back to the transport equation, the pdf that we associate with any 
particular point of space-time at equilibrium is, of course, the maxwellian
distribution. With this distribution in mind, note that, just as we did with our 
specific piece of English text, we can calculate a '''specific''' entropy for a specific
collection of gas molecules. Such a calculation would first calculate the appropriate 
"temperature" parameter for this collection of molecules, perhaps based on the 
standard deviation of the distribution of speeds of all the molecules. It would 
then loop over all the molecules, for each one calculating, for that molecule's
velocity, an appropriate probability from the maxwellian pdf, multiplying that 
probability by the log of that probability, and summing the results.
Just as in the case of compressing a particular piece of English text, this calculation 
might result in a value higher or lower than the entropy of the maxwellian {{sc&#124;pdf}} at 
the temperature we calculated for this system.

===connection between the mathematical ideal and the collection of molecules===
The connection between the mathematical ideal and the real world is that  
# assuming the mathematical {{sc&#124;pdf}} is chosen correctly, things happen in the real world as frequently or infrequently as the probabilities of the {{sc&#124;pdf}}, ie sampling the properties of a large number of molecules and binning the results will give you values just like what you'd expect from the {{sc&#124;pdf}} 
# the {{sc&#124;pdf}} for most physical situations is astonishingly peaked, meaning that physical configurations of molecules that don't match everyday experience have ridiculously low probabilities. (Compare, for example, the statistics of some randomly chosen piece of English text. We expect it to have statistics much like those of the English language pdf, but would not be surprised to learn that, for example, this piece of text utilizes 1% more "e"'s or 5% fewer "w"'s than the pdf tells us are the case for the entire universe of English language text. However, when dealing with, say, of order 10^18 molecules that have had a chance to equilibrate, we would expect to wait much longer than the age of the universe before seeing deviations of order 1% between statistics calculated for our collection of molecules as compared to the appropriate value calculated from our {{sc&#124;pdf}}.)
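
A back-of-the-envelope sketch of why sample size matters so much here. It assumes independent samples (a simple binomial model) and uses 0.127 as a rough frequency for the letter "e"; both are assumptions made for illustration, and the same probability is reused for the molecular case purely to compare like with like.

```python
from math import sqrt

def relative_sigma(p, n):
    """Relative standard deviation of an observed frequency, binomial model."""
    return sqrt((1 - p) / (p * n))

p = 0.127  # assumed probability of the event being counted
for n in (10**4, 10**18):
    sigma = relative_sigma(p, n)
    # How many standard deviations a 1% relative deviation corresponds to.
    print(n, sigma, 0.01 / sigma)
```

For a text of ten thousand characters a 1% deviation is well within one standard deviation, so it is unremarkable. For 10^18 samples the same 1% deviation is millions of standard deviations out, which is the sense in which the {{sc&#124;pdf}} is astonishingly peaked.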

==Reconciliation between thermodynamics and Boltzmann==

So in summary what we can say is that 
# Boltzmann was right, in that the H-theorem does provide a mathematical  proof of the monotonic increase of entropy  AS HE DEFINED IT.
# His opponents were right in that real mechanical systems, in theory (though hardly in practice) can reduce their entropy  AS THEY DEFINED IT.
# We would all be better off using a different word to distinguish the entropy of a {{sc&#124;pdf}}, a nice, clearly defined mathematical construction, from the "entropy" of a specific mechanical system, a rather less well defined mathematical construction. (You can come up with a consistent mathematical definition for this "specific" entropy, but the result doesn't quite mean what you probably think it means.) 
# The fact that the {{sc&#124;pdf}} entropy is (in practical terms) equal to the  (per-instance) entropy is an example of a not-infrequent situation in science: two conceptually very different mathematical ideas, when not well understood, are considered to be the same thing. At first this allows for progress, but once the field is understood, the conflating of the two ideas (which usually occurs through using language inexactly) is inexcusable. Unfortunately it is a rare case indeed where textbook writers are willing to break with the past and modify their language so as to undo this confusion.

Another view of this is to bring classical thermodynamics into the mix. 
One mathematically consistent way to look at the world is via statistical mechanics, utilizing
{{sc&#124;pdf}}s and appropriately defined entropy as I have discussed. 
Another mathematically consistent viewpoint is axiomatic thermodynamics which takes 
concepts like temperature, entropy, and the second law as unprovable starting points. 
What is not consistent, and where one gets into trouble, claiming things like "the second 
law is only true on average" is where one attempts to utilize the statistical mechanics 
viewpoint, but applies it not to the calculation of {{sc&#124;pdf}}s, but to the calculation of 
the average properties of some '''specific''' collection of molecules. 
If you're going to do this, you need to be very careful about exactly what you are claiming 
is a specific property of your collection of molecules vs what is a property of the set of 
all collections of molecules. The astute reader will realize that Gibbs' ensembles are,
essentially, a way to deal with this issue and, that, though not using my language, he is 
concerned with calculating {{sc&#124;pdf}}s and their properties.

=Zermelo's Criticism of the H-Theorem=

Along with the misguided attacks on the H-theorem, those that mistake the 
evolution of the pdf for the evolution of the system, that we have discussed, 
there is a more interesting attack, first presented by Zermelo. 
The argument goes thus: 
Liouville's theorem tells us that under evolution via a Hamiltonian, the 
measure of a subset of phase space does not change. It's a short step from
this to showing that this means that the H of a mechanical system cannot 
change (for any {{sc&#124;pdf}}). After thinking about this for a few seconds, this 
actually becomes quite reasonable, especially when thought of in the context 
of our description of file compression above. What we have is a system with 
a certain amount of uncertainty (the initial {{sc&#124;pdf}}) along with deterministic 
evolution in time which is not adding any more uncertainty. 
(How can one reconcile this with Boltzmann's proof of the H theorem? 
That proof includes an expression describing the scattering after 
interaction of two components, and reduces this to some sort of probabilistic
expression. If the Hamiltonian is taken as gospel, this reduction must be 
invalid, and must be ignoring correlations in the components from earlier 
interactions that, although apparently small, are actually essential.)

This is something of a kick in the pants, and strikes me as much
more problematic than the earlier attacks on the H theorem. 
My take on this matter (and I'd love to be corrected if I am wrong) is that this 
can be viewed in two ways. 
* One could attempt to argue that H (or the equivalent, entropy) has not really increased because there exist fiendishly complicated correlations between the various components of the system; these correlations are, however, not in any way apparent to our eyes, and so the system appears to have become more disordered. It's hard to keep this up, however, across all physical phenomena, for all of time. This argument is essentially claiming that the disorder of the world (and its increase) is only in our brains, not in reality. 
* Alternatively one could argue that, although these correlations between components grow for some amount of time, every so often something occurs that ruins the coherence, and that it is ultimately this something that is driving the second law. In the pre-quantum past this something was called "molecular disorder", and now we might call it "collapse of the wave function".  This is the view I espouse and is, I suspect, what most physicists would agree with if pushed. What is interesting is that so important an issue, '''the''' driver of entropy increase, is simply not mentioned in the same elementary textbooks that make such a mess of explaining the supposed problems with the second law.

The reader will, I trust, not have missed the remarkable similarity between 
this discussion and the general problem of the evolution of quantum systems.

=Ergodicity=

A final related issue that sometimes causes confusion, though more so in the past, 
is the issue of ergodicity. Ergodicity is the claim that a '''specific''' mechanical 
system, if left for long enough, evolves through all the states of the {{sc&#124;pdf}},
with the amount of time spent in the neighborhood of each state being 
proportional to the probability associated by the {{sc&#124;pdf}} with that neighborhood.

The ergodic assumption is, to clarify, not a part of the calculation of a {{sc&#124;pdf}} 
or how a {{sc&#124;pdf}} evolves in time; it is useful when trying to connect the abstract 
idea of a {{sc&#124;pdf}} to the concrete reality of a specific physical system, the idea
being something like: if the ergodic hypothesis is true, then the specific 
mechanical system (collection of gas molecules or whatever), evolves through 
enough states over a macroscopic period of time that what our senses and our 
instruments see is simply an average, moreover that average (over time, for 
this specific instance of the mechanical system), is the same as the average 
one calculates by averaging over the {{sc&#124;pdf}}. 
Maxwell and Boltzmann on occasion justified what they were doing on the basis 
of the ergodic hypothesis.

If one is even slightly familiar with measure theory, the ergodic assumption
appears to have to be false; one is trying to map a trajectory (a single
continuous line) onto a volume, and measure theory tells us that while this
can be done, it cannot be done with a continuous mapping. The bottom line is
that mathematicians fairly easily proved that the ergodic assumption was 
false. However it appears that what Maxwell and Boltzmann meant by the ergodic
assumption was not the exact ergodic assumption described above but something 
that looks pretty much the same to physicists but not to mathematicians, the quasi-ergodic 
assumption, which assumes that while the system will not pass through every 
state available, it will pass '''arbitrarily close''' to every state available.

Even this less demanding quasi-ergodic assumption is not 
necessarily true for certain specific states of certain specific mechanical 
systems. One can imagine, for example, a collection of billiard balls arranged
so carefully (as a lattice perhaps) in such a way that as time goes by they 
continue to bounce off each other, while retaining the lattice structure, 
forever. But this is clearly a somewhat pathological example.
Mathematicians have delighted in looking at this problem in ever 
finer detail, asking if there are conditions one might place on either the 
initial state or the collection of forces (ie the hamiltonian/lagrangian) 
governing the time-evolution of the system, that will compel the system to 
either be or not be quasi-ergodic. Their conclusion seems to be that 
many interesting classical-mechanical systems are in fact quasi-ergodic. 

This discussion is, to be honest, quite irrelevant to our real-world interest in 
statistical mechanics. In the real world, what we want to know is to what 
extent the averages we can calculate easily (ie averages over a {{sc&#124;pdf}}) will 
match what we measure (ie averages over some finite volume and some finite 
timespan of the evolution of a specific instance of a mechanical system); 
sometimes we are more ambitious and also want to know the extent of the 
deviations we might expect our real world measurements to take from the {{sc&#124;pdf}} 
averages. 
Ergodicity, as the mathematicians deal with it, is basically useless for this task. 
* First of all it is clear that minute perturbations to the mechanical system, (for example gravitational effects of other planets) while presumably having no effect on large scale averages like density and pressure, have a significant effect on ergodicity or the lack thereof --- return again to our finely balanced lattice of moving billiard balls. Ergodicity is an astonishingly brittle property.
* Secondly real world systems are, of course, quantum mechanical, and while classical mechanics is frequently a fine approximation to their behavior, it's not at all obvious that a system that is proved to be ergodic or not as a classical system is actually such as a quantum mechanical system.
* Thirdly it is not relevant to know that over a _long enough_ duration of time a time average matches a {{sc&#124;pdf}} average. One wants to know behavior over a specific duration of time, eg the duration that one's experimental sensors are active. 

As far as I can tell, as real world physicists, we pretty much simply 
make the assumption that for any system we care about the myriad sources 
of randomness in the world (minute perturbations, quantum effects, the finite 
size and duration of measurements), all blend together in such a way that 
{{sc&#124;pdf}} averages are expected to match experimental results. I know of no 
mathematical results that come even close to proving that this is actually 
expected to be the case for real-world conditions, though it seems like the 
sort of thing that could be proved if one were smart enough.</description>
		<content:encoded><![CDATA[<p>I am so damn sick of people not understanding this.<br />
Here is an essay from my wiki (not yet on-line) that explains EXACTLY what is going on here.<br />
(It&#8217;s in mediawiki syntax, but even so should be pretty easy to read.<br />
The only thing that&#8217;s probably not clear is {{sc|pdf}} means PDF in small caps.)</p>
<p>=The Second Law of Thermodynamics and Boltzmann&#8217;s H-Theorem=</p>
<p>==The issue==<br />
When one reads about statistical mechanics, both in textbooks and in popular<br />
works, some misunderstandings on the subject that date from the late 1800s<br />
and early 1900s still remain common. These tend to cluster around three<br />
issues:</p>
<p>* The second law of thermodynamics is absolute but a particular system can evolve towards a &#8220;less likely&#8221; state.<br />
* The reversibility and recurrence &#8220;paradoxes&#8221;.<br />
* Are the systems of classical mechanics ergodic?</p>
<p>Physicists of the modern world have basically made their peace with the first<br />
point. Although it was a big deal in the late 1800s, and considered a blow<br />
against kinetic/atomic theory, no-one nowadays has a problem with the reality<br />
that ice-cubes in the sun melt, alongside the theoretical possibility that<br />
just once, this time, liquid water placed under a warm sun might freeze.<br />
Even so, the understanding of what this actually means mathematically<br />
is pretty limited and, if pressed, the details are usually wrong.</p>
<p>The second point, even more so, is usually completely botched, and the<br />
horrible explanations given for it usually poison the understanding of the<br />
other two points. So with this in mind, let&#8217;s examine the issue properly.</p>
<p>==A little history of the H theorem==</p>
<p>In the mid-1800s Clausius came up with the idea of entropy and the second<br />
law of thermodynamics. At this time, recall, the very idea of atoms was<br />
controversial; some felt that the concepts of thermodynamics were primal and<br />
did not need to be justified or derived from models of how the world was<br />
constructed, while at the same time others were pursuing the kinetic theory<br />
of gases and trying to use its successes to prove the existence of atoms. </p>
<p>Against this background, the most significant thing yet proved about the<br />
kinetic theory of gases was Maxwell&#8217;s velocity distribution. But some people<br />
were unhappy with various aspects of the proof. The proof then, just like the<br />
proof one usually sees today, assumed that the velocities of interacting<br />
molecules were uncorrelated, something some felt was not justifiable.<br />
(On the other hand, by making this assumption, the proof showed the<br />
generality of the resultant distribution regardless of whatever details<br />
one might assume of the interaction of the molecules.)<br />
Boltzmann, to deal with this, came up with the Boltzmann Transport Equation,<br />
which more explicitly dealt with the interactions. It was fairly easy to show<br />
that a maxwellian distribution was static under this equation, in other words<br />
would not change with time. But Boltzmann wanted to show something more; that<br />
any other distribution would monotonically evolve towards the maxwellian<br />
distribution. </p>
<p>To this end he defined a quantity (which we now call H), a property of a<br />
particular distribution, and proved two things</p>
<p>* dH/dt = 0 for a maxwellian distribution and<br />
* dH/dt &lt; 0 for any other distribution.<br />
The sad thing is that this same<br />
mishmash of poorly thought-out arguments and counter-arguments still appears<br />
in today&#8217;s textbooks. I remember being bugged by the sloppiness of these<br />
arguments back almost twenty years ago when I was an undergraduate.</p>
<p>None of this is necessary &#8212; there&#8217;s a perfectly good, perfectly simple<br />
explanation for what&#8217;s going on that doesn&#8217;t require this handwaving.<br />
However to get to that point, we need a slight detour. I&#8217;m going to give the detour in<br />
more detail than is needed just to deal with this problem because the ideas are<br />
interesting, worth remembering, and best understood in a non-thermodynamics<br />
context that hasn&#8217;t been poisoned with invalid arguments.</p>
<p>==Data compression==</p>
<p>Let&#8217;s switch to an apparently very different problem, the problem of data<br />
compression as performed by computers. Data compression consists of two parts.</p>
<p>===modeling===<br />
The first stage, called modeling, transforms fragments of the data in some<br />
way so as to generate a stream of so-called symbols. Modeling varies from<br />
compression scheme to compression scheme &#8212; in JPEG, for example, it<br />
involves, among other things, splitting the image into 8&#215;8 blocks and<br />
performing a 2D DCT (something like a Fourier transform) on the data in each<br />
8&#215;8 block. </p>
<p>Modeling is specific to each compression scheme and the details do not<br />
matter to us. What matters is that after modeling the result is a stream<br />
of what we might abstractly call symbols. Suppose that our modeling results<br />
in symbols that can have values 0..255. The simplest way to store these values<br />
would simply be to use 8 bits for each symbol. This, however, would be far<br />
from optimal if some symbols are very much more common than other symbols. </p>
<p>===entropy coding===<br />
What is done in data compression is to encode the symbols using what is called<br />
entropy coding. Entropy coding comes in two forms, Huffman coding and<br />
Arithmetic coding. </p>
<p>{{infobox1|A second part of the theory<br />
is that the bit stream you construct has to be readable, even though there<br />
are no markers between the (variable length) bit strings indicating where<br />
one stops and the next starts. This implies that the collection of bit strings<br />
you use has to possess what is called the prefix property.}}<br />
Huffman coding uses shorter strings of bits for symbols<br />
that are more common, and longer strings of bits for symbols that are less<br />
common. The theory tells you (given the probabilities of different symbols)<br />
the optimal way to map symbols onto bit strings. </p>
<p>Arithmetic coding achieves the same goal as Huffman coding, namely using fewer<br />
bits to encode the more common symbols, in a way that is somewhat more<br />
efficient than Huffman coding, but quite a bit more difficult to understand.<br />
However it&#8217;s not relevant to our discussion.</p>
<p>===an example: compressing english text===<br />
So, given what we have said above, suppose we want to compress some data.<br />
To avoid getting bogged down in irrelevant details, let us assume that the<br />
data we want to compress is English language text encoded using 8-bit ASCII<br />
using LATIN-1 high-bit encoding,<br />
and that we are going to ignore the modeling stage of compression. </p>
<p>So the problem we have given ourselves is that we have symbols which are 8-bit<br />
ASCII characters, 0..255. Right away we know that some characters are going<br />
to be far more common than others. The characters with the high bit set (ie<br />
with a value &gt;127) are highly unlikely. These refer to diphthongs, accented<br />
characters, punctuation symbols rarely used in English and so on.<br />
Punctuation characters are less likely than many letters, and capital letters<br />
are less frequent than lower case letters.<br />
Certain letters are much more likely than other letters.</p>
<p>====the probability distribution function for english text====<br />
Compression is all about having an accurate mathematical model of the<br />
probability structure of the data.<br />
As a first approximation, we can consider the probability of each individual<br />
ASCII character. This gives us an array of 256 probabilities. In some vague<br />
sense that philosophers can argue over, there is presumably some sort of<br />
&#8220;ideal&#8221; probability distribution function ({{sc|pdf}}) for English language text<br />
that incorporates all text that has been and can be written, and that&#8217;s what<br />
our compression program is targeting. But, of course, we can&#8217;t just conjure<br />
up that ideal, so what we do is gather a large body of what we hope is<br />
representative English text, calculate the empirical (as opposed to ideal)<br />
statistics for that text, and treat those (sample) statistics as representative<br />
of all English text and thus equal to our philosophical ideal.<br />
We can then use these empirical probabilities to<br />
construct a Huffman code (or to drive an arithmetic coder), and we have a<br />
way to compress English ASCII text.</p>
<p>===the mathematical entropy associated with any discrete {{sc|pdf}}===<br />
Now let&#8217;s step back a little from this example and consider the general<br />
issue. As &#8221;&#8217;mathematicians&#8221;&#8217;, we can define a quantity, named<br />
the &#8221;&#8217;mathematical entropy&#8221;&#8217;, for any {{sc|pdf}}. The entropy is defined as </p>
<p>	S=-Sum[ probability(symbol)*lb( probability(symbol) ),<br />
	  summed over all symbols ]</p>
<p>where lb() is the binary log (ie log to base 2) of a number. </p>
<p>This may seem a bit much to take in, but really it&#8217;s not hard.<br />
Let&#8217;s assume we have four symbols, A, B, C, D, and that the probabilities are<br />
	(A, 1/2), (B, 1/4), (C, 1/8), (D, 1/8)<br />
The entropy associated with this {{sc|pdf}} is 1*.5 + 2*.25 + 3*.125 + 3*.125 = 1.75 bits per symbol. </p>
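<p>A minimal Python sketch of this calculation, assuming the {{sc|pdf}} is given as a dictionary mapping each symbol to its probability (the name entropy_bits is purely illustrative, not part of any library):</p>

```python
from math import log2

def entropy_bits(pdf):
    """Mathematical entropy, in bits per symbol, of a discrete pdf {symbol: probability}."""
    # Symbols with zero probability contribute nothing to the sum.
    return -sum(p * log2(p) for p in pdf.values() if p != 0)

# The four-symbol example from the text.
example_pdf = {"A": 1/2, "B": 1/4, "C": 1/8, "D": 1/8}
example_entropy = entropy_bits(example_pdf)
print(example_entropy)
```

<p>For the four-symbol example this reproduces the 1.75 bits per symbol worked out by hand above.</p>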
<p>Note that perfect entropy<br />
coding of a collection of symbols with some given {{sc|pdf}} means that each symbol<br />
will take, on average, -lb( probability(symbol) ) bits to encode.<br />
(Probabilities are less than one, the log is negative, so we add a minus<br />
sign to make the result positive.)<br />
So perfect entropy coding of our example would utilize<br />
1 bit to encode an A, 2 bits to encode a B, and 3 bits to encode a C or a D.<br />
It should be obvious from the above calculation that the entropy of the {{sc|pdf}}<br />
is nothing more than the average number of bits required per symbol to perfectly<br />
entropy encode a stream of data conforming to this {{sc|pdf}}.</p>
<p>{{infobox| arithmetic coding |<br />
In fact arithmetic coding entropy encodes data using a non-integral number<br />
of bits per symbol, so we can actually approach perfect entropy coding in real<br />
computer programs. This is a pretty neat trick, and I&#8217;d recommend you read<br />
up on how it is done if you have time.<br />
}}<br />
You may wonder what happens when the probabilities of the symbols are not nice<br />
negative powers of two as in the example. In that case, Huffman encoding<br />
cannot generate perfect entropy coding results, because the length of a<br />
Huffman code is necessarily some integer number of bits, while the perfect<br />
entropy code might require a non-integer number of bits, say 3.7569&#8230;<br />
In this case the average number of bits required to Huffman encode the symbol<br />
stream will be larger than the entropy; the entropy is a lower bound, the<br />
absolute best we can do.</p>
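<p>The gap between Huffman coding and the entropy can be sketched in Python as follows. This is only an illustration under stated assumptions: the probabilities in skewed_pdf are made up, and the helper names (huffman_code_lengths, average_bits, entropy_bits) are hypothetical. It builds only the code lengths, not the bit strings themselves.</p>

```python
import heapq
from math import log2

def huffman_code_lengths(pdf):
    """Return {symbol: code length in bits} for a Huffman code built from pdf."""
    # Heap entries: (probability, unique tie-breaker, symbols in this subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pdf.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in pdf}
    next_id = len(heap)
    # A tree over n symbols is built with n - 1 merges of the two least likely subtrees.
    for _ in range(len(pdf) - 1):
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, next_id, syms1 + syms2))
        next_id += 1
    return lengths

def average_bits(pdf, lengths):
    """Average number of bits per symbol when symbols follow pdf."""
    return sum(pdf[s] * lengths[s] for s in pdf)

def entropy_bits(pdf):
    return -sum(p * log2(p) for p in pdf.values() if p != 0)

# Assumed probabilities that are not powers of two.
skewed_pdf = {"A": 0.6, "B": 0.2, "C": 0.1, "D": 0.1}
lengths = huffman_code_lengths(skewed_pdf)
print(lengths, average_bits(skewed_pdf, lengths), entropy_bits(skewed_pdf))
```

<p>With these assumed probabilities the Huffman code averages roughly 1.6 bits per symbol against an entropy of roughly 1.57 bits, whereas for the power-of-two example the two numbers coincide.</p>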
<p>There are, of course, different {{sc|pdf}}s for different<br />
sets of material we may consider compressing. For example the statistics,<br />
and thus the {{sc|pdf}}, associated with the set of all photos (information very<br />
relevant to the design of a compression scheme like JPEG) are very different<br />
from the statistics for English language text.</p>
<p>===entropy is a property of a {{sc|pdf}}, not a finite sample from that {{sc|pdf}}===<br />
At this stage we need to make an essential point,<br />
&#8221;&#8217;the&#8221;&#8217; essential point to understanding this stuff, both in the context<br />
of data compression and later in the physics context:<br />
&#8221;&#8217;The {{sc|pdf}} describing the distribution of symbols is a property of some abstract infinite stream of symbols, for example some vague idea of the set of all English text.&#8221;&#8217;<br />
Now the properties of a {{sc|pdf}} will almost certainly be measured empirically,<br />
using as large a collection as is feasible of the type of material we want<br />
to compress, for example a large collection of English documents.<br />
From the statistics of this sample stream, an estimate of the entropy of<br />
the {{sc|pdf}} governing these symbols is then a simple calculation.<br />
The {{sc|pdf}} is, however, some sort of ideal entity not linked to<br />
the particular sample material we used; the particular symbol stream<br />
used to design a compression algorithm is simply regarded as a<br />
representative sample from an infinite stream of symbols.</p>
<p>===a misleading concept: the &#8220;entropy&#8221; of a finite sample===<br />
Switch now from the idea of all English text to focus on a particular<br />
piece of English text, a particular file we wish to compress.<br />
For any &#8221;&#8217;specific&#8221;&#8217; piece of English text, we can compress the stream of<br />
symbols using an entropy coder and the {{sc|pdf}} for English text, and the<br />
compressed data will have some size, meaning some average number of bits<br />
per symbol.<br />
We can call this, if we want, the entropy of this &#8221;&#8217;specific&#8221;&#8217; piece of English<br />
text, but it is conceptually a very different thing from the mathematical<br />
entropy we defined for the English language {{sc|pdf}}. This specific entropy (ie the<br />
average number of bits required per symbol to represent the text) may be<br />
rather larger than the entropy of the English language {{sc|pdf}} (for example the<br />
text may be something written by James Joyce, or an article about words to<br />
use in scrabble), or this specific entropy may be less than that<br />
of the English language {{sc|pdf}} (for example the text may<br />
be written for children, and may utilize only short simple words with<br />
very little punctuation).</p>
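<p>The distinction can be sketched in Python. Everything here is an assumption made for illustration: a one-sentence corpus stands in for the universe of English text, perfect entropy coding is taken to cost -lb(probability) bits per symbol, and the function names are hypothetical.</p>

```python
from collections import Counter
from math import log2

def empirical_pdf(corpus):
    """Estimate a per-character pdf from a sample of text."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

def pdf_entropy_bits(pdf):
    """Entropy of the pdf itself: a property of the distribution."""
    return -sum(p * log2(p) for p in pdf.values() if p != 0)

def specific_bits_per_symbol(text, pdf):
    """Average ideal code length for one particular text coded with pdf.

    Every symbol of text must occur in pdf.
    """
    return sum(-log2(pdf[symbol]) for symbol in text) / len(text)

# Tiny stand-in corpus (assumed); a real estimate would use far more text.
corpus = "the quick brown fox jumps over the lazy dog and then sleeps in the sun"
pdf = empirical_pdf(corpus)

common_text = "the tent the teen the net"   # built from frequent characters
rare_text = "jazzy quiz fox"                # built from infrequent characters

print(pdf_entropy_bits(pdf))
print(specific_bits_per_symbol(common_text, pdf))
print(specific_bits_per_symbol(rare_text, pdf))
```

<p>The first number belongs to the {{sc|pdf}}; the other two belong to particular texts, and they fall on either side of it.</p>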
<p>===if you want to learn more about data compression and entropy coding===<br />
{{infobox| correlation between symbols |<br />
The most important subject we have omitted from the discussion above,<br />
interesting but not relevant to where we are going with this, is<br />
exploitation of the correlation between<br />
successive symbols to reduce the number of bits required for compression,<br />
something that gets us into Markov models. (An obvious example is that the<br />
letter q is almost always followed by the letter u, and surely a compression<br />
scheme should be able to exploit that somehow.)<br />
While Markov models are a<br />
theoretically powerful method of doing so, there are severe practical<br />
problems with using them because of a combinatorial explosion in the number<br />
of probabilities one has to keep track of. The major goal of modeling<br />
is to attempt to restructure the data stream from its initial form, where<br />
there are obvious correlations between various pieces of data, to some<br />
intermediate form whose symbols are, as far as is practical, independent of<br />
each other. How best to do this clearly depends on the type of data and the<br />
techniques used for text, still images, video, general audio or speech are<br />
all very different.<br />
The rest of the book is concerned with the details of the modeling used by<br />
JPEG2000 &#8212; fascinating but very dense.<br />
}}<br />
If you are interested in the details of entropy coding beyond what I&#8217;ve<br />
discussed, IMHO by far the best introduction is Chapter 2 of<br />
[http://www.amazon.com/exec/obidos/tg/detail/-/079237519X/qid=1090272956/sr=8-1/ref=sr_8_xs_ap_i1_xgl14102-5724443-9727327?v=glance&amp;s=books&amp;n=507846 the JPEG2000 book by Taubman and Marcellin].<br />
(This is an expensive book and, unless you are really interested in the<br />
subject, you probably won&#8217;t want to read most of it, so I&#8217;d suggest borrowing<br />
a copy from a library or a friend rather than buying it.)</p>
<p>==The H theorem refers to {{sc|pdf}}s, not samples==</p>
<p>{| style=&#8221;float:right; margin-left: 1em; width:50%;&#8221; cellpadding=5 cellspacing=1 border=0<br />
|-<br />
|align=left width=100% style=&#8221;background-color:#f3f3ff; border:1px solid&#8221;|<br />
&#8221;&#8217;physics entropy rather than cs entropy&#8221;&#8217;</p>
<p>Note that the explanation above utilized logarithms to base 2 to calculate the<br />
entropy for the purposes of computer science. In physics, with a different set of<br />
concerns we calculate entropy using logarithms to base e, but the essential points<br />
remain the same.<br />
Note also that the explanation above dealt with a discrete {{sc|pdf}}.<br />
There are interesting technical mathematical challenges when one goes from a<br />
discrete {{sc|pdf}} to a continuous {{sc|pdf}}, like for example, a gaussian, but<br />
we will ignore those and focus on the important thing which is that, after all<br />
the pain of proving the results, the bottom line is that our ideas from discrete<br />
{{sc|pdf}}s map over to continuous {{sc|pdf}}s pretty much as we&#8217;d expect.<br />
|}</p>
<p>With the above detour out of the way, let&#8217;s return to Boltzmann;<br />
perhaps you can already see what the fundamental issue is.<br />
Boltzmann&#8217;s theorem refers to &#8221;&#8217;{{sc|pdf}}s&#8221;&#8217;. It says that the time evolution of a<br />
{{sc|pdf}} occurs in a certain way.<br />
Meanwhile the reversibility and recurrence paradoxes refer to<br />
specific instances of a mechanical system, &#8221;&#8217;not&#8221;&#8217; to {{sc|pdf}}s. As such, what they<br />
do or don&#8217;t say is irrelevant to Boltzmann&#8217;s theorem. </p>
<p>===a rigorous mathematical view of the Boltzmann transport equation===<br />
More specifically we can say that, from the point of view of a nicely<br />
manageable mathematical structure, we want to talk about {{sc|pdf}}s.<br />
We can, as mathematicians, define a mathematical structure that is a function<br />
of space and time and that has as its value at each space-time point a value<br />
which is a probability density function for a velocity. This is a more careful,<br />
more explicit way of defining the function of Boltzmann&#8217;s transport equation.<br />
If we now define a way in which this {{sc|pdf}}-valued function evolves with time<br />
(the Boltzmann transport equation) we have a perfectly consistent well defined<br />
mathematical problem. We can now prove various properties of this<br />
mathematical system, one of which is that (assuming various properties of<br />
the specific transport equation we&#8217;re using), the entropy of the {{sc|pdf}} associated<br />
with each spatial point is monotonically non-decreasing.<br />
(This mathematical result holds for any {{sc|pdf}}, but is physically only useful for<br />
situations where a {{sc|pdf}} plausibly suggests itself.<br />
For the most part such situations are either equilibrium [ie the<br />
pdf is the Maxwell-Boltzmann distribution], or &#8220;different equilibrium at<br />
different points of space&#8221; eg a gas with some non-uniform temperature<br />
distribution. )</p>
<p>===a real world view of a collection of molecules===<br />
OK, this is a fully consistent mathematical construction.<br />
However to some extent in the real world, we don&#8217;t deal with {{sc|pdf}}s,<br />
we deal with finite collections of real atoms or molecules.<br />
For example a finite collection of real gas molecules does &#8221;&#8217;not&#8221;&#8217; evolve according<br />
to the Boltzmann transport equation. The very idea makes no sense, since<br />
the entities referred to in the two situations (on the one hand a {{sc|pdf}}-valued<br />
function, on the other hand a large collection of positions and velocities)<br />
are completely different.<br />
A collection of real gas molecules evolves according to  the laws of mechanics<br />
rather than the Boltzmann transport equation, and therefore is indeed subject to<br />
the issues of reversibility and recurrence, properties that can be proved for<br />
mechanical (hamiltonian) systems. </p>
<p>Now, going back to the transport equation, the pdf that we associate with any<br />
particular point of space-time at equilibrium is, of course, the maxwellian<br />
distribution. With this distribution in mind, note that, just as we did with our<br />
specific piece of English text, we can calculate a &#8221;&#8217;specific&#8221;&#8217; entropy for a specific<br />
collection of gas molecules. Such a calculation would first calculate the appropriate<br />
&#8220;temperature&#8221; parameter for this collection of molecules, perhaps based on the<br />
standard deviation of the distribution of speeds of all the molecules. It would<br />
then loop over all the molecules, for each one calculating, for that molecule&#8217;s<br />
velocity, an appropriate probability from the maxwellian pdf, multiplying that<br />
probability by the log of that probability, and summing the results.<br />
Just as in the case of compressing a particular piece of English text, this calculation<br />
might result in a value higher or lower than the entropy of the maxwellian {{sc|pdf}} at<br />
the temperature we calculated for this system.</p>
<p>===connection between the mathematical ideal and the collection of molecules===<br />
The connection between the mathematical ideal and the real world is that<br />
# assuming the mathematical {{sc|pdf}} is chosen correctly, things happen in the real world as frequently or infrequently as the probabilities of the {{sc|pdf}}, ie sampling the properties of a large number of molecules and binning the results will give you values just like what you&#8217;d expect from the {{sc|pdf}}<br />
# the {{sc|pdf}} for most physical situations is astonishingly peaked, meaning that physical configurations of molecules that don&#8217;t match everyday experience have ridiculously low probabilities. (Compare, for example, the statistics of some randomly chosen piece of English text. We expect it to have statistics much like those of the English language pdf, but would not be surprised to learn that, for example, this piece of text utilizes 1% more &#8220;e&#8221;&#8217;s or 5% fewer &#8220;w&#8221;&#8217;s than the pdf tells us is the case for the entire universe of English language text. However when dealing with, say, of order 10^18 molecules that have had a chance to equilibrate, we would expect to wait much longer than the age of the universe before seeing deviations of order 1% between statistics calculated for our collection of molecules as compared to the appropriate value calculated from our {{sc|pdf}}.)</p>
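<p>A rough Python sketch of why the peaking is so extreme. It rests on a simplifying assumption that is not part of the argument above: each letter or molecule is treated as an independent draw, so that the count in any one category is binomial. The probability 0.1, the text length and the function name are illustrative choices.</p>

```python
from math import sqrt

def relative_fluctuation(p, n):
    """Standard deviation of the observed fraction in one category,
    relative to its probability p, over n independent draws (binomial model)."""
    return sqrt((1 - p) / (p * n))

# Assumed: a letter with probability 0.1 in a 10,000-character text.
text_case = relative_fluctuation(0.1, 10_000)
# Assumed: a velocity bin holding 10% of 10^18 molecules.
gas_case = relative_fluctuation(0.1, 10**18)

# Typical relative deviations, and how many standard deviations a 1% excursion is for the gas.
print(text_case, gas_case, 0.01 / gas_case)
```

<p>Under these assumptions a few percent is an entirely typical deviation for the text, while for the gas a 1% deviation lies millions of standard deviations out.</p>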
<p>==Reconciliation between thermodynamics and Boltzmann==</p>
<p>So in summary what we can say is that<br />
# Boltzmann was right, in that the H-theorem does provide a mathematical  proof of the monotonic increase of entropy  AS HE DEFINED IT.<br />
# His opponents were right in that real mechanical systems, in theory (though hardly in practice) can reduce their entropy  AS THEY DEFINED IT.<br />
# We would all be better off using a different word to distinguish the entropy of a {{sc|pdf}}, a nice, clearly defined mathematical construction, from the &#8220;entropy&#8221; of a specific mechanical system, a rather less well defined mathematical construction. (You can come up with a consistent mathematical definition for this &#8220;specific&#8221; entropy, but the result doesn&#8217;t quite mean what you probably think it means.)<br />
# The fact that the {{sc|pdf}} entropy is (in practical terms) equal to the  (per-instance) entropy is an example of a not-infrequent situation in science: two conceptually very different mathematical ideas, when not well understood, are considered to be the same thing. At first this allows for progress, but once the field is understood, the conflating of the two ideas (which usually occurs through using language inexactly) is inexcusable. Unfortunately it is a rare case indeed where textbook writers are willing to break with the past and modify their language so as to undo this confusion.</p>
<p>Another view of this is to bring classical thermodynamics into the mix.<br />
One mathematically consistent way to look at the world is via statistical mechanics, utilizing<br />
{{sc|pdf}}s and appropriately defined entropy as I have discussed.<br />
Another mathematically consistent viewpoint is axiomatic thermodynamics which takes<br />
concepts like temperature, entropy, and the second law as unprovable starting points.<br />
What is not consistent, and where one gets into trouble, claiming things like &#8220;the second<br />
law is only true on average&#8221; is where one attempts to utilize the statistical mechanics<br />
viewpoint, but applies it not to the calculation of {{sc|pdf}}s, but to the calculation of<br />
the average properties of some &#8221;&#8217;specific&#8221;&#8217; collection of molecules.<br />
If you&#8217;re going to do this, you need to be very careful about exactly what you are claiming<br />
is a specific property of your collection of molecules vs what is a property of the set of<br />
all collections of molecules. The astute reader will realize that Gibbs&#8217; ensembles are,<br />
essentially, a way to deal with this issue and that, though not using my language, he is<br />
concerned with calculating {{sc|pdf}}s and their properties.</p>
<p>=Zermelo&#8217;s Criticism of the H-Theorem=</p>
<p>Along with the misguided attacks on the H-theorem that we have discussed, those that mistake the<br />
evolution of the {{sc|pdf}} for the evolution of the system,<br />
there is a more interesting attack, first presented by Zermelo.<br />
The argument goes thus:<br />
Liouville&#8217;s theorem tells us that under evolution via a Hamiltonian, the<br />
measure of a subset of phase space does not change. It&#8217;s a short step from<br />
this to showing that this means that the H of a mechanical system cannot<br />
change (for any {{sc|pdf}}). After thinking about this for a few seconds, this<br />
actually becomes quite reasonable, especially when thought of in the context<br />
of our description of file compression above. What we have is a system with<br />
a certain amount of uncertainty (the initial {{sc|pdf}}) along with deterministic<br />
evolution in time which is not adding any more uncertainty.<br />
(How can one reconcile this with Boltzmann&#8217;s proof of the H theorem?<br />
That proof includes an expression describing the scattering after<br />
interaction of two components, and reduces this to some sort of probabilistic<br />
expression. If the Hamiltonian is taken as gospel, this reduction must be<br />
invalid, and must be ignoring correlations in the components from earlier<br />
interactions that, although apparently small, are actually essential.)</p>
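<p>A discrete toy version of this argument, offered as a sketch and not as Boltzmann&#8217;s actual calculation: take a {{sc|pdf}} over eight states, let the &#8220;Hamiltonian&#8221; evolution be a permutation of the states (deterministic and invertible, the discrete stand-in for Liouville&#8217;s theorem), and let the &#8220;probabilistic reduction&#8221; be an averaging over pairs of neighboring states, which forgets fine detail. The state count, the permutation, and the function names are all assumptions made for illustration.</p>

```python
import math

def shannon_entropy(pdf):
    """Entropy -sum p log p of a discrete pdf (Boltzmann's H is minus this)."""
    return -sum(p * math.log(p) for p in pdf if p > 0.0)

def evolve(pdf, step):
    """Deterministic, invertible dynamics: state i moves to state step[i]."""
    out = [0.0] * len(pdf)
    for i, p in enumerate(pdf):
        out[step[i]] += p
    return out

def coarse_grain(pdf, cell):
    """Replace each block of `cell` adjacent states by the block average,
    discarding the detail (the correlations) inside the block."""
    out = []
    for start in range(0, len(pdf), cell):
        block = pdf[start:start + cell]
        out.extend([sum(block) / len(block)] * len(block))
    return out

pdf = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0, 0.0, 0.0]
step = [3, 6, 1, 4, 7, 0, 5, 2]  # a permutation of the 8 states

evolved = evolve(pdf, step)
print(shannon_entropy(pdf), shannon_entropy(evolved))  # unchanged by evolution
print(shannon_entropy(coarse_grain(evolved, 2)))       # larger after averaging
```

<p>The permutation only relabels which state carries which probability, so the entropy cannot change. The entropy rises only at the averaging step, which is where information about the exact state has been thrown away.</p>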
<p>This is something of a kick in the pants, and strikes me as much<br />
more problematic than the earlier attacks on the H theorem.<br />
My take on this matter (and I&#8217;d love to be corrected if I am wrong) is that this<br />
can be viewed in two ways.<br />
* One could attempt to argue that H (or the equivalent, entropy) has not really increased because there exist fiendishly complicated correlations between the various components of the system; these correlations are, however, not in any way apparent to our eyes, and so the system appears to have become more disordered. It&#8217;s hard to keep this up, however, across all physical phenomena, for all of time. This argument is essentially claiming that the disorder of the world (and its increase) is only in our brains, not in reality.<br />
* Alternatively one could argue that, although these correlations between components grow for some amount of time, every so often something occurs that ruins the coherence, and that it is ultimately this something that is driving the second law. In the pre-quantum past this something was called &#8220;molecular disorder&#8221;, and now we might call it &#8220;collapse of the wave function&#8221;.  This is the view I espouse and is, I suspect, what most physicists would agree with if pushed. What is interesting is that so important an issue, &#8221;&#8217;the&#8221;&#8217; driver of entropy increase, is simply not mentioned in the same elementary textbooks that make such a mess of explaining the supposed problems with the second law.</p>
<p>The reader will, I trust, not have missed the remarkable similarity between<br />
this discussion and the general problem of the evolution of quantum systems.</p>
<p>=Ergodicity=</p>
<p>A final related issue that sometimes causes confusion, though more so in the past,<br />
is the issue of ergodicity. Ergodicity is the claim that a &#8221;&#8217;specific&#8221;&#8217; mechanical<br />
system, if left for long enough, evolves through all the states of the {{sc|pdf}},<br />
with the amount of time spent in the neighborhood of each state being<br />
proportional to the probability associated by the {{sc|pdf}} with that neighborhood.</p>
<p>The ergodic assumption is, to clarify, not a part of the calculation of a {{sc|pdf}}<br />
or how a {{sc|pdf}} evolves in time; it is useful when trying to connect the abstract<br />
idea of a {{sc|pdf}} to the concrete reality of a specific physical system, the idea<br />
being something like: if the ergodic hypothesis is true, then the specific<br />
mechanical system (collection of gas molecules or whatever), evolves through<br />
enough states over a macroscopic period of time that what our senses and our<br />
instruments see is simply an average, moreover that average (over time, for<br />
this specific instance of the mechanical system), is the same as the average<br />
one calculates by averaging over the {{sc|pdf}}.<br />
Maxwell and Boltzmann on occasion justified what they were doing on the basis<br />
of the ergodic hypothesis.</p>
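<p>As a concrete toy case of the distinction, consider a single point moving around a circle of unit circumference by a fixed step each tick. This is nothing like a gas, and the choice of system, observable, and function names is mine. The {{sc|pdf}} is taken to be uniform on the circle. With an irrational step the time average along one trajectory approaches the {{sc|pdf}} average; with a step of one half the trajectory visits only two points and the two averages disagree.</p>

```python
import math

def time_average(observable, x0, alpha, steps):
    """Average of observable along one trajectory of x -> (x + alpha) mod 1."""
    x, total = x0, 0.0
    for _ in range(steps):
        total += observable(x)
        x = (x + alpha) % 1.0
    return total / steps

def ensemble_average(observable, samples=100000):
    """Average of observable over the uniform pdf on [0, 1), by a midpoint sum."""
    return sum(observable((k + 0.5) / samples) for k in range(samples)) / samples

def f(x):
    """The quantity being measured; its average over the uniform pdf is 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2

golden = (math.sqrt(5.0) - 1.0) / 2.0  # an irrational step size

print(ensemble_average(f))                   # average over the pdf
print(time_average(f, 0.0, golden, 100000))  # one trajectory, irrational step
print(time_average(f, 0.0, 0.5, 100000))     # one trajectory, periodic orbit
```

<p>The periodic case starts at a point where f equals 1 and only ever returns to points where f equals 1, so its time average says nothing about the {{sc|pdf}} average.</p>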
<p>If one is even slightly familiar with measure theory, the ergodic assumption<br />
appears to have to be false; one is trying to map a trajectory (a single<br />
continuous line) onto a volume, and measure theory tells us that while this<br />
can be done, it cannot be done with a mapping that is both continuous and one-to-one. The bottom line is<br />
that mathematicians fairly easily proved that the ergodic assumption was<br />
false. However it appears that what Maxwell and Boltzmann meant by the ergodic<br />
assumption was not the exact ergodic assumption described above but something<br />
that looks pretty much the same to physicists but not to mathematicians, the quasi-ergodic<br />
assumption, which assumes that while the system will not pass through every<br />
available state, it will pass &#8221;&#8217;arbitrarily close&#8221;&#8217; to every available state.</p>
<p>Even this less demanding quasi-ergodic assumption is not<br />
necessarily true for certain specific states of certain specific mechanical<br />
systems. One can imagine, for example, a collection of billiard balls arranged<br />
so carefully (as a lattice perhaps) in such a way that as time goes by they<br />
continue to bounce off each other, while retaining the lattice structure,<br />
forever. But this is clearly a somewhat pathological example.<br />
Mathematicians have delighted in looking at this problem in ever<br />
finer detail, asking if there are conditions one might place on either the<br />
initial state or the collection of forces (ie the Hamiltonian/Lagrangian)<br />
governing the time-evolution of the system, that will compel the system to<br />
either be or not be quasi-ergodic. Their conclusion seems to be that<br />
many interesting classical-mechanical systems are in fact quasi-ergodic. </p>
<p>This discussion is, to be honest, quite irrelevant to our real-world interest in<br />
statistical mechanics. In the real world, what we want to know is to what<br />
extent the averages we can calculate easily (ie averages over a {{sc|pdf}}) will<br />
match what we measure (ie averages over some finite volume and some finite<br />
timespan of the evolution of a specific instance of a mechanical system);<br />
sometimes we are more ambitious and also want to know the extent of the<br />
deviations we might expect our real world measurements to take from the {{sc|pdf}}<br />
averages.<br />
Ergodicity, as the mathematicians deal with it, is basically useless for this task.<br />
* First of all it is clear that minute perturbations to the mechanical system (for example gravitational effects of other planets), while presumably having no effect on large scale averages like density and pressure, have a significant effect on ergodicity or the lack thereof &#8212; return again to our finely balanced lattice of moving billiard balls. Ergodicity is an astonishingly brittle property.<br />
* Secondly real world systems are, of course, quantum mechanical, and while classical mechanics is frequently a fine approximation to their behavior, it&#8217;s not at all obvious that a system that is proved to be ergodic or not as a classical system is actually such as a quantum mechanical system.<br />
* Thirdly it is not relevant to know that over a _long enough_ duration of time a time average matches a {{sc|pdf}} average. One wants to know behavior over a specific duration of time, eg the duration that one&#8217;s experimental sensors are active. </p>
<p>As far as I can tell, as real world physicists, we pretty much simply<br />
make the assumption that for any system we care about the myriad sources<br />
of randomness in the world (minute perturbations, quantum effects, the finite<br />
size and duration of measurements), all blend together in such a way that<br />
{{sc|pdf}} averages are expected to match experimental results. I know of no<br />
mathematical results that come even close to proving that this is actually<br />
expected to be the case for real-world conditions, though it seems like the<br />
sort of thing that could be proved if one were smart enough.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steinn Sigurdsson</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1750</link>
		<dc:creator>Steinn Sigurdsson</dc:creator>
		<pubDate>Mon, 22 Aug 2005 19:06:35 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1750</guid>
		<description>All accessible microstates, surely.

There may well be physically possible states, which are not accessible from (some given) initial conditions. 

Maybe if it can be shown that the ensemble of universes includes all possible initial conditions, then given infinite time all allowed microstates occur; though I'd like to see a proof by construction (ie I can't convince myself that it is impossible to exclude some subset of conceivable microstates by any such evolution).

Given some particular set of "initial condition" on a "small enough" space (which I think may still be infinite), I think a heuristic proof is possible that many possible microstates are never actually reached, even given infinite time.

For a finite initial spatial extent, but infinite time, I think the proof is trivial.


Actually, surely this is trivial: even under Boltzmann, we could have parity constraints - for example an infinite universe for an infinite time might still be such as to exclude anyone ever being lefthanded.  It would be physically conceivable for people to be lefthanded, but with a strong constraint forbidding any actual person from actually achieving left handedness in reality. And this would be purely arbitrary in that it could have been right handedness that was excluded.

Anyway, you can see where  I am going with this...

As long as we don't get into a semantic argument about "possible".</description>
		<content:encoded><![CDATA[<p>All accessible microstates, surely.</p>
<p>There may well be physically possible states, which are not accessible from (some given) initial conditions. </p>
<p>Maybe if it can be shown that the ensemble of universes includes all possible initial conditions, then given infinite time all allowed microstates occur; though I&#8217;d like to see a proof by construction (ie I can&#8217;t convince myself that it is impossible to exclude some subset of conceivable microstates by any such evolution).</p>
<p>Given some particular set of &#8220;initial condition&#8221; on a &#8220;small enough&#8221; space (which I think may still be infinite), I think a heuristic proof is possible that many possible microstates are never actually reached, even given infinite time.</p>
<p>For a finite initial spatial extent, but infinite time, I think the proof is trivial.</p>
<p>Actually, surely this is trivial: even under Boltzmann, we could have parity constraints - for example an infinite universe for an infinite time might still be such as to exclude anyone ever being lefthanded.  It would be physically conceivable for people to be lefthanded, but with a strong constraint forbidding any actual person from actually achieving left handedness in reality. And this would be purely arbitrary in that it could have been right handedness that was excluded.</p>
<p>Anyway, you can see where  I am going with this&#8230;</p>
<p>As long as we don&#8217;t get into a semantic argument about &#8220;possible&#8221;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bittergradstudent</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1749</link>
		<dc:creator>bittergradstudent</dc:creator>
		<pubDate>Mon, 22 Aug 2005 18:40:17 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1749</guid>
		<description>Steinn: but doesn't that violate the spirit of the approach Boltzmann took in formulating stat mech (that all microstates have equal probability?)</description>
		<content:encoded><![CDATA[<p>Steinn: but doesn&#8217;t that violate the spirit of the approach Boltzmann took in formulating stat mech (that all microstates have equal probability?)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: dasmoo &#187;</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1748</link>
		<dc:creator>dasmoo &#187;</dc:creator>
		<pubDate>Mon, 22 Aug 2005 18:39:59 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1748</guid>
		<description>[...]  [...]</description>
		<content:encoded><![CDATA[<p>[...]  [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Steinn Sigurdsson</title>
		<link>http://cosmicvariance.com/2005/08/22/deep-time/#comment-1743</link>
		<dc:creator>Steinn Sigurdsson</dc:creator>
		<pubDate>Mon, 22 Aug 2005 17:56:24 +0000</pubDate>
		<guid isPermaLink="false">http://cosmicvariance.com/?p=166#comment-1743</guid>
		<description>Shouldn't this be in the "great errors civilians make" category?

Because given infinite time, we could still skip not only a finite subset of possibilities, we could skip an arbitrary number of infinite subsets of possibilities.

In fact, annoyingly, we could just repeat a finite number of possibilities an infinite number of times and be exceedingly boring. Missing out on almost all of the infinite possibilities through sheer stubbornness.

But now I feel like I am channeling Max, so I'd better stop... ;-)</description>
		<content:encoded><![CDATA[<p>Shouldn&#8217;t this be in the &#8220;great errors civilians make&#8221; category?</p>
<p>Because given infinite time, we could still skip not only a finite subset of possibilities, we could skip an arbitrary number of infinite subsets of possibilities.</p>
<p>In fact, annoyingly, we could just repeat a finite number of possibilities an infinite number of times and be exceedingly boring. Missing out on almost all of the infinite possibilities through sheer stubbornness.</p>
<p>But now I feel like I am channeling Max, so I&#8217;d better stop&#8230; <img src='http://cosmicvariance.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /></p>
]]></content:encoded>
	</item>
</channel>
</rss>
