Abstract
Two information-theoretic principles, maximum entropy and minimum description length, dictate a computational model of associative learning that explains a wide range of conditioning phenomena. Any system for predicting future events faces the basic fact that time is one-dimensional; consequently, candidate predictors come in two fundamental types: zero-dimensional points in time and one-dimensional intervals of time. These "point cues" and "state cues" form the primitives of our theory. The probability distributions suited to inference about these two primitive cue types (exponential for states and Gaussian for points) follow directly from the principle of maximum entropy. The models the animal entertains to encode its experience so far and to anticipate future experience are built from these two primitive distributions, each representing a minimal state of knowledge about one of the two fundamental cue types. Each model thus constructed provides a different encoding of the animal's conditioning experience. Following the minimum description length principle (Rissanen 1999), by determining which of these models best compresses the data already seen, one can optimally predict the data not yet seen. Surprisingly, the concept of data compression therefore allows us to determine which specific state cues and point cues the animal will ultimately learn in a given protocol: they are simply the cues that appear as parameters in the model with the lowest description length. These theoretical results bring into sharp focus the need to direct neurobiological inquiry toward the coding question in memory.