I awoke last night with a start from a nightmare that wasn’t too fun. I don’t want to get into the details, but it got me thinking about the purpose of dreams, and sleep in general. I was wide awake and probably thought about this for a good two hours before drifting off to sleep again. I became, and still am, enamored with a theory about why we sleep. It just seems to make sense to me, so I thought I would share it with the world. The theory stems from the information retrieval and data storage expertise I have accumulated while working on related technologies at two different search companies that I founded or co-founded, and at one internet intelligence company where I worked as VP of R&D, creating custom large scale data-mining applications. I’m not trying to brag here, just trying to say I have put quite a bit of thought into these technologies, from both the hardware and software sides.
My theory comes in two parts, which I feel are both very closely related but different enough that they need to be explained separately.
Part 1: Information Indexing
When you go about creating a high performance search engine you invariably come across a point where you are trading speed for flexibility. When designing a very large scale index, you think about these tradeoffs constantly. Unless you are Google and can afford hundreds of thousands of servers with gobs of memory, you are usually going to relegate large indexes to a physical disk. This is unfortunate because it creates massive problems when indexing a document. Sorry, let me first explain how a document is indexed: basically you invert it. Instead of a list of documents with words inside, you create lists of words with documents inside. So when you search for the word “monkey”, you just pull out the list for “monkey” and read down the documents on it. The problem is that a document may have hundreds or thousands of words, and every one of those words means an update to a different list on disk, so breaking up the document is limited by the physical speed of the disk you are writing to.
Let’s say you have 500 words in a document, and a disk that can locate a word’s list in 5 milliseconds (these are actually very typical numbers). That 5 milliseconds is basically the average “seek time” of the disk drive. It seems very fast for locating something on disk, but when you need to do it 500 times, suddenly it takes 2.5 seconds to index one document. This is a veritable eternity in computing terms, especially when you are hoping to index billions of documents. At this rate, one server would need more than 79 years to index one billion documents. Even with 1000 servers it would take nearly 29 days. This is still way too long for several million dollars in hardware. We want to do it in 1/10 that time.
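For anyone who wants to check the numbers, here is a rough Python sketch of the inversion and the seek-time arithmetic described above. The function names and the two tiny sample documents are made up for illustration, and the cost model assumes, as the text does, that every word costs exactly one disk seek.

```python
from collections import defaultdict

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def build_inverted_index(documents):
    """Turn {doc_id: text} into {word: sorted list of doc_ids containing it}."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(doc_ids) for word, doc_ids in index.items()}


def naive_index_seconds(words_per_doc, seek_ms, num_docs, servers=1):
    """Indexing time if every word in every document costs one disk seek."""
    seconds_per_doc = words_per_doc * seek_ms / 1000.0
    return seconds_per_doc * num_docs / servers


# Hypothetical sample documents, just to show the inversion.
docs = {1: "the monkey ate a banana", 2: "a monkey climbed the tree"}
index = build_inverted_index(docs)
print(index["monkey"])  # the list of documents that mention "monkey"

# The figures from the text: 500 words, 5 ms per seek, one billion documents.
one_server = naive_index_seconds(500, 5, 1_000_000_000)
thousand_servers = naive_index_seconds(500, 5, 1_000_000_000, servers=1000)
print(f"one server: {one_server / SECONDS_PER_YEAR:.1f} years")
print(f"1000 servers: {thousand_servers / SECONDS_PER_DAY:.1f} days")
```

The single-server figure comes out a little over 79 years and the 1000-server figure a little under 29 days, which matches the estimates above.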
So what do we do? It may have occurred to you as you read above that just because a document has 500 words, it doesn’t necessarily have 500 unique words. If the word “there” is found in the document five times, then we only need one seek to write all five of those references. Now what if we cut apart two documents at once, and one document has the word “there” five times and the other has it three times? Instead of eight seeks we can cut this down to just one. This is called batch updating the index. We accumulate the info for a certain number of documents and then update the index all at once, much more efficiently.
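The saving from batching can be sketched in a few lines of Python. This is only a toy cost model under the same assumption as before (one seek per list that gets written); the function names are hypothetical and a real index would hold far more than word counts.

```python
from collections import Counter


def seeks_one_at_a_time(documents):
    """Naive cost: one seek for every single word occurrence."""
    return sum(len(text.split()) for text in documents)


def batch_postings(documents):
    """Accumulate a batch as {word: {doc_id: occurrence count}}."""
    postings = {}
    for doc_id, text in enumerate(documents):
        for word, count in Counter(text.split()).items():
            postings.setdefault(word, {})[doc_id] = count
    return postings


def seeks_batched(documents):
    """Batched cost: one seek per unique word across the whole batch."""
    return len(batch_postings(documents))


# The example from the text: "there" five times in one document, three in another.
batch = ["there there there there there", "there there there"]
print(seeks_one_at_a_time(batch))  # eight seeks without batching
print(seeks_batched(batch))        # one seek with batching
print(batch_postings(batch))
```

The bigger the batch, the more documents share each seek, so the tradeoff becomes how much you are willing to hold before writing it out.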
So this brings us to our first mention of the brain. The brain may seem like a mystical device to us right now, but I imagine it is far from mystical. In fact it’s probably in many ways a memory storage device not unlike a disk drive. We have all heard people talk about the concepts of “long term” and “short term” memory, and to me this makes perfect sense. My theory is that the memories we accumulate throughout the day are stored in our short term memory, and then when we sleep there is a “batch update” of our long term memory, where the brain delivers each memory to the relevant section. For our brains to be as efficient as possible at memory storage, it seems to me that something like this has to happen. I have done exactly ZERO research when writing this up, so I have no idea if loss of sleep has a correlation with the inability to store long term memories, but that is my guess.
Part 2: Fragmentation
If you used Windows 95, or a version of Windows from roughly that era, you may remember the program “Defrag” that was a part of Windows. What this program did was reorganize your hard drive so that the pieces of each file were stored together. Part of the point was to free up large contiguous blocks of disk space, but another reason was to make sure that the data for a particular file was all stored in the same physical location on disk, so that reading that data would be efficient. Maybe you noticed that if you went a long time without defragging your drive, it would take a very long time to complete, whereas if you did it every day, it wasn’t such a problem.
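To make the idea concrete, here is a toy Python model of fragmentation. It is not how the Windows tool actually worked; the disk is just an assumed list of blocks labeled with the file that owns them (None for free space), and the function names are invented for this sketch.

```python
def count_fragments(disk, name):
    """Count the separate contiguous runs of blocks that belong to `name`."""
    runs = 0
    previous = None
    for block in disk:
        if block == name and previous != name:
            runs += 1
        previous = block
    return runs


def defragment(disk):
    """Pack each file's blocks together, with all free space (None) at the end."""
    order = []
    for block in disk:
        if block is not None and block not in order:
            order.append(block)
    packed = [name for name in order for _ in range(disk.count(name))]
    return packed + [None] * disk.count(None)


# Hypothetical disk layout: files "a" and "b" are scattered, "c" is intact.
disk = ["a", "b", "a", None, "b", "a", None, "c"]
tidy = defragment(disk)

print(count_fragments(disk, "a"))  # file "a" starts out in three pieces
print(count_fragments(tidy, "a"))  # after defragmenting it is in one piece
print(tidy)
```

In this model each fragment stands for one extra seek when reading the file, which is why a tidy disk reads faster. A real defragmenter also has to move blocks in place, which this sketch skips by building a fresh list.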
So this brings me back to the brain. Since I believe it is a physical storage device not unlike a disk drive, I tend to think that information in our brain can become fragmented over time, and that part of the job of sleep is to defrag the data in our head.
Problems with My Theory
Obviously data is not stored in our head as 1s and 0s like in a computer. I have no idea how it’s stored or even if it can technically be called “data”, but I do feel strongly that it is something analogous. Instead of files of data, we have nebulous clouds of something-or-others that are probably not coherent, definable, or even contiguous. The purpose of that sentence was to make you not understand it exactly. That’s kind of what I think memories are: things you can’t really understand exactly, but they probably mean something.
Other Problems With My Theory
Conclusion
It is a good idea to get lots of sleep.
