One of the defenses that corporations have used when being sued is something referred to as the “IT burden defense,” the basic premise being that the effort required to find the information sought by the court places an unreasonable burden on the defendant’s presumably beleaguered IT department. It’s an interesting premise.
Driven at least partly, one presumes, by Sarbanes-Oxley, a company named Index Engines has created a “Unified Discovery Platform” that indexes all the information across an enterprise at a rate of roughly “1TB per hour per node.”
Index over a billion objects per system? It seems that the need to clean out the digital garage is disappearing – just keep everything.
Still, it seems to me that there may be some good reasons for not saving everything in my unbounded virtual garage.
The reasoning behind efforts to “save everything” goes like this: if “saving” is cheap and “finding” is easy, why not save everything? That seems reasonable, until you start thinking a little more deeply about the third leg of this information stool: looking (I view “searching” as a special case of “looking”).
When searching in the virtual world you’ll always find something. Whether it’s the something you want is a slippery question, partly because of the curious nature of looking.
If I index a few billion “objects” and what I’m looking for is an “object” and I understand the indexing system, chances are good that eventually I’ll find what I’m looking for – probably along with some stuff I’m not looking for. (Which makes me think of the difficulty of throwing out those things in the garage I didn’t know I had, and wasn’t expecting to find.)
It’s the nature of looking that every once in a while the something I wasn’t looking for becomes a critical insight into what I want to know – which is why I’m “looking” in the first place.
Does everything get reduced to “sort and filter”? How do you sort and filter 100 (or 1,000) virtually identical things? When I Google “eDiscovery,” the search engine finds over a million entries (in a mind-boggling 0.29 seconds). In some manner that is hidden from me, Google decides which of those million entries to display on the first couple of screens, which is as far as almost anyone will look. Google often seems to ‘know’ what I’m looking for before I do. That’s creepy.
It makes one wonder: what is the value of finding something you’re not looking for? Does Google know that too?