In the last post I discussed the advantages of NoSQL using a music collection example. I’d like to go back to that example and expand the conversation a bit:
Wouldn’t it be great to be able to find a song when you only remember a few lines of the lyrics? Lyrics are part of the available information, after all. What if we wanted to include them in our database?
Actually, we do this all the time using search engines on the web, don’t we? So why not use the same technology against our own data?
It turns out that search makes things much easier in the context of data management. Using search technology, we can start exploring data before we even model it, bringing the same iterative, agile methodology used in software development to data modeling and information discovery. Search also lets us span structured values and free-form text when we look for information (e.g. lyrics and track titles at the same time).
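To make the idea of spanning structured values and free-form text concrete, here is a minimal sketch of a tiny inverted index in Python. It is an illustration only, not how any particular search engine or NoSQL product works: the track records, the placeholder lyrics, and the helper names (`tokenize`, `build_index`, `search`) are all made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical sample records; titles, years and lyrics are invented placeholders.
TRACKS = [
    {"id": 1, "title": "Morning Train", "year": 1998,
     "lyrics": "the rain keeps falling on the station roof"},
    {"id": 2, "title": "Falling Rain", "year": 2004,
     "lyrics": "we danced all night under city lights"},
    {"id": 3, "title": "City Lights", "year": 2004,
     "lyrics": "hold on to the morning and never let go"},
]

def tokenize(value):
    """Lowercase any field value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", str(value).lower())

def build_index(records):
    """Map every token from every field (except the id) to the ids of records containing it."""
    index = defaultdict(set)
    for record in records:
        for field, value in record.items():
            if field == "id":
                continue
            for token in tokenize(value):
                index[token].add(record["id"])
    return index

def search(index, query):
    """Return the sorted ids of records that contain all query tokens, in any field."""
    tokens = tokenize(query)
    if not tokens:
        return []
    matches = set.intersection(*(index.get(token, set()) for token in tokens))
    return sorted(matches)

INDEX = build_index(TRACKS)

print(search(INDEX, "falling rain"))  # [1, 2]: matched in lyrics (1) and in the title (2)
print(search(INDEX, "2004 lights"))   # [2, 3]: a structured value combined with free text
```

Notice that nothing here required deciding up front which fields are searchable or how they relate to each other: every value is tokenized the same way, so a single query can hit a year, a title, and a line of lyrics at once.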
Here’s another example: we’ve all been bombarded by endless quantities of email, which can be quite challenging to manage at times.
Like many people, at some point I developed a system to organize my email into folders and sub-folders based on various criteria. This worked well for a while, but I soon realized that organizing the email took almost as long as reading it, and I constantly needed to tinker with the classification to accommodate new data.
Then it dawned on me that my email client could very effectively search through every header attribute, as well as the full text of each message. There was really no need to spend all this time creating a sophisticated filing system when the data could be searched using efficient indexes that return results in an instant.
Enterprise data has been going through a similar process: companies have spent years organizing and reorganizing their information, only to get to a point where managing it has become unwieldy due to increasing volume and diversity. They are now discovering that search combined with NoSQL is a much more powerful way to manage information than relational systems…