At 8byte8 we use MongoDB in many of our projects. There is an open ticket for supporting full text search natively in MongoDB, and I am patiently waiting for the day it gets implemented. In the meantime, I am using a quick and dirty solution to let users search public lists in Lystee: a simple map reduce over the collection you want to build a search index for. Let’s say you have a collection whose documents have a description property. Here is the JavaScript map job:
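function() {
    var id = this._id;
    if (this.description != null) {
        // Split the description into terms on runs of whitespace.
        var terms = this.description.split(/\s+/);
        for (var i = 0; i < terms.length; i++) {
            var term = terms[i];
            // Leading or trailing whitespace produces empty strings; skip them.
            if (term !== '') {
                term = term.toLowerCase();
                // Map this document's _id to a count of 1 for this term.
                var result = {};
                result[id] = 1;
                emit(term, result);
            }
        }
    }
}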
Every document in the collection is passed to the map function. If the document has a description field (presumably a paragraph of text), the field is split into terms on whitespace, and each term is lowercased and emitted. Each emit maps the document's _id to a count of 1; the reduce step then adds these counts up, so the final value records how many times the term appears in each document.
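To see exactly what the map job emits, here is a minimal plain-Python sketch that mimics it on in-memory dicts. It is only an illustration (MongoDB runs the JavaScript version, not this), the sample documents are hypothetical, and the _id is converted with str() because JavaScript object keys are always strings.

```python
import re


def map_description(doc):
    """Mimic the JavaScript map job: yield (term, {doc_id: 1}) pairs."""
    description = doc.get("description")
    if description is None:
        return
    # Split on runs of whitespace, like the JavaScript split(/\s+/).
    for term in re.split(r"\s+", description):
        if term:  # skip empty strings from leading/trailing whitespace
            yield term.lower(), {str(doc["_id"]): 1}


docs = [
    {"_id": 1, "description": "Groceries and more groceries"},
    {"_id": 2, "description": "  Weekend chores "},
    {"_id": 3},  # no description, so nothing is emitted
]

emitted = [pair for doc in docs for pair in map_description(doc)]
print(emitted)
```

Note that "groceries" is emitted twice for document 1, each time with a count of 1; combining those is the reduce step's job.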
Here is the JavaScript reduce step:
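function(term, values) {
    // Merge the {_id: count} objects emitted for this term, summing counts per _id.
    var result = {};
    for (var i = 0; i < values.length; i++) {
        var value = values[i];
        for (var id in value) {
            if (result.hasOwnProperty(id)) {
                result[id] += value[id];
            } else {
                result[id] = value[id];
            }
        }
    }
    // Same shape as the emitted values, so MongoDB can safely reduce it again.
    return result;
}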
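MongoDB does not promise to call reduce exactly once per term: it may reduce partial batches and then feed those outputs back into reduce, and it skips reduce entirely for a term with only one emitted value. That is why the reduce function returns the same {_id: count} shape it receives. The Python port below is an illustrative sketch (not something MongoDB executes) that checks this property on hypothetical values.

```python
def reduce_term(term, values):
    """Python port of the reduce step: sum per-document counts for one term."""
    result = {}
    for value in values:
        for doc_id, count in value.items():
            result[doc_id] = result.get(doc_id, 0) + count
    return result


emits = [{"1": 1}, {"2": 1}, {"1": 1}]

# Reducing every emitted value at once...
all_at_once = reduce_term("groceries", emits)
# ...must give the same answer as re-reducing a partial result.
partial = reduce_term("groceries", emits[:2])
re_reduced = reduce_term("groceries", [partial, emits[2]])

print(all_at_once)
print(re_reduced == all_at_once)
```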
The reduce step aggregates the emitted values for each term, and the results are stored in an index collection that you can use to provide search functionality. In the index collection, the _id field holds the term, and the value field holds an object mapping the _id of each document that contains the term to the number of times the term appears in that document. As a bonus, you can also provide autocomplete by using a prefix regex to find matching terms. Here is the Python PyMongo code to do so, where collection is the index collection and t is the text the user has typed so far:
import re
collection.find({'_id': re.compile('^' + re.escape(t.lower()))})  # anchored prefix match; terms are stored lowercased
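The post doesn't show the search query itself, so here is a minimal sketch of one way to use the index, assuming its contents have been loaded into a Python dict. The sample entries and the scoring rule (rank documents by the summed counts of the query terms they contain) are illustrative assumptions, not part of the original solution, and are no substitute for real relevance ranking.

```python
import re

# Hypothetical index collection contents, in the shape map reduce writes:
# {"_id": term, "value": {document_id: count}}.
index_docs = [
    {"_id": "groceries", "value": {"1": 2, "4": 1}},
    {"_id": "weekend", "value": {"2": 1, "4": 1}},
    {"_id": "chores", "value": {"2": 1}},
]
index = {entry["_id"]: entry["value"] for entry in index_docs}


def search(query, index):
    """Return document ids ranked by the summed counts of the query's terms."""
    scores = {}
    # Split on whitespace and lowercase, as the map job does for descriptions.
    for term in re.split(r"\s+", query.strip().lower()):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] = scores.get(doc_id, 0) + count
    # Highest score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


print(search("Weekend groceries", index))
```

Against a live database, you could fetch only the index entries you need with collection.find({'_id': {'$in': terms}}) and build the same dict from the results.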
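This solution lacks some of the more advanced features expected of a text search engine, such as term stemming. It also splits only on whitespace, so punctuation stays attached to terms ("list," and "list" are indexed separately). If you need more advanced searching, I recommend using software dedicated to text search, such as Solr.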