The Swishd cluster system is an application that allows swish-e to scale out to multiple machines, so that the number of indexes (or collections) can become almost limitless. By scaling out to multiple swishd nodes, the index sizes can remain small as the number of documents increases. Scale is typically measured in millions of documents/files.
A client makes a TCP connection to cluster_mgr (default port 5500) and sends a query in XML format. cluster_mgr in turn connects to each swishd node listed in the configuration file (TCP port 5000) and submits the search query to each node for the collection(s) specified in the client query. Each swishd node runs the search against the appropriate index and returns its results to cluster_mgr. cluster_mgr then assembles the results, sorts them by rank, and returns them to the client in XML format.
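The final assemble-and-sort step can be sketched in a few lines of Python. This is only an illustration, not cluster_mgr's actual code: the function name `merge_by_rank`, the dictionary keys `rank` and `path`, and the optional `limit` parameter are all hypothetical, and the sketch assumes that a higher rank means a better match.

```python
def merge_by_rank(node_results, limit=None):
    """Combine per-node hit lists into one list, best rank first.

    node_results: one list of hits per swishd node; each hit is a dict
    with at least a numeric "rank" key (hypothetical field name).
    limit: optional cap on the number of hits returned.
    """
    # Flatten the per-node lists into a single list of hits.
    merged = [hit for hits in node_results for hit in hits]
    # Assumption: higher rank is better, so sort in descending order.
    # sorted() is stable, so equal-rank hits keep their node order.
    merged = sorted(merged, key=lambda hit: hit["rank"], reverse=True)
    return merged if limit is None else merged[:limit]


if __name__ == "__main__":
    node_a = [{"rank": 1000, "path": "legal/a.html"},
              {"rank": 200, "path": "legal/b.html"}]
    node_b = [{"rank": 640, "path": "legal/c.html"}]
    for hit in merge_by_rank([node_a, node_b]):
        print(hit["rank"], hit["path"])
```

Because each node only searches its own small index, the merge is the only step that sees the full result set, so capping it with something like `limit` keeps the response to the client bounded.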
Search Query Format
The swishd nodes can house several indexes, which can be grouped into "collections". For example, there can be one collection for sports documents and another for legal documents. You may want to search for the phrase "Jason Giambi" and get news about his legal cases, but not necessarily news about games he has played. To do this, you would specify the collection for your legal documents in the search query.
The client sends the original query in XML format. An example of the format is as follows:
sports legal Jason Giambi
This would instruct the swishd nodes to search both the legal and the sports collections for any documents containing the query phrase.
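As a rough sketch of how a client might assemble such a query, the Python below builds an XML document naming two collections and a query phrase. The element names `search`, `collection`, and `query` are placeholders chosen for illustration only; they are assumptions, not the schema cluster_mgr actually expects, so substitute the real element names before use.

```python
import xml.etree.ElementTree as ET


def build_query(collections, phrase):
    """Return an XML query string for the given collections and phrase.

    All element names here are hypothetical stand-ins for the real
    cluster_mgr query schema.
    """
    root = ET.Element("search")                        # hypothetical root
    for name in collections:
        ET.SubElement(root, "collection").text = name  # one per collection
    ET.SubElement(root, "query").text = phrase         # the search phrase
    # encoding="unicode" makes tostring() return a str instead of bytes.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_query(["sports", "legal"], "Jason Giambi"))
```

Building the document with an XML library rather than string concatenation means characters such as `&` or `<` in the query phrase are escaped automatically.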
Results Format
Cluster_mgr will return the final results to the client in XML format. An example of the format is as follows: