semagix

Since its inception, the most important functionality of the World Wide Web has been the capacity to search for keywords. In the very beginning, search engines catalogued the Web by using meta tags as guideposts.

Google achieved its initial preeminence in the world of Internet search by developing a mathematical algorithm that ranked websites on the basis of inbound links. Other search engines quickly followed suit with their own algorithms. Over the years these algorithms grew increasingly complex.
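To make the idea of ranking by inbound links concrete, here is a minimal sketch in the spirit of the published PageRank idea. It is not Google's production algorithm; the function name `rank_by_inbound_links`, the damping value and the toy site names are all assumptions chosen for illustration.

```python
def rank_by_inbound_links(links, damping=0.85, iterations=50):
    """Score pages by the links pointing at them.

    `links` maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with an equal score for every page.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score among the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outbound links spreads its score evenly.
                share = damping * scores[page] / n
                for other in pages:
                    new_scores[other] += share
        scores = new_scores
    return scores


# Hypothetical four-page web: three pages link to c.example.
toy_web = {
    "a.example": ["c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}
scores = rank_by_inbound_links(toy_web)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy web, the page with the most inbound links comes out on top, and the one page it links to inherits much of that standing. A link from a well-linked page counts for more than a link from an obscure one.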

In recent years the big buzzword of search has been personalization, i.e., results customized to meet the needs of the searcher, using as criteria the sites that user has already visited and/or the things that user has already searched for. Semantic search is a form of personalized search that uses the science of meaning in language, rather than relying solely on mathematical ranking algorithms, in order to produce relevant results. It examines sentence structure, syntax, dictionaries, and thesauri to determine how closely Web pages are connected with one another. With some modifications, it is the technique used by the popular search engine Ask.com.

Sound tricky? It is. For one thing, the English language is notoriously ambiguous, filled with homonyms, homographs and capitonyms. Consider the word “tear”: in context it can mean either the moisture that trickles down from your eyes if you liked “Avatar” or the savage motion with which you mutilated your ticket stub if you didn’t. Suppose a user types “tear” into a search box. How does the search engine know which meaning the user intends? The process of figuring it out is called disambiguation, and different semantic search engines go about it in different ways.
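One textbook approach to disambiguation is a simplified form of the Lesk algorithm: pick the sense whose dictionary definition shares the most words with the query. The sketch below illustrates that idea only; it does not describe how any of the engines discussed here actually work, and the `SENSES` glosses, the stopword list and the function names are made up for the example.

```python
import re

# Hypothetical mini-dictionary: two senses of "tear", each with a short gloss.
SENSES = {
    "tear": {
        "teardrop": "drop of salty moisture that falls from the eyes when crying",
        "rip": "pull paper or cloth apart into pieces by force",
    }
}

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "or", "to", "in", "into", "by",
             "from", "that", "when", "and"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def disambiguate(word, context):
    """Return the sense whose gloss overlaps most with the context.

    Returns None when the context shares no words with any gloss.
    """
    context_words = tokenize(context) - {word}
    best_sense, best_overlap = None, 0
    for sense, gloss in SENSES[word].items():
        overlap = len(context_words & tokenize(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


crying = disambiguate("tear", "a tear fell from her eyes")
ripping = disambiguate("tear", "tear the paper ticket into pieces")
bare_query = disambiguate("tear", "tear")
```

With these glosses, the first query resolves to the "teardrop" sense and the second to the "rip" sense. The bare query "tear" returns `None`, because a single word gives the method nothing to go on.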

Hakia is a self-styled “meaning-based” search engine that organizes its search results by tab into web search results, credible sites, news and images. (Credible sites are results from websites that have been specially vetted by information professionals.) For unambiguous queries, Hakia produces resumes – specialized portals with links to all sorts of information on a particular subject. For disambiguation, it uses Natural Language Processing algorithms.

SenseBot condenses search results into a summary of the most relevant aspects, which it identifies through a data processing technique called text mining. A characteristic feature of this summary is the tag cloud that relates your inquiry to other relevant concepts. SenseBot does not handle disambiguation very well, with the result that many of its own results are unintelligible.
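A tag cloud can be approximated by counting how often each meaningful word appears across a set of results. The sketch below shows only that basic counting step; it is not SenseBot's method, and the function name `tag_cloud`, the stopword list and the sample snippets are assumptions for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real text-mining system would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def tag_cloud(documents, top_n=5):
    """Return the top_n most frequent words as (word, count) pairs."""
    counts = Counter()
    for text in documents:
        words = re.findall(r"[a-z]+", text.lower())
        # Skip stopwords and very short words, which carry little meaning.
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)


# Hypothetical result snippets for a single query.
snippets = [
    "Semantic search engines study meaning",
    "Search engines rank pages",
    "Meaning matters in search",
]
cloud = tag_cloud(snippets, top_n=3)
```

Here "search" is the most frequent term, followed by "engines" and "meaning". A display layer would then draw the more frequent words in a larger font.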

Powerset is a natural-language processing technology that was acquired by Microsoft in 2008. It aggregates information from across multiple articles and is occasionally even capable of answering direct inquiries. Its trademark feature is a box labeled “Factz” that consists of recommendations for further research.

DeepDyve is a semantic search engine that accesses the so-called “Deep Web”, the part of the Internet that is not indexed by conventional search engines, which includes scientific databases and articles.

Cognition bases its search process on a proprietary Semantic Map and applies its capacities to four bodies of knowledge: the complete database of US Supreme Court decisions and Court of Appeals decisions from 1950 on; MEDLINE; the English version of Wikipedia; and the New English Translation of the Biblical New Testament, including text and translators’ notes.