

Technologies

SearchInform Search Technology provides fast and accurate full text search, including search for documents of similar content, in any data volume. It helps avoid duplicated information, integrates into application systems, and supports building a wide range of information retrieval applications for both corporate networks and global Internet solutions. The full text search technology works with all the popular text file formats (txt, doc, rtf, pdf, htm, html) and supports connecting most popular database systems (for example, Access, MS SQL, Oracle, and any other database system that supports SQL).

SearchInform Search Technology is based on a unique mathematical model that analyzes document structure and selects similar words, word combinations, sentences and text arrays. The search takes into account all the words encountered in the text, with proper consideration for all possible word forms and synonyms.

Software products

SearchInform is a full text search utility designed for quick information search in large data volumes, among documents of any type as well as in various databases. It combines phrase search, with proper consideration for stemming and a synonym dictionary, with the new SearchInform Search Technology for finding documents of similar content. The application considerably increases the quality of information processing and decreases search time. The search speed is approximately three times as high as that of all existing full text search engines. To make a search more specific, SearchInform offers the "important words" function: the marked words are given priority relevance when the application conducts additional similarity analysis.


Articles


Corporate Search

Recent years have seen the rise of various full text search programs that search documents in various formats, information in DBMS and information systems, email messages and other data stored on the hard drive of a personal computer, in the local network of an enterprise, or in other sources of knowledge.

The need for such search engines stems from the ongoing growth of the volume of textual information available both to society as a whole and to each of its members. Until recently, full text search tools were aimed at the corporate sector (for home use, "direct search" with conventional browsing through each file was quite sufficient); now developers are also working to meet the needs of the ordinary user. After all, the volume of information has soared. Nevertheless, the priority direction in developing full text search technologies (apart from the Internet) remains the corporate sector.

The most important parameter of any full text search system is its speed. This applies both to indexing large amounts of data and to searching documents. Other highly important factors, of course, are the ability to work with various data sources, the list of supported file formats and additional functionality (support of morphology, synonyms and various types of search). However, as far as a given set of required functions is concerned, the overwhelming majority of competing information retrieval programs offer them all.

Corporate Search Engine

The problem of organizing data into a single database is partly solved with the help of DMS, CRM and special-purpose DBMS. However, the larger the enterprise and the more diverse its activity, the more complicated it is to process information from various sources. Documents on a disk, 1C, Oracle and various information systems: the list can go on forever. Archives of HTML pages, electronic correspondence and even ICQ logs have lately grown into a substantial "informational sector" that can easily be connected to the main data warehouses of any large company. An analysis of the diverse ways of entering and storing textual data reveals two major "dataware" problems: the unstructured nature of information and the difficulty of searching it. These problems are closely interrelated. Once you have a good full text search system that searches information in various sources, you can systematize the results you obtain to a considerable degree.

Where there is a problem, there is a solution: corporate information retrieval systems that work with various sources of knowledge, both on the user's computer and in the local network. Their main purpose is to perform quick and accurate search of documents in large data volumes. Such special-purpose information retrieval programs are the focus of this article. We will not digress into the search components built into various DMS, however splendid they may be. After all, there is no way to compare home cinemas with TV sets built into, say, refrigerators.

Indexing

Modern full text search technologies rest on two fundamental processes: indexing the available information, and processing queries with subsequent display of results. As regards indexing, any information retrieval application (be it a desktop full text search system, a corporate information system or an Internet search engine) creates its own search area. In other words, it processes documents and creates an index of them (an organized structure that contains information on the processed data). The search engine then uses this index to quickly produce the list of documents that match a query. The rest of the process is not that simple in terms of technology, but is easy for an ordinary user to understand. The information retrieval application processes the query (a key word or phrase) and displays the list of documents that contain it. Since the information is stored in a structured index, the query is processed much faster (tens or hundreds of times!) than with direct search, because documents are selected by analyzing the textual information in the index rather than by browsing through every file.
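The difference between direct search and index-based search can be illustrated with a minimal sketch of an inverted index. This is a generic textbook structure, not the internal format of any product in this review; the function names and sample documents are made up for illustration, and the query is treated as a set of words that must all be present rather than as an exact phrase.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Indexing step: map every word to the set of documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Query step: return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        # Only the index is consulted here; no file is reopened.
        result &= index.get(word, set())
    return result


# Hypothetical sample documents.
documents = {
    "report.txt": "Quarterly sales report for the corporate network",
    "memo.txt": "Memo about the sales meeting",
    "notes.txt": "Notes on full text search",
}
index = build_index(documents)
print(sorted(search(index, "sales report")))
```

The index is built once, and each query then costs a few dictionary lookups and set intersections instead of a pass over every file.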

The full text search application orders the found documents in the result list by relevance, i.e., by how well each document conforms to the query text. Different search technologies use different methods of searching and of determining relevance (the number of occurrences of a key word and how often it is mentioned in the document, the ratio of these parameters to the total number of words in the document, the distance between the words of the query phrase in the file, and so on). These parameters serve as the basis for determining the "weight" of the document, and this weight decides the position of the file in the list of results. In the case of Internet search the matter is even more complicated, since many other factors have to be taken into account (for example, Google's PageRank). However, this is the subject of a separate article, so we will leave the Internet alone for the time being.
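As a rough illustration of how such a "weight" might be computed, the sketch below combines two of the factors named above: the share of query words among all words of the document, and the smallest distance between two different query words. The formula and its equal weighting are assumptions chosen for readability, not the ranking method of any reviewed product.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def relevance(query, text):
    """Toy document weight: query-term density plus a proximity bonus."""
    words = tokenize(text)
    terms = set(tokenize(query))
    if not words or not terms:
        return 0.0
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    hits = sum(len(p) for p in positions.values())
    if hits == 0:
        return 0.0
    # Ratio of query-word occurrences to the total number of words.
    density = hits / len(words)
    # Distances between occurrences of two different query words.
    gaps = [abs(i - j)
            for a in terms for b in terms if a < b
            for i in positions[a] for j in positions[b]]
    proximity = 1.0 / min(gaps) if gaps else 0.0
    return density + proximity


def rank(query, documents):
    """Return the names of matching documents, most relevant first."""
    scored = [(relevance(query, text), name) for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical sample documents.
documents = {
    "a.txt": "full text search engine",
    "b.txt": "search for a file and read the text",
    "c.txt": "nothing relevant here",
}
print(rank("text search", documents))
```

In this sample, the document where the two query words stand next to each other outranks the one where they are far apart, and the document with no query words is dropped.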

Participants and Disposition

This review sets out to discover the fastest and smartest information retrieval systems currently available. Five software products have been selected for the search test: Google Desktop Search, Copernic Desktop Search, dtSearch 7.0, iSYS 7.0 and SearchInform 1.5.02. The test set comprised 20 gigabytes of textual information (documents in the doc, txt and html formats), including fiction extracts and various news articles from the Internet. The tests were run on a state-of-the-art office computer with an AMD Barton 2.5 MHz processor, 1 gigabyte of random access memory, a 160 gigabyte Seagate IDE hard drive (7200 rpm) and the Windows XP operating system.

dtSearch 7.0

   Developer: dtSearch Corp.
   Official site: www.dtsearch.com
   Price: $199
   Distribution package size: 23.1 Mb

dtSearch Desktop, a product of dtSearch Corp., comes with a built-in dtSearch Spider and can index and search not only files on a user's computer, but also Web sites (to a preset depth) and local network resources. It can also use external indexes created on other computers. As was to be expected, dtSearch recognizes various character sets, including Cyrillic, as well as a number of file formats, such as .doc, .xls, .rtf, .pdf, .html and so on. The application can also search data in databases, both across a whole database and by the contents of specific database fields.

In addition to conventional search in "natural language" or by means of formal queries, dtSearch offers several other types of search: morphological, fuzzy (tolerating possible errors and misprints), phonetic (matching similar sounding words) and synonym search. These are the declared capabilities, and our tests revealed no discrepancies with them.
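Fuzzy search of this kind is commonly built on an edit distance between the query word and the words of the index. The sketch below uses the classic Levenshtein distance; it shows the general idea only and is not a description of how dtSearch implements the feature. The vocabulary and the `max_errors` threshold are illustrative assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character
                            curr[j - 1] + 1,      # insert a character
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_matches(term, vocabulary, max_errors=1):
    """Return the indexed words within max_errors edits of the query term."""
    return [word for word in vocabulary
            if edit_distance(term, word) <= max_errors]


# Hypothetical list of words taken from an index.
vocabulary = ["search", "index", "document"]
print(fuzzy_matches("seach", vocabulary))
```

A misprint such as "seach" is one edit away from "search", so the indexed word is still found; raising `max_errors` tolerates more errors at the cost of more false matches.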

dtSearch Desktop 7.0 indexed the 20 gigabytes of test data in 6 hours 3 minutes, producing an 8.6 Gb index for subsequent search.

As regards document search per se, the full text search application made no blunders. The same proved true of morphological and fuzzy search. The system properly found all the required documents (though with a slight pause; after all, we are talking about 20 gigabytes), both for a simple one-word query and when a couple of paragraphs from a document were used as the key phrase. It should be noted, though, that when searching by a large text fragment (several dozen words) the system would "freeze" for a while before reporting the result.

The strengths of dtSearch Desktop 7.0

+ searching with account of morphology
+ searching with account of synonyms
+ fuzzy search
+ phonetic search
+ search within databases (via ODBC)
+ support of Outlook messages
+ support of various character sets
+ work in the local network
+ indexing Web pages at preset depth

The weaknesses of dtSearch Desktop 7.0

- inability to connect to sources of information other than DBMS and Outlook e-mail
- low speed of searching by a key phrase of over 50 words

iSYS 7.0

   Developer: iSYS Search
   Official site: www.isys-search.com
   Price: $570
   Distribution package size: 38.8 Mb

The iSYS company has been on the market for 16 years and has acquired over 10,000 customers. Since its foundation, the company has aimed its software at business users. The iSYS product range includes full text search programs for desktop computers, corporate networks and the Internet.

The corporate full text search system from iSYS is designed to provide fast and convenient search. Whether applied on a personal computer, on the Internet or in the corporate network of an enterprise, iSYS indexes data and searches documents using query statements and key phrases, just as Internet search engines do.

iSYS supports several query methods (Command Line Query, Menu-Assisted Query, Natural Language Query). It uses a document relevance algorithm and takes the linguistic peculiarities of the language into account, which makes possible such features as synonym search, fuzzy search (search with errors) and so on.

iSYS supports 125 file formats (including Microsoft Office documents, WordPerfect, email, PDF, XML, databases and so on) and 30 languages, including even Chinese, Japanese and Korean.

Indexing and processing the 20 gigabytes of information took iSYS 7.0 6 hours 13 minutes, a rather good result both in time and in the size of the created index: 7.9 Gb.

The slightly complicated method of searching with different query versions may strike a newcomer as inconvenient at first (for lack of experience). However, a closer look resolves all questions. The point is that the application refuses to search documents by a "long" query consisting of several words; this type of search is provided by additional features. Among the strengths of the application is its high-quality system of automatic document rubrication. As soon as indexing was complete, iSYS assigned all processed documents to the appropriate rubrics and presented them in a convenient form.

The strengths of iSYS Desktop 7.0

+ searching with account of synonyms
+ fuzzy search
+ support of various character sets
+ support of various query methods
+ heuristic analysis
+ support of various data sources (SQL, FTP, TRIM Context, WORLDOX 2002)
+ searching information in over 30 languages
+ a sophisticated system of automatic data rubrication
+ work in the local network

The weaknesses of iSYS Desktop 7.0

- absence of morphology support
- price

Google Desktop Search + GDE Enterprise

   Developer: Google
   Official site: www.desktop.google.com/enterprise
   Price: free
   Distribution package size with TweakGDS: 1.2 Mb

Google's free solution is intended for information retrieval on a personal computer, on the Internet and in the corporate network of an enterprise.

Google Desktop Search Enterprise can index and search documents in dozens of the most widespread text formats, as well as electronic mail, audio and video file tags, and images. Keep in mind that to tell the application which files and folders to index, you have to install an additional component, gdetweak. Without this add-on, Google Desktop Search Enterprise will index all the information it can access on the user's computer and in the network of the enterprise. Google Desktop Search processed the 20 gigabytes of text in 8 hours 17 minutes. The size of the resulting index was 4.5 Gb. The search speed is quite satisfactory and on a par with other widely acknowledged market participants.

Unlike iSYS and dtSearch, Google Desktop Search Enterprise can rightly boast of the most user-friendly interface. However, in administration and in setting up work in the local network it falls noticeably behind its competitors. It is quite complicated to set up network operation the way a particular situation requires, because the system tries to do everything for you. The only way to fine-tune the application is to install additional components, which is a major disadvantage. As a desktop full text search system, however, Google Desktop Search with the gdetweak component has no equal.

As a corporate search engine, though, it still has a long way to go. The promised search for documents with similar content (originally offered on the Internet as "similar pages") leaves much to be desired. Apparently, this is why it is not included in either the "global" desktop version or the network version.

The strengths of Google Desktop Search

+ searching with account of morphology
+ searching with account of synonyms
+ support of various character sets
+ a familiar Web interface
+ work in the local network (Enterprise version)
+ indexing electronic messages, audio and video file tags, and images
+ free of charge

The weaknesses of Google Desktop Search

- the structure of add-ons*

*Full-scale operation of the application requires downloading and installing a large number of additional modules. To tell the application which files and folders to index, you have to install an additional component, gdetweak. Without this add-on, Google Desktop Search will index all the information it can access on the user's computer and in the network of the enterprise. The same goes for the other features of this full text search tool, for example, support of archives.

Copernic Desktop Search

   Developer: Copernic
   Official site: www.copernic.com
   Price: free
   Distribution package size: 2.56 Mb

Copernic Desktop Search can search various files, email messages (supporting Outlook Express 5.x/6.x, Outlook 2000/XP/2003 and Windows Address Book), Word, Excel and PowerPoint documents, Acrobat PDF, music and video files, graphics and so on. In addition, the search can be performed both on a local computer and on the Internet. Built-in tools for viewing various files let you inspect the search results. For example, if you select the thumbnail of an HTML document in the main window, Copernic Desktop Search will display its contents. After installation, a small window appears in the Taskbar, where you can enter a search query and perform quick search set-up. The speed of the search engine deserves a separate mention, as does its low consumption of computer resources.

Copernic Desktop Search indexed the 20 gigabytes of text in 10 hours 51 minutes. The size of the resulting index was 7 Gb.

The strengths of Copernic Desktop Search

+ searching with account of morphology
+ an exceptionally user-friendly interface
+ indexing electronic messages, audio and video file tags, and images
+ processing Microsoft Outlook and Microsoft Outlook Express electronic messages
+ free of charge

The weaknesses of Copernic Desktop Search

- absence of a built-in document viewer
- absence of network support

SearchInform 1.5

   Developer: SoftInform Ltd.
   Official site: www.searchinform.com
   Price: $199.95
   Distribution package size: 15 Mb

Last in the list but far from last in efficiency, the SearchInform full text search tool is presented by the SoftInform Company. SearchInform Desktop 1.5 indexed the 20 gigabytes of test data in record time: 3 hours and 17 minutes. The resulting index was also the smallest of all, at 4.4 Gb.

The search engine from the SoftInform Company was developed on the basis of a patented technology for searching documents of similar content, SoftInform Search Technology. It incorporates all the tools necessary for structuring scattered information within an enterprise and is an efficient solution to the problems of searching and consolidating information.

The high indexing rate (up to 6 Gb/hour), the small size of the index (15-20% of the actual volume of textual information), support of virtually all widespread text file formats (including .pdf and .html) and correct work with archives all come in one package.

Add a minor but extremely useful feature of SearchInform, Smart Indexing, which tracks processor load in real time and adjusts the consumption of system resources during indexing, and SoftInform rightly takes the palm of supremacy.

In addition, the process of indexing (unlike in the other information retrieval programs in the review) is very visual: it shows not only the speed, but also the number of processed documents and the number of unique words by which the search will be performed.

SearchInform Corporate proved to be an incontestable leader in search speed as well. The 20 gigabytes were a piece of cake for the search engine: it paused only after the first query, and the rest of the searches were completed in an instant. The relevance of the search results was irreproachable.

On top of that, SearchInform Corporate, built on the unique SoftInform Search Technology, offers a highly interesting feature: search for documents whose content is similar to the query text. There is no need to select key words beforehand, because the whole text is used for the search. The result is a list of the documents most similar to the query text fragment, each with its relevance ratio.
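The general idea of searching by a whole text fragment rather than by selected key words can be sketched with a standard technique: cosine similarity between word-count vectors. This is explicitly not the patented SoftInform model, whose details are not given here; it is a generic stand-in, and the function names and sample texts are assumptions.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Turn a text into a bag of lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(fragment, documents):
    """Rank documents by similarity to the whole query fragment."""
    query = vectorize(fragment)
    scored = [(cosine_similarity(query, vectorize(text)), name)
              for name, text in documents.items()]
    return sorted(scored, reverse=True)


# Hypothetical sample documents.
documents = {
    "contract.txt": "The supplier delivers the goods within thirty days",
    "offer.txt": "The supplier offers a discount on the goods",
    "recipe.txt": "Mix flour and water and bake for thirty minutes",
}
fragment = "goods delivered by the supplier within thirty days"
for score, name in most_similar(fragment, documents):
    print(f"{name}: {score:.2f}")
```

The score plays the role of a relevance ratio: the whole fragment is compared with each document, so the user never has to choose key words.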

The strengths of SearchInform Desktop 1.5

+ searching with account of morphology
+ searching with account of synonyms
+ fuzzy search
+ Important words function for pinpointing the search
+ indexing Outlook and TheBat! electronic messages
+ search by attributes
+ rubricator + automatic rubrication of documents
+ support of various sources of information (DBMS, DMS, CRM, and so on)
+ network operation (the Corporate version) on the basis of NTFS inheritance of Windows authentication
+ the speed of searching and indexing
+ searching documents with a similar content*

The weaknesses of SearchInform Desktop 1.5

- problems with protected PDF documents

*This full text search technology is based on a mathematical model of document structure analysis and selection of similar words, phrases and text arrays. The search result is a list of the documents most similar to the query text fragment, each with its relevance ratio. Unlike standard phrase search, SoftInform Search Technology removes the need to select key words beforehand, which reduces the duration of a "search session" to the minimum. At present this convenient and much-needed feature is offered by this search engine only.

Comparison of Indexing Speed

The 20 gigabytes of information were indexed on a computer with the following configuration: AMD Barton 2.5 MHz, 1 gigabyte of random access memory, a 160 gigabyte Seagate IDE hard drive (7200 rpm) and Windows XP SP2.

Search system              Indexing duration      Index size
dtSearch 7.0               6 hours 3 minutes      8.6 Gb
iSYS Desktop 7.0           6 hours 13 minutes     7.9 Gb
Google Desktop Search      8 hours 17 minutes     4.5 Gb
Copernic Desktop Search    10 hours 51 minutes    7 Gb
SearchInform 1.5.02        3 hours 17 minutes     4.4 Gb
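To compare the results on a common scale, the figures from the table can be converted into an indexing rate (Gb per hour) and an index size relative to the 20 Gb test set. The short script below is only a convenience calculation over the published numbers; the function name is arbitrary.

```python
SOURCE_GB = 20  # size of the test collection used in this review

# (hours, minutes, index size in Gb), copied from the comparison table
results = {
    "dtSearch 7.0": (6, 3, 8.6),
    "iSYS Desktop 7.0": (6, 13, 7.9),
    "Google Desktop Search": (8, 17, 4.5),
    "Copernic Desktop Search": (10, 51, 7.0),
    "SearchInform 1.5.02": (3, 17, 4.4),
}


def summarize(hours, minutes, index_gb, source_gb=SOURCE_GB):
    """Return (indexing rate in Gb/hour, index size as a share of the source)."""
    duration_hours = hours + minutes / 60
    return source_gb / duration_hours, index_gb / source_gb


for name, row in results.items():
    rate, ratio = summarize(*row)
    print(f"{name:<24} {rate:5.2f} Gb/hour   index = {ratio:.0%} of source")
```

For SearchInform this gives roughly 6 Gb per hour, in line with the indexing rate quoted earlier, and an index of about 22 percent of the source volume.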

Summary

Close scrutiny of the functionality and speed of the search engines brings us to a difficult decision. It turned out that the new solution from the Russian company SoftInform works much faster and more efficiently than its time-tested Western counterparts. However…

The well-promoted and absolutely free GDS Enterprise can be fine-tuned and extended with additional features only by installing plug-ins. This is how support of archives is implemented. In addition, to enjoy all the features of this information retrieval system in full operation, the developers recommend that you acquire Premium Support, which costs "next to nothing": $10000 a year for every 1000 users. Without well-paid experts, deploying a full-fledged enterprise information retrieval system based on Google is not quite impossible, but extremely difficult. Therefore, in view of the rather satisfactory speed of the application and its user-friendly IE-like interface, we do it justice by labeling it a great "desktop" full text search tool, and give Google its due for attempting to put into practice the dream of Bill Gates, namely to come into every home. It is excellent branding, isn't it?

The tests revealed the main rivals, if we may call them so: the two established products dtSearch and iSYS, and the new solution SearchInform, developed by the Russian company SoftInform. These systems offer the ability to connect to third-party sources of knowledge such as databases, high speed of indexing and searching, and advanced search features.

In addition to the highest speed of indexing and searching documents and the unique feature of searching for documents with similar content, SearchInform Corporate can act as a search engine that consolidates information within the whole enterprise. This full text search system can process not only documents on a computer disk or in the network of an enterprise, but also other data sources, such as CRM, DMS, DBMS based on MS SQL and so on. Of the applications in this review, SearchInform Corporate is the only one that can solve both of the most burning problems of enterprise dataware: searching documents and consolidating knowledge into a single, convenient system.
