Overview of recent classification projects.
1. Information retrieval
Ilse Media is the most popular Internet search engine in the Netherlands after Google. Ilse uses Irion’s TwentyOne Language Expansion Box to let web users make linguistic searches. Language Expansion Box is a special combination of Irion modules that gives users meaningful linguistic feedback on the queries they type in; it can search for synonyms, hyponyms, hyperonyms, and morpho-syntactic variations of query terms. If a query doesn’t yield immediate results, the system suggests other queries that are strongly related to what the user is asking for. This is an intelligent way to expand queries according to the semantic areas they belong to. Ilse Media also uses Irion’s automatic classification system, TwentyOne Classify, to classify queries automatically along business categories: if a user types in a query, the system comes back with suggestions for relevant business areas. In this way users can be pointed to relevant information, and the system also allows for query-specific advertisement generation. Unlike Language Expansion Box, this type of classification is fully customizable for any domain by the publisher. A comparison: a query like tuna might lead to saltwater fish in a semantic network, and to famous fish restaurants in a customized classification setting. A query like Britney Spears would lead to nothing in a semantic network, but to online record shops in a customized classification setting. Ilse Media wants to use both classification types.
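The difference between the two approaches can be illustrated with a minimal Python sketch. It is not Irion’s implementation: the lookup tables, the function names expand_query and classify_query, and all entries are invented for illustration, and a real system would use a full semantic network and a trained classifier rather than dictionaries.

```python
# Toy data, invented for illustration only.
SEMANTIC_NETWORK = {
    "tuna": {"synonyms": ["tunny"], "hyperonyms": ["saltwater fish"]},
}
CUSTOM_CATEGORIES = {
    "tuna": ["fish restaurants"],
    "britney spears": ["online record shops"],
}


def expand_query(query):
    """Return related terms from the semantic network (empty if unknown)."""
    entry = SEMANTIC_NETWORK.get(query.lower().strip(), {})
    terms = []
    for relation in ("synonyms", "hyperonyms", "hyponyms"):
        terms.extend(entry.get(relation, []))
    return terms


def classify_query(query):
    """Return publisher-defined business categories for a query."""
    return CUSTOM_CATEGORIES.get(query.lower().strip(), [])


if __name__ == "__main__":
    for q in ("tuna", "Britney Spears"):
        print(q, "| expansion:", expand_query(q), "| categories:", classify_query(q))
```

In this sketch the name of a pop star is unknown to the semantic network, so expansion returns nothing, while the publisher-defined table still maps it to a business category.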
Gouden Gids, the Dutch “Yellow Pages”, is the most widely used business address reference guide in the Netherlands and lists all businesses and institutions in the country, organized by subject and category. Customers and businesses can use it to find each other quickly and efficiently. The guide is available online and in hard copy. Gouden Gids is improving its existing search functionality with intelligent search functions that cater for spelling variants, abbreviations (e.g. V&D), typos and synonyms. To achieve this, they lease Irion’s Language Expansion Box, similar to the Irion product used by Ilse Media. In addition, Irion built an automatic classification system that classifies both company descriptions and queries according to the Gouden Gids headings list (organization types).
Kluwer is a large Dutch publishing company that specializes in high-quality legal and financial publications. Kluwer SU Internet Center, a separate department of Kluwer, has carried out several experiments on legal data to test the quality and potential of Irion’s TwentyOne Classify for the automatic classification of legal documents. These experiments have led to the insight that very high accuracy scores can be achieved with automatic classification, and after a short period of training Kluwer’s department “Binnenlands Bestuur” is now using TwentyOne Classify for the automatic classification of its documents. The aim is to link customer profiles to information retrieval in order to achieve personalized information delivery.
TNO Work and Employment
TNO Work and Employment is one of the larger TNO institutes in the Netherlands. They are currently working for the Dutch government to build a new version of the web portal for work-related health issues. Irion was chosen to provide the new cross-lingual search engine TwentyOne Search for this web site. Apart from being the first real cross-lingual search engine on the web in Europe, the search engine has its retrieval functionality enhanced with a classification system that automatically classifies the documents to be indexed according to three different classification schemes. This gives the user guided navigation: the user first types in a query, and then cuts down the results list by selecting one or more of the relevant categories from the three classification schemes.
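The guided navigation described above can be sketched in Python as a query followed by category filtering. This is only an illustration under stated assumptions: the Document class, the scheme names "topic" and "sector", the sample documents, and the rule that a document must match at least one selected category in every scheme the user filters on are all hypothetical, and the substring match stands in for a real search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    text: str
    # scheme name -> set of categories assigned at indexing time
    categories: dict = field(default_factory=dict)


def search(docs, query):
    """Naive stand-in for a search engine: case-insensitive substring match."""
    q = query.lower()
    return [d for d in docs if q in d.text.lower()]


def narrow(results, selections):
    """Keep documents that carry at least one selected category
    in every scheme the user filtered on."""
    kept = []
    for doc in results:
        if all(doc.categories.get(scheme, set()) & wanted
               for scheme, wanted in selections.items()):
            kept.append(doc)
    return kept


# Invented sample documents.
DOCS = [
    Document("Noise at work", "Guidance on noise exposure in factories",
             {"topic": {"noise"}, "sector": {"industry"}}),
    Document("Office noise", "Noise and concentration in open-plan offices",
             {"topic": {"noise"}, "sector": {"office"}}),
    Document("Lifting", "Safe lifting techniques",
             {"topic": {"ergonomics"}, "sector": {"industry"}}),
]

if __name__ == "__main__":
    hits = search(DOCS, "noise")
    print([d.title for d in narrow(hits, {"sector": {"industry"}})])
```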
MarketXS was founded with a vision to radically change existing concepts of market data technology. MarketXS pioneered the use of a software-as-a-service model to provide access to market data for financial institutions and their employees and customers, demonstrating how the service would render conventional on-site systems obsolete. The service delivers integrated and scalable market data solutions designed from the ground up to take full advantage of the Internet and to provide customers with rapid deployment and low total cost of ownership. Customers include financial websites, banks, insurance companies, securities dealers, vendors and corporate clients. Apart from using Irion’s automatic summary generator, MarketXS also implemented Irion’s automatic classification system TwentyOne Classify to automatically add metatags taken from different classification schemes (such as Dow Jones, Industry Codes, etc.) to their news feeds.
Automatic classification at a large government organization
Due to a strict confidentiality agreement we cannot reveal the name of this customer. Our customer wanted to improve the search functionality of their corporate web site, by adding automated classification to the content, based on TwentyOne Classify. The amount of information was very large, and no training data were available. As a further challenge, the whole project had to be completed within a few days. Irion used special clustering technology to divide the whole database rapidly into significant clusters. These clusters were subsequently used to train the system, which could be done in just a couple of hours. In this way Irion succeeded in completing a training job that would normally take three weeks of manual work by experts within just one day!
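The idea of clustering unlabeled content first and then using the clusters as training data can be sketched as follows. The source does not say which clustering technique Irion used; the greedy single-pass clustering with Jaccard similarity below is a swapped-in stand-in, and the function names, the threshold, and the sample texts are assumptions made for illustration.

```python
def tokens(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts, threshold=0.3):
    """Greedy single-pass clustering: a text joins the first cluster whose
    seed text is similar enough, otherwise it starts a new cluster."""
    seeds = []   # token set of the first text in each cluster
    labels = []  # cluster index for each input text
    for text in texts:
        t = tokens(text)
        for idx, seed in enumerate(seeds):
            if jaccard(t, seed) >= threshold:
                labels.append(idx)
                break
        else:
            seeds.append(t)
            labels.append(len(seeds) - 1)
    return labels


# Invented sample content.
TEXTS = [
    "tax return form deadline",
    "tax return deadline extension",
    "passport renewal application",
    "passport application fee",
]

if __name__ == "__main__":
    # The (text, cluster label) pairs can then serve as training examples.
    training_pairs = list(zip(TEXTS, cluster(TEXTS)))
    print(training_pairs)
```

In practice an expert would still review and name the clusters before they are used for training; the sketch only shows why this is faster than labeling every document by hand.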
PCM is the largest newspaper publisher in the Netherlands. PCM publishes many news articles every day in various newspapers, covering a wide range of subjects. All these articles need to be archived and classified for improved retrieval using preselection on metatags. Articles have always been classified manually by specially trained experts (documentalists). In September 2003 PCM started a pilot project by leasing TwentyOne Classify, which allows them to classify some 400 newspaper articles per minute fully automatically. TwentyOne Classify Server has a special client application that lets PCM employees launch a web browser and gain easy access to databases of newspaper articles from anywhere in the company. Newspaper articles can be classified fully automatically or interactively, in which case TwentyOne Classify provides suggestions for thesaurus classes that users can accept or reject. The PCM pilot project was implemented in the workflow in just one day.
2. Classification for guided navigation
D-Reizen is an important brand name in the travel industry in the Netherlands. They have offices all over the Netherlands and a very frequently visited website, and are seen by many people as the travel agency with the best bargains. D-Reizen wants to expand its service to customers, giving them exactly what they’re looking for with less hassle and shorter waiting times. To achieve this, they are currently implementing TwentyOne Dialogue. This is a special edition of the classification engine, which not only classifies documents automatically, but also allows end users to navigate easily through large numbers of documents in an interactive, natural-language-driven dialogue.
3. Classification for reasoning and intelligent knowledge management
Intrasurance is one of the fastest-growing online insurance brokers in the Netherlands (see www.verzekeruzelf.nl). They are implementing a special edition of Irion’s classification engine that enables knowledge disclosure and knowledge sharing by intelligently analyzing textual documents. Insurance policies are complex products that need proper, personalized and dedicated customer support. The problem is that there are so many different cases that it is difficult to cluster them in a way that lets non-specialized staff at a help desk give the right support. Intrasurance is now training the system to recognize the type of request from a customer automatically and to generate the best possible answer or information bundle. The customer support person at the help desk can use these automatically generated bundles as the basis for an answer without having to be a specialist in the matter. In this way Intrasurance not only gives much better support to its customers, but also trains its own help desk employees to gain a better knowledge of the insurance domain. Last but not least, Intrasurance builds up a very valuable knowledge base that can be used independently of specialists, so that its dependence on specialists steadily decreases and knowledge becomes more transferable and manageable.
Provincie Gelderland is a large regional Dutch government organization. They are setting up an interactive website for people interested in environmental issues. This site will bring experts together and enable them to exchange information and knowledge freely and easily. An important role for the site will be to link people and connect documents by topic, and to provide visitors with up-to-date, relevant and complete information. Provincie Gelderland was looking for tools to classify and summarize textual information automatically, and they are now using both TwentyOne Classify and the summarization system Sinope. TwentyOne Classify is used in particular to connect people who share an interest in particular domains. This works as follows. When a specialist enters a document about a specific topic, the document is automatically summarized and classified, and both the assigned classes and the summary are presented to the specialist for post-editing. After post-editing (which the specialist can skip if the automatic processes were performed correctly), the document is stored in the knowledge base, and the profile of every person linked to the forum is weighed against the profile of the document. If the subject of the document is close to a user’s fields of interest, the user automatically receives an e-mail with a link to the new document, a summary, and a relevance percentage. In this way Provincie Gelderland can automatically share documents and knowledge and mobilize people who share an interest.
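The step of weighing user profiles against a new document and reporting a relevance percentage can be sketched in Python. The source does not describe how the score is computed; cosine similarity over word counts is used here purely as an assumed stand-in, and the function names, the 30 percent threshold, and the sample profiles are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts for a text."""
    return Counter(text.lower().split())


def relevance(doc_vec, profile_vec):
    """Cosine similarity between two term-count vectors, as a percentage."""
    dot = sum(doc_vec[t] * profile_vec[t] for t in doc_vec)
    norm = (math.sqrt(sum(v * v for v in doc_vec.values()))
            * math.sqrt(sum(v * v for v in profile_vec.values())))
    return 100.0 * dot / norm if norm else 0.0


def recipients(document, profiles, threshold=30.0):
    """Return (user, relevance percentage) for users whose interests
    are close enough to the document to be notified."""
    doc_vec = vectorize(document)
    hits = []
    for user, interests in profiles.items():
        score = relevance(doc_vec, vectorize(interests))
        if score >= threshold:
            hits.append((user, round(score)))
    return hits


if __name__ == "__main__":
    # Invented users and interest profiles.
    profiles = {"anna": "soil pollution", "ben": "wind energy"}
    print(recipients("soil pollution near river", profiles))
```

Each returned pair would correspond to one notification e-mail containing the document link, the summary, and the percentage.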
Syntens is a large Dutch organization whose objective is to stimulate innovation among Dutch SMEs. They employ two hundred specialized consultants who work in various branches of industry. Syntens needs to understand trends in the demand for information and knowledge in industry. Syntens uses Irion’s product TwentyOne Concepts (Taxon version) to generate taxonomies automatically from textual data, which, combined with statistical analysis, clearly show such trends.
4. Classification for filtering
Automatic filtering at a multinational oil company
Due to a strict confidentiality agreement we cannot reveal the name of this customer. Our customer wanted a system that could automatically filter from their employees’ notebooks any information that could be judged insulting or obstructive if examined by the authorities of countries the employees would visit. In cooperation with a partner, Irion developed a CD-ROM based on TwentyOne Classify filtering technology; in operation it is very similar to a virus scanner, but instead of scanning for viruses it scans documents for potentially insulting or suspicious material.
5. Classification for routing
Berghauser Pont is an information broker for government information, which it provides to its customers via the Internet. Berghauser Pont uses Irion’s service TwentyOne Search Portal to make this information searchable over the web. They also use TwentyOne Classify to classify the information automatically and provide it in a structured way through thematic entries.
Jacobs Bedrijf is one of the most important business intelligence companies in the Netherlands, providing solutions for the strategic use of information and ICT. Jacobs Bedrijf is now expanding its services with Irion’s advanced automatic classification solutions, using classification schemes that conform to the internationally standardized IPTC (International Press Telecommunications Council) codes.
V4M (www.v4m.nl) is a joint venture of Gouden Gids and Ilse Media that specializes in sponsored link marketing. For them Irion carried out a project to determine how valuable keywords, i.e. keywords that are typed in frequently and should lead users to specific domains, can be generated automatically from these domains and related semantic networks using natural language technology.
Bouwradius is an important provider of information for the building and construction industry. One problem for information brokers in this field is that the information spectrum is very wide, and users searching for information often have difficulty finding what they are looking for. The Irion partner Aduna has developed a semantic web portal system called “Spectacle” that allows users to personalize the information structure (the taxonomy) of a portal themselves. This requires adequate pre-classification of all portal content, which is done using Irion’s product TwentyOne Classify. Aduna has successfully integrated Spectacle with TwentyOne Classify and is now marketing this combination.