Real-time mining of unstructured textual content isn’t simple. To work effectively, a solution must be fine-tuned to meet your organization’s specific needs and address the quirks in your company’s information.

To add value, applications require a vocabulary that accurately captures the definitions, context, and nuance of your business and the way it uses language. To work properly, data-driven systems require a tremendous amount of standardized, labeled and otherwise “structured” data.


Domain-Specific Language Processing and Weighted Topic Scoring

DataScava helps bridge those gaps by ensuring that input is relevant, which improves the quality of results and reduces the risk of inappropriate analysis and badly informed decisions. At the same time, it makes your business and data teams more efficient.

Domain-Specific Language Processing (DSLP) and patented Weighted Topic Scoring (WTS) leverage the user’s own subject matter expertise to extract highly precise, domain-specific information and present it in your business’s context.

Because they make unstructured data more accessible, more understandable and, above all, more useful, they are a powerful alternative or adjunct to NLP, NLU, Semantic and Boolean Search.



Why We Don’t Use Natural Language Processing or Semantics

NLP and semantic technologies are powerful and versatile, but proponents with real-world experience will acknowledge that NLP is hard. In general, NLP is useful for relatively simple tasks, such as automating a phone attendant’s call routing or a chatbot’s responses.

AI can take it a step further, but it certainly doesn’t summarize an entire document, nor does it give users a means to compare, measure and filter results, or to view and adjust its output in normal use.

In addition, business use cases for applying AI to large bodies of text largely focus on routing documents to a particular destination and/or taking some action based on their contents (initiating a process, setting an alert, sending an email). This requires summarizing each document as a whole.

DataScava works at the document level, summarizing textual content in a usable, numerical form for routing purposes or to trigger an action using a process that is adjustable by users.
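To make the idea of a document-level numerical summary more concrete, here is a minimal Python sketch of generic weighted keyword scoring with threshold-based routing. It is an illustrative assumption, not DataScava’s patented Weighted Topic Scoring: the `Topic` class, the `score_document` and `route` functions, and the topic names, weights, thresholds and destinations are all hypothetical. Each topic is a user-defined set of weighted terms and phrases. A document’s score for a topic is the sum of each weight times the number of times its term appears, and any topic whose score meets a user-set threshold sends the document to a destination.

```python
import re
from dataclasses import dataclass


@dataclass
class Topic:
    """Hypothetical user-defined topic: terms/phrases mapped to weights.
    Illustrative only; not DataScava's patented WTS method. Requires Python 3.9+."""
    name: str
    weights: dict[str, float]


def count_occurrences(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word matches of a term or phrase."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def score_document(text: str, topics: list[Topic]) -> dict[str, float]:
    """Summarize a whole document as one numeric score per topic."""
    return {
        topic.name: sum(weight * count_occurrences(text, phrase)
                        for phrase, weight in topic.weights.items())
        for topic in topics
    }


def route(scores: dict[str, float],
          rules: dict[str, tuple[float, str]]) -> list[str]:
    """Return each destination whose topic score meets its user-set threshold."""
    return [destination
            for topic, (threshold, destination) in rules.items()
            if scores.get(topic, 0.0) >= threshold]


# Hypothetical topics, weights and routing rules a user might define.
topics = [
    Topic("credit_risk", {"default": 3.0, "credit spread": 2.0, "downgrade": 2.5}),
    Topic("m_and_a", {"acquisition": 3.0, "merger": 3.0, "tender offer": 2.0}),
]
rules = {"credit_risk": (5.0, "risk-desk"), "m_and_a": (4.0, "deals-team")}

doc = ("Analysts flagged a possible downgrade after the acquisition; "
       "default risk and the credit spread widened.")

scores = score_document(doc, topics)
print(scores)                # {'credit_risk': 7.5, 'm_and_a': 3.0}
print(route(scores, rules))  # ['risk-desk']
```

Because the weights and thresholds in this sketch are plain data, a user can reweight a term or raise a threshold and immediately see how the scores and routing decisions change.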


Navigational vs. Research Search

DataScava excels at navigational search, where users want to “navigate” to the most relevant documents overall. It does this with scored topics, keywords and phrases that stay in context, visible and under your control.

In his paper on Semantic Search, Ramanathan V. Guha, who was responsible for products such as Google Custom Search, distinguished between two very different kinds of searches, navigational and research, as follows:

  • Navigational Search: the user is using the search engine as a navigation tool to navigate to a particular intended document. In this class of searches, the user provides the search engine a phrase or combination of words which s/he expects to find in the documents. There is no straightforward, reasonable interpretation of these words as denoting a concept. In such cases, the user is using the search engine as a navigation tool to navigate to a particular intended document. We are not interested in this class of searches.
  • Research Search: the user provides the search engine with a phrase that is intended to denote an object about which the user is trying to gather/research information. There is no particular document which the user knows about and is trying to get to. Rather, the user is trying to locate a number of documents which together will provide the desired information. Semantic search lends itself well with this approach that is closely related with exploratory search.

The GitHub article “What is Semantic Search?” references Guha’s work and discusses why semantic search is not well suited to navigational search.



Open Architecture and Highly Customizable Platform

DataScava’s open architecture makes it simple to connect and share data via SQL or the REST API in event-driven models. Whether used on its own or integrated with your existing business applications, the platform is highly customizable.

From the start, data sources you designate are accumulated in the data store, to be indexed and re-indexed as you adjust and improve the model you use.

Users identify the system’s principal free-form inputs, such as business reports, reference data, surveys, news, journals or research papers, as well as the data outputs they want to feed into other platforms.
