DataScava’s fast, effective solutions help companies leverage the explosive growth of digital data, an estimated 90% of which will be unstructured by 2022.
Our patented “Weighted Topic Scoring” methodology finds the most relevant documents from large data sets, providing highly precise results you can see, control and measure.
Proprietary Non-NLP Parser
Our data-agnostic parser indexes, searches, scores and matches unstructured textual data points using domain-specific language that you control, rather than generalized Natural Language Processing and semantic libraries.
Because it operates on a principle different from AI, it serves as a unique adjunct to AI environments, filtering input and measuring output. In addition, it uses a top-down analysis approach that informs the data filter with corpus-level statistics, as opposed to AI systems, which begin at the word level.
DataScava can provide a true competitive advantage to companies in finance, technology, healthcare, insurance, manufacturing, the public sector or any other industry seeking to maximize the transformative power of data.
Weighted Topic Scoring
DataScava is the first and only product to offer “Weighted Topic Scoring,” which lets users select all topics of interest, weight their significance, adjust them or create new ones on the fly, and rank the output.
You can home in further with multi-level sorting on specific data criteria to bring key results quickly to the top. Search templates, editable topic libraries, percentile rankings and a “not” capability support drill-downs to the precise data you need.
Real-time scoring, visualization and highlighting of data points empower you to draw better insights and make more accurate business decisions.
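To make the idea concrete, here is a minimal sketch of weighted topic scoring in Python. The topic names, keywords and weights are hypothetical examples, and this is an illustration of the general technique, not DataScava’s actual implementation: each user-defined topic carries a weight and a keyword list, a document’s score is the weighted sum of keyword hits normalized by document length, and documents are ranked by that score.

```python
# Illustrative sketch of weighted topic scoring, NOT DataScava's
# actual implementation. Topics, keywords and weights are
# hypothetical and fully user-defined.
import re

# Hypothetical user-defined topic library: topic -> (weight, keywords)
TOPICS = {
    "risk":       (3.0, {"exposure", "hedge", "volatility"}),
    "compliance": (2.0, {"audit", "regulation", "kyc"}),
}

def score_document(text: str) -> float:
    # Weighted sum of topic-keyword hits, normalized by length.
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    total = 0.0
    for weight, keywords in TOPICS.values():
        hits = sum(1 for w in words if w in keywords)
        total += weight * hits
    return total / len(words)

def rank(docs: dict[str, str]) -> list[tuple[str, float]]:
    # Rank documents by descending weighted topic score.
    return sorted(((name, score_document(t)) for name, t in docs.items()),
                  key=lambda p: p[1], reverse=True)
```

Because the weights and keyword lists live in plain user-editable data, a subject-matter expert can adjust a topic’s significance and immediately see the ranking change, which is the essence of the white-box approach described below.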
Why We Don’t Use Semantic Search/NLP
Ramanathan V. Guha is responsible for products such as Google Custom Search. In their paper on Semantic Search, he and his colleagues distinguished between two major forms of search, navigational and research:
“Before getting into the details of how the Semantic Web can contribute to search, we need to distinguish between two very different kinds of searches.
In navigational search, the user is using the search engine as a navigation tool to navigate to a particular intended document. In this class of searches, the user provides the search engine a phrase or combination of words which s/he expects to find in the documents. There is no straightforward, reasonable interpretation of these words as denoting a concept. In such cases, the user is using the search engine as a navigation tool to navigate to a particular intended document. We are not interested in this class of searches.
In research search, the user provides the search engine with a phrase which is intended to denote an object about which the user is trying to gather/research information. There is no particular document which the user knows about and is trying to get to. Rather, the user is trying to locate a number of documents which together will provide the desired information. Semantic search lends itself well to this approach, which is closely related to exploratory search.”
The GitHub article “What is Semantic Search?” references Guha’s work and discusses why Semantic Search is not suited to navigational search.
NLP and Analyzing Documents
NLP and Semantic Search attempt to disambiguate language, but proponents with real-world experience will acknowledge that NLP parsers work at the sentence level at best.
AI can take it a step further, but it doesn’t summarize an entire document or provide any means to compare, measure or filter the output so users can view and adjust it in normal use.
In addition, business use cases for AI solutions for large bodies of textual data largely focus on routing documents to a particular destination and/or taking some action based on their contents (initiate a process, set an alert, send an email). This requires summarizing the document overall.
DataScava does this in a usable, numerical form for routing purposes or to trigger an action, and the process is visible to and adjustable by users.
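The routing-and-action pattern described above can be sketched in a few lines. The thresholds, destinations and the assumption that scores are percentile rankings in [0, 100] are all hypothetical, chosen only to illustrate how a numeric, document-level score makes routing decisions both automatable and inspectable:

```python
# Hypothetical routing sketch: act on a document-level numeric score.
# Thresholds and destination names are illustrative assumptions,
# not part of DataScava.

def route(doc_id: str, score: float) -> str:
    # score is assumed to be a percentile ranking in [0, 100].
    if score >= 90:
        return f"alert:{doc_id}"    # e.g. notify an analyst immediately
    if score >= 50:
        return f"queue:{doc_id}"    # e.g. enqueue for human review
    return f"archive:{doc_id}"      # below threshold: archive
```

Because the decision is a plain comparison against visible thresholds, users can see exactly why a document was routed where it was and adjust the cutoffs themselves.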
Impact the Bottom Line
Ease of use and transparency make DataScava accessible to both technical and non-technical people, enabling cross-discipline collaboration and a rapid path to efficiency and productivity.
It empowers data scientists, data analysts, software engineers and business users alike by automating time-consuming unstructured textual data preparation and mining tasks, positively impacting the bottom line.
As you tune the system using your subject matter expertise and set automatic data filters that can work for you around the clock, its capabilities grow and provide measurable benefits you can count on.
A Patented White-Box Approach
DataScava keeps the Human in Command using a white-box approach and is fully customizable.
Built upon our U.S. patents in “Profile Matching of Unstructured Data,” DataParser, DataIndexer, DataScorer, DataSearcher and DataMatcher convert unstructured textual content into precisely structured data for your use.
Unlike NLP systems, DataScava uses the jargon of your industry, not general linguistic and semantic libraries. You can create searchable document indices based on your own custom topics and their associated keywords and define weighted search criteria that you control.
Garbage In, Garbage Out
In AI, Machine Learning and data-driven systems, the quality of the output depends on the quality of the input. With bad data, applications produce results that are inaccurate, incomplete or incoherent.
By extracting precise user-defined business information from unstructured textual data, DataScava can help ensure input data is high quality, relevant and useful to your business.
It can also serve to measure relevant output to help you do a reality check on your assumptions.
DataScava’s open architecture makes it simple to connect and share data via SQL or the REST API in event-driven models. Whether used on its own or integrated with your existing business applications, the platform is highly customizable.
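As a sketch of the event-driven REST pattern, the snippet below builds a JSON “document scored” event ready to post to a downstream system. The endpoint URL, event name and payload fields are assumptions for illustration, not DataScava’s actual API:

```python
# Hypothetical integration sketch: packaging a scored document as a
# REST event for a downstream system. The endpoint, event name and
# payload fields are assumptions, not DataScava's API.
import json
from urllib.request import Request

def make_score_event(doc_id: str, topic_scores: dict[str, float],
                     endpoint: str = "https://example.com/events") -> Request:
    # Build a POST request carrying the scores as JSON; the caller
    # would send it with urllib.request.urlopen(req).
    body = json.dumps({
        "event": "document.scored",   # hypothetical event name
        "doc_id": doc_id,
        "scores": topic_scores,
    }).encode("utf-8")
    return Request(endpoint, data=body,
                   headers={"Content-Type": "application/json"},
                   method="POST")
```

In an event-driven model, such a message would be emitted each time a document is scored, letting other business applications react without polling the data store.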
From the start, data sources you designate are accumulated in the data store, to be indexed and re-indexed as you adjust and improve the model you use.
Users identify the principal free-form inputs for the system—such as business reports, reference data, surveys, news, journals or research papers—and also the desired outputs of data for use in other platforms.
DataScava Powers TalentBrowser
DataScava’s story is backed by real results: it powers TalentBrowser, the industry’s only Automated Job Matching, Skills Assessment and Non-NLP Search Engine.
For more than a decade, management consultants, investment banks, Fortune 500 firms, staffing companies and others have benefited from TalentBrowser’s ability to find exceptional talent for jobs.