An Unstructured Data Miner that keeps the Human in Command

Classify, index, measure, label, curate, filter, and route unstructured text automatically in real time using your business language, your domain expertise, and our patented matching technology. Get the high-value data you need for use with applications in AI, ML, RPA, BI, Talent, Research, and Operations.

For Data Scientists, Data Analysts, BI Specialists, Subject Matter Experts, Business Analysts, and IT. 

“DataScava perfectly complements existing approaches to unlocking the value of unstructured text data – by helping companies to model higher-level intents and purposes behind the labeling and classification of data – by capturing the abstract topics and themes that represent their own business and subject matter expertise – and by applying both to big data sets real-time.”

– Scott Spangler, Chief Data Scientist, IBM Distinguished Engineer, Author
“Mining the Talk: Unlocking the Business Value in Unstructured Information”




Accurate. Fast. Collaborative.

DataScava enables a more data-centric approach to business applications, with topic models that reflect your primary areas of focus, flexible topic scoring that encodes your organization’s priorities, and customized text processing that mirrors the way people actually communicate in your industry. It generates visible, value-added metadata that you define from messy text data, for use in DataScava, other systems, and charting.

Precisely mine unstructured text based on intents, interests, priorities, and relevance, from sources such as emails, research reports, support tickets, contracts, news feeds, transcripts, chats, subscriptions, filings, forms, resumes, profiles, surveys, and notes. Curate quality training data from large data sets for AI and ML; filter and route emails for RPA; find the right talent for a job; measure the skills of your workforce; mine annual reports; and more.


How it Works

Our Tailored Topics Taxonomies (TTT), Domain-Specific Language Processing (DSLP), and Weighted Topic Scoring (WTS) work together as an alternative or adjunct to Natural Language Processing (NLP). They enable cross-discipline collaboration between technical and non-technical staff.

TTT models features and topics within heterogeneous text using specialized taxonomies that you can select, create, edit, and import. These taxonomies capture your business language and domain expertise, allowing for the highly customized vocabulary and logic necessary for complex document processing.

DSLP indexes the textual contents of each document based on the business context using your TTT. It works at the file level to generate weighted topic scores and other metadata, surfacing key results and relevant documents from large data sets in a form you can see and control.

WTS accurately measures and matches topics according to user-defined score thresholds and labels documents into appropriate cohesive categories using heuristic techniques that are tailored to a specific business purpose.
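
As a rough illustration of how a taxonomy, file-level scoring, and score thresholds can fit together, here is a minimal Python sketch. It is not DataScava's patented implementation: the taxonomy contents, the term weights, the thresholds, and the function names topic_scores and label are all hypothetical, and simple weighted term counting stands in for the product's actual matching logic.

```python
import re
from collections import Counter

# Hypothetical taxonomy: topic -> {term: weight}. Terms and weights are
# illustrative assumptions, not DataScava's actual taxonomies.
TAXONOMY = {
    "contracts": {"indemnity": 3.0, "termination": 2.0, "clause": 1.0},
    "support": {"ticket": 2.0, "outage": 3.0, "refund": 1.5},
}


def topic_scores(text, taxonomy):
    """Return one weighted score per topic for a whole document (file level)."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return {
        topic: sum(weight * counts[term] for term, weight in terms.items())
        for topic, terms in taxonomy.items()
    }


def label(scores, thresholds):
    """Return the topics whose score meets its user-defined threshold."""
    return sorted(
        topic
        for topic, score in scores.items()
        if score >= thresholds.get(topic, float("inf"))
    )


doc = "The termination clause and the indemnity clause were revised after the outage."
scores = topic_scores(doc, TAXONOMY)
labels = label(scores, {"contracts": 5.0, "support": 5.0})
print(scores)  # per-topic weighted scores, usable as sortable metadata
print(labels)  # topics that cleared their thresholds
```

Because the scores are plain numbers attached to each file, they can be sorted, charted, or audited, and a subject matter expert can change a weight or threshold without touching the scoring code.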



How it’s Different

Makes UNSTRUCTURED TEXT DATA more accessible and actionable

Encapsulates your DOMAIN EXPERTISE in your software

Uses your BUSINESS LANGUAGE, not generic NLP, NLU, or Semantics

Generates sortable Weighted Topic Scores METADATA to summarize content in a numerical format

Measures color-coded topics, HIGHLIGHTS key terms in on-topic files, and filters out irrelevant ones

Finds what you ARE looking for, does not INFER what you want

Works top-down through your ENTIRE CORPUS at the file level, not the sentence level

Provides AUDITABLE corpus-level statistics that are explainable, transparent, and provable
