DataScava is an Unstructured Data Miner for Data, Business, and IT teams

Use your business language and domain expertise, together with DataScava's patented matching technology, to get the high-value data you need for applications in AI, ML, RPA, BI, Talent, Research, and Operations. Precisely curate, search, filter, tag, and route raw unstructured text data automatically with DataScava.

“DataScava perfectly complements existing approaches to unlocking the value of unstructured text data – by helping companies to model higher-level intents and purposes behind the labeling and classification of data – by capturing the abstract topics and themes that represent their own business and subject matter expertise – and by applying both to big data sets real-time.”

– Scott Spangler, Chief Data Scientist, IBM Distinguished Engineer, Author of “Mining the Talk: Unlocking the Business Value in Unstructured Information”


How It Works

Our Domain-Specific Language Processing (DSLP), Weighted Topic Scoring (WTS), and Tailored Topics Taxonomies (TTT) work as an adjunct or alternative to Natural Language Processing (NLP). They generate value-added metadata about unstructured text for use in other systems and in charts, with results you can see and measure.

DataScava helps you model and capture features and topics within heterogeneous text using specialized taxonomies that you can create, edit, import, and control. These taxonomies encode your business and domain expertise, allowing for the highly customized vocabulary and specific business logic that complex document processing requires.

Mine raw unstructured text (from documents, databases, emails, helpdesk tickets, contracts, reports, articles, transcripts, chats, subscriptions, filings, forms, resumes, profiles, surveys, notes, and other sources) based on intents, interests, priorities, relevance, and more, using weighted topics and their associated key terms that you define.
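To make the idea of weighted topics and key terms more concrete, here is a minimal Python sketch of generic weighted keyword scoring at the file level. It is not DataScava's patented method, whose internals are not described here. The taxonomy, weights, and the function names `count_term`, `score_file`, and `rank_files` are hypothetical, and the sketch assumes a topic's score is simply the sum of each key term's weight times its whole-word hit count in the file.

```python
import re

# Hypothetical taxonomy: topic -> {key term: weight}. Terms and weights are
# illustrative only; real ones would come from your own domain experts.
Taxonomy = dict[str, dict[str, float]]

TAXONOMY: Taxonomy = {
    "Contracts": {"indemnification": 3.0, "termination clause": 2.0, "liability": 1.0},
    "Cloud": {"kubernetes": 3.0, "aws": 2.0, "docker": 2.0},
}


def count_term(text: str, term: str) -> int:
    """Count case-insensitive, whole-word hits of a (possibly multi-word) term."""
    pattern = r"\b" + re.escape(term.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


def score_file(text: str, taxonomy: Taxonomy) -> dict[str, float]:
    """Score one whole file per topic.

    Assumption (not DataScava's algorithm): score = sum(weight * hit count).
    """
    return {
        topic: sum(weight * count_term(text, term) for term, weight in terms.items())
        for topic, terms in taxonomy.items()
    }


def rank_files(files: dict[str, str], topic: str, taxonomy: Taxonomy) -> list[tuple[str, float]]:
    """Return (file name, score) pairs sorted by one topic's score, highest first."""
    scored = [(name, score_file(text, taxonomy)[topic]) for name, text in files.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For example, with `files = {"a.txt": "We run Kubernetes on AWS and ship with Docker. Kubernetes upgrades are weekly.", "b.txt": "The indemnification and liability terms are unchanged."}`, calling `rank_files(files, "Cloud", TAXONOMY)` returns `[("a.txt", 10.0), ("b.txt", 0.0)]`. Keeping the score a plain weighted sum means every number can be traced back to specific term hits.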

What It Does

Makes UNSTRUCTURED TEXT DATA more accessible and actionable.

Encapsulates your SUBJECT MATTER EXPERTISE and nomenclature in your software.

Uses your BUSINESS and DOMAIN language, not generic NLP, NLU, or semantics.

Finds what you ARE looking for, not what it INFERS you want.

Works top-down through your ENTIRE corpus at the file level, not the sentence level.

Generates sortable taxonomy topic-score METADATA that summarizes content in numerical form.

Measures color-coded topics, HIGHLIGHTS key terms in on-topic files, and filters out irrelevant files.

Provides AUDITABLE corpus-level statistics that are explainable, transparent, and provable.
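As a rough illustration of highlighting, filtering, and checkable corpus-level counts, the Python sketch below keeps files with enough key-term hits, marks each hit, and reports simple totals. It is a generic assumption-based example, not DataScava's implementation: the key terms, the `min_hits` threshold, the `**` marker, and the function names `term_pattern`, `highlight`, and `filter_corpus` are all hypothetical.

```python
import re

# Hypothetical key terms for one topic; illustrative only, not a DataScava taxonomy.
KEY_TERMS = ["kubernetes", "docker", "aws"]


def term_pattern(terms: list[str]) -> re.Pattern:
    """Build one case-insensitive, whole-word pattern for all key terms."""
    # Longest terms first so a multi-word phrase wins over a shorter term it starts with.
    alternatives = "|".join(re.escape(t) for t in sorted(terms, key=len, reverse=True))
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)


def highlight(text: str, pattern: re.Pattern, marker: str = "**") -> str:
    """Wrap every key-term hit in a marker so a reviewer can see why a file matched."""
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


def filter_corpus(files: dict[str, str], terms: list[str], min_hits: int = 1) -> tuple[dict[str, str], dict]:
    """Keep and highlight files with at least `min_hits` hits (an assumed threshold),
    and return corpus-level counts that can be verified by hand."""
    pattern = term_pattern(terms)
    on_topic: dict[str, str] = {}
    files_per_term = {t: 0 for t in terms}
    for name, text in files.items():
        found = [m.group(0).lower() for m in pattern.finditer(text)]
        for t in terms:
            if t.lower() in found:
                files_per_term[t] += 1  # number of files containing this term
        if len(found) >= min_hits:
            on_topic[name] = highlight(text, pattern)
    stats = {
        "files_scanned": len(files),
        "files_on_topic": len(on_topic),
        "files_filtered_out": len(files) - len(on_topic),
        "files_per_term": files_per_term,
    }
    return on_topic, stats
```

For example, with `files = {"a.txt": "We deploy Docker images to AWS.", "b.txt": "Quarterly sales were flat."}`, `filter_corpus(files, KEY_TERMS)` keeps only `a.txt`, rendered as `"We deploy **Docker** images to **AWS**."`, and reports one file on topic and one filtered out.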
