DataScava is a collaborative unstructured text data mining tool for Data, Business, and IT teams

Use your business language and expertise to get the high-value data you need with our patented domain-specific approach.
Curate, search, and filter large unstructured datasets to fuel applications in AI, ML, RPA, BI, Research, Talent, and more.

“DataScava perfectly complements existing approaches to unlocking the value of unstructured text data – by helping companies to model higher-level intents and purposes behind the labeling and classification of data – by capturing the abstract topics and themes that represent their own business and subject matter expertise – and by applying both to big data sets real-time.”

– Scott Spangler, Chief Data Scientist, IBM Distinguished Engineer, Author
“Mining the Talk: Unlocking the Business Value in Unstructured Information”




Turn Unstructured Text Data into Information You Can Act On

Our Domain-Specific Language Processing (DSLP), Weighted Topic Scoring (WTS), and Tailored Topics Taxonomies (TTT) methodologies work as an alternative or adjunct to Natural Language Processing (NLP). They generate metadata about unstructured text, along with results you can see and measure.

DataScava is a fast, easy-to-use tool for modeling and capturing features and topics within heterogeneous text. It uses your organization’s own taxonomy and a rules-based approach, so it can be highly customized around your vocabulary and the specific business logic that complex document processing requires. Use it to mine unstructured text data based on the content, intents, and topics of interest you define and control.
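To make the idea of rules-based, weighted topic scoring concrete, here is a minimal sketch in Python. It is an illustration only, not DataScava's patented method or API: the `TAXONOMY` contents, the weights, and the `score_document` function are all hypothetical, and the scoring rule (weight times number of occurrences, summed per topic) is an assumption chosen for simplicity.

```python
import re

# Hypothetical taxonomy: topic -> {term: weight}.
# Terms and weights are invented for illustration only.
TAXONOMY = {
    "risk": {"credit risk": 3.0, "exposure": 2.0, "default": 1.0},
    "python": {"python": 2.0, "pandas": 1.5},
}


def score_document(text, taxonomy):
    """Return {topic: score} for one file.

    Assumed rule: score = sum(weight * occurrences) over the topic's terms,
    matched case-insensitively on whole words across the entire file.
    """
    lowered = text.lower()
    scores = {}
    for topic, terms in taxonomy.items():
        total = 0.0
        for term, weight in terms.items():
            pattern = r"\b" + re.escape(term) + r"\b"
            total += weight * len(re.findall(pattern, lowered))
        scores[topic] = total
    return scores


if __name__ == "__main__":
    sample = "Credit risk and exposure. Exposure to default."
    print(score_document(sample, TAXONOMY))
```

Because the taxonomy is plain data, subject matter experts could adjust terms and weights without touching the scoring logic, and every score can be traced back to the terms that produced it.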


How DataScava is Different


It does not INFER what you’re looking for; it finds what you ARE looking for.

Works top-down through your ENTIRE corpus at the file level, not the sentence level.

Uses your BUSINESS and DOMAIN language, not NLP, NLU, or semantics.

Generates sortable taxonomy topic-score METADATA that summarizes content in a numerical format.

Measures color-coded topics, HIGHLIGHTS key terms in on-topic files, and eliminates irrelevant files.

Provides AUDITABLE corpus-level statistics that are explainable, transparent, and provable.

Encapsulates your SUBJECT MATTER EXPERTISE and business nomenclature in your software.

Makes UNSTRUCTURED TEXT DATA more accessible and actionable.
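
As a rough illustration of how per-file topic scores could be used to sort files, drop irrelevant ones, and highlight key terms, consider the sketch below. It assumes the scores already exist as a simple dictionary per file; the `rank_files` and `highlight` functions, the `[[ ]]` markers, and the threshold are hypothetical and do not describe DataScava's actual interface.

```python
import re


def rank_files(scores_by_file, topic, min_score):
    """Keep files whose score for `topic` meets `min_score`, highest first."""
    kept = [
        (name, scores.get(topic, 0.0))
        for name, scores in scores_by_file.items()
        if scores.get(topic, 0.0) >= min_score
    ]
    return sorted(kept, key=lambda item: item[1], reverse=True)


def highlight(text, terms):
    """Wrap each whole-word occurrence of a term in [[ ]] markers."""
    if not terms:
        return text
    # Try longer terms first so multi-word terms win over their substrings.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: "[[" + m.group(0) + "]]", text)


if __name__ == "__main__":
    scores = {
        "a.txt": {"risk": 8.0},
        "b.txt": {"risk": 0.0},
        "c.txt": {"risk": 12.0},
    }
    print(rank_files(scores, "risk", 1.0))
    print(highlight("Credit risk and exposure", ["exposure", "credit risk"]))
```

Since the ranking depends only on stored numbers and an explicit threshold, the reason a file was kept or eliminated can be inspected directly.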
