By 2022, more than 90 percent of the world’s electronic information will be unstructured, delivered in business reports, research papers, emails and other formats – all of them written in different styles and using terms whose meaning differs from sector to sector.

DataScava uses the language of your business to help you leverage the digital world’s explosive growth of unstructured data. You don’t need to be a data scientist to use it, though data scientists can use it too.

Our solution provides a true competitive advantage to companies in finance, healthcare, technology, insurance, manufacturing or any other industry that are seeking to maximize the transformative power of data.


Domain-Specific Language Processing and Weighted Topic Scoring

DataScava uses Domain-Specific Language Processing (DSLP) and patented Weighted Topic Scoring (WTS) to index textual content and curate the most relevant documents from large datasets for use in AI/Machine Learning, prediction engines, business analytics and production systems.

INDEX selected textual content using DSLP.

SCORE and generate metadata including Topic Scores, Percentile Rankings, Data Tags.

MATCH your SEARCH topics using WTS and your Search Templates.

CURATE a subset of highly precise files and FILTER new data automatically.
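The INDEX and SCORE steps above can be pictured with a short Python sketch. It is an illustration only, not DataScava's patented implementation: the topic library, the phrase weights, the whole-word counting rule and the function names (`topic_scores`, `percentile_rank`, `index_corpus`) are all assumptions made for this example.

```python
import re
from bisect import bisect_right

# Hypothetical topic library: topic name -> {phrase: weight}.
# Both the phrases and the weights are invented for illustration.
TOPICS = {
    "fixed_income": {"bond": 2.0, "yield curve": 3.0, "coupon": 1.0},
    "risk": {"value at risk": 3.0, "var": 1.0, "stress test": 2.0},
}

def count_phrase(text, phrase):
    """Count whole-word, case-insensitive occurrences of a phrase."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def topic_scores(text, topics=TOPICS):
    """Assumed scoring rule: sum of (weight x occurrences) per topic."""
    return {
        topic: sum(w * count_phrase(text, p) for p, w in phrases.items())
        for topic, phrases in topics.items()
    }

def percentile_rank(value, all_values):
    """Percentage of values in the corpus that are <= value."""
    ordered = sorted(all_values)
    return 100.0 * bisect_right(ordered, value) / len(ordered)

def index_corpus(docs, topics=TOPICS):
    """Build per-file metadata: topic scores, percentile rankings, data tags."""
    scored = {name: topic_scores(text, topics) for name, text in docs.items()}
    metadata = {}
    for name, scores in scored.items():
        metadata[name] = {
            "scores": scores,
            "percentiles": {
                t: percentile_rank(s, [scored[d][t] for d in scored])
                for t, s in scores.items()
            },
            # A file is tagged with every topic it mentions at least once.
            "tags": [t for t, s in scores.items() if s > 0],
        }
    return metadata
```

Calling `index_corpus({"a.txt": "...", "b.txt": "..."})` returns one metadata record per file, which is the kind of structured derived data a later MATCH step could filter on.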



Convert Unstructured Textual Content into Structured Derived Data


DSLP converts unstructured textual content into precisely structured derived data and searchable document indices.

Built upon our two U.S. patents in “Profile Matching of Unstructured Data,” DataScava uses your business language, jargon and acronyms.

We don’t use any form of generalized Natural Language Processing, Natural Language Understanding, Semantic Search, linguistics or fuzzy logic, which ensures DataScava is accessible to both technical and nontechnical users.



Results You Can See, Control and Measure

DataScava is the first and only product to offer patented Weighted Topic Scoring, which finds and filters the most relevant documents from large unstructured data sets, providing highly precise results you can see, control and measure.

With WTS, users select search topics of interest, set the minimum score a file must reach in each topic to match, add “nice-to-haves,” and rank the resulting output. WTS scores found topic keywords and phrases using DSLP, creating a shortlist of highly precise results.

It matches only files that meet or exceed ALL weighted topic score requirements, and highlights found topics with color coding in the file’s text. Users can adjust WTS, add or edit topics on the fly, and home in further, using multi-level sort to drill down and surface key results.
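The matching rule described above (every required topic must meet its minimum score, with nice-to-haves used for ranking) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented WTS algorithm: the function names, the treatment of nice-to-haves as a tie-breaking sort level, and the `excluded` parameter standing in for the “not” capability are all hypothetical.

```python
def matches(scores, required, excluded=()):
    """True only if ALL required minimums are met and no excluded topic appears."""
    if any(scores.get(topic, 0) > 0 for topic in excluded):
        return False
    return all(scores.get(topic, 0) >= minimum
               for topic, minimum in required.items())

def rank_key(scores, required, nice_to_have):
    """Multi-level sort key (assumed): nice-to-haves met, then total required score."""
    nice_met = sum(1 for topic, minimum in nice_to_have.items()
                   if scores.get(topic, 0) >= minimum)
    required_total = sum(scores.get(topic, 0) for topic in required)
    return (nice_met, required_total)

def shortlist(scored_files, required, nice_to_have=None, excluded=()):
    """Return names of matching files, strongest matches first.

    scored_files maps file name -> {topic: score}; the scores are assumed
    to have been computed in an earlier indexing step.
    """
    nice_to_have = nice_to_have or {}
    hits = [name for name, scores in scored_files.items()
            if matches(scores, required, excluded)]
    return sorted(
        hits,
        key=lambda name: rank_key(scored_files[name], required, nice_to_have),
        reverse=True,
    )
```

For example, with `required={"python": 4, "finance": 2}` and `nice_to_have={"cloud": 2}`, a file scoring 6 in python but only 1 in finance is dropped no matter how high its other scores are, while files that clear both minimums are ordered by how many nice-to-haves they satisfy.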



As Flexible As Language

Users can create, select and adjust all topics of interest on the fly, weight their significance, and rank the output.

They set minimum “required” and “nice-to-have” score thresholds to be met in each topic, and home in further using multi-level sort to drill down on specific topics of interest, bringing strong matches quickly to the top.

Customized Company-Specific Search Templates and Topics Libraries, percentile rankings and “not” capability help you get to the precise data you need.

Real-time scoring, visualization and highlighting of data points empower you to draw better insights and make more accurate business decisions.



White-Box Approach with a Human In Command

DataScava keeps the Human in Command through a white-box approach that uses human intelligence (not artificial intelligence) and machine training (not machine learning).

It empowers business users, data scientists, data analysts, and software engineers by automating time-consuming unstructured textual data preparation and mining tasks.

Operating around the clock and at the direction of expert users, DataScava continually refines its capabilities in a measurable way.



Control the Input, Measure the Output in AI/ML

The quality of the output depends on the quality of the input. With bad data, applications produce results that are inaccurate, incomplete or incoherent. By extracting precise user-defined domain-specific information from unstructured textual data, DataScava can help ensure input data is high quality, relevant and useful to your business, avoiding the problem of garbage in, garbage out.

Because DataScava operates on a different principle from AI, it provides a unique adjunct to AI environments, filtering input and measuring output to help you do a reality check on your assumptions. It also uses a top-down analysis approach that informs the data filter based on corpus-level statistics, unlike AI systems, which begin at the word level.



DataScava Powers TalentBrowser

DataScava’s story is backed by real-world results. It powers TalentBrowser, the industry’s only Automated Job Matching, Skills Assessment and Domain-Specific Search Engine.

For more than a decade, management consultants, investment banks, Fortune 500 firms, staffing companies and others have benefited from TalentBrowser’s ability to find exceptional talent for jobs.

DataScava Datasheet