By 2022, more than 90 percent of the world’s electronic information will be unstructured, delivered in business reports, research papers, emails, and other formats – all of them written in different styles and using terms whose meaning differs from sector to sector.

DataScava uses the language of your own business and domain to help you leverage the digital world’s explosive growth of unstructured text data. You don’t need to be a data scientist to use it, but you can be.

Our unique solution provides a true competitive advantage to companies in finance, healthcare, technology, insurance, manufacturing or any other industry that is seeking to maximize the transformative power of data.


Domain-Specific Language Processing and Weighted Topic Scoring

DataScava uses our proprietary Domain-Specific Language Processing (DSLP) and patented Weighted Topic Scoring (WTS) methodologies to index textual content and curate the most relevant documents from large datasets for use in AI, Machine Learning, Robotic Process Automation, Research, Prediction Engines, Business Analytics and Production systems.

With DataScava you can . . .

INDEX selected unstructured textual content using DSLP.

SCORE and generate metadata including Topic Scores, Percentile Rankings and Data Tags.

MATCH your SEARCH topics using WTS and your Search Templates.

CURATE a subset of highly precise files and FILTER new data automatically.
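To make the idea of Percentile Rankings concrete, here is a minimal sketch of one common way to turn raw topic scores into percentile ranks. It is an illustration only, not DataScava's implementation: the function name `percentile_ranks`, the file names and the scores are all hypothetical, and the definition used (the share of documents scoring strictly lower) is an assumption.

```python
from bisect import bisect_left


def percentile_ranks(scores):
    """Map each document to the percentage of documents scoring strictly lower.

    `scores` is a dict of {document id: topic score}. This definition of
    percentile rank is an assumption made for illustration.
    """
    ordered = sorted(scores.values())
    n = len(ordered)
    # bisect_left counts how many scores are strictly below the given score.
    return {doc: 100.0 * bisect_left(ordered, s) / n for doc, s in scores.items()}


# Hypothetical topic scores for four files.
topic_scores = {
    "report_a.txt": 12.0,
    "email_b.txt": 3.0,
    "paper_c.txt": 7.5,
    "memo_d.txt": 3.0,
}
ranks = percentile_ranks(topic_scores)
print(ranks)
```

A rank of this kind lets a user compare files within a dataset without needing to know what an absolute score of, say, 7.5 means on its own.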


Convert Unstructured Textual Content into Structured Data



DSLP converts unstructured textual content into precisely structured derived data and searchable document indices.

Built upon our two U.S. patents in “Profile Matching of Unstructured Data,” DataScava uses your business language, jargon and acronyms.

We don’t use any form of generalized Natural Language Processing, Natural Language Understanding, Semantic Search, linguistics or fuzzy logic, which ensures DataScava is accessible to both technical and nontechnical users.


Precise Results You Can See, Control and Measure


DataScava is the first and only product to offer patented Weighted Topic Scoring, which finds and filters the most relevant documents from large unstructured data sets, providing highly precise results you can see, control and measure.

With WTS, users select search topics of interest, set the minimum scores that files must meet in each topic to match, add nice-to-haves, and rank the resulting output. WTS scores found topic keywords and phrases using DSLP, creating a shortlist of highly precise results.

It matches only files that meet or exceed ALL weighted topic score requirements and highlights color-coded found topics in the file's textual content. Users can adjust WTS, add or edit topics on the fly, and home in further, using multi-level sort to drill down and surface key results.
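The matching behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the patented WTS method: the topic library `TOPICS`, its weights, the scoring rule (weight times whole-word occurrence count) and the ranking rule are all hypothetical choices made for the example.

```python
import re

# Hypothetical topic library: topic name -> {keyword or phrase: weight}.
TOPICS = {
    "fixed_income": {"bond": 2.0, "yield curve": 3.0, "coupon": 1.0},
    "risk": {"value at risk": 3.0, "hedging": 2.0},
    "python": {"python": 1.0, "pandas": 1.5},
}


def topic_score(text, weighted_terms):
    """Sum weight * occurrence count for each term (case-insensitive, whole words)."""
    lowered = text.lower()
    total = 0.0
    for term, weight in weighted_terms.items():
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        total += weight * len(re.findall(pattern, lowered))
    return total


def score_document(text):
    """Score one document against every topic in the library."""
    return {name: topic_score(text, terms) for name, terms in TOPICS.items()}


def match(documents, required, nice_to_have=()):
    """Keep documents that meet ALL required minimums, then rank them.

    Ranking (an assumption): total required-topic score first,
    nice-to-have score second, file name as a final tie-breaker.
    """
    results = []
    for doc_id, text in documents.items():
        scores = score_document(text)
        if all(scores[t] >= minimum for t, minimum in required.items()):
            core = sum(scores[t] for t in required)
            bonus = sum(scores[t] for t in nice_to_have)
            results.append((doc_id, core, bonus))
    results.sort(key=lambda r: (-r[1], -r[2], r[0]))
    return results


# Hypothetical documents.
documents = {
    "a.txt": "The bond desk tracks the yield curve and uses hedging. Bond coupon data is in pandas.",
    "b.txt": "A bond with a fixed coupon.",
    "c.txt": "Hedging and value at risk for every bond on the yield curve.",
}
shortlist = match(
    documents,
    required={"fixed_income": 4.0, "risk": 2.0},
    nice_to_have=["python"],
)
print(shortlist)
```

In this sketch, `b.txt` is dropped because it misses the `fixed_income` minimum, while the nice-to-have `python` topic only affects ordering among files that already satisfy every requirement.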


As Flexible as Language


Users can create, select and adjust all topics of interest on the fly, weight their significance, and rank the output.

They set minimum “required” and “nice-to-have” score thresholds to be met in each topic, and home in further using multi-level sort to drill down on specific topics of interest, bringing strong matches quickly to the top.

Customized Company-Specific Search Templates and Topics Libraries, percentile rankings and “not” capability help you get to the precise data you need. Real-time scoring, visualization, and highlighting of data points empower you to draw better insights and make more accurate business decisions.
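As a rough illustration of how a “not” filter and a multi-level sort might combine, consider the sketch below. It is hypothetical throughout: the rows, the helper names `exclude` and `multi_level_sort`, and the reading of “not” as "drop any file with a nonzero score in the excluded topic" are assumptions, not a description of the product's behavior.

```python
# Hypothetical per-file topic scores, e.g. produced by an earlier scoring step.
scored = [
    {"file": "a.txt", "fixed_income": 8.0, "risk": 2.0, "marketing": 0.0},
    {"file": "b.txt", "fixed_income": 8.0, "risk": 5.0, "marketing": 0.0},
    {"file": "c.txt", "fixed_income": 9.0, "risk": 1.0, "marketing": 4.0},
    {"file": "d.txt", "fixed_income": 5.0, "risk": 5.0, "marketing": 0.0},
]


def exclude(rows, not_topics):
    """Assumed "not" semantics: drop rows with any score in an excluded topic."""
    return [r for r in rows if all(r[t] == 0 for t in not_topics)]


def multi_level_sort(rows, sort_topics):
    """Sort descending by the first topic, breaking ties with each later topic."""
    return sorted(rows, key=lambda r: tuple(-r[t] for t in sort_topics))


kept = exclude(scored, ["marketing"])
ordered = multi_level_sort(kept, ["fixed_income", "risk"])
print([r["file"] for r in ordered])
```

Here `c.txt` is removed despite its high `fixed_income` score because it touches the excluded topic, and the tie between `a.txt` and `b.txt` is broken by the second sort level.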


White Box Approach with a Human in Command


DataScava keeps the Human in Command through a white-box approach that uses human intelligence (not artificial intelligence) and machine training (not machine learning).

It empowers business users, data scientists, data analysts, and software engineers by automating time-consuming unstructured textual data preparation and mining tasks.

Operating around the clock and at the direction of expert users, DataScava continually refines its capabilities in a measurable way.


Control the Input, Measure the Output in AI/ML/RPA


The quality of the output depends on the quality of the input. With bad data, applications produce results that are inaccurate, incomplete or incoherent. By extracting precise user-defined domain-specific information from unstructured textual data, DataScava can help ensure input data is high quality, relevant and useful to your business, avoiding the problem of garbage in, garbage out.

Because DataScava operates on a different principle from AI, it provides a unique adjunct to AI environments, filtering input and measuring output to help you do a reality check on your assumptions. It also uses a top-down analysis approach that informs the data filter based on corpus-level statistics, unlike AI systems, which begin at the word level.


DataScava Powers TalentBrowser


DataScava’s story is backed by real-world results. It powers TalentBrowser, the industry’s only Automated Job Matching, Skills Assessment, and Domain-Specific Search Engine.

For more than a decade, management consultants, investment banks, Fortune 500 firms, staffing companies and others have benefited from TalentBrowser’s ability to find exceptional talent for jobs.