By Timo Selvaraj
SearchBlox 8.3 enables searching for text within images through integration with the tesseract package. When the tesseract package is installed, organizations can OCR images during the indexing process. Searches then run against the text inside the images, so you can find content even when metadata is missing. The tesseract package processes each image and quickly extracts the text it contains. The integration also allows images to be processed for specific use cases, including content detection.
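The conditional step described above (OCR only when tesseract is installed, otherwise rely on metadata) can be sketched as follows. This is a minimal illustration, not SearchBlox code. It assumes the tesseract command-line tool is on the PATH and uses its documented `tesseract <image> stdout` form to print the recognized text. The helper names `tesseract_available` and `text_for_image` are hypothetical.

```python
import shutil
import subprocess


def tesseract_available(cmd: str = "tesseract") -> bool:
    """Return True if the tesseract executable can be found on PATH."""
    return shutil.which(cmd) is not None


def text_for_image(path: str, metadata_text: str = "", cmd: str = "tesseract") -> str:
    """Hypothetical helper: OCR the image if tesseract exists, else use metadata only."""
    if not tesseract_available(cmd):
        # Without tesseract, only metadata (title, description, ...) is searchable.
        return metadata_text
    # Passing "stdout" as the output base makes the tesseract CLI print
    # the recognized text instead of writing it to a .txt file.
    result = subprocess.run(
        [cmd, path, "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Keep metadata alongside the OCR text so a query can match either one.
    return f"{metadata_text}\n{result.stdout.strip()}".strip()
```

Falling back to metadata mirrors the situation the paragraph describes: without OCR, an image is only findable through whatever metadata it carries.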
#1 Scanned paper can be made searchable
Organizations often have to deal with scanned paper documents whose text can only be recovered through a manual conversion process. With integrated OCR and search, simply searching for a word or phrase reveals the information contained in the digitized paper. This feature is useful for mining data from old manuscripts, receipts, paper order forms and historical documents.
#2 Single-step process for OCR + Search
SearchBlox combines OCR and the indexing of the extracted content into one process, saving much of the time involved in processing each file. By combining the two steps, organizations can save a considerable amount of the time and computing resources they spend today.
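As a rough illustration of what "OCR + indexing in one pass" means, the sketch below feeds OCR output straight into a toy inverted index without an intermediate text file or a separate indexing job. It is not how SearchBlox is implemented. The `ocr` argument is assumed to be any function that maps an image path to text (for example, a wrapper around the tesseract command line), and the names `ImageIndex`, `add_image` and `tokenize` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Set

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


class ImageIndex:
    """Toy inverted index mapping term -> {image path: [token positions]}."""

    def __init__(self, ocr: Callable[[str], str]) -> None:
        # `ocr` is any callable that turns an image path into extracted text.
        self.ocr = ocr
        self.postings: Dict[str, Dict[str, List[int]]] = defaultdict(dict)

    def add_image(self, path: str) -> None:
        # Single pass: OCR the image and index its tokens immediately.
        # Positions are kept so phrase or proximity queries remain possible.
        for pos, term in enumerate(tokenize(self.ocr(path))):
            self.postings[term].setdefault(path, []).append(pos)

    def search(self, term: str) -> Set[str]:
        """Return the paths of images whose OCR text contains `term`."""
        return set(self.postings.get(term.lower(), {}))


if __name__ == "__main__":
    # Stand-in OCR results so the example runs without tesseract installed.
    fake_ocr = {
        "form-1182.png": "Purchase Order 1182",
        "receipt-77.png": "Receipt for order 77",
    }
    index = ImageIndex(ocr=fake_ocr.__getitem__)
    for image_path in fake_ocr:
        index.add_image(image_path)
    print(sorted(index.search("order")))  # prints ['form-1182.png', 'receipt-77.png']
```

Injecting the OCR function keeps the indexing logic testable and lets the OCR engine be swapped without touching the index.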
#3 Advanced Faceted Search including wildcard search
SearchBlox lets you search using wildcard, fuzzy, boolean, proximity and other advanced search operators, so you can find files even when the quality of the text extracted from them is below par. You can mine the data in different ways to find the exact file you are looking for, including by applying synonyms and auto-suggestions.
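To show why these operators help with imperfect OCR output, here is a small in-memory sketch that evaluates fuzzy, wildcard, proximity and boolean conditions over a list of tokens. It is not how SearchBlox evaluates queries, and it does not use SearchBlox query syntax. The function names and the sample OCR tokens (where "invoice" was misread as "lnvoice") are hypothetical, and tokens are assumed to be lowercase already.

```python
import fnmatch
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def fuzzy_match(query: str, tokens: List[str], max_edits: int = 1) -> bool:
    """True if some token is within `max_edits` edits of the query."""
    return any(levenshtein(query, t) <= max_edits for t in tokens)


def wildcard_match(pattern: str, tokens: List[str]) -> bool:
    """'*' matches any run of characters and '?' exactly one."""
    return any(fnmatch.fnmatchcase(t, pattern) for t in tokens)


def proximity_match(a: str, b: str, tokens: List[str], max_gap: int) -> bool:
    """True if `a` and `b` occur within `max_gap` token positions of each other."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= max_gap for i in pos_a for j in pos_b)


if __name__ == "__main__":
    # Hypothetical OCR output in which "invoice" was misread as "lnvoice".
    tokens = ["lnvoice", "no", "4471", "total", "amount", "due"]
    print("invoice" in tokens)             # False: an exact match misses it
    print(fuzzy_match("invoice", tokens))  # True: one substitution away
    print(wildcard_match("*voice", tokens))  # True
    # Boolean AND of a fuzzy condition and a proximity condition.
    print(fuzzy_match("invoice", tokens) and proximity_match("total", "due", tokens, 2))  # True
```

Scanning every token with an edit-distance check is fine for a toy list; a production search engine avoids this linear scan by using its index structures.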