October 7, 2017

Scanning marketing claims vs. realities: what does WYSIWYG mean to you?

If you are new to digitizing books, be aware of how much time it really takes:
The major time consumption in digitizing is NOT the scan time, it is the cleanup time.
This will depend on the quality and sophistication of the original, how good you want the book to look, and which device you want to read it on.
Old books or books with lots of tables, illustrations or symbols (like math, polytonic Greek, extensive footnoting) are likely to produce marginal results that require extensive cleanup.
The scanning might take 12 minutes, the cleanup might take 12 hours. If you anticipate commercial distribution, it could take several days.
Your expectations and usage plans make the difference. If all you want is readable (if skewed) page images on a desktop PC, with no searchability, CZUR will deliver quickly.
Most e-ink readers (Kindle, Kobo, Nook, etc.) handle PDF files very poorly and do much better with searchable formats such as EPUB or MOBI.
Searchable also means speakable: a reader's text-to-speech feature works as expected only when the document has been OCRed.
A non-searchable image view is much easier (if less desirable) to achieve than a searchable OCR view. Which one you need depends on who will use the scanned output and how professional you want it to look.

CZUR's automatic image-repositioning feature has limited accuracy. If you want clean, image-based pages that look straight and have even margins to the human eye, you will need to crop and rotate the images by hand. This takes time.
The CZUR corrects page rotation only to within two or three degrees of straight, and that residual tilt is easily visible when present.
The CZUR software does not generate the layered type of PDF in which you can see the original page image while searching the text layer beneath it.
That layered format is the quick-and-dirty way the Google Books project and high-end unattended robotic digitizers achieve fast results.
No human-supervised spell check is performed, so hundreds or even thousands of errors remain hidden from the person reading the book.
You see those errors only if you export the document to .epub or .docx and run it through slow, human-mediated spell checking.
OCR often misreads the images it sees. A distorted “O” may render as (] or (j, and vice versa. Italicized letters such as b, h, rn, and cl are especially vulnerable to misinterpretation. Letters or words may also be italicized or bolded at random by mistake.
Page numbers and running headers/footers may interfere with your intended use and may have to be removed manually.
An index set in extra-small type (e.g., 6 pt) may end up mostly human-readable, but chopped up into a confusing layout and randomly misspelled. Many of your lowercase “e” letters can turn into “c”.

Expectations realized with the CZUR scanner:
High scan rate
Better than average resolution
A set of JPEG files that can be loaded into other, higher-quality OCR software
Thumbprint removal works most of the time, but it is quirky and very unpredictable: the erasure sometimes extends past the actual thumb image and chops into words, removing leading or trailing letters.
Virtually nonexistent documentation

Expectations NOT realized with CZUR image output:
Uniform margins
Straight untilted pages that are not distracting to the eye
Quality rendering of half-tone photos
Automatic contrast detection may create background color drift

Expectations NOT realized with OCRed CZUR output:
Quality rendering of half-tone photos
Rapid OCR and accurate human-mediated post-processing
The ABBYY bundle mentioned in the Amazon product listing, which was not included with the shipped unit.

No matter how you slice it, book digitizing takes time, and the better you want it to look, the more TIME it takes.
In some ways the CZUR is a revolutionary packaged device for high book scan rates at a consumer-oriented price.
But it does not speed up the time-consuming post-processing side of digitization.
If you already own a high-resolution (20 MP) camera, you might achieve a similar outcome with third-party software such as Booksorber.
The absence of any meaningful documentation for such a sophisticated product is inexcusable and, in my view, reason enough to return it.
On the other hand, if your computer skills are self-taught by trial and error, this may be just the product for you.
So, in the end: just how fussy are you when you read the printed page (i.e., the distortion-distraction factor)? How much mental and physical energy are you willing to invest in keeping things tidy?
This review is based on personally digitizing several challenging books. A hardbound novel would be a piece of cake; poetry or a mass-market paperback is a bear.
My sense of this software [from a former developer] is that it is still at beta level: too few features and definitely not up to an ADA certification audit. The type is too small and the contrast too low unless you have young eyeballs. To my senior eyeballs the screen has an uncomfortably washed-out, fuzzy look, and functionally it offers less than ten percent of what you would find in the Paint program that comes free with Windows.
There is a similar order-of-magnitude difference between the free copy of OmniPage that comes with a low-end Canon flatbed scanner, which allows human-mediated corrections, and the CZUR software, which does not.
Sometimes the automatic image “correction” makes the image worse rather than better, producing a very wavy baseline for lines of text.
Sales claim: “Quickly book to ebook (Less than 3 mins for a 200 pages book).” Even after some practice, I would not expect to scan a simple 200-page book in less than 30 minutes with this product, and that is for the scan alone.

If you try the CZUR ET16, you will have a chance to answer such questions from personal experience. If you do decide to take the plunge, here are some tips:
There are lots of YouTube ET16 tutorials and tips that are better than the ones on the CZUR site, such as those from D&H Innovation, John Willis, and E-Z photo scan.
If you are editing an image and want to rotate it, you can press the right and left arrow keys in the rotate tool to turn it clockwise or counterclockwise by one degree.
Make sure the book is dead center in the viewfinder. Serious parallax distortion results from items placed near an edge, because the camera is so close to the page.
If you accidentally scan the same pages twice, you can just delete the extra images without having to renumber anything; the same is true of deleting blank pages.
Some users who digitize glossy pages prefer a light-shield box or other shield for better results; others do their scanning in a dark room. Lighting can be critical because of reflections from coated stock.
Interacting with “support” is fast and easy: their reply will be “Thanks for your question, watch our videos several times, have a nice day.”
Assume you will end up buying either OmniPage Pro or ABBYY Pro if you expect to do any serious work; both can import folders full of images from the CZUR.
Also assume you will have to make several cleanup passes for top-notch results: one for human-mediated OCR, one for duplicate or missing pages, one for a word-processor spell check, one for page aesthetics including margins and justification, and a set of searches for OCR-misread trigger words such as arid, arc, riot, sonic, clay, yon, ina, docs, tip, nay, and hut.
And of course the final test is actually reading the book for ugly surprises. Even if you read only one or two chapters, you will discover funky errors that repeat throughout the book and that you can then search for.
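The trigger-word search pass can be partly automated once the OCR output has been exported to plain text. Below is a minimal Python sketch that assumes such an export exists. The function name find_triggers and the TRIGGER_WORDS list are illustrative only. They are not part of CZUR, OmniPage, or ABBYY. The sketch only flags whole-word, case-insensitive matches with their line numbers so each one can be checked by eye. It corrects nothing, because words like "tip" or "clay" are often legitimate.

```python
import re
from collections import Counter

# Trigger words listed in the review: real English words that OCR tends to
# produce by mistake, so an ordinary spell checker will not flag them.
# (Illustrative list; extend it with misreads you find in your own books.)
TRIGGER_WORDS = ["arid", "arc", "riot", "sonic", "clay", "yon", "ina",
                 "docs", "tip", "nay", "hut"]


def find_triggers(text, triggers=TRIGGER_WORDS):
    """Return (line_number, word) pairs for each whole-word, case-insensitive hit."""
    # \b anchors keep "tip" from matching inside "tips" or "ina" inside "China".
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, triggers)) + r")\b", re.IGNORECASE
    )
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


if __name__ == "__main__":
    # Small demo string; for a real book, pass in the exported text instead.
    sample = "He walked arid talked.\nThe docs arc on the table."
    hits = find_triggers(sample)
    for lineno, word in hits:
        print(f"line {lineno}: {word}")
    # Summary of how often each trigger word appears (lowercased).
    print(Counter(word.lower() for _, word in hits))
```

For a real book, read the exported file with open(path, encoding="utf-8").read() and pass the result to find_triggers. Review each hit by hand, and add the recurring misreads you spot while proofreading a chapter or two to the list.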
As pretty as the packaging is, the bottom line is that this item costs about twice what I am willing to pay, considering the very low cost of a 20 MP Sony camera and the weak software.
I returned the unit and will wait until the price-to-usability ratio becomes more realistic. Other options are far more attractive to me. A user-friendly product would require a total rewrite and a clear, complete manual.

February 9, 2017

Nice little book scanner

After reading the reviews of the Fujitsu ScanSnap SV600, I decided to give this CZUR scanner a try, and I absolutely like it. Yes, it’s from China and the user manual is not in proper English, but the setup process was very simple and straightforward. The scanning software is also very simple, with only three main operations, Scan, Archive, and Bulk, which are used for scanning pages, running OCR, and processing the scanned pages in bulk. There is no need to disassemble or roughly handle the book: scanning a thick book is just a matter of turning the pages with the book open face up, and the built-in software corrects and straightens the curved lines automatically. The result was pretty amazing. Compared to scanning one page at a time face down on a flatbed copier, this handy little scanner does the job much faster and more easily, and best of all, there is no need for any manual editing. So far I have scanned over a dozen books without any problems. I haven’t tried the cloud scanning option yet, but I already feel my money was well spent.

If you need to do some serious scanning, the CZUR scanner is definitely a good choice, and I highly recommend it.

June 16, 2017

Love it!

This double-sided scanner is excellent, and so is the customer service. All my questions were resolved with their help. You can also watch the videos on YouTube and the CZUR website, which give a very good explanation of the CZUR software's features from a native English speaker.
The first time I used the ET16, I was surprised that a scanner could be such a convenient tool for scanning books just by turning the pages. It scans them so fast and hasn’t missed a single one. Scanning over 2,000 pages in less than an hour made me cry.
I have hundreds of books that need scanning, and the curve-flattening and finger-removal functions are smart and work automatically. What’s more, its OCR feature can easily turn the images into an editable Word file.
That said, while scanning I realized the scanner still has a problem: light reflections affect image quality when scanning glossy paper or shiny magazines. But the two LED lights included in the box help solve it (Bravo!).
I can see they are still working on the software and hardware to improve the user experience. All in all, I highly recommend this scanner to anyone who needs to scan a lot of paper.
