scanning

What do you do when Digital Restrictions Management (DRM) prevents you from doing a lot of things on your own device? I do not know if we can even call it a device we own, as the company selling us the books can revoke them at will, without asking. This was infamously, and ironically, seen in the removal of Nineteen Eighty-Four from Kindle devices without their owners' permission.
This is what RMS has to say about the Kindle and Amazon's practices:

“This malicious device designed to attack the traditional freedoms of readers: There’s the freedom to acquire a book anonymously, paying cash — impossible with the Kindle for all well-known recent books. There’s the freedom to give, lend, or sell a book to anyone you wish — blocked by DRM and unjust licenses. Then there’s the freedom to keep a book — denied by a back door for remote deletion of books.” — Richard Stallman

So what do you do against such malpractices and devices that are defective by design?
Since these companies do everything in their power, with all sorts of sophisticated programming, to prevent users from taking anything out, what can one do about them?
Here is one low-tech solution! And one fine use of Lego Mindstorms!
[vimeo http://www.vimeo.com/73675285 w=400&h=225]
via DIY kindle scanner
Also, if you are rather old-fashioned, an even lower-tech solution is to simply copy the Kindle e-book page by page on a copier or scanner; thanks to E Ink technology, the screen is as good as a printed page.

Update: 13 Feb 2025

I have written a script that combines images into a searchable PDF and also supports Indic languages. The script and some documentation are available at

https://gitlab.com/the-mitr/ocr-indic-pdf/

Suppose you have an ebook or an article in PDF format which, unfortunately, has not been cleaned. By "not cleaned" we mean one of the following:

Single-page scan: darkened edges, pages not aligned (the text is rotated differently from page to page), differing page sizes, library stamps and usage marks, etc.
2-in-1 scan: two pages scanned together, a dark band along the central spine, pages not rotated properly, edge and wear marks, library marks, etc.

In this case we cannot run tools like scantailor on the file directly, because scantailor works on images. We first need to extract the images from the PDF file and then process them. One could extract and process the pages one by one, but there is a better way.
First we split the PDF file into single-page PDFs using the very versatile pdftk.
In the terminal, type
$ pdftk file.pdf burst
This creates as many PDF files as there are pages, with names like pg_0001.pdf, pg_0002.pdf, and so on.
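If you want to be sure that no page was lost, you can compare the number of files with the page count that pdftk itself reports. This is only a sketch: expected_pages and count_burst_pages are names made up here, and it assumes the burst files are in the current directory.

```shell
# Hypothetical helpers, not part of pdftk.

# Ask pdftk how many pages the original PDF has.
expected_pages() {
    pdftk "$1" dump_data | awk '/^NumberOfPages:/ {print $2}'
}

# Count the pg_*.pdf files that burst left in the current directory.
count_burst_pages() {
    set -- pg_*.pdf
    # If nothing matched, the pattern stays as literal text.
    [ -e "$1" ] || { echo 0; return; }
    echo "$#"
}

# Example:
#   [ "$(expected_pages file.pdf)" = "$(count_burst_pages)" ] && echo "all pages present"
```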
The next task is to convert these PDFs to images. For this we use the convert command from ImageMagick. We could convert the files one by one with
$ convert pg_0001.pdf pg_0001.tiff
but that is not practical for a large number of files; we want to do it in one go. So we do the following
$ for i in pg_*.pdf;
do
convert -density 600 "$i" "${i%.pdf}.tiff";
done
Let's see what these commands do:
pg_*.pdf
is a pattern that the shell expands to the list of all files in the directory whose names start with pg_ and end with .pdf, that is, the pages produced by pdftk. (Filtering with ls | grep pdf would also pick up the original file.pdf, and it breaks on file names with spaces.)
On this list we can do operations as on any other list
for i in pg_*.pdf
takes each member of this list in turn and treats it as the variable i
and for each member we
do
the following
convert -density 600 "$i" "${i%.pdf}.tiff"
Here ${i%.pdf} is the file name with the trailing .pdf removed, and the quotes keep unusual file names intact
and after this is over the task is
done
We can set the resolution (dpi) at which each page is rendered by changing the number passed to -density; above it is set to 600. The output images have the same names as the input PDF files, with the extension .tiff instead of .pdf.
Now we can happily run scantailor on these images to clean them up!
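The two steps can also be wrapped into one small function. The following is a minimal sketch, assuming pdftk and ImageMagick's convert are installed; pdf_to_tiffs and tiff_name are hypothetical names chosen for this example, and the pages are written to the current directory.

```shell
# Hypothetical helpers wrapping the pdftk and convert steps.

# Map a page PDF name to its image name: pg_0001.pdf -> pg_0001.tiff
tiff_name() {
    printf '%s.tiff\n' "${1%.pdf}"
}

# Usage: pdf_to_tiffs file.pdf [dpi]
pdf_to_tiffs() {
    pdf="$1"
    dpi="${2:-600}"                      # default to 600 dpi
    pdftk "$pdf" burst || return 1       # writes pg_0001.pdf, pg_0002.pdf, ...
    for page in pg_*.pdf; do
        [ -e "$page" ] || continue       # no match: the pattern stays literal
        convert -density "$dpi" "$page" "$(tiff_name "$page")" || return 1
    done
}

# Example:
#   pdf_to_tiffs file.pdf 300
```

It is best to run this in an empty working directory, so that the pg_*.pdf pattern only matches the pages that were just created.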
PS:
If you have a djvu file instead of a PDF, there is another approach.
Step 1
Convert the djvu file into a multipage tif file using the ddjvu command.
$ ddjvu -format=tiff -verbose -quality=uncompressed input_file.djvu output_file.tif
This gives us a tiff file with the same resolution as the original djvu file.
Step 2
Once we have the multipage tif file, it can be split into its individual pages with the tiffsplit command.
$ tiffsplit output_file.tif
This writes one file per page, named xaaa.tif, xaab.tif, and so on.
And we are done. Now we can happily run scantailor on these tiff files.
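The djvu route can be sketched as a single function in the same way. Here djvu_to_tiffs and the page_ prefix are hypothetical choices for this example; the ddjvu flags are the ones used in the post, and tiffsplit takes the optional output prefix as its second argument.

```shell
# Hypothetical helper wrapping the ddjvu and tiffsplit steps.
# Usage: djvu_to_tiffs book.djvu
djvu_to_tiffs() {
    djvu="$1"
    multipage="${djvu%.djvu}.tif"        # book.djvu -> book.tif
    ddjvu -format=tiff -verbose -quality=uncompressed "$djvu" "$multipage" || return 1
    tiffsplit "$multipage" page_         # writes page_aaa.tif, page_aab.tif, ...
}

# Example:
#   djvu_to_tiffs input_file.djvu
```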

Temet Nosce

Know Thyself Too…

Kindle, Lego and E-Books

Remaking ebooks from existing pdfs, djvu
