Kindle, Lego and E-Books

What do you do when Digital Restrictions Management prevents you from doing a lot of things on your own device? I do not know if we can even say it is a device we own, as the company offering us books can revoke them at will, without asking. This was infamously, and ironically, seen in the removal of Nineteen Eighty-Four from Kindle devices without their owners’ permission.

Here is what RMS has to say about the Kindle and Amazon’s practices:

“This malicious device is designed to attack the traditional freedoms of readers: There’s the freedom to acquire a book anonymously, paying cash — impossible with the Kindle for all well-known recent books. There’s the freedom to give, lend, or sell a book to anyone you wish — blocked by DRM and unjust licenses. Then there’s the freedom to keep a book — denied by a back door for remote deletion of books.”

Richard Stallman

So what do you do against such malpractices and devices that are defective by design?

Since these companies do everything in their power, using all manner of sophisticated programming, to prevent users from taking anything out, what can one do about them?

Here is one low-tech solution, and one fine use of Lego Mindstorms!

via DIY kindle scanner

Also, if you are rather old-fashioned, an even lower-tech solution is to simply make a carbon copy of the Kindle e-book using a copier or scanner; thanks to its E Ink technology, the copy is as good as a printed book.

Undownloading

So, it seems that ebook users need to add a new word to their vocabulary: “undownloading” — what happens when you leave the authorized zone in which you may read the ebooks you paid for, and cross into the digital badlands where they are taken away like illicit items at customs. If you are lucky, you will get them back when you return to your home patch — by un-undownloading them.

via Techdirt

To which I would add:

Consider if this were a physical book: you would be fined for smuggling books that you had legitimately bought, or have your books taken into protective custody by someone. After all, they contain the most dangerous things known to humans – ideas!

 

Open Access Manifesto

Information is power. But like all power, there are those who want to keep it
for themselves. The world's entire scientific and cultural heritage, published
over centuries in books and journals, is increasingly being digitized and locked
up by a handful of private corporations. Want to read the papers featuring the
most famous results of the sciences? You'll need to send enormous amounts to
publishers like Reed Elsevier. 

There are those struggling to change this. The Open Access Movement has fought
valiantly to ensure that scientists do not sign their copyrights away but
instead ensure their work is published on the Internet, under terms that allow
anyone to access it. But even under the best scenarios, their work will only
apply to things published in the future.  Everything up until now will have been
lost. 

That is too high a price to pay. Forcing academics to pay money to read the work
of their colleagues? Scanning entire libraries but only allowing the folks at
Google to read them?  Providing scientific articles to those at elite
universities in the First World, but not to children in the Global South? It's
outrageous and unacceptable. 

"I agree," many say, "but what can we do? The companies hold the copyrights,
they make enormous amounts of money by charging for access, and it's perfectly
legal - there's nothing we can do to stop them." But there is something we can,
something that's already being done: we can fight back. 

Those with access to these resources - students, librarians, scientists - you
have been given a privilege. You get to feed at this banquet of knowledge while
the rest of the world is locked out. But you need not - indeed, morally, you
cannot - keep this privilege for yourselves. You have a duty to share it with
the world. And you have: trading passwords with colleagues, filling download
requests for friends. 

Meanwhile, those who have been locked out are not standing idly by. You have
been sneaking through holes and climbing over fences, liberating the information
locked up by the publishers and sharing them with your friends. 

But all of this action goes on in the dark, hidden underground. It's called
stealing or piracy, as if sharing a wealth of knowledge were the moral
equivalent of plundering a ship and murdering its crew. But sharing isn't
immoral - it's a moral imperative. Only those blinded by greed would refuse to
let a friend make a copy. 

Large corporations, of course, are blinded by greed. The laws under which they
operate require it - their shareholders would revolt at anything less. And the
politicians they have bought off back them, passing laws giving them the
exclusive power to decide who can make copies. 

There is no justice in following unjust laws. It's time to come into the light
and, in the grand tradition of civil disobedience, declare our opposition to
this private theft of public culture. 

We need to take information, wherever it is stored, make our copies and share
them with the world. We need to take stuff that's out of copyright and add it to
the archive. We need to buy secret databases and put them on the Web. We need to
download scientific journals and upload them to file sharing networks. We need
to fight for Guerilla Open Access. 

With enough of us, around the world, we'll not just send a strong message
opposing the privatization of knowledge - we'll make it a thing of the past.
Will you join us? 

Aaron Swartz

July 2008, Eremo, Italy

via | Open Access Manifesto

Reading in the e-book era

Reading without surveillance, publishing without after-the-fact censorship, owning books without having to account for your ongoing use of them: these are rights that are older than copyright. They predate publishing. They are fundamentals that every bookseller, every publisher, every distributor, every reader, should desire. They are foundational to a free press and to a free society. If you sell an ebook reader that is designed to allow Kafkaesque repossessions, you are a fool if you expect anything but Kafkaesque repossessions in their future. We’ve been fighting over book-bans since the time of Martin Luther and before. There is no excuse for being surprised when your attractive nuisance attracts nuisances.

via Boing Boing.

I agree completely. Cases like these are going to become more common unless we switch to technology that we can see is Free as in Freedom. Governments and corporations are going to use this technology against the very people who use it. It will create profiles of “dangerous” people who read revolutionary material, for example. This will go unchecked if we just use the technology without questioning it.

Also see RMS’s view on this topic.

On-line Education | RMS

Educators, and all those who wish to contribute to on-line educational works: please do not let your work be made non-free. Offer your assistance and text to educational works that carry free/libre licenses, preferably copyleft licenses so that all versions of the work must respect teachers’ and students’ freedom. Then invite educational activities to use and redistribute these works on that freedom-respecting basis, if they will. Together we can make education a domain of freedom.

via On-line Education|RMS

Most people don’t bother about what they get gratis on the Internet, but institutions cannot adopt the same approach. Licensing is as important as the actual content. But an archaic system will not go down until it is compelled to, and it will fight till the very end.

Remaking ebooks from existing pdfs, djvu

Suppose you have an ebook or an article in PDF format which, unfortunately, is not cleaned. By not cleaned we mean:

  • Single-page scan: one page per image, with edge darkening, pages not aligned (text rotated differently on each page), varying page sizes, library and use marks, etc.
  • 2-in-1 scan: two pages scanned together in one image, with a dark band along the central spine, pages not rotated properly, edge and wear marks, library marks, etc.

In this case we cannot use tools like Scan Tailor on the images directly. We first need to extract the images from the PDF file and then process them. One could extract and process the images one by one, but there is a better way.

First we split the PDF file into single-page PDFs using the most versatile pdftk.

To do this, type in the terminal:

$ pdftk file.pdf burst

It will create as many PDF files as there are pages, with names like pg_0001.pdf and so on.

The next task is to convert these PDFs to images. For this we use ImageMagick’s convert command. We could convert the files one by one:

convert pg_0001.pdf pg_0001.tiff

But this is not practical for a large number of files; we want to do it all in one go. So we do the following:

$ for i in $(ls | grep pdf);
do
convert -density 600 "$i" "$i.tiff";
done

Let’s see what these commands do:

ls

will list all the files in that directory

ls | grep pdf

This filters the list down to the files with pdf in the filename and gives us that list.

On this list we can do a lot of operations, as we would on any other list.

for i in $(ls | grep pdf)

takes each member of this list that we generated and treats it as the variable i,

and for each member we

do

the following

convert -density 600 "$i" "$i.tiff"

and after this is over the task is

done

We can set the dpi of the output images by passing a number to -density; above it is set to 600. The output images will be named the same as the input PDF files, with .tiff appended.

Now we can happily run scantailor on these images to clean them up!
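The burst-and-convert steps above can be combined into one small script. This is only a sketch under stated assumptions: pdftk and ImageMagick are installed, and book.pdf is a placeholder for your input file.

```shell
#!/bin/sh
# Sketch: burst a PDF and convert every page to a 600 dpi TIFF.
# Assumes pdftk and ImageMagick are installed; "book.pdf" is a
# placeholder input filename.
pdftk book.pdf burst output page_%04d.pdf

for f in page_*.pdf; do
    # ${f%.pdf} drops the .pdf suffix, so page_0001.pdf becomes
    # page_0001.tiff rather than page_0001.pdf.tiff
    convert -density 600 "$f" "${f%.pdf}.tiff"
done
```

Globbing page_*.pdf avoids parsing the output of ls, and the zero-padded names keep the pages in order.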

PS:

If, instead of a PDF, you have a DjVu file, we have another approach.

Step 1

Convert the DjVu file into a multipage TIFF file using the ddjvu command:

$ ddjvu -format=tiff -verbose -quality=uncompressed input_file.djvu output_file.tif

This command gives us a TIFF file with the same resolution as the original DjVu file.

Once the multipage TIFF file is there, it can be split into its original pages with the tiffsplit command:

$ tiffsplit input_file.tif

And we are done. Now we can happily run scantailor on these tiff files.
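The two DjVu steps can likewise be collected into one sketch. Assumptions here: djvulibre provides ddjvu, libtiff provides tiffsplit, and book.djvu is a placeholder filename.

```shell
#!/bin/sh
# Sketch of the DjVu route: render one multipage TIFF, then split it.
# Assumes ddjvu (djvulibre) and tiffsplit (libtiff) are installed;
# "book.djvu" is a placeholder input filename.
ddjvu -format=tiff -quality=uncompressed book.djvu book.tif

# tiffsplit names its outputs xaaa.tif, xaab.tif, ... by default;
# passing a prefix yields page_aaa.tif, page_aab.tif, ... instead.
tiffsplit book.tif page_
```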

 

When Kings Rode To Delhi…

Recently I read a book titled When Kings Rode to Delhi by Gabrielle Festing, which is available here. The book has a chapter on Sivaji called The Mountain Rat, a title supposedly given to Sivaji by Aurangzeb. After the killing of Afzal Khan, this is what the author has to say:

 In the eyes of a Maratha, who believed himself Bhavani’s chosen warrior, such treachery was meritorious, and the slaughter of the envoy was an act of devotion.

Further the author describes various exploits and acts of Shivaji and in the end he says:

An attempt has been made to cast a glamour about him and his hordes, as patriots, deliverers of their country from foreign rule, devoted heroes who faced desperate odds. After a dispassionate survey no glamour remains. Sivaji was a typical Maratha of the best kind; that is to say, he was as unlike the Rajputs from whom he claimed descent as the South African Boer from the good Lord James of Douglas. Never, unless they were driven to it, did the Marathas fight a pitched battle in open field; the joy of fighting, which made the Rajput deck himself with the bridal coronet, the desperate valour which heaped the plain of Samugarh with yellow robes till it looked like a meadow of saffron, was incomprehensible to the wolves of the Deccan. They fought, not for a point of honour, or because they enjoyed fighting, but in a commercial spirit, for the sake of what they could get; their word for “to conquer in battle” means simply “to spoil an enemy.” The Rajput was indolent, when not roused by pride or the thirst for battle; the Maratha was untiringly energetic as long as he had anything to gain, but would sacrifice nothing for pride or scruple.

This must be said for Sivaji, that while he lived his followers were forbidden to plunder mosques or women; after his death his son pursued a different policy.

Free Software Tools for scanning and making e-books

How to give a new life to books which are out of copyright!
Here is a short summary of the Free Software tools that I have found useful for converting hard copies into readable/searchable formats  in GNU/Linux!

Typically, making a soft copy from a hard copy involves the following steps:

Step 1:
Scan the hard copy using a scanner or a camera. This step generates image files, typically .tiff, .png or .jpeg. Some scanning programs also have the option of generating a .pdf directly.
Basically, at this stage you have all the data; if you compress the folder into the comic book reader format .cbr or .cbz, you are good to go. But for a more professional touch, read on. The main step is to scan the book properly. Some do’s and don’ts:

Align the pages to the sides of the scanner.
If the book is small, scan two pages at once.
If the book is too large, adjust the scan area in the image preview so that only one page is scanned.

If these steps are done properly, there is little left to do in the second step, and we can jump directly to Step 3.

Preferably scan in binary or grayscale mode, unless there are colored images in the text. This will help reduce the final size of the file.

Scan at a minimum of 300 dpi; this is the optimum level I have arrived at after trial and error with different resolutions, their final results, and the time taken for each scan. Of course, this can differ depending on what it is that you are scanning. Many people scan at 600 dpi, but I am happy at 300 dpi. Note: 300 dpi images can be upscaled to 600 dpi in Scan Tailor.

First of all, the scanning itself. Most scanners come with an installation disk for M$-Windows or Mac-OSX, but for GNU/Linux there seems to be no ‘installation disk’. The Xsane package supports quite a few scanners, which are detected and ready for use as soon as you plug them in.
The list of scanners supported by SANE can be found here:

http://www.sane-project.org/sane-mfgs.html

When we bought our scanner, we had to search this list to find a compatible one.
What is the problem with the manufacturers? Why do they not want to sell more, to the people who use Free Software?

If your scanner is not in the list, you might have to do some R&D before it is up and running, as I had to do for my old HP Scanjet 2400 at home.

Once your scanner is up and running, scan the images preferably in .tiff format, as these can be processed and compressed without much loss of quality. This, again, I found by trial and error.

Step 2:
Crop and rotate the files to remove unwanted white space or accidental entries of adjoining pages from the scanned images. When two pages are scanned in one image, we may need to separate the pages.

Initially I did this manually; it was the second most boring part, after the scanning itself. But then I found a very wonderful tool for this work.

ImageMagick provides a set of tools which work like magick on images, hence the name, I guess 🙂

This is one of the best tools for batch processing image files.

Then I found the dream tool that I was looking for.
It is called Scan Tailor; as the name suggests, it is meant for processing scanned images.

Scan Tailor can be found at http://scantailor.sourceforge.net/ or directly from Ubuntu Software Centre.

Step by step, Scan Tailor cleans up relatively unclean images and creates amazingly good output files.

There are a total of six steps in Scan Tailor to produce the desired output.
You have to choose the folder containing your scanned images. By default, Scan Tailor puts its results in a directory called out inside the same folder. The steps are as follows:

  1. Change the Orientation: This lets you change the orientation of all the files in the directory. It is a good option in case you have scanned the book in a different orientation.
  2. Split Pages: This step determines whether the scans are single-page scans, single pages with some marginal text from another page, or two-page scans. Most of the time the auto-detection works well for single-page and two-page scans. But it is a good idea to check manually that all the pages have been divided correctly, so that problems do not arise later. If you find a page that has been divided incorrectly, you can slide the margin to correct it. In the case of two-page scans, the two pages are shown with a semitransparent blue or red layer on top of them. After looking at all the pages, we commit the result.
  3. Deskew: After the pages have been split, we straighten them for better alignment of the text. In my experience the auto-deskew mostly works fine, but it is still a good idea to check the pages manually, in case something was missed.
  4. Select Content: This is the step I have found most useful in Scan Tailor. Here you select the portion of the page that will appear in the final output, so you can say goodbye to all the dark edges that inevitably come with scanning. Some library marks can also be removed easily in this step. The auto option works well when the text is in a nice box shape, but it may also leave wide areas selected. The box can be reshaped as you want. If you want a blank page, remove the content box by right-clicking on it.
  5. Page Layout: Here you set the dimensions of the output page and how the content sits on each page.
  6. Output: Produces the final output with all the above changes.

The output is stored in the directory called out in the same folder. The original images are not changed, so if you want to make changes, or something goes wrong, you can always go back to the original files. The images are also numbered.
So we get cleaned pages of the same size from the scanned pages.

Update: The latest Scan Tailor has an image de-warping facility. See this amazing feature at work here:

Step 3:

Collate the files processed in Step 2 into one single PDF. For this I have used the convert command.

The typical syntax is like this:

convert *.tiff output.pdf

This command will take all the .tiff files in the given directory and collate them into a PDF named output.pdf.

http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/
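Since pdftk (linked above) can also concatenate PDFs, here is another hedged sketch: it assumes each cleaned page has already been converted to its own single-page PDF, with pg_*.pdf as a placeholder glob.

```shell
# Sketch: join single-page PDFs with pdftk instead of convert.
# Assumes pdftk is installed; pg_*.pdf are placeholder filenames.
# Shell globs expand in sorted order, so zero-padded page numbers
# keep the pages in sequence.
pdftk pg_*.pdf cat output book.pdf
```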

Alternative to Step 3

Another alternative is to use gscan2pdf for joining the image files into a PDF and doing the OCR, with tesseract or cuneiform as the backend. gscan2pdf is also able to scan files and stitch them into a PDF, but I would recommend using Scan Tailor as one of the intermediate steps.

gscan2pdf also gives you the option of editing the files if, for example, you want to remove some marks from the images. For this it opens the image in GIMP.


Step 4: 
OCR the PDF file.
Now this is again tricky. I could not find a good application which would OCR the PDF file and embed the resulting text in the PDF. But I found a hack at the following link which seems to work fine 🙂

http://blog.konradvoelkel.de/2010/01/linux-ocr-and-pdf-problem-solved/

The hack is a bash script which does the required work.

Alternate

gscan2pdf can do the OCR for you, using cuneiform or tesseract as the backend. The end result is searchable text, but it does not sit over the image as it would in a vector PDF; instead it is embedded on the page as a “note” in the top-left corner.
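As a further alternative, and only a sketch: recent versions of tesseract can emit a searchable PDF themselves, with the recognized text layered invisibly over the page image. page.tif is a placeholder filename here.

```shell
# Sketch: OCR a single page image straight to a searchable PDF.
# Assumes tesseract 3.03 or later is installed; "page.tif" is a
# placeholder. The trailing "pdf" selects tesseract's PDF output;
# the result is written to page.pdf.
tesseract page.tif page pdf
```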

Step 5:

Optimize the PDF file generated in Step 4.

Here is a nautilus shell script, found at the link below, which does the optimization:
http://www.webupd8.org/2010/11/download-compress-pdf-12-nautilus.html

Step 6: 

In case you want to convert the .pdf to .djvu, there is a one-step solution for that too:

pdf2djvu -o output.djvu input.pdf

 

The tips and tricks here are by no means complete or the best, but they are what I have found useful. Some professional, non-free software can do all of this, but the point of writing this article was to make a list of Free and Open Source Software for the purpose.

Comments and suggestions are welcome!