Rating Your Book Collection

Now that I've got my book collection nicely organized in tellico, I thought it would be interesting to see how well each book is rated, and which ones have garnered the most stars and the most reviews on Amazon.

This Python script reads my book collection data from standard input, extracts a list of ISBNs, and for each one looks up the average Amazon customer star rating and the number of reviews. It then sorts the data (by star rating, then by number of customer reviews) and prints the results to standard output.
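The two-level sort is the easiest part to show in isolation. Here is a minimal sketch, assuming each book ends up as a (title, stars, reviews) tuple and that both keys sort highest first; the tuple layout and the name sort_by_rating are made up for illustration and are not taken from the script itself. The sample values come from the result tables in this post.

```python
# Hypothetical records: (title, average stars, number of reviews).
# The values are copied from the result tables in this post.
books = [
    ("Dune", 4.5, 1024),
    ("Lonesome Dove", 5.0, 374),
    ("The Catcher in the Rye", 4.0, 2742),
    ("Truman", 5.0, 271),
    ("Ender's Game", 4.5, 2475),
]


def sort_by_rating(records):
    """Return records ordered by stars, then by reviews, both descending."""
    return sorted(records, key=lambda rec: (rec[1], rec[2]), reverse=True)


for title, stars, reviews in sort_by_rating(books):
    print(f"{stars:3.1f}  {reviews:5d}  {title}")
```

Sorting on a (stars, reviews) tuple means the review count only matters when two books have the same star rating.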

The script processed 2604 books. I have more books than that, but it had problems reading the ISBNs for some of them (see notes below). Of those 2604 books, the star ratings break down as follows:

    Number of stars  |  Number of books
  -------------------+------------------
       5             |        370
     4.5             |        644
       4             |        629
     3.5             |        241
       3             |        104
     2.5             |         14
       2             |         12
     1.5             |          2
       1             |          6
       0  (unrated)  |        582

Nice to know I haven't got too many low-rated books in my collection.

Here are the top ten entries with a 5 star rating:

      Title                       |   Number of reviews
----------------------------------+----------------------------
'Lonesome Dove'                   |    374 
'The Complete Calvin and Hobbes'  |    305 
'Truman'                          |    271
'Boy's Life'                      |    254
'The Code Book'                   |    248 
'The Simpsons'                    |    210 
'Cosmos'                          |    150 
'Black Holes and Time Warps'      |     82 
'A Pattern Language'              |     76
'My Family and Other Animals'     |     72 

Here are the top ten entries with a 4.5 star rating:

      Title                       |   Number of reviews
----------------------------------+----------------------------
'Harry Potter (Book 7)'           |   3102
'Ender's Game'                    |   2475
'Memoirs of a Geisha'             |   2462
'To Kill a Mockingbird'           |   1736
'The Golden Compass (Book 1)'     |   1435
'Animal Farm'                     |   1137
'A Prayer for Owen Meany'         |   1055 
'Dune'                            |   1024
'The Lord of the Rings'           |   1000
'Pride and Prejudice'             |    870

And here are the top ten entries with a 4.0 star rating:

      Title                                         | Number of reviews
----------------------------------------------------+------------------
'The Catcher in the Rye'                            |     2742
'The Time Traveler's Wife'                          |     1610
'Freakonomics'                                      |     1520
'The Curious Incident of the Dog in the Night-Time' |     1400
'The Poisonwood Bible'                              |     1395
'The Road'                                          |     1392
'Fahrenheit 451'                                    |     1242
'The World Is Flat'                                 |     1118
'The Great Gatsby'                                  |     1111
'Guns, Germs, and Steel'                            |     1045

Note that I've already read most of these. My tellico data doesn't (yet) differentiate between read and unread books.

For those interested in taking this script and munging it to work with similar book data, here are a few more details:

It uses PyAWS, Kun Xi's Python wrapper for the latest Amazon Web Service, to get the average star rating and the number of reviews for each book. Thanks to Kun Xi for not only writing this, but also for helping me out with a problem in my script over the weekend.

Note that you will need to adjust:

amazonAccessKey = "XXXXXXXXXXXXXXXXXXXX"

to your own Amazon Access License key.

tellico allows me to export my book collection data in XML format. I wanted to extract all the ISBNs. When I initially tried this with BeautifulSoup (BeautifulStoneSoup to be exact), it didn't like processing the XML file. Here's what I tried:

#!/usr/bin/env python

from BeautifulSoup import BeautifulStoneSoup
import sys

if __name__ == "__main__":
    xml = sys.stdin.readlines()
    soup = BeautifulStoneSoup(xml)

and here's the traceback I got:

$ python rate_books.py 
    soup = BeautifulStoneSoup(xml)
  File "/tmp/BeautifulSoup.py", line 1058, in __init__
    self._feed()
  File "/tmp/BeautifulSoup.py", line 1082, in _feed
    smartQuotesTo=self.smartQuotesTo)
  File "/tmp/BeautifulSoup.py", line 1705, in __init__
    u = self._convertFrom(proposed_encoding)
  File "/tmp/BeautifulSoup.py", line 1735, in _convertFrom
    markup)
TypeError: expected string or buffer

I'm not sure if it's a bug in tellico or in BeautifulStoneSoup.
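One thing that may be worth ruling out first: sys.stdin.readlines() returns a list of lines, whereas the TypeError above complains that it expected a string or buffer. The sketch below uses only the standard library to show the difference between readlines() and read(). io.StringIO stands in for standard input and the XML snippet is made up. Whether handing the result of read() to BeautifulStoneSoup would have avoided the error is an assumption, not something verified here.

```python
import io

# Made-up stand-in for the XML that would arrive on standard input.
sample = "<tellico>\n  <isbn>0306406152</isbn>\n</tellico>\n"

as_lines = io.StringIO(sample).readlines()  # a list with one string per line
as_text = io.StringIO(sample).read()        # the whole input as one string

print(type(as_lines).__name__)
print(type(as_text).__name__)
```

Joining the list back together with "".join(as_lines) gives the same single string that read() returns.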

In the end I decided to just roll my own getISBNs() routine, which looks for any lines in the tellico XML data that start with "<isbn>" (after stripping leading and trailing whitespace), extracts the ISBN in between the tags, and adds it to a list.
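A line-based extractor along those lines can be sketched as follows. This is not the actual getISBNs() from the script; the name get_isbns, the handling of the closing tag, and the sample XML layout are all assumptions made for illustration.

```python
def get_isbns(lines):
    """Collect the text after <isbn> on every line that starts with that tag.

    Hypothetical reconstruction: assumes each ISBN sits on its own line,
    as in <isbn>...</isbn>.
    """
    open_tag, close_tag = "<isbn>", "</isbn>"
    isbns = []
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith(open_tag):
            continue
        body = stripped[len(open_tag):]
        end = body.find(close_tag)
        if end != -1:
            body = body[:end]  # drop the closing tag and anything after it
        isbns.append(body.strip())
    return isbns


# Made-up sample in the assumed layout.
sample = [
    "  <entry>\n",
    "    <title>Example Book</title>\n",
    "    <isbn>0-306-40615-2</isbn>\n",
    "  </entry>\n",
]
print(get_isbns(sample))
```

Working line by line avoids needing an XML parser at all, at the cost of missing any ISBN element that is split across lines or shares a line with another tag.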

Even then, there were lots of malformed ISBNs in the tellico data. I suspect they are all for books that pre-date the introduction of ISBNs, but it still seems wrong that this bad data is there.
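One way to weed out the malformed entries before looking them up is to run the standard check-digit tests for ISBN-10 and ISBN-13. The sketch below is not part of the script; the function name is_valid_isbn is made up, and it assumes hyphens and spaces are the only separators that need removing.

```python
ASCII_DIGITS = set("0123456789")


def is_valid_isbn(raw):
    """Return True if raw passes the ISBN-10 or ISBN-13 check-digit test."""
    chars = raw.replace("-", "").replace(" ", "").upper()

    if len(chars) == 10:
        # The first nine characters must be digits; the last may also be X (= 10).
        if not set(chars[:9]) <= ASCII_DIGITS:
            return False
        if chars[9] not in ASCII_DIGITS and chars[9] != "X":
            return False
        values = [int(c) for c in chars[:9]]
        values.append(10 if chars[9] == "X" else int(chars[9]))
        # Weights run 10, 9, ..., 1 and the weighted sum must be divisible by 11.
        total = sum(weight * v for weight, v in zip(range(10, 0, -1), values))
        return total % 11 == 0

    if len(chars) == 13:
        if not set(chars) <= ASCII_DIGITS:
            return False
        # Weights alternate 1, 3, 1, 3, ... and the sum must be divisible by 10.
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(chars))
        return total % 10 == 0

    return False


for candidate in ["0-306-40615-2", "9780306406157", "0306406153", "n/a"]:
    print(candidate, is_valid_isbn(candidate))
```

A check like this only says whether a value is well formed. It cannot tell whether a well-formed ISBN actually belongs to the book it is attached to.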

I suspect I'm never going to read all the books I've got. There are always good new ones coming out, and I keep discovering already-published ones that are good (especially when I find a great new author). I'm not quite at the point yet where I'm no longer buying green bananas, but with these rating results, I'll now know which books I should consider reading next. For a while, I'm going to focus on the ones that many others found enjoyable (i.e. the ones that are near the top of the list when you take the number of reviews and multiply it by the number of stars, or some other similar formula).
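The reviews-times-stars idea is simple enough to sketch. This is just one possible formula, and the name popularity_score and the tuple layout are made up for illustration; the sample values come from the tables earlier in this post.

```python
def popularity_score(stars, reviews):
    """One possible ranking formula: star rating multiplied by review count."""
    return stars * reviews


# (title, average stars, number of reviews), taken from the tables above.
books = [
    ("Lonesome Dove", 5.0, 374),
    ("Harry Potter (Book 7)", 4.5, 3102),
    ("The Catcher in the Rye", 4.0, 2742),
    ("Ender's Game", 4.5, 2475),
]

ranked = sorted(books, key=lambda b: popularity_score(b[1], b[2]), reverse=True)

for title, stars, reviews in ranked:
    print(f"{popularity_score(stars, reviews):8.1f}  {title}")
```

With a plain product, a heavily reviewed 4-star book can outrank a lightly reviewed 5-star one, which is the point of weighting by how many people rated it.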
