How To Automatically Add Thousands of Del.icio.us Bookmarks In One Day

In a word. Slowly.

As I previously mentioned, I now know how to programmatically add a new bookmark to my del.icio.us account.

The next step was to write a script that would parse all my previously saved blog posts, extract all the links, and create the appropriate bookmarks.

I ended up writing two scripts. The first one, find_del_links.py, parsed the blog posts and prepared an input file (on stdout) for the second script, create_del_links.py, which actually created the bookmarks.

I did it this way because I still had to hand-edit the generated input file. My script did not like it if a blog post had some embedded JavaScript in it. I was already doing a fair bit of special casing in the first script. I'd reached the point of diminishing returns for new code added to a script I was really only intending to run once, and it was easier to just go in and edit the created data file.

The created file contains sets of five lines, one set for each link that was to be turned into a bookmark. These were:

  • a url link (i.e. the bit between the double quotes in the '<a href="">...</a>' tag).

  • a description. Again, this is the "..." in the anchor tag above. If that extracted description is "[link]", then the script parsed backwards from the link and got all the words before it, up to the first preceding ">" character.

  • a set of notes for this bookmark. I extract three things from each blog post:

    • the title.
    • the url.
    • the date and time it was published.

  • a set of tags to apply to this bookmark.

  • a blank line to make it easier to hand-edit the file.

Here's an example:

Url:  http://www.dali-gallery.com/
Description:  Salvador Dali
Notes:  Title of blog entry: `The Great Book of Optical Illusions`  Url:  http://blogs.sun.com/richb/entry/the_great_book_of_optical  Date: [June 26, 2005 06:38]
Tags:  books Puzzles

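For anyone wanting to produce records in this shape, here is a minimal sketch in modern Python 3 using the standard library's html.parser module. It is not the original find_del_links.py: the names LinkCollector, extract_links and format_record are made up for illustration, and the "[link]" fallback and the JavaScript special cases described above are left out.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a href="...">...</a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # pieces of text seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (url, description) pairs found in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


def format_record(url, description, notes, tags):
    """Return one record: four labelled lines followed by a blank line."""
    return (
        f"Url:  {url}\n"
        f"Description:  {description}\n"
        f"Notes:  {notes}\n"
        f"Tags:  {' '.join(tags)}\n"
        "\n"
    )
```

Writing each record with format_record to stdout would give a file that is still easy to fix up by hand, which was the point of the blank separator line.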
The script relies on the way I consistently write the HTML for my blog posts.

I have one or more lines near the end of each blog post for each Technorati tag. These were extracted and applied to all the links found in the current blog post file being parsed.

There were a few types of links I wasn't interested in turning into bookmarks:

  • If the link contained "http://www.technorati.com/tag"
  • If the link contained "http://www.amazon.com/exec/"
  • If the description for the link started with "<img"
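The three exclusion rules above could be expressed roughly like this. The function name should_skip and the constant name are hypothetical; the original script may well have structured the checks differently.

```python
# URL fragments that mark links not worth bookmarking (taken from the list above).
SKIPPED_URL_FRAGMENTS = (
    "http://www.technorati.com/tag",
    "http://www.amazon.com/exec/",
)


def should_skip(url, description):
    """Return True if this link should not be turned into a bookmark."""
    if any(fragment in url for fragment in SKIPPED_URL_FRAGMENTS):
        return True
    # A description that starts with an image tag is a picture link, not text.
    return description.lstrip().startswith("<img")
```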

The other thing I did was add in many debug lines which get written to stderr. This now makes it fairly easy to track down any remaining problems if I decide to revisit this.

The second script took this data file on stdin, and read sets of five lines. It turned those into pydelicious calls to create the bookmarks.
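Reading the five-line sets back in might look something like the sketch below. It assumes the "Label:  value" layout shown in the example record; read_records and FIELDS are illustrative names, not taken from create_del_links.py.

```python
FIELDS = ("Url", "Description", "Notes", "Tags")


def read_records(lines):
    """Yield a dict for each five-line record (four fields plus a blank line).

    `lines` can be any iterable of text lines, for example sys.stdin.
    """
    rows = [line.rstrip("\n") for line in lines]
    for start in range(0, len(rows), 5):
        chunk = rows[start:start + 4]
        if len(chunk) < 4:
            break  # trailing blank lines or a truncated record at the end
        record = {}
        for field, row in zip(FIELDS, chunk):
            # Split on the first colon only, so colons inside URLs survive.
            label, _, value = row.partition(":")
            if label.strip() != field:
                raise ValueError(f"expected a {field!r} line, got {row!r}")
            record[field.lower()] = value.strip()
        yield record
```

Raising on an unexpected label means a slip made while hand-editing the data file shows up immediately, rather than as a badly formed bookmark.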

This was a bit of a hit-n-miss affair. With the minimal documentation it was unclear exactly how fast I could generate these bookmarks. The initial version just let it chug away as fast as it could. It wrote about 24 bookmarks before throwing a hissy fit. I then added in a sleep between each bookmark creation. I made it a two second sleep to try to make sure that del.icio.us didn't throttle it again. This let me get a lot further, but after a while the call to:

status = a.posts_add(url, description, notes, tags)

threw another exception. At that point I couldn't even log into my del.icio.us web account. I got a nice Yahoo! web page returned that stated:

Sorry, Unable to process request at this time -- error 999.

Sorry, you've been temporarily blocked for accessing del.icio.us too rapidly. This could be the result of using a buggy, misconfigured, or malicious program. It could also be accidental on our part. Please hold off for a few minutes and try again later, in a gentler fashion.

If you think we've wrongly blocked you and it continues to happen, let us know. This link will bring you to a Yahoo website. When filling out this report, please ignore the Yahoo ID field.

- the del.icio.us staff

So I sent them a report asking how slooowly shooould I taaalk tooo youuu so that everything would be okay. Haven't heard back from them yet.

I ended up just adjusting the script to terminate if it caught an exception. The time between bookmark creations is now five seconds. It sure would be nice if there was better documentation to try to understand what's the best way to handle it.
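The pause-and-bail-out loop can be sketched as follows. This is an illustration rather than the actual script: submit_all is a made-up name, and add_bookmark stands for whatever callable creates a bookmark (in my case that would be the a.posts_add call shown earlier). The five-second default is simply the value that worked for me, not a documented limit.

```python
import sys
import time


def submit_all(records, add_bookmark, delay_seconds=5.0, sleep=time.sleep):
    """Create bookmarks one at a time, pausing between calls.

    Stops at the first exception instead of retrying, so a throttled
    account is not hammered any further. Returns the number created.
    """
    created = 0
    for index, record in enumerate(records):
        if index > 0:
            sleep(delay_seconds)  # pause between consecutive requests
        try:
            add_bookmark(
                record["url"],
                record["description"],
                record["notes"],
                record["tags"],
            )
        except Exception as error:
            print(f"stopping after {created} bookmarks: {error}", file=sys.stderr)
            break
        created += 1
    return created
```

Passing sleep in as a parameter makes it possible to try the loop out with a stub that returns immediately, rather than waiting hours to find a bug.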

Anyway, they wouldn't let me back in for a couple of hours. Eventually I was allowed to log in again, so I set another run of create_del_links.py going about 6:00pm last night. It finished about 13 hours later. No more exceptions thrown. The rest of my links are now bookmarks.

The other thing I wasn't expecting was that if you try to create two or more bookmarks with the same URL (even though they might have different descriptions), the second and subsequent ones will fail. I had quite a few of those.
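One way to avoid those failures would be to weed out repeated URLs before submitting anything. A minimal sketch, assuming each record is a dict with a "url" key; drop_duplicate_urls is a hypothetical helper, not something my script did.

```python
def drop_duplicate_urls(records):
    """Split records into first occurrences and later repeats of a URL."""
    seen = set()
    unique = []
    duplicates = []
    for record in records:
        if record["url"] in seen:
            duplicates.append(record)
        else:
            seen.add(record["url"])
            unique.append(record)
    return unique, duplicates
```

Keeping the duplicates in a separate list, rather than throwing them away, leaves the option of merging their descriptions or tags into the first bookmark by hand.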

I can now see that I have 5972 bookmarks all nicely tagged (with 5970 of them created during the last day). I'm going to have to go back in and fix up the descriptions that are a bit too succinct, but that should be fairly trivial now, albeit time-consuming.

I wonder if del.icio.us has a special prize for the most bookmarks created by one person in a 24-hour period. I must be in the running for it. Maybe a little plastic trophy I could stick on the mantelpiece. If I had a mantelpiece. Or at least a mention in their monthly newsletter. If they have such a thing.
