
Jan. 22nd, 2009

Why it is the way it is - Hypertext Design Issues

This post is an analysis of an early document on Hypertext Design Issues.

The key ideas discussed in this document concern Hypertext links: whether they should be monodirectional or bidirectional, whether they should be typed, and so on.

These discussions were conducted in the early days of the web. It is interesting to know how things have evolved since the time this design was made.

Let's first get some facts right:
Hypertext links today:

  • Are Two-ended

  • Are Monodirectional

  • Have one link per anchor

  • Are Untyped

  • Contain no ancillary information

  • Don't have preview information



What are the implications of this design?
  • Hyperlinks are not multiended. A single link cannot link to multiple destinations. There are, however, cases where one-to-many, many-to-one and many-to-many 'links' might make sense. These types of connections among information nodes are what RDF/OWL help achieve.

  • Links are monodirectional. An advantage of bidirectional links is that often, when a link is made between two nodes, it is made in one direction in the mind of its author, but another reader may be more interested in the reverse link.
    Bloggers want to track those pages that have linked to their posts. Google indexes allow us to track links to a particular page, and linkback mechanisms have evolved in the Blogger world to serve precisely this purpose. In general, however, we never know who has linked to our page.

  • It may be useful to have bidirectional links from the point of view of managing data. For example: if a document is destroyed or moved, one is aware of what dangling links will be created, and can possibly fix them.
    This problem has not yet been solved. Since links are monodirectional, the owner of a page cannot tell which links will be left dangling when the page changes or moves, so there is no way to clean those links up.

  • About anchors having one or more links: This is still debatable. There are some utilities that allow you to make every word a hyperlink and allow executing a host of 'commands' on the word. Ex: Perform a Google search for the word, look up the word in dictionary.com, map the word (if it is a city) or look it up in Wikipedia. However, I am not a big fan of these utilities since I feel they clutter the screen and the context detection is not yet great.

  • Typed links: I feel this is the single most important thing missing from Hyperlinks in WWW. While making types mandatory would have complicated the issue, a standard way to provide 'types' to links should have been provided. Anyway, it's the way it is. So how are people solving this issue? Microformats, RDFa are 2 things I know of. The data is mostly silently read by the browser and tools and users are usually unaware of this data in the pages. In other words, the User Interface for typed links is still not great.

  • Meta information associated with links. Interesting! I am aware of Wikipedia articles containing the date when the page was last visited but this is pretty much manually updated as far as I know.

  • Preview information: Snap solves this very issue.


The conclusion?
Well, it's tough to say how optimal the design of hypertext on the WWW was. Introducing multi-directional links and typed links would definitely help the technical people out there, but would introduce complexity which would perhaps have made it so tough for the web to flourish that it wouldn't be what it is today.
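
To make the points about reverse links and dangling links a bit more concrete, here is a minimal Python sketch. The page names and the `links` table are made up for illustration. It assumes that something (a crawler, a search index) has already collected the forward links, which is exactly the information a single page on the web does not have about itself.

```python
from collections import defaultdict

# Hypothetical forward links: page -> pages it links to.
links = {
    "a.html": ["b.html", "c.html"],
    "b.html": ["c.html"],
    "c.html": ["gone.html"],
}


def backlinks(forward):
    """Invert monodirectional links: target -> set of pages linking to it."""
    reverse = defaultdict(set)
    for source, targets in forward.items():
        for target in targets:
            reverse[target].add(source)
    return reverse


def dangling(forward):
    """Targets that are linked to but are not known pages."""
    return {t for targets in forward.values() for t in targets if t not in forward}


print(sorted(backlinks(links)["c.html"]))
print(dangling(links))
```

With only the forward links stored in each page, neither function can be computed by the page being linked to. That is the trade-off the monodirectional design makes.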

    Jan. 15th, 2009

    Google search making use of Social Graph information???

    Today, while I was searching for my name in Google, I saw that towards the end, it showed my Delicious bookmarks in the search results. There is definitely no occurrence of my name in the pages that were listed in the search results and the only connection I could see between 'me' and the search result is that I have bookmarked these links in Delicious.

    Is Google making use of the Social Graph information that I provided while I was trying out Google Friendconnect? Doesn't this raise privacy concerns?

    Jan. 6th, 2009

    Why it is the way it is - an analysis of the proposal by TimBL of the WWW

    Ever wonder why hyperlinks in the World Wide Web (WWW) are unidirectional? Why are links not typed? Why are links many to one and not many to many? Why do browsers have the restrictions that they have today? Why is the web the way it is?

    A lot of the answers to these questions are hidden somewhere deep in the web itself. Having come across several technical issues with the web, I began to wonder what the initial creators of the web perceived the web to be. What was running in the minds of the users when they came across the idea of the web?

    I started tracing back into history to the very beginning of the WWW. That's how I came across the 'original proposal of the WWW'.

    So here are some of my notes on the paper:
    (Content in italics is from the paper.)

    Use cases for the WWW


    The initial use-cases for the WWW were related to project management - communicating project ideas, storing technical details for retrieval later, finding out who wrote a piece of code, fetching all related documents for the current task. Most of the proposal revolves around the system to allow for multiuser hypertext access which is non-centralized and non-hierarchical.

    Relationship to relational databases


    Linked information systems have entities and relationships. There are, however, many differences between such a system and an "Entity Relationship" database system. For one thing, the information stored in a linked system is largely comment for human readers. For another, nodes do not have strict types which define exactly what relationships they may have. Nodes of similar type do not all have to be stored in the same place.

    What does this mean?
    We do have entities and relationships, but there are no fixed rules. Entities don't need to have types and any two entities can be related to each other. There is also no restriction on where the entities are stored.

    Hypertext


    The key ideas around Hypertext were put down by Vannevar Bush in 1945 in the form of Memex. There were several attempts by people to implement Hypertext and also Hypermedia (linking images, video etc). Ted Nelson coined the word Hypertext in 1965 and subsequently also coined the term Hypermedia. The first implementation of Hypertext in some form seems to be from Doug Engelbart in 1968. The buzz around Hypertext picked up during the late 1980s - there was a dedicated Usenet newsgroup, a bunch of conferences starting with Hypertext'87, several ACM papers, workshops etc. All this happened even before the WWW was born. There were several commercial products too, like Hypercard from Apple.

    TimBL had also tried his hand at building a hypertext system, which he called Enquire. TimBL claims to have built it as early as 1980, although the first mention of Enquire seems to be in this proposal made in 1989.

    When I started researching Hypercard features, I realized one thing. These products are easily 20 years old. Technology has changed a lot in this time. It is really hard to imagine what many of these products looked like. Either the source is not available in its entirety or it is tough to compile. This reminds me of what Grady Booch said - about having an archive of source code similar to the archive of books, videos, music and web pages.

    Anyway, the most important difference I see between Enquire and Hypercard is that Enquire was more of a 'programmer's playtool', while Hypercard was targeted towards end-users.

    So while Hypercard had 'fancy graphics', Enquire had typed links and was available for multi user access.

    WWW requirements


    About the requirements that TimBL put down for the WWW:
    * Remote access across networks, Heterogeneity, Non-Centralisation - These are what are now taken for granted. The WWW is ubiquitous, it never breaks as a system, it can be accessed from just about any device that is Internet aware.
    * Access to existing data - This was one of the reasons why the WWW became popular. It was easy to get existing data onto the web with minimal effort.
    * Private links -
    One must be able to add one's own private links to and from public information. One must also be able to annotate links, as well as nodes, privately.
    Frankly, I am not sure what TimBL means by private links 'from' public information.
    * Bells and Whistles - Graphical access to the web was considered optional.
    * Data analysis - This is one thing that has not taken off.
    It is possible to search, for example, for anomalies such as undocumented software or divisions which contain no people. It is possible to generate lists of people or devices for other purposes, such as mailing lists of people to be informed of changes.
    It is also possible to look at the topology of an organisation or a project, and draw conclusions about how it should be managed, and how it could evolve. This is particularly useful when the database becomes very large, and groups of projects, for example, so interwoven as to make it difficult to see the wood for the trees.

    The Semantic Web is showing this promise.
    * Live links - These are what are now called 'Dynamic pages' and most popular pages on the web are 'live' in that sense.

    The implementation


    Much of the academic research is into the human interface side of browsing through a complex information space. Problems addressed are those of making navigation easy, and avoiding a feeling of being "lost in hyperspace". Whilst the results of the research are interesting, many users at CERN will be accessing the system using primitive terminals, and so advanced window styles are not so important for us now.

    As I read this, it gives me a feeling that TimBL was not thinking of making the WWW a 'public' web that would be used by just about everyone. Even a non-techie could build a page of content and hook it onto the web. Usability seemed to be of least importance.

    The only way in which sufficient flexibility can be incorporated is to separate the information storage software from the information display software, with a well defined interface between them.

    This division also is important in order to allow the heterogeneity which is required at CERN (and would be a boon for the world in general).

    A client/server split at this level also makes multi-access more easy, in that a single server process can service many clients, avoiding the problems of simultaneous access to one database by many different users.

    'information display software' - Now that's what the browser is! Also this is what created the need for HTTP, HTTP server and HTML.

    Conclusion


    Do we still visualize the web as just content linked via Hypertext? How can we accommodate social networking and the whole realm of developments around Web 2.0 and social network applications?

    The web has surely come a long way!

    (Note: Draft content - subject to change)

    Dec. 13th, 2008

    YQL - Yahoo's query language for the web

    This post is a part of the AfterThoughts series of posts.

    Post: A query language for searching websites
    Originally posted on: 2005-01-27

    I blogged about the idea of a query language for websites back in 2005. Today, when I was doing my feed sweep, I came across YQL, a query language with SQL-like syntax from Yahoo that allows you to query for structured data from various Yahoo services.

    There is one thing that I found interesting. The ability for you to query 'any' HTML page for data at a specific XPath. There are some details in the official Yahoo Developer blog.

    The intent of YQL is not the same as what I had blogged about. While YQL allows you to get data from a specific page, what I had intended was something more generic - an ability for you to query a set of pages or the whole of the web for specific data, which is a tougher problem to solve.

    In order to fetch specific data from an HTML page using YQL, all you have to do is:
    1. Go to the page that you want to extract data from.
    2. Open up Firebug and point to the data that you want to extract (using Inspect).
    3. Right click the node in Firebug and click on 'Copy XPath'.
    4. Now create a query in YQL like this:
    select * from html where url="" and xpath=""
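
As a small illustration of step 4, the query can be assembled from the two copied values. This is only a sketch in Python: `build_yql` is a hypothetical helper, the example URL and XPath are made up, and it only fills in the query shape shown above without sending it anywhere. Since I am not sure how YQL expects embedded quotes to be escaped, the sketch simply refuses values that contain a double quote.

```python
def build_yql(url, xpath):
    """Fill the url and xpath slots of the 'select * from html' query."""
    for value in (url, xpath):
        if '"' in value:
            # Quoting rules are not handled in this sketch.
            raise ValueError("value contains a double quote: %r" % value)
    return f'select * from html where url="{url}" and xpath="{xpath}"'


# Hypothetical page and an XPath as copied from Firebug.
query = build_yql("http://example.com/", "/html/body/table/tbody/tr/td")
print(query)
```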

    Although the idea seems promising, I wasn't able to get it to work for most XPaths.

    I guess the reason is the difference between the way the browser interprets the HTML and the way a server would interpret it. For example, if there is no 'tbody' tag in your table, the Firefox browser inserts a 'tbody' tag and that would be present in your XPath, while a server that interprets the HTML after Tidying it wouldn't see one. One way we can solve this is to have the same engine interpret the XPath on the server side as well or be as lenient as possible when matching the XPaths. I had similar discussions with the research team in IRL when I was working on my idea of MySearch, which had similar issues, and there were some interesting solutions that we discussed.
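
The 'be as lenient as possible' idea can be sketched in a few lines of Python. This is not how YQL works internally (I don't know that). It only uses the standard library's ElementTree, a made-up well-formed page without a 'tbody', and hypothetical helpers (`to_relative`, `without_tbody`, `find_lenient`). It assumes the copied XPath starts with a plain `/html` step.

```python
import re
import xml.etree.ElementTree as ET

# The page as a server-side parser might see it: no <tbody> in the table.
PAGE = "<html><body><table><tr><td>42</td></tr></table></body></html>"


def to_relative(xpath):
    """ElementTree searches relative to the root, so '/html/...' becomes './...'."""
    if xpath.startswith("/html"):
        return "." + xpath[len("/html"):]
    return xpath


def without_tbody(xpath):
    """Drop tbody steps (with an optional [n] index) that a browser may have added."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


def find_lenient(root, xpath):
    """Try the XPath as copied; if nothing matches, retry without tbody steps."""
    path = to_relative(xpath)
    node = root.find(path)
    if node is None:
        node = root.find(without_tbody(path))
    return node


root = ET.fromstring(PAGE)
copied = "/html/body/table/tbody/tr/td"  # as a browser tool would report it
cell = find_lenient(root, copied)
print(cell.text)
```

The strict lookup fails because the parsed tree has no 'tbody' element, while the retry without that step finds the cell. Real pages are rarely well-formed XML, so a real implementation would need an HTML parser in place of `ET.fromstring`.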

    I would say it is only a matter of time before someone cracks the issue of fetching structured data from the semi-structured data present on the web and makes it available to other services. Tools like Dapper, Yahoo Pipes, YubNub and YQL are just the beginning.

    I have made several attempts at this, from using one of these tools to building my own using Rhino, Jaxer etc, but until now the solution I am most content with is a combination of curl, grep, awk and sed.

    Nov. 9th, 2008

    weRead - what's new?

    Ever since I blogged about iRead back in April, a lot has changed. We have introduced tons of new features, and there is really not one place where we have captured all of them.

    So this is my attempt to describe the features to our readers.

    • iRead is now called weRead and we have partnered with Lulu
      This post from our official blog has more details.

    • We now have a destination site
      You don't have to log in to Facebook or some social network to access weRead. You can directly access your bookshelf from our destination site. If you have already used weRead in Facebook or one of the social networks, you can link your account and access the same account from the destination site.

    • Connections - find people like you
      This Facebook feature allows you to find people whose book tastes are similar to yours. You can look for people of a specific gender, people in your network and people in specific age groups.

    • We now have friend activities in the homepage
      We now show activities from your friends on weRead in the homepage. This helps you keep track of which books your friends have been reading, and if they have participated in any discussions.

      Activity of friends on weRead


    • Book discussion boards
      This is the place to discuss your favorite books with your friends and network: what you liked, what you didn't like, why someone should or shouldn't read a book.

    • Author discussion boards
      If you want to discuss a specific author, talk about which works of an author are good, or what you would expect his next book to be like, this is the place to do it. Check out the latest discussions here.

      AC Discussion Board


    • Author profile claim
      Are you an author? Then you should be on weRead. weRead makes it ultra simple for you to set up a profile and interact with your readers. Writing a new book? Want to know who might like it? Want to get suggestions from your readers? Want to promote your book on various social networks? Start here.

      weRead for authors


    • New catalogs
      We now have catalogs from Amazon, Google and OCLC integrated into weRead. This means you have a whole range of books to choose from. More catalogs are coming soon.

    • weRead is now available in multiple languages
      weRead is now available in 6 different languages - English(US), English(UK), German, French, Spanish (on Hi5 only) and Portuguese (on Orkut only). We have more languages being added soon. Want weRead in a local language? Help us translate weRead here.

    • We now have limited previews of books from HarperCollins and Google Books and full preview of some books from Gutenberg
      This will give you some sort of a 'bookstore experience' by allowing you to preview books.

    • See how a book fares in your network
      Curious to know how a book has been rated by people in your network? We now give you near realtime statistics about a book - how people have rated the book in your network, how many people own the book, how many have marked it favorite etc.
      Find who has read a book in your network


    • Readers now have a profile page which displays their bookshelf
      Each weRead user gets his/her own personal page that they can then share with their friends, bookmark, etc. In order to set up your own profile page, link your account from Facebook to our destination site and click on the "Profile" link in the top blue bar. Check out my profile page here.

    • Readers can showcase their bookshelf in their blogs and other sites
      Want to advertise your bookshelf in your blog? It's simple! Go to your profile page and then click on 'Take weRead with you', get the code and put it in your blog. You also have some customization that you can do before you get the code. Check out a demo here.

    • The Facebook Wall application allows you to post information about books, write reviews etc directly from the Facebook Wall.
      You can now chuck a book at your friends directly from the Facebook wall. Go to your Facebook profile page: http://www.facebook.com/profile.php. Under the Wall tab, you should see the Books iRead option. Clicking this opens a dialog that allows you to pick a book from your shelf or search for a book and chuck this at your friend.

      Facebook weRead Wall application


    • Similar authors
      Under every book detail page, we show similar authors that will help you discover authors who write books similar to the one that you are viewing.

    • Mis-spelt searches
      weRead now has built-in suggestions in case you misspell a word while typing your query.

    • See more like this
      We have launched some kind of a 'Stumble upon' feature. When you are viewing a book in weRead, you will see a 'See more like this' button. Clicking it takes you to a random but related book.

    • External integration with OCLC
      We now power the OCLC related books and reviews.

    • We have also moved to bigger and more powerful servers, which means a better user experience for all our readers.



    As you see, we have been busy! We have tons of new and exciting features lined up and we promise to provide feature updates as frequently as possible. A lot of these features revolve around making weRead a truly social application.

    By the way, you can get some quick updates on weRead in our Twitter page.

    Happy reading!

    PS: Features and feature names are subject to change.

    Aug. 5th, 2008

    It's official - Lulu partners with weRead

    So finally the news has been made official.

    Lulu today announced partnership with weRead (iRead).

    Lulu is a platform that enables aspiring authors, musicians and other creators to bring their work directly to their audience. Publishing is free, and the lack of middlemen means that the freedom lies in the hands of the creator. Lulu was founded by Bob Young, co-founder of Red Hat and an extremely successful entrepreneur. Lulu is the world's fastest-growing provider of print-on-demand books.

    With this partnership, there are several exciting things that we are looking at.

    With weRead, Lulu users now get a simple way to make their creation available on all popular social networking sites and promote their work. As for weRead, users get a much larger catalog of books, some of which are not available anywhere else.

    Well, this is definitely just the tip of the iceberg and we see several other exciting things ahead.

    News about the partnership from the Lulu site:
    "Lulu (www.lulu.com), the world's largest marketplace for individual, educational, and corporate authors and publishers to bring their books directly to market, announced today an alliance with weRead (www.weread.com), the leading social networking application for books where readers can easily discover and recommend books to their friends on social networks and therefore, the world."

    Over the next few weeks, you should see several new features on weRead. There is one theme that we are concentrating on - make weRead more social, which is why we thought it makes better sense to name it weRead rather than iRead.

    The future now looks promising!

    Jul. 22nd, 2008

    The Afterthoughts - If Google came up with an RSS Reader

    So here is another post in The Afterthoughts series.

    Post: If Google came up with an RSS Reader
    Originally posted on: 2005-01-30

    This post was made long before Google came up with Google Reader. I was experimenting with RSS readers and started wondering what it would be like if Google came up with an RSS reader.

    Now that we have one from Google, it is time to look back and see how my expectations matched with the actual product.

    > * It would first buy the domain "greader" or something similar.
    This didn't happen. However, Google Reader is popularly called GReader. I guess I made this comment because of Gmail.
    On a side note, Google does own greader.net.

    > * It would have an index of more than 8 million different feeds.
    This is not how an RSS reader has evolved. Google Reader does have recommendations based on the feeds you already have. It would be good to see an integration of Google Blogsearch or even Google News with Google Reader. The only integration I see is the subscription of search results from both of these in Google Reader (a 'new' feature).

    > * It would offer 1 GB space for storing posts.
    The storage in most online readers is unlimited.

    > * It would have an excellent search feature for searching posts.
    This was a surprise! The feature came in so late. Totally unexpected.

    > * The interface would be simple, but at the same time powerful.
    You bet this has been true. The keyboard shortcuts are just superb. The speed with which you can navigate and read feeds is extremely good. (You will need my script to make it even faster. :))

    > * We would be able to mail any post just at the click of a button.
    I guess this feature has been around for quite some time now.

    > * It would allow us to filter posts and also label them for future reference.
    With tagging and folders, this has been better than expected.

    > * It would also allow us to make blog entries (of course the service would be integrated with Blogger.)
    Again, this is a surprise. Google has not provided any integration with Blogger. However, recently Google added a feature to share an item with notes. With the microblogging revolution, and Google having acquired Jaiku, I guess that integration will happen first.

    > * It would integrate greader with other offerings like mail, groups etc.
    The integration is not that great as of now. It would be cool to see posts related to a mail, or a message in a group etc.

    > * It would be Beta forever. :)
    Surprise! This isn't true!

    Final thoughts:
    So, more than 3 years after I made the original post (which is a long time in technological evolution), I should say that Google did match most of the expectations I had back then, and some features turned out much better than I had expected. However, integration with other services is one area where it could have done better.

    Jul. 12th, 2008

    Adventure with Ubuntu, Wubi, yum, libc and the like

    Note: This is not for the casual reader. If you are facing any issues with any of the keywords mentioned above, you might want to continue...

    So here I was trying to install some packages from a YUM repository on my Ubuntu 8.04 system. Why YUM when you have apt-get? Well, let's just say, the situation demanded it.

    The installation seemed to be going fine. What I did not realize is that the installation had innocently relinked my libc files to a new location (actually to an older version of libc). The yum installation failed. Without checking the error, I executed sudo yum install again.

    And I got this:

    sudo: /lib/tls/libc.so.6: version `GLIBC_2.4' not found (required by sudo)
    sudo: /lib/tls/libc.so.6: version `GLIBC_2.4' not found (required by /lib/libpam.so.0)
    


    Next I executed ls. Same error! Soon I realized that I was not able to execute most commands. The only things running were things that were already open. I had closed my terminal by then and was not able to bring it back, nor was I able to log in on an alternative terminal.

    It is extremely difficult to figure out what has gone wrong without a terminal. I tried out various things, but I soon realized that since I don't have sudo access anymore, I won't be able to fix anything in the /lib directory, so no point trying.

    The only solution was to reboot in recovery mode and then see if I could relink the libc files. So I popped in the Ubuntu live CD.

    Now here is an added twist to the tale. I run Ubuntu on Wubi. So how do I mount my NTFS 'file' which is actually a Linux partition?

    With some pointers from my colleague, I realized that it is possible to mount a file as if it were a filesystem. I executed this:

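    # Mount the Windows (NTFS) partition that holds the Wubi installation
    mount /dev/windows/filesystem/containing/wubi/installation /media/disk
    # Loop-mount the Wubi disk image, which is a regular file, as a filesystem
    mount /media/disk/path/to/wubi/disks/root.disk /media/root -o loop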
    


    Guess what! The Wubi file got mounted and I was able to access the files.

    After some inspection I realized that the problem was that, while Ubuntu has all the libc files in /lib/tls/i686/cmov, the message indicated that these files should be in /lib/tls.

    I did an 'ls' in the /lib/tls directory and found some files of an older version of glibc at this location, while the live CD version didn't have any files there. So it was apparent that this was what was causing the problem.

    I unlinked all the files, and relinked them to the new location and rebooted.

    This time although it was able to boot Linux, it did not bring up the UI. I booted once again in recovery mode and ran xfix and continued with the boot.

    Things seem to be fine now.

    Update: Not everything was fine. Some applications, like Totem, threw a SEGFAULT. So I did this:

    sudo apt-get install --reinstall libc6
    


    Things seem to be fine now.

    Jun. 18th, 2008

    How to ensure that your extensions work on Firefox 3.0

    Here are the steps that I found useful to port my extensions from Firefox 2.0 to 3.0:

    • Step 1: Just start Firefox and allow it to update the extensions. You could go to: Tools -> Add-ons -> Extensions -> Find updates.
      This should update many of the extensions. Restart Firefox.

    • Step 2: For those extensions where the auto-update has not functioned properly, you might want to manually see if an update is available. This is because for some extensions, the auto-update may not recognize that a new version is available.
      • Uninstall the older version and restart Firefox.

      • Search for the addons here and add them.


    • Step 3: Install the MR Tech Toolkit extension.

    • Step 4: For those extensions that have still not been updated and that you desperately need, see if the 'Make compatible' option from MR Tech helps. This option is available when you right click an extension in the Extension tab. If the compatibility range is up to some older version of 3.0 (for example 3.0b5) then this might work.

    • Step 5: Look for updates at a higher frequency over the next few days. Developers will be forced to ensure that their extension works in the new version of Firefox, so you can expect an update soon.

    May. 3rd, 2008

    The Afterthoughts - Gmail forwarding and service interoperability - an interesting observation

    "The Afterthoughts" is a series where I revisit some of my older blog entries and see how things have changed between the time I made the blog post and now.

    The posts that I will choose initially will be from 2004 to 2006.

    So here is the first one in the series:

    Post: Gmail forwarding and service interoperability - an interesting observation
    Originally posted on: 2005-11-21

    The entry explains how, when you connect various services together, you could end up with the same information multiple times.

    This is increasingly becoming a problem these days. Services like Twitter and Friendfeed are not solving the problem elegantly, so you see more and more duplicates and links to the original post.

    Here is a typical scenario today:
    I make a blog entry. In order to ensure that my readers see my post immediately, I have a service that automatically posts a message in Twitter. This is like instantly messaging my friends (actually Twitter followers) telling them, "Look, I made a blog entry".

    Now, I use a lot of Web 2.0 services. So, in order to ensure that all my friends have a single feed to follow my activities, I use some aggregator like FriendFeed or Tumblr.

    Some friend of yours (let's call him Bob) likes your blog entry and bookmarks it on del.icio.us. Another friend, Andrews, bookmarks it in Magnolia.

    Let us now say there is another person, Dave, who is friends with you, Bob and Andrews. He is following all 3 of you in Friendfeed.

    How many entries is Dave going to see of the original entry?
    6 in total! 4 from you - 1 from your blog post directly, 1 from Twitter, 2 from Tumblr (1 via the blog post and 1 via Twitter) - plus 1 from Bob via del.icio.us and 1 from Andrews via Magnolia.

    The screenshot shows duplicate entries from mashable's blog feed and from Twitter:


    Friendfeed - problems with aggregations services


    Now this is real noise. And it is even more so if Dave is not even interested in the blog post to begin with.

    So the solution?
    Friendfeed allows you to hide specific feeds from specific people. For example, Dave can hide all bookmarks from Bob or all Tumblr entries from me.


    Friendfeed's attempt at eliminating duplicates


    Now that is not a good solution because not all bookmarks from Bob are duplicates.

    Tools like Feedblendr and Blogbridge have solved this problem for simple RSS aggregation. However, things are different when it comes to social networks and aggregation.

    So right now there is no simple way of detecting duplicates. More and more people are complaining about this in the blogosphere, explaining how Friendfeed is more noise than information and why the good old Google Reader is still relevant.

    Here is one such discussion. As the discussion suggests, it is not just about eliminating duplicates; it also requires you to merge discussions/comments in each of these posts keeping in mind that not everyone is a friend of everyone else.
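
For the easy half of the problem, spotting that several entries point at the same post, a naive approach is to group entries by the URL they link to. The sketch below uses the six entries from Dave's feed in the scenario above. The tuple layout and the example URL are made up, and it assumes every copy carries the same URL, which shortened or redirected links would break.

```python
POST = "http://example.com/my-blog-entry"  # hypothetical URL of the blog post

# (who shared it, via which service, link) as Dave would see them.
entries = [
    ("me", "blog", POST),
    ("me", "Twitter", POST),
    ("me", "Tumblr", POST),   # Tumblr picking up the blog post
    ("me", "Tumblr", POST),   # Tumblr picking up the tweet
    ("Bob", "del.icio.us", POST),
    ("Andrews", "Magnolia", POST),
]


def merge_by_url(items):
    """Collapse entries that link to the same URL into one group."""
    merged = {}
    for who, via, url in items:
        merged.setdefault(url, []).append((who, via))
    return merged


merged = merge_by_url(entries)
for url, shares in merged.items():
    print(url, "shared", len(shares), "times")
```

This collapses the six entries into one item with six 'shares'. It does nothing about the harder half: merging the comments attached to each copy while respecting who is allowed to see whose activity.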

    So what has changed over the last 2 years?
    If anything, the problem has become a tougher one. I am sure the startup that does duplicate elimination and gives you a filtered feed taking your social networks into consideration is going to be the next hyped startup in the Web 2.0 world.

    Apr. 23rd, 2008

    Privacy disasters with aggregation services

    Imagine you have a host of aggregation services like Friendfeed, Tumblr, Suprglu, Lifestreams connected to each other, such that each one is reading from your various feeds and republishing the content.

    Now imagine a disaster where one of these services, say Twitter, suddenly, because of some flaw, exposes your private messages.

    It's like a Tsunami that cannot be controlled! Your private data would flow into various input streams in a matter of seconds and there is no turning back.

    Things will only get worse with activity feeds and Beacon.

    The bottom line is: Be careful about where your data is going and what data you put online.

    Apr. 19th, 2008

    iRead - a social book discovery revolution

    It has been a while since I thought I should write a review of iRead.




    iRead is a social book discovery application. It has been quite successful on Facebook and has a very large userbase. Currently iRead has a total install base of about 1.4 million users, mostly from Facebook.


    So what do we mean by social book discovery?


    iRead is not just about maintaining a bookshelf online. It tries to bring the social aspect into picture.

    'social'?
    iRead depends a lot on your social network. You can share your bookshelf with your friends, and learn what your friends are reading and what their reading tastes are. You can discuss books in various book clubs. You could participate in quizzes or even add your own. You can find out how compatible your reading tastes are with other people in the network.

    iRead does not require a separate registration. It is available right in your social network. (As of now the application is available in Facebook, Orkut, MySpace, Hi5 and Bebo.) So when we are talking about friends, we are talking about your friends from the network where you are using iRead. So if you use iRead in Facebook, you see your Facebook friends in iRead, while in Orkut you see your Orkut friends. Often, all it takes is adding the application to your profile.

    'book discovery'?
    For one, iRead provides recommendations based on your reading tastes. Then there are various other mechanisms by which you can discover new books to read.

    Let's explore some.

    Several ways to browse


    * You could first start off by searching for books and adding them to your bookshelf. This helps us learn about your tastes and recommend books that you may like.

    * When searching, you could either enter the name of the book, or its author, or if you know the ISBN, you could enter that.

    * If you want to just browse through the application you could start off by looking at what other iReaders are doing. The home page shows the most recent activity in the network.

    News feeds on homepage


    * So let's say you find an interesting book. Just click on the book and you are taken to the book details. Here you get to know how many readers the book has and how many reviews people have written for it, and you get some instant user reviews and an editorial review. You can also find other similar books.

    Book details for Da Vinci Code


    * If you see that the book is interesting, just click on the 'See All' reviews link. This will display all the reviews for the book. Read the ones you like and you will soon learn what the book is about.

    Book review page for GEB


    * Since there are multiple ways to reach your data, your reviews are never buried. So even if you are writing a review for a book that already has a thousand reviews, you can expect your review to be read by other iReaders.

    * If the book interests you, you might want to check out other books by the same author. Just click on the author's name. This will show all books by the author. You could also click on the small icon next to the author's name to search for the author in Author's corner. This will give you other details like the profile of the author, what others think about the author, how many fans the author has etc.

    Authors corner


    * Author's corner is a forum for readers to interact with their favorite authors. So if you are the author of a book and are looking for a forum to interact with your readers, this is where you should be. Author's corner allows authors to maintain their profile, and also learn about their readers' expectations.

    * While reading reviews, you might find that the review from a particular user is very interesting. You might now want to look at this reader's bookshelf. I have often found this to be a good mechanism to discover new books. You can gauge how close your tastes are by looking at the number of books you have in common. You might then want to look at other reviews by this reader.

    * You could also contact the reader by leaving a wall post/scrap.

    * You may also want to check out who among your friends is on iRead and what they are reading. Click on the Friends link in the header. If you want to know about your friends' reading tastes and they are not yet on iRead you could invite them to add the application.

    Friends reads on iRead


    * For selected books, you could even browse inside the book. A lot of out-of-copyright books are available for free online viewing. Some other selected books are available for limited preview.

    Other features worthy of mention


    Take your reads with you


    The top header on iRead

    So what if you are in all these networks and want to use iRead everywhere?
    iRead has a feature to import your bookshelf from Facebook to Orkut, MySpace and/or Hi5. Once imported, you will see the same bookshelf in all the networks. However, the friends shown to you depend on the network you are currently in.

    Import books from other sources


    If you have been maintaining books in some other place, you may want to try importing books using the import books option. The link to this is found below the search box.

    Add a book


    Can't find a book you want to add to your bookshelf? You can add it to our catalog. The link to add a book is found below the search box.

    So what's more?!

    Happy iReading!

    Disclaimer: I work for Ugenie and am part of the iRead application development team. The views expressed here are my own and not necessarily those of Ugenie.

    Apr. 7th, 2008

    Downloading data using Greasemonkey - Part 2

    So I finally found some time to continue my experiments with the data download from browser to the server.

    This time my target was Orkut. I decided to write a simple script to extract my Orkut profile and then display a subset of these fields on my own site using my own formatting.

    I did not write a Greasemonkey script this time, but just used Firebug to run the Javascript. Here is the browser-side script:

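    var arrayToExtract = ['listdark', 'listlight'];

    for (var z = 0; z < arrayToExtract.length; z++) {
       // Just got lucky here. $$ is not standard DOM, but it is available in the Firebug console.
       var elements = $$('.' + arrayToExtract[z]);
       for (var i = 0; i < elements.length; i++) {
           var item = elements[i].getElementsByTagName('p');
           // Skip rows that do not have both a label and a value paragraph.
           if (item.length < 2)
               continue;
           postData(item[0].innerHTML);
           postData(item[1].innerHTML);
       }
    }

    // Send one piece of data to the server by adding a <script> tag (a cross-domain GET).
    function postData(data) {
       var scriptElement = document.createElement('script');
       // Encode the values so that '&', '#' and spaces do not corrupt the query string.
       scriptElement.setAttribute('src', 'http://buzypi.in/backup?data=' + encodeURIComponent(data) +
           '&file=orkut&date=' + encodeURIComponent(Date()));
       document.body.appendChild(scriptElement);
    }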
    


    The script above posts the profile information piece by piece to the server, and the server captures each piece and appends it to a file. The server-side code is as follows:

    <?php
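    // Appends the posted data to a file under ./data. Called through <script src="...">.

    // basename() keeps a crafted 'file' value from escaping the data directory.
    $file_name = basename(isset($_REQUEST['file']) ? $_REQUEST['file'] : '');
    $data = isset($_REQUEST['data']) ? $_REQUEST['data'] : '';
    $more = isset($_REQUEST['more']) ? $_REQUEST['more'] : '';

    $DIRECTORY = 'data';

    $file_with_location = dirname(__FILE__).'/'.$DIRECTORY.'/'.$file_name;

    $file_handle = fopen($file_with_location,'a');
    if($file_handle === FALSE){
       echo "/* Failed to open file */";
       exit;
    }

    fwrite($file_handle,$data);

    // A newline ends the record unless the client says that more data follows.
    if($more != "true")
       fwrite($file_handle,"\n");

    $success_value = fclose($file_handle);

    // The response is loaded as a script, so wrap the status in a Javascript comment.
    echo "/*";
    if($success_value === TRUE){
       // Break up any comment terminator inside the echoed data.
       echo "Successfully appended: ".str_replace("*/", "* /", $data);
       if($more == "true"){
          echo " Expect you to send more data";
       }
    } else {
       echo "Failed to write data";
    }

    echo "*/";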
    
    ?>
    


    Guess what happened when I executed the script?

    The data was appended to the file alright, but the ordering of the items was messed up in some places.

    Here is a sample:

    job description:
    work phone:
    I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
    career interests:
    ...
    


    while the expected output was:
    job description:
    I am a social networking application developer. I work on the Books iRead application in Ugenie. Our app is currently available in Facebook, Bebo, Orkut and Myspace.
    work phone:
    career interests:
    ...
    


    The job description content should have been received before 'work phone', but this was not the case.

    So what is the solution?

    There are 2 things I can think of:
    1. Ensure that data posted is atomic.
    2. Come up with a simple sliding window protocol arrangement between the browser and the server.

    Solution 1 is not always feasible because of the limits on GET URL size. In fact, we might need to split the body just so that it can be posted using GET requests. So the only solution that can take care of this is (2).
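
    A full sliding window protocol needs acknowledgements and retransmission. A simpler relative of it, sketched here under assumptions, is to number each piece on the browser side and have the receiver hold back anything that arrives early. The names createReorderBuffer and receive are hypothetical, the buffer lives in memory, and the logic would have to be ported to the PHP script (with the pending pieces persisted between requests) before it could be used with the code above.

```javascript
// Receiver side: accepts numbered pieces in any order and hands them
// to write() strictly in sequence, starting from sequence number 0.
function createReorderBuffer(write) {
  let nextSeq = 0;            // the next sequence number we may write
  const pending = new Map();  // pieces that arrived too early

  return function receive(seq, data) {
    // Ignore pieces that were already written or are already waiting.
    if (seq < nextSeq || pending.has(seq)) return;
    pending.set(seq, data);
    // Flush every piece that is now in order.
    while (pending.has(nextSeq)) {
      write(pending.get(nextSeq));
      pending.delete(nextSeq);
      nextSeq++;
    }
  };
}
```

    With this in place, if 'work phone:' (piece 1) arrives before 'job description:' (piece 0), it is simply held until piece 0 shows up. The browser side would only need to add a counter to each request, for example a seq parameter next to data.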

    I will post more entries as I progress. Meanwhile, if you have any better solution to the problem, comment here.

    Mar. 19th, 2008

    PHP Functional Programming - A code snippet

    Given a string of comma-separated values, how do you convert each of them into a link of the form:
    <a href="http://--item--.google.com/">--item--</a>
    and return a comma-separated list of these strings?

    Snippet 1:
    <?php
    //Make these links to google.com

    $string = "news,reader,mail";

    // explode() replaces split(), which was deprecated and later removed from PHP.
    $array_of_string = explode(",",$string);

    $final = array();

    foreach($array_of_string as $item){
    	$final[] = "<a href='http://".$item.".google.com/'>$item</a>";
    }

    echo implode(", ",$final);
    ?>
    


    Snippet 2 (uses functional constructs):
    <?php
    
    //Make these links to google.com

    $string = "news,reader,mail";

    $array_of_string = explode(",",$string);

    // An anonymous function (PHP 5.3+) replaces create_function(), which was removed in PHP 8.
    echo implode(", ",
    		array_map(
    			function($item){
    				return "<a href='http://".$item.".google.com/'>$item</a>";
    			},$array_of_string
    			)
    		);
    
    ?>
    

    Mar. 16th, 2008

    Downloading your data using Greasemonkey

    Whenever I use some service over the web, I look for several things. Ease of use and customisability are important factors.

    However, the most important thing I consider is vendor lock-in (or rather the lack of it). Let's say I am using a particular mail service (e.g., GMail). If someday I find a better email service, would it be easy for me to switch to that service? How easy is it for me to transfer my data from my old service to my new service?

    For services like mail, there are standard protocols for data access, so this is not an issue. However, for the more recent services, like blogging, micro-blogging etc., the most widely used data access method is RSS or Atom over HTTP.

    However, it's not the case that all services provide data as RSS (or XML or in any other parseable form). For example, suppose I make a list of movies I have watched, in some Facebook application, or a list of restaurants I visited, how do I download this list? If I cannot download it, does it mean I am tied to this application provider forever? What if I have added 200 movies in my original service and I come across another service that has better interface and more features and I want to switch to this new service but not lose the data that I have invested time to enter in my original service?

    In fact, recently when I tried to download all my Twitters, I realized that this feature has been disabled. You are not able to get your old Twitters in XML format.

    So what do we do when a service does not provide data as XML and we need to somehow scrape that data and store it?

    This is kind of related to my last blog entry.

    So I started thinking of ways in which I could download my Twitters. The solution I thought of initially was using Rhino and John Resig's project (mentioned in my previous blog entry). However, I ran into parse issues like before. So I had to think of alternative ways.

    Now I took advantage of the fact that Twitters are short (and not more than 140 characters).

    The solution I came up with uses a combination of Greasemonkey and PHP on the server side:

    Here is the GM script. If you intend to use this, do remember to change the URL that the data is posted to.
    // ==UserScript==
    // @name           Twitter Downloader
    // @namespace      http://buzypi.in/
    // @author         Gautham Pai
    // @include        http://www.twitter.com/*
    // @description    Post Twitters to a remote site
    // ==/UserScript==

    function twitterLoader (){
    	var timeLine = document.getElementById('timeline');
    	if(!timeLine)
    		return;   // Not a page with a timeline
    	var spans = timeLine.getElementsByTagName('span');
    	var url = 'http://buzypi.in/twitter.php';
    	var twitters = [];
    	// Collect the text of every twitter on the page
    	for(var i=0;i<spans.length;i++){
    		if(spans[i].className != 'entry-title entry-content'){
    			continue;
    		}
    		// encodeURIComponent, unlike escape, also encodes '+' and handles non-ASCII text
    		twitters.push(encodeURIComponent(spans[i].innerHTML));
    	}

    	// Post each twitter through a <script> include; 'last' marks the final one on the page
    	for(var i=0;i<twitters.length;i++){
    		var last = 'false';
    		if(i == twitters.length - 1)
    			last = 'true';
    		var scriptElement = document.createElement('script');
    		scriptElement.setAttribute('src',url+'?last='+last+'&data='+twitters[i]);
    		scriptElement.setAttribute('type','text/javascript');
    		document.getElementsByTagName('head')[0].appendChild(scriptElement);
    	}
    }

    window.addEventListener('load',twitterLoader,true);
    


    The server side PHP code is:
    <?php
    
    $data = isset($_REQUEST['data']) ? $_REQUEST['data'] : '';
    //Store data in the DB, CouchDB (or some other location)
    $last = isset($_REQUEST['last']) ? $_REQUEST['last'] : 'false';
    if($last == 'true'){
    	// The last twitter on the page was received: send back code that moves to the next page.
    	echo '
    	var divs = document.getElementsByTagName("div");
    	var pagination = null;
    	for(var j=0;j<divs.length;j++){
    		if(divs[j].className == "pagination"){
    			pagination = divs[j];
    			break;
    		}
    	}
    	var sectionLinks = pagination ? pagination.getElementsByTagName("a") : [];
    	var href = "";
    	if(sectionLinks.length == 2)
    		href = sectionLinks[1].href;
    	else if(sectionLinks.length > 0)
    		href = sectionLinks[0].href;
    	// Read the whole page number, not just its first digit. No page parameter means page 1.
    	var pageOf = function(link){
    		var match = /[?&]page=(\d+)/.exec(link);
    		return match ? parseInt(match[1], 10) : 1;
    	};
    	if(href == "" || pageOf(href) < pageOf(document.location.href))
    		alert("No more pages to parse");
    	else {
    		alert("Changing document location");
    		document.location.href = href;
    	}
    	';
    } else {
    	echo '
    	var recorder = "true";
    	';
    }
    
    ?>
    


    The GM script scrapes the twitters from a page and posts them to the server using <script> includes. The server stores the twitters in some data store. The server also checks whether the twitter posted was the last twitter on the page. If so, it sends back code to change to the next page.

    Thus the script when installed, will post twitters from the most recent to the oldest.

    Ok, now how would this work with other services?

    The pattern seems to be:
    * Get the data elements from the present page - data elements could be movie details, restaurant details etc.
    * Post data elements to the server.
    ** The posting might require splitting the content if the length is more than the maximum length of the GET request URL.
    * Identify how you can move to the next page and when to move to the next page. Use this to hint the server to change to the next page.
    * Write the server side logic to store data elements.
    * Use the hint from the client to change to the next page when required.

    The biggest advantage of this method is we make use of the browser to do authentication with the remote service and also to do the parsing of the HTML (which, as I mentioned in my previous post, browsers are best at).
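
    The splitting step in the pattern above can be sketched as follows. This is only an illustration: the function names, the seq parameter and the length limit passed in are assumptions (the data and last parameters are the ones used by the script above), and the real limit depends on the browser and the server.

```javascript
// Split text into pieces whose URL-encoded form fits within maxEncodedLength.
// Note: a single character whose encoded form is longer than the limit
// still becomes a piece of its own.
function splitForGet(text, maxEncodedLength) {
  const chunks = [];
  let current = '';
  // for...of walks by code point, so surrogate pairs are never cut in half.
  for (const ch of text) {
    if (current !== '' && encodeURIComponent(current + ch).length > maxEncodedLength) {
      chunks.push(current);
      current = ch;
    } else {
      current += ch;
    }
  }
  if (current !== '') chunks.push(current);
  return chunks;
}

// Build one GET URL per piece; 'last' is true only on the final piece.
function buildUrls(baseUrl, text, maxEncodedLength) {
  const chunks = splitForGet(text, maxEncodedLength);
  return chunks.map(function (chunk, i) {
    return baseUrl + '?seq=' + i +
      '&last=' + (i === chunks.length - 1) +
      '&data=' + encodeURIComponent(chunk);
  });
}
```

    Each URL returned by buildUrls can then be used as the src of a <script> element, exactly as in the GM script. The limit is measured on the encoded text because that is what actually ends up in the URL.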

    Mar. 7th, 2008

    HTML parsing and Rhino

    About a year back I was working on a personal project in IBM. This was a clone of YubNub for the IBM intranet.

    For those of you who don't know YubNub, it is a simple but powerful tool, which allows you to define keywords to reach pages. One of the popular examples is gim which will take you to the Google Image Search results page for the keywords that you entered.
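
    The core of such a tool is small. Here is a rough sketch of the keyword lookup, not YubNub's actual implementation: the command table, the URL templates and the %s placeholder convention are assumptions made for illustration.

```javascript
// Assumed command table: keyword -> URL template, with %s marking
// the place where the rest of the input is inserted.
const commands = {
  gim: 'http://images.google.com/images?q=%s',
  wp: 'http://en.wikipedia.org/wiki/Special:Search?search=%s'
};

// Turn "gim porsche 911" into the URL to redirect to, or null
// when the keyword is not defined in the table.
function resolveCommand(input, table) {
  const trimmed = input.trim();
  const space = trimmed.indexOf(' ');
  const name = space === -1 ? trimmed : trimmed.slice(0, space);
  const args = space === -1 ? '' : trimmed.slice(space + 1).trim();
  if (!Object.prototype.hasOwnProperty.call(table, name)) return null;
  return table[name].replace('%s', encodeURIComponent(args));
}
```

    For example, resolveCommand('gim porsche 911', commands) yields the image search URL with the keywords filled in, and the server only has to redirect the browser to it.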

    When I built this YubNub clone, I had plans to introduce the feature of defining commands to get data from specific portions of a page. For example, you would be able to fetch the telephone number of a person using a command like: telephone . The way this works is by scraping the content off a page containing the telephone number at a specific section in the person's profile page.

    But wouldn't it be cool to provide the flexibility to the user to define what to fetch from a page on the Intranet? You can ask the user to define what content to fetch from a page when he creates the command.

    Look at the YubNub create command interface. The basic information asked in the page is:

    • Name of the command

    • URL

    • Description



    Now imagine having an extra text-field which asks you to enter the XPath to the content that you want to scrape from the resultant page.

    In simple words, you are saying: fetch this page, then get this specific portion of the page and give me only that content. You could perhaps pipe that content to some other command or play with that content in umpteen ways. I haven't followed YubNub of late, but I am sure there are many commands in YubNub which have similar functionality.

    Although this is possible in principle, there was one major issue I faced. The server had to do the page fetch and then the page scraping. Now although there are very good XML parsers out there, there is no good 'XML' parser for HTML. And XPath does not work unless the page is XML.

    Most pages on the Internet are HTML (or XHTML) and although it looks straightforward to transform them to XML, anyone who has tried it will see that this is not a simple solution. When you try to parse an XHTML page (even popular pages out there) you will run into issues like 'entity not defined' or 'matching element not found' etc. Although there are tools like Tidy or TagSoup, you are not guaranteed that the output of such tools is well-formed XML.

    On the other hand, browsers are extremely flexible in the way they handle HTML. Traversing through the HTML DOM is really simple and often you don't even realize that your browser has silently corrected tens of errors in the page. You can get to any specific portion of the page using HTML DOM functions or using libraries like jQuery.

    So what I was looking for, was some tool which had the flexibility of the browser's HTML handling, but at the same time was able to function on the server.

    As if by coincidence, I ran into this post from John Resig (the person popular for jQuery). John describes one of his projects on bringing the browser environment to Rhino. He also gives an example of how to scrape content from a web page and send the result to a file.

    Wow! This is exactly what I had been looking for. Since Rhino can be embedded in Java, all you would need to do is to make a call to the JS function to scrape content and then pass the content back to Java and continue with your processing.

    Although I don't work on the project anymore, I see a need for this functionality in many other places. For example, some time back, I was looking for a simple tool to fetch Tiddlers from TiddlyWiki and convert them into a simple HTML page. This would help in supporting those browsers which don't have Javascript enabled. I tried some of the tools out there, but most of them failed. So I planned to write my own. And lo, I came across this same issue. TiddlyWiki content is in HTML and this content is not easy to parse using XML parsers (which is perhaps why many of those tools failed). So how about using Rhino and John's project to scrape content from the wiki and send it to a file in a different format?

    Unfortunately for now, I am not able to parse the TiddlyWiki HTML content using Rhino and John's project. I get the following error (index.htm refers to the TiddlyWiki document):

    js> load("env.js");
    js> load("jquery.js");
    js> window.location = "index.htm";
    index.htm
    js> [Fatal Error] :1:325382: The element type "img" must be terminated by the matching end-tag "</img>".
    Exception in thread "Thread-0" org.mozilla.javascript.EcmaError: TypeError: Cannot call method "createEvent" of null (env.js#29)
            at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3350)
            at org.mozilla.javascript.ScriptRuntime.constructError(ScriptRuntime.java:3340)
            at org.mozilla.javascript.ScriptRuntime.typeError(ScriptRuntime.java:3356)
            at org.mozilla.javascript.ScriptRuntime.typeError2(ScriptRuntime.java:3375)
            at org.mozilla.javascript.ScriptRuntime.undefCallError(ScriptRuntime.java:3394)
            at org.mozilla.javascript.ScriptRuntime.getPropFunctionAndThis(ScriptRuntime.java:2026)
            at org.mozilla.javascript.Interpreter.interpretLoop(Interpreter.java:3081)
            at script(env.js:29)
            at script.makeRequest(env.js:650)
            at org.mozilla.javascript.Interpreter.interpret(Interpreter.java:2394)
            at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:162)
            at org.mozilla.javascript.ContextFactory.doTopCall(ContextFactory.java:393)
            at org.mozilla.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:2834)
            at org.mozilla.javascript.InterpretedFunction.call(InterpretedFunction.java:160)
            at org.mozilla.javascript.Context.call(Context.java:548)
            at org.mozilla.javascript.JavaAdapter.callMethod(JavaAdapter.java:507)
            at adapter1.run()
            at java.lang.Thread.run(Thread.java:595)
    


    The project looks very promising. I should follow it closely.

    Mar. 3rd, 2008

    Bulls and cows and the Javascript challenge

    About 2 years back, I had conducted an experiment with the Bulls and Cows game[1] [2]. I now wanted to see what the 'human average' for the game is. So I wanted to build a small Facebook application to add the social aspect to the game and conduct my experiments.

    But before I continued, I had to solve a major problem.

    If I continue to make it a Javascript game, as is hosted here, I need to ensure that the random number generated by the browser is secure and not manipulated or found out by the player using illegal ways.

    Anyone who knows a bit of Javascript and is used to looking at code using Firebug will soon be able to 'guess' the number in one step:

    Debugging using Firebug

    Yeah, that's right. I store the random number generated in a variable randomNo. And you can find out the value using Firebug. Now this is fine, as long as it is not a competition and you play the game because you actually like it and not because you are winning a million dollars. But what if this game was being played for money?

    So my next attempt was to think of storing an MD5 of the number and then matching it with the MD5 of the number entered by the player. This works well as long as the random number is generated on the server side and only the MD5 is sent to the client.

    Can the random number and its MD5 be generated on the client side without the user being able to 'debug' and get the random number?

    My first attempt towards this was the following piece of code:

    function getRandomNo(){
            var md5OfRandomNo = MD5(Math.floor(Math.random()*10001)+'');
    	return md5OfRandomNo;
    }
    


    But unfortunately:

    [Screenshot: md5-debug1]

    and you step into the function and:

    [Screenshot: md5-debug2]

    :(

    Right now, I am still not able to find a fool-proof way to generate the random number on the client side. Is there a solution?

    Ok, let's say the number is securely generated in some way (client or server) and we only store the MD5 value on the client. Now, there is a second problem:

    What if the player just changes the random number altogether?

    >>> randomNo
    "948f847055c6bf156997ce9fb59919be"
    >>> randomNo = MD5('7839')
    "ca91c5464e73d3066825362c3093a45f"
    


    We need to maintain a session and include some verification code to ensure that the MD5 was not manipulated.

    Is there a solution for this if we want to write the entire game using only Javascript? Are there any other issues other than the 2 described?
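
    One way around both problems is to give up on the pure-Javascript requirement and never send the number, or even its MD5, to the browser. With only about ten thousand candidates, an MD5 can be reversed by simply hashing every candidate. The sketch below assumes a server-side Javascript runtime that keeps the secret in a closure and only returns the bulls and cows counts for each guess. The names scoreGuess and createGame are made up for illustration, and the scoring follows the usual rules: a bull is a right digit in the right place, a cow is a right digit in the wrong place.

```javascript
// Compare a guess with the secret (both strings of digits).
function scoreGuess(secret, guess) {
  if (guess.length !== secret.length) {
    throw new Error('guess must have ' + secret.length + ' digits');
  }
  let bulls = 0;
  const secretLeft = {};   // unmatched secret digits -> how many remain
  const guessLeft = [];    // unmatched guess digits
  for (let i = 0; i < secret.length; i++) {
    if (secret[i] === guess[i]) {
      bulls++;
    } else {
      secretLeft[secret[i]] = (secretLeft[secret[i]] || 0) + 1;
      guessLeft.push(guess[i]);
    }
  }
  let cows = 0;
  for (const digit of guessLeft) {
    if (secretLeft[digit] > 0) {
      cows++;
      secretLeft[digit]--;
    }
  }
  return { bulls: bulls, cows: cows };
}

// One game per session: the secret is only reachable through guess().
function createGame(secret) {
  let attempts = 0;
  return {
    guess: function (value) {
      attempts++;
      const result = scoreGuess(secret, value);
      return {
        bulls: result.bulls,
        cows: result.cows,
        attempts: attempts,
        solved: result.bulls === secret.length
      };
    }
  };
}
```

    Since the attempt counter also lives on the server, the player cannot lower it from Firebug either. The browser then only renders what guess() returns.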

    Dec. 11th, 2007

    Prove that you are a human being

    Continuing with the observations I made here, I wonder if the following way of preventing someone from spamming a commenting system really helps:

    ____________ Spam protection: Sum of II plus III plus IV ?

    If I really want to spam this site, I can write a simple routine to scrape the string matching 'Sum of ___ ?' and then feed that into Google like this:

    Google search: II plus III plus IV

    And there you go! You are spammed!
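
    A spammer does not even need Google for this. As a rough sketch, assuming the prompt always has the form 'Sum of ... plus ... ?' with Roman numerals, a few lines are enough to compute the answer. The function names are made up for illustration.

```javascript
const ROMAN = { I: 1, V: 5, X: 10, L: 50, C: 100, D: 500, M: 1000 };

// Convert a Roman numeral to a number: a smaller symbol placed
// before a larger one is subtracted, everything else is added.
function romanToInt(numeral) {
  let total = 0;
  for (let i = 0; i < numeral.length; i++) {
    const value = ROMAN[numeral[i]];
    const next = ROMAN[numeral[i + 1]] || 0;
    total += value < next ? -value : value;
  }
  return total;
}

// Pull the numerals out of the prompt and add them up.
// Returns null when the prompt does not match the assumed format.
function solveChallenge(prompt) {
  const match = /Sum of (.+?)\s*\?/.exec(prompt);
  if (!match) return null;
  return match[1]
    .split(/\s+plus\s+/)
    .map(romanToInt)
    .reduce(function (a, b) { return a + b; }, 0);
}
```

    Fed the prompt shown above, solveChallenge adds up 2, 3 and 4. A check like this only stops spammers who have not looked at the page even once.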

    Dec. 8th, 2007

    Speed reading by hacking the column count in Firefox

    Recently, I came across a Greasemonkey script for Wikipedia. The script helps us to view Wikipedia articles in multiple columns.

    I found this to be useful and in fact saw that it improved my reading speed. In the last week, I have referred to a lot of Wikipedia articles, and I am really addicted to this multi-column hack.

    So now, when I am reading some article, if the article spans the entire width of the page, I open Firebug, 'Inspect' the element displaying the content under consideration and add:

    -moz-column-count: 3;
    -moz-column-gap: 50px;
    font-family: Calibri;
    font-size: 11px;

    to the element.

    And if I end up visiting this site frequently, then I can add a Greasemonkey script or a Userstyle for the page or set of pages.
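
    Such a script can be very small. Here is a rough sketch, not the Wikipedia script mentioned above: the function name and the #bodyContent selector are assumptions, and the unprefixed column properties are included next to the -moz- ones for browsers that support the standard names.

```javascript
// Build a CSS rule that applies the multi-column settings to a selector.
function buildColumnCss(selector, options) {
  const o = Object.assign(
    { count: 3, gap: '50px', fontFamily: 'Calibri', fontSize: '11px' },
    options
  );
  return [
    selector + ' {',
    '  -moz-column-count: ' + o.count + ';',
    '  -moz-column-gap: ' + o.gap + ';',
    '  column-count: ' + o.count + ';',
    '  column-gap: ' + o.gap + ';',
    '  font-family: ' + o.fontFamily + ';',
    '  font-size: ' + o.fontSize + ';',
    '}'
  ].join('\n');
}

// In the browser (for example inside a Greasemonkey script), add the rule to the page.
if (typeof document !== 'undefined' && document.head) {
  const style = document.createElement('style');
  style.textContent = buildColumnCss('#bodyContent');
  document.head.appendChild(style);
}
```

    Passing a different selector or options, such as { count: 2 }, adapts the same script to other sites.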

    Wikipedia in Firefox using a customized GM script


    The above screenshot shows a Wikipedia page as displayed in my browser.

    So why is this so useful?
    Some time back, when I was reading an article on usability, I learnt that reading speed depends on the width of the column. This is one of the reasons why you are able to read news articles faster in newspapers than online. You end up scanning the page vertically rather than making both horizontal and vertical eye movements. Rather than point to a single article, I would like to point you to the Google search for the study around this topic.

    Some of the popular pages where I have added this multi-column functionality are: Wikipedia, Developerworks and Javadocs.

    Nov. 13th, 2007

    Microblogging experiences

    I have been looking for the perfect micro-blogging service. In the last week I tried 3 services: Twitter, Tumblr and Pownce.

    Here is what I liked in each:

    Twitter

    The good
    • Simple and elegant. Gets the job done.
    • Has IM support.
    • Good Facebook integration.

    The bad
    • The 140 character limit is more limiting than SMS.
    • All twitters are public.
    • No option to comment on twitters. The @ replies are confusing.

    Tumblr

    The good
    • Really cool interface. Ability to post text, photos, links, audio and video in one place.
    • Import entries from other services like del.icio.us, flickr, twitter etc.
    • Re-blogging.
    • Easy sharing of all my entries with people in my social-network.
    • Ability to comment on entries.
    • Entries can be either for self or public.
    • Good archive view.

    The bad
    • No Facebook integration yet.
    • The timestamp on imported entries is wrong (possibly because of timezone differences).
    • Some of the entries are not imported - looks like a bug.

    Pownce

    The good
    • Simple interface.
    • Extensive privacy options for entries.
    • Looks like there is Facebook integration (not tried this yet).

    The bad
    • There is no ability to import entries from other services.

    All 3 services have API support.

    I am still on the lookout for a good microblogging service. I need a service that offers the simplicity of Twitter and the features of Tumblr. I will stick to Twitter until I see one service that meets all my requirements.
