Tuesday, February 14, 2006

 
Based on the feedback on our earlier proposal, we chose a new topic. The basic idea is to cluster the blogs on Bloglines. We have submitted a new proposal, which is given below.



Mining for Blog communities

Personal blogs give rise to interesting social networks. If we treat a personal blog as a vertex of a graph, two types of edges are connected to that vertex. Given a particular blog X:

  1. Subscriptions to the blog X
  2. Blog X owner's Subscriptions

Apart from these, there are subscriptions that point to popular news feeds, which could potentially reveal information about the user. In this project we try to analyze the connectivity of public blogs at www.bloglines.com to identify the communities and to observe the interests of the groups based on the public news feeds they refer to. The information will be mined from the blogs of individuals, starting from seeds harvested by querying the subscribers to popular news feeds. Starting from these seeds, the system will identify and build a blog connectivity graph: a directed graph in which the vertices represent the individual bloggers and the edges represent the subscriptions to blogs. In other words, if A has subscribed to B's blog, then (A, B) is a directed edge.
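To make the edge definition concrete, here is a minimal sketch of how such a directed graph could be held in memory. The function name `build_graph` and the sample bloggers are our own illustrative assumptions and not part of any Bloglines API.

```python
from collections import defaultdict

def build_graph(subscriptions):
    """Build a directed subscription graph.

    `subscriptions` is an iterable of (A, B) pairs meaning
    "A has subscribed to B's blog" (hypothetical input format).
    """
    out_edges = defaultdict(set)  # A -> blogs that A subscribes to
    in_edges = defaultdict(set)   # B -> bloggers subscribed to B
    for a, b in subscriptions:
        out_edges[a].add(b)
        in_edges[b].add(a)
    return out_edges, in_edges

# Made-up sample data, for illustration only.
pairs = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]
out_edges, in_edges = build_graph(pairs)
```

For a blog X, `in_edges[X]` corresponds to the subscriptions to the blog X and `out_edges[X]` to the subscriptions of X's owner, which are the two edge types listed above.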

The analysis is geared towards

Tasks
  1. Identifying seed blogs - This would be based on either harvesting the blogs by going to the subscribers of a popular news feed or by user input, depending on the use case.
  2. Crawling the blogs - Involves crawling the blogs based on initial seeds and building a frontier by analyzing already crawled blogs.
  3. Social network analysis using graph algorithms - Involves connectivity graph analysis and graph overlap analysis based on use case.
  4. Visualization and filtering of communities - Using a visualization package to visualize the output graph and to emphasize the relationships and clusters/communities identified.
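The crawling and connectivity analysis tasks could look roughly like the sketch below. It is only an illustration under stated assumptions: `fetch_subscriptions` is a hypothetical callback that returns the blogs a given blogger subscribes to (we do not assume any particular Bloglines interface), and weakly connected components are used here as a simple stand-in for whatever community detection method is finally chosen.

```python
from collections import deque

def crawl(seeds, fetch_subscriptions, max_blogs=1000):
    """Breadth-first crawl from the seed blogs.

    `fetch_subscriptions(blog)` is a hypothetical callback returning
    the blogs that `blog` subscribes to. Returns a list of (A, B) edges.
    """
    frontier = deque(seeds)   # blogs discovered but not yet crawled
    seen = set(seeds)         # blogs already placed on the frontier
    edges = []
    crawled = 0
    while frontier and crawled < max_blogs:
        blog = frontier.popleft()
        crawled += 1
        for target in fetch_subscriptions(blog):
            edges.append((blog, target))
            if target not in seen:
                seen.add(target)
                frontier.append(target)
    return edges

def weak_components(edges):
    """Group blogs into weakly connected components (edge direction ignored)."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    components = []
    unvisited = set(neighbors)
    while unvisited:
        start = unvisited.pop()
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt in unvisited:
                    unvisited.remove(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components

# Made-up toy data standing in for real subscription lists.
toy = {"a": ["b"], "b": ["a", "c"], "c": [], "x": ["y"], "y": []}
edges = crawl(["a", "x"], lambda blog: toy.get(blog, []))
groups = weak_components(edges)
```

Passing the fetch step in as a callback keeps the frontier logic independent of how subscriptions are actually retrieved, so the same loop works for seeds harvested from news feed subscribers or supplied by a user.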

Note: We have looked into the robots.txt file at www.bloglines.com, and we can extract the necessary information without violating it.
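A crawler could also make this check programmatically before each request. The sketch below uses Python's standard `urllib.robotparser` module. The robots.txt rules, the user agent name and the URL paths are made up for illustration and are not the actual contents of the Bloglines robots.txt file.

```python
from urllib.robotparser import RobotFileParser

def make_checker(robots_txt, user_agent):
    """Return a function that tells whether `user_agent` may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return lambda url: parser.can_fetch(user_agent, url)

# Hypothetical rules, NOT the real robots.txt of www.bloglines.com.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""

allowed = make_checker(SAMPLE_ROBOTS, "blog-community-crawler")
ok_public = allowed("http://www.bloglines.com/public/someuser")
ok_private = allowed("http://www.bloglines.com/private/settings")
```

In a real crawl the rules would be loaded from the site itself (for example with `RobotFileParser.set_url` followed by `read`) instead of from an inline string.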


