Google PageRank – An empirical analysis

PageRank is an algorithm used by Google to assign an importance score to each page on the World Wide Web. The order of search results depends on the page rank assigned to each page. A web site has multiple pages, and a page rank is assigned to each page in the web site. The page rank of a page depends on the number of links coming into the page and on the number of links going out of each linking page, because a page splits its rank among its outgoing links. The page rank of the page behind each incoming link also plays a role in determining the page rank of the page.

Page rank is used to decide the sequence of search results for a Google search. Companies want their page to be at the top of the search results. The easiest way to be at the top is to pay Google for AdWords. The sponsored web sites appear above and to the right-hand side of the regular search results.

Not all companies can afford to pay Google for AdWords. There is a cost-effective alternative, which requires an understanding of how Google works. Knowledge of Google’s internal workings helps to define analytical (or mathematical) strategies that can be implemented in the web site to improve its page rank and eventually bring more visitors to the web site.

To validate the theory with empirical analysis, a plan with the following steps was drafted.

  1. Find a key word that is not in Google’s index
  2. Create a graph (random graph) and find the initial transition probabilities
  3. Evaluate the steady state of the transition probability matrix using the power method, which gives the page rank of the graph (I will post the technical/mathematical details in a PDF. It is time-consuming to write matrices and other math notation in the WordPress editor)
  4. Develop a set of web sites (pages) adhering to the random graph, with each web site containing the new key word
  5. Allow the Google crawler to include the new sites in its index for the new key word
  6. Search for the new key word and observe the order of the search results
  7. Report the results
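Steps 2 and 3 of the plan can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the exact calculation behind this experiment: the damping factor of 0.85, the handling of pages without outgoing links, and the function names random_graph, transition_matrix and power_method are all my own illustrative choices. In the small example graph, only the links of node #1 (index 0) to page 2 and page 4 come from this post; the other links are made up.

```python
import random


def random_graph(n, p, seed=0):
    """Directed random graph: each link i -> j (i != j) exists with probability p."""
    rng = random.Random(seed)
    return {i: [j for j in range(n) if j != i and rng.random() < p]
            for i in range(n)}


def transition_matrix(links, damping=0.85):
    """Row-stochastic transition matrix for nodes 0..n-1.

    Assumption: damping=0.85, and a node with no outgoing links is
    treated as if it linked to every node with equal probability.
    """
    n = len(links)
    matrix = []
    for i in range(n):
        out = links[i]
        row = []
        for j in range(n):
            if out:
                follow = 1.0 / len(out) if j in out else 0.0
            else:
                follow = 1.0 / n
            row.append(damping * follow + (1.0 - damping) / n)
        matrix.append(row)
    return matrix


def power_method(matrix, tol=1e-10, max_iter=1000):
    """Multiply the rank vector by the matrix until it stops changing."""
    n = len(matrix)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        new_rank = [sum(rank[i] * matrix[i][j] for i in range(n))
                    for j in range(n)]
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical 4-page graph (0-indexed): index 0 is node #1 and links
# to page 2 and page 4; the remaining links are invented for the example.
links = {0: [1, 3], 1: [2], 2: [0], 3: [2]}
ranks = power_method(transition_matrix(links))
for page, score in enumerate(ranks, start=1):
    print(f"page {page}: {score:.4f}")
```

In this made-up graph, page 3 receives links from two pages and ends up with the highest score, which is the kind of ordering the experiment tries to observe in the real search results.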

1. A new key word was selected and is given below. There is no Google search result for the key word.

2. The random graph is a representation of how web sites are linked to each other. The graph will be implemented with various blog posts, and each blog post will contain the key word, adhering to the graph. Each node of the graph denotes a blog post. The edges connecting the nodes are the hyperlinks between the posts.

Note: This page is used for the Google PageRank empirical analysis. The links are created based on the random graph. This is node #1, which has the key word: xysivabodzinyx , xysivabodzinxy . As per the graph, it links out to page 2 and page 4.

Google Workshop

Fifteen years ago, finding useful information in the public domain was a challenge, whereas today we have the opposite problem. There is a lot of information available on any topic.

Social networking sites enable the right information to come to us instead of us seeking it. Still, filtering the information is a key challenge. We often go to a search engine like Google for information, and Google's filtering techniques help us receive the most relevant information quickly.

Recently, I wrote a job aid/reference on “How to google effectively”, and a few beneficiaries of that document personally stopped by my desk at work and thanked me for it. The job aid was significantly modified by other team members in the organization and published to the entire organization.

We decided that a live demonstration of the Google search engine using the job aid would give more clarity on usage. A workshop was scheduled and opened up to the entire organization with a limit of 40 users, and it filled up almost immediately. I conducted the workshop for the signed-up audience from the business and IT areas on how to google effectively using Google keywords (search operators) like filetype, site, define, “wiTHin qUoTeS”, etc.

I took a simple scenario (#1): search for presentation material in educational institutes containing our company name. I ran the scenario as a novice search engine user and showed that the total number of documents returned by Google was around 1.2 million. I then used various Google keywords and demonstrated step by step how the results can be narrowed down, eventually reaching the 9 most relevant documents. In scenario (#2), I demonstrated how to search effectively to understand the basic definition of crowd sourcing; scenario (#3) was to find the patterns registered in core business. I had a few more scenarios and explained, at a high level, how Google works using its PageRank algorithm, and the knowledge level our internet marketing group needs in order to increase the page rank of our web sites and eventually increase the number of potential customer visits to them.
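The narrowing-down in scenarios #1 and #2 can be written out as query strings. The snippet below is a small sketch, not the workshop material itself: the helper names build_query and search_url are my own, and “Example Corp” is a placeholder since the real company name is not given here. The filetype:, site: and define: operators and the quoted phrase are the ones covered in the workshop.

```python
from urllib.parse import urlencode


def build_query(phrase=None, site=None, filetype=None, define=None):
    """Combine a quoted phrase and search operators into one query string."""
    parts = []
    if define:
        parts.append(f"define:{define}")
    if phrase:
        parts.append(f'"{phrase}"')       # exact phrase, within quotes
    if site:
        parts.append(f"site:{site}")      # restrict to a site or domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict to a file type
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Scenario #1: presentations at educational institutes that mention
# the company ("Example Corp" is a placeholder name).
scenario_1 = build_query(phrase="Example Corp", site="edu", filetype="ppt")
print(scenario_1)
print(search_url(scenario_1))

# Scenario #2: basic definition of a term.
print(build_query(define="crowdsourcing"))
```

Each added operator removes a class of irrelevant results, which is how a result count in the millions can shrink to a handful.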

A scenario (#n) was given by an audience member: how to search for satellite images of used car dealerships in America. The scenario was so broad that the criteria required refinement before we jumped into the ocean of information. The criteria were narrowed down to just Michigan. The results were returned as map data (files with the kml extension), and these files can be read by free software like Google Earth. The kml files contained the locations of all used car dealerships in Michigan with the satellite images. Since I didn’t have Google Earth installed on my laptop, I could not demonstrate the results to the audience. The purpose of this write-up is to display the results of the kml file and raise the question of whether Google should consider reading kml files directly in Google Maps.
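A kml file is plain XML, so its contents can be listed even without Google Earth. The following is a rough sketch using only the Python standard library. It assumes the file uses the standard KML 2.2 namespace and stores each location as a Placemark with a Point; the sample data, the dealer names and the function name read_placemarks are invented for illustration and are not the files from the workshop.

```python
import xml.etree.ElementTree as ET

# Assumption: the file uses the standard KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Invented sample data standing in for a real search result.
SAMPLE_KML = """
<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Sample Dealer A</name>
      <Point><coordinates>-84.5555,42.7325,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Sample Dealer B</name>
      <Point><coordinates>-83.0458,42.3314,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>
"""


def read_placemarks(kml_text):
    """Return (name, latitude, longitude) for every Placemark with a Point."""
    root = ET.fromstring(kml_text)
    placemarks = []
    for pm in root.iterfind(".//kml:Placemark", KML_NS):
        name = pm.findtext("kml:name", default="", namespaces=KML_NS)
        coords = pm.findtext(".//kml:Point/kml:coordinates",
                             default="", namespaces=KML_NS)
        if not coords.strip():
            continue  # skip placemarks without point coordinates
        # KML stores coordinates as longitude,latitude[,altitude]
        fields = coords.strip().split(",")
        lon, lat = float(fields[0]), float(fields[1])
        placemarks.append((name.strip(), lat, lon))
    return placemarks


for name, lat, lon in read_placemarks(SAMPLE_KML):
    print(f"{name}: latitude {lat}, longitude {lon}")
```

Note that kml lists longitude before latitude, which is the opposite of how coordinates are usually typed into a map search box.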


iPod of the car industry

A few years ago, when I saw the iPod for the first time, like many, I was stunned by its design, simplicity and quality. I have had at least 5-6 different cell phones in my life, and each cell phone manual was around 150 pages, whereas the manual for the iPod I got a few years ago was 2 pages. When I brought this up with my close friends during my Sunday chat sessions, some of them argued that iPod and cell phone functionalities are different, hence the difference in manual size. Those friends were speechless when Apple came up with the iPhone.

I was wondering why Sony did not come up with something like the iPod. They dominated this market for so long, so why were they not the first to come up with something similar?

After some study, as I understand it, most Japanese companies use the Japanese management style in all strategic and operational management. The key approach of this style, as I understand it, is consensus building. If there are 5 members in a team, all of them HAVE to agree on the direction, approach and next steps before an action is taken. It makes the fundamental assumption that all 5 members are subject matter experts and have some idea of how the future will unfold, at least approximately.

Obviously, this management style is time-consuming and limits agility and innovation. Statistically, this style is proven to produce better quality products. On the other hand, a quick-to-market management style is proven to be more innovative but to lack quality.

In Walter Chrysler's biography, Chrysler stated that one of the main reasons for his success and innovation was this: make a quick decision, observe the results and adapt, instead of taking a long time to make a decision and then realizing it was not the right one. Historically, the Chrysler company has proven to produce the most innovative car products in the car industry. Walter Chrysler's management style is at the opposite end of the spectrum from the Japanese management style.

It appears, based on the recent J.D. Power survey and Consumer Reports, that the quality of American cars has been improving a lot but still has a long way to go. They are on the right track. In particular, Ford has been producing high quality products with the best fuel mileage in the last year. If, like Ford, the other American car companies figure out a way to drastically improve the quality of their products AND keep the innovation that has been in their roots, they are going to produce the iPod of the car industry.

I wonder whether the Japanese car companies are making any adjustments to their management style to be more innovative and achieve what Sony failed to do.