Jerry Wayne Odom Jr.

My Search Engine



I've read a lot about, and worked with, many of the major search engines and directories, including Google, Inktomi, AltaVista, Yahoo and AlltheWeb, just to name a few. That interest has led me to write my own web crawler and search engine. I'll be writing the engine in the very versatile Perl programming language and posting the results on this page so I can gloat over my not-so-brilliance. I am using this page as a framework for the project.

Parts of my Search Engine

The Web Crawler
  • The web crawler will go out and collect information such as the URL, page content, IP address, and whatever else becomes necessary. I will be using DMOZ as a starting point because they seem to be the friendliest towards spider agents.
  • I'm considering sending this spider out cloaked. Using either LWP or Socket in Perl, I can make its requests look like ordinary browser requests, which will help me avoid people who dislike spidering (a sketch of one such fetch follows this list). The spider will be aggressive in that it won't look at or obey robots.txt.
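Here's a minimal sketch of what a single cloaked fetch might look like, using LWP::UserAgent from CPAN plus HTML::LinkExtor to pull out links. The browser-style User-Agent string and the DMOZ seed URL are placeholders, not a final design:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI;
    use Socket;

    my $url = shift @ARGV || 'http://dmoz.org/';   # DMOZ as the seed

    # Pose as an ordinary browser rather than announcing a robot.
    my $ua = LWP::UserAgent->new;
    $ua->agent('Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)');
    $ua->timeout(10);

    my $response = $ua->get($url);
    die 'Fetch failed: ' . $response->status_line . "\n"
        unless $response->is_success;
    my $content = $response->content;

    # Record the server's IP address along with the page.
    my $host   = URI->new($url)->host;
    my $packed = gethostbyname($host);
    my $ip     = $packed ? inet_ntoa($packed) : 'unknown';

    # Pull every <a href> out of the page so the crawl can continue.
    my @links;
    HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        push @links, URI->new_abs($attr{href}, $url)->as_string
            if $tag eq 'a' && defined $attr{href};
    })->parse($content);

    print "Fetched $url ($ip): ", length($content), " bytes, ",
          scalar(@links), " links\n";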
The Index
  • My index will start at 10,000 sites and, depending on scalability and storage requirements, will grow to a number I haven't yet pinned down. I intend to index as many different domains as possible, since some individual domains contain thousands of pages themselves (see the storage sketch below).
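As a sketch of what the index itself might look like, here's a tiny inverted index (word to list of page IDs) tied to an on-disk Berkeley DB file via the standard DB_File module, since 10,000+ sites won't fit comfortably in memory. DB_File is my assumption here, not a settled choice, and index.db and add_page are made-up names:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DB_File;
    use Fcntl;

    # Inverted index on disk: each word maps to a comma-separated
    # list of page IDs. index.db is just a placeholder file name.
    my %index;
    tie %index, 'DB_File', 'index.db', O_CREAT | O_RDWR, 0644, $DB_HASH
        or die "Can't open index.db: $!";

    sub add_page {
        my ($page_id, $content) = @_;
        my %seen;
        for my $word (grep { length $_ > 2 } split /\W+/, lc $content) {
            next if $seen{$word}++;              # one posting per page
            $index{$word} = defined $index{$word}
                ? "$index{$word},$page_id"
                : $page_id;
        }
    }

    add_page(1, 'Perl makes writing a web crawler straightforward');
    add_page(2, 'A search engine index maps words to pages');

    print "Pages containing 'index': $index{index}\n";  # prints "2"
    untie %index;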
The SERPs Generator
  • Results will be generated by applying some algorithm to the index. My current thinking is that only page content will be used to evaluate pages at first, with a possible random-walk link-popularity system, plus some sort of grouping into categories, added later (a ranking sketch follows).
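To make the content-only first pass concrete, here's one way the ranking might work: score each page by how often the query terms occur, normalized by page length. This plain term-frequency scheme is my own placeholder, not a settled design, and the sample pages are made up:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical sample of what the crawler would have stored.
    my %pages = (
        'http://example.com/a' => 'perl search engine written in perl',
        'http://example.com/b' => 'a directory of search engines',
    );

    # Score pages by query-term frequency, scaled by page length,
    # and return URLs from best match to worst.
    sub rank {
        my ($query, $pages) = @_;
        my @terms = split /\s+/, lc $query;
        my %score;
        while (my ($url, $text) = each %$pages) {
            my @words = split /\W+/, lc $text;
            my %tf;
            $tf{$_}++ for @words;
            $score{$url} += ($tf{$_} || 0) / @words for @terms;
        }
        return sort { $score{$b} <=> $score{$a} }
               grep { $score{$_} > 0 } keys %score;
    }

    print "$_\n" for rank('perl search', \%pages);

The later random-walk link-popularity factor would presumably be computed separately, by something like a power iteration over the link graph, and folded into this score, but that stays on the drawing board for now.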