Archive for February, 2008

The Mass-Accumulation of *.EDU Backlinks

Okay, everyone seems obsessed with *.edu backlinks, and I’m here to feed that obsession. Essentially, this is a tutorial on how to find .edu/.gov backlinks (nothing too fancy). Be forewarned: the last few suggestions are less serious.

  • Competitive Backlink Checking - Look at a domain that competes directly with your site, then run a “backlink” query like the following on Yahoo.
  1. To rank this site highly, I might check a site that ranks well for the term “SEO.” SEOinc.com ranks highly on Yahoo, so let’s run this query: linkdomain:www.seoinc.com +site:.edu -site:.com -site:www.seoinc.com
  2. If I were in direct competition with Jim Boykin, I’d run this at Yahoo: linkdomain:www.webuildpages.com +site:.edu -site:.com -site:www.webuildpages.com
  3. Running this on any site that appears to have artificially built backlinks is a good start.
  • Add Links to User-Contributed Pages - Find sites that use software to generate pages, e.g., guestbooks, free-for-all (FFA) links pages, forums, and blogs. Check out a demo of that script and scroll to the bottom for a copyright notice or a link back to the original site. Then copy that text and run a query like one of the ones below:
  1. Free For All Links page: the demo has this text at the bottom: “Script Created by Matt Wright and can be found at Matt’s Script Archive.” Now search: site:.edu “Script Created by Matt Wright and can be found at Matt’s Script Archive”
  2. Remember that adding your site to related pages gives the best results. This is not spamming as long as you add your site only to relevant pages. To do this, search site:.edu “name of script” “desired query for optimization”. This will dramatically reduce the number of pages you can submit to, but you’ll get much higher-quality results.
  • Provide a Business Service - I’ve personally added some of my other sites to online business directories.
  1. As this post shows, you can actually offer internships and receive a decent backlink in return. You’ll certainly need to be innovative with this technique, but it can work quite nicely.
  • Go to College - I doubt all of you will take this advice. ;)
  1. Harvard might be a good start, though (follow the link and you’ll see why).
  • Buy ‘em
  1. Buying backlinks is no longer considered “white hat SEO,” so maybe I should say “donate.”
  • Other Techniques - Acquiring links without pushing for them.
  1. Create something reference-able (yeah, I know it’s not a real word).
  2. Create a math application or a tool related to programming or any other subject taught at schools. Essentially, any educational tool works well. (I’ve done this one fairly well for one of my sites.)
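The competitive backlink check at the top of the list boils down to one query template. Here is a minimal sketch that only generates the query text; `build_edu_backlink_query` is a hypothetical helper name, and the `linkdomain:`/`site:` syntax is simply Yahoo’s as quoted in the post (whether any engine still honours `linkdomain:` today is an assumption this sketch does not check).

```python
def build_edu_backlink_query(competitor: str, tld: str = ".edu") -> str:
    """Build the Yahoo-style backlink query used in the post.

    competitor: the competitor's host, e.g. "www.seoinc.com".
    tld: restrict linking pages to this TLD (".edu" or ".gov").
    """
    competitor = competitor.strip().lower()
    return " ".join([
        f"linkdomain:{competitor}",  # pages that link to the competitor...
        f"+site:{tld}",              # ...and live on the chosen TLD
        "-site:.com",                # exclude .com pages, as in the post
        f"-site:{competitor}",       # exclude the competitor's own pages
    ])


if __name__ == "__main__":
    for domain in ["www.seoinc.com", "www.webuildpages.com"]:
        print(build_edu_backlink_query(domain))
```

Running it prints the same two queries as the list items above; pass `".gov"` as the second argument for the .gov variant.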

Is the “.edu” really that necessary?

It isn’t inherently better than any other TLD (or maybe I should say it isn’t supposed to be; Matt Cutts has confirmed this). Besides the reason Matt Cutts gave, domain age is also a factor.

Splogs (or Snews) Using Web-Stemming

Most readers are familiar with the current trend of splogs stuffed with three AdSense units above the fold.

These pages use the black-hat SEO technique of scraping RSS feeds from multiple blogs. The scraped pages then show up in Google (if the SEO-er is innovative enough).

If you use partial feeds, you’re safe, right?

Here’s Webstemmer, a Python tool that takes it all to a new level.

“Snews” — scraping the news sites

Here’s their claimed accuracy:

New York Times 488.8/552.2 (88%)
Newsday 373.7/454.7 (82%)
Washington Post 342.6/367.3 (93%)
Boston Globe 332.9/354.9 (93%)
ABC News 299.7/344.4 (87%)
BBC 283.3/337.4 (84%)
Los Angeles Times 263.2/345.5 (76%)
Reuters 188.2/206.9 (91%)
CBS News 171.8/190.1 (90%)
Seattle Times 164.4/185.4 (89%)
NY Daily News 144.3/147.4 (98%)
International Herald Tribune 125.5/126.5 (99%)
Channel News Asia 119.5/126.2 (94%)
CNN 65.3/73.9 (89%)
Voice of America 58.3/62.6 (94%)
Independent 58.1/58.5 (99%)
Financial Times 55.7/56.6 (98%)
USA Today 44.5/46.7 (96%)
NY1 35.7/37.1 (95%)
1010 Wins 14.3/16.1 (88%)
Total 3829.1/4349.2 (88%)
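The “Total” line is a pooled ratio (all matched over all pages), not an average of the per-site percentages, and the two can differ noticeably because large sites like the New York Times weigh more. A quick check using the figures exactly as printed above (the function names are illustrative). One thing the arithmetic reveals: the rows as listed add up to 3629.8/4090.4 rather than the printed 3829.1/4349.2, so the table as reproduced here may be missing a row. The pooled ratio over the listed rows comes out around 88.7%, while the plain mean of the per-site ratios is around 91%.

```python
# Figures copied from the accuracy table above: (site, matched, total).
ROWS = [
    ("New York Times", 488.8, 552.2),
    ("Newsday", 373.7, 454.7),
    ("Washington Post", 342.6, 367.3),
    ("Boston Globe", 332.9, 354.9),
    ("ABC News", 299.7, 344.4),
    ("BBC", 283.3, 337.4),
    ("Los Angeles Times", 263.2, 345.5),
    ("Reuters", 188.2, 206.9),
    ("CBS News", 171.8, 190.1),
    ("Seattle Times", 164.4, 185.4),
    ("NY Daily News", 144.3, 147.4),
    ("International Herald Tribune", 125.5, 126.5),
    ("Channel News Asia", 119.5, 126.2),
    ("CNN", 65.3, 73.9),
    ("Voice of America", 58.3, 62.6),
    ("Independent", 58.1, 58.5),
    ("Financial Times", 55.7, 56.6),
    ("USA Today", 44.5, 46.7),
    ("NY1", 35.7, 37.1),
    ("1010 Wins", 14.3, 16.1),
]
PRINTED_TOTAL = (3829.1, 4349.2)


def pooled_accuracy(rows):
    """Sum of matched over sum of totals: bigger sites count for more."""
    matched = sum(m for _, m, _ in rows)
    total = sum(t for _, _, t in rows)
    return matched / total


def mean_accuracy(rows):
    """Plain average of per-site ratios: every site counts equally."""
    return sum(m / t for _, m, t in rows) / len(rows)


if __name__ == "__main__":
    matched = sum(m for _, m, _ in ROWS)
    total = sum(t for _, _, t in ROWS)
    print(f"listed rows sum to {matched:.1f}/{total:.1f}")
    print(f"printed total ratio: {PRINTED_TOTAL[0] / PRINTED_TOTAL[1]:.1%}")
    print(f"pooled over listed rows: {pooled_accuracy(ROWS):.1%}")
    print(f"unweighted mean of rows: {mean_accuracy(ROWS):.1%}")
```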

It’s fairly accurate: 88% overall on professional news sources. If you read a lot of news online, you know how chopped up the text on those pages is, with articles broken up by random ads. So how hard would it be to scrape WordPress blogs that EACH share the SAME EXACT template structure? Not very.
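Why do identical templates make extraction easy? Text that belongs to the template (navigation, sidebars, footers) repeats across pages, while article text does not. The sketch below illustrates only that intuition with a simple frequency count; it is not Webstemmer’s actual algorithm, it assumes each page has already been split into text blocks, and `strip_shared_blocks` and `min_share` are hypothetical names.

```python
from collections import Counter


def strip_shared_blocks(pages: list[list[str]], min_share: float = 0.5) -> list[list[str]]:
    """Drop text blocks that recur across many pages built on one template.

    pages: each page as a list of text blocks (e.g. one block per line).
    min_share: a block found on at least this fraction of pages is treated
    as template text and removed from every page.
    """
    if len(pages) < 2:
        # With a single page there is nothing to compare against.
        return [list(page) for page in pages]
    # Count each block once per page so repeats within one page don't inflate it.
    counts = Counter(block for page in pages for block in set(page))
    threshold = min_share * len(pages)
    return [[block for block in page if counts[block] < threshold] for page in pages]


if __name__ == "__main__":
    pages = [
        ["Home | About | Archives", "Post one body", "Powered by WordPress"],
        ["Home | About | Archives", "Post two body", "Powered by WordPress"],
        ["Home | About | Archives", "Post three body", "Powered by WordPress"],
    ]
    for page in strip_shared_blocks(pages):
        print(page)
```

With three pages sharing a header and footer, only the per-post body survives on each page. The more pages share a layout, the cleaner this separation gets, which is the point the paragraph above makes about WordPress themes.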

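Below is a sample of how Webstemmer breaks page text into sections (the right-hand notes in parentheses are annotations, not part of the output):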

$ cat cnn.txt
!UNMATCHED: 200511210103/www.cnn.com/                                             (unmatched page)
!UNMATCHED: 200511210103/www.cnn.com/privacy.html                                 (unmatched page)
!UNMATCHED: 200511210103/www.cnn.com/interactive_legal.html                       (unmatched page)
…
!MATCHED: 200603010455/www.cnn.com/2006/HEALTH/02/09/billy.interview/index.html   (matched page)
PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)
SUB-0: CNN.com - Too busy to cook? Not so fast - Feb 9, 2006                      (supplementary section)
TITLE: Too busy to cook? Not so fast                                              (article title)
SUB-10: Leading chef shares his secrets for speedy, healthy cooking               (supplementary section)
SUB-17: Corporate Governance                                                      (supplementary section)
SUB-17: Lifestyle (House and Home)
SUB-17: New You Resolution
SUB-17: Billy Strynkowski
MAIN-20: (CNN) — A busy life can put the squeeze on healthy eating. But that     (main text)
         doesn’t have to be the case, according to Billy Strynkowski, executive
         chef of Cooking Light magazine. He says cooking healthy, tasty meals
         at home can be done in 20 minutes or less.
MAIN-20: CNN’s Jason White interviewed Chef Billy to learn his secrets for
         healthy cooking on the run.
…
SUB-25: Health care difficulties in the Big Easy                                  (supplementary section)
!MATCHED: 200603010455/www.cnn.com/2006/EDUCATION/02/28/teaching.evolution.ap/index.html  (another matched page)
PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)
SUB-0: CNN.com - Evolution debate continues - Feb 28, 2006                        (supplementary section)
TITLE: Evolution debate continues                                                 (article title)
SUB-17: Schools                                                                   (supplementary section)
SUB-17: Education
MAIN-20: SALT LAKE CITY (AP) — House lawmakers scuttled a bill that would have   (main text)
         required public school students to be told that evolution is not
         empirically proven — the latest setback for critics of evolution.