Archive for the 'Blog' Category

Splogs (or Snews) Using Web-Stemming

Most people are fairly familiar with the current trend of splogs (spam blogs) stuffed with three AdSense units above the fold of the page.

These pages use the blackhat SEO technique of scraping RSS feeds from multiple blogs. The scraped pages then show up in Google (if the SEO-er is innovative enough).

If you use partial feeds, you’re safe, right?

Here’s a Python Webstemmer that takes it all to a new level.

“Snews” — scraping the news sites

Here’s their claimed accuracy:

New York Times 488.8/552.2 (88%)
Newsday 373.7/454.7 (82%)
Washington Post 342.6/367.3 (93%)
Boston Globe 332.9/354.9 (93%)
ABC News 299.7/344.4 (87%)
BBC 283.3/337.4 (84%)
Los Angeles Times 263.2/345.5 (76%)
Reuters 188.2/206.9 (91%)
CBS News 171.8/190.1 (90%)
Seattle Times 164.4/185.4 (89%)
NY Daily News 144.3/147.4 (98%)
International Herald Tribune 125.5/126.5 (99%)
Channel News Asia 119.5/126.2 (94%)
CNN 65.3/73.9 (89%)
Voice of America 58.3/62.6 (94%)
Independent 58.1/58.5 (99%)
Financial Times 55.7/56.6 (98%)
USA Today 44.5/46.7 (96%)
NY1 35.7/37.1 (95%)
1010 Wins 14.3/16.1 (88%)
Total 3829.1/4349.2 (88%)

It’s fairly accurate, with an 88% average while scraping professional news sources. If you read a lot of news online, you’d be fairly familiar with how much separation of text there is, meaning news items broken up with random ads. Now, how hard would it be to scrape WordPress blogs that EACH have the SAME EXACT template structure? Not very.

Below is how text is broken up:

$ cat cnn.txt

!UNMATCHED: 200511210103/www.cnn.com/                                             (unmatched page)

!UNMATCHED: 200511210103/www.cnn.com/privacy.html                                 (unmatched page)

!UNMATCHED: 200511210103/www.cnn.com/interactive_legal.html                       (unmatched page)

…

!MATCHED: 200603010455/www.cnn.com/2006/HEALTH/02/09/billy.interview/index.html   (matched page)

PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)

SUB-0: CNN.com - Too busy to cook? Not so fast - Feb 9, 2006                      (supplementary section)

TITLE: Too busy to cook? Not so fast                                              (article title)

SUB-10: Leading chef shares his secrets for speedy, healthy cooking               (supplementary section)

SUB-17: Corporate Governance                                                      (supplementary section)

SUB-17: Lifestyle (House and Home)

SUB-17: New You Resolution

SUB-17: Billy Strynkowski

MAIN-20: (CNN) — A busy life can put the squeeze on healthy eating. But that     (main text)

         doesn’t have to be the case, according to Billy Strynkowski, executive

         chef of Cooking Light magazine. He says cooking healthy, tasty meals

         at home can be done in 20 minutes or less.

MAIN-20: CNN’s Jason White interviewed Chef Billy to learn his secrets for

         healthy cooking on the run.

…

SUB-25: Health care difficulties in the Big Easy                                  (supplementary section)

!MATCHED: 200603010455/www.cnn.com/2006/EDUCATION/02/28/teaching.evolution.ap/index.html  (another matched page)

PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)

SUB-0: CNN.com - Evolution debate continues - Feb 28, 2006                        (supplementary section)

TITLE: Evolution debate continues                                                 (article title)

SUB-17: Schools                                                                   (supplementary section)

SUB-17: Education

MAIN-20: SALT LAKE CITY (AP) — House lawmakers scuttled a bill that would have   (main text)

         required public school students to be told that evolution is not

         empirically proven — the latest setback for critics of evolution.
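
If you wanted to post-process output like the sample above, here is a minimal Python sketch. It assumes only the layout visible in this excerpt: a !MATCHED: line starts an article, TITLE: holds the headline, MAIN-n: lines start body paragraphs, and indented lines continue the previous paragraph. The function name parse_articles is made up for illustration and is not part of Webstemmer, and the sketch expects the raw output without the parenthesized annotations shown above.

```python
import re

# Matches "LABEL: text" lines such as "TITLE: ..." or "MAIN-20: ...".
LABEL_RE = re.compile(r"^(!?[A-Z]+(?:-\d+)?):\s*(.*)$")


def parse_articles(text):
    """Group Webstemmer-style output into one dict per matched page."""
    articles = []
    current = None      # article being filled in, or None
    in_main = False     # True while the last labelled line was MAIN-n
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        match = LABEL_RE.match(line)
        if match:
            label, value = match.groups()
            in_main = False
            if label == "!MATCHED":
                current = {"page": value, "title": None, "main": []}
                articles.append(current)
            elif label == "!UNMATCHED":
                current = None      # no extracted text for this page
            elif current is not None and label == "TITLE":
                current["title"] = value
            elif current is not None and label.startswith("MAIN"):
                current["main"].append(value)
                in_main = True
        elif raw[0].isspace() and current is not None and in_main:
            # Indented continuation of the previous MAIN paragraph.
            current["main"][-1] += " " + line.strip()
    return articles
```

Each returned dict carries the page path, the title, and a list of main-text paragraphs; the SUB and PATTERN lines are simply skipped in this sketch.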


Buy All the .edu Links You Want

This is kind of a disgusting thing for a .edu site to do, but they’re doing it. From http://blogs.adison.edu/ (Google PageRank = 4) you can “donate” $35 a month to get a WordPress blog hosted at their site. Here’s a quote from them:

Blogs are a $35 donation per month or $320 if you pay for your donation once a year ($100 Savings). This helps cover critical bandwidth, updates and support.

Wow, if I was a web host and made people pay $35 a month just to have a simple WordPress blog hosted, I’d never get a sale (BTW, you can get a free blog hosted at wordpress.com). Also, that “$100 Savings” sounds a little bit like a sales pitch to me. ;)

They also have a resources directory (PageRank = 5) where you can buy a link for $40 per year (better than the prices on text-link-ads):

To get into the resource directory a donation of $40 per year per link is required. This donation goes to reviewing your site and to making the site better for students.

Ohhhh, do the students get paid, too?

Bottom line, if .edu sites can get penalized, this one should be.

PS: I put rel=”nofollow” on the links to the .edu site.

Linkbait Tactics That Actually Work

I’m horrible at linkbaiting, and I’ve certainly read plenty of articles about linkbait, but which ones actually work? These writers try to write stuff that makes you feel warm and fuzzy inside, but do they actually tell you linkbait tactics that work? That’s what I’m about to find out.

Linkbaitable articles

The following are articles that actually acquired links through their linkbait tactics. Now, don’t just read them; analyze how they used those tactics to actually acquire links.

Linkbait is the New Reciprocal Links Page (seobook.com) 2,330 Yahoo! Links

Andy Hagans’ Ultimate Guide to Linkbaiting and SMM (tropicalseo.com) 2,090

SEO Advice: linkbait and linkbaiting (mattcutts.com) 1,680

The Art of Linkbaiting (performancing.com) 1,410

2007 Guide To Linkbaiting: The Year Of Widgetbait? (searchengineland.com) 1,120

Methods of website linking (wikipedia.org, linked to with “nofollow”) 506

An Introduction to Linkbaiting (problogger.net) 350

Rand Fishkin is Brilliant or Linkbait’s Characteristics (pandia.com) 348

Linkbait Articles (cornwallseo.com) 534

Linkbait, Linkbait, Linkbait (jimboykin.com) 281

The History of Link Bait (copyblogger.com) 203

What is Linkbait? (problogger.net) 154

The Resource Linkbait - Using Lists to Build Authority, Traffic and Links to Your Website (doshdosh.com) 69

The Two Kinds of Linkbait (seomoz.com) 62

How To: Linkbait Your Blog (brandon-hopkins.com) 50

Linkbait (golod.com) 10

Analysis

Can little guys make linkbait too?

As you may have noticed, the sites that acquired the most links are fairly well-known websites/blogs. This shows that even if you wrote perfect linkbait, you probably wouldn’t reap the benefits overnight.

Linkbait tactics I know that work

Since this blog doesn’t have tons of credibility (yet), it is very difficult for me to make use of linkbait tactics effectively. So, here are some of my ideas that work effectively:

  • Design a free template (usually something simple and easily customizable works best) - then submit it to oswd.org and a few other freebie sites like it.
  • Develop a script for websites (related to your website; e.g., Ajax, PHP, etc.) - then submit it to hotscripts.com and a few other script directories.

Keeping the links on the script or template can be difficult, but if you include and enforce licensing info, more people should keep your link. Also, a hidden link can be quite effective: display the link only when part of the site is hovered over (slightly confusing, but it can work very well).

The Insanely Long “Places to Ping List” for Blogs

The Looooooooooonnnnngg ping list is in the last section of this post (if you don’t care to read the explanation or view the short ‘n’ simple list of pinging places).

Pinging is used to get exposure for your blog. A lot of the time you can also get backlinks to your site through some of these small-time pinging services (nice for SEO). Basically, if you have a feed at your site, you should be able to ping (all WordPress and Blogger users have one). In WordPress, you can edit your ping list by navigating to Options > Writing > Update Services.

Short ‘n’ Simple Ping List

http://rpc.pingomatic.com

Yeah, that isn’t a whole lot. Ping-O-Matic pings a lot of services, but not all.
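
For the curious, here is roughly what a ping looks like under the hood. This is a minimal Python sketch, assuming the service accepts the common weblogUpdates.ping XML-RPC call with a blog name and a blog URL. The blog name, the example.com address, and the helper names build_ping and send_ping are made up for illustration.

```python
import xmlrpc.client


def build_ping(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to one service and return whatever it answers."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


if __name__ == "__main__":
    # Print the request instead of sending it, so nothing hits the network.
    print(build_ping("My Blog", "http://example.com/"))
    # To actually ping (hypothetical blog details):
    # send_ping("http://rpc.pingomatic.com", "My Blog", "http://example.com/")
```

Looping send_ping over a ping list would hit each service in turn; wrap the call in a try/except if you do, since some endpoints may no longer answer.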

The Long List

http://1470.net/api/ping
http://api.feedster.com/ping
http://api.moreover.com/ping
http://api.moreover.com/RPC2
http://api.my.yahoo.com/RPC2
http://api.my.yahoo.com/rss/ping
http://bblog.com/ping.php
http://bitacoras.net/ping
http://blog.goo.ne.jp/XMLRPC
http://blogdb.jp/xmlrpc
http://blogmatcher.com/u.php
http://blogsearch.google.com/ping/RPC2
http://bulkfeeds.net/rpc
http://coreblog.org/ping/
http://mod-pubsub.org/kn_apps/blogchatt
http://ping.amagle.com/
http://ping.bitacoras.com
http://ping.blo.gs/
http://ping.bloggers.jp/rpc/
http://ping.blogmura.jp/rpc/
http://ping.cocolog-nifty.com/xmlrpc
http://ping.exblog.jp/xmlrpc
http://ping.feedburner.com
http://ping.myblog.jp
http://ping.rootblog.com/rpc.php
http://ping.syndic8.com/xmlrpc.php
http://ping.weblogalot.com/rpc.php
http://ping.weblogs.se/
http://pingoat.com/goat/RPC2
http://rcs.datashed.net/RPC2/
http://rpc.blogbuzzmachine.com/RPC2
http://rpc.blogrolling.com/pinger/
http://rpc.icerocket.com:10080/
http://rpc.newsgator.com/
http://rpc.pingomatic.com
http://rpc.technorati.com/rpc/ping
http://rpc.weblogs.com/RPC2
http://topicexchange.com/RPC2
http://trackback.bakeinu.jp/bakeping.php
http://www.a2b.cc/setloc/bp.a2b
http://www.bitacoles.net/ping.php
http://www.blogdigger.com/RPC2
http://www.blogoole.com/ping/
http://www.blogoon.net/ping/
http://www.blogpeople.net/servlet/weblogUpdates
http://www.blogroots.com/tb_populi.blog?id=1
http://www.blogshares.com/rpc.php
http://www.blogsnow.com/ping
http://www.blogstreet.com/xrbin/xmlrpc.cgi
http://www.lasermemory.com/lsrpc/
http://www.mod-pubsub.org/kn_apps/blogchatter/ping.php
http://www.mod-pubsub.org/ping.php
http://www.newsisfree.com/RPCCloud
http://www.newsisfree.com/xmlrpctest.php
http://www.popdex.com/addsite.php
http://www.snipsnap.org/RPC2
http://www.weblogues.com/RPC/
http://xmlrpc.blogg.de
http://xping.pubsub.com/ping/
http://pingqueue.com/rpc/
https://phobos.apple.com/WebObjects/MZFinance.woa/wa/pingPodcast
http://rpc.britblog.com/
http://services.newsgator.com/ngws/xmlrpcping.aspx
http://www.holycowdude.com/rpc/ping/
http://ping.pubsub.com/ping/

PS: My computer removed the duplicates and sorted them alphabetically — hopefully, most of these services work.
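
If you want to clean up your own ping list the same way, here is a minimal Python sketch of the de-duplicate-and-sort step. The helper name clean_ping_list and its normalization rules (collapse a doubled http://, ignore a trailing slash when comparing) are assumptions for illustration, not the script actually used for this post.

```python
def clean_ping_list(urls):
    """Return the URLs sorted alphabetically with duplicates removed."""
    seen = set()
    cleaned = []
    for url in urls:
        url = url.strip()
        if not url:
            continue
        # Collapse an accidentally doubled scheme such as "http://http://".
        while url.startswith("http://http://"):
            url = url[len("http://"):]
        # Compare without a trailing slash so "x.com/" and "x.com" match.
        key = url.rstrip("/")
        if key not in seen:
            seen.add(key)
            cleaned.append(url)
    return sorted(cleaned)
```

The first spelling of each address is the one that is kept, so a later copy that differs only by a trailing slash is dropped.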

Get Free BackLinks w/o rel=”nofollow”

The quick and clean way is to post on blogs. Below are possible excuses made by an unbelieving webmaster:

All blogs put rel=”nofollow” on their links.

Not all — that’s what the list below is for.

Won’t the webmasters moderate their blogs so you can’t post just any comment?

Yeah, but some are so excited that you commented on their blog, which doesn’t get any traffic (like this one), that they’ll probably allow your comment even if all you said was “nice post”.

Lists of blogs without the rel=”nofollow”

http://www.bumpzee.com/no-nofollow/blogs - Provides a list with tags about the blogs.

http://www.dofollowblogs.com/ - A whole site-directory for dofollow blogs.

http://commenthunt.com/ - A Google custom search for do-follow blogs.

http://courtneytuttle.com/blogs-that-follow/ - Over 200 blogs.

http://nicusor.com/do-follow-list/ - More than 300 blogs with Alexa Ranking.

http://noladawn.wordpress.com/2007/10/04/do-follow-list/ - Decent size list.

http://readbarbi.blogspot.com/2007/07/do-follow-list-of-bloggers.html - Lots of links.

http://money.bigbucksblogger.com/get-links-do-follow/ - A feed of recent posts from do-follow blogs (the quick and dirty way to backlinks).

http://randaclay.com/blogging/the-i-follow-d-list/ - Just a few blogs.

http://soleflor-en.blogspot.com/2007/04/do-follow-list.html - Not sure if still updated.
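
Since some of these lists may be out of date, it can be worth checking a blog’s links yourself before you comment. Here is a minimal Python sketch that scans a page’s HTML and splits its links into followed and nofollowed ones. The class name LinkRelParser and the helper split_links are made up for illustration.

```python
from html.parser import HTMLParser


class LinkRelParser(HTMLParser):
    """Collect every <a href> and note whether it carries rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollowed = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollowed.append(href)
        else:
            self.followed.append(href)


def split_links(html):
    """Return (followed, nofollowed) lists of link targets found in html."""
    parser = LinkRelParser()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.nofollowed
```

Feed it the HTML of a post that already has comments; if the commenters’ links land in the nofollowed list, the blog is not a do-follow blog after all.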

The other, related backlink trick

Become a do-follow (no-nofollow) blog, then submit your sites to all of those lists and get a ton of high-quality backlinks from them. To do this, you can remove the rel=”nofollow” from all of the following blogging software: WordPress, TypePad, Blogger/BlogSpot, and Movable Type.

Social bookmarking sites

http://www.caroline-middlebrook.com/blog/do-follow-social-bookmarking-sites/ - A list of 24 popular sites.

Want more backlinks?

If you have a do-follow list or know of any others, submit a comment and I’ll add it!