
Splogs (or Snews) Using Web-Stemming

Most readers are familiar with the current trend in splogs: pages stuffed with three AdSense units above the fold.

These pages rely on a blackhat SEO technique: scraping RSS feeds from multiple blogs and republishing the content. If the SEO-er is innovative enough, the pages then show up on Google.

If you use partial feeds, you’re safe, right?

Not quite. Here’s Webstemmer, a Python tool that takes it all to a new level by pulling the article text out of the pages themselves.

“Snews” — scraping the news sites

Here’s their claimed accuracy:

New York Times 488.8/552.2 (88%)
Newsday 373.7/454.7 (82%)
Washington Post 342.6/367.3 (93%)
Boston Globe 332.9/354.9 (93%)
ABC News 299.7/344.4 (87%)
BBC 283.3/337.4 (84%)
Los Angeles Times 263.2/345.5 (76%)
Reuters 188.2/206.9 (91%)
CBS News 171.8/190.1 (90%)
Seattle Times 164.4/185.4 (89%)
NY Daily News 144.3/147.4 (98%)
International Herald Tribune 125.5/126.5 (99%)
Channel News Asia 119.5/126.2 (94%)
CNN 65.3/73.9 (89%)
Voice of America 58.3/62.6 (94%)
Independent 58.1/58.5 (99%)
Financial Times 55.7/56.6 (98%)
USA Today 44.5/46.7 (96%)
NY1 35.7/37.1 (95%)
1010 Wins 14.3/16.1 (88%)
Total 3829.1/4349.2 (88%)

It’s fairly accurate, with an 88% average while scraping professional news sources. If you read a lot of news online, you know how much the text there is separated, meaning news items broken up with random ads. Now, how hard would it be to scrape WordPress blogs that all share the exact same template structure? Not very.
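To see where the percentages come from, here is a small Python sketch that reads a row of the list above. I’m assuming each row means "first figure / second figure (percentage)" and that the percentage is simply the first figure divided by the second; the function names are mine, not part of Webstemmer. Recomputed values can be off by a point from the printed ones, since the printed figures are already rounded.

```python
import re

# Row layout assumed from the list above: "<site> <first>/<second> (<pct>%)"
ROW_RE = re.compile(r"^(.+?)\s+([\d.]+)/([\d.]+)\s+\((\d+)%\)$")


def parse_row(line):
    """Split one row into (site, first figure, second figure, printed percent)."""
    match = ROW_RE.match(line.strip())
    if match is None:
        raise ValueError("not an accuracy row: %r" % line)
    site, first, second, percent = match.groups()
    return site, float(first), float(second), int(percent)


def ratio(line):
    """Recompute the share (0.0 to 1.0) from the two figures in a row."""
    _, first, second, _ = parse_row(line)
    return first / second


if __name__ == "__main__":
    for row in ("Total 3829.1/4349.2 (88%)", "1010 Wins 14.3/16.1 (88%)"):
        site, _, _, printed = parse_row(row)
        print("%s: printed %d%%, recomputed %.1f%%" % (site, printed, 100 * ratio(row)))
```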

Below is how text is broken up:

$ cat cnn.txt

!UNMATCHED: 200511210103/www.cnn.com/                                             (unmatched page)

!UNMATCHED: 200511210103/www.cnn.com/privacy.html                                 (unmatched page)

!UNMATCHED: 200511210103/www.cnn.com/interactive_legal.html                       (unmatched page)

…

!MATCHED: 200603010455/www.cnn.com/2006/HEALTH/02/09/billy.interview/index.html   (matched page)

PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)

SUB-0: CNN.com - Too busy to cook? Not so fast - Feb 9, 2006                      (supplementary section)

TITLE: Too busy to cook? Not so fast                                              (article title)

SUB-10: Leading chef shares his secrets for speedy, healthy cooking               (supplementary section)

SUB-17: Corporate Governance                                                      (supplementary section)

SUB-17: Lifestyle (House and Home)

SUB-17: New You Resolution

SUB-17: Billy Strynkowski

MAIN-20: (CNN) — A busy life can put the squeeze on healthy eating. But that     (main text)

         doesn’t have to be the case, according to Billy Strynkowski, executive

         chef of Cooking Light magazine. He says cooking healthy, tasty meals

         at home can be done in 20 minutes or less.

MAIN-20: CNN’s Jason White interviewed Chef Billy to learn his secrets for

         healthy cooking on the run.

…

SUB-25: Health care difficulties in the Big Easy                                  (supplementary section)

!MATCHED: 200603010455/www.cnn.com/2006/EDUCATION/02/28/teaching.evolution.ap/index.html  (another matched page)

PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html      (layout pattern name)

SUB-0: CNN.com - Evolution debate continues - Feb 28, 2006                        (supplementary section)

TITLE: Evolution debate continues                                                 (article title)

SUB-17: Schools                                                                   (supplementary section)

SUB-17: Education

MAIN-20: SALT LAKE CITY (AP) — House lawmakers scuttled a bill that would have   (main text)

         required public school students to be told that evolution is not

         empirically proven — the latest setback for critics of evolution.
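
The layout above is regular enough to read back with a few lines of Python. The sketch below assumes the format is exactly what the sample shows: a "!MATCHED:" line opens a page, tagged lines such as "TITLE:" and "MAIN-20:" follow, and indented lines continue the previous paragraph. The notes in parentheses on the right of the sample are annotations, so they are left out of the test input. The function name and the dictionary keys are my own.

```python
import re

# Tags assumed from the sample above: TITLE, PATTERN, SUB-<n>, MAIN-<n>
TAG_RE = re.compile(r"^(TITLE|PATTERN|SUB-\d+|MAIN-\d+):\s?(.*)$")


def parse_extracted(text):
    """Group the tagged lines into one dict per matched page."""
    pages = []
    page = None       # page currently being filled, or None
    in_main = False   # True while the last tagged line was a MAIN paragraph

    for line in text.splitlines():
        if not line.strip():
            continue  # blank lines carry no information
        if line.startswith("!MATCHED:"):
            page = {"url": line.split(":", 1)[1].strip(), "title": None, "main": []}
            pages.append(page)
            in_main = False
        elif line.startswith("!UNMATCHED:"):
            page = None   # nothing was extracted for this page
            in_main = False
        elif page is not None:
            match = TAG_RE.match(line)
            if match:
                tag, body = match.group(1), match.group(2).strip()
                in_main = tag.startswith("MAIN-")
                if tag == "TITLE":
                    page["title"] = body
                elif in_main:
                    page["main"].append(body)
            elif in_main and line[0].isspace() and page["main"]:
                # indented line: continuation of the previous MAIN paragraph
                page["main"][-1] += " " + line.strip()
    return pages


SAMPLE = """\
!UNMATCHED: 200511210103/www.cnn.com/privacy.html
!MATCHED: 200603010455/www.cnn.com/2006/HEALTH/02/09/billy.interview/index.html
PATTERN: 200511210103/www.cnn.com/2005/POLITICS/11/20/bush.murtha/index.html
TITLE: Too busy to cook? Not so fast
SUB-17: Billy Strynkowski
MAIN-20: CNN's Jason White interviewed Chef Billy to learn his secrets for
         healthy cooking on the run.
"""

if __name__ == "__main__":
    for item in parse_extracted(SAMPLE):
        print(item["title"])
        for paragraph in item["main"]:
            print("  " + paragraph)
```

Only the title and the main text are kept here; the SUB lines (navigation, related links) are skipped, which is the whole point of the tool.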


Clickbank Redirect Script (PHP)

I always try to get the most out of search engine marketing and optimization (SEM&O), and to combine the two so that they work well together.

Without further delay, here is the script that will create redirects for ClickBank (if you notice errors or want an improvement, you can drop a comment):

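<?php
// ClickBank Redirect/Cloak Script (PHP)
// example, relative URL: cb.php?s=STORE-ID
// made at http://ugux.com/blog/

$nickname = 'multiz'; // your ID/nickname at ClickBank

// read the link parameters; fall back to an empty string when one is missing
$store = isset($_REQUEST['s'])   ? $_REQUEST['s']   : '';
$extra = isset($_REQUEST['x'])   ? $_REQUEST['x']   : '';
$tid   = isset($_REQUEST['tid']) ? $_REQUEST['tid'] : '';

if ($tid)
{
    // append the tracking ID to any extra parameters
    $extra .= '&tid=' . urlencode($tid);
}

if ($store)
{
    // send the visitor on to the ClickBank hop link
    header('Location: http://' . $nickname . '.' . $store . '.hop.clickbank.net/?' . $extra);
}
else
{
    // if something is entered wrong, they're sent to your not found (404) page
    // or whatever page you choose.
    header('Location: 404.shtml');
}
?>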

For those of you who do not like reading useless instructions, a link would look like this:

<a href="/cb.php?s=store" rel="nofollow" title="Add an SEO friendly persuasion to the reader/buyer">Store</a>

How to use the script

1. Save the script as cb.php or something else and upload it

2. Change the line: $nickname = 'multiz'; to $nickname = 'YourClickbankNickname';

3. Link to a merchant or store using its ClickBank ID (usually the subdomain that follows your nickname in the hop link: http://yourNickname.storeID.hop.clickbank.net)

4. Link like this cb.php?s=STORE-ID (although obvious, make sure you change STORE-ID; you have no idea how many people will mess this up)

Advanced parameters

1. You can add tracking IDs by linking like this: cb.php?s=STORE-ID&tid=TRACKING-ID

2. Add any extra parameters: cb.php?s=STORE-ID&x=MORE-Parameters

To pass more than one extra parameter through x, you will need to URL-encode every “=” and “&” inside its value (as %3D and %26).
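
If you generate your links from a script, the encoding can be done for you. Here is a minimal Python sketch, assuming the redirect script is saved as /cb.php as in the steps above. The helper names make_link and redirect_target and the extra parameters a and b are made up for the example; redirect_target only imitates what the PHP script does, so you can check where a link would end up.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def make_link(store, extra=None, tid=None, script="/cb.php"):
    """Build a link to the redirect script, URL-encoding every value."""
    params = {"s": store}
    if extra:
        # extra is a dict; it is packed into the single "x" value,
        # so its "=" and "&" get encoded as %3D and %26
        params["x"] = urlencode(extra)
    if tid:
        params["tid"] = tid
    return script + "?" + urlencode(params)


def redirect_target(link, nickname):
    """Imitate the PHP script: where would this link redirect to?"""
    query = parse_qs(urlsplit(link).query)
    store = query.get("s", [""])[0]
    extra = query.get("x", [""])[0]
    tid = query.get("tid", [""])[0]
    if tid:
        extra += "&tid=" + tid
    if not store:
        return "404.shtml"  # same fallback page as in the script
    return "http://%s.%s.hop.clickbank.net/?%s" % (nickname, store, extra)


if __name__ == "__main__":
    link = make_link("STORE-ID", extra={"a": "1", "b": "2"}, tid="TRACKING-ID")
    print(link)
    print(redirect_target(link, "multiz"))
```

The link itself carries the encoded form (x=a%3D1%26b%3D2), and the decoded parameters only reappear in the final hop URL.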

Other tips (SEO & SEM)

1. Change robots.txt by adding Disallow: /cb.php to a new line

2. Add title attributes to links (can increase conversions and provide slightly better SEO): <a href="/cb.php?s=store" title="A store that adds insane discounts daily">Store</a>

3. Keep your PageRank by adding rel="nofollow" to links: <a href="/cb.php?s=store" rel="nofollow" title="A store that adds insane discounts daily">Store</a>
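
To double-check the robots.txt tip, you can test the rule with Python’s standard urllib.robotparser module. This is only a sketch: it assumes the Disallow line sits under a "User-agent: *" group, and the two-line robots.txt below stands in for your real file.

```python
from urllib.robotparser import RobotFileParser

# Stand-in robots.txt; the rule is assumed to live under "User-agent: *"
ROBOTS_TXT = """\
User-agent: *
Disallow: /cb.php
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="*"):
    """Return True if a crawler obeying robots_txt may fetch the given path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/cb.php?s=store", "/index.html"):
        print(path, "allowed" if is_allowed(path) else "blocked")
```

The rule matches by prefix, so every cb.php link is covered no matter which parameters follow it.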

FYI: This should also keep sneaky adware from swapping its own ClickBank ID into your ClickBank links.

Another PHP Clickbank script

Another Clickbank script should be added soon that will automatically fetch merchants/products based upon added keywords.

How You Should Park A Domain Name

If you no longer want to use a domain name you already purchased, you shouldn’t delete it, but rather profit from it. There are a couple of methods that you should avoid and a couple that are pure genius ways to park your domain name(s).

List sponsored results on it with a professional parking service

Now, this is the worst thing you could possibly do with it, short of just deleting it. So, do not allow GoDaddy, Sedo, etc. to do this for you, or else your ranking in the search engine results will drop dramatically, you will probably be removed altogether, and you will lose almost all of your previously acquired traffic.

Keep the domain the way it was, plus…

Just add some sponsored results to the top of every page — using AdSense, Yahoo!, Text-Link-Ads, etc.

If the domain name doesn’t have any content

This can be great for brand new domains or expired domains. You can try either WhyPark or 45n5’s free script, which comes with a tutorial. And that’s about it.