Since Niall mentioned GData, I've been meaning to look into it further. Today Otis mentioned that one of the proposed Apache Summer of Code projects is a Lucene-based GData server implementation.
I took a look at the docs and realized that this is actually a really old spec, as old as the epoch as a matter of fact. Check it out:
But seriously folks, the G-man and his crew have done a fine job providing client implementations (as long as you're not waiting on Ruby or one of the P-languages: no Perl, Python or PHP yet). There's even a nice set of examples for the Java implementation. Thanks, G!
google gdata lucene summerofcode ruby perl python php
( Apr 25 2006, 08:17:45 PM PDT )
Over the last year and a half, I've spoken extensively to friends, colleagues and audiences about web spam. At eTech I showed how statistically atypical spam blogs are: as soon as you start looking at publishing characteristics (such as linking and posting rates), the spam comes percolating up to the top. For instance, this chart is a sample of linked-to domains from blogs hosted by Google's blogspot service (the y-axis is in thousands of links per day):
The highlighted domains are sites that the spammers are trying to put in front of mouse cursors by making them look important. Besides being a nuisance, this is part of a larger hazard to the whole web advertising market.
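The post doesn't spell out the method behind the chart, but the idea that abnormal link rates make spam "percolate up" can be sketched in a few lines. This is a minimal Python illustration, assuming you already have a count of inbound links per day for each linked-to domain; the function name, the choice of a modified z-score (median and median absolute deviation) and the 3.5 cutoff are illustrative assumptions, not necessarily how the eTech analysis was done.

```python
import statistics


def flag_outlier_domains(links_per_day, threshold=3.5):
    """Return the domains whose daily inbound-link count is a robust outlier.

    Hypothetical helper: uses the modified z-score (based on the median and
    the median absolute deviation), which is less skewed by the very outliers
    we are hunting for than a mean/standard-deviation z-score would be.
    """
    counts = list(links_per_day.values())
    med = statistics.median(counts)
    mad = statistics.median([abs(c - med) for c in counts])
    if mad == 0:
        # Most domains share the same count; treat anything above it as suspicious.
        return sorted(d for d, c in links_per_day.items() if c > med)
    flagged = []
    for domain, count in links_per_day.items():
        # 0.6745 scales the MAD so the score is comparable to a standard z-score.
        modified_z = 0.6745 * (count - med) / mad
        if modified_z > threshold:
            flagged.append(domain)
    return sorted(flagged)


# Toy, made-up counts purely for illustration.
sample = {
    "a.example": 10,
    "b.example": 12,
    "c.example": 11,
    "d.example": 9,
    "e.example": 13,
    "spam.example": 400,
}
print(flag_outlier_domains(sample))
```

The median-based score is used here because a handful of heavily promoted domains can inflate the mean and standard deviation enough to hide themselves; real data would also need per-blog posting rates and a cutoff tuned against known spam.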
While Technology Evangelist pointed out Google News' role in the spam ecosystem, Niall has previously noted that Google provides lots of tools for perpetrators of web spam to employ. AdSense is the prime object and subject of spam. A video posted yesterday on YouTube details the anatomy of some typical AdSense abuse:
There has been a recent explosion in abuse of AdWords, Google's PPC (pay per click) advertising platform. It is apparent that the techniques do not follow Google's own clear guidelines. Unfortunately, users and legitimate advertisers pay the price, while Google and the unscrupulous advertisers profit.

Web spam isn't new; this has been going on for years, and at this point one must conclude that Google's not serious about doing anything about its spam problems. There used to be a qualification step for AdSense, and Google would reject sites that didn't meet some basic criteria as legitimate sources of content. But that stopped about a year and a half ago, and the spam has been pouring in ever since. The video is a call to action to complain to Google; they have a fiduciary responsibility to move against the abuse. The video shows a search for forklifts to illustrate rampant ad policy violations. The video narration asks, "Why is Google allowing this to happen?" and answers:
Google has always put its users first and one would hope that they will continue to abide by their charter. My aim with this video is to urge them to do something about these issues sooner rather than later.
This is the dirty little secret about these tactics: Google profits from every click on its network, be it on ads that are clicked on google.com or on ads that are clicked on websites that run Google ads through the AdSense program. This is a very difficult thing for them to self-police, because doing something about it will affect their short-term profits.

I'm not condemning AdSense per se; it's a great service and revitalized web advertising after the flame-out a few years ago. But black-hat SEOs have definitely upped their game over the last year or so and are putting the whole market at risk again. Google may not be particularly motivated to go out and find the abuse, but they have to act against it when you bring it to their attention. Watch this video:
web spam splog google spam adsense adwords splogs
( Apr 25 2006, 12:28:25 PM PDT )