When will we stop hearing about Calacanis and Mahalo? The only thing I take away from the whole thing is how effective manipulating search engines can be. And I already knew this. From blackhat SEOs.
The only thing the blackhats haven't taught me is how to get into the Google index and stay there - how are you doing it, Calacanis?
Mahalo is a black hat SEO site, that's given preferential treatment by Google. Any other black hat SEO site would have gotten delisted a month after launching.
This is getting a little silly I agree. I guess the SEOs are not going to stop so we're all going to have to suffer as they create spam pages in Mahalo and then point to them and say "Calacanis allows spam!"
Update:
1. Did we have short pages in Mahalo? Yes.
2. How did they get there? Our users start topic pages and don't finish them.
3. Were they getting traffic? Not really... like < 6% of our traffic was from short pages. We know this because we've now deleted them or greatly improved them.
4. Why does this keep coming up with the SEOs? Because I called SEO bullsh#$t back in 2005 when someone asked me at a conference why Engadget and Joystiq were doing so well with SEO.
5. Do you still think SEO is bullshit? No, I don't. Back then I didn't know what SEO was and assumed it was spamming google so I said, "oh, that's bs." This was true at the time. At Weblogs Inc. we did zero SEO and we had amazing search traffic. Why? Well, it was later explained to me that we made a lot of content (20 posts a day across 50 blogs = a lot of posts), we got a lot of comments on them and we used keywords in the titles of the posts before anyone else used them. Today I don't think SEO is BS. I think SEO is an essential part of building any startup just like building a viral loop, doing PR or hiring great people.
6. Are you sorry to the SEO community for hurting their feelings? Yes, I'm really really sorry.
7. I found a page that has no content on it? yes, that will happen from time to time... when we find them we delete, redirect or expand them. These will be in the < 1% of our corpus range. If you do find one email me jason at mahalo.com.
8. don't you have tools to do that? yes and no. we're building them. we need to balance turning off pages people are planning on building out vs. ones that are abandoned. So, you will see nofollow on pages with no content, and nofollow on pages with < 100 words if they are older than X days, etc. This is all being built now.
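The rule described in item 8 could be sketched roughly as below. This is only an illustration under stated assumptions, not Mahalo's actual tooling: the `Page` type, the `link_directive` function and the 30-day default are hypothetical, and the post leaves the real age threshold ("X days") unspecified.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical page record: only the two fields the rule needs."""
    word_count: int
    age_days: int


def link_directive(page: Page, min_words: int = 100, max_stub_age_days: int = 30) -> str:
    """Return "nofollow" or "follow" for a page.

    min_words=100 comes from the post; max_stub_age_days=30 is an
    assumed stand-in for the unspecified "X days".
    """
    # Pages with no content at all are always nofollow.
    if page.word_count == 0:
        return "nofollow"
    # Short pages get a grace period so users can still build them out.
    if page.word_count < min_words and page.age_days > max_stub_age_days:
        return "nofollow"
    return "follow"
```

The grace period is what separates a page someone is still planning to build out from one that has been abandoned.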
That's it... there is nothing bad going on here. We're good people doing good work. Hundreds of people are making real money to take care of their families at Mahalo and I wish you guys would stop attacking them.
If you're pissed at me take it out on me, leave Mahalo out of it.
.... or don't. Who cares. This is so biased and absurd right now I don't think anyone is really taking it seriously any more. Aaron had a handful of good points last month, but now folks are just doing this to linkbait for their SEO sites.
great... trust me, this is making my experience reading HN annoying too. If HN wants to filter out "calacanis" and "mahalo" from HN i'm all for it. Clearly the system here is being abused.
Jason, you are actually full of crap. For starters, you are mistaken, or pretending not to know, as to why this keeps happening to you. You brought this on yourself, and on Mahalo.com, by unsubtle spamming and boasting about it, and then being rude to boot. You can whine all you want about the perceived injustice of people calling you on your crap, but the fact remains it is your actions that are getting called out, nothing more and nothing less.
This isn't about a small handful of pages that don't have content on them, and if you read the posts you were rebutting instead of trying so damn hard to make yourself sound better then you might actually have information you could use to improve your site.
The vast majority of the pages on Mahalo are empty pages devoid of human generated content. It is not like I am straining my ass off to find them, either. Click on this link, scroll down the page, click through to 20-30 pages (the ones with all lower case titles demonstrate what I am talking about most often, focus on those), and LOOK AT YOUR WEBSITE:
In order to appease Matt Cutts, who gave you the courtesy of warning you instead of banning you outright, you did in fact add noindex to all of those pages. 1 day later, after he saw that it was there, you removed it again.
Pretending not to know what is being talked about and making shit up in an attempt to garner pity for being picked on is not helping your cause any.
as I've said, all these pages are being deleted, redirected or built out.
also, the noindex thing is going to be on all pages under 100 original words.
we are working as fast as we can to resolve all these stubs, but many are owned by users and we don't want to just nuke them.... so, we're busy at work.
we're removing (or building out) any page in our system created by our users with under 200 words of original content. This will take a couple of weeks but it's started.
The noindex tags were there... you took them out again as soon as Matt Cutts looked at the pages. If you want to be taken seriously, maybe you can address that issue instead of calling me a troll for bringing it up.
Also, if you are going to attack me and make accusations about what is being written, could you please at least have the courtesy to actually read the stuff first?
Shame on Google algorithms for not automatically being able to detect a page without content.
Over the years, my view of Google's ranking algorithms went from dream-like (they are so smart they understand everything) to infant-like (just a bunch of keyword counters).
Obviously there has to be huge complexity involved, so my current view must be pessimistic, but still...
Just one (unrelated) example: search for "leader in online used cars" in Google. Does every human understand what I'm looking for? Here's search result #5: "Indian holy leader and BA stewardess arrested over prostitution".
In electronics it is a well known fact that it is impossible to measure a system without influencing it.
I think there is no way you could ever build a successful search engine ranking algorithm that will not influence the web to the point where that algorithm will lose a lot of its initial edge.
Damn, you're going to feel pretty foolish if it ends up that either the Indian holy leader or the BA stewardess are among the current leaders in online used car sales.
That being said, just to add fuel to the fire, I did the same search on bing. All results were relevant, unlike Google's. But none of them gave me the answer I was looking for.
Google is a tool, and like all tools, you need to learn how to use it. Two of your keywords are unnecessary, and I think that caused the problem. First, you don't need "leader." If you're looking for the top of something, you already get that information from the ordering of results. Second, "online" is redundant, since anything you look for will probably be "online." (And I suppose you don't need "in," but I don't think Google used it anyway.)
So then you're just left with "used cars." That result page looks reasonable to me.
You make my point better than I could! You just explained to me that my query, which every human understood perfectly, is stupidly understood as "used cars", so Google is a basic keyword algorithm and this still doesn't explain result #5.
Really, I'm smart enough to understand your explanation of how the tool works. The problem is that I'm in the 99th percentile in the field of "computer keyword understanding" over the general Internet population. So what does that say about the tool?
That there is probably room in the market for more search engines - perhaps specifically ones that specialise in translating only human queries and returning results rather than trying to be more general like Google is.
Like almost all professions that utilise tools in some way, sometimes you just need more specialised tools. A chef wouldn't be that great if he (or she) only had knives to cook with.
Honestly, I think you under-estimate the general population's ability to learn how to use a tool. I think being able to explain why the results are what they are takes another level of understanding. But I think - and, yes, this is an unsubstantiated claim based on personal bias and anecdote - that many people have developed an intuitive feel for how to craft a decent Google query. They may not be able to tell you why they say what they say, but they can still say it.
Personally, I think Google is a good tool. I feel my mental model of it is relatively simple, and from that, I can mostly get what I want. This is also a good time to plug a great article on tools: http://unqualified-reservations.blogspot.com/2009/07/wolfram...
edit: My explanation was not that Google understands your search to be "used cars." My explanation was based on the assumption that you were looking for a place, online, to buy used cars from. The keywords "online" and "leader" muddied that. The search for "used cars" yields many places to buy used cars from, and the top entries are probably the "leaders."
Jason Calacanis is a guy who runs a sweatshop building a website you've probably never been to and never will because you're a smart person who doesn't fall for spam.
He also has made a career out of being a professional asshole and making fun of "lifestyle businesses" that make products that people like and pay for. He thinks that if you're not a workaholic and want to enjoy your short time here on Earth then you shouldn't be working with startups.
We're hackers not journalists. If he feels Calacanis is scum he should be free to say so and not be held to some arbitrary standard of wiki-objectivity. It's certainly a lot more truthy than most of the horseshit that Calacanis drops around here whenever the subject of Mahalo comes up.
I think it's reprehensible that parent has been downvoted. Ad hominem is ad hominem, no matter how deserving we think the target might be. Great-grandparent poster asked a question, ostensibly seeking an informative answer. What they got was grand-parent's uninformative tirade. While it certainly represents how people feel about Calacanis, it does not in any way answer great-grandparent's question, it is not informative, and it undermines the norm of civil discourse that we should expect on HN.
Ad hominem is not a synonym for insult. It is used exclusively for when someone attacks the speaker, and not the speaker's arguments. In this case, there were no arguments. The question was "Who is X?" Someone answered who X is. That answer was harsh, but also consistent with everything that I have seen.
I agree its civility is borderline. But it's not an ad hominem attack because no argument was being made.
What's amazing, to me, is that we're still reading about it several months into the story. Mahalo is a known black hat spam site, with despicable tactics, and Google continues to apologize on Calacanis' behalf and send traffic to Mahalo.
I think that's actually the interesting part of the story: How Calacanis has snowed so many smart and decent people into doing his bidding.
I'm only speculating (which is dubious at best) but there are 2 probable reasons why Calacanis continues to get away with it.
1) Sequoia funded both Mahalo and Google, so there's probably some leeway granted to Mahalo in that respect
or (my personal favourite)
2) People can get away with a lot more if they're in the public eye and are loud about it. Think OJ Simpson... Calacanis is definitely one that shouts a lot louder than most to get his point across.
He's allegedly building a business on scraping sites and re-hosting their content to exploit long-tail search. SEO is in general germane to Hacker News. What Google does (or does not do) to deal with Mahalo will be relevant to many YC-style startups.
On one of my sites, I used AdSense channels to learn more about the CTR from those arriving from Google vs regular users of the site. The CTR from incoming Google traffic was 10x higher, so I show ads to that traffic but not to the regulars - keeps them happier.
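A minimal sketch of that approach, assuming the decision is made from the HTTP referrer. The function names and the `"google."` host match are illustrative assumptions, not the poster's actual setup, and no real traffic numbers are implied.

```python
from urllib.parse import urlparse


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate; 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def should_show_ads(referrer: str, search_hosts=("google.",)) -> bool:
    """Show ads only to visitors arriving from a search engine.

    Regular users (no referrer, or a non-search referrer) see no ads.
    The substring match on the host is a simplification.
    """
    host = urlparse(referrer).netloc.lower()
    return any(marker in host for marker in search_hosts)
```

Comparing `ctr` per channel (search arrivals vs. regulars) is what would justify the split in the first place.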
In Jason's most recent subscriber email he absolutely THROWS Mark Zuckerberg under the bus for lacking integrity and riding on the edge to grow his business.
If there’s ever been a case of the pot calling the kettle black . . .
Seeing as you operate a search engine, what's your take on this whole thing? Do you feel this all truly lies in a gray area, or that there is a clear-cut way things ought to be in this situation?
As an aside, your comment about blocking sites manually for your users just crystallized the value of DuckDuckGo (and by extension other smaller search engines) to me. There is definite value in a search engine that has some opinions about what kind of results are considered quality results. There's probably a name for this; but "curated search engine" or "opinionated search engine" don't seem right.
It's kind of like the difference between a micro-brew and Google's Budweiser.
It's all about perceived user quality. Google has guidelines, but if you read closely they qualify them with "we can do whatever we want." They keep making the call that keeping Mahalo in is better for whatever reason, perhaps because they think it will be bad PR (censorship?), or they'll get a lot of complaints (where is mahalo.com?) or maybe just because they don't see data that it is bad for users now (site metrics).
I think what angers people legitimately is two fold. First, Mahalo seems to be doing stuff that gets other people banned. Second, they're seemingly being really sketchy about it.
Personally, I think they are on the whole useless pages and am happy to block them. There are certainly a subset of useful pages, and I'd consider doing a tighter integration where I could promote just those pages in our Zero-click Info boxes. Same goes for something like Yahoo! Answers. Most of their pages are useless, but they have a subset that are really useful.
To your latter point, I couldn't agree more. I'm much more aggressive at de-listing what I deem to be "useless" sites. Google can't do this because they'd get too much flack for it, e.g. censorship.
I think this whole thing would have blown over if not for Jason's incredible ego and his "I can do whatever I want" response to the whole issue. Pissing off the geeks = bad idea.
Mahalo is starting to remind me of that (was it a Canadian) company who was making all that money off of pay-per-click arbitrage between two search/ad networks, and then one day, Google cut them off cold turkey. Who were they?
This is the thing. The huge spotlight that guys like Aaron Wall are pointing at Mahalo right now, I'm sure, has Calacanis and Co sweating bullets. For big cases like this, it's often a matter of managing perceptions and Google may definitely take action if the story becomes big enough.
The thing is, there are quite a few sites that really skirt the edge of what is acceptable. When these sites/companies are somewhat larger with lots of employees and contractors, well the effect of 'blacklisting' can affect A LOT of people's lives in a very bad way.
For the independent blackhatter, some spam side-site they spin off isn't their livelihood and won't ruin them if banned. Mahalo (and ilk) isn't just a technical matter. Of course if this story gets bigger and Google doesn't do something about it, then this sets a bad precedent for the spam-community to try and game Google. Sure Google will figure ways to deal with it, but if you're the guy dealing with the spam... well, you'd just rather it not exist from the beginning.
I'm really interested to see how this all plays out. From Google's side, Mahalo's side, etc. It's riveting stuff actually.
Question: how does somebody end up at this spam page? Google seems to do a good job of filtering stuff like this. It wasn't in the first 10 or so pages of search results for "aaron wall"
Second: the idea is not to rank for every page. But the marginal cost of one more page is basically zero. So if even 1% of them rank on page one, the return on investment can be pretty high.
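The arithmetic behind that point, as a rough sketch. Every number in the usage note is hypothetical and only illustrates the shape of the calculation; `expected_profit` is not a function from any real tool.

```python
def expected_profit(pages: int, rank_rate: float,
                    revenue_per_ranked_page: float,
                    marginal_cost_per_page: float) -> float:
    """Expected profit from publishing `pages` autogenerated pages.

    rank_rate is the fraction of pages that reach page one; only
    those pages are assumed to earn anything.
    """
    revenue = pages * rank_rate * revenue_per_ranked_page
    cost = pages * marginal_cost_per_page
    return revenue - cost
```

With a marginal cost near zero, something like `expected_profit(100_000, 0.01, 5.0, 0.0)` is positive for any rank rate above zero, which is why adding one more page almost always looks worthwhile.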
Calacanis is in a tough position, to be honest. He can dial down the quality and automatically get more revenue for no extra money. But he can't reduce the capital expenditure he made to create this system in the first place.
It's sort of like being a janitor, but having a great connection for huge quantities of black tar heroin. You could be a really stand-up guy, but after a while, all the easy money gets tempting...
Sorry I did not answer your question earlier. I used that particular page in the post because it was an example of how these autogenerated pages come into being in the first place, from a post I did back when I first called Jason on the practice. It was the actual page that I had seen the noindex tag on after Jason added it. You can get to any of these autogenerated pages by running a query on Mahalo itself.
It was never an issue of how many of the empty pages were ranking per se (although quite a few do, just not that one), but rather of the sheer numbers of them that were collecting enough tiny quantities of PageRank each, that was then (and is still, as a matter of fact) adding up to massive amounts of self-sustaining ranking power:
I guess as long as you bring in a lot of money for Google they will look the other way. I'm just wondering if their anti spam team is aware of articles like this and why they don't respond? This is really a clear case of spam and it's been going on for ages.
If Mahalo is alleged to be a scraper site then just let Google handle it.
If you think about it, if Google delists the site then that's probably their business model gone, because Google is like 60% of the search market in the US and higher elsewhere. So if Google delists them and takes away their AdSense, they are in trouble.
So let them taunt Google all they like and then see what happens when the Google Dragon wakes up because it won't be pretty.
Aaron is basically acting as their Ombudsman right now. Google will probably lose some serious revenue from that, because they get a big cut of what Mahalo generates. If Google lost that ad revenue for some other reason, the person responsible might well be fired for it.
When Aaron Wall hounds them over it, they'll be more likely to make the right decision.
Google does something like $18Bn per year in revenue and of that about $750MM-$1Bn is from ads on non-Google-run sites.
So, I think they care about Mahalo as a source of revenue about >< that much. They're probably more worried about possible anti-trust actions if they just drop Mahalo -- I bet that would shave more off their market cap than losing Mahalo as a customer.
Goog's 60% market share is understated, too. Yahoo and Bing often point search results back to their own networks (Yahoo & MSN respectively). So third parties like Mahalo probably see Google driving much more than 60% of their search traffic on average.
For me Google usually brings 88-92% of SE traffic, that's the same for most people. Google is understating their market share, because essentially they have a monopoly, and if that was widely known, they'd be regulated in no time.
Froo is right, there is only one reason why Jason currently gets away with spamming google and that is because both google and Mahalo are Sequoia funded. Someone senior at Sequoia has told someone senior at google that mahalo will clean their act up so please don't ban them yet.
Can I ask why everyone cares so much about Mahalo? Up to a week ago I didn't even know "human powered search" existed and the consensus seems to be it's an epic fail.
The first comment they probably voted down because you expressed an opinion that had no relation to the way things work... Matt Cutts can't add noindex to a page, only the people who own the websites can do that. The second comment was probably voted down because you whined about the first one being voted down. And I am guessing that they probably voted this comment down because you called them morons. Just a guess though.