Try Google’s new search engine

Apparently Google has secretly been working on “the next generation of Google Search … an entire new infrastructure for the world’s largest search engine.” And you can be among the first to try it out, here:


It’s clearly faster. Are the results better? It seems they are different at least. My first impression is that they’re a little more commercially oriented — more Bing-like so to speak.

You can check out a discussion at WebmasterWorld.


Image (detail) from Gibsonclaire’s photostream


Weekend update: personalized search

Graywolf (the search guy, not the book publishing company) shows how to turn off personalized search by default in Google Chrome.

You can also get plugins for various browsers from Yoast.

Personalized search means that your search engine results will be skewed according to your browsing history.

Topicality in literary writing, and its implications for web search optimization

Many years ago, as a graduate student in comparative literature at the University of Wisconsin-Madison with a focus in part on the linguistic model in literary criticism, I turned my attention to beyond-the-sentence topicality. Scholars have parsed the sentence since ancient times, but they have paid less attention to the way sentences connect to each other.

One of the applications of this line of research is for machine translation. How does the translation engine determine, for example, whether the word lead in a text refers to the heavy metal or to the concept of leadership?
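One simple way to picture the problem is to look at the words surrounding the ambiguous term, including words in neighboring sentences. The following is only a minimal sketch, not a description of how any real translation engine works: the cue-word lists and the function name `guess_sense` are invented for illustration, and the technique (counting overlapping context words, loosely in the spirit of the Lesk algorithm) is assumed here purely as an example.

```python
import re

# Hypothetical cue words for each sense of "lead". A real system would
# learn such associations from large amounts of text, not hard-code them.
SENSE_CUES = {
    "metal": {"pipe", "pipes", "paint", "poisoning", "heavy", "toxic", "battery"},
    "leadership": {"team", "role", "election", "campaign", "follow", "manager"},
}

def guess_sense(text):
    """Return the sense whose cue words overlap most with the text.

    Returns None when no cue word from any sense appears in the text.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {sense: len(words & cues) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

if __name__ == "__main__":
    # The deciding cues sit in the second sentence, not the one containing "lead".
    print(guess_sense("She took the lead. Her team was happy to follow."))
```

Note that in the usage example the deciding words appear in the sentence after the one containing "lead," which is the kind of beyond-the-sentence connection described above.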

Writers reading Right Reading

links to tom's glossary of publishing terms

The image at right is a selection from my inlinks tag in Google Reader. It shows websites that have been linking to mine (these are all via Google Blog Search). This is less than a single day’s sample. As you can see, all of a sudden many people are posting links on their blogs to my glossary of book publishing terms.

I’m sure the number of links is not staggering compared to pages that go viral on places like Digg. Still, the glossary has gotten about 5,000 views over the past five days.

Driving traffic

heavy traffic

Today’s guest post at ForeWord Magazine is about how book publishers can increase traffic to their websites.


Google gone wild

What would cause Google to label the San Francisco Ballet website as porn? Please see the post on this subject at FriscoVista.

Google dangers and opportunities

google in 2084

A few months ago, scholars from the University of Graz in Austria released a 187-page PDF document, entitled Report on dangers and opportunities posed by large search engines, particularly Google. (The file is large, so I recommend downloading and opening it from your hard disk rather than trying to access it through a browser.) The authors’ goal is to examine the implications of “the monopolistic behaviour of Google.” They assert that “Google’s open aim is to know everything there is to know on Earth. It cannot be tolerated that a private company has that much power: it can extort, control, and dominate the world at will.”

While the writing is a bit clunky, the report is interesting not only for its content but as an expression of a concern that seems stronger in Europe than in the U.S. Following is the report’s overview of its contents:

1. To concentrate on Google as virtual monopoly, and Google’s reported support of Wikipedia. To find experimental evidence of this support or show that the reports are not more than rumours.
2. To address the copy-paste syndrome with socio-cultural consequences associated with it.
3. To deal with plagiarism and IPR violations as two intertwined topics: how they affect various players (teachers and pupils in school; academia; corporations; governmental studies, etc.). To establish that not enough is done concerning these issues, partially due to just plain ignorance. We will propose some ways to alleviate the problem.
4. To discuss the usual tools to fight plagiarism and their shortcomings.
5. To propose ways to overcome most of above problems according to proposals by Maurer/Zaka. To examples [sic], but to make it clear that do this more seriously [sic] a pilot project is necessary beyond this particular study.
6. To briefly analyze various views of plagiarism as it is quite different in different fields (journalism, engineering, architecture, painting, …) and to present a concept that avoids plagiarism from the very beginning.
7. To point out the many other dangers of Google or Google-like undertakings: opportunistic ranking, analysis of data as window into commercial future.
8. To outline the need of new international laws.
9. To mention the feeble European attempts to fight Google, despite Google’s growing power.
10. To argue that there is no way to catch up with Google in a frontal attack.
11. To argue that fighting large search engines and plagiarism slice-by-slice by using dedicated servers combined by one hub could eventually decrease the importance of other global search engines.
12. To argue that global search engines are an area that cannot be left to the free market, but require some government control or at least non-profit institutions. We will mention other areas where similar if not as glaring phenomena are visible.
13. We will mention in passing the potential role of virtual worlds, such as the currently overhyped system “second life”.
14. To elaborate and try out a model for knowledge workers that does not require special search engines, with a description of a simple demonstrator.
15. To propose concrete actions and to describe an Austrian effort that could, with moderate support, minimize the role of Google for Austria.

Among the authors’ claims are that Google is “massively invading privacy”; that its SERPs (search results) are corrupted by its ad system (favoring advertisers), and that this is a necessary result of its for-profit structure; that the internet itself is becoming skewed to a slanted “Google-Wikipedia version of reality”; that by acquiring extensive privileged information Google is positioned to play stock markets with what amounts to insider information and massively affect world economic structures; that commercial considerations cause Google to condone plagiarism; and more.

While particular charges may be debated, the idea of so much of the world’s information being held by a single company should give anyone pause. Should search be government-controlled or regulated on a nonprofit basis, as the authors suggest? Wouldn’t such information in the hands of government be at least as troubling as the present arrangement? Or is Google such a power now that it is already in effect a kind of virtual world government, operating at the behest of its shareholders? Can massive knowledge be regulated without compounding its potential exploitation?


Image: Vision of Google in 2084, New York Times, 10 October 2005, reprinted in the report cited above.

Dutch Type

Publisher 010 Uitgeverij has made what I think is probably a smart decision to put their 2004 title Dutch Type by Jan Middendorp in Google Book Search. Of course we have seen public domain books in GBS for some time (by the way, it is absurd for Google to claim any proprietary rights at all on those titles just because they scanned them), but recently more publishers have been moving toward allowing their copyrighted materials into the program as a strategy for book marketing and promotion.

dutch type: cover

Polish Posters

polish posters

There’s a nice selection of (mostly) postwar Polish posters at the Grayspace Poster Gallery.

I’ve set the background to white in the selections above, using the “remove background image” and “page color to white” bookmarklets. (I realized afterwards that I should also have set the text to black, and I found the zap colors bookmarklet, which sets the background to white and the text to black in one step.) I like clarity. Not everyone does. Without the bookmarklets applied the site looks like this:

grayspace gallery

New insights into the Google search algorithm

I like Matt McGee’s summary of the NYT article on Google search.


google image search: right reading

If you do an image search on Google and then append &imgtype=face to the end of the url, what do you get? You get only faces as results. The above images are all from this site. Interestingly, the tags under the images are not the filenames or the alt or title tags but text that is near the image on the source page. I’m guessing these images were tagged as “face” via Google’s enhanced image search and labeler. But it seems a step, if a rudimentary one, toward more visual image searching.
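For anyone who would rather not edit the address by hand, here is a minimal sketch of the same trick in Python. The helper name `add_face_filter` and the example.com address are hypothetical placeholders, not Google's actual image search URL; the only detail taken from the post is the `imgtype=face` parameter, and there is no guarantee that the search engine still honors it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_face_filter(search_url):
    """Return search_url with imgtype=face set in its query string.

    Any existing imgtype parameter is replaced rather than duplicated.
    """
    parts = urlsplit(search_url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "imgtype"
    ]
    params.append(("imgtype", "face"))
    return urlunsplit(parts._replace(query=urlencode(params)))

if __name__ == "__main__":
    # Placeholder address; substitute the URL of your own image search.
    print(add_face_filter("https://example.com/images?q=right+reading"))
```

Parsing and rebuilding the query string, rather than simply gluing text onto the end, avoids a doubled parameter if the address already carries an `imgtype` value.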

The image is clickable if you want a closer look.

Via Marketing Pilgrim.

Update: Hey, here’s a related post from Slashdot. Computers outperform humans at recognizing faces.

WorldCat Library Search

worldcat search box

I’ve been working on a bibliography for a book about Chinese jades. Many of the listings were incomplete, and I had to search a variety of sources to find the information I was looking for. I found that by searching through WorldCat I was able to locate a number of titles (including many books published in India or China) that I could not find elsewhere, and which had turned up no results with a standard web search engine such as Google.

WorldCat provides standard bibliographic information. It will show a list of libraries within a specified range of a zip code. It will allow e-mail follow-ups to searches. Users can post reviews of titles. Unfortunately, its only prominent link for purchasing titles is Amazon, although other options may appear in search results.

Here is what a search for Julio Cortázar’s Around the Day in Eighty Worlds turns up.

When I mentioned it to a librarian friend, he wrote the following:

Yes, it is a very useful tool produced by OCLC, which, however, is taking over the (library) world like Starbucks or McDonald’s. Though it is extremely helpful, and wonderful in concept, it is also insidious, and I am very wary of it. It is controlling and unifying all library cataloging, in the US and in some places outside of it, and replacing wonderfully enlightening and useful cataloging with bland, uniform, insufficient and extremely conformist cataloging. OCLC is forcing old school and creative catalogers and librarians out of jobs as it grows ever larger. Use it at your own risk & only if necessary. KILL it if you can.

So I guess this is another of those modern dilemmas that seem to be springing up more and more frequently. Good resource, soulless librarian killer, or both?


How to Get a Book Published

how to get a book published

Over at Google Blogoscoped they’ve been talking about Google results for the query “how to get a …” Seems the things people appear to want are a passport, a six pack, a girl (or guy), and a book published.

Well, I can’t help much with the six pack, the girl, or the passport. But “how to get a book published” yields 54,100,000 results. And guess who’s number 1?

how to publish search results

Swim, Swim, Swim!

swimming across the atlantic

Are you in shape for following step 12 in the instructions shown in the screen capture?

Via Google Blogoscoped. While at GB, check out Raymond Chandler’s 1953 mention of Google.

200+ U-Turns

Google Maps offers the following:


Click image for more info.

NoFollow revisited

Wikipedia announced recently that it is going back to adding the “nofollow” attribute to its outbound links in an attempt to keep people from gaming the system to leech linkjuice off the site for personal gain.

According to Google, “when Google sees the attribute (rel="nofollow") on hyperlinks, those links won’t get any credit when we rank websites in our search results.” So the theory is that by denying linkjuice Wikipedia will stem article spam.
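To see which links on a page carry the attribute, a short script will do. This is a minimal sketch using Python's standard `html.parser` module; the class name `LinkAudit`, the function `audit_links`, and the sample markup are all made up for illustration, and the script says nothing about how any search engine actually treats the links it finds.

```python
from html.parser import HTMLParser

class LinkAudit(HTMLParser):
    """Collect (href, is_nofollow) pairs for every <a href=...> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if href is None:
            return
        # rel may hold several space-separated tokens, e.g. "external nofollow".
        rel_tokens = (attr.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" in rel_tokens))

def audit_links(html):
    """Return a list of (href, is_nofollow) tuples found in the HTML text."""
    parser = LinkAudit()
    parser.feed(html)
    parser.close()
    return parser.links

if __name__ == "__main__":
    sample = (
        '<a href="http://example.com/a" rel="nofollow">A</a> '
        '<a href="http://example.com/b">B</a>'
    )
    for href, nofollow in audit_links(sample):
        print(href, "nofollow" if nofollow else "followed")
```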

NoFollow has always been controversial, and the response to Wikipedia’s decision has been mixed. Rand Fishkin at SEOmoz (which has also instituted NoFollow) says Wikipedia has finally made the right decision. But he offers surprisingly little to defend that position.

Peter Da Vanzo at blog.v7n.com says the decision has scant significance:

Here’s a question: why do people assume that if Wikipedia adds nofollow, then the links won’t count in search engine calculations? It wouldn’t take much for the search engines to make Wikipedia a special case, and ignore the nofollow tag, if that isn’t the case already.

And another: How do people know that Wikipedia was passing any (real) PageRank or authority before? There are many pages which aren’t using the nofollow tag that also aren’t passing any measurable PageRank and/or authority, probably due to some hand tweaking.

Barry Welford thinks search engines are running up against Heisenberg’s Uncertainty Principle:

In a sense, Wikipedia is correcting the fallacy in the whole Google PageRank approach. It’s like Heisenberg’s Uncertainty Principle. There are some things you can’t measure. If you try to measure them then they’re not the same. Once Google says inlinks will boost a web page’s relevancy, then of course everyone, often supercharged with dumb computer programs, generates as many inlinks as they can.

My opinion? I don’t like NoFollow. I think it amounts to trying to get a free ride by benefiting from links without paying the cost for them. In fact, I’ve added a plug-in that removes the default NoFollow from my blog comments. If anyone wants to comment, I can approve or deny the comment, so the onus is on me to decide whether the link should stand. If commenters have added something of value then I think they deserve any link benefit I can give back to them (my home page, btw, is currently PR6).

I also feel that NoFollow will have little if any effect on the value of Wikipedia contributions. Even with NoFollow, links still bring traffic, and since Wikipedia is likely to continue to rank high in the SERPs, scam sites will still benefit from Wikipedia links if they can get them. In fact, a lot of the spam links submitted to this blog already have the NoFollow tag embedded. By instituting NoFollow, Wikipedia probably hurts honest sites more than scammers: the very sites that took Wikipedia to the top of the SERPs by linking to it in the first place.

So put me in the camp of the sensible Philipp Lenssen who writes at Google Blogoscoped:

What happens as a consequence, in my opinion, is that Wikipedia gets valuable backlinks from all over the web, in huge quantity, and of huge importance — normal links, not “nofollow” links; this is what makes Wikipedia rank so well — but as of now, they’re not giving any of this back. The problem of Wikipedia link spam is real, but the solution to this spam problem may introduce an even bigger problem: Wikipedia has become a website that takes from the communities but doesn’t give back, skewing web etiquette as well as tools that work on this etiquette (like search engines, which analyze the web’s link structure). That’s why I find Wikipedia’s move very disappointing.

Perhaps the most interesting response to the news came from Andy Beal at Marketing Pilgrim. He is adding NoFollow to links to Wikipedia from his site.

UPDATE, 24 JAN. Andy Beard has made a Wikipedia NoFollow plug-in, and Aaron Pratt offers a good commentary.

Universal Google?

In D-Lib magazine David Bearman provides an abstract of the argument Jean-Noël Jeanneney (President of the Bibliothèque nationale de France) presents in his Google and the Myth of Universal Knowledge: A View from Europe (University of Chicago Press, October 2006). Jeanneney argues:

  1. Google’s selection skews “the world’s knowledge” toward English-language texts, especially those from the U.S. (For example, searches for Dante, Cervantes and Goethe find not the original texts but English translations.)
  2. Google snippets decontextualize texts, and the works presented so far are poor in visual quality.
  3. Google Books SERPs inappropriately rank results, perhaps with a bias toward results with commercial ramifications.
  4. There are dangers with the privatization of collective knowledge. Google has already shown a complicity with censorship in China.
  5. Google’s liberal interpretation of copyright laws may not fully respect the legal or moral rights of authors.

Bearman concludes his abstract by expressing his opinion that “Jean-Noël Jeanneney has done us all a service by reminding us to look under the hood and hold Google, and those providing content to it, accountable. In the two years since Google first announced its ambitions, I think the D-Lib community has largely given Google the benefit of the doubt; now that some results are visible and the implications are more clear, I think it’s time to publicly endorse open access to rights-cleared, high quality, scanned page images and reconsider the appropriate roles for academic and public institutions participating in commercial analogue heritage conversion efforts that don’t contribute to this end.”

this item first noted at if:book

In related news: a few months ago Google announced a new program, Google Purge, as part of “a far-reaching plan to destroy all the information it is unable to index.”

Speaking of SEO

Speaking of SEO, here is a list of the SEO-related sites that have feeds I subscribe to. (I’m just an amateur who got into this when my website got penalized.) Maybe I’ll actually use this someday. Am I missing any important ones?

Is SEO the new protection racket?

It’s beginning to seem that way. As soon as anyone says anything negative about search engine optimizing, the SEO community (or, to be fair, one faction of it) jumps all over that person and tries to inflict punishment by driving down the offender’s pages in the SERPs.

First there was the unfortunate Kimberly Williams — a case more of scraping than SEO per se, but the response came from the SEOs when she tried to prevent her content from being scraped. Next came Ted Leonsis, whose mild comments about being his own SEO made him the object of an SEO contest with a $500 cash prize. Now Jason Calacanis is the latest to have offended the SEOs.

I find search engine optimization interesting. I subscribe to a number of SEO feeds, and several of the people in the industry are clever and creative. But some are starting to seem like bullies.

I think I might be rooting for Ted Leonsis to win the Ted Leonsis SEO contest.


Link: Wikipedia on protection rackets

Google Calls Google Alerts Spam

I found this in one of my Gmail spam folders.

gmail spam

