The Web Is Missing an Essential Part of Infrastructure: An Open Web Index

A proposal for building an index of the Web that separates the infrastructure part of the search engine (the index) from the services part. As a public infrastructure open to everyone, the index would form the basis for myriad search engines and other services that use Web data.
 

Communications of the ACM, April 2019, Vol. 62 No. 4, Page 24
Viewpoint: "The Web Is Missing an Essential Part of Infrastructure: An Open Web Index"
By Dirk Lewandowski

The Web as it currently exists would not be possible without search engines. They are an integral part of the Web and can also be seen as part of the Web's infrastructure. Google alone now serves over two trillion search queries per year. While there seems to be a multitude of search engines on the market, only a few of them matter in the sense of having their own index (the database of Web pages underlying a search engine). Other search engines pull results from one of these (for example, Yahoo pulls results from Bing) and should therefore not be considered search engines in the true sense of the word. Globally, the major search engines with their own indexes are Google, Bing, Yandex, and Baidu. Other independent search engines may have their own indexes, but these are not large enough to make them competitive in the global search engine market.

While the search engine market in the U.S. is split between Google and Bing (and its partner Yahoo) at approximately two-thirds to one-third, respectively, Google accounts for more than 90% of the market in most European countries. As this situation has been stable for at least the last 10 years, there have been discussions about how much power Google has over what users get to see from the Web, as well as about anticompetitive business practices, most notably in the context of the European Commission's competition investigation into the search giant.

Search Engine Bias?

From the users' point of view, search engines are reliable and trustworthy sources, providing fair and unbiased results. However, it has been found that search results should not be considered "neutral." Some scholars argue that an unbiased search engine is not possible at all, as there is no ideal result set against which a bias can be measured. Therefore, I argue that every search engine presents its own algorithmically generated view of the Web's content. Every such view can be different, and none of them is the definitive or correct one.

Problems that may arise from search engines' interpreting the world in certain ways include: reinforcing stereotypes, for example, toward women; influencing public opinion in the context of political elections (see, for example, Epstein and Robertson); and preferring dramatic interpretations of rather harmless health-related symptoms.

It seems, therefore, unreasonable to have only one (or a few) dominant search engines imposing their view on the Web's content, which is, on closer inspection, really only one of many possible views. Therefore, I argue for building an index of the Web that will form the basis for a multitude of search engines and other services that are based on Web data.
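The separation argued for here can be illustrated with a minimal sketch. It is not part of the proposal itself: the toy corpus, the `.example` URLs, and the function names `build_index`, `rank_by_frequency`, and `rank_by_coverage` are all hypothetical, and both ranking rules are deliberately simplistic. The sketch only shows how one shared index can serve two services that each present a different view of the same content.

```python
from collections import defaultdict

# Hypothetical toy corpus standing in for crawled Web pages (assumption).
PAGES = {
    "a.example": "open web index proposal for search engines",
    "b.example": "search search search engines ranking tricks",
    "c.example": "an index of the web as public infrastructure",
}


def build_index(pages):
    """Shared infrastructure: map each term to {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.split():
            index[term][url] = index[term].get(url, 0) + 1
    return index


def rank_by_frequency(index, query):
    """Service A: score a page by how often the query terms occur in it."""
    scores = defaultdict(int)
    for term in query.split():
        for url, tf in index.get(term, {}).items():
            scores[url] += tf
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


def rank_by_coverage(index, query):
    """Service B: score a page by how many distinct query terms it contains."""
    scores = defaultdict(int)
    for term in set(query.split()):
        for url in index.get(term, {}):
            scores[url] += 1
    # Highest score first; ties broken alphabetically by URL.
    return sorted(scores, key=lambda u: (-scores[u], u))


# Both services read the same index but order the results differently.
INDEX = build_index(PAGES)
QUERY = "search index"
VIEW_A = rank_by_frequency(INDEX, QUERY)
VIEW_B = rank_by_coverage(INDEX, QUERY)
print(VIEW_A)
print(VIEW_B)
```

For the query "search index", the frequency-based service puts `b.example` first because it repeats one query term three times, while the coverage-based service puts `a.example` first because it contains both terms. Neither ordering is the correct one; each is one possible view built on the same underlying data.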

 
The main idea I presented in this Viewpoint is to foster the building of search engines and other services that need Web data on top of a public infrastructure that is open to everyone. A multitude of such services would promote plurality not only in the search engine market (giving users more than a few search engines to choose from) but, even more importantly, in the results users get to see when using search engines.
 


About the author: Dirk Lewandowski is Professor for Information Research and Information Retrieval at the Hamburg University of Applied Sciences in Hamburg, Germany.

Visit the Open Web Index.