Internet search engines
Tips, tricks, confidentiality
August 2, 2021 — April 26, 2024
Finding things on the internet! At one point this felt like a solved problem, but it seems to have gotten unsolved.
Famously, Google seems not to be good at search any longer. Speculations about why include: Google has lost the SEO battle; human-friendly content is being squeezed out in general; Google is spending down its credibility to bring in advertising revenue; or some other, more complicated set of mechanisms and incentives is just making things terrible or boring.
For some quantifiable data on this theme, see webis-de/ecir24-seo-spam-in-search-engines (Bevendorff et al. 2024).
Regardless of the details or reasons, it does seem to be true for me that search results are bad right now.
In addition, I am uncomfortable with the surveillance and tracking involved in search engines. Insofar as they are the way I access the world, they can potentially know too much about me.
I am interested in solving these problems.
1 Better commercial search providers
I don’t want large search businesses to know what I am searching for.
Here are some links to search engines that may reduce user surveillance, or at least diffuse it across a few different players, or that offer added value over classic search engines such as Google and Bing.
Many of these make strong claims to protect user privacy, although few offer substantive guarantees beyond what you can check by inspecting their tracking headers. Some repackage results from other search engines; some run their own indices. Most have very unclear business models.
1.1 Kagi
An exception to the opaque-business-model rule is Kagi. Their value proposition is, they claim, to be credibly user-centric:
Kagi has no ads and is fully supported only by its users. We worked very hard to provide high quality, fast and tracking-free results at a minimum cost to ensure sustainability of our operation.
By choosing a paid Kagi plan, you are also helping accelerate our mission of humanising the web.
The free plan is pretty good, and they will happily sell you extra features/more searches:
- No ads
- Ability to block/boost domains
- Bangs allow you to quickly jump to all popular sites on the web.
- zero telemetry, zero tracking
- See how fast a website is or how many ads/trackers it has before clicking the result.
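Bangs and domain blocking/boosting are simple mechanisms at heart. Here is a minimal sketch of both, assuming a hypothetical bang table (`BANGS`) and hypothetical per-domain weights; it is not Kagi's implementation, just an illustration: a `!w` token selects a URL template, and domain weights drop or reorder results.

```python
from urllib.parse import quote_plus, urlparse

# Hypothetical bang table: shortcut -> URL template with a {q} placeholder.
# Real engines ship far larger tables; these entries are illustrative only.
BANGS = {
    "w": "https://en.wikipedia.org/w/index.php?search={q}",
    "gs": "https://scholar.google.com/scholar?q={q}",
}


def resolve_bang(query: str, default_template: str) -> str:
    """Return the URL a bang-style query should redirect to.

    A token such as "!w" anywhere in the query selects a template from
    BANGS; the remaining words become the search terms. Queries without
    a known bang fall through to default_template.
    """
    template = default_template
    terms = []
    for word in query.split():
        if word.startswith("!") and word[1:] in BANGS:
            template = BANGS[word[1:]]
        else:
            terms.append(word)
    return template.format(q=quote_plus(" ".join(terms)))


def rerank(urls: list[str], weights: dict[str, float]) -> list[str]:
    """Drop blocked domains (weight <= 0) and sort the rest by weight.

    Unlisted domains get weight 1.0. sorted() is stable even with
    reverse=True, so equally weighted results keep their original order.
    """
    def weight(url: str) -> float:
        return weights.get(urlparse(url).hostname or "", 1.0)

    kept = [u for u in urls if weight(u) > 0]
    return sorted(kept, key=weight, reverse=True)
```

For instance, `resolve_bang("!w forward secrecy", "https://example.org/search?q={q}")` yields the Wikipedia search URL for "forward secrecy", and a weight of `0` for a domain removes it from the results entirely.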
They have been criticised for being chaos pants. These criticisms seem reasonable to me, but not fatal.
Obviously if I become a subscriber, they can in principle track me, so the privacy angle hinges upon some trust.
1.2 Marginalia
File under quirky/quixotic/small web, Marginalia Search:
This is an independent DIY search engine that focuses on non-commercial content, and attempts to show you sites you perhaps weren’t aware of in favour of the sort of sites you probably already knew existed.
The software for this search engine is all custom-built, and all crawling and indexing is done in-house. The project is open source. Feel free to poke about in the source code or contribute to the development!
The search engine is currently serving about 107 queries/minute.
1.3 Startpage
Startpage repackages Google search results, AFAIK, and claims to do so anonymously, although I cannot find much information about why I should believe them on this. It is a Dutch company. To use it as a search bar in Firefox I needed to add a browser extension, for some tedious reason.
1.4 DuckDuckGo
A perennial favourite, DuckDuckGo is a search engine run by strident privacy advocates, which is laudable, I s’pose. The search is… OK. Usually not as good as Google. Every now and again it is serendipitously wonderful, but not reliably.
1.5 Brave
Brave Search recently launched, backed by the creators of the Brave browser. TBC.
1.6 Mojeek
Mojeek/Mojeek Focus (Bookmark) Search Engine
Mojeek was created to provide a globally competitive and genuine alternative search engine based in the UK, and from the outset one that didn’t track its users nor simply retrieve its results from another engine (i.e. to provide real alternative results).
Mojeek’s technology has been developed entirely from scratch by Marc Smith, mostly using the C programming language, and uses no pre-existing search or web crawler technology. All technology and IP is fully owned by Mojeek Limited.
1.7 Qwant
1.8 Runnaroo
Similar to Qwant? See Runnaroo. It promises to aggregate many other search engines and review sites. Its business model is opaque.
1.9 Search encrypt
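Search Encrypt claims additional privacy via encryption with Perfect Forward Secrecy. Presumably this is supposed to prevent them from assembling a history of my searches? If so, I am unconvinced: in the usual TLS sense, forward secrecy only stops recorded traffic from being decrypted later if a long-term key leaks; it does nothing to stop the operator logging the queries it receives.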
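The mechanism behind forward secrecy can be sketched as a toy ephemeral Diffie-Hellman exchange. The parameters `P` and `G` and the function names are my own illustrative choices and are far too weak for real use; real TLS uses standardised groups or elliptic curves such as X25519. The point is that each session key comes from fresh secrets that are thrown away afterwards.

```python
import hashlib
import secrets

# Toy group: the Mersenne prime 2**127 - 1 with generator 3.
# Illustrative only; NOT a secure choice of parameters.
P = 2**127 - 1
G = 3


def ephemeral_keypair() -> tuple[int, int]:
    """Generate a fresh secret exponent and public value for one session."""
    secret = secrets.randbelow(P - 2) + 1  # uniform in [1, P - 2]
    return secret, pow(G, secret, P)


def session_key(my_secret: int, their_public: int) -> bytes:
    """Derive a 32-byte symmetric key from the shared DH value."""
    shared = pow(their_public, my_secret, P)  # < 2**127, fits in 16 bytes
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()


# One session: both sides make new keypairs, swap public values,
# arrive at the same key, then discard their secrets.
client_secret, client_public = ephemeral_keypair()
server_secret, server_public = ephemeral_keypair()
k_client = session_key(client_secret, server_public)
k_server = session_key(server_secret, client_public)
del client_secret, server_secret  # nothing left to leak later
```

Note that `k_server` lives on the server, which is exactly where the query is decrypted and read.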
1.10 Suppressing spam in search results
2 DIY search proxies
A.k.a. meta-searching. I suspect these imply maintenance overhead as the search companies attempt to circumvent this circumvention of their business model. Effectively, you would be participating in an arms race.
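A metasearcher has to merge several engines' ranked lists into one. A standard way to do that is reciprocal rank fusion, sketched below; this is a swapped-in illustration, and I do not know that searx or any engine listed here uses exactly this scoring. The engine names in the example are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: dict[str, list[str]], k: int = 60) -> list[str]:
    """Merge per-engine result lists into a single ranking.

    Each URL scores sum(1 / (k + rank)) over the engines that returned it,
    with rank starting at 1. A URL returned by several engines accumulates
    score from each, so agreement between engines pushes it up.
    k = 60 is a commonly used default that damps the effect of top ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for results in rankings.values():
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda url: (-scores[url], url))
```

With `{"engine_a": ["x", "y", "z"], "engine_b": ["y", "w"]}`, `y` comes first because both engines returned it. In practice you would also normalise URLs (trailing slashes, tracking parameters) before fusing, so duplicates actually collide.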
2.1 searx
The searx family is a network of metasearch engine portals with the aim of protecting the privacy of users. Searx does not share users’ IP addresses or search history with the search engines from which it gathers results. Tracking cookies served by the search engines are blocked, etc. The flagship instance is searx.me. There are many user-operated instances and it is open source. Advanced: run your own DIY search anonymiser!
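If you do run your own instance, it can be scripted. A minimal sketch, assuming a self-hosted SearXNG (the maintained fork of searx) at a hypothetical `http://localhost:8888` with JSON output enabled in its settings (many instances, public ones especially, leave `format=json` switched off). The helper names are my own.

```python
import json
import urllib.request
from urllib.parse import urlencode


def search_url(base: str, query: str) -> str:
    """Build a query URL asking a searx/SearXNG instance for JSON output."""
    params = urlencode({"q": query, "format": "json"})
    return f"{base.rstrip('/')}/search?{params}"


def parse_results(body: str) -> list[tuple[str, str]]:
    """Extract (title, url) pairs from the "results" array of a JSON response."""
    data = json.loads(body)
    return [(r.get("title", ""), r["url"]) for r in data.get("results", [])]


def fetch(base: str, query: str) -> list[tuple[str, str]]:
    """Run a live query against the instance (requires network access)."""
    with urllib.request.urlopen(search_url(base, query), timeout=10) as resp:
        return parse_results(resp.read().decode("utf-8"))
```

Keeping URL building and parsing separate from `fetch` means the fiddly parts can be checked against a saved response without hitting the instance.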
2.2 mysearch
mysearch — Local search engine portal designed to anonymise search requests and display search results better. A public instance is available at search.jesuislibre.net. Dead AFAICT.
3 AI-augmented search
The new hotness
3.1 Perplexity
Perplexity is an alternative to traditional search engines, where you can directly pose your questions and receive concise, accurate answers backed up by a curated set of sources. It has a conversational interface, contextual awareness and personalisation to learn your interests and preferences over time.
Perplexity’s mission is to make searching for information online feel like you have a knowledgeable assistant guiding you, it is a powerful productivity and knowledge tool that can help you save time and energy with mundane tasks for a multitude of use cases.
Probably the front-runner in the race to get use out of LLMs in information discovery.
3.2 You
- You.com is an AI-heavy search thing.
Also promises a private mode:
You.com gives you the option to choose between a customised search experience through personal mode or an entirely private one through our private mode. Our private mode offers the most private search experience of any search engine. In private mode, You.com never stores your queries, preferences, or locations. That also means that localised queries (such as “best restaurants near me”) won’t work. In private mode, we only save whether the service is used at all, in order to prevent attacks and misuse of our servers.
3.3 Free/FOSS-ish
- nilsherzig/LLocalSearch: LLocalSearch is a completely locally running search aggregator using LLM Agents. The user can ask a question and the system will use a chain of LLMs to find the answer. The user can see the progress of the agents and the final answer. No OpenAI or Google API keys are needed.
- nashsu/FreeAskInternet: FreeAskInternet is a completely free, PRIVATE and LOCALLY running search aggregator & answer generator using MULTI LLMs, without GPU needed. The user can ask a question and the system will make a multi engine search and combine the search result to LLM and generate the answer based on search results. It’s all FREE to use.
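Both projects describe the same basic loop: run a web search, stuff the top results into a prompt, and have an LLM answer from them. A minimal sketch of the prompt-assembly step, assuming a hypothetical `Hit` record and `build_prompt` helper (neither is taken from these projects); the actual LLM call is left out.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One search result, as a search backend might return it."""
    title: str
    url: str
    snippet: str


def build_prompt(question: str, hits: list[Hit], max_hits: int = 5) -> str:
    """Pack numbered search hits into a prompt that asks for a cited answer."""
    sources = "\n".join(
        f"[{i}] {h.title} ({h.url})\n{h.snippet}"
        for i, h in enumerate(hits[:max_hits], start=1)
    )
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources like [1]. If they do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the sources is what lets the answer carry citations back to URLs, which is the main thing separating these tools from a bare chatbot.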
3.4 Others
4 Decentralised search
What does the decentralised web do?
5 Incoming
Vicki Boykis, How I search in 2024
We are now in a very weird liminal space in information retrieval for consumers, particularly those attuned to trends in search and working on the bleeding edge of LLMs.
[…] we have the fall of old companies. Broadcast-based centralised social media, which steadily served as a newsfeed and realtime search for a small, vocal minority, is basically dead, or on its last legs. Search, namely Google, is basically a useless pile of ads and SEO gamification at this point and a stopping point for Reddit results. Everyone has written about it and covered this extensively. […]
… on the heels of the large companies of the last 15 years declining, we have a new indie search engine scene emerging, hungry, armed with AI tooling, and ready to take back quality on the web.
She picks Kagi, Marginalia, and Perplexity.
- Simple Search is an extension that highlights the “traditional” or “ten blue link” search results provided by the search engine, laying them over the info boxes and other content. Close the window to view the full results page. Compatible with Bing and Google search engines.
Gwern’s Internet Search Tips
6 Discovering a website’s search
If you want your cool hand-rolled search to magically appear as a search option, you are looking for OpenSearch.
Worked example: Add Google Scholar to your browser.
Detailed documentation: opensearch/mediawiki/Specifications/OpenSearch.
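An OpenSearch description is a small XML file, plus a `<link>` tag so browsers can discover it. A minimal sketch that generates one, using Google Scholar's results URL as the template; the function name and the `/opensearch.xml` path are my own assumptions. Per the spec, `ShortName` should be 16 characters or fewer.

```python
import xml.etree.ElementTree as ET

OS_NS = "http://a9.com/-/spec/opensearch/1.1/"


def opensearch_description(short_name: str, description: str, template: str) -> str:
    """Return an OpenSearch 1.1 description document as an XML string.

    `template` is the results-page URL, with {searchTerms} where the
    query should be substituted.
    """
    ET.register_namespace("", OS_NS)  # serialise OS_NS as the default xmlns
    root = ET.Element(f"{{{OS_NS}}}OpenSearchDescription")
    ET.SubElement(root, f"{{{OS_NS}}}ShortName").text = short_name
    ET.SubElement(root, f"{{{OS_NS}}}Description").text = description
    ET.SubElement(root, f"{{{OS_NS}}}InputEncoding").text = "UTF-8"
    ET.SubElement(root, f"{{{OS_NS}}}Url", type="text/html", template=template)
    return ET.tostring(root, encoding="unicode")


SCHOLAR_XML = opensearch_description(
    "Google Scholar",
    "Search Google Scholar",
    "https://scholar.google.com/scholar?q={searchTerms}",
)

# The tag a page puts in its <head> so browsers can find the description.
LINK_TAG = ('<link rel="search" type="application/opensearchdescription+xml" '
            'title="Google Scholar" href="/opensearch.xml">')
```

To write a file with an XML declaration instead of a string, `ET.ElementTree(root).write(path, encoding="UTF-8", xml_declaration=True)` does the job.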