Andrew Milich / 10.11.2022
What is encrypted search?
Searching over encrypted data is a unique challenge. What algorithms make it possible?
Searching through data is one of the oldest problems in computer science. From basic problems, like checking whether a value exists in a long list of items, to mapping routes from offices to home addresses, search has driven enormous innovation in data structures, algorithms, and optimization. At the largest scale, search systems must index the entire web (as Google Search does) or index user data spread across millions of documents, emails, and messages.
This post begins by outlining how traditional search algorithms work, including some basic techniques for processing data. Then, we'll give a brief overview of how search works on user data at search providers and in common consumer products, such as Gmail and Microsoft Outlook. Finally, we'll explain how new algorithms make it possible to search through data, even end-to-end encrypted data that remains completely private from service providers.
How do traditional search algorithms work?
Search algorithms are the heart of every search engine. Without them, we would be lost in an ocean of unstructured data with no way to find the information we need. A search algorithm takes a body of data and a query and finds the content most relevant to that query. Web search engines run these algorithms over massive datasets covering much of the world's public information, from news articles to Wikipedia pages.
The best-known search engine algorithm is Google's PageRank, developed by Google co-founders Larry Page and Sergey Brin. It analyzes the link structure of the web to determine which pages are most important: the more links point to a page, and the more important the pages linking to it, the higher its score. Today many other factors also affect ranking, such as the freshness of content and how often people search for a particular term, and Google's algorithm is constantly changing to give users more relevant results.
Microsoft's Bing uses its own ranking algorithms. Its published Bing Webmaster Guidelines describe many of the factors it considers, including the quality of a website's content, its design, and how easy it is to use. Yahoo! historically combined human editors with algorithms to build its search results; its search engine was once among the most popular, but it has since fallen behind Google and Bing.
Algorithms are also used to index content so it can be searched. When you search for a term, the results are ordered by how relevant they are to your query, and that relevance is computed by algorithms that examine the content of each page, including how often the search term appears. The most important part of a search algorithm is the ranking function: the part that decides which results are the most relevant.
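To make the PageRank idea above concrete, here is a minimal Python sketch of the classic power-iteration formulation, assuming a tiny hypothetical link graph and the commonly used damping factor of 0.85. It captures only the link-analysis core; Google's production ranking combines it with many other signals that are not shown here. Each page repeatedly shares its score among the pages it links to, so pages with many inbound links from important pages end up with the highest scores.

```python
def pagerank(
    links: dict[str, list[str]],
    damping: float = 0.85,
    iterations: int = 100,
    tol: float = 1e-10,
) -> dict[str, float]:
    """Compute PageRank scores with power iteration.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a link source or a link target.
    all_pages: set[str] = set(links)
    for targets in links.values():
        all_pages.update(targets)
    pages = sorted(all_pages)
    n = len(pages)
    if n == 0:
        return {}

    # Start from a uniform distribution: every page is equally important.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # "Random jump" term: a surfer sometimes jumps to any page at random.
        new_rank = {p: (1.0 - damping) / n for p in pages}

        # Pages with no outgoing links spread their score evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        for p in pages:
            new_rank[p] += damping * dangling / n

        # Every other page splits its score evenly among the pages it links to.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share

        # Stop early once the scores have effectively stopped changing.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# A tiny, hypothetical four-page web.
web = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["home", "about"],
    "orphan": ["home"],  # links out, but nothing links to it
}

scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Here `home` ranks highest because every other page links to it, while `orphan`, which nothing links to, keeps only the baseline random-jump share of (1 - 0.85) / 4 = 0.0375.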
The ranking function weighs a number of factors, including how many times the search term appears on a page and where on the page it appears.
Some consumer products, such as Gmail and Outlook, form “boolean” indices of user data, where true or false values indicate the presence or absence of a particular term in a given document. In this context, a document can be any piece of structured content, such as an email, a webpage, or an actual file.
Search engines constantly tweak their algorithms to improve the quality of their results, and they keep working on ways to index more content so that people can find what they're looking for more easily. As more content is created, the algorithms that index and rank it will become even more important in helping people find the information they need.
Why is it hard to search encrypted data?
Searching encrypted data is challenging because the data is not stored in plaintext. It is stored as ciphertext, which is unreadable without the right key (a shared symmetric key or an asymmetric private key, depending on the context and algorithm used). A server holding only ciphertext never sees the words in a document, so it cannot build the kind of keyword index that Google, Yahoo, and Bing use, and the algorithms described above cannot be applied directly.
This leaves two options: indexing and searching data while it is decrypted, such as on a user's device, or using advanced algorithms for searchable encryption. In searchable symmetric encryption, the data owner processes and encrypts the data, along with an encrypted index, in a way that still allows a server to search over it later when given a token derived from the key. More advanced research in this area includes homomorphic encryption, a family of cryptographic schemes that allow computations, such as search, to be performed directly on encrypted data.
Currently, indexing and searching data while it is decrypted, such as on a client device, is the simplest and most practical approach. Below, we'll walk through a few examples of products that use client-side search.
What products leverage encrypted search?
New products have started to implement search algorithms that index data on the client side. This involves trade-offs: client-side search can be faster and more reliable, but downloading and indexing user data on a device, rather than in the cloud, costs storage and processing power on that device. In exchange, users get significantly greater privacy.
Apple Spotlight: Apple's Spotlight feature indexes searchable data on a user's computer, from files to applications. It builds a local index in an easily searchable format that it uses to surface many different types of results.
Skiff: Skiff is an end-to-end encrypted, privacy-first email and collaboration workspace. Because all data is stored end-to-end encrypted and is unreadable by Skiff, it must be searched on users' devices. This allows for easy-to-use search while maintaining complete data security.
In an earlier post, we walk through how Skiff's search algorithms work in greater technical detail. Check out some of our open-source search indexing code on the Skiff GitHub page as well.
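The client-side approach described above can be sketched in a few lines of Python. This is a minimal illustration, not Skiff's actual implementation: the `LocalSearchIndex` class, the `tokenize` helper, and the message IDs are hypothetical, and the sketch assumes the client has already decrypted each message with the user's key, so plaintext never leaves the device. It builds an inverted index (a map from each term to the documents containing it), answers queries with a boolean AND, and ranks matches by how often the query terms appear.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase the text and keep runs of letters/digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class LocalSearchIndex:
    """Hypothetical in-memory inverted index that lives only on the device.

    Assumes callers pass in text the client has already decrypted locally.
    """

    def __init__(self) -> None:
        # term -> {doc_id: number of times the term appears in that document}
        self.postings: dict[str, dict[str, int]] = defaultdict(dict)

    def add(self, doc_id: str, plaintext: str) -> None:
        # Count each term once per document and record it in the postings.
        for term, count in Counter(tokenize(plaintext)).items():
            self.postings[term][doc_id] = count

    def search(self, query: str) -> list[str]:
        """Return IDs of documents containing every query term,
        ranked by total term frequency (highest first, ties by ID)."""
        terms = tokenize(query)
        if not terms:
            return []
        # Boolean AND: keep only documents that contain all query terms.
        matches = set(self.postings.get(terms[0], {}))
        for term in terms[1:]:
            matches &= set(self.postings.get(term, {}))
        # Rank the surviving documents by how often the query terms appear.
        scores = {d: sum(self.postings[t][d] for t in terms) for d in matches}
        return sorted(matches, key=lambda d: (-scores[d], d))


index = LocalSearchIndex()
index.add("msg-1", "Invoice for March attached. Please pay the invoice by Friday.")
index.add("msg-2", "Lunch on Friday?")
index.add("msg-3", "Updated invoice attached")

print(index.search("invoice attached"))  # ['msg-1', 'msg-3']
print(index.search("friday"))            # ['msg-1', 'msg-2']
```

Because the index is built and queried entirely on the device, the server only ever needs to store ciphertext. A real client would also need to persist the index securely between sessions (for example, by encrypting it before saving it) and handle edits and deletions, which this sketch leaves out.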