How I wrote my search engine to quickly find personal information / Habr

Intro

It all started when it became difficult for me to find the information I needed. The more files and folders I created, the longer it took to find the right one. I realized that searching through endless lists of files and folders, especially deeply nested ones, is not an option for a large amount of data.

As for searching by file name, the number of characters in a name is limited, and the search words must appear in a strictly defined order. Moreover, if the system also indexes files that are irrelevant to the search (system files, project files), the results contain a lot of "garbage".

Searching by file contents does not give the most relevant results either. It can return files that contain the keywords but have nothing to do with what I am actually looking for.

Moreover, only text files can be searched by content.

Structures

The folder structure is a tree. It's not ideal, because each file can live in only one folder, unless you count copies and links.

Here is a real-life analogy: suppose I need to find a green fresh apple of the “Virgin” variety. I have to find the fruit department, then the apple section, then the green apples, then the right variety, then separate the fresh ones from the stale ones, and only then pick the right apple.

Everything is further complicated by the fact that I don’t remember if there are apples there at all, and if there are, whether they are stored in the fruit department.

And why not just ask a henchman about this (everyone has one, right?): “Bring me a green fresh apple.”

How convenient it becomes!

In general, what I want to say is that looking for information in folders works well only if there are few folders and you remember which ones exist, so that you don't have to go through everything.

But if we don’t know if apples exist at all, then we ask the henchman:

- Are there any apples?

- Yes, sir! Hundreds. Toy, red, rotten apples....

- I need a fresh apple.

- Understood! There is a red fresh apple "Sirota", a red fresh apple "Apricot", ....

- And what about a green fresh apple?

- There is! Green fresh apple "Pooh-tibiduh" and green fresh apple "Virgin".

- In that case bring me the green fresh apple “Virgin”.

- Yes, sir.

Back to the apples. Did you notice that in the first case you have to look for the apples without knowing where they are, while in the second you simply add clarifying conditions to the request?

To find the desired result in a tree structure (folders), you may have to traverse all the nodes. With a graph (tags), you can get the result, in the best case, by visiting a single node.

The next example is more realistic. There is a music folder with subfolders for genres. But what if at some point I want to listen to French music, regardless of genre? This is where the whole problem with the folder tree shows up. You can, of course, as advised on a forum, create a separate folder and drop links into it, but that is yet another folder ...

But what if, instead, you give the file tags for its genre, its language and, of course, the fact that it is music, a song?

In this case you can group and sort music much more flexibly, for example by combining three tags: French, Russian, rock. That is something the standard Windows tools cannot do.
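To make the difference from folders concrete, here is a minimal sketch of filtering by a combination of tags. The `TaggedFile` shape, the function name and the sample paths are assumptions made up for illustration; they are not taken from any real tool.

```typescript
// Hypothetical record: a file path plus free-form tags (names are illustrative).
interface TaggedFile {
  path: string;
  tags: string[];
}

// Return the files that carry every one of the requested tags.
function filesWithAllTags(files: TaggedFile[], wanted: string[]): TaggedFile[] {
  return files.filter((file) =>
    wanted.every((tag) => file.tags.indexOf(tag) !== -1)
  );
}

// Made-up sample library: the genre folder is part of the path, the tags are not.
const musicLibrary: TaggedFile[] = [
  { path: "rock/song-a.mp3", tags: ["music", "rock", "french"] },
  { path: "pop/song-b.mp3", tags: ["music", "pop", "french"] },
  { path: "rock/song-c.mp3", tags: ["music", "rock", "russian"] },
];

// All French music, regardless of the genre folder it lives in.
const frenchMusic = filesWithAllTags(musicLibrary, ["music", "french"]);

// The same query narrowed down to French rock.
const frenchRock = filesWithAllTags(musicLibrary, ["french", "rock"]);
```

With folders, the first query would require walking through every genre subfolder; with tags it is a single filter that ignores where the files are stored.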

Trying to find a solution

The first idea was to tag files and folders. That way you can search for information by a combination of tags, regardless of word order. The best apps for this, in my opinion, are “XYplorer” and “Tagging for Windows”. The first is a standalone file manager with a tagging option. The second is an extension to the standard file explorer. However, they only search files on a PC and, of course, offer neither a user-friendly query like the Google search engine nor an algorithm that picks tags out of the query and sorts the results by priority. Later I removed both programs from my PC because they often hung and crashed (maybe because of my Windows add-ons; that was my experience, and it doesn't mean the programs are necessarily bad).

Visual search

In an attempt to find and store information faster, I tried some more unusual approaches. I saved information in the social network “VKontakte” as comments under images. This sped up the search and worked on any device, but, as you have probably guessed, it didn't last long. In the end I couldn't tell whether an image of rails meant travel, an address or something else, so I dropped the idea.

Desired functionality

I thought it would be great to develop an application that would meet the following criteria:

Can be used on any device without the need for an internet connection.

Search for personal information as quickly as possible.

The search should be natural like Google Search.

Ability to save all text information in a text file.

Choice of technologies

1. To meet the first requirement, I decided to build a web application, so that any device with a browser can access it. The data is stored in the browser's localStorage, but when the site is opened it is immediately loaded into a variable for better speed.
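The load-once idea can be sketched roughly as follows. This is only an illustration under assumptions: the storage key `"blocks"`, the `InfoBlock` field names and the function names are hypothetical, and the storage is passed in through a small interface so that the sketch also runs outside a browser. In a browser, `window.localStorage` could be passed instead of the in-memory stand-in.

```typescript
// Minimal subset of the browser Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed shape of one block of information (field names are illustrative).
interface InfoBlock {
  title: string;
  action: string;
  content: string;
  tags: string[];
}

// Hypothetical storage key; the real application may use a different one.
const STORAGE_KEY = "blocks";

// Read all blocks once at startup so that later searches never touch storage.
function loadBlocks(store: KeyValueStore): InfoBlock[] {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return [];
  }
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as InfoBlock[]) : [];
  } catch {
    return []; // corrupted JSON: start empty rather than crash
  }
}

// Persist the in-memory copy after a change.
function saveBlocks(store: KeyValueStore, blocks: InfoBlock[]): void {
  store.setItem(STORAGE_KEY, JSON.stringify(blocks));
}

// In-memory stand-in for localStorage, used here only for demonstration.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = Object.create(null);
  return {
    getItem: (key) => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key, value) => {
      data[key] = value;
    },
  };
}

const demoStore = createMemoryStore();
saveBlocks(demoStore, [
  {
    title: "how to create a website",
    action: "show information",
    content: "take html, add js and decorate with css",
    tags: ["website", "layout"],
  },
]);

// The variable that every search would read from.
const blocksInMemory = loadBlocks(demoStore);
```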

To synchronize data with another device or browser, I used a free MySQL database from 000webhost, but then stopped using it because of volume limitations. Right now the only way to update user data on another device is to export and import a file. However, I do this very rarely, because I mostly use my smartphone. As for offline mode, I used a service worker.

You only need to visit the site once so that all its resources are loaded; after that it can be used completely offline from the browser.

2. Quick search.

Since the search should work like the Google search engine, each word of the query has to be checked against the words of the already created blocks of information. For me, such a block is an object with the following keys: a unique block name (title), action (show information, open a link...), content, tags.

So, under the "tags" key, we store an array of strings (words) for a specific block of information.

Let's take a block as an example.

Title: how to create a website.

Action: show information.

Content: take html, add js and decorate with css.

Tags: website development, web programming, layout.

The tag array is built from the text of the tag and title input fields. The text is split on commas and spaces, and each word becomes a tag.

I had the idea of making tags whole phrases, as on YouTube, but decided to focus on broader results by keyword. For the block example above, the array of tags will be: ["how", "to", "create", "a", "website", "website", "development", "web", "programming", "layout"].
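The splitting step might look like the sketch below. It assumes that words are lowercased and that commas and any whitespace act as separators. Whether the real application also drops short words or normalizes word forms is not stated, so this sketch keeps every word and every repeat. The function name is hypothetical.

```typescript
// Split the title and the comma-separated tag field into lowercase words.
function buildTags(title: string, tagField: string): string[] {
  return (title + " " + tagField)
    .toLowerCase()
    .split(/[\s,]+/) // commas and whitespace both separate words
    .filter((word) => word.length > 0);
}

// The block example from the text, fed through the sketch.
const exampleTags = buildTags(
  "how to create a website",
  "website development, web programming, layout"
);
```

Because nothing is filtered out, a word that appears in both the title and the tag field ends up in the array twice.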

Now the most important thing is to decide how the search will work. My first idea was to take each word from the search query and compare it with each tag of each block. That is a very bad idea, because on a large amount of data the search would take a million years.
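For comparison, the rejected brute-force approach can be sketched like this. The `InfoBlock` shape and the sample data are assumptions for illustration. The point is only that every query word is compared with every tag of every block, so the work grows with the total number of tags stored.

```typescript
// Assumed shape of one block of information (field names are illustrative).
interface InfoBlock {
  title: string;
  action: string;
  content: string;
  tags: string[];
}

// Naive scan: returns the indexes of blocks that contain at least one query word.
function naiveSearch(blocks: InfoBlock[], query: string): number[] {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word) => word.length > 0);
  const hits: number[] = [];
  blocks.forEach((block, blockIndex) => {
    // indexOf walks the whole tag array for every query word.
    const matches = words.some((word) => block.tags.indexOf(word) !== -1);
    if (matches) {
      hits.push(blockIndex);
    }
  });
  return hits;
}

// Made-up sample data.
const naiveDemoBlocks: InfoBlock[] = [
  {
    title: "how to create a website",
    action: "show information",
    content: "take html, add js and decorate with css",
    tags: ["website", "development", "web", "programming", "layout"],
  },
  {
    title: "apple pie",
    action: "show information",
    content: "a recipe",
    tags: ["apple", "pie", "recipe"],
  },
];

const naiveHits = naiveSearch(naiveDemoBlocks, "web layout");
```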

The next idea was to create an object in which each tag is a separate key and the value is an array of the indexes of the blocks that have this tag.
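Such an object is usually called an inverted index. A minimal sketch of building one is shown below. The `InfoBlock` shape, the names and the sample data are assumptions, and the detail of listing a block only once per tag is a choice made for this sketch, not something the text specifies.

```typescript
// Assumed shape of one block of information (field names are illustrative).
interface InfoBlock {
  title: string;
  action: string;
  content: string;
  tags: string[];
}

// Tag -> indexes of the blocks that carry the tag.
type TagIndex = Record<string, number[]>;

function buildTagIndex(blocks: InfoBlock[]): TagIndex {
  // Object.create(null) avoids inherited keys such as "constructor".
  const index: TagIndex = Object.create(null);
  blocks.forEach((block, blockIndex) => {
    for (const tag of block.tags) {
      const postings = index[tag] || (index[tag] = []);
      // Blocks are visited in order, so checking the last entry is enough
      // to list a block once even if its tag array repeats a word.
      if (postings[postings.length - 1] !== blockIndex) {
        postings.push(blockIndex);
      }
    }
  });
  return index;
}

// Made-up sample data.
const indexDemoBlocks: InfoBlock[] = [
  {
    title: "how to create a website",
    action: "show information",
    content: "take html, add js and decorate with css",
    tags: ["website", "website", "development", "web", "programming", "layout"],
  },
  {
    title: "web hosting price",
    action: "open a link",
    content: "a link to a price list",
    tags: ["web", "hosting", "price"],
  },
];

const demoTagIndex = buildTagIndex(indexDemoBlocks);
```

Looking up a query word is now a single property access on `demoTagIndex` instead of a scan over all blocks.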

3. So, when a query is entered, the application checks whether each word is in the tag storage; if it is, the corresponding blocks are added to the array of results to display. Then the results need to be sorted by priority: the higher a block appears in the search results, the better it matches the query. I implemented this by counting keywords: the more words from the query are found in a block's tag array, the higher the block's priority.
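The lookup and ranking can be sketched as follows, assuming the tag storage is an object that maps each tag to an array of block indexes. The function name and the sample index are hypothetical. Two details are choices made for this sketch rather than documented behavior: repeated query words are counted once, and ties are broken by the lower block index.

```typescript
// Return block indexes ordered by how many distinct query words they match.
function rankedSearch(index: Record<string, number[]>, query: string): number[] {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter((word, position, all) => word.length > 0 && all.indexOf(word) === position);

  // Block index (as a string key) -> number of query words found in its tags.
  const scores: Record<string, number> = {};
  for (const word of words) {
    const known = Object.prototype.hasOwnProperty.call(index, word);
    const postings = known ? index[word] : [];
    for (const blockIndex of postings) {
      const key = String(blockIndex);
      scores[key] = (scores[key] || 0) + 1;
    }
  }

  return Object.keys(scores)
    .map((key) => ({ blockIndex: Number(key), score: scores[key] }))
    .sort((a, b) => b.score - a.score || a.blockIndex - b.blockIndex)
    .map((entry) => entry.blockIndex);
}

// Made-up tag storage for two blocks.
const rankingDemoIndex: Record<string, number[]> = {
  web: [0, 1],
  website: [0],
  programming: [0],
  hosting: [1],
  price: [1],
};

// Block 0 matches three words, block 1 matches one.
const websiteResults = rankedSearch(rankingDemoIndex, "web programming website");

// Block 1 matches three words, block 0 matches one.
const hostingResults = rankedSearch(rankingDemoIndex, "web hosting price");
```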

4. A few words about saving data to a file.

You can export and import the data as a JSON file. Also, my experience of using VKontakte as an image search engine gave me the idea of letting each block have an optional image.
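A rough sketch of the export and import step is shown below. The field names, including the optional `image` field, are assumptions, and so is the decision to silently skip entries that do not look like blocks. The actual file layout used by the application is not described in the text.

```typescript
// Assumed shape of one block; "image" is a hypothetical optional field.
interface InfoBlock {
  title: string;
  action: string;
  content: string;
  tags: string[];
  image?: string;
}

// Serialize all blocks to a JSON string that can be written to a file.
function exportBlocks(blocks: InfoBlock[]): string {
  return JSON.stringify(blocks, null, 2);
}

// Check that an unknown value has the fields this sketch expects.
function isInfoBlock(value: unknown): value is InfoBlock {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const candidate = value as Record<string, unknown>;
  const tags = candidate["tags"];
  const image = candidate["image"];
  return (
    typeof candidate["title"] === "string" &&
    typeof candidate["action"] === "string" &&
    typeof candidate["content"] === "string" &&
    Array.isArray(tags) &&
    tags.every((tag) => typeof tag === "string") &&
    (image === undefined || typeof image === "string")
  );
}

// Parse a JSON string and keep only the entries that look like blocks.
function importBlocks(json: string): InfoBlock[] {
  const parsed: unknown = JSON.parse(json);
  if (!Array.isArray(parsed)) {
    throw new Error("Expected a JSON array of blocks");
  }
  return parsed.filter(isInfoBlock);
}

// Round trip with a made-up block.
const exportedJson = exportBlocks([
  {
    title: "how to create a website",
    action: "show information",
    content: "take html, add js and decorate with css",
    tags: ["website", "layout"],
  },
]);
const importedBlocks = importBlocks(exportedJson);
```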

Results

As a result, I created something I have been using for over a year.

Both the web and PC versions proved to be very helpful. I use the application for work and in my personal life. The search speed has come in handy many times when I needed to find something very quickly.

Relation to my other projects

I liked the application so much that I wanted to write a program that launches exe files on a PC at the user's request. The search, accordingly, works like the Google search engine. Its distinctive feature is that you can drag and drop a file or several files into the program, and the algorithm automatically sets tags from the file name and the folders the file is located in. But that's a topic for another post, if anyone is interested.
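Deriving tags from a path could look roughly like the sketch below. Everything here is an assumption for illustration: the function name, the set of separator characters, dropping the file extension and removing repeated words. The real program may well use different rules.

```typescript
// Turn a file path into lowercase tags taken from its folders and file name.
function tagsFromPath(filePath: string): string[] {
  // Accept both "/" and "\" as folder separators.
  const segments = filePath.split(/[\\\/]+/).filter((part) => part.length > 0);
  const fileName = segments.pop();
  if (fileName === undefined) {
    return [];
  }
  // Drop the extension from the file name only, not from folder names.
  const stem = fileName.replace(/\.[^.]+$/, "");
  const words = segments
    .concat(stem)
    .join(" ")
    .toLowerCase()
    .split(/[\s_.,():\[\]-]+/)
    .filter((word) => word.length > 0);
  // Keep the first occurrence of each word.
  return words.filter((word, position) => words.indexOf(word) === position);
}

// Made-up paths.
const songTags = tagsFromPath("Music/French/Rock/my_song-01.mp3");
const windowsTags = tagsFromPath("Music\\French\\la-vie.mp3");
```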

Afterword

I will be glad to receive any comments, so please share your opinion about my idea. Is it complete nonsense, or, as I suspect, are there already other applications with a similar implementation?

Thank you!