We have already written about how to compile a semantic core on your own. But before you group your queries, you need to clean them up properly. How do you strip away eight layers of junk and leave pure silver? You will need a Key Collector account and 12 minutes to read this post.
1. Cleaning the semantic core by marker words
Open Key Collector and use the filter to sift out all the inappropriate words. For example, for the category “silver rings” the main marker words would be “silver”, “rings”, as well as their word forms. Enter only part of the word to cover all word forms.
First of all, let’s select all the queries that do not contain “ring” in Key Collector.
To do this, go to the tab for choosing the filter condition.
Then select the appropriate condition (the phrase does not contain “ring”).
Mark all the filtered phrases and send them to the “basket”.
Next, using the same algorithm, filter the queries by the word “silver”.
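Outside Key Collector, the same marker-word logic can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample queries are made up, and the sketch matches a stem at the start of a word, so that “earrings” is not mistaken for “rings”. Key Collector's own “contains” condition may behave differently.

```python
# Hypothetical sample queries; in practice they would come from a Key Collector export.
queries = [
    "silver rings with stones",
    "silver ring for men",
    "gold rings",
    "silver earrings",
    "wedding bands",
]

# Partial marker words (stems), so that all word forms are covered.
marker_stems = ["ring", "silver"]


def split_by_markers(phrases, stems):
    """Return (kept, basket): a phrase is kept only if every stem starts some word in it."""
    kept, basket = [], []
    for phrase in phrases:
        words = phrase.lower().split()
        has_all = all(any(w.startswith(stem) for w in words) for stem in stems)
        (kept if has_all else basket).append(phrase)
    return kept, basket


kept, basket = split_by_markers(queries, marker_stems)
print("keep:", kept)
print("basket:", basket)
```

Here only the first two queries stay in the core; the other three go to the basket for review.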
To cover more phrases with the same meaning, you can create nested filters in Key Collector.
What is it for? For example, take two different words that both mean “pendants”. Both variants return identical results in the search engine.
In this example, we searched for informational queries containing either of the two words for “pendant”.
All filters created by the specified conditions can be saved and used in other projects.
2. Removing repetitive words
Phrases with repetitions are often garbage, so it makes sense to remove them in the early stages of cleaning the semantics. To do this, select the advanced filter and set the rule: “Phrase” – “Contains repetitive words”.
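For readers who prefer to check an exported list by hand, the repetition rule can be approximated as follows. This is a minimal sketch with made-up queries; it treats only exact repeats of the same word form as repetitions, which may be stricter than what Key Collector does.

```python
def has_repeated_words(phrase):
    """True if the same word occurs more than once in the phrase."""
    words = phrase.lower().split()
    return len(words) != len(set(words))


# Hypothetical sample queries.
queries = ["silver ring silver", "buy silver ring", "ring ring price"]

repeats = [q for q in queries if has_repeated_words(q)]
print(repeats)
```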
3. Delete Latin letters, special characters, and number queries
Latin letters and special characters can be removed with the advanced filtering and regular expressions.
Using the advanced filter you can select several parameters at once.
The filter by the condition “contains other symbols” will select phrases with the Ukrainian characters “і”, “ї”. Do not forget to apply the OR rule to all conditions, so that a phrase matching any one of them is selected.
Another method is to learn regular expressions and clear the semantic core with them.
The regular expression \d+ helps to get rid of numbers.
For example, for the semantic core on silver rings, I kept all queries containing the value of the metal and the weight of the product, but removed those containing the year of manufacture.
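The difference between “any number” and “a year” can be illustrated with Python's re module. The sample queries are hypothetical, and so is the definition of a year used here (a standalone four-digit number from 1900 to 2099); adjust both to your own niche.

```python
import re

# The same expression as in the article: one or more digits.
HAS_DIGITS = re.compile(r"\d+")
# Assumption: a "year" is a standalone four-digit number between 1900 and 2099.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")

# Hypothetical sample queries.
queries = [
    "silver ring 925",
    "silver ring 5 grams",
    "silver rings 2019",
    "silver ring for women",
]

with_numbers = [q for q in queries if HAS_DIGITS.search(q)]
to_remove = [q for q in with_numbers if YEAR.search(q)]
to_keep = [q for q in with_numbers if not YEAR.search(q)]

print("remove:", to_remove)
print("keep:", to_keep)
```

In this sketch \d+ finds three queries with numbers, and only the one with a year is marked for removal.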
The regular expression [a-z]+ is needed to filter letters of the Latin alphabet.
Letters of the Latin alphabet can appear in the names of brands, collections or other elements of product cards. Before deleting such queries, I advise looking through them carefully.
Filtering with regular expressions can be performed both by means of a quick filter (as in the example above) and by means of an advanced filter.
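A small sketch of the Latin-letter check, assuming the queries themselves are written in Cyrillic. The brand names in the sample data are only illustrations, and the re.IGNORECASE flag is my addition so that capitalized brand names are caught as well. Instead of deleting matches, the sketch puts them in a list for manual review, as advised above.

```python
import re

# [a-z]+ from the article; IGNORECASE is an added assumption to catch capital letters.
LATIN = re.compile(r"[a-z]+", re.IGNORECASE)

# Hypothetical Cyrillic queries, two of them with Latin brand names.
queries = [
    "серебряные кольца pandora",
    "кольцо серебро женское",
    "кольца серебро Tiffany",
]

needs_review = [q for q in queries if LATIN.search(q)]
clean = [q for q in queries if not LATIN.search(q)]

print("review manually:", needs_review)
print("no Latin letters:", clean)
```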
4. Cleaning with stop words
Go to the “Stop Words” tab
Add the words we don’t need. I usually divide all stop words into several groups:
- cities (which do not meet the marketing objectives);
- everything that refers to price or to how the product is obtained: free, inexpensive, cheap, expensive, made to order (not for all sites), and so on;
- subjective concepts: most, best, beautiful, unusual, groovy, original;
- names of sites with ads: “prom ua”, “olx”, “klumba”, “bigl ua”;
- visualization: images, photos, videos, download, watch, drawings, instructions, diagrams.
- DIY queries: phrases containing “with your own hands” are very common, and they also go into the stop words.
The list of groups may vary depending on the subject of the site, but the above examples work in almost all cases.
This is what cleaning with the list of stop words in Key Collector looks like.
Important: Informational queries containing “how”, “where”, and “what” should not be deleted. It is better to move them into a separate folder and use them later to develop a content plan.
You can also add all unnecessary words directly from the full list of queries. In this case, create a separate group – specifically for such stop words.
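The stop-word step, together with the rule about informational queries, can be sketched like this. The word lists and queries are hypothetical, and the sketch matches whole words only, which is an assumption and not a description of Key Collector's matching modes.

```python
# Hypothetical word lists; build your own for your niche.
STOP_WORDS = {"free", "cheap", "photo", "download", "best"}
INFO_WORDS = {"how", "where", "what"}


def classify(phrase):
    """Sort a phrase into 'content_plan', 'trash' or 'keep' by whole-word match."""
    words = set(phrase.lower().split())
    if words & INFO_WORDS:
        # Informational queries are set aside, not deleted.
        return "content_plan"
    if words & STOP_WORDS:
        return "trash"
    return "keep"


queries = [
    "how to clean silver ring",
    "silver ring photo",
    "buy silver ring",
    "where to buy cheap silver ring",
]

groups = {"content_plan": [], "trash": [], "keep": []}
for q in queries:
    groups[classify(q)].append(q)

print(groups)
```

Informational words are checked first, so a query such as “where to buy cheap silver ring” ends up in the content plan folder and not in the trash.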
5. Cleaning the core with the word group analysis function
In Key Collector, go to the tab “Data” – “Group analysis”. The groups marked in the table are automatically marked in the main list of queries. After you have marked all the unsuitable words, close the table and delete all unnecessary queries.
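The idea behind group analysis, as I understand it, is an index from each word to the phrases that contain it: marking a word marks all of its phrases at once. A rough sketch with made-up data (the real tool may group by word forms and not by exact words):

```python
from collections import defaultdict

# Hypothetical sample queries.
queries = [
    "silver ring photo",
    "silver ring price",
    "ring photo download",
]

# Build the index: word -> phrases that contain it.
groups = defaultdict(list)
for q in queries:
    for word in set(q.lower().split()):
        groups[word].append(q)

# Words marked as unsuitable while looking through the groups.
unsuitable_words = {"photo"}

# Every phrase that belongs to a marked group gets marked in the main list.
marked = sorted({q for w in unsuitable_words for q in groups.get(w, [])})
print(marked)
```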
6. Looking for and removing implicit duplicates
To use this method, you must first collect data on the frequency of queries. After that, go to the tab “Data” – “Analysis of implicit duplicates”.
Click the “Smart Mark” button.
The program will automatically mark the implicit duplicates that have the lower frequency in the specified search engine.
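To show what this marking does, here is a minimal sketch. It assumes that implicit duplicates are phrases made of the same words in a different order; the real analysis in Key Collector may also take word forms into account. The phrases and frequencies are invented.

```python
from collections import defaultdict

# Hypothetical (phrase, frequency) pairs.
queries = [
    ("silver ring buy", 120),
    ("buy silver ring", 480),
    ("silver ring price", 90),
]


def mark_implicit_duplicates(rows):
    """Mark every phrase except the most frequent one in each same-words group."""
    by_key = defaultdict(list)
    for phrase, freq in rows:
        # Assumption: the same set of words in any order counts as a duplicate.
        key = tuple(sorted(phrase.lower().split()))
        by_key[key].append((phrase, freq))

    marked = []
    for group in by_key.values():
        best = max(group, key=lambda row: row[1])
        marked.extend(p for p, f in group if (p, f) != best)
    return marked


marked = mark_implicit_duplicates(queries)
print(marked)
```

Of the two phrases with the same words, the one with the lower frequency is marked, and the more frequent one stays in the core.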
7. Manual search by query group
Finally, you can manually mark all unnecessary words in the semantic core: slang, misspelled words, and so on. The main body of irrelevant queries has already been cleaned up previously, so manual cleaning will not take much time.
8. Cleaning of queries by frequency
Use the advanced filter in Key Collector to set query frequency parameters and mark all low-frequency phrases. This step is not always necessary.
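The frequency rule itself is a simple threshold. In the sketch below, both the threshold and the frequencies are hypothetical; the right cut-off depends on the niche, and in narrow niches low-frequency phrases may be worth keeping.

```python
# Hypothetical threshold; choose it for your own niche.
MIN_FREQUENCY = 10

# Hypothetical phrase -> frequency data.
frequencies = {
    "buy silver ring": 480,
    "silver ring with garnet size 17": 3,
    "silver ring price": 90,
}

low_frequency = [p for p, f in frequencies.items() if f < MIN_FREQUENCY]
print(low_frequency)
```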
As an alternative keyword tool, you can use Clustering from Serpstat. Its advantage is that you upload all your key phrases, configure the settings, and the tool groups them and distributes them across the pages of the site. Words that do not fit into any cluster form a separate list. You can also edit and clean up the finished clusters.
To clean the semantic core of garbage thoroughly, follow eight steps in Key Collector:
- Cleaning the semantic core by marker words.
- Removing repetitive words.
- Removal of Latin letters, special characters, queries with numbers.
- Cleaning with stop words.
- Cleaning the core with the word group analysis feature.
- Searching for and removing implicit duplicates.
- Manual search by query group.
- Cleaning of queries by frequency.
At each stage, it is advisable to review the words marked for deletion, as there is a risk of removing high-quality and relevant queries.
Instead of deleting unnecessary queries, it is better to create a separate group and move them there. In the latest Key Collector updates there is a corresponding default group – “Trash”.
After a thorough cleaning of the semantic core, you can move on to the next step – clustering and grouping of queries.
I should note that there is always a risk of missing a couple of irrelevant queries while cleaning the core. They are very easy to identify and remove at the grouping stage, but more on that next time.