
Introduction
Definition of N-Gram Analysis
N-Gram analysis is a method in natural language processing (NLP) and computational linguistics used to examine text and data by analyzing sequences of N consecutive elements or tokens (e.g., words or letters). It helps to identify patterns, frequencies, and relationships between elements in larger text corpora.
Importance of Optimizing Search Terms in Google Ads
In digital marketing, Google Ads has established itself as an effective advertising tool that helps businesses reach their target audiences and generate clicks and conversions. Optimizing search terms is crucial to the success of a Google Ads campaign because it makes ads more relevant to the target audience, which improves performance. The right keywords help your ads appear for the search queries users actually type, and so capture their attention.
Objective of the Blog Post
The purpose of this blog post is to provide you with an understanding of how N-Gram analysis can be used to optimize search terms for Google Ads. We will explain the basics of N-Gram analysis, highlight its benefits for keyword optimization, and provide a step-by-step guide for conducting an N-Gram analysis to improve your Google Ads campaigns.
Introduction to N-Gram Analysis
What is an N-Gram?
An N-Gram is a sequence of N adjacent tokens or elements extracted from a larger text or sentence. In natural language processing (NLP), these tokens are typically words or letters. For example, a bigram (N=2) consists of two consecutive words or letters, while a trigram (N=3) consists of three consecutive words or letters. Analyzing such sequences can help identify patterns and frequencies within textual data.
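To make the definition concrete, here is a minimal Python sketch that builds word-level bigrams and trigrams from a sentence. The helper name `ngrams` and the sample sentence are made up for illustration.

```python
def ngrams(tokens, n):
    """Return every run of n adjacent tokens, joined by a single space."""
    # A list with fewer than n tokens yields an empty result.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


words = "buy red running shoes".split()
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

print(bigrams)   # ['buy red', 'red running', 'running shoes']
print(trigrams)  # ['buy red running', 'red running shoes']
```

A four-word query therefore yields three bigrams and two trigrams. The same sliding-window idea applies to letters if you pass a list of characters instead of words.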
Application Areas of N-Gram Analysis
N-Gram analysis has a variety of applications in different disciplines. In computational linguistics and NLP, it is used to create language models for predicting the probability of word sequences. In machine learning, N-Gram analysis can be used for text classification, sentiment analysis, or automatic text generation.
In the context of Google Ads, N-Gram analysis plays a crucial role in optimizing search terms and understanding user behavior in relation to searching for products and services.
Benefits of N-Gram Analysis for Search Term Optimization
N-Gram analysis offers several advantages for optimizing your search terms in Google Ads:
1. Identify patterns and trends: By uncovering common word combinations in search queries, you can identify patterns and trends relevant to your campaign goals.
2. Improved ad relevance: By using the identified patterns to adapt your ad copy, keywords, and landing pages, you can increase the relevance of your ads and thus potentially achieve better results in terms of CTR, CPC, and conversions.
3. Effective data analysis: Instead of focusing on individual keywords, N-Gram analysis gives you broader insight into your audience's behavior so that you can make more informed decisions about your campaign strategy.
Overall, N-Gram analysis offers a more efficient approach to keyword optimization in Google Ads and can help improve your campaign results.
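The first benefit, spotting frequent word combinations, can be sketched in a few lines of Python. The search queries below are invented sample data, and the helper name `bigrams` is only illustrative.

```python
from collections import Counter


def bigrams(query):
    """Split a query on whitespace and return its word-level bigrams."""
    words = query.lower().split()
    return [" ".join(words[i:i + 2]) for i in range(len(words) - 1)]


# Invented sample queries, not real campaign data.
queries = [
    "red running shoes",
    "cheap running shoes",
    "running shoes for women",
    "red dress",
]

# Count how often each bigram occurs across all queries.
counts = Counter(gram for query in queries for gram in bigrams(query))
top = counts.most_common(1)
print(top)  # [('running shoes', 3)]
```

Here "running shoes" appears in three different queries even though no two queries are identical. This is the kind of pattern that stays hidden when you look at search terms one by one.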
Steps for Conducting an N-Gram Analysis for Google Ads
Data Collection: Gathering search queries and clicks
An N-Gram analysis needs a sufficient amount of search query data. The selected time period and campaign should ideally cover several thousand records. The data can then be transferred using either this script or the Google Ads transfer for BigQuery. Both methods have already been described in detail.
Data Processing: Decomposing search queries into N-Grams
To decompose the search queries into N-Grams, Google BigQuery with the ML.NGRAMS function is the method of choice.
This BigQuery code executes a query on the table `MYproject.gads_custom.search_term_view`. The main goals of this query are:
1. Selecting specific columns from the table
2. Creation of N-grams (1-gram, 2-gram, 3-gram, 4-gram and 5-gram) from the search term
3. Applying a date range for the query
Here is an explanation of each part of the code:
– `SELECT`: Selects the specified columns and calculated fields from the table.
– `segments_date`: A column in the table that represents the date.
– `searchTermView_search_term search_term`: Selects the column `searchTermView_search_term` and renames it to `search_term`.
– `ML.NGRAMS(..)`: A function to create n-grams from a text sequence.
– `REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+')`: Converts the search term to lowercase and extracts every run of letters, digits, or the listed special characters (+, -, ä, ü, ö, ß). The result is an array of tokens.
– `[1,1]`, `[2,2]`, `[3,3]`, `[4,4]`, `[5,5]`: Each range makes `ML.NGRAMS` return only n-grams of exactly that length, so the five calls produce separate columns for lengths 1 to 5.
– The columns `metrics_impressions`, `metrics_clicks`, `metrics_cost_micros`, `metrics_conversions` and `metrics_conversions_value` are selected directly from the table.
– `FROM`: Specifies the table from which the data should be queried: `MYproject.gads_custom.search_term_view`.
– `WHERE`: Filters the rows in the table based on the specified conditions.
– `segments_date BETWEEN PARSE_DATE("%Y%m%d","20230301") AND PARSE_DATE("%Y%m%d","20230411")`: Keeps only rows dated from March 1, 2023 through April 11, 2023 (both inclusive).
– `AND searchTermView_search_term IS NOT NULL`: Additionally keeps only rows whose search term is not NULL.
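To see what the tokenization and n-gram steps do to a single search term, the following Python sketch mimics them locally. It is an approximation for illustration, not BigQuery itself. It assumes that `ML.NGRAMS` joins tokens with a single space (its default separator), and the function names and sample term are made up.

```python
import re

# Same character class as in the query: letters, digits, +, -, and German umlauts/ß.
TOKEN_PATTERN = re.compile(r"[a-z0-9\+\-äüöß]+")


def tokenize(search_term):
    """Rough equivalent of REGEXP_EXTRACT_ALL(LOWER(term), pattern)."""
    return TOKEN_PATTERN.findall(search_term.lower())


def ngrams(tokens, n):
    """Rough equivalent of ML.NGRAMS(tokens, [n, n]) with a space separator."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Laufschuhe für Damen, Größe 38!")
print(tokens)             # ['laufschuhe', 'für', 'damen', 'größe', '38']
print(ngrams(tokens, 2))  # ['laufschuhe für', 'für damen', 'damen größe', 'größe 38']
print(ngrams(tokens, 5))  # ['laufschuhe für damen größe 38']
```

Punctuation such as the comma and the exclamation mark is dropped because it is not part of the character class. A search term with fewer than N tokens simply yields no N-Grams of that length.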
Further Modification
For further processing, we now want to label each gram with its type so that we can filter the data by "1_gram", "2_gram", and so on. We also want to see a few example search queries from which each gram was created. Finally, we want to aggregate the KPIs per gram so that we can estimate whether a gram has a positive or negative impact on our campaigns. We do this with the following query:
SELECT
  gram,
  "1_gram" type,
  -- up to five distinct search terms that contain this gram
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
  SUM(impressions) impressions,
  SUM(clicks) clicks,
  SUM(cost) cost,
  SUM(conversions) conversions,
  SUM(conversionValue) conversionValue
FROM ngrams, UNNEST(gram_1) gram
WHERE gram IS NOT NULL
GROUP BY 1, 2
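The aggregation logic of this query can be sketched in Python as follows. The rows are invented sample data, only two of the metrics are shown, and the function name `aggregate_unigrams` is hypothetical. Like the `UNNEST` in the query, the sketch adds a row's metrics once for every occurrence of a gram in its search term.

```python
from collections import defaultdict

# Invented sample rows, not real campaign data.
rows = [
    {"search_term": "red running shoes", "impressions": 100, "clicks": 10},
    {"search_term": "cheap running shoes", "impressions": 50, "clicks": 2},
    {"search_term": "red dress", "impressions": 30, "clicks": 3},
]


def aggregate_unigrams(rows, max_examples=5):
    """Sum metrics per 1-gram and keep up to max_examples distinct search terms."""
    totals = defaultdict(lambda: {"examples": [], "impressions": 0, "clicks": 0})
    for row in rows:
        for gram in row["search_term"].lower().split():
            entry = totals[gram]
            is_new = row["search_term"] not in entry["examples"]
            if is_new and len(entry["examples"]) < max_examples:
                entry["examples"].append(row["search_term"])
            entry["impressions"] += row["impressions"]
            entry["clicks"] += row["clicks"]
    return dict(totals)


result = aggregate_unigrams(rows)
print(result["running"])
# {'examples': ['red running shoes', 'cheap running shoes'], 'impressions': 150, 'clicks': 12}
```

Each search term contributes its full metrics to every gram it contains, so the per-gram totals overlap and will not add up to the account totals.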
The complete BigQuery query
WITH ngrams as (
  SELECT
    segments_date,
    searchTermView_search_term search_term,
    ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+'),[1,1]) gram_1,
    ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+'),[2,2]) gram_2,
    ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+'),[3,3]) gram_3,
    ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+'),[4,4]) gram_4,
    ML.NGRAMS(REGEXP_EXTRACT_ALL(LOWER(searchTermView_search_term), '[a-z0-9\\+\\-äüöß]+'),[5,5]) gram_5,
    metrics_impressions impressions,
    metrics_clicks clicks,
    metrics_cost_micros cost,
    metrics_conversions conversions,
    metrics_conversions_value conversionValue
  FROM `myproject.gads_custom.search_term_view`
  #WHERE segments_date BETWEEN PARSE_DATE("%Y%m%d",@DS_START_DATE) AND PARSE_DATE("%Y%m%d",@DS_END_DATE)
  WHERE segments_date BETWEEN PARSE_DATE("%Y%m%d","20230301") AND PARSE_DATE("%Y%m%d","20230411")
    AND searchTermView_search_term IS NOT NULL
),
gramOne as (
  SELECT
    gram,
    "1_gram" type,
    ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
    SUM(impressions) impressions,
    SUM(clicks) clicks,
    SUM(cost) cost,
    SUM(conversions) conversions,
    SUM(conversionValue) conversionValue
  FROM ngrams, UNNEST(gram_1) gram
  WHERE gram IS NOT NULL
  GROUP BY 1, 2
),
gramTwo as (
  SELECT
    gram,
    "2_gram" type,
    ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
    SUM(impressions) impressions,
    SUM(clicks) clicks,
    SUM(cost) cost,
    SUM(conversions) conversions,
    SUM(conversionValue) conversionValue
  FROM ngrams, UNNEST(gram_2) gram
  WHERE gram IS NOT NULL
  GROUP BY 1, 2
),
gramThree as (
  SELECT
    gram,
    "3_gram" type,
    ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
    SUM(impressions) impressions,
    SUM(clicks) clicks,
    SUM(cost) cost,
    SUM(conversions) conversions,
    SUM(conversionValue) conversionValue
  FROM ngrams, UNNEST(gram_3) gram
  WHERE gram IS NOT NULL
  GROUP BY 1, 2
),
gramFour as (
  SELECT
    gram,
    "4_gram" type,
    ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
    SUM(impressions) impressions,
    SUM(clicks) clicks,
    SUM(cost) cost,
    SUM(conversions) conversions,
    SUM(conversionValue) conversionValue
  FROM ngrams, UNNEST(gram_4) gram
  WHERE gram IS NOT NULL
  GROUP BY 1, 2
),
gramFive as (
  SELECT
    gram,
    "5_gram" type,
    ARRAY_TO_STRING(ARRAY_AGG(DISTINCT search_term IGNORE NULLS LIMIT 5), ", ") top_search_terms,
    SUM(impressions) impressions,
    SUM(clicks) clicks,
    SUM(cost) cost,
    SUM(conversions) conversions,
    SUM(conversionValue) conversionValue
  FROM ngrams, UNNEST(gram_5) gram
  WHERE gram IS NOT NULL
  GROUP BY 1, 2
)
SELECT * FROM gramOne
UNION ALL
SELECT * FROM gramTwo
UNION ALL
SELECT * FROM gramThree
UNION ALL
SELECT * FROM gramFour
UNION ALL
SELECT * FROM gramFive
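The query returns raw sums per gram, and its `cost` column comes from `metrics_cost_micros`, so it is expressed in micros (millionths of the account currency). To judge whether a gram helps or hurts, you will usually derive ratios from these sums. The following Python sketch shows one way to do that. The function name `kpis`, the choice of ratios, and the sample numbers are illustrative assumptions, not part of the original query.

```python
def kpis(impressions, clicks, cost_micros, conversions, conversion_value):
    """Derive common ratios from the summed metrics of one gram."""
    cost = cost_micros / 1_000_000  # micros -> currency units
    return {
        "cost": cost,
        # Each ratio is None when its denominator is zero.
        "ctr": clicks / impressions if impressions else None,
        "cpc": cost / clicks if clicks else None,
        "cost_per_conversion": cost / conversions if conversions else None,
        "roas": conversion_value / cost if cost else None,
    }


# Invented sample values for a single gram.
result = kpis(impressions=1000, clicks=50, cost_micros=25_000_000,
              conversions=5, conversion_value=200.0)
print(result)
# {'cost': 25.0, 'ctr': 0.05, 'cpc': 0.5, 'cost_per_conversion': 5.0, 'roas': 8.0}
```

Grams with high cost and no conversions are typical candidates for negative keywords, while grams with a strong return may deserve their own keywords or ad copy.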
The query can be used in Looker as it is and should work for testing purposes. For better performance and lower query costs, I recommend scheduling the query to run daily and write the N-Grams to a table.