# CMP behavior with bots

#### What are bots?

👉 Bots are software applications that run automated tasks over the Internet. They are used to index internet content or to automatically gather information from websites.

**Some bots serve legitimate purposes, whereas others collect data for malicious purposes, such as:**

* Content reselling
* Click generation
* Price undercutting
* Etc.

Like any client-based web solution, Didomi is affected by bot traffic, which generates “false” data. As a consequence, CMP analytics can become inaccurate.

**Impact on CMP Analytics Indicators**

The most impacted metric is the **total notices** (with an increase in volume), which directly inflates the **notice bounce rate** and lowers the **addressability rate** performance indicator.
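The effect on the notice bounce rate can be illustrated with a short calculation. This is a minimal sketch that assumes a simplified definition (bounces divided by total notices) and assumes that bots never interact with the notice. The function name and all the numbers are hypothetical and are not taken from Didomi's analytics.

```typescript
// Assumed, simplified definition: notices without interaction / total notices.
function noticeBounceRate(bounces: number, totalNotices: number): number {
  if (totalNotices === 0) {
    return 0;
  }
  return bounces / totalNotices;
}

// Illustrative figures only.
const humanNotices = 1000;
const humanBounces = 300;
const botNotices = 200; // assumed to bounce every time

const rateWithoutBots = noticeBounceRate(humanBounces, humanNotices);
const rateWithBots = noticeBounceRate(
  humanBounces + botNotices,
  humanNotices + botNotices
);

console.log(rateWithoutBots.toFixed(3)); // 0.300
console.log(rateWithBots.toFixed(3)); // 0.417
```

With these figures, 200 bot notices are enough to move the measured bounce rate from 30% to almost 42%, even though human behavior did not change.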

#### Provide analytics data without bots

👉 Bots affect web data by generating false user data. They deteriorate the **addressability rate** as well as the **pageview consent rate** by increasing the volume of **notice bounces** and the number of **pageviews without consent**.

{% hint style="danger" %}
To preserve the compliance of your reports, we advise you not to exclude user agents (UAs) indiscriminately. A given UA can belong to a hiding bot, but also to users who have given their consent.
{% endhint %}

**In this case, excluding UAs represents both a compliance risk and a legal risk.**

There are two types of bots:

**Declared Bots**: they can be detected thanks to their user agent (UA). They are excluded with the user agent filtering method. A few **examples** of bots:

* Scraper bots: programmed to capture content for offline use, such as names, prices, and product details on e-commerce websites.
* Crawler bots: used by large companies, such as Google, Yahoo, etc., for content indexing purposes.
* Performance/audit bots: used by website performance tools to perform SEO audits or to evaluate page loading time. Didomi also uses a bot to evaluate the compliance of websites.

**Hiding Bots**: they use standard user agents and therefore can’t be identified with the UA filtering method.

A specialized solution or technology is required to detect them and then exclude them from analytics data.

#### Example of user agents

**Declared Bots**

* Mozilla/5.0 (Macintosh; Intel Mac OS X 10\_15\_7) AppleWebKit/537.36 (KHTML, like Gecko) **TagInspector**/500.1 Chrome/90.0.4430.72 Safari/537.36 Edg/90.0.818.42
* Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) **HeadlessChrome**/85.0.4183.102 Safari/537.36
* Mozilla/5.0 (**iplabel**; Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36

The **elements in bold** are not part of a standard user agent.

**Hiding Bot User agents**

* Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36
* Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36 Edg/91.0.864.64

Even though the user agents above are used by bots, they are also used by regular visitors, so these user agents can’t be excluded.
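The difference between the two groups can be sketched with a simple term lookup. This is a minimal illustration, not Didomi's implementation: the function name `isDeclaredBot` is hypothetical, the term list is a small subset of the terms on this page, and case-insensitive substring matching is an assumption.

```typescript
// Small subset of declared-bot terms, for illustration only.
const DECLARED_BOT_TERMS: string[] = [
  "TagInspector",
  "HeadlessChrome",
  "iplabel",
];

// Returns true when the user agent contains one of the terms.
// Case-insensitive matching is an assumption of this sketch.
function isDeclaredBot(
  userAgent: string,
  terms: string[] = DECLARED_BOT_TERMS
): boolean {
  const ua = userAgent.toLowerCase();
  return terms.some((term) => ua.indexOf(term.toLowerCase()) !== -1);
}

const declaredUa =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/85.0.4183.102 Safari/537.36";
const standardUa =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36";

console.log(isDeclaredBot(declaredUa)); // true
console.log(isDeclaredBot(standardUa)); // false
```

The second user agent returns `false` whether it comes from a hiding bot or from a regular visitor, which is why UA filtering alone cannot separate the two.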

#### Be careful with your own bots

If you are using tools to evaluate the performance of your website (page loading time, SEO audit, etc.), they probably use bots to do it. As a consequence, they generate data **if** they are not identified by our technology. You can:

1. Check the bots we detect ([see the list below](#didomis-bot-list)).
2. Check with your solution providers whether their bots have an identifiable UA pattern.
3. Add the patterns to your custom bot management configuration.
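Before adding a pattern, it can help to check it against sample user agents. The sketch below follows the three steps above under stated assumptions: `AcmeSiteMonitor` is a hypothetical tool name, the helper functions are illustrative, and the term list is a small subset of the terms on this page.

```typescript
// Hypothetical user agent of an in-house monitoring tool.
const monitoringUa =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36 AcmeSiteMonitor/2.1";
const regularVisitorUa =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36";

// Subset of the terms listed on this page.
const knownTerms: string[] = ["Chrome-Lighthouse", "gtmetrix", "PingdomTMS"];

// Step 1: is the tool already covered by a known term?
function findKnownTerm(userAgent: string, terms: string[]): string | null {
  const ua = userAgent.toLowerCase();
  for (const term of terms) {
    if (ua.indexOf(term.toLowerCase()) !== -1) {
      return term;
    }
  }
  return null;
}

// Steps 2 and 3: test a candidate pattern before adding it to the configuration.
function patternMatches(pattern: string, userAgent: string): boolean {
  return new RegExp(pattern, "i").test(userAgent);
}

const candidatePattern = "AcmeSiteMonitor";

console.log(findKnownTerm(monitoringUa, knownTerms)); // null: not covered yet
console.log(patternMatches(candidatePattern, monitoringUa)); // true
console.log(patternMatches(candidatePattern, regularVisitorUa)); // false
```

A usable pattern matches the tool's user agent and does not match the user agents of regular visitors.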

#### Behavior of the CMP with Bots

⚙️ By default, bots "bypass" the consent notice: we consider that consent is already given for bots, so all the scripts are fired. The banner is therefore not displayed and does not collect any consent from bots.

➡️ If you need to collect consent for bots in your Consent Notice, you can follow our [Bypass consent collection for bots](https://developers.didomi.io/cmp/web-sdk/consent-notice/bots) guide.

You can add the JSON code to your consent notice in 2. Customization > Advanced settings > Custom JSON.

Remember that, in that case, the banner is displayed to bots, but they will probably not be able to make a consent choice: there is just a consent notice with the default consent string. No consent is collected, and the bot will probably not be able to browse the website.

**Custom bot management, bypass consent collection for bots**

👉 You can customize bot management directly with custom JSON in your SDK implementation.

This feature offers the following capabilities:

* Defining the category of bots to block
* Adding user agent patterns (terms) for exclusion purposes

All the details are available in the [developer documentation](https://developers.didomi.io/cmp/web-sdk/consent-notice/bots).
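As an illustration, the two capabilities could be expressed as follows. This is a sketch under assumptions: the field names (`user.bots.consentRequired`, `types`, `extraUserAgents`) and the category values are recalled from the developer documentation and should be verified there before use, and `AcmeSiteMonitor` is a hypothetical pattern.

```typescript
// Assumed shape of the bot settings; verify the field names
// against the developer documentation before relying on them.
interface BotsConfig {
  consentRequired: boolean; // whether consent is collected from bots
  types: string[]; // categories of bots to identify
  extraUserAgents: string[]; // extra patterns matched against the user agent
}

const botManagementConfig: { user: { bots: BotsConfig } } = {
  user: {
    bots: {
      consentRequired: false,
      types: ["crawlers", "performance"],
      // Hypothetical pattern for an in-house monitoring tool.
      extraUserAgents: ["AcmeSiteMonitor"],
    },
  },
};

// JSON form of the same settings.
const botManagementJson = JSON.stringify(botManagementConfig, null, 2);
console.log(botManagementJson);
```

The typed object only serves to keep the sketch checkable. What the CMP consumes is the JSON form.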

#### Didomi’s bot list

👉 More than 90 bots are automatically detected at the CMP level and during data cleaning. The lists below show the bot patterns (terms) used to identify bot traffic. Any visitor whose user agent contains one of the following terms is identified as a bot.

**Crawler bots**

Googlebot, adsbot, feedfetcher, mediapartners, bingbot, bingpreview, slurp, linkedin, msnbot, teoma, alexabot, exabot, facebot,  facebook, twitter, yandex, baidu, duckduckbot, qwant, archive, applebot, addthis, slackbot, reddit, whatsapp, pinterest, moatbot, google-xrawler, NETVIGIE, PetalBot, PhantomJS, NativeAIBot, Cocolyzebot, SMTBot, EchoboxBot, Quora-Bot, BLP\_bbot, MAZBot, ScooperBot, BublupBot, Cincraw, HeadlessChrome, diffbot, Google Web Preview, Doximity-Diffbot, Rely Bot, pingbot, cXensebot, PingdomTMS, AhrefsBot, semrush, seenaptic, netvibes, taboolabot, SimplePie, APIs-Google, Google-Read-Aloud, googleweblight, DuplexWeb-Google, Google Favicon, Storebot-Google, TagInspector, Rigor, Bazaarvoice, KlarnaBot, pageburst, naver, iplabel, **plus generic terms like “robot”, “scraper”, “crawler”, “spider”, “crawling” and “oncrawl”.**

**Performance bots**

Chrome-Lighthouse, gtmetrix, speedcurve, DareBoost, PTST, StatusCake\_Pagespeed\_Indev.

#### Bot management diagram

<img src="/files/00gUr90rzlAXVzOOZVDS" alt="schema" width="323">

**(1)** The SDK is loaded

**(2)** Notice triggering rules verification:

* The SDK scans the user agent to determine whether the visitor is a bot.
* If a bot is detected, the behavior of the notice is defined by the notice config (whether or not to trigger the notice).
* If the visitor is not labelled as a bot, the notice is triggered.
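The decision in step (2) can be sketched as follows. This is a simplified illustration, not the SDK's code: the `showNoticeToBots` flag and both function names are hypothetical, the term list is a small subset, and other triggering rules (such as an existing consent) are ignored.

```typescript
interface NoticeConfig {
  // Hypothetical flag: true when the notice config asks to show the notice to bots.
  showNoticeToBots: boolean;
}

// Scan the user agent for known bot terms (case-insensitive, by assumption).
function isBot(userAgent: string, botTerms: string[]): boolean {
  const ua = userAgent.toLowerCase();
  return botTerms.some((term) => ua.indexOf(term.toLowerCase()) !== -1);
}

function shouldTriggerNotice(
  userAgent: string,
  botTerms: string[],
  config: NoticeConfig
): boolean {
  if (isBot(userAgent, botTerms)) {
    // Bot detected: the notice config decides.
    return config.showNoticeToBots;
  }
  // Not labelled as a bot: the notice is triggered.
  return true;
}

const sampleBotTerms: string[] = ["Googlebot", "HeadlessChrome"];
const crawlerUa =
  "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";
const visitorUa =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36";

console.log(shouldTriggerNotice(crawlerUa, sampleBotTerms, { showNoticeToBots: false })); // false
console.log(shouldTriggerNotice(crawlerUa, sampleBotTerms, { showNoticeToBots: true })); // true
console.log(shouldTriggerNotice(visitorUa, sampleBotTerms, { showNoticeToBots: false })); // true
```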

**(3)** CMP events (notice display) are triggered

**(4)** Data Processing (turn events into analytics)

**👉 All the events (data) collected from (identified) bots are excluded from the analytics, even if the notice has been displayed to the bot on purpose.**
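The exclusion in step (4) can be pictured as a filter over collected events. The sketch below is only an illustration: the `CmpEvent` shape, the event type name, and the function name are hypothetical, and the actual data cleaning pipeline is not described on this page.

```typescript
// Hypothetical, minimal event shape.
interface CmpEvent {
  eventType: string;
  userAgent: string;
}

// Keep only the events whose user agent contains none of the bot terms.
function excludeBotEvents(events: CmpEvent[], botTerms: string[]): CmpEvent[] {
  return events.filter((cmpEvent) => {
    const ua = cmpEvent.userAgent.toLowerCase();
    return !botTerms.some((term) => ua.indexOf(term.toLowerCase()) !== -1);
  });
}

const collectedEvents: CmpEvent[] = [
  { eventType: "notice.shown", userAgent: "Mozilla/5.0 (Windows NT 10.0) Chrome/95.0.4638.69 Safari/537.36" },
  { eventType: "notice.shown", userAgent: "Mozilla/5.0 (Windows NT 10.0) HeadlessChrome/85.0.4183.102 Safari/537.36" },
  { eventType: "notice.shown", userAgent: "Mozilla/5.0 (iplabel; Windows NT 6.3) Chrome/86.0.4240.75 Safari/537.36" },
];

const cleanedEvents = excludeBotEvents(collectedEvents, ["HeadlessChrome", "iplabel"]);
console.log(cleanedEvents.length); // 1
```

Both bot events are dropped even though a notice was shown to them, which matches the rule stated above.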

**(5)** Analytics data is displayed in the dashboards

#### Bot protection tools

![schema\_1](/files/KSnOowxGaKtf4o2U0WwJ)

Some solutions specialize in **bot detection and protection.** They protect your website from bot traffic.

As these solutions detect bots before they reach the website (see drawing), they can prevent bots from loading any page and therefore from affecting the CMP analytics data.

For more information, see solutions such as Datadome, Human, Cloudflare, Netacea, etc.


