Block detail

In your settings, you can choose the following kinds of block:

Hard block

Your site returns 403 Forbidden to IP addresses associated with a given bot or company.

Hard block honors the indexed directory exclusion (see below).

Generally, only the IP addresses associated with the crawler bots themselves are blocked. For Google and Amazon, however, all of their addresses are blocked entirely, including Google Cloud, AWS, and their other services.

The "Others" include the following lists: ahrefsbot betteruptimebot bunnycdn cloudflare duckduckbot facebookbot freshpingbot imagekit imgix marginalia mojeekbot molliewebhook outageowl pingdombot rssapi stripewebhook telegrambot twitterbot uptimerobot webpagetestbot.

When hard block is active, soft block is also applied, just in case.
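The hard-block behavior described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation; the CIDR ranges are placeholders, and the real blocklists are maintained per bot or company:

```python
from ipaddress import ip_address, ip_network

# Hypothetical blocklist: CIDR ranges associated with a blocked bot/company.
# Real ranges are maintained by the service; these are documentation examples.
BLOCKED_NETWORKS = [
    ip_network("192.0.2.0/24"),
    ip_network("198.51.100.0/24"),
]

def should_hard_block(client_ip: str, path: str) -> bool:
    """Return True if the request should receive 403 Forbidden.

    Requests for the indexed folder are never hard-blocked,
    per the exclusion described below.
    """
    if path == "/indexed" or path.startswith("/indexed/"):
        return False
    addr = ip_address(client_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)
```

A request from a blocked range to a normal page gets a 403, while the same client can still fetch anything under /indexed.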

Soft block

Currently implemented as a standard robots.txt. Essentially, your site asks the specified robots to ignore everything except the indexed folder (see below).
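The generated file might look roughly like this (the user-agent name here is only an example; the actual names and layout depend on which bots your settings block):

```
User-agent: ExampleBot
Allow: /indexed/
Disallow: /
```

Well-behaved crawlers that match the user-agent line will then skip everything outside /indexed/.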

Soft block helps your site avoid suspicion of what search companies call "deceptive hiding", which matters if you care about their rankings.

Caution: a single robots.txt is served for the whole site. Any robots.txt you upload among your files will never be used, because the generated version (controlled by your settings) takes precedence.

Caution: AI companies, especially at an early stage, are known to ignore robots.txt. The battle for user data has gone so far that many companies try to circumvent IP blocklists, let alone honor robots.txt. Details: [1], [2].

Exclusion

A special folder named indexed is created in the root of your file tree. Crawling is allowed inside this folder.

For example, suppose Facebook is blocked via the "Others" radio button in your settings. You can still place a social preview for Facebook in the indexed folder, bypassing the block.
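A preview page of that kind might look like the sketch below (the filename and contents are hypothetical; any page under indexed/ remains reachable by crawlers):

```html
<!-- indexed/preview.html: crawlable despite the block -->
<html>
  <head>
    <meta property="og:title" content="My page" />
    <meta property="og:image" content="https://example.com/indexed/preview.png" />
  </head>
  <body>Preview</body>
</html>
```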