published: 11/09/21 (dd/mm/yy)
updated: not yet

(sorry for my bad English)

Here are my logs from today (at 10:42 UTC+2):

user@fx160:/var/log/apache2$ sudo cat other_vhosts_access.log | grep l3m.in:443 | wc -l
> 819
user@fx160:/var/log/apache2$ sudo cat other_vhosts_access.log | grep l3m.in:443 | grep "bot" | wc -l
> 319
user@fx160:/var/log/apache2$ sudo cat other_vhosts_access.log | grep l3m.in:443 | egrep -v "bot" | wc -l
> 500

More than a third of all the visits I got on my website today were from bots!

If I take another day (before misc.l3m.in/txt/github.txt made it to the front page of Hacker News [0]), the numbers are even worse:

[0]: https://news.ycombinator.com/item?id=28468977

user@fx160:/var/log/apache2$ sudo zcat other_vhosts_access.log.3.gz | grep l3m.in:443 | wc -l
> 651
user@fx160:/var/log/apache2$ sudo zcat other_vhosts_access.log.3.gz | grep l3m.in:443 | grep "bot" | wc -l
> 403
user@fx160:/var/log/apache2$ sudo zcat other_vhosts_access.log.3.gz | grep l3m.in:443 | egrep -v "bot" | wc -l
> 248

Of all the visits to my website that day, almost two thirds were from bots! (Thank you to the real humans visiting my website, btw.)

Here is a list of the URLs found in the bots' user-agent headers, and the number of requests each one made through the day:

https://webmaster.petalsearch.com/site/petalbot (250!!!!)
https://www.seokicks.de/robot.html (43)
http://www.google.com/bot.html (42)
http://ahrefs.com/robot/ (22)
http://www.bing.com/bingbot.htm (21)
http://mj12bot.com/ (5)
https://intelx.io (5)
http://www.apple.com/go/applebot (4)
http://www.semrush.com/bot.html (3)
nbertaupete95(at)gmail.com (2) (both requests were made to /robots.txt on the same day, about 3 hours apart; I found this bot listed on http://stopbadbots.com/bots-table/page/6/?letter=N)
http://yandex.com/bots (2)
http://sur.ly/bot.html (2)
http://go.mail.ru/help/robots (2)
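A pipeline like the one below can produce that kind of list. It is only a rough sketch: it assumes the default Debian/Apache other_vhosts_access.log format, where the user-agent is the sixth double-quoted field, and it only catches bots that put a URL in their user-agent (so the email-only one above would not show up). Adjust the awk field index and the grep pattern if your LogFormat differs.

# print the user-agent field of every l3m.in:443 request, keep the URLs
# embedded in it, then count how many times each URL appears:
sudo awk -F'"' '/l3m\.in:443/ {print $6}' other_vhosts_access.log \
  | grep -oE 'https?://[^);+ ]+' \
  | sort | uniq -c | sort -rn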
They continuously index my websites, making query after query to my self-hosted server, but when I search their engines for anything related to the content of my websites, I struggle to find it. My websites are too small to be ranked on the first page of Google, Bing or Yandex (or even Petal Search), except when you copy/paste the title of an article (try "Modifier les icônes des dossiers de raccourcis sous Firefox 82"; I only found it on Google). Why then spam my server with requests? I think some of these crawlers sell (paid) SEO-related services, and thus make money from the content of my websites.

Since my blog is in French and mainly about tech-related things, it has a "niche" audience (lots of French techies search for things in English), but it still struggles to rank (even the pages about my own projects!).

For example, I made a tool called "Django Check SEO". It's a django/django-cms module that you just add to your website (no need for a database or anything): you visit the related URL, and it gives you a list of problems and advice. IMHO it's a cool tool, it was missing from the django/django-cms ecosystem, and it may help people better understand all this SEO-related crap. I made it while working at Kapt, and if I search "Django Check SEO" on (let's pick one) Google, it returns this list of websites:

1) the (fr) article on my company's website
2) the (en) github repo
3) my (en) article on dev.to
4) the (fr) page of the project on my own website

Not bad. And here are the results if I test from another location:

1) github repo
2) dev.to post
3) company website
4) djangopackages.org
5) snyk.io (wtf? an automatically generated page?!)
6) forum.djangoproject.com
7) my website

My website comes after a page that displays the health of a Python package? I followed every piece of advice I could find to build a performant website, I got 98/100/100/100 on web.dev/measure, and I wrote "human content" (not one of those 30k-word pages that *don't even answer* the question that brought visitors there), yet my website isn't in the top 3 results for a French query, and is 7th for an English one.

... Well, being first for every result related to something I did is not my goal, and appearing on the first page is good enough. This rant is about the number of robots that crawl my websites every day but don't do anything useful with the data (at least not in public).

A few years ago, I posted links on social networks and people visited my websites, but the pages were not really ranked on any search engine. Today I don't post on social networks anymore; search engines crawl my websites, and nobody views the content of my sites anymore, because people won't find my websites through search engines if they don't already know what to search for.

I don't know if you do the same thing, but when I visit an article, I always end up visiting some other pages on the site: the root page of course, but also the projects page, the archive, some other articles. Sometimes when I do this, I visit multiple websites just by following hyperlinks, like we used to do 20 years ago (:P), without even searching "name of the website" on Google to find more pages.