cURL + Cloudflare = Information leak - But why?

To be honest, I really don't know. But I'm pretty sure it will be figured out.

TL;DR - Don't enable Cloudflare Crawler Hints if you don't want all of your URLs exposed to search engines.

The story

I created a few unique URLs meant only for specific friends. I knew I hadn't shared the URLs outside a very limited crowd of tech nerds. But when I later reviewed the logs, I found something alarming: the URLs were public knowledge, even though I'm quite sure nobody shared them, at least not intentionally. How did this happen? Why was information about private URLs being shared with the Bing and Yandex search engines?

I started narrowing the case down, quickly checking the server access logs and questioning my friends. I also noticed that some of the leaks happened so soon after a URL was created that my friends were unlikely to have had time to even react to the shared URL. OK? Maybe some URL preview or something similar leaked the private unique URLs?

After thinking through potential leak sources, I excluded things like PrivateBin, SimpleX, Matrix, Fish shell, Teams, and so on. I finally got down to two possible options: cURL or Cloudflare. Strange, way strange. Both of these should be safe (?) and have a great reputation. So, whom do I dare blame for this? Daniel? Eastdakota? I don't know. I'm not even exactly sure why this is happening. But I think this case deserves a closer look.

Facts

CF = request via Cloudflare

OH = request directly to the origin host, over the Internet using HTTPS with a valid TLS certificate

Does not leak, local test with cURL:

OH - [07/May/2023:05:53:17 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/all-my-secrets-locally-with-curl HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"


Does leak, cURL + Cloudflare:

CF - [07/May/2023:05:54:07 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/all-my-secrets-curl-cloudflare HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"

CF - [07/May/2023:05:56:14 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/all-my-secrets-curl-cloudflare HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:05:56:15 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/all-my-secrets-curl-cloudflare HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

CF - [07/May/2023:05:56:34 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/all-my-secrets-curl-cloudflare HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

The rest is just more of the same.


Did leak, cURL (socks5-hostname), Cloudflare, Tor:

CF - [07/May/2023:06:25:25 +0000] "GET /unique-url-with-curl-tor-and-cf-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:25:55 +0000] "GET /unique-url-with-curl-tor-and-cf-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:25:59 +0000] "GET /unique-url-with-curl-tor-and-cf-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


Did not leak, direct request to the origin server using cURL + Tor, without Cloudflare:

OH - [07/May/2023:06:32:38 +0000] "GET /unique-url-with-curl-tor-2-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"


Did leak, secondary test over Tor with Cloudflare:

CF - [07/May/2023:06:34:20 +0000] "GET /unique-url-with-curl-tor-cf-2-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:34:29 +0000] "GET /unique-url-with-curl-tor-cf-2-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


Did not leak, PowerShell, with and without Cloudflare:

CF - [07/May/2023:06:34:35 +0000] "GET /powershell-unique-LifCIoKpe0HJR5Ua4Dws/ HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.20348.1"

OH - [07/May/2023:06:34:55 +0000] "GET /powershell-unique-without-cf-LifCIoKpe0HJR5Ua4Dws/ HTTP/1.1" 404 146 "-" "-" "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.20348.1"

CF - [07/May/2023:06:39:59 +0000] "GET /powershell-unique-with-cf-LifCIoKpe0HJR5Ua4Dws/ HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.20348.1"

CF - [07/May/2023:06:42:38 +0000] "GET /powershell-unique-with-cf-2-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.20348.1"


An intermediate conclusion, being verified:

CF - [07/May/2023:06:49:44 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/so-curl-leaks HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"

CF - [07/May/2023:06:50:01 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/but-powershell-doesnt HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT; Windows NT 10.0; en-US) WindowsPowerShell/5.1.20348.1"

CF - [07/May/2023:06:52:54 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/so-curl-leaks HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:53:06 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/so-curl-leaks HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

CF - [07/May/2023:06:53:24 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/bash-curl-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:53:28 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/so-curl-leaks HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:06:53:34 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/bash-curl-cf HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

CF - [07/May/2023:06:53:44 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/bash-curl-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"


Mullvad browser test:

CF - [07/May/2023:07:10:07 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/Mullvad-Browser-BB HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT 10.0; rv:102.0) Gecko/20100101 Firefox/102.0"


Tor-browser test:

CF - [07/May/2023:07:10:27 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/Tor-Browser-Test HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (Windows NT 10.0; rv:102.0) Gecko/20100101 Firefox/102.0"


Hacker News URL tests, with and without leaks:

CF - [07/May/2023:07:21:11 +0000] "GET /hackernews/LifCIoKpe0HJR5Ua4Dws/curl-leaks-private-urls-to-search-engines-cf HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"

OH - [07/May/2023:07:21:29 +0000] "GET /hackernews/LifCIoKpe0HJR5Ua4Dws/curl-leaks-private-urls-to-search-engines-private HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"

CF - [07/May/2023:07:23:00 +0000] "GET /hackernews/LifCIoKpe0HJR5Ua4Dws/curl-leaks-private-urls-to-search-engines-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:23:27 +0000] "GET /hackernews/LifCIoKpe0HJR5Ua4Dws/curl-leaks-private-urls-to-search-engines-cf HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

CF - [07/May/2023:07:37:33 +0000] "GET /hackernews/LifCIoKpe0HJR5Ua4Dws/curl-leaks-private-urls-to-search-engines-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"


Archive.today test:

CF - [07/May/2023:07:31:36 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/archive.today-test HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36"

CF - [07/May/2023:07:34:40 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/archive.today-test HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:35:00 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/archive.today-test HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


cURL on Windows doesn't leak:

CF - [07/May/2023:07:35:32 +0000] "GET /WS-Windows/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-Windows-cURL HTTP/2.0" 404 146 "-" "-" "curl/8.0.1"


Fedora leaks:

CF - [07/May/2023:07:37:34 +0000] "GET /A/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-fedora-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:37:54 +0000] "GET /A/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-fedora-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:38:28 +0000] "GET /A/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-fedora-cf HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


Gentoo & Debian leaks:

CF - [07/May/2023:07:38:31 +0000] "GET /S/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-gentoo-cf HTTP/2.0" 404 146 "-" "-" "curl/7.88.1"

CF - [07/May/2023:07:40:11 +0000] "GET /D/LifCIoKpe0HJR5Ua4Dws/curl-latest-debian-stable HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:40:12 +0000] "GET /S/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-gentoo-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:40:42 +0000] "GET /S/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-gentoo-cf HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:07:41:05 +0000] "GET /S/LifCIoKpe0HJR5Ua4Dws/curl-leak-test-gentoo-cf HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


Python urllib doesn't leak:

CF - [07/May/2023:08:51:30 +0000] "GET /Ubuntu/Python3/LifCIoKpe0HJR5Ua4Dws/urllib-doesnt-leak HTTP/2.0" 404 146 "-" "-" "Python 3.10.6/urllib/request"


Alma Linux doesn't leak:

CF - [07/May/2023:08:59:44 +0000] "GET /H/LifCIoKpe0HJR5Ua4Dws/almalinux-curl-leaks HTTP/2.0" 404 146 "-" "-" "curl/7.29.0"


Sample for the Disobey.fi crowd:

CF - [07/May/2023:09:23:24 +0000] "GET /disobey.fi/LifCIoKpe0HJR5Ua4Dws/information-leak-is-true HTTP/2.0" 404 146 "-" "-" "curl/7.81.0"

CF - [07/May/2023:09:25:16 +0000] "GET /disobey.fi/LifCIoKpe0HJR5Ua4Dws/information-leak-is-true HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:09:25:20 +0000] "GET /disobey.fi/LifCIoKpe0HJR5Ua4Dws/information-leak-is-true HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"

CF - [07/May/2023:09:25:46 +0000] "GET /disobey.fi/LifCIoKpe0HJR5Ua4Dws/information-leak-is-true HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"


Finally, an interesting one: the original request was never even received by the origin server, because Cloudflare blocked it. Yet it still leaked, cURL + Tor:

CF - [07/May/2023:10:19:24 +0000] "GET /tor-4-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:10:19:54 +0000] "GET /tor-4-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:10:19:58 +0000] "GET /tor-4-LifCIoKpe0HJR5Ua4Dws HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"


The query string is dropped when the information is leaked. A HEAD-only request is enough to trigger the leak:

CF - [07/May/2023:12:22:49 +0000] "HEAD /LifCIoKpe0HJR5Ua4Dws/with-query-string?is-dropped HTTP/2.0" 404 0 "-" "-" "curl/7.81.0"

CF - [07/May/2023:12:25:41 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/with-query-string HTTP/2.0" 404 170 "-" "-" "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/103.0.5060.134 Safari/537.36"

CF - [07/May/2023:12:25:57 +0000] "GET /LifCIoKpe0HJR5Ua4Dws/with-query-string HTTP/2.0" 404 146 "-" "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
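For anyone who wants to repeat these tests against their own logs, here is a minimal Python sketch (not the tooling used for this post) that flags paths later fetched by Bingbot or YandexBot. It assumes the anonymised log format shown above, with a CF/OH label in place of the client IP; the names LOG_RE, BOT_MARKERS and find_leaks are hypothetical. Because the bots request the path without the query string, it matches on the path only.

```python
import re
from collections import defaultdict
from datetime import datetime
from urllib.parse import urlsplit

# Assumed format, matching the anonymised lines in this post:
# SRC - [TIMESTAMP] "METHOD TARGET PROTO" STATUS SIZE "REFERER" "EXTRA" "USER-AGENT"
LOG_RE = re.compile(
    r'^(?P<src>\S+) - \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+) [^"]+" '
    r'(?P<status>\d{3}) (?P<size>\d+) "[^"]*" "[^"]*" "(?P<ua>[^"]*)"$'
)

# Substrings identifying the two crawlers seen in these logs.
BOT_MARKERS = ("bingbot", "YandexBot")


def parse_line(line):
    """Return (timestamp, path_without_query, user_agent), or None if unparseable."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
    path = urlsplit(m["target"]).path  # drop the query string, as the bots do
    return ts, path, m["ua"]


def find_leaks(lines):
    """Map each path fetched by a known crawler to the bots seen and the delay
    since the earliest non-bot request for that path (None if never seen)."""
    first_human = {}
    bot_hits = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        ts, path, ua = rec
        bot = next((b for b in BOT_MARKERS if b in ua), None)
        if bot is not None:
            bot_hits[path].append((ts, bot))
        elif path not in first_human or ts < first_human[path]:
            # Log lines are not guaranteed to be in order, so keep the minimum.
            first_human[path] = ts
    report = {}
    for path, hits in bot_hits.items():
        earliest_bot = min(ts for ts, _ in hits)
        start = first_human.get(path)
        report[path] = {
            "bots": sorted({bot for _, bot in hits}),
            "delay": earliest_bot - start if start is not None else None,
        }
    return report
```

For example, `with open("access.log") as f: report = find_leaks(f)` (the file name is just a placeholder) gives, per leaked path, the bots seen and how long after the first non-bot request they showed up. For a case like tor-4 above, where the original request never reached the origin, the delay comes out as None.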


Sure, I could have investigated more, but this isn't really my case; I just got quite annoyed by this info leak. Someone else can take it forward from here.

Conclusion

Everyone should draw their own conclusions. I'm sure this will raise some questions. Feel free to run your own tests and get back to me if you've got anything to share or add. I'm happy to update this post, as well as to make follow-up posts later. I'm also happy to own up if you think I did something silly. All the URLs are now behind basic auth.

Thanks to all of my tech friends who helped me by making unique test requests from different platforms and narrowing down the leak.

cURL on Debian, Gentoo, Fedora and Ubuntu leaked; cURL on Alma Linux didn't, and neither did PowerShell, Python 3 or wget. Perhaps Archive.today uses cURL, since it leaks just like cURL does? cURL on Windows doesn't leak? This is a very confusing combination.

Even though the original requesters' IPs are removed from this post, the Yandex and Bingbot IPs are valid. Those are real requests from their bots; somehow they got information about our private cURLed URLs.

Most interestingly, neither Cloudflare nor cURL alone seems to leak. AFAIK, it only happens when they are used together. Or maybe we should just conclude that cURL and/or Cloudflare should never be used for anything that could be considered confidential or non-public information?

Update #1

As expected, I received feedback under tight scrutiny, including a friendly message from Daniel and a short chat with Piru. It was suspected that my clipboard, terminal, VPN, antivirus or something else on my system was leaking the information. Therefore, I took extra steps to make clean requests: I restarted my test environment from a clean Ubuntu 23.04 server setup image and dropped into a live shell. As expected, URLs requested with cURL still leaked; with wget it didn't happen. Yet the network packet capture shows data flowing only to Cloudflare. Who knows what criteria Cloudflare uses to forward the information to search engines?

Update #2

After some discussion on Twitter, ActivityPub and Matrix, the issue was located:

Crawler Hints [Beta]

Crawler Hints provide high quality data to search engines and other crawlers when sites using Cloudflare change their content. This allows crawlers to precisely time crawling, avoid wasteful crawls, and generally reduce resource consumption on origins and other Internet infrastructure.

This seems to forward requested URLs to search engines based on some criteria. Well, I'm glad it has been located now. But sure, this could cause some issues if you don't know whether it's enabled or not. Now checking how good Cloudflare's logs are.
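Cloudflare has described Crawler Hints as being built on the IndexNow protocol, which Bing and Yandex both participate in; that would fit the fact that only those two bots showed up in these logs. The exact notifications Cloudflare sends aren't visible from the origin side, so the sketch below only builds a body in the shape the public IndexNow protocol documents for bulk submissions (host, key, optional keyLocation, urlList), to illustrate that such a notification carries the full URL path. The function name and key value are hypothetical, and nothing is sent over the network.

```python
import json
from urllib.parse import urlsplit


def build_indexnow_payload(urls, key, key_location=None):
    """Build a JSON body shaped like a public IndexNow bulk submission.
    Hypothetical helper for illustration; it does not send anything."""
    if not urls:
        raise ValueError("need at least one URL")
    hosts = {urlsplit(u).hostname for u in urls}
    if len(hosts) != 1:
        # IndexNow expects every submitted URL to belong to the given host.
        raise ValueError("all URLs must share one host")
    body = {"host": hosts.pop(), "key": key, "urlList": list(urls)}
    if key_location is not None:
        body["keyLocation"] = key_location
    return json.dumps(body)
```

Public IndexNow submissions of this shape are POSTed to an endpoint such as https://api.indexnow.org/indexnow; whatever Cloudflare does internally, a participating search engine ends up knowing the full path, which is exactly what the private URLs relied on staying secret.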

Update #3

If you care about the privacy of your unlisted URLs behind Cloudflare, make sure you don't have the Crawler Hints feature enabled. A good hunt; thanks, and sorry!

Update #4

Interestingly, the Cloudflare audit log does show me enabling Early Hints, but it doesn't show me enabling Crawler Hints, or mention it at all. Shouldn't enabling the feature be in the logs? It was confirmed on Twitter that not all configuration changes get logged. My personal opinion is that this is just inconsistent.

Something else: if you like geeky tech stuff, check out dnskv.com

2023-05-08