My name is Bruce Dubbs. I am the managing editor for the LinuxFromScratch project. This project produces books that show users how to build packages from source and has been in existence for about 25 years. In total, we cover a little over 1000 open source packages. It is meant primarily as an educational resource but has also become a pseudo distribution.
As a part of the project we developed a set of scripts to go out to the internet and look at web pages to determine the latest version of each package. A lot of those pages are at download.gnome.org.
Recently (for about a week), page requests from our server seem to have been blocked. I can only speculate about why. I suspect it may be because we are making 100+ page requests in about a 30-minute period once a day.
I hope this is the right place to bring up this issue. If not, please direct me to the proper place.
@bdubbs would you mind providing more information on which URIs you fetch daily? We recently moved download.gnome.org to a CDN which we have been using for the last few years (see GNOME Mirroring system updates), so there may be DDoS prevention mechanisms applied at that level now.
I just re-ran my scripts and the only issues I still have are with gedit, gtk-vnc, libadwaita, libnotify, and pango. That’s a great improvement over what I had a few days ago. I need to investigate what happens with those packages to double-check whether it is my script. I’ll try to get more info and report back later today.
OK, I think I have the problem solved. When you made the change I was getting some errors in my wget output about redirecting to the CDN site on port 8000. It then tried 3 different IP addresses but got connection refused for all of them. That problem has gone away.
When I just checked, I was able to solve the remaining problems. Evidently the output format for directories changed. I admit my scripts are fragile but I can handle that.
I am still having problems accessing download.gnome.org. In order to have our scripts determine the latest version of gnome packages, we need to make two, and sometimes three, directory queries.
For instance, when checking gnome-bluetooth, we first get the directory /sources/gnome-bluetooth/ and determine the latest sub-directory, in this case 47. Then we get /sources/gnome-bluetooth/47. If that directory only has alpha or beta tarballs, we back up to the 46 sub-directory.
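The lookup described above can be sketched roughly as follows. This is only an illustration in Python, not the project's actual scripts. The helper `list_dir`, the `.tar.xz` naming pattern, and the listing data in the demo are all assumptions made up for the example; a real `list_dir` would fetch and parse the directory page under /sources/<package>/.

```python
import re

# Assumption: pre-release tarballs carry "alpha" or "beta" in the version,
# e.g. gnome-bluetooth-47.beta.tar.xz. Add other markers if needed.
PRERELEASE_MARKERS = ("alpha", "beta")
NUMERIC_RE = re.compile(r"\d+(\.\d+)*")


def numeric_key(name):
    """Sort key for dotted numeric names such as '47', '46.1' or '3.34'."""
    return tuple(int(part) for part in name.split("."))


def stable_versions(package, filenames):
    """Return the stable versions found in one sub-directory listing."""
    # Assumption: tarballs are named <package>-<version>.tar.xz.
    pattern = re.compile(re.escape(package) + r"-(.+)\.tar\.xz$")
    found = []
    for filename in filenames:
        match = pattern.match(filename)
        if match is None:
            continue
        version = match.group(1)
        if any(marker in version for marker in PRERELEASE_MARKERS):
            continue
        # Keep only purely numeric versions so they can be compared.
        if NUMERIC_RE.fullmatch(version):
            found.append(version)
    return found


def latest_stable(package, list_dir):
    """Walk sub-directories from newest to oldest.

    list_dir is a hypothetical callable: given a path relative to
    /sources/<package>/ ("" for the top level), it returns the entry
    names in that directory. Only as many sub-directories are listed
    as needed to find a stable tarball.
    """
    series = [name for name in list_dir("") if NUMERIC_RE.fullmatch(name)]
    for name in sorted(series, key=numeric_key, reverse=True):
        versions = stable_versions(package, list_dir(name))
        if versions:
            return max(versions, key=numeric_key)
    return None


# Made-up listing data standing in for the real directory pages.
fake_listing = {
    "": ["46", "47", "cache.json"],
    "46": ["gnome-bluetooth-46.0.tar.xz", "gnome-bluetooth-46.1.tar.xz"],
    "47": ["gnome-bluetooth-47.alpha.tar.xz", "gnome-bluetooth-47.beta.tar.xz"],
}
print(latest_stable("gnome-bluetooth", fake_listing.__getitem__))  # prints 46.1
```

Passing the directory lister in as a function keeps the version logic separate from the network access, so a change in the directory page format would only affect the lister.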
My current problem is that these directory queries are very slow. When I checked just gnome-bluetooth, it took 17 seconds. In some cases we get a timeout after 30 seconds.
If there was only one package to check it would not be a huge problem, but we do this for about 140 different gnome packages.
I just migrated download.gnome.org to the new infrastructure; it should be stable now. I’m discussing internally how to mitigate what’s going on with the old infrastructure, as we still need some additional time to finalize the migration of all the remaining services. I’ll keep you guys posted as usual, thanks!