On Jan. 26, 2015, a little before 3 pm, some of the student workers on the TE 1.0 system told us that the system had slowed to a trickle. A few minutes later, the systems group hosting the TE 1.0 Web site reported that the HTTP request rate, the associated request wait times, and the database query times had all increased sharply. For individual users, request wait times had grown so long that, as far as they were concerned, the system had stopped altogether.
Although symptoms such as these can have a variety of causes, one likely candidate is a so-called Denial of Service (DoS) attack. The underlying problem is overload: requests for service arrive at a machine faster than the machine can serve them. In process management language, the arrival rate outstrips the service rate. Every capacity-constrained system behaves this way, whether customers arrive at a restaurant faster than they can be served or cars arrive at a highway on-ramp faster than the highway can absorb them and move them along. The result is longer wait times or, more precisely, longer queueing times. Because the requests sit in the queue waiting to be served, whoever issued them has the impression that the entire service has halted, just as the driver of a car stuck in a traffic jam has the impression that the road is blocked and that vehicles ‘requesting’ passage are not being served at all. When this principle is abused, and large numbers of requests are deliberately directed at a service to overwhelm its capacity, we call it a DoS attack. If these requests arrive from a large number of different machines, we speak of a Distributed DoS (DDoS) attack.
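To make the arrival-rate versus service-rate argument concrete, here is a small Python sketch based on the textbook M/M/1 queue. That model (Poisson arrivals, exponentially distributed service times, a single server) is our illustrative assumption, not a description of how the TE server actually behaved. For an arrival rate λ below the service rate μ, the mean time a request spends in the system is 1/(μ - λ); as λ approaches μ that time grows without bound, and once λ ≥ μ the queue never settles. The function names and the 100 requests-per-second capacity are hypothetical.

```python
import math


def mm1_mean_time_in_system(arrival_rate, service_rate):
    """Mean time (seconds) a request spends queueing plus being served in an
    M/M/1 queue. Returns math.inf when arrivals match or exceed capacity,
    because the queue then grows without bound."""
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative and service_rate > 0")
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)


def backlog_after(seconds, arrival_rate, service_rate):
    """Fluid approximation of the requests left waiting after `seconds` when
    arrivals steadily exceed the service rate (zero otherwise)."""
    return max(0.0, (arrival_rate - service_rate) * seconds)


# Hypothetical server that can handle 100 requests per second.
for load in (50, 90, 99, 100, 150):
    print(load, mm1_mean_time_in_system(load, 100))
```

With this hypothetical capacity, a load of 50 requests per second gives a mean time in system of 0.02 s, while 99 requests per second already gives a full second. At 150 requests per second the fluid approximation leaves 3,000 requests waiting after one minute, which is exactly the ‘halted system’ users perceive.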
As our Jan. 26th, 2015 DoS incident illustrates, however, not all DoS instances originate as targeted attacks.
Because our TeachEngineering server handled a variety of protocols, a DoS could have targeted any of them. We started by looking at its Apache Web server log to see whether it held clues about what was going on. The following is a random one-second excerpt from those logs:
126.96.36.199 - - [26/Jan/2015:10:00:40 +0000] "GET /announce.php?info_hash=%A8%82c%F92%7F9%150%A9%112%10%CF%0C%0E%D8d%87s&peer_id=%2DSD0100%2D%EC%9D%CB%BD%A7%2D%E5%EBw%A2F%BD&ip=188.8.131.52&port=13337&uploaded=1005870892&downloaded=1005870892&left=706329&numwant=200&key=2497&compact=1&event=started HTTP/1.0" 302 587 "-" "Bittorrent"
184.108.40.206 - - [26/Jan/2015:10:00:40 +0000] "GET /announce.php?info_hash=%B8%A7%7B%11%9D%F2m%E8%EE%92%A8%DA%2Dxy%11%94%F8Z%E9&peer_id=%2DUT3000%2D0%1C%D5%23%3A%92%5B%B0%BC%2ExO&ip=192.168.1.104&port=1080&uploaded=0&downloaded=0&left=289742100&numwant=200&key=644621065&compact=1&event=started HTTP/1.0" 302 578 "-" "Bittorrent"
220.127.116.11 - - [26/Jan/2015:10:00:40 +0000] "GET /announce.php?info_hash=%8F%A6%81%3A%B7%2C%C1%C8%D1v%25%F8%B75Z%D2I%84%07H&peer_id=%2DSD0100%2D%C3%26v%06%94%DB%29%CA%DD%84%C7%7B&ip=18.104.22.168&port=19678&uploaded=125304832&downloaded=125304832&left=878468806&numwant=200&key=31199&compact=1 HTTP/1.0" 302 575 "-" "Bittorrent"
22.214.171.124 - - [26/Jan/2015:10:00:40 +0000] "GET /announce.php?info_hash=%DC%C9%A9%2Cwl%ED%7F%0Fmm%21p%D1%01%0C7%16%EFk&peer_id=%2DSD0100%2D%9CkT%08%92%F2%CC%A8%AC%9E%00%7B&ip=192.168.21.52&port=8123&uploaded=23068672&downloaded=23068672&left=814367656&numwant=200&key=19228&compact=1 HTTP/1.0" 302 566 "-" "Bittorrent"
126.96.36.199 - - [26/Jan/2015:10:00:40 +0000] "GET /announce?info_hash=%B5%F6%9CL%BF%A4%D0%2D%08%D7%13%070k%9C%80%29M%BA%BA&peer_id=%2DSD0100%2D%21%B3Q%22O%BF%28%22%B5%8F%40q&ip=188.8.131.52&port=13318&uploaded=1062366471&downloaded=1062366471&left=1040028409&numwant=200&key=5854&compact=1 HTTP/1.0" 302 572 "-" "Bittorrent"
184.108.40.206 - - [26/Jan/2015:10:00:40 +0000] "GET /announce?info_hash=z6%D5%97g%C1%9CIS%9B%10%F4%C0%8B%C1%99%AF%09g%C3&peer_id=%2DSD0100%2D%D1%D4%E9%CF%1Ay%86%C7z%8F%95%F1&ip=220.127.116.11&port=11338&uploaded=7219559940&downloaded=7219559940&left=15781004769&numwant=200&key=17705&compact=1 HTTP/1.0" 302 573 "-" "Bittorrent"
18.104.22.168 - - [26/Jan/2015:10:00:40 +0000] "GET /announce?info_hash=%A1%D0%BDy%FA%D7%27u%0A%96%8D%FDSb%EB%BF%8C%F3%EC%AD&peer_id=%2DSD0100%2D%1AS%8356%28%06%AEe%05%E7%E6&ip=192.168.1.99&port=13897&uploaded=149680706&downloaded=149680706&left=72221773&numwant=200&key=30556&compact=1 HTTP/1.0" 302 563 "-" "Bittorrent"
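The log entries are easy to parse. They match Apache’s standard ‘combined’ log format:
IP_address_of_the_requester - - [date_and_time_of_the_request] "HTTP_request_method /requested_file?URL_parameters HTTP_version" HTTP_response_code number_of_bytes_returned "referrer" "user_agent"
The two dashes after the IP address stand for the unused remote-logname and remote-user fields, and the "-" in the referrer position means that no referrer was sent.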
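When the log runs to thousands of such lines per second, parsing them by hand is not an option. Below is a minimal Python sketch of a parser for the combined format, using a regular expression for the fixed fields and the standard library’s urllib.parse for the URL parameters. The names COMBINED and parse_entry, and the sample line (a shortened version of the first entry in the excerpt), are our own illustration rather than tooling we actually used.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" log format:
# host ident user [time] "method target protocol" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entry(line):
    """Split one combined-format log line into a dict of fields, or return
    None if the line does not match the format."""
    m = COMBINED.match(line)
    if m is None:
        return None
    entry = m.groupdict()
    url = urlsplit(entry["target"])
    entry["path"] = url.path
    # parse_qs maps each parameter name to a list of (percent-decoded) values.
    entry["params"] = parse_qs(url.query)
    return entry


# Shortened version of the first entry in the excerpt above.
sample = ('126.96.36.199 - - [26/Jan/2015:10:00:40 +0000] '
          '"GET /announce.php?info_hash=%A8%82c&port=13337&numwant=200'
          '&event=started HTTP/1.0" 302 587 "-" "Bittorrent"')
entry = parse_entry(sample)
print(entry["ip"], entry["path"], entry["params"]["port"], entry["agent"])
```

The info_hash value is raw binary that has been percent-encoded, so parse_qs decodes it with replacement characters; if the actual hash matters, parse_qs(url.query, encoding="latin-1") preserves the bytes one-to-one.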
Looking at this series of requests, things started to become clear. The machines making the requests were all based in China (you can trace the geographic location of IP addresses at www.yougetsignal.com/tools/visual-tracert), and every request was an ‘announce’ request (to /announce or /announce.php) from software identifying itself as BitTorrent, a well-known file-sharing protocol.
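Spotting the pattern by eye works for a one-second excerpt, but for a full day of logs a quick tally is more convincing. The Python sketch below is a hypothetical helper, not the tool we used: it counts user agents and requested paths (the part of the URL before the ‘?’) across log lines in the combined format described above, assuming that no quoted field contains an escaped quote.

```python
from collections import Counter


def tally(lines):
    """Count user agents and requested paths in combined-format log lines.
    Lines that do not have the expected quoted fields are skipped."""
    agents, paths = Counter(), Counter()
    for line in lines:
        parts = line.rstrip().split('"')
        # Three quoted fields (request, referrer, agent) give 7 pieces.
        if len(parts) < 7:
            continue
        request, agent = parts[1], parts[-2]
        fields = request.split(" ")  # METHOD TARGET PROTOCOL
        if len(fields) < 2:
            continue
        paths[fields[1].split("?", 1)[0]] += 1
        agents[agent] += 1
    return agents, paths


# Shortened lines in the style of the excerpt above.
log = [
    '126.96.36.199 - - [26/Jan/2015:10:00:40 +0000] "GET /announce.php?port=13337 HTTP/1.0" 302 587 "-" "Bittorrent"',
    '184.108.40.206 - - [26/Jan/2015:10:00:40 +0000] "GET /announce?port=11338 HTTP/1.0" 302 573 "-" "Bittorrent"',
]
agents, paths = tally(log)
print(agents.most_common(3))
print(paths.most_common(3))
```

On a healthy day the top user agents would be ordinary browsers and the top paths TeachEngineering pages; a tally dominated by a single agent and an /announce path points straight at the cause.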
From this we concluded that somehow, most likely by accident but possibly on purpose, our TeachEngineering machine had become registered as part of the BitTorrent file-sharing network, and that we were being flooded with BitTorrent requests.
Once we had concluded that the problem was caused by a flood of BitTorrent requests coming from China, we had to decide on a remedy. The easiest and most obvious course of action would have been to block all BitTorrent requests. This would probably have worked just fine, but since we did not know whether the inclusion of our IP address in the BitTorrent network was accidental or purposeful, we erred on the side of caution. We reasoned that if the attack was purposeful, blocking the requests might anger (or challenge) the perpetrator(s), and we might then become the target of more vicious attacks. Hence, rather than blocking the requests, we decided to ‘deflect’ them: we answered each one with an empty page and an HTTP 200 (OK) response code. Because serving these ‘null’ pages took very little effort, the server’s service rate rose well above the arrival rate, the queue drained, and the problem was solved for our users.
An alternative, and in hindsight perhaps preferable, strategy would have been to have the server reply with an HTTP 404 (Not Found) or 410 (Gone) error. A 404 tells the requester that the requested resource cannot be found, whereas a 410 indicates that the resource is no longer available and that this condition is likely to be permanent.
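We do not describe here exactly how the deflection was configured on our Apache server. As an illustration only, the following Python sketch shows the same idea as WSGI middleware: requests that look like BitTorrent announces get an empty response with a configurable status, and everything else passes through to the real application. The names deflect_announces and site are hypothetical; the status argument defaults to 200 OK, mimicking our deflection, while passing "404 Not Found" or "410 Gone" gives the alternative strategy.

```python
def deflect_announces(app, status="200 OK"):
    """Wrap a WSGI application so BitTorrent announce requests get an empty
    reply with the given status instead of reaching the application."""
    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        agent = environ.get("HTTP_USER_AGENT", "")
        # Matches both /announce and /announce.php, as seen in the log,
        # plus anything whose user agent is exactly "Bittorrent".
        if path.startswith("/announce") or agent.lower() == "bittorrent":
            start_response(status, [("Content-Type", "text/plain"),
                                    ("Content-Length", "0")])
            return [b""]  # empty body: almost no work for the server
        return app(environ, start_response)
    return wrapper


def site(environ, start_response):
    """Stand-in for the real application (hypothetical)."""
    body = b"TeachEngineering page"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


application = deflect_announces(site)               # deflect with 200 OK
# application = deflect_announces(site, "410 Gone")  # the 404/410 alternative
```

For a quick local test, the standard library’s wsgiref.simple_server.make_server("", 8000, application) can serve this application. The design point is the same as in our fix: the cheaper the reply to an unwanted request, the higher the effective service rate.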
What most likely happened
We have since learned that our TeachEngineering machine was very likely caught up in an incident in which Turkish and Chinese DNS servers performed a kind of DNS spoofing, a process that replaces a server’s IP address with another, unrelated one. The following is from a January 2015 blog post by Jamie Zawinski on jwz.org:
“After a bit of logging and searching I found out that some Chinese ISP (probably CERNET according to the results of whatsmydns.net) and some Turkish ISP (probably TTNET) respond to dns queries such as a.tracker.thepiratebay.org with various IPs that have nothing to do with piratebay or torrents. In other words they seem to do some kind of DNS Cache Poisoning for some bizarre reason.
So hundreds (if not thousands) of bitTorrent clients on those countries make tons of ‘announces’ to my web servers which result pretty much in a DDoS attack filling up all Apache’s connections.
So basically, entire countries’ worth of porn hounds randomly start hammering on my server all at once, even though no BitTorrent traffic has ever passed to or from the network it’s on, because for some unknown reason, the now-long-defunct piratebay tracker sometimes resolves to my IP address. Hooray.”
TE 2.0 and DoS?
Although there is no reason to expect TE 2.0 to be any less likely than TE 1.0 to be hit by a DoS attack, whether deliberate or accidental, it does seem reasonable to expect that hosting it in Microsoft’s Azure cloud gives it better, more robust protection than TE 1.0 had when it was hosted at the university. Without suggesting that the protection provided by a university lab or service is inherently insufficient, it stands to reason that large cloud service providers put a lot of effort into keeping their tenants safe from the vagaries of today’s Internet. Hence, it may be a good idea to host services such as TE in an environment that puts a premium on security.