RFC8200 IPv6, VPN, Integration, ISP Networking, OVH Ceph, NTFS fun

  • RFC8200 IPv6 Specification - Yay, IPv6 is finally here! It was quite a long wait.
  • VPN discussion from the previous post continued, with a few arguments countered. Well. I've seen some VPN guides telling users to tick the 'ignore certificate' check box on SSTP VPN tunnels. How does that help? If the data is actually important, you shouldn't rely on HTTPS alone; you should use an additional encryption layer where messages are encrypted and signed by the application and authenticated at the receiving end. Of course using individual keys linked to the user account. Online banking is an example. Even if the HTTPS layer were totally broken, the in-application data layer security would still hold. I know which certificate to trust. The problem is that people think HTTPS is secure. No, it isn't, because the certificate trust model is totally broken. I don't trust a certificate for my domain from a random authority. I trust specifically my key with fingerprint: 7E:07:A9:4B:BA:3A:7A:AF:3D:81:C0:EA:DE:4A:E5:C4:AF:F5:E3:2B:14:D3:B4:E2:3E:01:07:09:2F:C5:F9:59. I'm often confused by guides saying that a certificate from a random authority should be trusted more than a self-signed one. I think that's an absolutely crazy tip. If someone is able to hijack the domain, they can trivially get a new SSL cert for it. But it's very much harder to get the right keys.
  • Nice article about VPN Tunneling Protocols - IPsec, SSTP, L2TP, PPTP, IKEv2, IPSEC. PPTP is documented in RFC2637. L2TP is documented in RFC2661. L2TP/IPsec is documented in RFC3193. IKEv2 is documented in RFC4306 (since obsoleted by RFC7296).
  • Classic dilemma again: the customer wants a quick and cheap integration. What about handling exceptions? And proper transactionality? I can write an integration which is guaranteed to work for decades without problems. Or I can write an integration which works as long as everything happens to go well. And if anything goes wrong, it remains broken until someone fixes it somehow. Even the failure modes aren't known, because nobody cared enough to think about them. - I personally love projects which create something like a temp file, and then crash on the next run if the temporary file already exists. Those are just typical issues with quick'n'dirty development. If anything falls outside the simplest expected operating envelope, it ends up causing a failure which prevents future operation without manual intervention.
  • SuomiCom (Finnish ISP) is having some serious high latency and packet loss issues. Annoying. (This is old stuff from backlog, maybe they suffered DDoS or there was some other serious network issue)
  • OVH is once again having problems with their Ceph storage. Extreme latencies, slowness and failed disk operations leading to repeated server crashes. - Seems to be a big problem, because this really isn't the first time. These problems are usually quite isolated. I don't really know what kind of 'storage segments' OVH is using. But usually the problems seem to affect only a certain group of servers. And in this case the affected servers are in the SBG DC. This is just the classic Amazon / Google / Microsoft, whatever case. They can always claim that everything is working, because the systems are segmented and only a small number of systems is affected at a time. That's why they can A) Show all the time that everything is ok B) Have a bunch of tickets open all the time saying that things aren't working. This particular storage system problem was much worse earlier. Actually, since my earlier posts, I've had one more crash due to the storage system. Another thing which makes this particularly annoying is the way Windows works in these situations. RAM based stuff keeps working. Which means that A) Socket connections are accepted and the server is 'available' to most low level monitoring. B) Only stuff which requires access to storage isn't working. C) This is a pretty annoying combination. But I'm sure this is nothing new for server experts around here.
  • Btw. Other service providers have unfortunately had very similar failure modes. I've seen exactly the same failures on several other providers too. It seems that Windows somehow sucks in these situations, and systems running Linux don't end up similarly partially dead. - Basically this means that when the storage subsystem sucks badly enough, all Windows instances require a manual hard reboot.
  • NTFS - Oh wow. Yet another Windows Server with totally trashed NTFS. This is awesome. Not! Three recovery options: 1) Fix NTFS / OS - Hope stuff isn't too messed up. 2) Get a new server and recover only key data from the messed up environment. 3) Blah, let's just get a new server and restore from backup. - All options are viable, depending on how serious the failure was. - These are just the things which never happen, until they do. Ugh.
  • Something different? Caihong 5 (CH-5) aka Rainbow 5 UAV / UCAV combat drone.
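
A minimal sketch of the "trust my own key" idea from the VPN item above: ignore the CA chain and accept a server only if its certificate matches a pinned fingerprint. It assumes the fingerprint quoted in that item is a SHA-256 digest over the DER-encoded certificate (it is 32 bytes long, which fits, but the post doesn't say so). The function names are made up for illustration.

```python
import hashlib
import hmac
import socket
import ssl

# Fingerprint quoted in the post. Assumption: SHA-256 over the DER certificate.
PINNED_FINGERPRINT = (
    "7E:07:A9:4B:BA:3A:7A:AF:3D:81:C0:EA:DE:4A:E5:C4:"
    "AF:F5:E3:2B:14:D3:B4:E2:3E:01:07:09:2F:C5:F9:59"
)


def normalize_fingerprint(text: str) -> str:
    """Return the fingerprint as lower-case hex without colons or spaces."""
    return text.replace(":", "").replace(" ", "").lower()


def fingerprint_matches(der_cert: bytes, pinned: str) -> bool:
    """Compare the SHA-256 digest of a DER certificate with the pinned value."""
    actual = hashlib.sha256(der_cert).hexdigest()
    return hmac.compare_digest(actual, normalize_fingerprint(pinned))


def connect_pinned(host: str, port: int,
                   pinned: str = PINNED_FINGERPRINT) -> ssl.SSLSocket:
    """Open a TLS connection that trusts only the pinned certificate."""
    # CA validation is switched off on purpose: trust comes only from the pin.
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    raw = socket.create_connection((host, port), timeout=10)
    try:
        tls = context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise
    der_cert = tls.getpeercert(binary_form=True)
    if der_cert is None or not fingerprint_matches(der_cert, pinned):
        tls.close()
        raise ssl.SSLError("certificate fingerprint does not match the pin")
    return tls
```

Unlike ticking an 'ignore certificate' box, this still rejects anyone who cannot present the exact pinned certificate, including someone holding a freshly issued cert for a hijacked domain. The trade-off is that the pin has to be updated by hand whenever the certificate is renewed.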

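The temp file complaint in the integration item above has a well-known cheap fix, sketched here under assumptions: write to a uniquely named temporary file in the target directory, then rename it over the destination. The helper name write_atomically and the file names are hypothetical, not taken from any real project.

```python
import os
import tempfile


def write_atomically(path: str, data: bytes) -> None:
    """Write data to path so that a crash never leaves a blocking temp file."""
    directory = os.path.dirname(os.path.abspath(path))
    # A unique name means a leftover from an earlier crash cannot collide.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before renaming
        # os.replace overwrites an existing destination in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up on failure so nothing needs manual intervention later.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this approach a reader of the destination sees either the old file or the complete new one, and a run that died halfway doesn't stop the next run from working.
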
2018-09-30