
Bigger, better, faster, more


I debated whether to write a humorous intro, but I’ve ultimately decided it’s more important to get succinct information out to everyone, so here’s the TLDR:
Over the next few weeks, we will migrate NearlyFreeSpeech.NET to all-new equipment and greatly upgraded network infrastructure.

  • We’re replacing our Intel Xeon servers with brand-new AMD Epyc servers.
  • All our existing file storage will be migrated from SATA SSDs to NVMe PCIe 4.0 SSDs.
  • Most of our content will be served from New York City rather than Phoenix after the upgrade.
  • Various things may be intermittently weird or slow for the next couple of weeks as we shift them around, but we’re working hard to minimize and avoid disruptions to hosted services.

NearlyFreeSpeech goes Team Red

There’s no question that Intel has been good to us. Xeons are great processors. But these days, AMD Epyc… wow. The processors aren’t cheap, but the compute performance and I/O bandwidth are outstanding. 128 PCIe 4.0 lanes? Per CPU? Except for the speedup, this change should be transparent to most people. By and large, we’ve tried to protect people from building things too specific to exact CPU models by masking certain features, but there is probably some random instruction set supported on the old machines that isn’t present on the new ones. So if you’ve done something super-weird, you may have to recompile.
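
If you’re not sure whether you’ve done something super-weird, here’s one quick way to check: compare the CPU features your binary assumes against what the machine actually reports. This is a minimal sketch, not anything we ship; it’s Linux-specific (it reads /proc/cpuinfo), and the `required` set is purely illustrative.

```python
# Minimal sketch: compare required CPU feature flags against what this
# machine reports. Linux-specific; the "required" set is illustrative.
def cpu_flags():
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

# Hypothetical example: substitute whatever your binary was actually built
# to assume (e.g., anything -march=native baked in on the old Xeons).
required = {"sse4_2", "aes", "avx"}
missing = required - cpu_flags()
if missing:
    print("Recompile needed; missing CPU features:", sorted(missing))
else:
    print("All required CPU features are present.")
```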

I don’t want to make any specific promises about performance. Between the speculative execution fixes, the security layers our system needs to protect you properly, and other overhead, these things never quite reach their maximum potential. But so far, they’re so fast!

Here’s the catch. Some ancient site plans bill based on storage space but not CPU usage. Those plans have been gone for about ten years. They were an incredibly bad deal for people who wanted to store lots of data, but they cost basically nothing if your site was tiny and used lots of CPU. That wasn’t sustainable for us. We grandfathered those sites at the time because we’ve always paid a flat rate for a fixed amount of electricity whether we use it or not, and those sites have been running on the same hardware ever since (Intel Xeon X5680s!). Neither of those things will be true going forward, so it’s the end of the road for those plans. We plan to temporarily allocate a bare minimum of hardware to those sites for a few months and then let affected people know that they’ll be migrated to current plans around the end of the year.

If you want to check this now:

  1. Go to the Site Information panel for your site.
  2. Find the “Billing Information” box.
  3. If there’s been a red-text message “($10.24/GB/Month – Legacy Billing!)” on the “Storage Class” line for the last ten years, you’re affected.

To change it, find the “Config Information” box and edit the Server Type. Pick the closest option. (If in doubt, “Apache 2.4, PHP, CGI.”)

Quoth the raven, “NVMe more!”

It’s something of a sore point that our file storage performance has always been a bit lackluster. That’s largely because of the tremendous overhead in ensuring your data is incredibly safe. Switching from SATA SSDs to NVMe will give a healthy boost in that area. The drives are much faster, and the electrical path between a site and its data will be shorter and faster. And it’ll give all those Epyc PCIe lanes something to do.

But there’s a little more to the story. To get adequate resiliency, sacrificing some performance is a necessary evil. It just flat-out takes longer to write to multiple SSDs in multiple physical servers and wait for confirmation than to YOLO your data into the write cache of a device plugged into the motherboard and hope for the best. We accept that. And we’ve always accepted that our less-than-stellar filesystem performance was the compromise we had to make to get the level of resiliency we wanted. However, we’ve always suspected we were giving up too much. It’s taken years, but we’ve finally confirmed that some weird firmware issues have created intermittent slowness above and beyond the necessary overhead.
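
To put rough numbers on that trade-off, here’s a toy model. The figures are illustrative, not measurements from our storage; the point is only that a replicated write can’t be acknowledged until the slowest replica confirms.

```python
# Toy model of the durability/latency trade-off. Numbers are made up.
LOCAL_CACHE_MS = 0.05             # write lands in a local drive's cache
REPLICA_ACK_MS = [0.8, 0.9, 1.1]  # network round trip + flush, per replica

def yolo_write_ms():
    # "YOLO" write: returns as soon as the local device buffers the data.
    return LOCAL_CACHE_MS

def resilient_write_ms():
    # Resilient write: acknowledged only when the slowest replica confirms.
    return LOCAL_CACHE_MS + max(REPLICA_ACK_MS)

print(f"local-only write: ~{yolo_write_ms():.2f} ms")
print(f"replicated write: ~{resilient_write_ms():.2f} ms")
```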

So we expect our filesystem performance to be dramatically better after the upgrade. Don’t get me wrong; it won’t be a miracle. The fastest SAN in the world is still slower than the NVMe M.2 SSD on the average gaming PC (or cheap VPS). But one keeps multiple copies of your data live at all times and does streaming backups, and one doesn’t. And it should be a hell of a lot better than it has been.

Related to this, we’ve made some structural changes to site storage that will make moving them easier and faster. That has some other benefits we care a lot about that you probably don’t, like making storage accounting super fast. It should also make some other neat things possible. But we need to explore that a little more before we announce anything.

New York, New York!

Things have changed quite a bit since we started. As much as I love Phoenix, it’s not the Internet hub it was when I lived there in the 1990s. While some benefits remain, I no longer believe it’s the best place for our service. We see dumb stuff we can’t control, like Internet backbones routing traffic for the US east coast and Europe from Phoenix through Los Angeles because it’s cheaper. New York, on the other hand, is functionally the center of the Internet. (More specifically, the old Western Union building at 60 Hudson Street in Manhattan.)

It will surprise no one that Manhattan real estate is not exactly in our budget, but we got close. And, more importantly, we are parked directly on top of the fiber serving that building. It’d cost about ten times more to shave 0.1 milliseconds off our ping times.

This change will make life demonstrably better for most people visiting hosted sites, most of whom are in the eastern US and Europe. But we’re not leaving the west hanging out to dry. We can finally do what I always wanted: deploy our own CDN. After we’re finished, traffic for customer sites will be able to hit local servers in Phoenix, New York, and Boston. Those servers will transparently call back to the core for interactive stuff but can serve static content directly, much like our front-end servers do today. That’s already tested and working. You might be using it right now.
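
For the curious, here’s a rough sketch of the edge-node logic. It is not our actual CDN code, and every name in it is invented for illustration.

```python
# Rough sketch of an edge node: serve cacheable static content locally,
# proxy everything interactive back to the core facility.
STATIC_SUFFIXES = (".html", ".css", ".js", ".png", ".jpg", ".woff2")

def handle_request(path, edge_cache, fetch_from_core):
    """Serve static files from the local POP cache; proxy the rest."""
    if path.endswith(STATIC_SUFFIXES):
        if path not in edge_cache:
            # First request for this object: pull it from the core once.
            edge_cache[path] = fetch_from_core(path)
        return edge_cache[path]  # later hits never leave the local POP
    # Interactive/dynamic requests always call back to the core.
    return fetch_from_core(path)

# Example: the second request for style.css is served entirely locally.
cache = {}
handle_request("/style.css", cache, lambda p: f"<contents of {p}>")
print(handle_request("/style.css", cache, lambda p: f"<contents of {p}>"))
```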

The new design is completely flexible. It doesn’t matter where your site is located; traffic enters our network at the closest point to the requestor, and then our system does the right thing to handle it with maximum efficiency.

It’s now technically possible for us to run your site’s PHP in New York, store your files in Boston, and have your MySQL database in Phoenix. But “could” doesn’t always mean “should.” We’re still constrained by the speed of light; a two-thousand-mile round trip on every database query would suck pretty hard. (But I’ve done it myself with the staging version of the member site. It works!) So everything’s going to New York for now.
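
Here’s the back-of-the-envelope math, assuming light in fiber covers roughly two-thirds of its vacuum speed and using the two-thousand-mile round trip from above:

```python
# Why a cross-country database round trip hurts: fiber-optic latency math.
FIBER_MILES_PER_MS = 186_282 * (2 / 3) / 1000  # ~124 miles per millisecond

round_trip_miles = 2_000  # one query's round trip between facilities
per_query_ms = round_trip_miles / FIBER_MILES_PER_MS

print(f"one cross-country query: ~{per_query_ms:.0f} ms")             # ~16 ms
print(f"a 100-query page load: ~{per_query_ms * 100 / 1000:.1f} s")   # ~1.6 s
```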

Keeping it weird

This change means we have to move all your data across the country. Sometime in the next few weeks, each site and MySQL process will be briefly placed in maintenance and migrated across our network from Phoenix to New York. For most sites, this should take less than a minute. We’ll start with static sites because they don’t have any external dependencies. Then we’ll move each member’s stuff all at once so we don’t put your MySQL processes and site software into a long-distance relationship for more than a few minutes. Once we have a specific schedule, we’ll attempt to make some information and, hopefully, some control available via the member UI to help you further minimize disruption. But our goal is that most people won’t even notice.

There may be some other weirdness during this period, like slowness on the ssh server, and you may actually have to start paying attention to which ssh hostname you use. All that will be sorted out by the time we’re done.

Some longtime members may recall the 2007 move where it took us over a day to move our service a few miles across town. At the time, we wrote, “Should we ever need to move facilities in the future, no matter how long it takes or how much it costs, we will just build out the new facility in its entirety, move all the services between the two live facilities, and then burn down the old one for the insurance money.” Oh my god, it took a long time and cost so much money, but that’s exactly what’s happening. (Sans burning down the facility! We love our Phoenix facility and hope to continue to have equipment there as long as Arizona remains capable of sustaining human life.)

Final thoughts

These changes represent an enormous investment. Thus, much like everyone else these past couple of years, we will have to pass along a huge price increase.

No, just kidding.

Our prices will stay exactly the same, at least for now. (Except for domain registration, where constant pricing fuckery at the registries and registrar remains the status quo. Sadly, there’s nothing we can do about that. Yet.) In fact, they might go down. We bill based on how much CPU time you use, and it’s likely to take less time to do the same amount of work on the new hardware.

The last few years have been pretty weird. COVID aside, NearlyFreeSpeech.NET has been keeping pretty quiet. There’s a reason for that. I’m proud of what NearlyFreeSpeech.NET is. But there’s a gap between what is and what I think should be. There always has been. And that gap is probably bigger than you think.

So I spent some time… OK, nearly three years… more or less preserving the status quo while I did a very deep dive to learn some things I felt I needed to know. And then, I spent a year paying off tech debt, like getting our UI code cleaned up and onto PHP 8 and setting up this move. So four years went by awfully fast with little visible change, all in pursuit of a long-term plan. And in a few weeks, we’ll be finished. With the foundation.

“It’s a bold strategy, Cotton. Let’s see if it pays off for ’em!”

