Here, we look ahead to 2018 – and specifically what the future has in store for some of the major HPC suppliers…
Ready Player One
The widely praised Ryzen CPU release saw AMD re-enter the consumer CPU market as a strong competitor to Intel, and perhaps even the front-runner where cost optimisation is a more important metric than all-out performance. The Vega GPUs have similarly been praised for offering a strong challenge to Nvidia in the consumer graphics market. However, the popularity of AMD GPUs for mining ASIC-resistant cryptocurrencies has seemingly had a knock-on effect on availability, and perhaps created some strategic confusion.
This new use-case presents a certain difficulty for AMD – while I’m sure they appreciate the sales cryptocurrencies are driving, the market is fantastically volatile and therefore difficult to factor into technical roadmaps. GPU manufacturers and suppliers may be tempted to devote their production capability to these consumer devices in an effort to make hay while the sun shines, but where will that leave them when Ethereum and other coin prices go south? At best, they will be stuck with a glut of mid-range cards and little following in the high-end consumer or enterprise GPU markets.
The Epyc line of CPUs is a different story. Coming along at just the right time to sandwich between “old faithful” Intel Xeons and various ARM upstarts, Epyc gives customers the option to shift towards more data-centric computing (more memory bandwidth and more PCIe lanes for those shiny new NVMe drives) without gambling on a new software ecosystem.
If Epyc and subsequent models are targeted to the right customers and the right applications, 2018 could be the year that AMD truly return to the stage and make a big impact in the datacentre. If not, they might be doomed to remain “Altcoin Mining Devices” for a while longer.
Ender’s Game
Between their continued efforts to drive GPU adoption in traditional HPC and their dominance of the machine learning segment, it is easy to see why Nvidia are widely regarded as the biggest success story of recent years. The problem they now face is – where to go from here?
We saw a glimpse of what might come next from Nvidia with the tensor cores introduced in the V100 – since they realistically have nowhere to go on the silicon area front, the most logical next step would be to introduce more specialised processing elements to deliver the performance boosts customers have come to expect.
The interesting part will be seeing whether Nvidia stick with their current “one chip to rule them all” strategy for the datacentre, sharing the same silicon between traditional HPC use-cases and machine learning applications. It has worked well for them so far, but it’s easy to see how specialised accelerators might start to take the wind out of their sails in the next year or two. Will we therefore see Nvidia split their product line? Balancing HPC and ML accelerator quantities might make procurements a bit trickier, but as long as the same software stack can be used by both products I’m sure Nvidia won’t mind pointing their customers in the right direction – as long as it’s not towards their consumer cards, of course…
Inception
UK-based Graphcore and various other new hardware companies are poised to disrupt the trendiest bit of the market with ML-focused accelerators. With the sun setting on Moore’s Law and no replacement for silicon on the cards yet, it is natural to expect a return to heterogeneous, highly specialised computing by way of accelerators. Machine learning is an excellent target for this sort of push, thanks in part to the hyped-up expectations which are set whenever someone utters the term “artificial intelligence”.
The same trend is gathering pace in the quantum computing world – where previously D-Wave stood alone as more of a curiosity, chip designers are now openly pushing towards demo-capable hardware. While we don’t expect to see any major leaps in the technology this year, now is the time for users to take a closer look at what might be possible down the line, and familiarise themselves with the programming approaches which are being trialled.
The Empire Strikes Back
2017 was not a good year for Intel, at least not on the morale front. In the traditional CPU space, the combination of eye-watering Skylake prices, the return of AMD, and ARM finally moving into the enterprise mainstream has meant that customers may now be ready to accept their part in supporting a competitive ecosystem by buying from elsewhere. As we move into 2018 Intel might have hoped to put their troubles behind them, but recent reports have revealed a crippling bug in their CPUs, leaving users stuck between a fundamental security flaw in their systems and the performance hit which results from the needed kernel fix.
While the CPU market is not likely to be kind in 2018, the coming year sees Intel poised to release their Nervana AI chip in an effort to push back against the gains Nvidia have made in the datacentre off the back of the massive growth in machine learning usage. As well as keeping up with the Joneses on the ML/AI front, an opportunity for Intel to achieve some much-needed differentiation (or vendor lock-in, depending on how cynical you are feeling) is coming in the form of XPoint DIMMs. Will these new products be enough to keep Intel at the fore of HPC, or is this diversification more of a desperate scramble to fend off an impending decline?
The World’s End
Will this be the year that cloud vendors get serious about HPC and (if you ask some people) ruin everything?
Until now, cloud folks have proudly talked up the individual projects which they have been able to run on their platforms, but they have made little serious effort to eat into the on-premise HPC market. This shouldn’t be too surprising; for many vendors, HPC is a vanity exercise which delivers little to no profit relative to their enterprise sales. Unfortunately for the research computing crowd, the economics of the centralising shift driven by enterprise are likely to catch up with us sooner rather than later.
Traditional HPC will, of course, stick around for quite some time yet – none of the big cloud providers have the right combination of billing strategy, hardware suitability and ease-of-use features to make the transition happen quickly. Frustratingly, none of the major vendors seem to have a coherent plan for how to fix their problems and get your business, a situation which has led to the emergence of various HPC-as-a-Service offerings such as those from IBM, Verne Global and CPU 24/7. These dedicated solutions side-step some of the common criticisms levelled at cloud usage for research (virtualisation overheads, charging for data transfer), but their specialism means that costs are likely to be just as unpalatable for the time being.
2018 might be the year of cloud HPC… but probably only if you are willing to use a loose definition of cloud, and a loose definition of HPC. Before public cloud providers can really take over, there are some structural challenges which need to be dealt with; at present, research council funding models aren’t a good fit for OpEx spending, effectively locking the HEI sector into their own datacentres. UK university HPC managers are already expressing their struggles to cope with the operational cost liabilities associated with large capital grants, such as EPSRC Tier-2 funding – until this issue is resolved, cloud HPC is likely to stay targeted at engineering and pharmaceutical industry customers, who have more freedom in managing their money.
Even if a good approach to funding HPC as an operational expense is achieved, there remains the worry that drawing a straight line between each core-hour or terabyte-month and an invoice item will discourage support for “blue skies” speculative research activities, which until now have been hidden inside the utilisation statistics of on-premise HPC centres, away from the prying eyes of accountants and CFOs. Expect to hear more about this over the coming year, as academic staff and service managers try to defend their on-premise purchases from the higher-ups who fear being left behind by the inevitability of the cloud. Executives, managers and users could all benefit by taking in a broader perspective – something we hope to offer throughout 2018.
Chris Downing
Senior Consultant – Red Oak Consulting