If you read part 1 of this series on Intel's lithography problems, you'd be forgiven for thinking that the future is all doom and gloom for them. (If you haven't, I highly recommend you do so - it's good background for this post.) The reality, however, is that's not necessarily the case: their longer-term future bears some promise. Although it's incredibly important, lithography is not the be-all and end-all of designing and manufacturing integrated circuits. Another critical part, one that has often been overlooked in the past, is packaging - essentially, how the calculations done by the transistors get transported to the outside world. Making a semiconductor device consists of fabricating circuits on wafers with lithography, slicing the wafers into dies, and then packaging those dies. Packaging protects the die from damage and lets the chip communicate with the outside world, as well as within the package itself.
One of the few things Intel management have said over the past few years that I whole-heartedly agree with is that packaging, not lithographic process, is the future. The main reason for this is the falling benefit of moving to smaller processes [1], coupled with rapidly increasing costs: an EUV machine needed for advanced processes can cost up to $125 million [2], while the R&D cost of developing smaller processes is also climbing [3].
An EUV machine from ASML, the sole supplier in the world.
What is packaging?
The best analogy for understanding packaging design is architectural history. Imagine each transistor is a human worker. Traditional lithographic improvements have focused on making the 'workers' more productive in smaller spaces – first hunter-gathering, then farming, then offices, and so on. However, we're now at the point where workers have so little room to move in their workspaces (cubicles) that it's extremely difficult to squeeze out further cost savings the traditional way. Instead, we need to break out of the traditional paradigm of placing everyone in one massive building and move to more radical designs. There are two main ways of doing this: chiplets and 3D stacking [4].
In the terminology of our analogy, chiplets are what you get if you split the old massive office building into a campus of smaller offices. Each team can work on the job it specialises in, and you can cut costs by staffing a less important department with inexperienced interns, rather than being forced to use identical workers of the same quality across every department, as a 'monolithic', single-office design demands. The challenge is to ensure that communication flows easily and quickly between the different office buildings.
3D stacking is like going from a society that only has bungalows (single-layer chips, traditionally the dominant choice in high-performance computing) to one with high-rise buildings (3D-stacked chips, with multiple dies on top of each other). The challenge here is constructing the life-support systems: elevators that let data move rapidly in and out of the chip, and sewage pipes that let waste (heat) escape.
Chiplets
Chiplets are sometimes called 2.5D stacking, referring to the fact that they’re commonly viewed as a stepping stone between 2D and 3D packaging.
The most common method, and the one in wide use right now (notably in AMD's Zen 2 architecture), has an 'interposer' sitting underneath the chips you want to connect. [5] This interposer acts as a conduit between the two (or more) chips, enabling rapid communication. Think of it as laying a concrete floor between the office buildings, where before you had knee-height grass making communication between offices inefficient (PCB traces). AMD famously uses Infinity Fabric, which we'll discuss later, but at the hardware level it's likely that they're using TSMC's CoWoS (Chip on Wafer on Substrate) technology. In this implementation, the Chip is put on a Wafer that handles communication between the chips (the concrete floor), which is in turn put on a Substrate (the ground). [6]
The second approach, which Intel is pursuing with EMIB (and TSMC is exploring at a very early stage), essentially forgoes the concrete floor and uses tunnels between the office buildings instead. [7] This approach is faster: each tunnel is dedicated to connecting just two dies, so you don't get data trying to reach lots of different places at once and getting 'stuck', as can happen with AMD's approach. But it's harder to design products for - you have to design and place each tunnel, rather than simply pouring a concrete foundation and plonking whatever you want on top of it. Really, this is a simple continuation of the two companies' philosophies. AMD's HyperTransport, the predecessor of Infinity Fabric, is an open standard that can easily be implemented across processors. Intel's equivalent, the FSB, was proprietary and needed to be specially modified for each processor design. While Intel has made its successor to the FSB, AIB (Advanced Interface Bus), open in an attempt to address this, much as with its 10nm process, the consequences of initial design choices are difficult to overcome.
Which approach will prove superior remains to be seen. In my opinion, each company has chosen the approach that suits it best: Intel, with its larger budget and manpower, can afford to have teams of engineers working on specialised packaging implementations for each product to eke out slightly better performance, while AMD's smaller size drives it towards a scalable, easily replicable design that can be reused across many products with minimal adaptation. Infinity Fabric, AMD's own protocol for extremely high-speed connections between different dies (essentially, the secret sauce in how the 'concrete foundation' is made), can in theory be used by any company to connect their custom chiplets to Zen chiplets with minimal input from AMD - something that isn't possible with Intel's approach. [8] AMD could also use it in GPUs to build chiplet-based designs, which could have a massive impact on scalability of performance, manufacturing costs and so on, just as happened with CPUs. One only needs to look at the enormous Nvidia A100 chip, with its correspondingly enormous manufacturing costs, to see how much of an impact this could have on the market.
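To make the contention point concrete, here is a deliberately crude back-of-the-envelope sketch in Python. All the bandwidth figures are made up for illustration - they are not real CoWoS or EMIB specifications - and the model ignores routing, latency and protocol overhead entirely.

```python
# Toy model: bandwidth per pair of communicating dies when the fabric is shared
# (interposer-style) versus dedicated point-to-point bridges (EMIB-style).
# All numbers below are illustrative assumptions, not vendor specifications.

def per_pair_bw_shared(total_bw_gbs: float, active_pairs: int) -> float:
    """Shared fabric: every busy die pair splits one common pool of bandwidth."""
    return total_bw_gbs / active_pairs

def per_pair_bw_dedicated(bridge_bw_gbs: float) -> float:
    """Dedicated bridge: each die pair keeps its own point-to-point link."""
    return bridge_bw_gbs

if __name__ == "__main__":
    SHARED_POOL_GBS = 400.0  # hypothetical total shared bandwidth, GB/s
    BRIDGE_GBS = 100.0       # hypothetical bandwidth of one dedicated bridge, GB/s

    for pairs in (1, 2, 4, 8):
        shared = per_pair_bw_shared(SHARED_POOL_GBS, pairs)
        dedicated = per_pair_bw_dedicated(BRIDGE_GBS)
        print(f"{pairs} busy pairs -> shared: {shared:6.1f} GB/s per pair, "
              f"dedicated bridges: {dedicated:6.1f} GB/s per pair")
```

The takeaway is simply that the shared-medium design degrades as more die pairs talk at once, while dedicated bridges hold their bandwidth - at the cost of having to plan and place a bridge for every pair that needs one.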
Chiplets are here right now from AMD/TSMC; they're sort of here from Intel, and expected to arrive at much larger scale from Intel over the next few years; and they're here to stay. The two 'teams' have differing philosophies in how their chiplets are constructed, and it'll be fascinating to see which is more successful. In my opinion, if Intel can improve the morale of its engineers and kick things into high gear, EMIB will be more successful in high-performance computing workloads, while the simpler approach from AMD/TSMC has secured them dominance for the next two years at the very least.
A comparison of different techniques. The middle technique is AMD/TSMC's solution, while the bottom one is Intel's EMIB.
3D stacking
In theory, going 3D is just another form of chiplets (you put chips on top of each other instead of next to each other), but the challenges are sufficiently different that it's easier to discuss them separately.
3D stacking is probably a lot closer than most people think. Intel announced its 'Foveros' stacking tech in 2019 [9], Samsung announced its 'X-cube' packaging a few weeks ago [10], and TSMC announced its '3DFabric' family just a few days ago [11], but the reality is that 3D stacking has been around for years. As explained earlier, the problem with 3D stacking is that you need to dissipate heat as well as transfer data. High-performance chips like those made by Intel, AMD and Nvidia produce large amounts of heat and require huge data throughput, which makes 3D stacking difficult for them. However, chips with lower heat and data requirements, like those in mobile phones, have been using PoP (Package on Package) stacking forever; it essentially folds a two-chip processor in half like a book, reducing the horizontal footprint at the cost of a taller package. [12] Even 'true' 3D stacking, where you build layers directly on top of one another, is viable in certain use cases. DRAM manufacturers have been using 3D stacking since 2011, and NAND manufacturing wasn't far behind. [13] SK Hynix recently released a 128-layer NAND SSD!
For the high-performance chips we're most concerned with, the data-transfer issues have largely been solved; what remains are the cooling issues and the manufacturing challenges. Data is carried between layers by TSVs (through-silicon vias) [14] - essentially elevators running through the layers. Compared to PoP solutions, the chip is denser and the connections are shorter: the elevators are direct, and therefore faster. While we can technically stack any chips we want together right now (TSMC's SoIC is slated for mass production in 2021, likely for mobile chips [11]), the issue is that cutting-edge chips - especially the x86 chips used in datacentres, which sell for thousands of dollars - throw off a lot of heat. AMD's Epyc 7742 has a TDP (thermal design power, an approximation of how much heat a processor outputs under load) of 225W [15], while Intel's Xeon 9282 has a TDP of 400W. [16] When you consider that all this heat is coming off a piece of silicon of less than 500mm², it's apparent that dispersing that energy is extremely difficult. Current techniques, where a lump of metal on top of the chip conducts heat up to a fan that shifts huge volumes of cool air past it, are about as optimised as they can be. Before we can stack CPU logic dies on each other, we most likely need a paradigm shift: either drastic improvements in efficiency, or, more likely, some way of conducting heat out of the chip far better than we do now. The most advanced suggestion so far seems to be tiny channels between layers that circulate coolant inside the chip, so that every layer is cooled well rather than just the top layer, as with current designs. [17] Obviously, this is completely uneconomical to mass-produce today.
However, it is already possible to place a cooler component on top of a logic die and reap the speed benefits between those two chips - for instance Samsung's newly announced X-cube technology, which places SRAM on top of a logic die. [10] We're likely to see a few years of this in-between state, where RAM, IO, FPGAs and so on are stacked on top of logic dies, before we reach the 'Holy Grail' of high-power logic on logic - and perhaps, eventually, every component in a system, from CPU, GPU and RAM to various accelerators, packaged into one single chip, both laterally and vertically. True 3D stacking is so far away that it's impossible to pick the winner; judging by the intermediary efforts the various vendors have announced, though, I am of the opinion that Intel has the greatest potential with Foveros for the reasons below, while Samsung is extremely competitive, even leading, in the areas where it offers solutions, and TSMC is the overall leader, with the widest array of offerings that are all highly effective in both cost and performance.
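To get a rough sense of the cooling problem, here is a quick back-of-the-envelope heat-flux calculation. The TDP figures are the ones cited above; the die area is simply the post's "under 500mm²" figure, used here as an assumption rather than an exact package specification.

```python
# Back-of-the-envelope heat flux for the CPUs mentioned above.
# Die area is an assumption taken from the text (~500 mm^2), not a package spec.

def heat_flux_w_per_cm2(tdp_watts: float, die_area_mm2: float) -> float:
    """Heat flux in W/cm^2: TDP spread evenly over the die area."""
    die_area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return tdp_watts / die_area_cm2

if __name__ == "__main__":
    chips = {
        "AMD Epyc 7742 (225 W TDP)":  (225.0, 500.0),  # assumed ~500 mm^2 of silicon
        "Intel Xeon 9282 (400 W TDP)": (400.0, 500.0), # assumed ~500 mm^2 of silicon
    }
    for name, (tdp, area) in chips.items():
        print(f"{name}: ~{heat_flux_w_per_cm2(tdp, area):.0f} W/cm^2")
```

That works out to roughly 45 and 80 W/cm² respectively, even spread over a single layer. Put a second logic die on top and the lower die has to push its heat through the one above it, which is why logic-on-logic stacking is waiting on a cooling breakthrough.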
A representation of a 3D stacked chip - there are 4 layers of die (yellow), with TSVs (black rods) connecting them to the substrate (blue) and bumps (balls) that act as an interface to the outside.
After reading all that, you probably feel cheated. Wasn't I going to tell you why Intel's future is promising? It feels like I just explained the opposite, doesn't it? Well, my reasoning for saying Intel's future is promising is threefold.
1. Packaging is in its infancy compared to process. Intel might appear to be behind now, but the gap is nowhere near as large as the one in process, and given where we are in packaging maturity (we're at the very start of the diminishing-returns curve, where it's easy to make big strides), Intel could easily climb ahead given its engineering prowess. In addition, unlike in lithography, where outsourcing to a specialised foundry has 'won' the battle against vertical integration, it may well be that integration is more successful in packaging, where a holistic, unified approach to designing a chip could prove useful.
2. Intel spends far more money on research than anyone else: $13 billion in 2019, while TSMC spent $3 billion (or 18.6% of revenue vs 8.5% - see the quick check after this list). [18][19] Although Intel's operations are broader, with investments in everything from process and packaging to self-driving and memory, it's likely that Intel's spending on packaging and process exceeds that of its competitors. In addition, Intel's engineers are colloquially known to be some of the best in the business. Its current woes are largely a function of management problems, not money or talent.
3. Intel's (mis)management deserves a post, if not a book, of its own. Suffice it to say the corporate equivalent of Game of Thrones has been playing out for the past decade, and now Littlefinger has finally managed to slay his enemies and gain complete control. [20] Littlefinger may not be the best person to lead a semiconductor company, and his position may be constantly under threat from the Iron Bank (the board), but at least the internal politics and strife should be largely out of the way, opening up the chance of a focused Intel putting its full momentum behind a comeback - a terrifying thought for any competitor.
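As a quick sanity check on the R&D-intensity figures in point 2, here is the arithmetic, using approximate 2019 figures from the cited filings (Intel in US dollars, TSMC in NT dollars, since the ratio is currency-independent); exact values will shift slightly with rounding and exchange rates.

```python
# Rough check of the R&D-as-a-share-of-revenue figures cited in point 2.
# Approximate 2019 figures: Intel in US$ billions, TSMC in NT$ billions.
companies = {
    "Intel": {"r_and_d": 13.4, "revenue": 72.0},    # US$B, approx. per the 10-K [18]
    "TSMC":  {"r_and_d": 91.4, "revenue": 1070.0},  # NT$B, approx. per the annual report [19]
}

for name, fig in companies.items():
    share = fig["r_and_d"] / fig["revenue"] * 100
    print(f"{name}: R&D was roughly {share:.1f}% of 2019 revenue")
```

Both ratios line up with the percentages quoted above.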
Semiconductors are an incredibly complex industry. The very things that create such huge barriers to entry for competitors, and impressive returns for shareholders, make it difficult for us to truly understand the technological side of the companies we are invested in. My investing style relies heavily on understanding the things I'm investing in, and I hope this series looking at some of the more technical aspects of semiconductors helps you gain that understanding too, and to cut through the marketing spiel that management regularly throws at us.
Feel free to contact me if there’s anything you’d like me to look at, and thanks for taking the time to read!
[1] https://cartesianproduct.wordpress.com/2013/04/15/the-end-of-dennard-scaling/
[2] https://www.eetimes.com/euv-tool-costs-hit-120-million/
[3] http://euvlsymposium.lbl.gov/pdf/2012/pres/G.%20Yeric.pdf
[4] https://semiengineering.com/knowledge_centers/packaging/advanced-packaging/
[5] https://www.researchgate.net/publication/340843129_Chiplet_Heterogeneous_Integration_Technology-Status_and_Challenges
[6] https://www.tsmc.com/english/dedicatedFoundry/services/cowos.htm
[7] https://www.intel.com/content/www/us/en/foundry/emib.html
[8] https://www.amd.com/en/campaigns/hpc-solutions-webinar
[10] https://news.samsung.com/global/tag/samsung-x-cube
[11] TSMC Technology Symposium 2020
[12] https://sst.semiconductor-digest.com/chipworks_real_chips_blog/2019/01/16/the-packaging-of-apples-a12x-is-weird/
[13] https://www.nvmdurance.com/history-of-3d-nand-flash-memory/
[14] http://www.appliedmaterials.com/files/Applied_TSV_Primer.pdf
[16] https://ark.intel.com/content/www/us/en/ark/products/194146/intel-xeon-platinum-9282-processor-77m-cache-2-60-ghz.html
[17] https://www.nature.com/articles/d41586-020-02503-1
[18] https://www.intc.com/filings-reports/all-sec-filings/content/0000050863-20-000011/0000050863-20-000011.pdf
[19] https://www.tsmc.com/download/ir/annualReports/2019/english/index.html
[20] https://www.tomshardware.com/news/intel-leadership-tech-team-changes-not-delayed-murthy-renduchintala-leaves