System on a Chip – Inflection Point

Introduction

In computing I typically look for inflection points. Inflection points are where everything changes. There are usually important inflection points, or breakthroughs of some sort, every several years. One I clearly remember noticing occurred when I was walking through one of those places we used to call a “shopping mall”, a collection of stores with a common connecting area.

In one of the store windows was an Apple Macintosh running a video using a new technology called “QuickTime”. The video was 320×240 pixels, running in black and white at 30 fps. I stopped in my tracks, and a friend I was with later said that I was babbling incoherently (or maybe he meant more so than usual). Unreasonably excited, I asked, “Do you know what this means?” Naturally he said “No.” Now he has an 80″ High Definition TV in his living room. Granted, it’s years later, but you get the idea.

This particular inflection point came when computers became fast enough to decode compressed digital video and display it.

Why chip technology plays a role

The hardware on the Mac at the time was a Motorola 68040, a 32-bit CPU. The production process of that chip was ~0.8 μm (1 μm = 1000 nm). A production process defines the minimum size of a feature that can be etched into a semiconductor. Typically we think of a feature as roughly the size of a transistor, and the semiconductor as silicon.
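
To get a feel for the scale, here is a rough back-of-the-envelope comparison in Python between that 0.8 μm process and the 4nm process we’ll get to shortly. Treat the node names as literal feature sizes only loosely; modern node names are partly marketing, so this is an order-of-magnitude sketch, not a precise measurement.

```python
# Rough scale comparison between the 68040-era process and a modern node.
# Node names are treated here as literal feature sizes, which is only approximate.

process_68040_nm = 0.8 * 1000   # 0.8 um expressed in nanometers -> 800 nm
process_modern_nm = 4           # a current leading-edge node, ~4 nm

linear_shrink = process_68040_nm / process_modern_nm
area_shrink = linear_shrink ** 2    # transistor area scales roughly with the square

print(f"Linear shrink: ~{linear_shrink:.0f}x")   # ~200x
print(f"Area shrink:   ~{area_shrink:.0f}x")     # ~40,000x
```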

As transistor density on chips increases, functionality grows. Like everything in engineering, there are tradeoffs. In this article we are going to be talking about performance per watt, which is a measure of the energy efficiency of a particular computer architecture or compute block.

Here’s something that seems unintuitive at first: as the density of transistors on a chip rises, so do both performance and power efficiency. In other words, you get more performance for the same amount of energy, or the same performance for less. The performance part is intuitive: the closer everything is packed together, the less distance electrons need to cover between features. The power part, not so much.
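
To make “performance per watt” concrete, here is a minimal Python sketch comparing two hypothetical chips. The throughput and power numbers are made up purely for illustration; they are not measurements of any real hardware.

```python
# Minimal sketch: comparing two hypothetical chips by performance per watt.
# The numbers are illustrative only, not benchmarks of real chips.

def perf_per_watt(ops_per_second: float, watts: float) -> float:
    """Energy efficiency: operations per second divided by power draw.

    Since a watt is a joule per second, this is also operations per joule.
    """
    return ops_per_second / watts

old_chip = perf_per_watt(ops_per_second=1e9, watts=50)   # 1 GOPS at 50 W
new_chip = perf_per_watt(ops_per_second=8e9, watts=25)   # 8 GOPS at 25 W

print(f"Old chip: {old_chip:.2e} ops per joule")
print(f"New chip: {new_chip:.2e} ops per joule")
print(f"Improvement: {new_chip / old_chip:.0f}x")        # 16x
```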

Mobile Phones

As mobile phones went from analog to digital, people recognized the need to fit a complete computer into the phone in a small amount of space. This was the main driver for what we now call a System on a Chip (SoC). Motorola, a leading chip manufacturer at the time, led the push to shrink the complete workings of a computer down to just a few chips. Remembering that the SoC was running on a battery, and that lithium-ion batteries weren’t in use yet, you can see why a lot of the engineering tradeoffs went towards power management.

Even today, mobile phone chips use the most advanced manufacturing techniques available to provide the best performance possible in a small package. These techniques are so advanced that ASML is the only company in the world that manufactures a lithography machine capable of etching chips at a 4nm process. I’ll mention that this is an Extreme Ultraviolet (EUV) lithography machine, mostly because I like the word extreme. As of this writing, this is the state-of-the-art production process for chip making. Only two companies, TSMC and Samsung, are currently capable of using these ASML machines to produce complete chips.

Manufacturing

To give a sense of scale, an ASML 4nm-capable lithography machine is ~220 million USD, give or take (and a person like you will need more than one). A plant to manufacture the chips, called a fabrication facility or fab, is ~10 billion USD. In addition to the ASML machines, you’ll need extremely sensitive and expensive manufacturing and support machines which are nearly as magical as the lithography ones.

Additionally, there are the robots, chemicals and facilities, water and air handling, environmental controls, and so on. Most importantly, there are bunny suits, which make everything worthwhile. The fab needs to be in a seismically stable region of the world; you can’t have things moving around when you’re making things that small. Plus, it takes a few years to build a fab. Oh, and the highly trained people to work in the fab? Turns out they expect a paycheck too. $10 billion to build, and then big operating costs going forward.

That’s just the manufacturing side. You’ll need an extra $2–8 billion and a few years to design, simulate, and test any chip you want to produce. There are only a handful of companies in the world capable of doing this part. You are familiar with them: NVIDIA, Apple, Samsung, Qualcomm, Intel, AMD. There are a few others, but the point is that you need some pretty deep pockets and a real commitment.

Some of these companies are what we call “fabless”. For example, NVIDIA and Apple design their own chips, but do not have their own chip fabrication facilities. Rather, they rely on outside manufacturers.

Heaven help you if you get behind in the race or get your strategy or tactics messed up. Just ask Intel. For half a century Intel was the premier chip manufacturer in the world. Then, as many industry analysts observed, “they fucked up” and Intel started falling behind. Intel is working on getting back in the game, but it will take the better part of a decade and tens of billions of dollars to do so.

The SoC Inflection Point

By the early 2010s, SoCs were becoming much more powerful. Around that time, the chip process was ~28nm. This meant that an SoC could have multiple CPU cores, several different I/O controllers, a graphics controller, and video encoders/decoders. In 2014, NVIDIA introduced the Jetson TK1, whose SoC included a 192-core programmable GPU.

Now, I didn’t mention this development to my friend. I don’t want him to have an 80″ TK1 on his living room wall in 30 years. However, to me it is obvious what this means.

We will now have systems on a chip that are as powerful as the desktops of just a few short years ago, running on a fraction of the power. But what does that mean?

Autonomous Vehicles?

In 2004 there were fully autonomous vehicles competing in the DARPA Grand Challenge. If you looked in the back of the vehicles, you would see several full-size tower PCs running the ‘self driving’ software. Typically the vehicle was a large SUV just so it could fit all of the computer equipment and sensors. As computer folks we always “know” that all of the hardware will eventually be shrunk down in size. What wasn’t clear at the time was how that would actually happen.

Today, less than 20 years later, companies like NVIDIA and Tesla have single-board computers that handle the task.

Less Sexy

Of course, autonomous vehicles are the most seductive showcase for having this much computing power in a small package. However, as the Jetson community has shown, there is a limitless variety of applications that a big computer in a small, power-efficient package enables. Robotics, sure. Also intelligent video analytics. But we’re at the very start of this revolution that people call “AI on the Edge”.

On the other side of the coin, once you pack 17 billion transistors onto a chip like the Jetson Orin, you have the computational equivalent of a modern-day desktop. That’s another inflection point. There is a hidden gem in all this that is not immediately obvious: you can run an SoC in different power ranges with relatively few changes. Let’s look at how another company you may have heard of takes advantage of this.

Apple

Apple has two SoC lines worth mentioning here. The A series chips run their iPhone lineup. The current A chip is the A15, which uses a 5nm process. The power budget for a large smartphone is typically ~7–9 watts.

The second Apple SoC line is the M series. Apple switched over from Intel silicon to their own Apple Silicon a year or two ago. Currently they are on the M1 family, which powers the iPad Pro, the Mac mini, and the new Mac Studio. These are all 5nm process chips. These products are all faster than the previous generation, and much more power efficient. The power budget for these devices ranges from less than 20W on the iPads to less than 150W for the higher-end Mac Studios.

There’s a bunch of marketing hype around the Apple stuff, of course. Is it as fast as a top-of-the-line PC with an NVIDIA RTX 3090? No. After all, the 3090 has 28 billion transistors, can use 500W, and is built on an 8nm process versus the 5nm process of the M1. The 3090 has more transistors than the M1 and is specialized towards graphics performance. On the other hand, the Mac uses ~4x less power than the PC’s graphics card alone. It is not that unusual to see 1000 watt power supplies in PCs now. For the vast majority of tasks, the Mac Studio exceeds most users’ expectations.

So we’re seeing the latest SoCs across the full computer continuum, from small mobile devices to the desktop. More speed, less power. Also, the nature of the compute blocks on these SoCs is changing, as manufacturers have been adding machine learning compute blocks along with security blocks. For example, on the Jetson Xaviers and Orins, 1/3 of the compute capability is provided by the Tensor Cores and the Deep Learning Accelerator (DLA).
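
That “hidden gem” from earlier, running the same SoC at different power levels, is exposed directly on the Jetsons through NVIDIA’s nvpmodel tool. Here is a minimal Python sketch that shells out to it. It assumes a stock JetPack/L4T image where nvpmodel is installed, and the available mode numbers vary from board to board, so check your device’s documentation before setting one.

```python
# Minimal sketch: query and set a Jetson power mode via NVIDIA's nvpmodel tool.
# Assumes a stock JetPack/L4T image where nvpmodel is installed. Mode numbers
# differ between boards, so the example mode below is just a placeholder.

import subprocess

def query_power_mode() -> str:
    """Return the currently active nvpmodel power mode."""
    result = subprocess.run(["nvpmodel", "-q"],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def set_power_mode(mode: int) -> None:
    """Switch to the given power mode (requires root privileges)."""
    subprocess.run(["sudo", "nvpmodel", "-m", str(mode)], check=True)

if __name__ == "__main__":
    print(query_power_mode())
    # set_power_mode(0)  # uncomment to switch modes; 0 is often the maximum-performance mode
```

Switching modes changes which CPU cores are online and how high the clocks are allowed to go, which is how the same silicon can span such different power budgets.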

Conclusion

In this article we went over why recent SoCs are an inflection point in computer engineering. As is the nature of my writings, we very lightly went through material about which volumes have been written. In future articles, we’ll talk about how the Jetsons fit into this new landscape.
