Application Notes | January 07, 2015

8 bitter for IoT? - criteria for choosing 8051 IP Cores

Many engineers, especially young ones, when asked about 8051 MCUs, ask whether these good old rascals still exist. But it looks like history tends to repeat itself.

When Intel introduced the 8051 CPU at the beginning of the 1980s, I bet no one would have believed how popular it would become. Nevertheless, almost every five years someone predicted the sudden death of Intel's MCU. It was supposed to be dead by 1990; then, at the beginning of the 21st century, many suggested that the new era didn't need good old-fashioned 8-bitters any more. And where are we now? 2013 is almost at an end and 8051 CPUs are still in the game. Moreover, they seem to have a stable market share and their production numbers are still... growing. Looking at the numbers, 6.7 billion 4- and 8-bit MCUs will be shipped in 2013 alone, up 6% over last year. Considering that so many feel 8 bits is dead, that's pretty astonishing growth, especially when we compare it with the previous years: those parts are up 40% since 2009.

So after this lengthy introduction (yes, I just wanted to say that those of us writing code for an 8051 are not dinosaurs), let me move forward. Of course we cannot say that 8-bitters will beat 32-bit CPUs in popularity, but believe me: 8051s designed in the 21st century are something completely different from the 51s designed 20 years earlier. The best example is the DQ80251, the world's fastest 8051 CPU, more than 66 times faster than the original presented by Intel.

8051 still in the game?

The 8051 was the major 8-bit microcontroller I used during my studies at university. All of our microcontroller classes were based on this device, and all of the students were convinced that the 8051 was a good all-around product. But what about more demanding applications? The 12MHz maximum clock frequency of the original 8051 architecture, and its need for 12 clocks per machine cycle (with most instructions executing in one or two machine cycles), were insufficient for more advanced applications that required higher-performance MCUs. This led some of us to start thinking about how one might improve the performance of the 8051.
Thus it was that, in 1999, just after my graduation, I co-founded Digital Core Design (DCD) with two of my colleagues and we started work on improving the 8051 architecture. To this day, we believe that due to its popularity and widespread use, which has resulted in deep familiarity among highly trained electronics engineers around the world, the 8051 continues to offer an excellent solution for an extremely wide variety of embedded systems and consumer electronic devices.

Many MCU providers and IP core developers have created some extremely powerful MCUs that are fully compatible with the original 8051. The architecture and implementation of these new versions are so innovative that, as recently as a few years ago, no one would have believed that such power and performance could be possible. The original 8051 devices developed in 1980 supported a maximum clock frequency of 12MHz. By comparison, today's state-of-the-art DQ8051 from DCD can easily be clocked at more than 300MHz.

Some people might say that it's easy to overclock a CPU, but there's much more to this than simply increasing the frequency of the system clock. First, we completely redesigned the architecture. After many years of work, we had developed an architecture that allowed our processor to perform tasks up to 25 times faster than the original 8051 running at the same frequency. The instruction set of our DQ8051 is exactly the same as that of conventional 8051 processors, but internally those instructions are executed in a completely different manner to assure the highest possible increase in performance.

Let's calculate just how fast the DQ8051 really is. As I said before, this architecture can execute applications up to 25 times faster than the original devices, even when running at the same 12MHz clock frequency. When we now take into account the fact that the DQ8051 can run at 300MHz, which is 25 times the clock frequency of the original devices, the result is a performance increase of 25 x 25 = 625.
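The calculation above can be written out as a small sketch. It only restates the figures quoted in the text (a 12MHz original clock, a 25x architectural speedup, and a 300MHz clock); the function names are mine and purely illustrative.

```python
ORIGINAL_CLOCK_MHZ = 12  # maximum clock of the original 8051, as stated in the text


def total_speedup(arch_speedup, clock_mhz):
    """Architectural speedup multiplied by the clock ratio versus the original part."""
    clock_ratio = clock_mhz / ORIGINAL_CLOCK_MHZ
    return arch_speedup * clock_ratio


def equivalent_original_clock_mhz(arch_speedup, clock_mhz):
    """Clock an original 8051 would need in order to match the improved core."""
    return ORIGINAL_CLOCK_MHZ * total_speedup(arch_speedup, clock_mhz)


# DQ8051 figures from the text: 25x architecture, 300MHz clock.
print(total_speedup(25, 300))                  # 625.0
print(equivalent_original_clock_mhz(25, 300))  # 7500.0 (MHz)
```

The same function works for any other core once its architectural speedup and maximum clock are known.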
This means that for the original 8051 to achieve the same performance as the DQ8051, it would have to be clocked at 12 x 625 = 7,500MHz, or 7.5GHz. And for those who will say that it's still not enough, there's more: the numbers are even higher for the latest and greatest version available on the market. As I mentioned at the beginning, DCD's DQ80251 architecture achieves 66 times the speed of a traditional 8051 architecture. It can also be clocked at up to 300MHz. So once again we can ask ourselves: what would be the necessary frequency of the original 8051 to achieve the same performance? The result is 12 x 25 x 66 = 19,800MHz, or 19.8GHz.

During the past three decades, almost every year has brought improved 8051 microcontrollers to the market. There's only one small "but": the main criterion is that every new device has to be fully software compatible with the 8051 standard. This gives engineers the ability to improve their applications and renew their designs, making them faster and more powerful, without having to make any changes to code written for the 8051 decades before.

8051 rocks

Moreover, 8051 IP cores have many more advantages than "just" performance. 51 soft IP cores can be implemented in both FPGAs and ASICs. This means that designers now have the ability to define their own configurations for the MCU. For example, engineers are no longer limited to just one UART, two or three counter-timers, and an insufficient number of interrupt lines. Now we can all add extra UARTs, Ethernet MAC controllers, SPI, CAN, USB, I2C, and a host of other interfaces. Designers can also add their own dedicated hardware accelerator functions, allowing them to further personalize their MCUs. Additionally, for applications that need it, the DQ8051 and DQ80251 cores can be equipped with arithmetic coprocessors to make floating-point computations even faster.
One might ask: is that all you can do with an 8051? The answer is easy: of course not. When you ask an experienced IP core provider about the additional pieces of the puzzle, a "brand new world" opens up. It's been years since anyone interested in purchasing an IP core would say, "OK, I just need an 8051. What's the performance and the price?" Now, pure IP is far less interesting in isolation. That's why, if you're selecting and integrating an 8051 in your design, you should ask yourself, or your third-party vendor, about all of the additional items such as peripherals, deliverables, and configurability. Let me base this example on the IP cores I work with; they've been a part of my life since 1999, so they should be good representatives.

8051 configuration

Let's say that a customer calls my company, Digital Core Design (DCD), and asks for an 8051 IP core. The first step would be to choose the most appropriate solution for the target application. A brief comparison is shown below:


Feature	DQ8051	DT8051	DP8051
Architecture speed	x 25.1	x 8.1	x 15.5
Dhrystone speed (CPU)	0.18527 DMIPS/MHz	0.0763 DMIPS/MHz	0.106 DMIPS/MHz
Dhrystone speed (PWR)	0.23650 DMIPS/MHz (CPU+DPTRs)	N/A	0.146 DMIPS/MHz (CPU+DPTRs)
Gate count (CPU)	7250 ASIC gates (CPU)	3200 ASIC gates (CPU)	5900 ASIC gates (CPU)
Gate count (PWR)	8000 ASIC gates (CPU+DPTRs)	N/A	6450 ASIC gates (CPU+DPTRs)
CODE size	64 kB	64 kB	64 kB
CODE banking	YES	YES	YES
CODE writes	YES	YES	YES
Sync. CODE/XDATA	YES 2 spaces	YES up to 2 spaces	YES up to 3 spaces
Async. CODE/XDATA	YES 2 spaces	YES up to 2 spaces	YES up to 3 spaces
XDATA size	16 MB	64 kB	16 MB
IDATA type	64B to 256B Dual port	64B to 256B Single port	64B to 256B Single port
Debugger	YES, JTAG DoCD made by DCD, 5-wire interface	YES, TTAG DoCD made by DCD, 2-wire interface	YES, JTAG (or TTAG) DoCD made by DCD, 5-wire interface
CODE/XDATA wait states	YES	NO	YES
Harvard architecture	YES	YES	YES
Von Neumann architecture	NO	YES	YES
Power management (PMU)	STOP, PMM with Switchback	STOP, PMM with Switchback	STOP, PMM with Switchback
Number of clock trees	1	1	1
Scan test ready	YES	YES	YES
ASIC/FPGA proven	YES	YES	YES
Gate count (FULL)	9900 ASIC gates	5600 ASIC gates	7650 ASIC gates
Peripherals included in "Gate count (FULL)"	32-bit PORT; Timers 0,1; CPU; UART0; PMU; INT 0-1	8-bit PORT; Timers 0,1; CPU; UART0; PMU; INT 0-7	32-bit PORT; Timers 0,1; CPU; UART0; PMU; INT 0-1
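One way to read the comparison is to turn the per-MHz figures into numbers for a concrete project. The sketch below uses only the CPU-only Dhrystone and gate-count rows from the table. The 100MHz clock and 10 DMIPS target are hypothetical values chosen for illustration, not limits of any core, and the helper names are mine.

```python
# CPU-only figures copied from the comparison table.
CORES = {
    "DQ8051": {"dmips_per_mhz": 0.18527, "cpu_gates": 7250},
    "DT8051": {"dmips_per_mhz": 0.0763, "cpu_gates": 3200},
    "DP8051": {"dmips_per_mhz": 0.106, "cpu_gates": 5900},
}


def dmips_at(core, clock_mhz):
    """Dhrystone throughput of a core at the given clock."""
    return CORES[core]["dmips_per_mhz"] * clock_mhz


def min_clock_mhz(core, target_dmips):
    """Lowest clock at which a core reaches the target throughput."""
    return target_dmips / CORES[core]["dmips_per_mhz"]


def dmips_per_mhz_per_kgate(core):
    """Throughput per MHz for every 1000 equivalent ASIC gates (CPU only)."""
    entry = CORES[core]
    return entry["dmips_per_mhz"] / (entry["cpu_gates"] / 1000)


# Hypothetical project: 100MHz system clock, 10 DMIPS required.
for name in CORES:
    print(
        name,
        round(dmips_at(name, 100), 3),
        round(min_clock_mhz(name, 10), 1),
        round(dmips_per_mhz_per_kgate(name), 4),
    )
```

A smaller core at a higher clock and a larger core at a lower clock can meet the same target, so looking at the three columns side by side is usually more informative than any single row.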

Once you have determined what combination of performance, size, and power consumption is best-suited to your project, we're ready for the next step, which is to select the necessary combination of peripherals. For example, the 8051 can be augmented with a wide variety of peripheral functions as follows:

DUSB2: USB 2.0 device including HID (human interface device), MS (mass storage), and audio devices
Parallel I/O ports
UARTs
Timers/counters with compare/capture
Watchdog timer
Power management unit
I2C bus interfaces, master and slave
Serial peripheral interface (SPI) master/slave
Floating point math coprocessors
Media access controller (DMAC)
32-bit multiply/divide unit
Data pointers

Next, once all of the functions you need are on your list of peripherals, you should ask about configurability. Honestly speaking, configuration has never been easier, thanks to the use of constants in the IP core package. An example is shown in the image below:

I think it's safe to say that the future of the IP core market belongs to the concept of the "superset core," which includes the main core together with any necessary peripherals and other cores that act as subsystems. The end result is that the customer has access to an implementation-ready solution that is fully compatible with industry standards, but that offers totally non-standard performance. It is no wonder that today's third-party IP core vendors sell more differentiated IP than commoditized IP. And if you decide to use a third-party IP core, don't forget to ask about the deliverables and the documentation, such as:

Synthesizable Verilog or VHDL source code for the core/peripheral
Verilog or VHDL test bench environments (Active-HDL automatic simulation macros, NCSim automatic simulation macros, ModelSim automatic simulation macros, tests with reference responses...)
Technical documentation (installation notes, HDL core specifications, datasheets, instruction set details, test plan and code coverage reports…)
Synthesis scripts
Example applications and/or reference designs
Technical support (including implementation support; "x" months of maintenance; access to core updates and minor and major version changes; access to documentation updates; phone and email support...)

8051 on chip debugger

Last but not least, an important selection criterion for an 8051 is the on-chip debugger, a useful tool that makes your life and your design run much more smoothly. Often it's not only a question of comfort: today's SoC designs face the problem of inaccessible control and bus signals. These signals often lie behind the physical pins of the device, which makes traditional measurement instrumentation useless. The best way to get around these limitations is to use on-chip debug tools for the verification and software debugging tasks.

The other advantage of an on-chip debugger is improved design productivity when it is provided as part of an integrated environment with a modern GUI (graphical user interface). Key elements that help to improve the design process and increase productivity are the ability to display and modify memory contents and processor/peripheral register windows, along with information tracing and the ability to "cross-probe" to see the related C/ASM source code.

The ideal situation is to obtain the debugger together with the IP core, all from one vendor. The things to look for in a debugger are features like real-time and non-intrusive debug capability, enabling both pre-silicon validation and post-silicon, on-chip software debugging, all in one place. Moreover, modern debug software can work as a hardware debugger as well as a software simulator; some tasks can be validated at the software simulation level and, following this step, you can continue real-time debugging by uploading your code into the silicon. Furthermore, designers appreciate the freedom to choose their favorite C compilers or assemblers. Consider, for example, the following high-level object file formats produced by C/ASM compiler tools for DCD's 8051 cores:

Extended OMF-51 produced by the Keil compiler
OMF-51 produced by the Tasking compiler
OMF-51 produced by the Franklin compiler
Standard OMF-51 produced by some 8051 compilers
Extended OMF-251 produced by the Keil compiler
NOI format file produced by the SDCC-51 compiler
Intel HEX-51 format produced by every 8051 compiler
Intel HEX-386 format produced by every 80390 & 80251 compiler
BIN format produced by every 8051 & 80390 & 80251 compiler
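Of the formats listed above, Intel HEX is the simplest to inspect by hand. The following is a minimal sketch of decoding and checksum-validating a single record, assuming the standard layout of a colon followed by length, 16-bit address, record type, data, and checksum bytes. The function name is mine, and the sample record (an 8051 LJMP to 0x0030 placed at address 0x0000) is constructed for illustration only.

```python
def parse_ihex_record(line):
    """Decode one Intel HEX record into (address, record_type, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])  # raises ValueError on malformed hex
    if len(raw) < 5:
        raise ValueError("record too short")
    # All bytes, including the trailing checksum, must sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    length = raw[0]
    address = (raw[1] << 8) | raw[2]
    record_type = raw[3]
    data = raw[4:-1]
    if len(data) != length:
        raise ValueError("length field does not match data size")
    return address, record_type, data


# Data record: three bytes (02 00 30) at address 0x0000.
print(parse_ihex_record(":03000000020030CB"))
# End-of-file record.
print(parse_ihex_record(":00000001FF"))
```

Record type 0 carries data and type 1 marks end of file; extended addressing records are not interpreted in this sketch.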

Generally speaking, a complete debugging system should consist of three major blocks: a Debug IP Core, a hardware-assisted debugger, and associated debug software. Once again, please excuse me for basing this part on the DoCD™ Hardware Debugger supplied by DCD, but I know this solution inside out, so it'll be easier to explain each part this way. The DoCD™ Hardware Debugger provides debugging capability for a whole System on Chip (SoC). Unlike other on-chip debuggers, the DoCD™ provides non-intrusive debugging of a running application. It can also save the designer's time thanks to a hardware trace called the Instructions Smart Trace buffer (IST). The DoCD-IST captures instructions in a smart and non-intrusive way: it doesn't capture the addresses of all executed instructions, but only those related to the start of tracing, conditional jumps, and interrupts. This method not only saves time, but also makes better use of the IST buffer and extends the trace history. Captured instructions are read back by the DoCD debug software, analyzed, and then presented to the user as ASM code and the related C lines. Now that we're familiar with a "must-have" like IST, let's look at the hardware debugger's components.
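To make the smart-trace idea more concrete, here is a toy model of it. This is not DCD's IST format or implementation: the four-instruction program, the data layout, and both function names are assumptions made for illustration, and interrupts are left out. It only shows why recording the start address plus the address reached after each conditional jump is enough to rebuild the full execution path when the program image is known.

```python
# Toy program: address -> (mnemonic, size in bytes, is_conditional_jump).
PROGRAM = {
    0x00: ("MOV R0,#3", 2, False),
    0x02: ("NOP", 1, False),
    0x03: ("DJNZ R0,0x02", 2, True),
    0x05: ("NOP", 1, False),
}


def smart_trace(full_trace):
    """Keep the start address and the address reached after each conditional jump."""
    captured = [full_trace[0]]
    for prev, cur in zip(full_trace, full_trace[1:]):
        if PROGRAM[prev][2]:
            captured.append(cur)
    return captured


def reconstruct(captured):
    """Rebuild the full path by walking the program between captured events."""
    events = iter(captured)
    pc = next(events)
    path = []
    while pc in PROGRAM:
        path.append(pc)
        _, size, is_conditional = PROGRAM[pc]
        # Sequential instructions are implied; only jump outcomes come from the trace.
        pc = next(events, None) if is_conditional else pc + size
    return path


# The loop body runs three times before DJNZ falls through.
FULL = [0x00, 0x02, 0x03, 0x02, 0x03, 0x02, 0x03, 0x05]
captured = smart_trace(FULL)
print(len(FULL), len(captured))  # 8 4
print(reconstruct(captured) == FULL)  # True
```

In this toy case the buffer holds half as many entries as the full path, which is the effect the text describes as extending the trace history.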

The Debug IP Core is a real-time hardware debugger, which provides access to all on-chip registers, memories, and peripherals that are connected to the core. This core can be used to monitor the CPU and control the way the CPU works using non-intrusive techniques. Depending on the designer's requirements, the Debug IP Core is provided as VHDL or Verilog source code, or as a CPLD/FPGA EDIF netlist.

Many SoC and FPGA designs have both power and area limitations, so it's useful that the Debug IP Core can be scaled to control gate count. The benefit is fewer gates, which lowers power use and core size while maintaining excellent debug abilities. Typically, all of the features are used in pre-silicon debug (i.e., hardware debugging or FPGA evaluation), with a subset of features realized in the final silicon.

DCD's debug software is a Windows-based application that is fully compatible with all existing C compilers and assemblers. It has been designed to work in two major modes: software simulator mode and hardware debugger mode. Pre-silicon software validation is performed in simulation mode, followed by real-time debugging of the developed software inside the silicon using hardware debugger mode. Once loaded, the program may be observed in the Source Window, run at full speed, single-stepped by machine or C-level instructions, or stopped at any specified breakpoint.

Last, but certainly not least, there is the hardware-assisted debugger, which is connected to the target system containing the IP core, either in an FPGA or an ASIC/SoC. As is illustrated in the following image, this is a small hardware adapter that manages communication between the Debug IP Core (JTAG protocol) inside the silicon and a USB port on the host PC that is running the debug software.

OK, now that we've talked over every internal aspect of the selection criteria for an 8051 IP core, let's end this paper with an external view. Several improved versions of the 8051 are available right now. Some are available to any engineer who wishes to use them in projects. Others are produced as custom processors for internal use only. Many of these 8051 variants were designed by different IP vendors. That typically means different internal architectures; the main requirement is that they are all compatible with the original 8051's instruction set.
Each of these architectures may offer different improvement factors over the traditional 8051. How is it possible that modern versions of this microcontroller can execute the same instruction set and be clocked at the same frequency but offer much higher performance? In fact, designers who work on a new architecture for the 8051 must decompose each and every instruction executed by the processor into its basic steps. For example, the original processor architecture required 12 clock cycles to execute even the simplest instruction like a NOP (no operation). More complex operations required some multiple of 12 cycles.

To speed up the instruction execution flow, we determine the actions a particular instruction will perform, and then we consider how to design the ALU (arithmetic logic unit) and the control unit responsible for internal operations to make the time required for instruction execution as short as possible. In the case of the NOP instruction, one has to ask, "Why should an instruction that doesn't actually do anything require so much time to execute?" I mean, 12 clock cycles to do nothing? Since we are in the depths of a worldwide economic crisis, this is far too long. Let's cut the execution time to a single clock cycle.

Next, we look at more complex operations like the MUL (multiplication) instruction, which consumed a humongous 48 clock cycles in the original architecture. We must analyze what this instruction is required to do, ask what is required to perform an eight-by-eight-bit multiplication in binary, and then try to find a better solution. Why not execute several non-overlapping steps in a single clock period? Our machinations reduced the MUL instruction to only two clock cycles. Similarly, we reduced a DIV (division) instruction from 48 clock cycles to six and an ADD (addition) instruction from 12 clock cycles to one. And so it goes, instruction by instruction.
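The per-instruction figures above can be combined into a rough estimate for a piece of code. The cycle counts in this sketch are the ones quoted in the text; the instruction mix is a made-up example, not a measured workload, and the function names are mine.

```python
# instruction: (original clock cycles, redesigned clock cycles), as quoted in the text.
CYCLES = {
    "NOP": (12, 1),
    "MUL": (48, 2),
    "DIV": (48, 6),
    "ADD": (12, 1),
}


def speedup(instruction):
    """Speedup of one instruction at the same clock frequency."""
    original, redesigned = CYCLES[instruction]
    return original / redesigned


def mix_speedup(mix):
    """Speedup of a whole instruction mix given as instruction -> execution count."""
    original = sum(CYCLES[name][0] * count for name, count in mix.items())
    redesigned = sum(CYCLES[name][1] * count for name, count in mix.items())
    return original / redesigned


for name in CYCLES:
    print(name, speedup(name))

# Hypothetical mix: six ADDs, two MULs, one DIV, one NOP.
example_mix = {"ADD": 6, "MUL": 2, "DIV": 1, "NOP": 1}
print(round(mix_speedup(example_mix), 2))  # 228 / 17 cycles
```

The mix figure is the ratio of total cycles, not the average of the individual speedups, which is why the overall factor for real code depends so heavily on which instructions dominate.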
Of course, we can also add some features that help the CPU access memory. For example, we can use additional DPTRs (data pointers) and automatic DPTR increment/decrement to speed up external memory addressing and accessing. The final result is an 8051 architecture that is approximately 15 times faster than the original in the case of the DP8051 and about 25 times faster in the case of the DQ8051 (x 15.5 and x 25.1 in the comparison table). This is all when running at the same clock frequency. (The new architectures can also be clocked up to 25 times faster than the original 8051.)

8051 universe

The bottom line is that there are as many different approaches to improving the 8051 architecture as there are IP vendors working on the problem. Purely for the sake of interest, here are some comparisons of alternative implementations. First, let's consider performance (or processing power) as measured in DMIPS/MHz.

Next, let's consider the silicon area required to implement the different cores. In this case, we will use the number of equivalent ASIC (two-input NAND) gates as our metric. These values are readily available, and the area on the silicon for an ASIC/SoC, or the amount of lookup table (LUT) resources consumed in an FPGA, is a function of the number of equivalent gates.

I don't know about you, but like many people, I find it easier to visualize what these numbers mean when they are presented in the form of a graphical image, as shown below.

In both cases, the DQ8051 is approximately 21 percent smaller (that is, it requires 21 percent fewer equivalent gates) than its competitors. Of course, this also reduces power consumption.

So as you can see, the 21st century's 8051 is something completely different from its ancestor, ready to be customized and enhanced with additional features. Moreover, it has one big advantage: we all know the 8051 architecture. I bet that almost every designer has worked, works, or will work on a 51-based project. That's why it's easy to make it a very useful tool in environments that don't need a 32- or 64-bit architecture. It's easy to compare it with supercars: we all love a Ferrari, Viper, or Lamborghini. But would you like to drive such a car to work every day, standing in traffic jams and seeing just the bottom of your neighbor's tyre? I bet you wouldn't, because other cars are much more convenient for day-to-day use. It's the same story with 8051 CPUs: they're practical workhorses, very convenient for day-to-day use. Just tailor one to your needs and enjoy a trusted and functional design.

Author: Jacek Hanke, CEO of Digital Core Design
Images: © Digital Core Design