CryptoURANUS Economics

Wednesday, October 14, 2020

FPGA Heterogeneous Self-Healing


FPGA Autonomous Acceleration Self-Healing



This example uses FPGA-in-the-Loop (FIL) simulation to accelerate a video processing simulation with Simulink® by adding an FPGA. The process shown analyzes a simple system that sharpens an RGB video input at 24 frames per second.
This example uses the Computer Vision System Toolbox™ in conjunction with Simulink® HDL Coder™ and HDL Verifier™ to show a design workflow for implementing FIL simulation.
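To make the filtering step concrete, here is a minimal Python sketch of what a 2-D FIR sharpening filter computes on one color channel of a frame. It is not the Simulink model or the generated HDL. The 3x3 kernel, the replicate-edge padding, and the name sharpen_channel are assumptions chosen only for illustration.

```python
# Illustrative only: the kernel and the edge handling are assumptions,
# not values taken from the fil_videosharp_sim model.
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def sharpen_channel(channel, kernel=SHARPEN_KERNEL):
    """Apply a 3x3 FIR kernel to one 8-bit channel given as a list of rows."""
    height, width = len(channel), len(channel[0])
    out = [[0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0
            for kr in range(3):
                for kc in range(3):
                    # Replicate the border pixel when the window leaves the frame.
                    rr = min(max(r + kr - 1, 0), height - 1)
                    cc = min(max(c + kc - 1, 0), width - 1)
                    acc += kernel[kr][kc] * channel[rr][cc]
            # Saturate the result to the 8-bit pixel range.
            out[r][c] = min(max(acc, 0), 255)
    return out
```

For an RGB frame, the same function would be applied to each of the three channels. Because the kernel coefficients sum to 1, flat regions pass through unchanged and only edges are emphasized.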













Products required to run this example:
  • MATLAB
  • Simulink
  • Fixed-Point Designer
  • DSP System Toolbox
  • Computer Vision System Toolbox
  • HDL Verifier
  • HDL Coder
  • FPGA design software (Xilinx® ISE® or Vivado® design suite or Intel® Quartus® Prime design software)
  • One of the supported FPGA development boards and accessories (the ML403, SP601, BeMicro SDK, and Cyclone III Starter Kit boards are not supported for this example)
  • For connection using Ethernet: Gigabit Ethernet Adapter installed on host computer, Gigabit Ethernet crossover cable
  • For connection using JTAG: USB Blaster I or II cable and driver for Altera FPGA boards. Digilent® JTAG cable and driver for Xilinx FPGA boards.
  • For connection using PCI Express®: FPGA board installed into PCI Express slot of host computer.
MATLAB® and FPGA design software can be installed either locally on your computer or on a network-accessible device. If you use software from the network, you will need a second network adapter installed in your computer to provide a private network to the FPGA development board. Consult the hardware and networking guides for your computer to learn how to install the network adapter.
Note: The demonstration includes code generation. Simulink does not permit you to modify the MATLAB installation area. If necessary, change to a working directory that is not in the MATLAB installation area prior to starting this example.

1. Open and Execute the Simulink Model

Open fil_videosharp_sim.mdl and run the simulation for 0.21 s.

Because of the large amount of data to process, the simulation does not run smoothly. The following steps improve the simulation speed by using FPGA-in-the-Loop.

2. Generate HDL Code

Generate HDL code for the Streaming Video Sharpening subsystem by performing these steps:
a. Right-click on the block labeled Streaming 2-D FIR Filter.
b. Select HDL Code Generation > Generate HDL for Subsystem in the context menu.
Alternatively, you can generate HDL code by entering the following command at the MATLAB prompt:
>> makehdl('fil_videosharp_sim/Streaming 2-D FIR Filter')
If you do not want to generate HDL code, you can copy pre-generated HDL files to the current directory using this command:
>> copyFILDemoFiles('videosharp');

3. Set Up FPGA Design Software

Before using FPGA-in-the-Loop, make sure your system environment is set up properly for accessing FPGA design software. You can use the function hdlsetuptoolpath to add ISE or Quartus II to the system path for the current MATLAB session.
For Xilinx FPGA boards, run
hdlsetuptoolpath('ToolName', 'Xilinx ISE', 'ToolPath', 'C:\Xilinx\13.1\ISE_DS\ISE\bin\nt64\ise.exe');
This example assumes that the Xilinx ISE executable is C:\Xilinx\13.1\ISE_DS\ISE\bin\nt64\ise.exe. Substitute with your actual executable if it is different.
For Altera boards, run
hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\altera\11.0\quartus\bin\quartus.exe');
This example assumes that the Altera Quartus II executable is C:\altera\11.0\quartus\bin\quartus.exe. Substitute with your actual executable if it is different.

4. Run FPGA-in-the-Loop Wizard

To launch the FIL Wizard, select Tools > Verification Wizards > FPGA-in-the-Loop (FIL)... in the model window or enter the following command at the MATLAB prompt:
>> filWizard;

4.1 Hardware Options

Select a board in the board list.

4.2 Source Files

a. Add the previously generated HDL source files for the Streaming Video Sharpening subsystem.
b. Select Streaming_2_D_FIR_Filter.vhd as the Top-level file.

4.3 DUT I/O Ports

Do not change anything in this view.

4.4 Build Options

a. Select an output folder.
b. Click Build to build the FIL block and the FPGA programming file.
During the build process, the following actions occur:
  • A FIL block named Streaming_2_D_FIR_Filter is generated in a new model. Do not close this model.
  • After new model generation, the FIL Wizard opens a command window where the FPGA design software performs synthesis, fit, place-and-route, timing analysis, and FPGA programming file generation. When the FPGA design software process is finished, a message in the command window lets you know you can close the window. Close the window.
c. Close the fil_videosharp_sim model.

5. Open and Complete the Simulink Model for FIL

a. Open fil_videosharp_fpga.slx.
b. Copy the previously generated FIL block into fil_videosharp_fpga.slx where it says "Replace this with FIL block".

6. Configure FIL Block

a. Double-click the FIL block in the Streaming Video Sharpening with FPGA-in-the-Loop model to open the block mask.
b. Click Load.
c. Click OK to close the block mask.

7. Run FIL Simulation

Run the simulation for 10s and observe the performance improvement.

This concludes the Video Processing Acceleration using FPGA-In-the-Loop example.

FPGA Monero Working IP-Cores Shares









DownLoad - Click Here - SiaFpgaMiner

This project is a VHDL FPGA core that implements an optimized Blake2b pipeline to mine Siacoin.

Motivation

When CPU mining got crowded in the earlier years of cryptocurrencies, many started mining Bitcoin with FPGAs. The time arrived when it made sense to invest millions in ASIC development, which outperformed FPGAs by several orders of magnitude, kicking them out of the game. The complexity and cost of developing ASICs monopolized Bitcoin mining, leading to relatively dangerous mining centralization. Therefore, emerging altcoins decided to base their PoW puzzle on other algorithms that wouldn't give ASICs an unfair advantage (i.e. ASIC-resistant). The most popular mechanism has been designing the algorithm to be memory-hard (i.e. dependent on memory accesses), which makes memory bandwidth the computing bottleneck. This gives GPUs an edge over ASICs, effectively democratizing access to mining hardware since GPUs are consumer electronics. Ethereum is a clear example of it with its Ethash PoW algorithm.
Siacoin is an example of a coin without a memory-hard PoW algorithm, and some ASIC miners are being rolled out for it (see Obelisk and Antminer A3). So it was a perfect candidate for FPGA mining! (more for fun than profit)

Design theory

To yield the highest possible hash rate, a fully unrolled pipeline was implemented with resources dedicated to every operation of every round of the Blake2b hash computation. It takes 96 clock cycles to fill the pipeline and start getting valid results (4 clocks per 'G' x 2 'G' per round x 12 rounds).
  • MixG.vhd implements the basic 'G' function in 4 steps. Eight and two-step variations were explored but four steps gave the best balance between resource usage and timing.
  • QuadG.vhd is just a wrapper that instantiates 4 MixG to process the full 16-word vectors and make the higher level files easier to understand.
  • Blake2bMinerCore.vhd instantiates the MixG components for all rounds and wires their inputs and outputs appropriately. Nonce generation and distribution logic also lives in this file.
  • /Example contains an example instantiation of Blake2bMinerCore interfacing a host via UART. It includes a very minimalist Python script to interface the FPGA to a Sia node for mining.

MixG

The diagram below shows the pipeline structure of a single MixG. Four of these are instantiated in parallel to constitute QuadGs, which are chained in series to form rounds.
[Figure: MixG logic]
The gray A, B, C, D boxes contain combinatorial operations to add and rotate bits according to the G function specification. The white two-cell boxes represent two 64-bit pipelining registers to store results from the combinatorial logic that are used later on the process.
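As a software reference for what each MixG computes, the following Python sketch implements the Blake2b G function as specified in RFC 7693 (64-bit additions and right rotations by 32, 24, 16 and 63). It runs sequentially and says nothing about how the VHDL splits the work across the four pipeline stages; the names rotr64 and mix_g are chosen here for illustration.

```python
MASK64 = (1 << 64) - 1


def rotr64(value, amount):
    """Rotate a 64-bit word right by `amount` bits (0 < amount < 64)."""
    return ((value >> amount) | (value << (64 - amount))) & MASK64


def mix_g(a, b, c, d, x, y):
    """Blake2b G function (RFC 7693): mixes four state words with two message words."""
    a = (a + b + x) & MASK64
    d = rotr64(d ^ a, 32)
    c = (c + d) & MASK64
    b = rotr64(b ^ c, 24)
    a = (a + b + y) & MASK64
    d = rotr64(d ^ a, 16)
    c = (c + d) & MASK64
    b = rotr64(b ^ c, 63)
    return a, b, c, d
```

A model like this can serve as a golden reference when checking a hardware stage in simulation: feed both the same a, b, c, d, x, y and compare the four output words.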

Nonce Generation and Distribution

Pipelining the hash vector throughout the chain implies heavy register usage and there is no way around it. Fortunately the X/Y message feeds aren't as resource-demanding because the work header can remain constant for a given block period, with the exception of the nonce field, which must obviously be changing all the time to yield unique hashes. Therefore, the nonce field must be tracked or kept in memory for when a given step in the mixing logic requires it. The most simplistic approach would be to make a huge N-bit wide shift register to "drag" the nonce corresponding to each clock cycle across the pipeline. This is not an ideal solution, for we would require N flip-flops (e.g. 48-bit counter) times the number of clock cycles it takes to cross the pipeline (48 x 96 = 4608 FF!)
Luckily, the nonce field is only used once per round (12 times total). This allows hooking up 12 counters statically to the X or Y input where the nonce part of the message is fed in each round. To make the counter output the value of the nonce corresponding to a given cycle, the counters' initial values are offset by the amount of clock cycles between them. The following diagram illustrates the point:
[Figure: Nonce counters]
In this case the offsets show that the nonce used in round zero will be consumed by round one 8 clock cycles after, by round two 20 cycles after, and so on. (The distance in clock cycles between counters is defined by the Blake2b message schedule)
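The equivalence between the wide shift register and the offset counters can be checked with a small Python model. This is a sketch under stated assumptions: only the offsets 8 and 20 come from the text, the list is truncated to three taps, and the names shift_register_taps and OffsetCounters are hypothetical.

```python
NONCE_BITS = 48                      # counter width used as the example in the text
NONCE_MASK = (1 << NONCE_BITS) - 1

# Clock-cycle distance from the round-zero tap to each later tap.
# 8 and 20 come from the text; a full design would list all 12 taps.
TAP_OFFSETS = [0, 8, 20]


def shift_register_taps(cycle, offsets=TAP_OFFSETS):
    """Reference model: nonce n enters at cycle n and is dragged down the pipeline."""
    return [(cycle - off) & NONCE_MASK for off in offsets]


class OffsetCounters:
    """One free-running counter per tap, preloaded with minus its offset."""

    def __init__(self, offsets=TAP_OFFSETS):
        self.values = [(-off) & NONCE_MASK for off in offsets]

    def tick(self):
        """Advance every counter by one clock cycle, wrapping at NONCE_BITS."""
        self.values = [(v + 1) & NONCE_MASK for v in self.values]
```

Stepping both models side by side shows that every tap sees the same nonce on every cycle. For the counter state alone, twelve 48-bit counters need 12 x 48 = 576 flip-flops instead of the 4608 of the shift-register approach.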

Implementation results

It is evident that a single core is too big to fit in a regular affordable FPGA device. A ballpark estimate of the flip-flop resources a single core could use:
  • 64 bits per word x 16 word registers per MixG x 4 MixG per QuadG x 2 QuadG per round x 12 rounds = 98,304 registers (not considering nonce counters and other pieces of logic).
The design won't fit in your regular Spartan 6 dev board, which is why I built it for a Kintex 7 410 FPGA. Here are some of my compile tests:
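| Cores | Clock (MHz) | Hashrate (MH/s) | Mix steps | Strategy | Utilization | Worst Setup Slack | Worst Hold Slack | Failures | Notes |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 200 | 200 | 4 | Default | 18.00% | 0.168 | | 0 | |
| 2 | 200 | 400 | 4 | Default | 38.00% | | | 0 | |
| 3 | 200 | 600 | 4 | Default | 56.00% | -0.246 | | 602 | failing endpoints |
| 3 | 200 | 600 | 4 | Explore | 56.00% | -0.246 | 0.011 | 602 | failing endpoints |
| 3 | 166.67 | 500.01 | 4 | Default | 56.00% | 0.132 | 0.02 | 0 | |
| 4 | 166.67 | 666.68 | 4 | Default | 75.00% | 0.051 | 0.009 | 0 | |
| 5 | 166 | 830 | 4 | Explore | | | | | Placing error |
| 4 | 173.33 | 693.32 | 4 | Explore | 75.00% | 0.039 | 0 | 0 | |
| 4 | 173.33 | 693.32 | 4 | Explore | 75.00% | 0.17 | 0.022 | 0 | 1 BUFGs per core |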
As seen in the table, the highest number of cores I was able to instantiate was 4 and the highest clock frequency that met timing was 173.33 MHz.
~700 MH/s is no better than a mediocre GPU, but power draw is way less! (hey, I did say it was for fun)
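The back-of-the-envelope figures in this write-up can be reproduced with a few lines of Python. The function names are hypothetical. The hash-rate formula assumes each core accepts one nonce per clock once the pipeline is full, which is inferred from the table (1 core at 200 MHz gives 200 MH/s) rather than stated explicitly.

```python
def pipeline_latency(clocks_per_g=4, g_stages_per_round=2, rounds=12):
    """Clock cycles needed to fill the unrolled pipeline."""
    return clocks_per_g * g_stages_per_round * rounds


def register_estimate(bits_per_word=64, words_per_mixg=16, mixg_per_quadg=4,
                      quadg_per_round=2, rounds=12):
    """Ballpark flip-flop count for one core, ignoring nonce counters and glue logic."""
    return bits_per_word * words_per_mixg * mixg_per_quadg * quadg_per_round * rounds


def hashrate_mhs(cores, clock_mhz):
    """Assumes one hash per clock per core, so MH/s = cores x MHz."""
    return cores * clock_mhz


if __name__ == "__main__":
    print(pipeline_latency())        # 96
    print(register_estimate())       # 98304
    print(hashrate_mhs(4, 173.33))   # about 693.32
```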

Further work

  • Investigate BRAM as alternative to flip-flops (unlikely to fit the needs of this application).
  • Fine-tune a higher clock frequency to squeeze out a few more MH/s.
  • Porting to Blake-256 for Decred mining. That variant adds two rounds but words are half as wide, so fitting ~2x the number of cores sounds possible.
  • Do more in-depth tests with different number of steps in the G function (timing-resources tradeoff).
  • Play more with custom implementation strategies.



Reference: Xilinx-Vivado/YosysHQ/yosys All Free Synthesis VHDL


YosysHQ/yosys


By EoptEditor 0


Xilinx Vivado Made Free Synthesis. En

Migrating from Vivado:

Aleks-Daniel Jakimenko-Aleksejev edited this page · 4 revisions

This page is WIP.

At this point it is not possible to work with Xilinx FPGAs by using only free software. If you are looking for a full free software toolchain for working with FPGAs, see Project IceStorm. That being said, most of your workflow can still be done using Yosys, Icarus Verilog and other free software tools. You will have to use Vivado for place&route, bitstream generation and writing your bit file onto your device. However, this can be done by using tcl scripts, meaning that you will not have to open Vivado GUI at all. This page will show how to get commonly used Vivado functionality with Yosys.

Elaborated Design Schematic / RTL Schematic:

All you have to do is load your Verilog source files and run prep. Then, use show to see parts that are of any interest to you. You probably also want to use -colors and -stretch flags to make the graph a bit more readable. Therefore, the command you want to use is:
yosys -p 'prep; show -colors 42 -stretch show top' top.sv foo.sv bar.sv
You can also export this graph directly to SVG file:
yosys -p 'prep; show -colors 42 -stretch -format svg -prefix mygraph show top' top.sv foo.sv bar.sv












Bitstream and Programming:

You can run Vivado in batch or tcl modes. The difference is that in batch mode it will run the script and exit, while in tcl you will be left with the tcl shell. The problem with Vivado is that it has a very long startup delay, therefore running it in batch mode is very likely not what you want (but you can still do it, if you wish).
  1. place&route and bitstream generation. This script does not have open_hw command, so perhaps consider adding it (otherwise you will get an error message).
  2. writing the bitstream file to your device



The first one is where all of the magic happens. Feel free to add a couple of other commands, for example report_power. You may also want to modify the second file if you are working with multiple devices at the same time. You will also need an .xdc file (you are probably already aware of it). See this example. You can use Vivado GUI to generate it, or you can just write it by hand. The structure of the file is simple enough so there should be no problem.
So, you can run it in batch mode:
vivado -mode batch -source run_vivado.tcl
Or, you can run it in tcl mode:
vivado -mode tcl
Once it is loaded, you will see the tcl shell. Write source run_vivado.tcl to run your tcl script. The latter approach might be preferable if you do not like the startup delay of vivado.
Both examples assume that you have the vivado binary in your PATH. If you don't, feel free to substitute it with an actual path (e.g. ~/opt/Xilinx/Vivado/2016.2/bin/vivado).
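If you prefer to drive both tools from a script, the command lines above can be assembled in Python. This is only a sketch: the helper names yosys_command, vivado_command and run are made up for this example, and the only tool options used are the ones already shown on this page (-p for yosys, -mode and -source for vivado).

```python
import shutil
import subprocess


def yosys_command(script, sources, yosys="yosys"):
    """Build `yosys -p '<script>' <sources...>` as an argument list."""
    return [yosys, "-p", script, *sources]


def vivado_command(tcl_script=None, vivado="vivado"):
    """Batch mode runs the script and exits; without a script, open the tcl shell."""
    if tcl_script is None:
        return [vivado, "-mode", "tcl"]
    return [vivado, "-mode", "batch", "-source", tcl_script]


def run(command):
    """Run a tool if it is on PATH; return its exit code, or None if it is missing."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode
```

For example, run(vivado_command("run_vivado.tcl")) performs the batch-mode flow, and passing vivado="~/opt/Xilinx/Vivado/2016.2/bin/vivado" (expanded to a full path) covers the case where the binary is not in your PATH.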



























Below is the ToDo List:

Simulation

Wave Viewer

Post-Synthesis Simulation

Synthesized Design Schematic / Technology Schematic

Makefile to the Rescue!

Conclusion ?



Sunday, February 9, 2020

Ref: VHDL Tutorial: Learn by Example

We are the present moment, we are love.


 
 VHDL Tutorial: Learn by Example
-- by Weijun Zhang, July 2001


If we hear, we forget; if we see, we remember; if we do, we understand.
                                                                      --  Proverb
  Table of Contents
Foreword
Basic Logic Gates
Combinational Logic Design
Typical Combinational Logic Components
Latch and Flip-Flops
Sequential Logic Design
Typical Sequential Logic Components
Custom Single-Purpose Processor Design
General-Purpose Processor Design
Appendix: Modeling an industry core


Foreword (by Frank Vahid)
HDL (Hardware Description Language) based design has established itself as the modern approach to design of digital systems, with VHDL (VHSIC Hardware Description Language) and Verilog HDL being the two dominant HDLs. Numerous universities thus introduce their students to VHDL (or Verilog). The problem is that VHDL is complex due to its generality. Introducing students to the language first, and then showing them how to design digital systems with the language, tends to confuse students. The language issues tend to distract them from the understanding of digital components. And the synthesis subset issues of the language add to the confusion.
We developed the following tutorial based on the philosophy that the beginning student need not understand the details of VHDL -- instead, they should be able to modify examples to build the desired basic circuits. Thus, they learn the importance of HDL-based digital design, without having to learn the complexities of HDLs. Those complexities can be reserved for a second, more advanced course. The examples are mostly from the textbook Embedded System Design by Frank Vahid and Tony Givargis. They start from basic gates and work their way up to a simple microprocessor. Most of the examples have been simulated by Aldec ActiveHDL Simulator and Synopsys Design Analyzer, as well as synthesized with Synopsys Design Compiler. Several sequential design examples have been successfully tested on Xilinx Foundation Software and an FPGA/CPLD board.



Basic Logic Gates
(ESD Chapter 2: Figure 2.3)
Every VHDL design description consists of at least one entity / architecture pair, or one entity with multiple architectures. The entity section of the HDL design is used to declare the I/O ports of the circuit, while the description code resides within the architecture portion. Standardized design libraries are typically used and are included prior to the entity declaration. This is accomplished by including the code "library ieee;" and "use ieee.std_logic_1164.all;".
 Driver 
 Behavior Code
Behavior Simulation
 Inverter 
 Behavior Code
Behavior Simulation
 OR gate
 Behavior Code
Behavior Simulation
 NOR gate 
 Behavior Code
Behavior Simulation
AND gate
Behavior Code
Behavior Simulation
NAND gate
Behavior Code
Behavior Simulation
 XOR gate
 Behavior Code
Behavior Simulation
 XNOR gate
 Behavior Code
Behavior Simulation



Combinational Logic Design
(ESD Chapter 2: Figure 2.4)
We use the port map statement to build the structural model (component instantiations). The following example shows how to write the program to incorporate multiple components in the design of a more complex circuit. In order to simulate the design, a simple test bench must be written to apply a sequence of inputs (stimulators) to the circuit being tested (UUT). The output of the test bench and UUT interaction can be observed in the simulation waveform window.
Combinational Logic
Behavior Code
Test Bench
 Behavior Simulation
Synthesis Schematic
Gate-level Simulation
Tri-State Driver
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation

Discussion I: Signal vs. Variable
Signals are used to connect the design components and must carry the information between concurrent statements of the design. Variables, on the other hand, are used within a process to compute certain values. The following example shows their difference:
 
Signal/Variable Example
Behavior Code
Behavior Simulation



Typical Combinational Components
(ESD Chapter 2: Figure 2.5)
The following behavior-style codes demonstrate the concurrent and sequential capabilities of VHDL. The concurrent statements are written within the body of an architecture. They include concurrent signal assignments, concurrent processes and component instantiations (port map statements). Sequential statements are written within a process statement, function or procedure. Sequential statements include the case statement, the if-then-else statement and the loop statement.
 Multiplexor
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation
 Decoder
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation
 Adder
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation
 Comparator
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation
 ALU
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation
Multiplier
Behavior Code
Test Bench
Behavior Simulation
Synthesis Schematic
Gate-level Simulation



Latch & Flip-Flops
(ESD Chapter 2.3)
Besides the circuit's input and output signals, a sequential circuit normally has two other important signals, reset and clock. The reset signal is either active-high or active-low, and the circuit's state transitions can occur on either the rising or the falling clock edge. The flip-flop is a basic component of sequential circuits.
 Simple Latch
Behavior Code
 Test Bench
Behavior Simulation
Gate-level Implementation
Gate-level Simulation
D Flip-Flop
Behavior Code
Test Bench
Behavior Simulation
Gate-level Implementation
Gate-level Simulation
JK Flip-Flop
Behavior Code
Test Bench
Behavior Simulation
Gate-level Implementation
Gate-level Simulation



Typical Sequential Components
(ESD Chapter 2: Figure 2.6)
Typical sequential components consist of registers, shifters and counters. The concept of generics is often used to parameterize these components. Parameterized components make it possible to construct standardized libraries of shared models. In the behavioral description, the output transitions are generally set at the clock rising edge. This is accomplished with the VHDL condition (clock'event and clock='1'). While the test bench runs, the expected output of the circuit is compared with the results of simulation to verify the circuit design.
 Register
Behavior Code
 Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Structural Simulation
Shift Register
Behavior Code
 Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Structural Simulation
 Counter
Behavior Code
 Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Structural Simulation



Sequential Logic Design
(ESD Chapter 2: Figure 2.7)
The most important description model presented here may be the Finite State Machine (FSM). A general model of an FSM consists of both combinational logic and sequential components such as state registers, which record the state of the circuit and are updated synchronously on the rising edge of the clock signal. The output function computes the various outputs according to the different states. Another type of sequential model is the memory module, which usually takes a long time to synthesize due to the number of design cells.
FSM Model
Behavior Code
Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
    • Memories (ESD Chapter 5)
RAM Module
Behavior Code
Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
ROM Module
Behavior Code
Test Bench
Behavior Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation

Discussion II: Behavior vs. RTL Synthesis (Y Chart)
RTL stands for Register-Transfer Level. It is an essential part of the top-down digital design process. Logic synthesis offers an automated route from an RTL design to a gate-level design. In RTL design, a circuit is described as a set of registers and a set of transfer functions describing the flow of data between the registers (i.e., FSM + Datapath). As an important part of a complex design, this division is the main objective of the hardware designer using synthesis. The Synopsys Synthesis Example illustrates that RTL synthesis is more efficient than behavior synthesis, although simulation of the former requires a few clock cycles.
 
GCD Calculator
Behavior Code
RTL Code (FSM+D)
Comparison
The following section illustrates the RTL (FSM+Datapath) method further using several design examples.
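As a language-neutral picture of the FSM + Datapath idea, the GCD calculator can be modeled in Python as a controller that picks the next state from comparator results, plus two datapath registers that are updated once per state. This is a sketch for illustration, not the tutorial's VHDL: the state names, the subtraction-based algorithm and the cycle counting are assumptions.

```python
def gcd_fsmd(x_in, y_in):
    """Step a small FSM with datapath registers x and y until they are equal.

    Returns (gcd, cycles), where each loop iteration models one clock cycle.
    """
    if x_in <= 0 or y_in <= 0:
        raise ValueError("inputs must be positive integers")
    state = "LOAD"
    x = y = 0
    cycles = 0
    while state != "DONE":
        cycles += 1
        if state == "LOAD":
            x, y = x_in, y_in            # datapath: load both registers
            state = "COMPARE"
        elif state == "COMPARE":
            if x == y:                   # controller: branch on comparator flags
                state = "DONE"
            elif x < y:
                state = "SUB_Y"
            else:
                state = "SUB_X"
        elif state == "SUB_Y":
            y = y - x                    # datapath: subtractor writes back to y
            state = "COMPARE"
        elif state == "SUB_X":
            x = x - y                    # datapath: subtractor writes back to x
            state = "COMPARE"
    return x, cycles
```

Calling gcd_fsmd(12, 8) walks through LOAD, COMPARE, SUB_X, COMPARE, SUB_Y, COMPARE and returns a GCD of 4. The cycle count makes visible that an RTL description takes several clock cycles where a purely behavioral one would compute the result in a single step.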





Custom Single-Purpose Processor Design
(ESD Chapter 2, Chapter 4)
The first three examples illustrate the difference between the RTL FSMD model (Finite State Machine with Datapath built in) and the RTL FSM + Datapath model. From the point of view of RT-level design, each digital design consists of a Control Unit (FSM) and a Datapath. The datapath consists of storage units such as registers and memories, and combinational units such as ALUs, adders, multipliers, shifters, and comparators. The datapath takes the operands from storage units, performs the computation in the combinational units, and returns the results to the storage units during each state. This process typically takes one or two clock cycles. Data-flow modeling (which looks more like an algorithm) is presented in the fourth example. The FIR digital filter algorithm is simulated and synthesized using VHDL. A comparison of the coding styles between the RTL modeling and algorithm-level modeling highlights the different techniques.
    • GCD Calculator (ESD Chapter2: Figure 2.9-2.11)
FSMD Modeling
RTL Code
Test Bench
RTL Code Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
 FSM + Datapath Modeling
RTL Code
Test Bench
RTL Code Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
    • Simple Bridge (ESD Chapter 2: Figure 2.13-2.14)
FSMD Modeling
RTL Code
Test Bench
RTL Code Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
FSM + Datapath Modeling
RTL Code
Test Bench
RTL Code Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
    • ISA Bus Interface (ESD Chapter 4, Chapter 6)
FSM + Datapath Modeling
RTL Code
Test Bench
RTL Code Simulation
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
    • FIR Digital Filter (DSP Example)
Data-Flow Modeling
Behavior Code
Test Bench
Behavior Simulation(1,2)
Gate-level Implementation
Synthesis Schematic
Gate-level Simulation
 

Discussion III: Synopsys Power Analysis
Synopsys tools can be used to perform power analysis for all the VHDL designs. Generally, the better design has lower power consumption. On the other hand, improving power always means sacrificing other design metrics such as performance, area, or NRE cost. Therefore, a designer needs to balance these metrics to find the best implementation for the given application and constraints. Please check out the power analysis results of the Adder, Counter, ISA controller, Bridge controller and FIR Filter. As expected, the FIR digital filter has the highest power consumption because it has a more complex circuit doing DSP computation. The Synopsys power analysis tutorial can be found here.
Discussion IV: Synthesis with Timing Constraints
When we design and simulate the high-level (either behavior or RTL) code, we only care about design functionality. However, in VHDL synthesis, the timing and the functionality of a design must always be considered together. Therefore, once the design has been synthesized, the second goal of simulation is to quickly verify that the gate-level implementation meets timing requirements. We use this flow (coding -> simulation -> synthesis -> simulation) to test all of the examples in this tutorial.
Another common way is to apply timing constraints to the design during synthesis. Then the timing report is checked to see if the slack, which is the required delay minus the actual delay, is MET or VIOLATED. If VIOLATED, we should go back to the VHDL code and rewrite it to improve timing. The whole design is then compiled and tested again.
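The slack check described above is a one-line calculation. The Python sketch below uses hypothetical function names and made-up delay values; the MET / VIOLATED wording follows the timing report terminology used in the text.

```python
def timing_slack(required_delay_ns, actual_delay_ns):
    """Slack is the required delay minus the actual delay."""
    return required_delay_ns - actual_delay_ns


def timing_status(slack_ns):
    """Non-negative slack means the constraint is MET; negative means VIOLATED."""
    return "MET" if slack_ns >= 0 else "VIOLATED"


if __name__ == "__main__":
    # Hypothetical path: 5.0 ns allowed, 5.25 ns measured.
    slack = timing_slack(5.0, 5.25)
    print(slack, timing_status(slack))   # -0.25 VIOLATED
```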
 
Counter
Behavior Code
Synthesis Script File
Timing Report
Discussion V: Relationship between Area and Timing
During Synopsys synthesis, ordinary combinational logic will go through several of what are known as mapping optimizations. In a normal optimization, the synthesis tool will optimize in relation to the set constraints. It is usual to talk about moving along a "banana curve" on the area and time axes. This means that the tougher the timing constraints, the larger the design will be, and vice versa. The results from two different synthesis constraints applied to the same design are shown below.
 
FIR filter
Sample Synthesis Script
Comparison Table
Banana Curve


General-Purpose Processor Design
(ESD Book Chapter 3, Figure 3.15)
As indicated in the previous part, an Application Specific Integrated Circuit (ASIC) is specified with behavior descriptions, which are presented in the form of a particular algorithm or flowchart. A general-purpose processor, on the other hand, is specified completely by its instruction set (IS). A sequence of instructions is required for the computation of a mathematical expression or any other similar computational task. To illustrate the whole procedure, a simple pseudo-microprocessor model is used, which contains seven instructions (ESD book Figure 3.7). The RT-level design method from the previous examples is used again to construct this microprocessor. The CPU fetches, decodes, and executes each instruction in order to get the final result. For test purposes, a short program (a sequence of instructions) is loaded into the memory. After execution, this program obtains 10 Fibonacci numbers and stores the results at specific memory addresses. The design was implemented using Active-HDL and Synopsys Design Compiler. (Please note that PC.vhd needs a small modification to give a correct synthesis result; this is left as practice for the reader.)
 
Top Level Structural Code
microprocessor.vhd
Design Hierarchy
cpuhierarchy.jpg
Test Bench of CPU Design
TB_mp.vhd
Control Unit
ctrl_unit.vhd
Block Diagram
cpublock.jpg
Data Path
datapath.vhd
controller.vhd  PC.vhd
IR.vhd   bigmux.vhd
Memory
memory.vhd
smallmux.vhd reg_file.vhd
alu.vhd obuf.vhd
Synopsys Synthesis Script Files
cpusyn.scr
syn_ctrl_unit.inc, syn_PC.inc, syn_IR.inc, syn_datapath.inc, syn_reg_file.inc
syn_alu.inc, syn_memory.inc
syn_controller.inc, syn_obuf.inc, syn_bigmux.inc, syn_smallmux.inc
Behavior Simulation
Script File
cpusim.scr
Simulation Waveform
cpusim1.jpg cpusim2.jpg
Gate-Level Simulation 
Script File
cpugatesim.scr

Discussion VI: VHDL vs. Verilog
There are now two industry-standard hardware description languages, VHDL and Verilog. It is important that a designer knows both of them, although we are using only VHDL in class. Verilog is easier to understand and use. For several years it has been the language of choice for industrial applications that required both simulation and synthesis. It lacks, however, constructs needed for system-level specifications. VHDL is more complex, and thus more difficult to learn and use. However, it offers a lot more flexibility in coding styles and is suitable for handling very complex designs. Here is a great article explaining their differences and tradeoffs.



Appendix: Modeling a real industry chip - HD 6402
(ESD Chapter 4)
 
I. Specification of HD 6402 II. Behavior Modeling of UART Transmitter 
(1) Behavior Code (2) Gate-level design (3) Test Benches - 1, 2, 3 (4) Synopsys Simulation 
Case#1: one 8-bit word, 1 start, 2 stops, and even parity, or Data=11000101, Control Word=11011.  ( Gate-level Simulation )
Case#2: three 5-bit words, 1 start, 1 stop, and no parity, or Data=11010 & 00101 & 10001, Control Word=00100.  ( Gate-level Simulation )
Case#3: two 6-bit words, 1 start, 2 stops, and odd parity, or Data=110010 & 101101, Control Word=01000.  ( Gate-level Simulation )
III. Behavior Modeling of UART Receiver 
(1) Behavior Code (2) Gate-level design (3) Test Benches - 1, 2, 3 (4) Synopsys Simulation
Case#1: two 6-bit words, 1 start, 2 stops, and even parity, (Data=111001 & 100101, Control Word=01101). ( Gate-level Design Simulation )
Case#2: one 8-bit word, 1 start, 1 stop, and odd parity, (Data=10111001, Control Word=11000). ( Gate-level Design Simulation )
Case#3: three 5-bit words, 1 start, 1 stop, and no parity, (Data=01001 & 01110 & 00100, Control Word=00010). ( Gate-level Design Simulation )
IV. Structural Modeling of HD-6402 
(1) Behavior Code (2) Gate-level design (3) Test Bench (4) Synopsys Simulation
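
The framing used in the test cases above (start bit, data bits, optional parity, stop bits) can be sketched in Python. This is a generic UART frame builder, not a model of the HD-6402 itself: the control word encoding is not modeled, the function name uart_frame is hypothetical, and reading the Data strings as binary numbers written most-significant bit first is an assumption.

```python
def uart_frame(word, data_bits=8, parity="even", stop_bits=2):
    """Return the bits of one frame in transmission order (data LSB first)."""
    if parity not in ("even", "odd", "none"):
        raise ValueError("parity must be 'even', 'odd' or 'none'")
    data = [(word >> i) & 1 for i in range(data_bits)]   # least-significant bit first
    frame = [0] + data                                   # start bit is always 0
    if parity != "none":
        ones = sum(data)
        # Even parity makes the total number of ones (data + parity) even;
        # odd parity makes it odd.
        parity_bit = ones % 2 if parity == "even" else 1 - ones % 2
        frame.append(parity_bit)
    frame += [1] * stop_bits                             # stop bits are always 1
    return frame
```

For transmitter Case#1, uart_frame(0b11000101, data_bits=8, parity="even", stop_bits=2) yields a 12-bit frame whose parity bit is 0, because the data word already contains four ones.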

Created by Weijun Zhang (weijun_92507@yahoo.com)
at UC, Riverside, 06/2001