CryptoURANUS Economics: 07/26/19


Friday, July 26, 2019

CryptoMining via FPGA

CryptoMining via FPGA's:

FPGA for Dummies & Experts Alike!

The ASICs had overtaking GPU mining, but an alternative to ASIC mining was born. 

The new wolfpack leader appears as the Field Programmable Gate Arrays, or shortly FPGA and it is taking over very fast.

The only issue here is the boards are difficult to find each passing month.

Disclaimer: this post is not sponsored by any company nor have any referral links.


Let’s look at why FPGA is interesting for mining.

The Two Main Issues FPGA Are Meant to Solve Cryptocurrencies are volatile and unstable in the current market, August-2018. 


Cryptocurrency market have been jumping from Ethereum to Monero to Zcash, back and forth, depending on the volatility of coin profitability. 

The ASICs storming the mining pool strategy is to buy an ASIC miner and pray that it pays off in time. GPU mining  and the amount of coins you can mine is limited and people find this unsatisfactory 75% of the time.

The ASIC's issue is it offers zero flexibility when it comes to a single coin that can be mined and no other type of cryptocurrency coin. 


An ASIC is hard-wired to mine one algorithm type of coin only. 

Highest End, Lowest Cost:

Ultra96 is an Arm-based, Xilinx Zynq UltraScale+ MPSoC development board based on the Linaro 96Boards specification. 

The 96Boards’ specifications are open and define a standard board layout for development platforms that can be used by software application, hardware device, kernel, and other system software developers. 


Ultra96 represents a unique position in the 96Boards community with a wide range of potential peripherals and acceleration engines in the programmable logic that is not available from other offerings. 

Ultra96 boots from the provided Delkin 16 GB MicroSD card, pre-loaded with PetaLinux. 


Engineers have options of connecting to Ultra96 through a Webserver using integrated wireless access point capability or to use the provided PetaLinux desktop environment which can be viewed on the integrated Mini DisplayPort video output. 


Multiple application examples and on-board development options are provided as examples. 


Ultra96 provides four user-controllable LEDs. 


Engineers may also interact with the board through the 96Boards-compatible low-speed and high-speed expansion connectors by adding peripheral accessories such as those included in Seeed Studio’s Grove Starter Kit for 96Boards. 


Micron LPDDR4 memory provides 2 GB of RAM in a 512M x 32 configuration. Wireless options include 802.11b/g/n Wi-Fi and Bluetooth 4.2 (provides both Bluetooth Classic and Low Energy (BLE)). 


UARTs are accessible on a header as well as through the expansion connector. JTAG is available through a header (external USB-JTAG required). I2C is available through the expansion connector. 


 Ultra96 provides one upstream (device) and two downstream (host) USB 3.0 connections. A USB 2.0 downstream (host) interface is provided on the high speed expansion bus. 


Two Microchip USB3320 USB 2.0 ULPI Transceivers and one Microchip USB5744 4-Port SS/HS USB Controller Hub are specified. 


 The integrated power supply generates all on-board voltages from an external 12V supply (available as an accessory).

What’s the Third Option: There is always other option$. 


FPGA is the hardware taking over the market by storm. This is the new favorite in the cryptocurency mining community. 


FPGA have been around since 1979. 


They heavily used in science, vehicle modeling and even military deployment applications.


The first manufacturer of these devices is an American technology company called "Xilinx"


Years followed and another American company called Altera, (now owned by Intel), has joined the industry and has been the main Xilinx competitor since then.


The development of FPGA circuitboards have been welcomed very in many industries and the demand FPGA hardware technology is still booming. 

In 2013, the market for FPGA circuit boards was $6.1 billion and estimated $21.3 billion by year 2020.

Why FPGA Have not been used in cryptocurrency mining:

Since Bitcoin became popular, average people tried to mine with FPGA, but failed because they did not have the programming skill sets to utilize the FPGA circuit-boards.


The only people have mined FPGA circuit-boards were large mining exchanges, and they kept the FPGA circuit boards a secret for years.


When the first open source FPGA Bitcoin miner was released from private sectors until May 20, 2011. 

Juan Antonio Ernesto's Great Adventure:

A man named Juan Antonio Ernesto's, [his named was changed here to protect his immigration innocence], from Tijuana Mexico, who illegally migrated into Canada from Mexico, and was hired by Canada's silicon-valley, (in Waterloo Ontario).

After three years Juan left Canada, because of bias Canadian in that country, so Juan Antonio said, he had to leave to U.S..

When Juan entered the U.S. he was immediately granted citizenship by the U.S. government, because enrolled into U.S. college for free and acquired his CS-masters degree from NMT Socorro New-Mexico under his real name.

While he was in Socorro NM he and other Mexicans-Americans secretly managed the cryptocurrency mining software exchanged named "Macho-Rio-Grande".

The Mexican trio enabled the FPGA Xilinux cards workable and online usable for cryptocurrency mining.

The trio secretly made millions of dollars and they were the only private-public sector aware of this technology.

When Juan returned with millions of dollars of wealth to share with his friends in Canada he died from a fatal gunshot wound by MS13 in Vancouver Canada on the highway of tears.

Juan's bank accounts and cryptocurrencies was transferred before his death never to be found again.

Juan's friends ran to Mexico with the FPGA software technology and where also found dead months later and their accounts where all transferred the same.

All deaths related to these events where determined suicide, and the mystery of their deaths continues as everyone suspects MS13 hackers.

NOW, there are three reasons why FPGA circuit-boards have never really made it to the masses until today and the above mentioned is the first.

The Reason #1, is the lacking of non-programmable flexibility and software to architecture specifics. 

FPGA boards are not easy to software program, and they can be programmed to mine cryptocurrency. 

In order to use a FPGA board you must have hardware and software programming abilities.

The GPU works differently and the only changes enabled is to tweak the clock speed, and mining software.

The FPGA circuit-board has got to be programmed in raw-code from scratch in order to mine cryptocurrency. Writing the code in Verilog or VHDL language -and– neither Python nor C++ works, but only Verilog or VHDL languages.

Only dedicated programmers are capable managing this task from beginning equation to end resolved solution.

The Reason #2, is the creation of the first ASIC for mining cryptocurrencies, Unlike FPGA, was an ASIC hard-coded as a plug and play hardware only and not reprogrammable. 

Anyone can use an ASIC Miner-Box. There were a lot of alternatives to ASIC mining-box. Computer programmers have had the option of the GPU rigs and resolved into mining lesser coins than an FPGA circuit-board capable of.

The ASIC miner-box's are dominating the mining pools and Personal Computer Graphic Card GPU's are now less used technology.

The FPGA are becoming the average miner hardware these days.

There are several reasons FPGA are way faster. 

The FPGA circuit-board cards perform 3x to 100x times more efficient than GPU while having the same wattage power voltage draw saving hundred$

Depending on the algorithm matched to bitcoin, FPGA never fall behind ASICs miner-box's.

Upsides of FPGA

+ Compatibility for all mining currencies provided you are a  flexibility with Verilog or VHDL programming languages, or have partnered up with a programmer regards all cryptocurrency mining algorithms. There are no soft-forks affecting mining operations provided the programmer updates FPGA bitstream.

+ Extreme power efficiency compared to CPU's and GPUs.

Downsides of FPGA

+ FPGA have to be plugged into operational computers, just like GPUs.

+ Xilinux Vertex FPGA available to the mainstream for now (there are some exceptions though, which is why this article exists, more on that below).

+ Are quite pricey compared to GPUs.

+ Can be slightly outperformed by ASICs depending on the algorithm.


The Bitstream is the program written on a low-level programming language known as Verilog or VHDL that tells the FPGA what to do.

If you want to mine a specific algorithm you must have a bitstream that tells the FPGA how to mine that specific algo.

Bitstreams are loaded to the FPGA once the system boots. The bitstream is loaded into the volatile FPGA RAM memory. This is the same DDR4 memory – the FPGA model people use for mining has got 64GB of it. 

This huge amount of RAM allows the FPGA to store hundreds of bitstreams and switch between those in fractions of a second.

This functionality allows an FPGA to mine algorithms such as Timetravel10, X11Evo, X16R and X16S that require the chip to switch between various “lesser” hashing algorithms every few minutes.

While the bitstream can be changed in a fraction of a second, the board can still mine only one algorithm at a time with a couple rare exceptions.

Anyone can create bitstreams for existing mining algos and Zetheron (name of the company) will be collecting a fixed fee on behalf of the developers. This will ensure:

  • safety to the developers of bitstreams – they will get paid for their work and

  • no entry fee for FPGA owners – you pay only if the bitstream you have downloaded works

  • Plus, the access to a diversity of community-made bitstream will certainly guarantee that we will be able to mine virtually any algo and fork we want.

As for today, Zethereon has working bitstream for Cryptonote and Lyra2z algos. “The current plan is to release approximately one algorithm per month, until all major algorithms in the above table have been covered.”  – Zetheron writes. Here is a table of all the planned coins for the VU9P FPGA.

This means that thanks to the work those guys did, we will now have a seamless, pretty much plug-and-play experience when using our FPGA boards.

The ecosystem Zetheron is creating will give us all the bitstream solutions we need to mine any popular algorithm we want without the need to know anything about programming. Plus, developers will be motivated to push the plank higher and create better bitstreams.



Ultra96 Monero Miner

Ultra96: Defined

Avnet Ultra96
Price: $249
Part Number: AES-ULTRA96-G
Device Support:
Zynq UltraScale+ MPSoC
 Vendor: Avnet
The Future is AVNET...

Program Tier:

  1. Premier
  2. View Partner Profile

Product Description:

Ultra96™ is an ARM-based, Xilinx Zynq UltraScale+™ MPSoC development board based on the Linaro 96Boards specification. The 96Boards’ specifications are open and define a standard board layout for development platforms that can be used by software application, hardware device, kernel, and other system software developers. Ultra96 represents a unique position in the 96Boards community with a wide range of potential peripherals and acceleration engines in the programmable logic that is not available from other offerings.

Key Features and Benefits:

  • Linaro 96Boards Consumer Edition compatible
  • 85mm x 54mm form factor
  • 60-pin 96Boards High speed expansion header
  • 40-pin 96Boards Low-speed expansion header
  • 2x USB 3.0, 1x USB 2.0 Type A downstream ports
  • 1x USB 3.0 Type Micro-B upstream port
  • Mini DisplayPort (MiniDP or mDP)
  • Wi-Fi / Bluetooth
  • Delkin 16 GB MicroSD card + adapter
  • Micron 2 GB (512M x32) LPDDR4 Memory
  • Xilinx Zynq UltraScale+ MPSoC ZU3EG A484

What's Included:

  • 16 GB pre-loaded microSD card + adapter
  • Quick-start instruction card
  • Ultra96 development board
  • Voucher for SDSoC license from Xilinx


  • The Ultra96 is the Arm-based, Xilinx Zynq UltraScale+ MPSoC development board based on the Linaro 96Boards specification.
  • We sat down with Robert Wolff and Sahaj Sarup from the 96Boards team within Linaro, to talk about the new Ultra96 FPGA board.
  • This powerful, adaptable single-board computer runs PetaLinux, and is perfect for flexible application development within image processing, AI, and more.
Part 2: Demos –
Part 3: SDSoC –

The 96Boards’ specifications are open and define a standard board layout for development platforms that can be used by software application, hardware device, kernel, and other system software developers.

Ultra96 represents a unique technological position in the 96Boards tier community with a wide range of potential peripherals and acceleration-software engines.

This is all umbrella by the programmable logic that is not available from other offerings other-than the tier group community standardized; CISCO and what you qualify for in consumer sales.

Ultra96 boots from the provided Delkin 16 GB Micro-SD card, pre-loaded with Peta-Linux Operating-System.

Engineers have options of connecting to Ultra96 through an intranet/internal Webserver using integrated wireless access point capability or to use the provided Peta-Linux desktop environment.

The Peta-Linux desktop environment can be viewed on the integrated Mini DisplayPort video output onto a PC and-or MAC hardware monitor interface.

Multiple application examples and on-board development options are provided as examples.

Ultra96 provides novice to professional four user-controllable LEDs.

Engineers may also interact with the board through the 96Boards-compatible low-speed and high-speed expansion connectors by adding peripheral accessories such as those included in Seeed Studio’s Grove Starter Kit for 96Boards.

Micron LPDDR4 memory provides 2 GB of RAM in a 512M x 32 configuration. Wireless options include 802.11b/g/n Wi-Fi and Bluetooth 4.2 (provides both Bluetooth Classic and Low Energy (BLE)). UARTs are accessible on a header as well as through the expansion connector. JTAG is available through a header (external USB-JTAG required). I2C is available through the expansion connector.

Ultra96 provides one upstream (device) and two downstream (host) USB 3.0 connections. A USB 2.0 downstream (host) interface is provided on the high speed expansion bus. Two Microchip USB3320 USB 2.0 ULPI Transceivers and one Microchip USB5744 4-Port SS/HS USB Controller Hub are specified.
The integrated power supply generates all on-board voltages from an external 12V supply (available as an accessory).



  • Xilinx Zynq UltraScale+ MPSoC ZU3EG SBVA484.
  • Micron 2 GB (512M x32) LPDDR4 Memory.
  • Delkin 16 GB MicroSD card + adapter.
  • Pre-loaded with PetaLinux environment.
  • Wi-Fi / Bluetooth.
  • Mini DisplayPort (MiniDP or mDP).
  • 1x USB 3.0 Type Micro-B upstream port.
  • 2x USB 3.0, 1x USB 2.0 Type A downstream ports.
  • 40-pin 96Boards Low-speed expansion header.
  • 60-pin 96Boards High speed expansion header.
  • 85mm x 54mm form factor.
  • Linaro 96Boards Consumer Edition compatible.

 Target Applications:

  • Artificial Intelligence.
  • Machine Learning.
  • IoT/Cloud connectivity for add-on sensors.

 Optional Add-On Items:

  •  External 2.0A @ 12V power supply
  •  USB-to-JTAG/UART pod (coming soon)
  •  Seeed Studios Grove Starter Kit for 96Boards
  •  Compatible Accessories


 Other Qualified microSD Cards:

  •  Delkin Utility MLC 128 GB microSD Card

Squirrels Research CVP-13_FPGA


Squirrels Research Labs and BittWare Launch World's Most Powerful Cryptocurrency FPGA Card.


 --- Squirrels Research Labs (SQRL) recently partnered with BittWare, a Molex company and provider of high-performance computer boards, to offer the world's most powerful FPGA cryptocurrency mining hardware, the BittWare CVP-13.

The CVP-13 uses the largest Virtex UltraScale+ FPGA chip available from Xilinx Inc. XLNX, -1.10%

"By utilizing the largest chip available from Xilinx, the VU13P, BittWare's CVP-13 offers the most powerful cryptocurrency FPGA card in existence," SQRL president David Stanfill said.

The CVP-13 provides the most processing power of any FPGA cryptocurrency mining card available--46 percent more logic, 31 percent more on-chip memory and a 66 percent larger power supply than other popular boards that use the Xilinx VU9P FPGA chip.

Unlike other FPGA mining hardware in the market, the CVP-13 uses a 300 ampere power supply, allowing users to maximize the potential of the VU13P FPGA chip on the board.

Similar hardware in the market requires tuning that pulls more power than the hardware is rated for, causing efficiency losses.

The CVP-13's factory-designed and installed Viper cooling options include liquid cooling to increase efficiency.

"The efficiency and power gains obtained with the CVP-13 allow for higher density deployments with less hosting overhead," Stanfill explained.

"For every three VU9P-based boards, you only need two CVP-13 units to achieve similar performance goals."

In addition to cooler and more efficient performance, the increased amount of on-chip logic allows larger algorithms to fit that aren't available on other hardware. These algorithms include X17r, X16r and TimeTravel10.

CVP-13 cards can be chained together with QSFP28 and SlimSAS cables, enabling them to run larger algorithms like Equihash variants.

The CVP-13 has quadruple the amount of bandwidth of competitive cards and includes four QSFP28 cages and two SlimSAS connectors for board-to-board communication.

The CVP-13 includes the ability to run secure bitstreams developed and supported by AllMine Inc., and SQRL.

"BittWare has been producing high-end boards for three decades," Stanfill continued.

"They're a well-known and respected brand in the FPGA industry and now have the strength of Molex for even stronger market agility to take on projects like this."

BittWare CVP-13 FPGA boards are available for purchase now through SQRL with deliveries expected in November. Hosted CVP-13 hardware is also available through The Mineority Group.

Both options can be purchased at

"We're very excited to continue bringing groundbreaking FPGA technology into this market," Stanfill said.

About Squirrels Research Labs

Squirrels Research Labs, or SQRL, is a research and development division of Squirrels LLC. SQRL focuses time on projects that keep Squirrels forward thinking and adaptable in competitive markets.

Squirrels Research Labs found at

About BittWare, a Molex company

For three decades, BittWare has designed and deployed high-end signal processing, network processing and high-performance computing board-level solutions that significantly reduce technology risk and time-to-revenue for OEM customers.

BittWare products are based on the latest FPGA technology and industry-standard COTS form factors, including PCIe.

When customer requirements make it difficult to use industry-standard boards, BittWare provides modified solutions and/or licensed designs.

Need more information on BittWare and FPGA:

FPGA Monero Mining Source-Codes


Video Source in Title link Blow:

AltCoin Mining, Source-code DeCompile ReCompile.

Reference Material Link To Siste embedded in Title.

Xilinx Virtex-7 2000T FPGA provides over 20 million ASIC gates per-chip Date: 09m-24d-2018y, the present purchase cost of a Xilinx XC7V2000T Chip is $1k.

Xilinx has announced the first shipments of its Virtex-7 2000T Field Programmable Gate Array (FPGA). The Virtex-7 2000T is the world’s highest-capacity programmable logic device – it contains 6.8 billion transistors, providing customers access to 2 million logic cells.

This is equivalent to 20 million ASIC gates, which makes these devices ideal for system integration, ASIC replacement, and ASIC prototyping and emulation.

This capacity is made possible by Xilinx’s Stacked Silicon Interconnect technology – also referred to as 2.5D ICs. The simplest packaging technology is to have a single die in the package.

The next step up the “complexity ladder” is to have multiple die is the same package, but for all of these die to be attached directly to the package substrate. In this case, compared to the tracks on the die, the tracks on the package substrate are relatively large, slow, and driving signals onto them consumes a lot of power.

In this first incarnation of the technology, four FPGA die are attached to the silicon interposer, which – in addition to connecting the FPGAs to each other – provides connections to the package as illustrated below.

In the case of the Virtex-7 2000T, the FPGA die are implemented at the 28 nm technology node, while the passive silicon interposer is implemented at the 65 nm technology node. Implementing the large silicon interposer at this higher node reduces costs and increases yield without significantly degrading performance.

One way to think about this is that the silicon interposer essentially adds four additional tracking layers that can be used to connect the FPGAs to each other with more than 10,000 connections between each pair of adjacent die!

On top of this, Through-Silicon Vias (TSVs) are used to pass signals through the silicon interposer to C4 bumps on the bottom of the interposer. These bumps are then used to connect the interposer to the package substrate.

A view of Xilinx’s Virtex-7 2000T device showing the
packaging substrate (bottom), silicon interposer (middle),
and four FPGA die (top).

Compared with having to use standard I/O connections to integrate two FPGAs together on a circuit board, this stacked silicon interconnect technology is said to provide over 100X the die-to-die connectivity bandwidth-per-watt, at one-fifth the latency, without consuming any of the FPGAs' high-speed serial or parallel I/O resources.

Of particular interest to designers is the fact that, despite being composed of four die, the Virtex-7 2000T preserves the traditional FPGA use model in that users will program the device as one extremely large FPGA with the Xilinx tool flow and methodology.

Xilinx’s first application of 2.5D IC stacking gives customers twice the capacity of competing devices and leaps ahead of what Moore’s Law could otherwise offer in a monolithic 28-nanometer (nm) FPGA.

Xilinx says that its customers can use Virtex-7 2000T FPGAs to replace large capacity ASICs to achieve overall comparable total costs in a third of the time, creating integrated systems that increase system bandwidth and reduce power by eliminating I/O interconnect, and accelerating the prototyping and emulation of advanced ASIC systems.

A top and bottom view of Xilinx’s Virtex-7 2000T
the world’s highest-capacity FPGA using
Stacked Silicon Interconnect technology.

 “The Virtex-7 2000T FPGA marks a major milestone in Xilinx’s history of innovation and industry collaboration,” said Victor Peng, Xilinx Senior Vice President, Programmable Platforms Development.  

“Of significance to our customers is the fact that Stacked Silicon Interconnect technology offers capacities that otherwise wouldn’t be possible in an FPGA for at least another process generation. 

They can immediately add new functionality to existing designs while forgoing an ASIC, cost reduce a 3 or 5 FPGA solution into a single FPGA or move ahead with prototyping and building system emulators using our largest FPGAs at least a year earlier than typical for a new generation.”

The Virtex-7 2000T device also provides equipment manufacturers with an integration platform that will help them overcome the challenges of lowering power while increasing performance and capabilities.

By eliminating the I/O interfaces between different ICs on a circuit board, a system’s overall power consumption can be reduced considerably.

Consider the following example provided by Xilinx that compares a single Virtex-7 2000T with four of the largest monolithic ICs as illustrated below:

Actually, this is not really a fair comparison, because in terms of capacity the Virtex-7 2000T is equivalent to only around two of the largest monolithic ICs. But even comparing to two monolithic ICs results in a significant power advantage. (Having said this, I’d be interested to know just what was being exercised in this example – Logic? Memory? DSP slices? SERDES channels? – and at what frequency.)

Reference Material Link To Siste embedded in Title.

FPGA programming step by step...

FPGAs and microprocessors are more similar than you may think. Here's a primer on how to program an FPGA and some reasons why you'd want to. Small processors are, by far, the largest selling class of computers and form the basis of many embedded systems. The first single-chip microprocessors contained approximately 10,000 gates of logic and 10,000 bits of memory. Today, field programmable gate arrays (FPGAs) provide single chips approaching 10 million gates of logic and 10 million bits of memory. Figure 1 compares one of these microprocessors with an FPGA.

Figure 1: Comparison of first microprocessors to current FPGAs

Powerful tools exist to program these powerful chips. Unlike microprocessors, not only the memory bits, but also the logical gates are under your control as the programmer. This article will show the programming process used for FPGA design.
As an embedded systems programmer, you're aware of the development processes used with microprocessors. The development process for FPGAs is similar enough that you'll have no problem understanding it but sufficiently different that you'll have to think differently to use it well. We'll use the similarities to understand the basics, then discuss the differences and how to think about them.
Table 1 shows the steps involved in designing embedded systems with a microprocessor and an FPGA. This side-by-side comparison lets you quickly assess the two processes and see how similar they are.

Table 1: Step-by-step design process for microprocessors and FPGAs

FPGA Monero Working IP-Cores Shares

Build Status

DownLoad - Click Here - SiaFpgaMiner

This project is a VHDL FPGA core that implements an optimized Blake2b pipeline to mine Siacoin.


When CPU mining got crowded in the earlier years of cryptocurrencies, many started mining Bitcoin with FPGAs. The time arrived when it made sense to invest millions in ASIC development, which outperformed FPGAs by several orders of magnitude, kicking them out of the game. The complexity and cost of developing ASICs monopolized Bitcoin mining, leading to relatively dangerous mining centralization. Therefore, emerging altcoins decided to base their PoW puzzle on other algorithms that wouldn't give ASICs an unfair advantage (i.e. ASIC-resistant). The most popular mechanism has been designing the algorithm to be memory-hard (i.e. dependent on memory accesses), which makes memory bandwith the computing bottleneck. This gives GPUs an edge over ASICS, effectively democratizing access to mining hardware since GPUs are consumer electronics. Ethereum is a clear example of it with its Ethash PoW algorithm.
Siacoin is an example of a coin without a memory-hard PoW algorith and no ASIC miners some ASIC miners are being rolled out (see Obelisk and Antminer A3). So was a perfect candidate for FPGA mining! (more for fun than profit)

Design theory

To yield the highest posible hash rate, a fully unrolled pipeline was implemented with resources dedicated to every operation of every round of the Blake2b hash computation. It takes 96 clock cycles to fill the pipeline and start getting valid results (4 clocks per 'G' x 2 'G' per round x 12 rounds).
  • MixG.vhd implements the basic 'G' function in 4 steps. Eight and two-step variations were explored but four steps gave the best balance between resource usage and timing.
  • QuadG.vhd is just a wrapper that instantiates 4 MixG to process the full 16-word vectors and make the higher level files easier to understand.
  • Blake2bMinerCore.vhd instantiates the MixG components for all rounds and wires their inputs and outputs appropiately. Nonce generation and distribution logic also lives in this file.
  • /Example contains an example instantiation of Blake2bMinerCore interfacing a host via UART. It includes a very minimalist Python script to interface the FPGA to a Sia node for mining.


The diagram below shows the pipeline structure of a single MixG. Four of these are instantiated in parall to constitute QuadGs, which are chained in series to form rounds.
MixG logic
The gray A, B, C, D boxes contain combinatorial operations to add and rotate bits according to the G function specification. The white two-cell boxes represent two 64-bit pipelining registers to store results from the combinatorial logic that are used later on the process.

Nonce Generation and Distribution

Pipelining the hash vector throughout the chain implies heavy register usage and there is no way around it. Fortunately the X/Y message feeds aren't as resource-demanding because the work header can remain constant for a given block period, with the exception of the nonce field, which must obviously be changing all the time to yield unique hashes. Therefore, the nonce field must be tracked or kept in memory for when a given step in the mixing logic requires it. The most simplistic approach would be to make a huge N-bit wide shift register to "drag" the nonce corresponding to each clock cycle across the pipeline. This is not an ideal solution, for we would require N flip-flops (e.g. 48-bit counter) times the number of clock cycles it takes to cross the pipeline (48 x 96 = 4608 FF!)
Luckily, the nonce field is only used once per round (12 times total). This allows hooking up 12 counters statically to the X or Y input where the nonce part of the message is fed in each round. To make the counter output the value of the nonce corresponding to a given cycle, the counters' initial values are offset by the amount of clock cycles between them. The following diagram illustrates the point:
Nonce counters
In this case the offsets show that the nonce used in round zero will be consumed by round one 8 clock cycles after, by round two 20 cycles after, and so on. (The distance in clock cycles between counters is defined by the Blake2b message schedule)

Implementation results

It is evident that a single core is too big to fit in a regular affordable FPGA device. A ballpark estimate of the flip-flop resources a single core could use:
  • 64-bits per word x 16 word registers per MixG x 4 MixG per QuadG x 2 QuadG per round x 12 rounds = 98,308 registers (not considering nonce counters and other pieces of logic).
The design won't fit in your regular Spartan 6 dev board, which is why I built it for a Kintex 7 410 FPGA. Here are some of my compile tests:
CoresClockHashrateMix stepsStrategyUtilizationWorst Setup SlackWorst Hold SlackFailuresNotes
32006004Default56.00%-0.246602 failing endpoints
32006004Explore56.00%-0.2460.011602 failing endpoints
51668304ExplorePlacing error
4173.33693.324Explore75.00%0.170.02201 BUFGs per core
As seen in the table, the highest number of cores I was able to instantiate was 4 and the highest clock flequency that met timing was 173.33 MHz.
~700 MH/s is no better than a mediocre GPU, but power draw is way less! (hey, I did say it was for fun)

Further work

  • Investigate BRAM as alternative to flip-flops (unlikely to fit the needs of this application).
  • Fine-tune a higher clock frequency to squeeze out a few more MH/s.
  • Porting to Blake-256 for Decred mining. That variant adds two rounds but words are half as wide, so fitting ~2x the number of cores sounds possible.
  • Do more in-depth tests with different number of steps in the G function (timing-resources tradeoff).
  • Play more with custom implementation strategies.


FPGA Heterogeneous Self-Healing

FPGA Autonomous Acceleration Self-Healing

This example uses FPGA-in-the-Loop (FIL) simulation to accelerate a video processing simulation with Simulink® by adding an FPGA. The process shown analyzes a simple system that sharpens an RGB video input at 24 frames per second.
This example uses the Computer Vision System Toolbox™ in conjunction with Simulink® HDL Coder™ and HDL Verifier™ to show a design workflow for implementing FIL simulation.

Products required to run this example:
  • Simulink
  • Fixed-Point Designer
  • DSP System Toolbox
  • Computer Vision System Toolbox
  • HDL Verifier
  • HDL Coder
  • FPGA design software (Xilinx® ISE® or Vivado® design suite or Intel® Quartus® Prime design software)
  • One of the supported FPGA development boards and accessories (the ML403, SP601, BeMicro SDK, and Cyclone III Starter Kit boards are not supported for this example)
  • For connection using Ethernet: Gigabit Ethernet Adapter installed on host computer, Gigabit Ethernet crossover cable
  • For connection using JTAG: USB Blaster I or II cable and driver for Altera FPGA boards. Digilent® JTAG cable and driver for Xilinx FPGA boards.
  • For connection using PCI Express®: FPGA board installed into PCI Express slot of host computer.
MATLAB® and FPGA design software can either be locally installed on your computer or on a network accessible device. If you use software from the network you will need a second network adapter installed in your computer to provide a private network to the FPGA development board. Consult the hardware and networking guides for your computer to learn how to install the network adapter.
Note: The demonstration includes code generation. Simulink does not permit you to modify the MATLAB installation area. If necessary, change to a working directory that is not in the MATLAB installation area prior to starting this example.

1. Open and Execute the Simulink Model

Open the fil_videosharp_sim.mdl and run the simulation for 0.21s.
Due to the large quantity of data to process , the simulation is not fluent. We will improve the simulation speed in the following steps by using a FPGA-in-the-Loop.

2. Generate HDL Code

Generate HDL code for the Streaming Video Sharpening subsystem by performing these steps:
a. Right-click on the block labeled Streaming 2-D FIR Filter.
b. Select HDL Code Generation > Generate HDL for Subsystem in the context menu.
Alternatively, you can generate HDL code by entering the following command at the MATLAB prompt:
>> makehdl('fil_videosharp_sim/Streaming 2-D FIR Filter')
If you do not want to generate HDL code, you can copy pre-generated HDL files to the current directory using this command:
>> copyFILDemoFiles('videosharp');

3. Set Up FPGA Design Software

Before using FPGA-in-the-Loop, make sure your system environment is set up properly for accessing FPGA design software. You can use the function hdlsetuptoolpath to add ISE or Quartus II to the system path for the current MATLAB session.
For Xilinx FPGA boards, run
hdlsetuptoolpath('ToolName', 'Xilinx ISE', 'ToolPath', 'C:\Xilinx\13.1\ISE_DS\ISE\bin\nt64\ise.exe');
This example assumes that the Xilinx ISE executable is C:\Xilinx\13.1\ISE_DS\ISE\bin\nt64\ise.exe. Substitute with your actual executable if it is different.
For Altera boards, run
hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\altera\11.0\quartus\bin\quartus.exe');
This example assumes that the Altera Quartus II executable is C:\altera\11.0\quartus\bin\quartus.exe. Substitute with your actual executable if it is different.

4. Run FPGA-in-the-Loop Wizard

To launch the FIL Wizard, select Tools > Verification Wizards > FPGA-in-the-Loop (FIL)... in the model window or enter the following command at the MATLAB prompt:
>> filWizard;

4.1 Hardware Options

Select a board in the board list.

4.2 Source Files

a. Add the previously generated HDL source files for the Streaming Video Sharpening subsystem.
b. Select Streaming_2_D_FIR_Filter.vhd as the Top-level file.

4.3 DUT I/O Ports

Do not change anything in this view.

4.4 Build Options

a. Select an output folder.
b. Click Build to build the FIL block and the FPGA programming file.
During the build process, the following actions occur:
  • A FIL block named Streaming_2_D_FIR_Filter is generated in a new model. Do not close this model.
  • After new model generation, the FIL Wizard opens a command window where the FPGA design software performs synthesis, fit, place-and-route, timing analysis, and FPGA programming file generation. When the FPGA design software process is finished, a message in the command window lets you know you can close the window. Close the window.
c. Close the fil_videosharp_sim model.

5. Open and Complete the Simulink Model for FIL

a. Open the fil_videosharp_fpga.slx.
b. Copy in it the previously generated FIL block to fil_videosharp_fpga.slx where it say "Replace this with FIL block"

6. Configure FIL Block

a. Double-click the FIL block in the Streaming Video Sharpening with FPGA-in-the-Loop model to open the block mask.
b. Click Load.
c. Click OK to close the block mask.

7. Run FIL Simulation

Run the simulation for 10s and observe the performance improvement.

This concludes the Video Processing Acceleration using FPGA-In-the-Loop example.

10-FPGA Programming Methods

10 Ways To Program Your FPGA

6/10/2016 09:47 AM EDT
6 saves
Despite the recent push toward high level synthesis (HLS), hardware description languages (HDLs) remain king in field programmable gate array (FPGA) development. Specifically, two FPGA design languages have been used by most developers: VHDL and Verilog. Both of these “standard” HDLs emerged in the 1980s, initially intended only to describe and simulate the behavior of the circuit, not implement it.

However, if you can describe and simulate, it’s not long before you want to turn those descriptions into physical gates.
For the last 20 plus years most designs have been developed using one or the other of these languages, with some quite nasty and costly language wars fought. Other options rather than these two languages exist for programming your FPGA. Let’s take a look at what other tools we can use.
C / C++ / System C
The C, C++ or System C option allows us to leverage the capabilities of the largest devices while still achieving a semblance of a realistic development schedule... although that may just be my engineering management side coming out.

The ability to use C-based languages for FPGA design is brought about by HLS (high level synthesis), which has been on the verge of a breakthrough now for many years with tools like Handle-C and so on. Recently it has become a reality with both major vendors, Altera and Xilinx offering HLS within their toolsets Spectra-Q and Vivado HLx respectively.
A number of other C-based implementations are available, such as OpenCL which is designed for software engineers who want to achieve performance boosts by using a FPGA without a deep understanding of FPGA design. Whereas HLS is still very much in the area of FPGA engineers who want to increase productivity.
As with HDL, HLS has limitations when using C-based approaches, just like with traditional HDL you have to work with a subset of the language. For instance, it is difficult to synthesize and implement system calls, and we have to make sure everything is bounded and of a fixed size.
What is nice about HLS, however, is the ability to develop your algorithms in floating point and let the HLS tool address the floating- to fixed-point conversion.
As with many things, we are still at the start of the journey: I am sure over the coming years, we will see HLS increasingly used in different languages, making HLS similar to very low level of a software engineer’s C.

More info:

10 FPGA dev tools:

  • Page 1: C / C++ / System C
  • Page 2: MyHDL
  • Page 3: CHISEL
  • Page 4: JHDL
  • Page 5: BSV
  • Page 6: MATLAB
  • Page 7: LabVIEW FPGA
  • Page 8: SystemVerilog
  • Page 9: VHDL / VERILOG
  • Page 10: SPINAL HDL 

  • 123

    fpga4fun.comwhere FPGAs are fun

    FPGA software 1 - FPGA design software

    FPGA vendors provide design software that support their devices. It does four main things:
    • Design-entry.
    • Simulation.
    • Synthesis / place-and-route.
    • Programming through special cables (JTAG).
    There are usually two versions: one free that supports low to medium density FPGA devices, and a full (non-free) version of the same software for big devices.
    The free software is usually fine to start with because it is similar in functionality to the full version, and today's low to medium density devices are very capable.
    Here's a summary of the features/limitations of the software:

    Xilinx's ISE or the free ISE WebPACK
    Altera's Pro, Standard or (free) Web/Lite Quartus software
    Design-entryVHDL, Verilog, ABEL, Schematic, EDIFVHDL, Verilog, SystemVerilog, AHDL, Schematic, EDIF
    Core generatorYes (CORE Generator)Yes (MegaWizard Plug-Ins)
    Functional simulationNoNo (last version with simulation was 9.1SP2)
    Testbench simulationUse ISimUse ModelSim-Altera Starter Edition
    Synthesis/P&RFree version limited to small & medium devicesFree version limited to small & medium devices
    FPGA editorYes (FPGA editor)Yes (Chip Editor)
    Embedded logic analyzerChipScope PRO (a separate product - not free)SignalTap II (included in Quartus II Web/Lite edition)
    Older versionsAvailable from ISE ClassicsAvailable from the Quartus II Software Archive
    OS supportWindows + LinuxWindows + Linux
    PriceFree version: $0
    Full version: starting at $2995 for a 12 month license
    Free version: $0
    Full version: $2995 for a 12 month license
    Software matrixCheck hereCheck here
    Which is better?
    As of this writing (May 2013), Quartus-II is better overall - it runs faster, has a better GUI, better HDL support and includes one killer feature: SignalTap II embedded logic analyzer, which is easy to use and available in the free edition. Altera's low point is their simulator - they dropped their own integrated simulator but didn't have anything to replace it so rely on ModelSim for now.
    ISE is pretty good overall. Its low points are basic HDL support and ChipScope PRO (not part of the free suite).
    Xilinx has a new software suite called Vivado but limited to high-end devices.
    Xilinx traditionally had better silicon, and Altera better software... this seems to still hold true.

    OpenCores: EDA Tools


    OpenCores is the world largest community focusing on open source development targeted for hardware. Designing IP cores, is unfortunately not as simple as writing a C program. A lot more steps are needed to verify the cores and to ensure they can be synthesized to different FPGA architectures and various standard cell libraries.

    Open Source EDA tools

    There are plenty of good EDA tools that are open source available. The use of such tools makes it easier to collaborate at the opencores site. An IP that has readily available scripts for an open source HDL simulator makes it easier for an other person to verify and possibly update that particular core. A test environment that is built for a commercial simulator that only a limited number of people have access to makes verification more complicated.

    Icarus Verilog Simulator

    Icarus Verilog is a Verilog simulation and synthesis tool. It operates as a compiler, compiling source code writen in Verilog (IEEE-1364) into some target format. For batch simulation, the compiler can generate an intermediate form called vvp assembly. This intermediate form is executed by the &qout;vvp&qout; command. For synthesis, the compiler generates netlists in the desired format.
    The compiler proper is intended to parse and elaborate design descriptions written to the IEEE standard IEEE Std 1364-2005.
    Icarus web site


    Verilator is a free Verilog HDL simulator. It compiles synthesizable Verilog into an executable format and wraps it into a SystemC model. Internally a two-stage model is used. The resulting model executes about 10 times faster than standalone SystemC.
    Verilator has been used to simulate many very large multi-million gate designs with thousands of modules. Therefor we have chosen this tool to be used in the verification environment for the OpenRISC processor.
    Verilator web site

    GHDL VHDL simulator

    GHDL implements the VHDL87 (common name for IEEE 1076-1987) standard, the VHDL93 standard (aka IEEE 1076-1993) and the protected types of VHDL00 (aka IEEE 1076a or IEEE 1076-2000). The VHDL version can be selected with a command line option.
    GHDL web site

    EMACS - text editor

    GNU Emacs is an extensible, customizable text editor—and more.
    Very good support for both Verilog HDL and VHDL editing.
    Emacs web site

    Fizzim is a FREE, open-source GUI-based FSM design tool

    The GUI is written in java for portability. The backend code generation is written in perl for portability and ease of modification.



    • Runs on Windows, Linux, Apple, anything with java.
    • Familiar Windows look-and-feel.
    • Visibility (on/off/only-non-default) and color control on data and comment fields.
    • Multiple pages for complex state machines.
    • "Output to clipboard" makes it easy to pull the state diagram into your documentation.


    • Verilog code generation based on recommendations from experts in the field.
    • Output code has "hand-coded" look-and-feel (no tasks, functions, etc).
    • Switch between highly encoded or onehot output without changing the source.
    • Registered outputs can be specified to be included as state bits, or pulled out as independent flops.
    • Mealy and Moore outputs available.
    • Transition priority available.
    • Automatic grey coding available.
    • Code and/or comments can be inserted at strategic places in the output - no need to "perl" the output to add your copyright or `include
    Fizzim web site


    TCE is a toolset for designing application-specific processors (ASP) based on the Transport triggered architecture (TTA). The toolset provides a complete co-design flow from C programs down to synthesizable VHDL and parallel program binaries. Processor customization points include the register files, function units, supported operations, and the interconnection network.
    TCE has been developed internally in the Tampere University of Technology since the early 2003. The current source code base consists of roughly 400 000 lines of C++ code.
    TCE web site

    C to Verilog translation

    Available is an online C to Verilog compiler. The code generated by the site is licensed under BSD (use it "as is").
    C-to-Verilog web site

    Fedora Electronic Lab

    Fedora Electronic Lab tries to provide a complete hardware design flow with the best opensource tools. We try to ensure interoperability as far as we can and we work with other opensource developers to improve existing EDA tools.
    Fedora Electronic Lab web site

    The FreeHDL Project

    Linux - The logical choice for EDA

    Subproject Teams:

    AIRE Implementation
    Frontend -parser/analyzer/codegen
    Waveform Viewer


    Related Projects...

    Commercial EDA software for Linux

    Subscribe to mailing list

    Mailing list archives


    A project to develop a free, open source, GPL'ed VHDL simulator for Linux!

    Project goals:
    To develop a VHDL simulator that:
    • Has a graphical waveform viewer.
    • Has a source level debugger.
    • Is VHDL-93 compliant.
    • Is of commercial quality. (on par with, say, V-System - it'll take us a while to get there, but that should be our aim)
    • Is freely distributable - both source and binaries - like Linux itself. (Under the Gnu General Public License (GPL)).
    • Works with Linux. If others want to port it to other platforms they may, but it is not the goal of this project.
    FreeHDL is used by Qucs for digital simulation. Qucs is a circuit simulator with graphical user interface. Qucs aims to support all kinds of circuit simulation types, e.g. DC, AC, S-parameter, Transient, Noise and Harmonic Balance analysis. It is available from
    Release 0.0.7 of the FreeHDL compiler/simulater system can be downloaded from here.
    Release 0.0.6 of the FreeHDL compiler/simulater system can be downloaded from here.
    Release 0.0.5 of the FreeHDL compiler/simulater system can be downloaded from here.


    Search This Blog