CryptoURANUS Economics: July 2019

CryptoCurrencies


Tuesday, July 30, 2019

Xilinx ISE-LabTools Tutorial


Xilinx Tools Tutorial

Introduction

The Xilinx Integrated Software Environment (ISE) is a powerful and complex set of tools. The purpose of this guide is to help new users get started using ISE to compile their designs. This guide provides a very high-level overview of how the tools work, and takes the reader through the process of compiling. The ultimate reference to ISE is of course the official documentation, which is installed on every PC in the lab, and is available from the Xilinx website. Because the documentation is so voluminous, this guide will attempt to provide some help with finding the right sections of the documentation to read. Don't miss the required reading section at the end of this guide which points out some sections of the documentation that every 6.111 student should read before beginning a complex labkit project.

From HDL to FPGA

The process of converting hardware description language (HDL) files into a configuration bitstream, which can be used to program the FPGA, is done in several steps.
First, the HDL files are synthesized. Synthesis is the process of converting behavioral HDL descriptions into a network of logic gates. The synthesis engine takes as input the HDL design files and a library of primitives. Primitives are not necessarily just simple logic gates like AND and OR gates and D-registers, but can also include more complicated things such as shift registers and arithmetic units. Primitives also include specialized circuits such as DLLs that cannot be inferred by behavioral HDL code and must be explicitly instantiated. The libraries guide in the Xilinx documentation provides a complete description of every primitive available in the Xilinx library. (Note that, while there are occasions when it is helpful or even necessary to explicitly instantiate primitives, it is much better design practice to write behavioral code whenever possible.)
In 6.111, we will be using the Xilinx-supplied synthesis engine known as XST. XST takes as input a Verilog (.v) file and generates a .ngc file. A synthesis report file (.srp) is also generated, which describes the logic inferred for each part of the HDL file and often includes helpful warning messages.
The .ngc file is then converted to an .ngd file. (This step mostly seems to be necessary to accommodate different design entry methods, such as third-party synthesis tools or direct schematic entry. Whatever the design entry method, the result is an .ngd file.)
The .ngd file is essentially a netlist of primitive gates, which could be implemented on any one of a number of types of FPGA devices Xilinx manufactures. The next step is to map the primitives onto the types of resources (logic cells, i/o cells, etc.) available in the specific FPGA being targeted. The output of the Xilinx map tool is an .ncd file.
The design is then placed and routed, meaning that the resources described in the .ncd file are assigned specific locations on the FPGA, and the connections between the resources are mapped into the FPGA's interconnect network. The delays associated with interconnect on a large FPGA can be quite significant, so the place and route process has a large impact on the speed of the design. The place and route engine attempts to honor timing constraints that have been added to the design, but if the constraints are too tight, the engine will give up and generate an implementation that is functional, but not capable of operating as fast as desired. Be careful not to assume that a design will operate at the desired clock rate just because it was successfully placed and routed.
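Comparing the clock period with the worst-case path delay shows why a routed design can still be too slow. The sketch below assumes the labkit's 27 MHz clock. The two path delays are hypothetical values of the kind a timing report would give, not measurements.

```python
def clock_period_ns(freq_hz: float) -> float:
    """Clock period in nanoseconds for a clock frequency given in hertz."""
    return 1e9 / freq_hz


def slack_ns(freq_hz: float, worst_path_ns: float) -> float:
    """Timing slack: positive means the slowest path fits in one clock
    period; negative means the routed design is too slow for this clock."""
    return clock_period_ns(freq_hz) - worst_path_ns


def max_freq_mhz(worst_path_ns: float) -> float:
    """Highest clock frequency (MHz) that the worst-case path allows."""
    return 1e3 / worst_path_ns


if __name__ == "__main__":
    f = 27e6  # the labkit's 27 MHz clock
    print(f"period = {clock_period_ns(f):.3f} ns")  # period = 37.037 ns
    for delay in (12.5, 41.0):  # hypothetical worst-case path delays in ns
        s = slack_ns(f, delay)
        verdict = "meets timing" if s >= 0 else "too slow"
        print(f"{delay} ns path: slack {s:+.3f} ns ({verdict}), "
              f"fmax {max_freq_mhz(delay):.1f} MHz")
```

A design with negative slack can still be routed successfully. It simply will not work reliably at the requested clock rate.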
The output of the place and route engine is an updated .ncd file, which contains all the information necessary to implement the design on the chosen FPGA. All that remains is to translate the .ncd file into a configuration bitstream in the format recognized by the FPGA programming tools. Then the programmer is used to download the design into the FPGA, or write the appropriate files to a compact flash card, which is then used to configure the FPGA.
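The flow above can be summarized as a chain of file types. The sketch below only records which step turns which file into which; it does not run anything.

Two details are assumptions rather than statements from this guide:
- The tool names (`xst`, `ngdbuild`, `map`, `par`, `bitgen`) are, as far as we know, the ISE command-line programs behind the GUI steps.
- `.bit` is the usual bitstream extension; the text above does not name it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    step: str        # step as described in the text
    tool: str        # ISE command-line program assumed to perform it
    input_ext: str
    output_ext: str


# Order follows the text: synthesize, translate, map, place and route, bitstream.
FLOW = [
    Stage("synthesize", "xst", ".v", ".ngc"),
    Stage("translate", "ngdbuild", ".ngc", ".ngd"),
    Stage("map", "map", ".ngd", ".ncd"),
    Stage("place and route", "par", ".ncd", ".ncd"),  # produces an updated .ncd
    Stage("generate bitstream", "bitgen", ".ncd", ".bit"),
]


def file_chain(design: str) -> list[tuple[str, str]]:
    """Return (step, file) pairs from the Verilog source to the bitstream."""
    for prev, nxt in zip(FLOW, FLOW[1:]):
        # each stage must consume the file type the previous stage produced
        if prev.output_ext != nxt.input_ext:
            raise ValueError(f"{prev.step} -> {nxt.step}: extension mismatch")
    chain = [("source", design + FLOW[0].input_ext)]
    chain += [(s.step, design + s.output_ext) for s in FLOW]
    return chain


if __name__ == "__main__":
    for step, name in file_chain("labkit"):
        print(f"{step:>20}: {name}")
```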

Constraints

By itself, a Verilog model seldom captures all of the important attributes of a complete design. Details such as i/o pin mappings and timing constraints can't be expressed in Verilog, but are nonetheless important considerations when implementing the model on real hardware. The Xilinx tools allow these constraints to be defined in several places, the two most notable being a separate "user constraints file" (.ucf) and special comments within the Verilog model.
A .ucf file is simply a list of constraints, such as
net "ram0_data<35>" loc="ab25";
which indicates that bit 35 of the signal ram0_data (which should be a port in the top-level Verilog module) should be assigned to pin AB25 on the FPGA. Sometimes it is useful to combine several related constraints on one line, using "|" characters to separate constraints.
net "ram0_data<35>" loc="ab25" | fast | iostandard=lvdci_33 | drive=12;
The above example again assigns bit 35 of the signal ram0_data to pin AB25, and also specifies that the i/o driver should be configured for fast slew rate, 3.3V LVTTL level signaling (with a built-in series termination resistor), and a drive strength of 12mA. See the Xilinx documentation for more details, but don't worry: all of the pin constraints have been written for you (more on this later). Constraints can also be specified within special comments in the Verilog code. For example,
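Because every pin constraint follows the same pattern, constraint lines are easy to generate. The helper below is a hypothetical sketch, not a Xilinx tool. It only reproduces the `net ... loc=... | ...;` syntax shown above. The pin names passed to `ucf_bus` in the usage lines are made up; the real ones are in labkit.ucf.

```python
def ucf_net(net: str, loc: str, *flags: str, **attrs: object) -> str:
    """Build one UCF line in the style shown above, joining constraints with '|':
    net "<name>" loc="<pin>" | <flag> | <key>=<value> ... ;
    """
    parts = [f'net "{net}" loc="{loc}"']
    parts.extend(flags)                                   # bare constraints, e.g. fast
    parts.extend(f"{k}={v}" for k, v in attrs.items())   # key=value constraints
    return " | ".join(parts) + ";"


def ucf_bus(bus: str, pins: list[str], *flags: str, **attrs: object) -> list[str]:
    """One line per bit of a bus; pins[i] is the FPGA pin for bit i."""
    return [ucf_net(f"{bus}<{i}>", pin, *flags, **attrs) for i, pin in enumerate(pins)]


if __name__ == "__main__":
    # reproduces the combined example from the text
    print(ucf_net("ram0_data<35>", "ab25", "fast", iostandard="lvdci_33", drive=12))
    # hypothetical pins, for illustration only
    for line in ucf_bus("led", ["a1", "b2"]):
        print(line)
```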
reg [7:0] state;
// synthesis attribute init of state is "03";
The Xilinx synthesis engine will identify the phrase "synthesis attribute" within any comments, and will add the constraint following this phrase to the list of constraints loaded from a .ucf file. In the above example, the initial state of the state signal after the FPGA finishes configuring itself will be set to 8'h03. In general, it is bad form to specify the initial state of signals using constraints, rather than implementing a reset signal. In some advanced designs, however, such initializations are sometimes necessary. The tools recognize a huge variety of constraints, and an entire section of the online manual is dedicated to explaining them. Fortunately, understanding a few simple constraints is sufficient for most designs.
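The comment constraint has a fixed shape: attribute name, signal name and quoted value. The sketch below is a rough illustration of that structure, not XST's actual parser. It assumes the value is always quoted, as in the example above. It also assumes the signal width is taken from the `reg [7:0]` declaration.

```python
import re

# Comments of the form shown above:
#   // synthesis attribute <attribute> of <signal> is "<value>";
ATTR_RE = re.compile(
    r'//\s*synthesis\s+attribute\s+(\w+)\s+of\s+(\w+)\s+is\s+"([^"]*)"'
)


def find_attributes(verilog: str) -> list[tuple[str, str, str]]:
    """Return (attribute, signal, value) for every synthesis-attribute comment."""
    return ATTR_RE.findall(verilog)


def init_literal(width: int, hex_value: str) -> str:
    """Render a hex init value as a sized Verilog literal, e.g. 8'h03."""
    value = int(hex_value, 16)
    if value >= 1 << width:
        raise ValueError(f"{hex_value} does not fit in {width} bits")
    digits = (width + 3) // 4
    return f"{width}'h{value:0{digits}X}"


if __name__ == "__main__":
    src = 'reg [7:0] state;\n// synthesis attribute init of state is "03";\n'
    for attr, signal, value in find_attributes(src):
        print(attr, signal, init_literal(8, value))  # init state 8'h03
```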

ISE and the 6.111 Labkit

The FPGA used in the labkit has 684 i/o pins, and most of them are actually being used. To simplify the process of adding pin constraints to a new design, two template files have been developed. The file labkit.v is a template top-level Verilog module. This module defines names for all of the signals going in or out of the FPGA. Additionally, it provides default assignments for all FPGA outputs, so that unused circuits on the labkit are disabled. A template constraints file, labkit.ucf, ties the signals in labkit.v to the appropriate physical FPGA pins.

A Tutorial

Download the following source files:

Create a New Project

  1. Select "New Project" from the "File" menu
  2. Enter a project name and choose a project location. Be sure the top-level module type is set to HDL (hardware description language).
  3. On the next form, enter the target device information. The FPGA on the labkits is a Virtex-II family XC2V6000, speed grade 4, in a BF957 package.
  4. The next form allows you to create a new source file. If you have already written the Verilog code for your project, click "Next" to advance to the next form, which allows you to add existing source files. You can also add existing files to your project later, after the project has been created.
  5. After creating or adding any source files, click "Next" until you reach the project summary form, and then click "Finish" to create the project.

Add Sources and Constraints

  1. If you did not add all of your Verilog source files when creating the project, add them now by choosing "Add Source..." from the "Project" menu.

Modify the Top-Level Template

  1. To instantiate the counter module, add the following code to labkit.v, just before the endmodule statement.
    counter counter1 (.clock(clock_27mhz), .led(led));
    
  2. The stock labkit.v file assigns a default value to the LED outputs. In order to drive the LEDs with the counter module, we need to delete the following line, which can be found near the end of labkit.v.
    assign led = 8'hFF; // Turn off all LEDs
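The counter module comes from the source files listed above and is not reproduced here. Suppose it is a plain free-running binary counter whose top eight bits drive `led`; that is an assumption. Then the arithmetic below shows how wide it must be for the LEDs to change at a visible rate when clocked by `clock_27mhz`.

```python
import math

CLOCK_HZ = 27_000_000  # clock_27mhz on the labkit


def bit_frequency_hz(clock_hz: float, bit: int) -> float:
    """Bit `bit` of a free-running binary counter toggles every 2**bit cycles,
    so it is a square wave at clock_hz / 2**(bit + 1)."""
    return clock_hz / 2 ** (bit + 1)


def counter_width_for(clock_hz: float, target_hz: float, leds: int = 8) -> int:
    """Smallest counter width W whose top `leds` bits (W-leds .. W-1) drive
    the LEDs with the fastest one blinking at no more than target_hz."""
    # need clock_hz / 2**(bit + 1) <= target_hz, i.e. bit >= log2(clock/target) - 1
    lowest_bit = max(0, math.ceil(math.log2(clock_hz / target_hz) - 1))
    return lowest_bit + leds


if __name__ == "__main__":
    width = counter_width_for(CLOCK_HZ, 1.0)
    low = width - 8
    print(width, f"{bit_frequency_hz(CLOCK_HZ, low):.3f} Hz")  # 32 0.805 Hz
```

With a 1 Hz target, the fastest LED sits on bit 24 of a 32-bit counter. If the LEDs were wired to the low bits instead, they would toggle at megahertz rates and simply appear dimly lit.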
    

Compile the Design

  1. The labkit.ucf file includes constraints for all of the FPGA outputs defined in labkit.v. Most designs, however, will not use all of these signals, in which case the synthesis engine will optimize the unused signals out of the design. It is therefore necessary to tell the place-and-route tool not to generate an error if it encounters a pin constraint for a (now) nonexistent signal. To do this, right-click on the "Implement Design" item in the process pane, and select "Properties...". Check the box to "allow unmatched LOC constraints".
  2. Make sure the top-level labkit module is selected in the sources window.
  3. Double-click "Generate Programming File" in the tasks window. This will cause the "Synthesize", "Implement Design" and "Generate Programming File" tasks to be run in order. A green check mark will appear beside each task when it is successfully completed. It will take approximately 5 minutes for everything to complete. (The tools seem to hang for a few minutes while generating the pad report, but this is normal.)
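To see which pin constraints become "unmatched" once synthesis trims unused signals, compare the net names in labkit.ucf with the signals the design actually uses. The helper below is hypothetical. It only understands simple `net "name<bit>" loc=...` lines like those shown earlier, and the set of used signals has to be supplied by hand.

```python
import re

# Matches lines such as:  net "ram0_data<35>" loc="ab25";
NET_RE = re.compile(r'^\s*net\s+"([A-Za-z_]\w*)(?:<\d+>)?"', re.IGNORECASE)


def ucf_signals(ucf_text: str) -> set[str]:
    """Base signal names (bus index removed) that carry a net constraint."""
    names = set()
    for line in ucf_text.splitlines():
        line = line.split("#", 1)[0]  # drop UCF comments, which start with '#'
        match = NET_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def unmatched(ucf_text: str, used: set[str]) -> list[str]:
    """Constrained signals that no longer exist after synthesis trimmed them."""
    return sorted(ucf_signals(ucf_text) - used)


if __name__ == "__main__":
    ucf = '''net "led<0>" loc="a1";
net "led<1>" loc="b2";
# hypothetical pins, for illustration only
net "ram0_data<35>" loc="ab25" | fast;
'''
    print(unmatched(ucf, {"clock_27mhz", "led"}))  # ['ram0_data']
```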

Move the Design to a Compact Flash Card

  1. In the process pane, double-click on "Generate PROM, ACE or JTAG file". (You may have to expand the "Generate Programming File" line to see this item.)
  2. The iMPACT programming tool will launch, and a wizard will ask you several questions about what you want to do. You want to generate a SystemACE file.
  3. When asked to choose between the CF and MPM versions of SystemACE, choose CF (Compact Flash). The operating mode is unimportant.
  4. It is not necessary to specify the size of the CF card.
  5. Choose a name and a location for the SystemACE file collection you are about to generate.
  6. SystemACE allows up to eight designs to be stored on a single CF card. A switch on the labkit selects which design will be loaded into the FPGA. Check the boxes for as many designs as you want to include on the CF card.
  7. iMPACT will ask you to add design files for each configuration. You can potentially add more than one design file for each of the eight configurations (if there were more than one FPGA on the board, for example), but for the labkits, only add one design per configuration.
  8. Ignore the warnings about changing the configuration clock.
  9. Once all of the configurations have been specified, choose "Finish" and generate the .ace files. This takes a minute or two.
  10. Right click anywhere in the upper pane of the iMPACT window (with the system diagram) and select "Copy to CompactFlash...". Select the collection you just generated and copy it to the CF card.
  11. Insert the CF card in the labkit. Make sure the configuration source is set to "JTAG/CF", and set the configuration selector switch to the appropriate number. Power on the labkit, and your configuration should load.

Sunday, July 28, 2019

Designing FPGA Tutorial




New Horizons



Welcome to Sven Andersson's blog

My name is Sven Andersson and I work as a consultant in embedded system design, implemented in ASIC and FPGA. In my spare time I write this blog and I hope it will inspire others to learn more about this fantastic field. I live in Stockholm, Sweden and have my own company...

 Content
New Horizons
What's new
Starting a blog
Writing a blog
Using an RSS reader

Zynq Design From Scratch
Started February 2014
1 Introduction
Changes and updates
2 Zynq-7000 All Programmable SoC
3 ZedBoard and other boards
4 Computer platform and VirtualBox
5 Installing Ubuntu
6 Fixing Ubuntu
7 Installing Vivado
8 Starting Vivado
9 Using Vivado
10 Lab 1. Create a Zynq project
11 Lab 1. Build a hardware platform
12 Lab 1. Create a software application
13 Lab 1. Connect to ZedBoard
14 Lab 1. Run a software application
15 Lab 1. Benchmarking ARM Cortex-A9
16 Lab 2. Adding a GPIO peripheral
17 Lab 2. Create a custom HDL module
18 Lab 2. Connect package pins and implement
19 Lab 2. Create a software application and configure the PL
20 Lab 2. Debugging a software application
21 Running Linux from SD card
22 Installing PetaLinux
23 Booting PetaLinux
24 Connect to ZedBoard via ethernet
25 Rebuilding the PetaLinux kernel image
26 Running a DHCP server on the host
27 Running a TFTP server on the host
28 PetaLinux boot via U-boot
29 PetaLinux application development
30 Fixing the host computer
31 Running NFS servers
32 VirtualBox seamless mode
33 Mounting guest file system using sshfs
34 PetaLinux. Setting up a web server
35 PetaLinux. Using cgi scripts
36 PetaLinux. Web enabled application
37 Convert from VirtualBox to VMware
38 Running Linaro Ubuntu on ZedBoard
39 Running Android on ZedBoard
40 Lab2. Booting from SD card and SPI flash
41 Lab2. PetaLinux board bringup
42 Lab2. Writing userspace IO device driver
43 Lab2. Hardware debugging
44 MicroZed quick start
45 Installing Vivado 2014.1
46 Lab3. Adding push buttons to our Zynq system
47 Lab3. Adding an interrupt service routine
48 Installing Ubuntu 14.04
49 Installing Vivado and Petalinux 2014.2
50 Using Vivado 2014.2
51 Upgrading to Ubuntu 14.04
52 Using Petalinux 2014.2
53 Booting from SD card and SPI flash
54 Booting Petalinux 2014.2 from SD card
55 Booting Petalinux 2014.2 from SPI flash
56 Installing Vivado 2014.3

Chipotle Verification System
Introduction

EE Times Retrospective Series
It all started more than 40 years ago
My first job as an electrical engineer
The Memory (R)evolution
The Microprocessor (R)evolution

Four soft-core processors
Started January 2012
Introduction
Table of contents
Leon3
MicroBlaze
OpenRISC 1200
Nios II

Using the Spartan-6 LX9 MicroBoard
Started August 2011
Introduction
Table of contents
Problems, fixes and solutions

FPGA Design From Scratch
Started December 2006
Introduction
Table of contents
Index
Acronyms and abbreviations

Actel FPGA design
Designing with an Actel FPGA. Part 1
Designing with an Actel FPGA. Part 2
Designing with an Actel FPGA. Part 3
Designing with an Actel FPGA. Part 4
Designing with an Actel FPGA. Part 5

CAD
A hardware designer's best friend
Zoo Design Platform

Linux
Installing Cobra Command Tool
A processor benchmark

Mac
Porting a Unix program to Mac OS X
Fixing a HyperTerminal in Mac OS X
A dream come true

Bicycling
Stockholm by bike

Running
The New York City Marathon

Skiing/Skating
Kittelfjall Lappland

Tour skating in Sweden and around the world
Top
Introduction
SSSK
Wild skating
Tour day
Safety equipment
A look at the equipment you need
Skate maintenance
Links
Books, photos, films and videos
Weather forecasts

Travel
38000 feet above sea level
A trip to Spain
Florida the sunshine state

Photo Albums
Seaside Florida
Ronda Spain
Sevilla Spain
Cordoba Spain
Alhambra Spain
Kittelfjäll Lapland
Landsort Art Walk
Skating on thin ice

Books
100 Power Tips for FPGA Designers

Favorites
Adventures in ASIC
ChipHit
Computer History Museum
DeepChip
Design & Reuse
Dilbert
d9 Tech Blog
EDA Cafe
EDA DesignLine
Eli's tech Blog
Embedded.com
EmbeddedRelated.com
FPGA Arcade
FPGA Blog
FPGA Central
FPGA CPU News
FPGA developer
FPGA Journal
FPGA World
Lesley Shannon Courses
Mac 2 Ubuntu
Programmable Logic DesignLine
OpenCores
Simplehelp
SOCcentral
World of ASIC



Friday, July 26, 2019

CryptoMining via FPGA










FPGA for Dummies & Experts Alike!


ASICs had overtaken GPU mining, but an alternative to ASIC mining was born.

The new wolfpack leader is the Field Programmable Gate Array, or FPGA for short, and it is taking over very fast.

The only issue is that the boards are getting harder to find with each passing month.

Disclaimer: this post is not sponsored by any company, nor does it contain any referral links.

Let’s look at why FPGAs are interesting for mining.

The Two Main Issues FPGAs Are Meant to Solve:

Cryptocurrencies are volatile and unstable in the current market (August 2018).

The cryptocurrency market has been jumping from Ethereum to Monero to Zcash, back and forth, depending on the volatility of coin profitability.

The ASIC strategy for storming the mining pools is to buy an ASIC miner and pray that it pays off in time. With GPU mining, the amount of coins you can mine is limited, and people find this unsatisfactory 75% of the time.

The ASIC's issue is that it offers zero flexibility: it can mine a single coin and no other type of cryptocurrency.

An ASIC is hard-wired to mine coins of only one algorithm type.



Highest End, Lowest Cost:

Ultra96 is an Arm-based, Xilinx Zynq UltraScale+ MPSoC development board based on the Linaro 96Boards specification. 

The 96Boards’ specifications are open and define a standard board layout for development platforms that can be used by software application, hardware device, kernel, and other system software developers. 

 

Ultra96 represents a unique position in the 96Boards community with a wide range of potential peripherals and acceleration engines in the programmable logic that is not available from other offerings. 

Ultra96 boots from the provided Delkin 16 GB MicroSD card, pre-loaded with PetaLinux. 

 

Engineers have options of connecting to Ultra96 through a Webserver using integrated wireless access point capability or to use the provided PetaLinux desktop environment which can be viewed on the integrated Mini DisplayPort video output. 

 

Multiple application examples and on-board development options are provided as examples. 

 


Ultra96 provides four user-controllable LEDs. 

 

Engineers may also interact with the board through the 96Boards-compatible low-speed and high-speed expansion connectors by adding peripheral accessories such as those included in Seeed Studio’s Grove Starter Kit for 96Boards. 

 

Micron LPDDR4 memory provides 2 GB of RAM in a 512M x 32 configuration. Wireless options include 802.11b/g/n Wi-Fi and Bluetooth 4.2 (provides both Bluetooth Classic and Low Energy (BLE)). 
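The 2 GB figure follows directly from the 512M x 32 organization: 512M words, each 32 bits (4 bytes) wide. A quick check, assuming "M" means 2**20:

```python
def capacity_bytes(words: int, width_bits: int) -> int:
    """Capacity in bytes of a memory organized as `words` x `width_bits`."""
    return words * width_bits // 8


if __name__ == "__main__":
    total = capacity_bytes(512 * 2**20, 32)  # "512M x 32", M taken as 2**20
    print(total, total / 2**30, "GiB")  # 2147483648 2.0 GiB
```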

 

UARTs are accessible on a header as well as through the expansion connector. JTAG is available through a header (external USB-JTAG required). I2C is available through the expansion connector. 

 

 Ultra96 provides one upstream (device) and two downstream (host) USB 3.0 connections. A USB 2.0 downstream (host) interface is provided on the high speed expansion bus. 

 

Two Microchip USB3320 USB 2.0 ULPI Transceivers and one Microchip USB5744 4-Port SS/HS USB Controller Hub are specified. 

 

 The integrated power supply generates all on-board voltages from an external 12V supply (available as an accessory).



What’s the Third Option? There is always another option$.

 

FPGAs are the hardware taking the market by storm. They are the new favorite in the cryptocurrency mining community.

FPGAs have been around since the mid-1980s; Xilinx introduced the first commercial FPGA, the XC2064, in 1985.

They are heavily used in science, vehicle modeling and even military deployment applications.

The first manufacturer of these devices is an American technology company called "Xilinx".

Years later, another American company, Altera (now owned by Intel), joined the industry and has been Xilinx's main competitor ever since.

FPGA circuit boards have been welcomed in many industries, and the demand for FPGA hardware is still booming.

In 2013, the market for FPGA circuit boards was $6.1 billion, and it was estimated to reach $21.3 billion by 2020.
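The two market figures for 2013 and 2020 imply a compound annual growth rate (CAGR). The check below uses only the numbers quoted in the text.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    rate = cagr(6.1, 21.3, 2020 - 2013)  # $ billions, 2013 -> 2020
    print(f"{rate:.1%} per year")  # 19.6% per year
```

In other words, the forecast assumes the market grows by roughly a fifth every year.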



Why FPGAs Have Not Been Used in Cryptocurrency Mining:

Ever since Bitcoin became popular, average people have tried to mine with FPGAs, but they failed because they did not have the programming skills needed to use the FPGA boards.

The only people who mined with FPGA boards were large mining exchanges, and they kept their FPGA boards a secret for years.

The first open-source FPGA Bitcoin miner was not released until May 20, 2011.




Juan Antonio Ernesto's Great Adventure:

A man named Juan Antonio Ernesto (his name was changed here to protect his immigration innocence), from Tijuana, Mexico, illegally migrated into Canada and was hired by Canada's Silicon Valley in Waterloo, Ontario.

After three years, Juan left Canada because of the bias he faced there; as Juan Antonio said, he had to leave for the U.S.

When Juan entered the U.S., he was immediately granted citizenship by the U.S. government because he enrolled in a U.S. college for free, and he earned his CS master's degree from NMT in Socorro, New Mexico, under his real name.

While he was in Socorro, he and other Mexican-Americans secretly ran a cryptocurrency mining software exchange named "Macho-Rio-Grande".

The Mexican trio made Xilinx FPGA cards workable and usable online for cryptocurrency mining.

The trio secretly made millions of dollars, and they were the only ones in the private or public sector aware of this technology.

When Juan returned with millions of dollars of wealth to share with his friends in Canada, he died of a gunshot wound inflicted by MS13 in Vancouver, Canada, on the Highway of Tears.

Juan's bank accounts and cryptocurrencies were transferred before his death, never to be found again.

Juan's friends ran to Mexico with the FPGA software technology and were also found dead months later, their accounts all transferred the same way.

All of these deaths were determined to be suicides, and the mystery of their deaths continues, as everyone suspects MS13 hackers.

Now, there are three reasons why FPGA boards never really made it to the masses until today, and the story above is one of them.

Reason #1 is the lack of plug-and-play flexibility and of software tailored to each specific architecture.

FPGA boards are not easy to program, although they can be programmed to mine cryptocurrency.

In order to use an FPGA board, you must have both hardware and software programming abilities.

A GPU works differently: the only changes you can make are tweaking the clock speed and choosing the mining software.

An FPGA board has to be programmed from scratch in order to mine cryptocurrency. The design is written in a hardware description language such as Verilog or VHDL, not in a software language such as Python or C++.

Only dedicated programmers are capable of managing this task from the first equation to the finished solution.

Reason #2 is the arrival of the first ASICs for mining cryptocurrencies. Unlike an FPGA, an ASIC is hard-coded, plug-and-play hardware that cannot be reprogrammed.

Anyone can use an ASIC miner box. There were alternatives to the ASIC miner box: computer programmers had the option of GPU rigs, and settled for mining fewer coins than an FPGA board is capable of.

ASIC miner boxes are dominating the mining pools, and PC graphics-card GPUs are now a less-used technology.

FPGAs are becoming the average miner's hardware these days.

There are several reasons FPGAs are way faster.

FPGA cards can be 3x to 100x more efficient than GPUs at the same power draw, saving hundred$.

Depending on the algorithm, FPGAs do not fall far behind ASIC miner boxes.
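The 3x to 100x efficiency figure is a ratio of hashes per watt. The small helper below makes the comparison explicit. The hashrates and wattages in the usage line are placeholders, not measurements. At equal power draw, the ratio reduces to the hashrate ratio.

```python
def efficiency(hashrate: float, watts: float) -> float:
    """Hashes per second delivered per watt of power drawn."""
    return hashrate / watts


def efficiency_ratio(fpga_hs: float, fpga_w: float, gpu_hs: float, gpu_w: float) -> float:
    """How many times more hashes per watt the FPGA delivers than the GPU."""
    return efficiency(fpga_hs, fpga_w) / efficiency(gpu_hs, gpu_w)


if __name__ == "__main__":
    # placeholder numbers, not measurements: equal 200 W draw, 3x the hashrate
    print(f"{efficiency_ratio(3.0, 200.0, 1.0, 200.0):.1f}x")
```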

Upsides of FPGA

+ Compatibility with all mineable currencies, provided you are fluent in Verilog or VHDL or have partnered with a programmer who can implement the mining algorithms. Soft forks do not disrupt mining operations as long as the programmer updates the FPGA bitstream.

+ Extreme power efficiency compared to CPUs and GPUs.

Downsides of FPGA

- FPGAs have to be plugged into working computers, just like GPUs.

- Xilinx Virtex FPGA boards are not widely available to the mainstream for now (there are some exceptions though, which is why this article exists; more on that below).

- They are quite pricey compared to GPUs.

- They can be slightly outperformed by ASICs, depending on the algorithm.

Bitstream:

The bitstream is the configuration file, compiled from a design written in a hardware description language such as Verilog or VHDL, that tells the FPGA what to do.

If you want to mine a specific algorithm, you must have a bitstream that tells the FPGA how to mine that specific algo.

Bitstreams are loaded into the FPGA once the system boots, and the FPGA holds its configuration in volatile memory. The board also carries DDR4 memory: the FPGA model people use for mining has 64GB of it.

This huge amount of RAM allows the FPGA to store hundreds of bitstreams and switch between those in fractions of a second.

This functionality allows an FPGA to mine algorithms such as Timetravel10, X11Evo, X16R and X16S that require the chip to switch between various “lesser” hashing algorithms every few minutes.

While the bitstream can be changed in a fraction of a second, the board can still mine only one algorithm at a time with a couple rare exceptions.
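X16R illustrates why fast switching matters. As X16R is commonly described, the order of its 16 "lesser" hash functions comes from the last 16 hex digits of the previous block hash, one digit per step. This is our understanding; the reference implementation's exact digit order may differ.

The simplified model below only computes that order. `x16r_order` is a hypothetical helper, not part of any miner, and the hash in the usage example is made up.

```python
X16R_ALGOS = [
    "blake", "bmw", "groestl", "jh", "keccak", "skein", "luffa", "cubehash",
    "shavite", "simd", "echo", "hamsi", "fugue", "shabal", "whirlpool", "sha512",
]


def x16r_order(prev_block_hash: str) -> list[str]:
    """Simplified model: each of the last 16 hex digits of the previous block
    hash selects one algorithm (digit value = index into X16R_ALGOS)."""
    digits = prev_block_hash.strip().lower()[-16:]
    if len(digits) != 16:
        raise ValueError("expected at least 16 hex digits")
    return [X16R_ALGOS[int(d, 16)] for d in digits]


if __name__ == "__main__":
    # made-up hash, for illustration only
    order = x16r_order("00" * 24 + "0123456789abcdef")
    print(order[:3], "...", order[-1])
    print(len(set(order)), "distinct algorithms needed this block")
```

Because the order changes from block to block, the hardware has to be ready for any of the 16 functions at short notice.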

Anyone can create bitstreams for existing mining algos and Zetheron (name of the company) will be collecting a fixed fee on behalf of the developers. This will ensure:

  • safety to the developers of bitstreams – they will get paid for their work and

  • no entry fee for FPGA owners – you pay only if the bitstream you have downloaded works

  • Plus, access to a diverse set of community-made bitstreams should let us mine virtually any algo and fork we want.

As of today, Zetheron has working bitstreams for the Cryptonote and Lyra2z algos. “The current plan is to release approximately one algorithm per month, until all major algorithms in the above table have been covered.”  – Zetheron writes. Here is a table of all the planned coins for the VU9P FPGA.

This means that, thanks to the work those guys did, we will now have a seamless, pretty much plug-and-play experience when using our FPGA boards.

The ecosystem Zetheron is creating will give us all the bitstream solutions we need to mine any popular algorithm we want, without the need to know anything about programming. Plus, developers will be motivated to raise the bar and create better bitstreams.



 




 

Ultra96 Monero Miner



Ultra96: Defined




Avnet Ultra96
Price: $249
Part Number: AES-ULTRA96-G
Device Support:
Zynq UltraScale+ MPSoC
 Vendor: Avnet
The Future is AVNET...

Program Tier: Premier


Product Description:

Ultra96™ is an ARM-based, Xilinx Zynq UltraScale+™ MPSoC development board based on the Linaro 96Boards specification. The 96Boards’ specifications are open and define a standard board layout for development platforms that can be used by software application, hardware device, kernel, and other system software developers. Ultra96 represents a unique position in the 96Boards community with a wide range of potential peripherals and acceleration engines in the programmable logic that is not available from other offerings.

Key Features and Benefits:

  • Linaro 96Boards Consumer Edition compatible
  • 85mm x 54mm form factor
  • 60-pin 96Boards High speed expansion header
  • 40-pin 96Boards Low-speed expansion header
  • 2x USB 3.0, 1x USB 2.0 Type A downstream ports
  • 1x USB 3.0 Type Micro-B upstream port
  • Mini DisplayPort (MiniDP or mDP)
  • Wi-Fi / Bluetooth
  • Delkin 16 GB MicroSD card + adapter
  • Micron 2 GB (512M x32) LPDDR4 Memory
  • Xilinx Zynq UltraScale+ MPSoC ZU3EG SBVA484

What's Included:

  • 16 GB pre-loaded microSD card + adapter
  • Quick-start instruction card
  • Ultra96 development board
  • Voucher for SDSoC license from Xilinx
==================

About:

  • The Ultra96 is the Arm-based, Xilinx Zynq UltraScale+ MPSoC development board based on the Linaro 96Boards specification.
  • We sat down with Robert Wolff and Sahaj Sarup from the 96Boards team within Linaro, to talk about the new Ultra96 FPGA board.
  • This powerful, adaptable single-board computer runs PetaLinux, and is perfect for flexible application development within image processing, AI, and more.

http://zedboard.org/product/ultra96
http://96boards.org
http://linaro.org
Part 2: Demos – https://youtu.be/MoCFhOiGj6c
Part 3: SDSoC – https://youtu.be/pdXdKK14xfo




 

 Features:

  • Xilinx Zynq UltraScale+ MPSoC ZU3EG SBVA484.
  • Micron 2 GB (512M x32) LPDDR4 Memory.
  • Delkin 16 GB MicroSD card + adapter.
  • Pre-loaded with PetaLinux environment.
  • Wi-Fi / Bluetooth.
  • Mini DisplayPort (MiniDP or mDP).
  • 1x USB 3.0 Type Micro-B upstream port.
  • 2x USB 3.0, 1x USB 2.0 Type A downstream ports.
  • 40-pin 96Boards Low-speed expansion header.
  • 60-pin 96Boards High speed expansion header.
  • 85mm x 54mm form factor.
  • Linaro 96Boards Consumer Edition compatible.

 Target Applications:

  • Artificial Intelligence.
  • Machine Learning.
  • IoT/Cloud connectivity for add-on sensors.

 Optional Add-On Items:

  •  External 2.0A @ 12V power supply
  •  USB-to-JTAG/UART pod (coming soon)
  •  Seeed Studios Grove Starter Kit for 96Boards
  •  Compatible Accessories

 

 Other Qualified microSD Cards:

  •  Delkin Utility MLC 128 GB microSD Card

Squirrels Research CVP-13_FPGA


Squirrels Research Labs and BittWare Launch World's Most Powerful Cryptocurrency FPGA Card.


NORTH CANTON, Ohio, Sept. 07, 2018 (GLOBE NEWSWIRE via COMTEX)

Squirrels Research Labs (SQRL) recently partnered with BittWare, a Molex company and provider of high-performance computer boards, to offer the world's most powerful FPGA cryptocurrency mining hardware, the BittWare CVP-13.

The CVP-13 uses the largest Virtex UltraScale+ FPGA chip available from Xilinx Inc. (XLNX).

"By utilizing the largest chip available from Xilinx, the VU13P, BittWare's CVP-13 offers the most powerful cryptocurrency FPGA card in existence," SQRL president David Stanfill said.

The CVP-13 provides the most processing power of any FPGA cryptocurrency mining card available: 46 percent more logic, 31 percent more on-chip memory and a 66 percent larger power supply than other popular boards that use the Xilinx VU9P FPGA chip.

Unlike other FPGA mining hardware in the market, the CVP-13 uses a 300 ampere power supply, allowing users to maximize the potential of the VU13P FPGA chip on the board.

Similar hardware in the market requires tuning that pulls more power than the hardware is rated for, causing efficiency losses.

The CVP-13's factory-designed and installed Viper cooling options include liquid cooling to increase efficiency.

"The efficiency and power gains obtained with the CVP-13 allow for higher density deployments with less hosting overhead," Stanfill explained.

"For every three VU9P-based boards, you only need two CVP-13 units to achieve similar performance goals."

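The two-for-three claim can be checked roughly against the 46 percent logic figure quoted earlier. This is a back-of-the-envelope sketch under our own assumption that mining throughput scales linearly with available logic; it ignores clock speed, memory and power differences, and the function names are hypothetical.

```python
# Back-of-the-envelope check of "two CVP-13s ~ three VU9P-based boards".
# Assumption (ours, not the press release's): throughput scales linearly with logic.
LOGIC_RATIO = 1.46  # CVP-13 has 46 percent more logic than a VU9P-based board


def vu9p_equivalents(cvp13_boards: int, ratio: float = LOGIC_RATIO) -> float:
    """How many VU9P-based boards match the logic of `cvp13_boards` CVP-13 cards."""
    return cvp13_boards * ratio


def cvp13_needed(vu9p_boards: int, ratio: float = LOGIC_RATIO) -> float:
    """Fractional number of CVP-13 cards matching the logic of `vu9p_boards` boards."""
    return vu9p_boards / ratio


print(f"2 CVP-13 ~ {vu9p_equivalents(2):.2f} VU9P boards")
print(f"3 VU9P boards ~ {cvp13_needed(3):.2f} CVP-13")
```

By this crude measure two CVP-13 cards carry about 2.92 VU9P boards' worth of logic, which is in line with the "similar performance goals" wording.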
In addition to cooler and more efficient performance, the increased amount of on-chip logic allows larger algorithms to fit that aren't available on other hardware. These algorithms include X17r, X16r and TimeTravel10.

CVP-13 cards can be chained together with QSFP28 and SlimSAS cables, enabling them to run larger algorithms like Equihash variants.

The CVP-13 has quadruple the amount of bandwidth of competitive cards and includes four QSFP28 cages and two SlimSAS connectors for board-to-board communication.

The CVP-13 includes the ability to run secure bitstreams developed and supported by AllMine Inc. and SQRL.

"BittWare has been producing high-end boards for three decades," Stanfill continued.

"They're a well-known and respected brand in the FPGA industry and now have the strength of Molex for even stronger market agility to take on projects like this."

BittWare CVP-13 FPGA boards are available for purchase now through SQRL with deliveries expected in November. Hosted CVP-13 hardware is also available through The Mineority Group.

Both options can be purchased at http://store.mineority.io.

"We're very excited to continue bringing groundbreaking FPGA technology into this market," Stanfill said.

About Squirrels Research Labs

Squirrels Research Labs, or SQRL, is a research and development division of Squirrels LLC. SQRL focuses time on projects that keep Squirrels forward thinking and adaptable in competitive markets.

Squirrels Research Labs can be found at http://squirrelsresearch.com.

About BittWare, a Molex company

For three decades, BittWare has designed and deployed high-end signal processing, network processing and high-performance computing board-level solutions that significantly reduce technology risk and time-to-revenue for OEM customers.

BittWare products are based on the latest FPGA technology and industry-standard COTS form factors, including PCIe.

When customer requirements make it difficult to use industry-standard boards, BittWare provides modified solutions and/or licensed designs.

For more information on BittWare and its FPGA products, visit www.BittWare.com.

FPGA Monero Mining Source-Codes




FPGA:





Video source in title link below:

AltCoin Mining, Source-code DeCompile ReCompile.








Reference material: link to site embedded in title.

Xilinx Virtex-7 2000T FPGA provides over 20 million ASIC gates per chip. As of September 24, 2018, the purchase cost of a Xilinx XC7V2000T chip is $1k.




Xilinx has announced the first shipments of its Virtex-7 2000T Field Programmable Gate Array (FPGA). The Virtex-7 2000T is the world’s highest-capacity programmable logic device – it contains 6.8 billion transistors, providing customers access to 2 million logic cells.

This is equivalent to 20 million ASIC gates, which makes these devices ideal for system integration, ASIC replacement, and ASIC prototyping and emulation.

This capacity is made possible by Xilinx’s Stacked Silicon Interconnect technology – also referred to as 2.5D ICs. The simplest packaging technology is to have a single die in the package.

The next step up the “complexity ladder” is to have multiple die in the same package, but for all of these die to be attached directly to the package substrate. In this case, compared to the tracks on the die, the tracks on the package substrate are relatively large and slow, and driving signals onto them consumes a lot of power.

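Stacked Silicon Interconnect instead mounts the die on a passive silicon interposer, whose fine on-silicon tracks carry the die-to-die signals. In this first incarnation of the technology, four FPGA die are attached to the silicon interposer, which, in addition to connecting the FPGAs to each other, provides connections to the package as illustrated below.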


In the case of the Virtex-7 2000T, the FPGA die are implemented at the 28 nm technology node, while the passive silicon interposer is implemented at the 65 nm technology node. Implementing the large silicon interposer at this older, larger-geometry node reduces costs and increases yield without significantly degrading performance.

One way to think about this is that the silicon interposer essentially adds four additional tracking layers that can be used to connect the FPGAs to each other with more than 10,000 connections between each pair of adjacent die!

On top of this, Through-Silicon Vias (TSVs) are used to pass signals through the silicon interposer to C4 bumps on the bottom of the interposer. These bumps are then used to connect the interposer to the package substrate.


A view of Xilinx’s Virtex-7 2000T device showing the packaging substrate (bottom), silicon interposer (middle), and four FPGA die (top).


Compared with having to use standard I/O connections to integrate two FPGAs together on a circuit board, this stacked silicon interconnect technology is said to provide over 100X the die-to-die connectivity bandwidth-per-watt, at one-fifth the latency, without consuming any of the FPGAs' high-speed serial or parallel I/O resources.

Of particular interest to designers is the fact that, despite being composed of four die, the Virtex-7 2000T preserves the traditional FPGA use model in that users will program the device as one extremely large FPGA with the Xilinx tool flow and methodology.

Xilinx’s first application of 2.5D IC stacking gives customers twice the capacity of competing devices and leaps ahead of what Moore’s Law could otherwise offer in a monolithic 28-nanometer (nm) FPGA.

Xilinx says that its customers can use Virtex-7 2000T FPGAs to replace large-capacity ASICs, achieving comparable overall cost in a third of the time; to create integrated systems that increase system bandwidth and reduce power by eliminating I/O interconnect; and to accelerate the prototyping and emulation of advanced ASIC systems.


A top and bottom view of Xilinx’s Virtex-7 2000T device, the world’s highest-capacity FPGA using Stacked Silicon Interconnect technology.


 “The Virtex-7 2000T FPGA marks a major milestone in Xilinx’s history of innovation and industry collaboration,” said Victor Peng, Xilinx Senior Vice President, Programmable Platforms Development.  

“Of significance to our customers is the fact that Stacked Silicon Interconnect technology offers capacities that otherwise wouldn’t be possible in an FPGA for at least another process generation. 

They can immediately add new functionality to existing designs while forgoing an ASIC, cost reduce a 3 or 5 FPGA solution into a single FPGA or move ahead with prototyping and building system emulators using our largest FPGAs at least a year earlier than typical for a new generation.”


The Virtex-7 2000T device also provides equipment manufacturers with an integration platform that will help them overcome the challenges of lowering power while increasing performance and capabilities.

By eliminating the I/O interfaces between different ICs on a circuit board, a system’s overall power consumption can be reduced considerably.

Consider the following example provided by Xilinx that compares a single Virtex-7 2000T with four of the largest monolithic ICs as illustrated below:


Actually, this is not really a fair comparison, because in terms of capacity the Virtex-7 2000T is equivalent to only around two of the largest monolithic ICs. But even comparing to two monolithic ICs results in a significant power advantage. (Having said this, I’d be interested to know just what was being exercised in this example – Logic? Memory? DSP slices? SERDES channels? – and at what frequency.)



Reference material: link to site embedded in title.

FPGA programming step by step...


FPGAs and microprocessors are more similar than you may think. Here's a primer on how to program an FPGA and some reasons why you'd want to. Small processors are, by far, the largest selling class of computers and form the basis of many embedded systems. The first single-chip microprocessors contained approximately 10,000 gates of logic and 10,000 bits of memory. Today, field programmable gate arrays (FPGAs) provide single chips approaching 10 million gates of logic and 10 million bits of memory. Figure 1 compares one of these microprocessors with an FPGA.

Figure 1: Comparison of first microprocessors to current FPGAs

Powerful tools exist to program these chips. Unlike with microprocessors, not only the memory bits but also the logic gates are under your control as the programmer. This article will show the programming process used for FPGA design.
As an embedded systems programmer, you're aware of the development processes used with microprocessors. The development process for FPGAs is similar enough that you'll have no problem understanding it but sufficiently different that you'll have to think differently to use it well. We'll use the similarities to understand the basics, then discuss the differences and how to think about them.
Similarities
Table 1 shows the steps involved in designing embedded systems with a microprocessor and an FPGA. This side-by-side comparison lets you quickly assess the two processes and see how similar they are.

Table 1: Step-by-step design process for microprocessors and FPGAs


FPGA Monero Working IP-Cores Shares


Download: SiaFpgaMiner

This project is a VHDL FPGA core that implements an optimized Blake2b pipeline to mine Siacoin.

Motivation

When CPU mining got crowded in the earlier years of cryptocurrencies, many started mining Bitcoin with FPGAs. Eventually it made sense to invest millions in ASIC development, and ASICs outperformed FPGAs by several orders of magnitude, kicking them out of the game. The complexity and cost of developing ASICs monopolized Bitcoin mining, leading to relatively dangerous mining centralization. Therefore, emerging altcoins decided to base their PoW puzzles on other algorithms that wouldn't give ASICs an unfair advantage (i.e. ASIC-resistant algorithms). The most popular mechanism has been to make the algorithm memory-hard (i.e. dependent on memory accesses), which makes memory bandwidth the computing bottleneck. This gives GPUs an edge over ASICs, effectively democratizing access to mining hardware, since GPUs are consumer electronics. Ethereum, with its Ethash PoW algorithm, is a clear example.
Siacoin is an example of a coin without a memory-hard PoW algorithm and with no ASIC miners (update: some ASIC miners are now being rolled out; see Obelisk and Antminer A3). That made it a perfect candidate for FPGA mining (more for fun than profit)!

Design theory

To yield the highest possible hash rate, a fully unrolled pipeline was implemented, with resources dedicated to every operation of every round of the Blake2b hash computation. It takes 96 clock cycles to fill the pipeline and start getting valid results (4 clocks per 'G' x 2 sequential 'G' stages per round x 12 rounds).
  • MixG.vhd implements the basic 'G' function in 4 steps. Eight-step and two-step variations were explored, but four steps gave the best balance between resource usage and timing.
  • QuadG.vhd is just a wrapper that instantiates 4 MixG to process the full 16-word vectors and make the higher level files easier to understand.
  • Blake2bMinerCore.vhd instantiates the MixG components for all rounds and wires their inputs and outputs appropriately. Nonce generation and distribution logic also lives in this file.
  • /Example contains an example instantiation of Blake2bMinerCore interfacing a host via UART. It includes a very minimalist Python script to interface the FPGA to a Sia node for mining.
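The 96-cycle fill latency follows directly from the unrolled structure. A minimal sketch of the stage arithmetic, assuming stages are numbered consecutively from the first MixG step of round 0 (the function name `stage_location` is hypothetical, not taken from the repository):

```python
# Stage arithmetic for a fully unrolled Blake2b pipeline (illustrative sketch).
STEPS_PER_G = 4          # MixG splits the G function into 4 clocked steps
G_STAGES_PER_ROUND = 2   # a column QuadG followed by a diagonal QuadG
ROUNDS = 12              # Blake2b uses 12 rounds

STAGES_PER_ROUND = STEPS_PER_G * G_STAGES_PER_ROUND   # 8 clocks per round
LATENCY = STAGES_PER_ROUND * ROUNDS                   # 96 clocks to fill


def stage_location(stage: int) -> tuple[int, int, int]:
    """Map a pipeline stage index (0..LATENCY-1) to (round, quadg, step)."""
    if not 0 <= stage < LATENCY:
        raise ValueError(f"stage must be in [0, {LATENCY})")
    round_idx, rem = divmod(stage, STAGES_PER_ROUND)
    quadg, step = divmod(rem, STEPS_PER_G)
    return round_idx, quadg, step


print(LATENCY)
print(stage_location(0))
print(stage_location(95))
```

Once those 96 stages are full, every stage holds a different nonce, so each core retires one finished hash per clock cycle.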

MixG

The diagram below shows the pipeline structure of a single MixG. Four of these are instantiated in parallel to constitute a QuadG, and QuadGs are chained in series to form rounds.
MixG logic
The gray A, B, C, D boxes contain combinatorial operations that add, XOR and rotate words according to the G function specification. The white two-cell boxes represent two 64-bit pipelining registers that store results from the combinatorial logic for use later in the process.
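A bit-accurate software model is handy for checking a hardware pipeline like this one. The sketch below is a plain-Python Blake2b written from RFC 7693; it is not the repository's VHDL or its Python host script. Its `mix_g` is the G function that MixG.vhd pipelines: the eight add/XOR/rotate lines form four natural pairs, which is presumably how the 4-step MixG divides the work, though that mapping is our assumption. Each round's two groups of four `mix_g` calls correspond to the two QuadGs.

```python
# Plain-Python Blake2b (unkeyed), following RFC 7693, for cross-checking hardware.
import hashlib
import struct

MASK64 = (1 << 64) - 1

IV = [
    0x6A09E667F3BCC908, 0xBB67AE8584CAA73B, 0x3C6EF372FE94F82B, 0xA54FF53A5F1D36F1,
    0x510E527FADE682D1, 0x9B05688C2B3E6C1F, 0x1F83D9ABFB41BD6B, 0x5BE0CD19137E2179,
]

# Message schedule: which message word feeds each G input in a given round.
SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]


def rotr64(x: int, n: int) -> int:
    """Rotate a 64-bit word right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def mix_g(v: list[int], a: int, b: int, c: int, d: int, x: int, y: int) -> None:
    """Blake2b G function: four add/xor-rotate pairs on state words a, b, c, d."""
    v[a] = (v[a] + v[b] + x) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 32)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 24)
    v[a] = (v[a] + v[b] + y) & MASK64
    v[d] = rotr64(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK64
    v[b] = rotr64(v[b] ^ v[c], 63)


def compress(h: list[int], block: bytes, t: int, last: bool) -> list[int]:
    """Compress one 128-byte block; t counts bytes hashed including this block."""
    m = struct.unpack("<16Q", block)  # sixteen little-endian 64-bit words
    v = h[:] + IV[:]
    v[12] ^= t & MASK64
    v[13] ^= (t >> 64) & MASK64
    if last:
        v[14] ^= MASK64
    for r in range(12):
        s = SIGMA[r % 10]
        # Column step: four independent G calls (first QuadG of the round).
        mix_g(v, 0, 4, 8, 12, m[s[0]], m[s[1]])
        mix_g(v, 1, 5, 9, 13, m[s[2]], m[s[3]])
        mix_g(v, 2, 6, 10, 14, m[s[4]], m[s[5]])
        mix_g(v, 3, 7, 11, 15, m[s[6]], m[s[7]])
        # Diagonal step: second QuadG of the round.
        mix_g(v, 0, 5, 10, 15, m[s[8]], m[s[9]])
        mix_g(v, 1, 6, 11, 12, m[s[10]], m[s[11]])
        mix_g(v, 2, 7, 8, 13, m[s[12]], m[s[13]])
        mix_g(v, 3, 4, 9, 14, m[s[14]], m[s[15]])
    return [h[i] ^ v[i] ^ v[i + 8] for i in range(8)]


def blake2b(data: bytes, digest_size: int = 64) -> bytes:
    """Unkeyed Blake2b producing a digest of 1..64 bytes."""
    h = IV[:]
    h[0] ^= 0x01010000 ^ digest_size  # parameter block: fanout 1, depth 1, no key
    n_blocks = max(1, (len(data) + 127) // 128)
    for i in range(n_blocks):
        last = i == n_blocks - 1
        t = len(data) if last else (i + 1) * 128
        block = data[i * 128:(i + 1) * 128].ljust(128, b"\x00")
        h = compress(h, block, t, last)
    return b"".join(struct.pack("<Q", w) for w in h)[:digest_size]


# Cross-check against Python's built-in implementation.
print(blake2b(b"abc") == hashlib.blake2b(b"abc").digest())
```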

Nonce Generation and Distribution

Pipelining the hash vector throughout the chain implies heavy register usage, and there is no way around it. Fortunately, the X/Y message feeds aren't as resource-demanding, because the work header can remain constant for a given block period, with the exception of the nonce field, which must obviously change all the time to yield unique hashes. Therefore, the nonce field must be tracked or kept in memory for when a given step in the mixing logic requires it. The simplest approach would be to make a huge N-bit-wide shift register to "drag" the nonce corresponding to each clock cycle across the pipeline. This is not an ideal solution, because it would require N flip-flops (e.g. for a 48-bit counter) times the number of clock cycles it takes to cross the pipeline (48 x 96 = 4608 FF!).
Luckily, the nonce field is only used once per round (12 times total). This allows hooking up 12 counters statically to the X or Y input where the nonce part of the message is fed in each round. To make the counter output the value of the nonce corresponding to a given cycle, the counters' initial values are offset by the amount of clock cycles between them. The following diagram illustrates the point:
Nonce counters
In this case, the offsets show that the nonce used in round zero will be consumed by round one 8 clock cycles later, by round two 20 cycles later, and so on. (The distance in clock cycles between counters is defined by the Blake2b message schedule.)
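The counter offsets can be derived from the Blake2b message schedule. The sketch below does so under two assumptions of ours that the text does not state: Sia's 80-byte block header places the 8-byte nonce at bytes 32 to 39, so it lands in message word m[4]; and the nonce word enters its G stage on that stage's first clock, whether it feeds G's x or y input. Under those assumptions the first three offsets come out as 0, 8 and 20, matching the text. The function names are hypothetical.

```python
# Per-round nonce counters for the unrolled Blake2b pipeline (illustrative sketch).
# Assumptions (not from the original text): the nonce sits in message word m[4],
# and each round consumes it on the first clock of the QuadG stage that uses it.
NONCE_WORD = 4
STEPS_PER_G = 4          # clocks per MixG
STAGES_PER_ROUND = 8     # 2 QuadGs x 4 clocks
ROUNDS = 12
NONCE_BITS = 48          # counter width from the text's example

SIGMA = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15],
    [14, 10, 4, 8, 9, 15, 13, 6, 1, 12, 0, 2, 11, 7, 5, 3],
    [11, 8, 12, 0, 5, 2, 15, 13, 10, 14, 3, 6, 7, 1, 9, 4],
    [7, 9, 3, 1, 13, 12, 11, 14, 2, 6, 5, 10, 4, 0, 15, 8],
    [9, 0, 5, 7, 2, 4, 10, 15, 14, 1, 11, 12, 6, 8, 3, 13],
    [2, 12, 6, 10, 0, 11, 8, 3, 4, 13, 7, 5, 15, 14, 1, 9],
    [12, 5, 1, 15, 14, 13, 4, 10, 0, 7, 6, 3, 9, 2, 8, 11],
    [13, 11, 7, 14, 12, 1, 3, 9, 5, 0, 15, 4, 8, 6, 2, 10],
    [6, 15, 14, 9, 11, 3, 0, 8, 12, 2, 13, 7, 1, 4, 10, 5],
    [10, 2, 8, 4, 7, 6, 1, 5, 15, 11, 9, 14, 3, 12, 13, 0],
]


def nonce_offsets() -> list[int]:
    """Clock offset, relative to round 0, at which each round consumes the nonce."""
    offsets = []
    for r in range(ROUNDS):
        slot = SIGMA[r % 10].index(NONCE_WORD)  # message slot 0..15 in this round
        quadg = slot // 8                       # 0-7: column QuadG, 8-15: diagonal QuadG
        offsets.append(r * STAGES_PER_ROUND + quadg * STEPS_PER_G)
    return offsets


def counter_value(round_idx: int, start: int, clock: int) -> int:
    """Round counter output at `clock` when seeded with start minus its offset (wraps)."""
    offset = nonce_offsets()[round_idx]
    return (start - offset + clock) % (1 << NONCE_BITS)


print(nonce_offsets())
print("shift register:", NONCE_BITS * ROUNDS * STAGES_PER_ROUND, "FF")
print("12 counters:", NONCE_BITS * ROUNDS, "FF")
```

A nonce that enters round 0 at clock c shows up on each later round's counter exactly when that round consumes it, and the twelve counters need 576 flip-flops (plus their incrementers) instead of the shift register's 4608.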

Implementation results

It is evident that a single core is too big to fit in a regular affordable FPGA device. A ballpark estimate of the flip-flop resources a single core could use:
  • 64 bits per word x 16 word registers per MixG x 4 MixG per QuadG x 2 QuadG per round x 12 rounds = 98,304 registers (not counting nonce counters and other pieces of logic).
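The register estimate above is a straight product of structural factors. A sketch that reproduces it with the factors exposed as parameters (the function name is hypothetical, and like the original figure it ignores nonce counters and control logic):

```python
# Rough flip-flop estimate for one fully unrolled Blake2b core, using the factors
# listed above. Nonce counters and control logic are not included.
def register_estimate(word_bits: int = 64, words_per_mixg: int = 16,
                      mixg_per_quadg: int = 4, quadg_per_round: int = 2,
                      rounds: int = 12) -> int:
    """Return the estimated number of flip-flops for one core."""
    return word_bits * words_per_mixg * mixg_per_quadg * quadg_per_round * rounds


print(register_estimate())
print(register_estimate(word_bits=32, rounds=14))
```

The second call plugs in 32-bit words and 14 rounds as a crude stand-in for the Blake-256 port mentioned under Further work: 57,344 registers, roughly 0.58 of the Blake2b figure, or on the order of 1.7x as many cores, assuming the rest of the structure carries over unchanged.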
The design won't fit on a regular Spartan-6 dev board, which is why I built it for a Kintex-7 410 FPGA. Here are some of my compile tests:
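| Cores | Clock (MHz) | Hashrate (MH/s) | Mix steps | Strategy | Utilization | Worst setup slack (ns) | Worst hold slack (ns) | Failures | Notes |
|-------|-------------|-----------------|-----------|----------|-------------|------------------------|-----------------------|----------|-------|
| 1 | 200 | 200 | 4 | Default | 18.00% | 0.168 | | 0 | |
| 2 | 200 | 400 | 4 | Default | 38.00% | | | 0 | |
| 3 | 200 | 600 | 4 | Default | 56.00% | -0.246 | | 602 failing endpoints | |
| 3 | 200 | 600 | 4 | Explore | 56.00% | -0.246 | 0.011 | 602 failing endpoints | |
| 3 | 166.67 | 500.01 | 4 | Default | 56.00% | 0.132 | 0.02 | 0 | |
| 4 | 166.67 | 666.68 | 4 | Default | 75.00% | 0.051 | 0.009 | 0 | |
| 5 | 166 | 830 | 4 | Explore | | | | | Placing error |
| 4 | 173.33 | 693.32 | 4 | Explore | 75.00% | 0.039 | 0 | 0 | |
| 4 | 173.33 | 693.32 | 4 | Explore | 75.00% | 0.17 | 0.022 | 0 | 1 BUFG per core |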
As seen in the table, the highest number of cores I was able to instantiate was 4, and the highest clock frequency that met timing was 173.33 MHz.
~700 MH/s is no better than a mediocre GPU, but power draw is way less! (hey, I did say it was for fun)
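Every hash rate in the table equals cores x clock, which is what a fully unrolled pipeline retiring one hash per core per clock predicts. A small sketch of that relationship plus the pipeline fill latency (function names are hypothetical; the 96-cycle depth comes from the design section):

```python
# Throughput and latency of fully unrolled cores: after the pipeline fills,
# each core finishes one hash per clock cycle.
PIPELINE_DEPTH = 96  # clock cycles to fill one core's pipeline


def hashrate_mhs(cores: int, clock_mhz: float) -> float:
    """Aggregate hash rate in MH/s: one hash per core per clock."""
    return cores * clock_mhz


def fill_latency_ns(clock_mhz: float, depth: int = PIPELINE_DEPTH) -> float:
    """Nanoseconds before the first valid hash leaves a core."""
    return depth * 1000.0 / clock_mhz


for cores, clock in [(1, 200.0), (3, 166.67), (4, 173.33)]:
    print(f"{cores} core(s) @ {clock} MHz: {hashrate_mhs(cores, clock):.2f} MH/s, "
          f"first hash after {fill_latency_ns(clock):.1f} ns")
```

The fill latency only matters once per job; after that the rate is set purely by core count and clock, which is why both were pushed as far as placement and timing allowed.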

Further work

  • Investigate BRAM as alternative to flip-flops (unlikely to fit the needs of this application).
  • Fine-tune a higher clock frequency to squeeze out a few more MH/s.
  • Porting to Blake-256 for Decred mining. That variant adds two rounds but words are half as wide, so fitting ~2x the number of cores sounds possible.
  • Do more in-depth tests with different numbers of steps in the G function (timing vs. resources tradeoff).
  • Play more with custom implementation strategies.

Resources
