The HTG-700 Virtex-7 FPGA board can be used either in PCI Express mode (plugged into a host PC or server) or in stand-alone mode (powered by an external ATX or wall power supply).

 
    
      
Features:
►Xilinx Virtex-7 V2000T, 585T, or X690T FPGA
►Scalable via HTG-777 FPGA module for higher FPGA gate density
►x8 PCI Express Gen2/Gen3 edge connectors with jitter cleaner chip
  - Gen 3: with the -690 option
  - Gen 2: with the -585 or -2000 option (Gen3 requires a soft IP core)
►x3 FPGA Mezzanine Connectors (FMC)
  - FMC #1: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers
  - FMC #2: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers
  - FMC #3: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers. The physical location of this connector gives plug-in FMC daughter cards easy access through the front panel.
►x4 SMA ports (16 SMAs providing 4 Txn/Txp/Rxn/Rxp) clocked by external pulse generators
►DDR3 SODIMM with support for up to 8 GB (shipped with a 2 GB module)
►Programmable oscillators (Silicon Labs Si570) for different interfaces
►Configuration through JTAG or Micron G18 embedded flash
►USB-to-UART bridge
►ATX and DC power supplies for PCI Express and stand-alone operation
►LEDs & pushbuttons
►Size: 9.5" x 4.25"
 
Kit Content:
Hardware:
- HTG-700 board
Software:
- PCI Express drivers (evaluation) for Windows & Linux
Reference Designs/Demos:
- PCI Express Gen3 PIO
- 10G & 40G Ethernet (available only if interested in licensing the IP cores)
- DDR3 memory controller
Documents:
- User manual
- Schematics (in searchable .pdf format)
- User Constraint File (UCF)

Ordering Information
Part Numbers:
- HTG-V7-PCIE-2000-2 (populated with V2000T-2 FPGA) - Price: Contact Us
- HTG-V7-PCIE-690-2 (populated with X690T-2 FPGA) - Price: Contact Us
- HTG-V7-PCIE-690-3 (populated with X690T-3 FPGA) - Price: Contact Us
- HTG-V7-PCIE-585-2 (populated with V585T-2 FPGA) - Price: Contact Us
FPGA-(CVP-13, XUPVV4): Hardware Modifications
As of 8 October 2018, the Bittware cards (CVP-13, XUPVV4) do not require any modifications and will run at full speed out of the box.
If you have a VCU1525 or BCU1525, you should acquire a DC1613A USB dongle to change the core voltage.
This dongle requires modifications to ‘fit’ into the connector on the VCU1525 or BCU1525.
You can make the modifications yourself as described here, or you can purchase a fully modified DC1613A from https://shop.fpga.guide.
If you have an Avnet AES-KU040 and you are brave enough to make the complex modifications needed to run at full hash rate, you can download the modification guide right here (it will be online in a few days).
You can see a video of the modded card on YouTube here.
If you have a VCU1525 or BCU1525, we recommend using the TUL water block (this water block was designed by TUL, the company that designed the VCU/BCU cards).
The water block can be purchased from https://shop.fpga.guide.
WARNING: Installation of the water block requires a full disassembly of the FPGA card, which may void your warranty.
Maximum hash rate (even beyond water-cooling) is achieved by immersion cooling, immersing the card in a non-conductive fluid.
Engineering Fluids makes BC-888 and EC-100 fluids, which are non-boiling and easy to use at home. You can buy them here.
If you have a stock VCU1525, there is a danger of the power regulators failing from overheating, even if the FPGA is very cool.
We recommend a simple modification that cools the power regulators by more than 10 °C.
The modification is very simple. You need a slim CPU cooler (a Slim X3 is used below), thermal tape, and a fan controller.
First, cut a piece of thermal tape and apply it to the back side of the Slim X3 CPU cooler, and plug the fan into the fan controller:
Then stick the CPU cooler onto the back plate of the VCU1525 over this area:
Once done, it will look like this:
Make sure to connect the fan controller to the power supply and run the fan on maximum speed.
This modification will cool the regulators on the back side of the VCU1525, dropping their temperature by more than 10 °C and extending the life of your hardware.
This modification is not needed on ‘newer’ versions of the hardware such as the XBB1525 or BCU1525.
Grab Bag of FPGA and GPU Software Tools from Intel, Xilinx & NVIDIA
FPGAs as Accelerators:
- From the Intel® FPGA SDK for OpenCL™ Product Brief available at link.
 
- "The  FPGA is designed to create custom hardware with each instruction being  accelerated providing more efficiency use of the hardware  than that of the CPU or GPU architecture would allow." 
 
Hardware:
- With Intel, developers can utilize an x86 with a built-in FPGA or connect a card with an Intel or Xilinx FPGA to an x86. This Host + FPGA Acceleration would typically be used in a "server."
 
- With Intel and Xilinx, developers can also get a chip with an ARM core + FPGA. This FPGA + ARM SoC Acceleration is typically used in embedded systems.
 
- Developers can also connect a GPU card from Nvidia to an x86 host. Developers can also get an integrated GPU from Intel. Nvidia also provides chips with ARM cores + GPUs.
 
Tools:
- Intel and Xilinx provide tools to help developers accelerate x86 code execution using an FPGA attached to the x86. They also provide tools to accelerate ARM code execution using an FPGA attached to the ARM.
 
- Intel, Xilinx and Nvidia all provide OpenCL libraries to access their hardware. These libraries cannot interoperate with one another. Intel also provides libraries to support OpenMP, and Nvidia provides CUDA for programming their GPUs. Xilinx includes its OpenCL library in an SDK called SDAccel and an SDK called SDSoC. SDAccel is used for x86 + Xilinx FPGA systems, i.e. servers. SDSoC is used for Xilinx chips with ARM + FPGAs, i.e. embedded systems.
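As a rough illustration of how a host program discovers whichever vendor OpenCL runtime is installed, the sketch below enumerates the available platforms through the standard OpenCL C API. It is a minimal example with no real error handling, assuming an OpenCL ICD loader and headers are present; it is not tied to any one vendor's SDK.

    /* Minimal sketch: list the installed OpenCL platforms (Intel, Xilinx and
     * NVIDIA runtimes each show up as a separate platform). Assumes an
     * OpenCL ICD loader and <CL/cl.h>; error handling omitted. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_uint n = 0;
        clGetPlatformIDs(0, NULL, &n);              /* how many platforms are installed? */
        if (n == 0) { printf("no OpenCL platforms found\n"); return 0; }
        if (n > 16) n = 16;
        cl_platform_id ids[16];
        clGetPlatformIDs(n, ids, NULL);
        for (cl_uint i = 0; i < n; i++) {
            char name[256];
            clGetPlatformInfo(ids[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
            printf("platform %u: %s\n", i, name);   /* vendor platform name */
        }
        return 0;
    }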
 
Libraries:
- To help developers build computer vision applications, Xilinx provides OpenVX, Caffe, OpenCV and various DNN and CNN libraries in an SDK called reVISION for software running on chips with an ARM + FPGA.
 
- All of these libraries and many more are available for x86 systems.
 
- Xilinx also provides function accelerator libraries for neural network inference, HEVC decoding and encoding, and SQL data movement.
 
Tools for FPGA + ARM SoC Acceleration
Intel:
- From link, developers can work with ARM SoCs from Intel using:
 
-  ARM DS-5 for debug
 
-  SoC FPGA Embedded Development Suite for embedded software development
 
-  Intel® Quartus® Prime Software for working with the programmable logic
 
-  Virtual Platform for simulating the ARM
 
-  SoC Linux for running Linux on the FPGA + ARM SoC
 
Higher Level:
 
-  Intel® FPGA SDK for OpenCL™ is available for programming the ARM + FPGA chips using OpenCL.
 
Xilinx:
- Developers can work with ARM SoCs from Xilinx using: 
 
-  An SDK for application development and debug
 
-  PetaLinux Tools for Linux development and ARM simulation, and
 
-  Vivado for working with the programmable logic (PL) of its FPGA + ARM SoC chips
 
Higher Level:
- Xilinx provides SDSoC for accelerating ARM applications on the built-in FPGA. Users can program in C and/or C++, and SDSoC will automatically partition the algorithm between the ARM core and the FPGA. Developers can also program using OpenCL, and SDSoC will link in an embedded OpenCL library and build the resulting ARM + FPGA system. SDSoC also supports debugging and profiling.
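For a concrete picture of what gets partitioned, the sketch below shows the sort of plain C function a developer might hand to SDSoC and mark for hardware acceleration. The function name, array size and caller are made up for illustration, and the tool-specific pragmas and project settings are omitted.

    /* Illustrative only: a self-contained C kernel of the kind SDSoC can move
     * into the FPGA fabric while the ARM-side caller stays unchanged.
     * vec_mac(), N and run_step() are hypothetical names, not SDSoC APIs. */
    #include <stdint.h>

    #define N 1024

    /* Candidate for hardware acceleration: multiply-accumulate over two arrays. */
    void vec_mac(const int32_t a[N], const int32_t b[N], int32_t acc[N])
    {
        for (int i = 0; i < N; i++)
            acc[i] += a[i] * b[i];
    }

    /* The ARM-side code calls the function normally; when the function is
     * selected for hardware, the tool generates the data movers and stubs. */
    void run_step(const int32_t *a, const int32_t *b, int32_t *acc)
    {
        vec_mac(a, b, acc);
    }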
 
Domain Specific:
Xilinx leverages SDSoC to create an embedded vision stack called reVISION. 
Author: Joerg Schulenburg, Uni-Magdeburg, 2008-2016
What is SpinPack?
SPINPACK is a big program package to compute the lowest eigenvalues and eigenstates and various expectation values (spin correlations etc.) for quantum spin systems.
These model systems can, for example, describe the magnetic properties of insulators at very low temperatures (T=0), where the magnetic moments of the particles form entangled quantum states.
The package generates the symmetrized configuration vector and the sparse matrix representing the quantum interactions, computes its eigenvectors, and finally evaluates some expectation values for the system.
The first SPINPACK version was based on Nishimori's TITPACK (Lanczos method, no symmetries), but it was converted to C/C++ early on and completely rewritten (1994/1995).
Other diagonalization algorithms are implemented too (Lanczos, 2x2-diagonalization, and LAPACK/BLAS for smaller systems). It is able to handle Heisenberg, t-J, and Hubbard systems with up to 64 sites, more (usually up to 128) using special compiler and CPU features, or even more sites in a slower emulation mode (C++ required).
For instance, we obtained the lowest eigenstates for the Heisenberg Hamiltonian on a 40-site square lattice on our machines in 2002. Note that the resources needed for the computation grow exponentially with the system size.
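To make the role of the Lanczos method concrete, here is a minimal sketch of the basic Lanczos recurrence for a sparse symmetric matrix. It is illustrative only: spmv() is an assumed helper for y = H*x, and SpinPack's real implementation adds symmetry-reduced bases, re-orthogonalization strategies, disk storage and parallel matrix-vector products.

    /* Sketch of m steps of the Lanczos recurrence for a sparse symmetric H.
     * alpha[] and beta[] receive the tridiagonal matrix whose extreme
     * eigenvalues approximate those of H. Not SpinPack code. */
    #include <math.h>
    #include <stddef.h>

    void spmv(const double *x, double *y, size_t n);    /* assumed helper: y = H*x */

    /* v must hold a normalized start vector, v_prev must be zero-filled;
     * w is scratch. All vectors have length n. */
    void lanczos(double *v_prev, double *v, double *w,
                 double *alpha, double *beta, size_t n, int m)
    {
        for (int j = 0; j < m; j++) {
            spmv(v, w, n);                               /* w = H*v_j */
            double a = 0.0;
            for (size_t i = 0; i < n; i++) a += w[i] * v[i];
            alpha[j] = a;                                /* alpha_j = v_j . H v_j */
            double b2 = 0.0;
            for (size_t i = 0; i < n; i++) {
                w[i] -= a * v[i] + (j ? beta[j - 1] : 0.0) * v_prev[i];
                b2 += w[i] * w[i];
            }
            beta[j] = sqrt(b2);
            if (beta[j] == 0.0) break;                   /* invariant subspace reached */
            for (size_t i = 0; i < n; i++) {             /* v_prev <- v_j, v <- v_{j+1} */
                double t = w[i] / beta[j];
                v_prev[i] = v[i];
                v[i] = t;
            }
        }
    }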
The package is written mainly in C to get it running on all Unix systems. C++ is only needed for complex eigenvectors and twisted boundary conditions when C has no complex extension. This way the package is very portable.
Parallelization can be done using the MPI and PTHREAD libraries. Mixed (hybrid) mode is possible, but not always faster than pure MPI (2015); v2.60 has a slight hybrid-mode advantage on CPUs supporting hyper-threading. This will hopefully be improved further. MPI scaling has been tested to work up to 6000 cores and PTHREAD scaling up to 510 cores, but careful tuning is required (scaling 2008-2016). A minimal hybrid setup is sketched below.
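As a rough idea of what hybrid mode means here, the sketch below starts one MPI task and a handful of worker threads per task; the thread count and the empty worker body are placeholders, and this is not SpinPack's actual startup code.

    /* Minimal hybrid MPI + pthread skeleton (illustrative only). */
    #include <mpi.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 8                      /* worker threads per MPI task (assumed) */

    static void *worker(void *arg)
    {
        long id = (long)arg;
        (void)id;                           /* per-thread share of the SpMV work goes here */
        return NULL;
    }

    int main(int argc, char **argv)
    {
        int provided, rank;
        /* FUNNELED: only the main thread makes MPI calls, workers just compute */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        pthread_t tid[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, worker, (void *)t);
        for (int t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        printf("task %d finished\n", rank);
        MPI_Finalize();
        return 0;
    }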
The program can use all topological symmetries, S(z) symmetry, and spin inversion to reduce the matrix size. This reduces the needed computing resources by a linear factor.
Since 2015/2016, CPU vector extensions (SIMD, SSE2, AVX2) are supported to get better performance doing symmetry operations on bit representations of the quantum spins.
The results are very reliable because the package has been used in scientific work since 1995. A low-latency, high-bandwidth network and low-latency memory are needed to get the best performance on large-scale clusters.
News:
- Ground state of the S=1/2 Heisenberg AFM on a N=42 kagome lattice: biggest sub-matrix computed (Sz=1, k=Pi/7, size=36.7e9, nnz=41.59, v2.56 cplx8, using partly non-blocking hybrid code on supermuc.phase1, 10400 cores (650 nodes, 2 tasks/node, 8 cores/task, 2 hyperthreads/core, 4 h), matrix_storage=0.964e6 nz/s/core, SpMV=6.58e6 nz/s/core, Feb 2017)
 
- Ground state of the S=1/2 Heisenberg AFM on a N=42 linear chain computed (E0/Nw=-0.22180752, Hsize=3.2e9, v2.38, Jan 2009) using 900 nodes of a SiCortex SC5832, 700 MHz, 4 GB RAM/node (320 min). Update: N=41, Hsize=6.6e9, E0/Nw=-0.22107343, 16*(16 cores+256 GB+IB)*32 h, matrix stored, v2.41, Oct 2011.
 
- Ground state of the S=1/2 Heisenberg AFM on a N=42 square lattice computed (E0=-28.43433834, Hsize=1602437797, ((7,3),(0,6)), v2.34, Apr 2008) using 23 nodes (each 2*DualOpteron 2.2 GHz, 4 GB RAM) via 1 Gb Ethernet (92 cores, usage=80%, ca. 60 GB RAM, 80 MB/s BW, 250 h/100 iterations).
 
- Program is ready for clusters (MPI and Pthreads can be used at the same time, see the performance graphic) and can again use memory as storage media for performance measurement (Dec 2007).
 
- Ground state of the S=1/2 Heisenberg AFM on a N=40 square lattice computed (E0=-27.09485025, Hsize=430909650, v1.9.3, Jan 2002).
 
- Ground state of the S=1/2 J1-J2 Heisenberg AFM on a N=40 square lattice, J2=0.5, zero-momentum space: E0=-19.96304839, Hsize=430909650 (15 GB memory, 185 GB disk, v2.23, 60 iterations, 210 h, Altix-330 IA64 1.5 GHz, 2 CPUs, GCC-3.3, Jan 2006).
 
- Ground state of the S=1/2 Heisenberg AFM on a N=39 triangular lattice computed (E0=-21.7060606, Hsize=589088346, v2.19, Jan 2004).
 
- Largest complex matrix: Hsize=1.2e9 (26 GB memory, 288 GB disk, v2.19, Jul 2003), 90 iterations: 374 h on alpha 1 GHz (with limited disk data rate, 4 CPUs, til4_36).
 
- Largest real matrix: Hsize=1.3e9 (18 GB memory, 259 GB disk, v2.21, Apr 2004), 90 iterations: real=40 h, cpu=127 h, sys=9%, alpha 1.15 GHz (8 CPUs, til9_42z7).
 
Download:
Verify download using:   
gpg --verify spinpack-2.55.tgz.asc spinpack-2.55.tgz
-   spinpack.tgz experimental developer version (may have bug fixes, new features or speed improvements, see doc/history.html)
 
-   spinpack-2.56c.tgz 2.57 backport fixes (above 2048*16 threads, FTLM-random fix, see doc/history)
 
-   spinpack-2.56.tgz better hybrid MPI-scaling above 1000 tasks (tested on kagome42_sym14_sz13..6, pgp-sign, updated 2017-02-23, see doc/history, still blocking MPI only)
 
-   spinpack-2.55.tgz better MPI-scaling above 1000 tasks (tested on kagome42_sym14_sz13..8..1, pgp-sign, updated 2017-02-21, see doc/history)
 
-   spinpack-2.52.tgz OpenMP support (implemented as pthread emulation), but weak mixed-code speed (pgp-sign, Dec16)
 
-   spinpack-2.51.tgz g++6 adaptions (gcc6.2 compile errors/warnings fixed, pgp-sign, Sep16)
 
-   spinpack-2.50d.tgz SIMD support (SSE2, AVX2), lots of bug fixes (Jan16+fixFeb16+fixMar16b+c+fixApr16d)
 
-   spinpack-2.49.tgz mostly bug fixes (Mar15) (updated Mar15,12, buggy bfly-bench, NN>32 32bit-compile-error.patch, see experimental version above)
 
-   spinpack-2.48.tgz test version (v2.48pre Feb14 new features, +tUfixMay14 +chkptFixDez14 +2ndrunFixJan15)
 
-   spinpack-2.47.tgz bug fixes (see doc/history.html, bug fixes of 2.45-2.46) (version 2014/02/14, 1 MB, gpg signature)
 
-   spinpack-2.44.tgz (see doc/history.html, known bugs) (version 2013/01/23 + fix May13,May14 2.44c, 1 MB, gpg signature)
 
-   spinpack-2.43.tgz +checkpointing (see doc/history.html) (version 2012/05/23, 1 MB, gpg signature)
 
-   spinpack-2.42.tgz ns.mpi-speed++ (see doc/history.html) (version 2012/05/07, 1 MB, gpg signature)
 
-   spinpack-2.41.tgz mpi-speed++, doc++ (see doc/history.html) (version 2011/10/24 + backport fix 2015-09-23, 1 MB, gpg signature)
 
-   spinpack-2.40.tgz bug fixes (see doc/history.html) (version 2009/11/26, 890 kB, gpg signature)
 
-   spinpack-2.39.tgz new option -m, new lattice (doc/history.html) (version 2009/04/20, 849 kB, gpg signature)
 
-   spinpack-2.38.tgz MPI fixes (doc/history.html) (version 2009/02/11, 849 kB, gpg signature)
 
-   spinpack-2.36.tgz MPI-tuned (doc/history.html) (version 2008/08/04, 802 kB, gpg signature)
 
-   spinpack-2.35.tgz IA64-tuned (doc/history.html) (version 2008/07/21, 796 kB, gpg signature)
 
-   spinpack-2.34.tgz bugs fixed for MPI (doc/history.html) (version 2008/04/23, 770 kB, gpg signature)
 
-   spinpack-2.33.tgz bugs fixed for MPI (doc/history.html) (version 2008/03/16, 620 kB, gpg signature)
 
-   spinpack-2.32.tgz bug fixed (doc/history.html) (version 2008/02/19, 544 kB, gpg signature)
 
-   spinpack-2.31.tgz MPI works and scales (version 2007/12/14, 544 kB, gpg signature)
 
-   spinpack-2.26.tgz code simplified and partly sped up, preparation for FPGA and MPI (version 07/02/27, gpg signature)
 
-   spinpack-2.15.tgz see doc/history.tex (updated 2003/01/20)
 
 
 Installation:
-  gunzip -c spinpack-xxx.tgz | tar -xf -  # xxx is the version number 
 
-  cd spinpack; ./configure --mpt 
 
-  make test  # to test the package and create exe path 
 
-  # edit src/config.h exe/daten.def for your needs (see models/*.c) 
 
-  make 
 
-  cd exe; ./spin 
 
 
Documentation:
The documentation is available in the doc path. Most parts of the documentation have been rewritten in English now.
If you still find parts written in German, or out-of-date documentation, send me an email with a short hint about where to find it, and I will rewrite that part as soon as I can.
Please see doc/history.html for the latest changes. You can find documentation about speed in the package, or an older version on this spinpack-speed-page.
Most Important Function:
The most important and most time-consuming function is b_smallest in hilbert.c. This function computes the representative of a set of symmetry-equivalent spin configurations (bit patterns) from any member of this set.
It also returns a phase factor and the orbit length. It would be great progress if the performance of that function could be improved. Ideas are welcome.
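To illustrate what b_smallest has to do (not how SpinPack actually does it), the sketch below finds the orbit representative of a spin configuration under a table of precomputed site permutations and derives the orbit length via the orbit-stabilizer theorem. The phase-factor bookkeeping is omitted, and NSITES, NSYM and sym_perm are illustrative names, not SpinPack identifiers.

    /* Illustrative sketch, not SpinPack code: pick the smallest bit pattern in
     * the symmetry orbit of a spin configuration. NSITES, NSYM and sym_perm[][]
     * are assumed; sym_perm[0] must be the identity permutation. */
    #include <stdint.h>

    #define NSITES 16
    #define NSYM    8

    extern const int sym_perm[NSYM][NSITES];    /* sym_perm[s][i] = image of site i */

    static uint64_t apply_sym(uint64_t cfg, int s)
    {
        uint64_t out = 0;
        for (int i = 0; i < NSITES; i++)
            if (cfg & (1ULL << i))
                out |= 1ULL << sym_perm[s][i];  /* move each up-spin to its image site */
        return out;
    }

    /* Returns the representative (smallest pattern) of cfg's orbit and stores
     * the orbit length, assuming the NSYM permutations form a group. */
    uint64_t orbit_representative(uint64_t cfg, int *orbit_len)
    {
        uint64_t rep = cfg;
        int stab = 0;                           /* size of the stabilizer of cfg */
        for (int s = 0; s < NSYM; s++) {
            uint64_t t = apply_sym(cfg, s);
            if (t == cfg) stab++;
            if (t < rep) rep = t;
        }
        *orbit_len = NSYM / stab;               /* orbit-stabilizer theorem */
        return rep;
    }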
One of my motivations to use FPGAs in 2009 was the FPGA/VHDL compiler. These Xilinx tools were so slow, badly scaling and buggy that code generation and debugging was really no fun, and a much better FPGA toolchain is needed for HPC; but all of that has since been fixed with updates.
In May 2015 I added a software Benes network to benefit from AVX2, but it looks like it is still not at the maximum available speed (hyper-threading shows nearly a factor of 2; does the bitmask fall out of the L1 cache?).
Please use these data for your work or verify my data. Questions and corrections are welcome. If you miss data or explanations here, please send me a note.
 
Frequently asked questions (FAQ):
 Q: I try to diagonalize a 4-spin system, but I do not get the full spectrum. Why?
 A: Spinpack is designed to handle big systems. Therefore it uses as many
    symmetries as it can. The very small 4-spin system has a very special
    symmetry which makes it equivalent to a 2-spin system built from two s=1 spins.
    Spinpack uses this symmetry automatically to give you the possibility
    to emulate s=1 (or s=3/2, etc.) spin systems by pairs of s=1/2 spins.
    If you want to switch this off, edit src/config.h and change
    CONFIG_S1SYM to CONFIG_NOS1SYM, for example as sketched below.
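A rough sketch of that edit (the macro names come from the FAQ answer above, but the surrounding layout of src/config.h may differ):

    /* src/config.h (sketch): disable the automatic s=1 pair symmetry */
    /* #define CONFIG_S1SYM */      /* default: emulate s=1 by pairs of s=1/2 spins */
    #define CONFIG_NOS1SYM          /* use this instead to switch the s=1 symmetry off */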
This picture shows a small sample of a possible Hilbert matrix. The non-zero elements are shown as black pixels (v2.33 Feb2008 kago36z14j2).
This picture shows a small sample of a possible Hilbert matrix. The non-zero elements are shown as black (J1) and gray (J2) pixels (v2.42 Nov2011 j1j2-chain N=18 Sz=0 k=0). The configuration space is sorted by J1-Ising-model energy to show structures of the matrix. Ising energy ranges are shown as slightly grayed areas.
Ground state energy scaling for finite-size spin-1/2 AFM chains N=4..40, using up to 300 GB of memory to store the N=39 sparse matrix and 245 CPU-hours (2011, src=lc.gpl).