CryptoURANUS Economics: XCV7T2000T/X690T/HTG700 Monero Bitstreams


Wednesday, October 14, 2020

XCV7T2000T/X690T/HTG700 Monero Bitstreams

Xilinx Virtex-7 V2000T-X690T-HTG700


Product Description:

Powered by the Xilinx Virtex-7 V2000T, V585T, or X690T, the HTG-700 is ideal for ASIC/SoC prototyping, high-performance computing, high-end image processing, PCI Express Gen 2 & 3 development, general-purpose FPGA development, and applications requiring high-speed serial transceivers (up to 12.5 Gbps).

 Key Features and Benefits:

  • Scalable via HTG-FMC-FPGA module (with one X980T FPGA) for higher FPGA gate density
  • x8 PCI Express Gen2 /Gen 3 edge connectors
  • x3 FPGA Mezzanine Connectors (FMC)
  • x4 SMA ports (16 SMAs providing 4 Txn/Txp/Rxn/Rxp) clocked by external pulse generators
  • DDR3 SODIMM with support for up to 8GB (shipped with a 1GB module)
  • USB to UART bridge
  • Configuration through JTAG or Micron G18 Flash


 What's Included:

  • Reference Designs
  • Schematic, User Manual, UCF
  • The HTG700 Board

HTG-700: Xilinx Virtex™-7 PCI Express Development Platform

Three High Pin Count (HPC) FMC connectors provide access to 480 single-ended I/Os and 24 high-speed serial transceivers of the on-board Virtex-7 FPGA. The availability of over 100 different off-the-shelf FMC modules extends the functionality of the board for a variety of applications.

Eight lanes of PCI Express Gen 2 are supported by hard-coded controllers inside the Virtex-7 FPGA. The board's layout, the performance of the Virtex-7 FPGA fabric, the high-speed serial transceivers (used for the PHY interface), a flexible on-board clock/jitter attenuator, and a soft PCI Express Gen 3 IP core allow the board to be used for PCI Express Gen 3 applications.

The HTG-700 Virtex-7 FPGA board can be used either in PCI Express mode (plugged into a host PC/server) or in stand-alone mode (powered by an external ATX or wall power supply).

Xilinx Virtex-7 V2000T, 585T, or X690T FPGA
Scalable via the HTG-777 FPGA module for higher FPGA gate density
x8 PCI Express Gen 2/Gen 3 edge connector with jitter cleaner chip
  - Gen 3: with the -690 option
  - Gen 2: with the -585 or -2000 option (Gen3 requires soft IP core)
x3 FPGA Mezzanine Connectors (FMC)
  - FMC #1: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers
  - FMC #2: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers
  - FMC #3: 80 LVDS (160 single-ended) I/Os and 8 GTX (12.5 Gbps) serial transceivers. The physical location of this connector gives plug-in FMC daughter cards easy access through the front panel.
x4 SMA ports (16 SMAs providing 4 Txn/Txp/Rxn/Rxp) clocked by external pulse generators
DDR3 SODIMM with support for up to 8GB (shipped with a 2GB module)
Programmable oscillators (Silicon Labs Si570) for different interfaces
Configuration through JTAG or Micron G18 Embedded Flash
USB to UART bridge
ATX and DC power supplies for PCI Express and Stand Alone operations
LEDs & Pushbuttons
Size: 9.5" x 4.25"
Kit Content:
- HTG-700 board
- PCI Express Drivers (evaluation) for Windows & Linux
- Reference Designs/Demos:
  - PCI Express Gen3 PIO
  - 10G & 40G Ethernet (available only if interested in licensing the IP cores)
  - DDR3 Memory Controller
- User Manual
- Schematics (in searchable .pdf format)
- User Constraint File (UCF)

Ordering Information (Part Numbers):
- HTG-V7-PCIE-2000-2 (populated with V2000T-2 FPGA)
- HTG-V7-PCIE-690-2 (populated with X690T-2 FPGA)
- HTG-V7-PCIE-690-3 (populated with X690T-3 FPGA)
- HTG-V7-PCIE-585-2 (populated with V585T-2 FPGA)

FPGA-(CVP-13, XUPVV4): Hardware Modifications

Currently (as of 8 October 2018), the BittWare cards (CVP-13, XUPVV4) do not require any modifications and will run at full speed out of the box.

If you have a VCU1525 or BCU1525, you should acquire a DC1613A USB dongle to change the core voltage.

This dongle requires modifications to ‘fit’ into the connector on the VCU1525 or BCU1525.

You can make the modifications yourself as described here, or you can purchase a fully modified DC1613A.

If you have an Avnet AES-KU040 and you are brave enough to make the complex modifications to run at full hash rate, you can download the modification guide right here (it will be online in a few days).

You can see a video of the modded card on YouTube here.

If you have a VCU1525 or BCU1525, we recommend using the TUL Water Block (this water block was designed by TUL, the company that designed the VCU/BCU cards).

The water block can be purchased from

WARNING:  Installation of the water block requires a full disassembly of the FPGA card which may void your warranty.

Maximum hash rate (even beyond water-cooling) is achieved by immersion cooling, immersing the card in a non-conductive fluid.

Engineering Fluids makes BC-888 and EC-100 fluids which are non-boiling and easy to use at home. You can buy them here.

If you have a stock VCU1525, there is a danger of the power regulators failing from overheating, even if the FPGA is very cool.

 We recommend a simple modification to cool the power regulators by more than 10C.

The modification is very simple. First, cut a piece of thermal tape, apply it to the back side of the Slim X3 CPU cooler, and plug the fan into the fan controller.

Then stick the CPU cooler onto the back plate of the VCU1525, over the area covering the power regulators.

Make sure to connect the fan controller to the power supply and run the fan on maximum speed.

This modification will cool the regulators on the back side of the VCU1525, dropping their temperature by more than 10C and extending the life of your hardware.

This modification is not needed on ‘newer’ versions of the hardware such as the XBB1525 or BCU1525.


Grab Bag of FPGA and GPU Software Tools from Intel, Xilinx & NVIDIA

FPGAs as Accelerators:
  • From the Intel® FPGA SDK for OpenCL™ Product Brief available at link.
  • "The FPGA is designed to create custom hardware, with each instruction being accelerated, providing more efficient use of the hardware than the CPU or GPU architecture would allow."

  • With Intel, developers can utilize an x86 with a built-in FPGA or connect a card with an Intel or Xilinx FPGA to an x86. This Host + FPGA Acceleration would typically be used in a "server."
  • With Intel and Xilinx, developers can also get a chip with an ARM core + FPGA. This FPGA + ARM SoC Acceleration is typically used in embedded systems.
  • Developers can also connect a GPU card from Nvidia to an x86 host. Developers can also get an integrated GPU from Intel. Nvidia also provides chips with ARM cores + GPUs.

  • Intel and Xilinx provide tools to help developers accelerate x86 code execution using an FPGA attached to the x86. They also provide tools to accelerate ARM code execution using an FPGA attached to the ARM.
  • Intel, Xilinx and Nvidia all provide OpenCL libraries to access their hardware. These libraries cannot interoperate with one another. Intel also provides libraries to support OpenMP, and Nvidia provides CUDA for programming their GPUs. Xilinx ships its OpenCL library in two SDKs: SDAccel and SDSoC. SDAccel is used for x86 + Xilinx FPGA systems, i.e. servers; SDSoC is used for Xilinx chips with ARM + FPGAs, i.e. embedded systems.

  • To help developers build computer vision applications, Xilinx provides OpenVX, Caffe, OpenCV, and various DNN and CNN libraries in an SDK called reVISION, for software running on chips with an ARM + FPGA.
  • All of these libraries and many more are available for x86 systems.
  • Xilinx also provides function-accelerator libraries for neural-network inference, HEVC decoding and encoding, and SQL data movement.

Tools for FPGA + ARM SoC Acceleration from Intel:
  • From link, developers can work with ARM SoCs from Intel using:
  • ARM DS-5 for debug
  • SoC FPGA Embedded Development Suite for embedded software development tools
  • Intel® Quartus® Prime Software for working with the programmable logic
  • Virtual Platform for simulating the ARM
  • SoC Linux for running Linux on the FPGA + ARM SoC
Higher Level:
  • Intel® FPGA SDK for OpenCL™ is available for programming the ARM + FPGA chips using OpenCL.

  • Developers can work with ARM SoCs from Xilinx using:
  • An SDK for application development and debug
  • PetaLinux Tools for Linux development and ARM simulation
  • Vivado for working with the programmable logic (PL) of its FPGA + ARM SoC chips

Higher Level:
  • Xilinx provides SDSoC for accelerating ARM applications on the built-in FPGA. Users can program in C and/or C++ and SDSoC will automatically partition the algorithm between the ARM core and the FPGA. Developers can also program using OpenCL and SDSoC will link in an embedded OpenCL library and build the resulting ARM+FPGA system. SDSoC also supports debugging and profiling.

Domain Specific:
Xilinx leverages SDSoC to create an embedded vision stack called reVISION.




Joerg Schulenburg, Uni-Magdeburg, 2008-2016

What is SpinPack?

SPINPACK is a large program package for computing the lowest eigenvalues and eigenstates, as well as various expectation values (spin correlations etc.), of quantum spin systems.

These model systems can for example describe magnetic properties of insulators at very low temperatures (T=0) where the magnetic moments of the particles form entangled quantum states.

The package generates the symmetrized configuration vector and the sparse matrix representing the quantum interactions, computes its eigenvectors, and finally evaluates some expectation values for the system.

The first SPINPACK version was based on Nishimori's TITPACK (Lanczos method, no symmetries), but it was soon converted to C/C++ and completely rewritten (1994/1995).

Other diagonalization algorithms are implemented too (Lanczos, 2x2 diagonalization, and LAPACK/BLAS for smaller systems). It can handle Heisenberg, t-J, and Hubbard systems up to 64 sites or more using special compiler and CPU features (usually up to 128), or even more sites in a slower emulation mode (C++ required).

For instance, we obtained the lowest eigenstates of the Heisenberg Hamiltonian on a 40-site square lattice on our machines in 2002. Note that the resources needed for the computation grow exponentially with the system size.

The package is written mainly in C so that it runs on all Unix systems. C++ is only needed for complex eigenvectors and twisted boundary conditions when C has no complex extension. This makes the package very portable.

Parallelization can be done using the MPI and PTHREAD libraries. Mixed (hybrid) mode is possible, but not always faster than pure MPI (2015). v2.60 has a slight hybrid-mode advantage on CPUs supporting hyper-threading.

This will hopefully be improved further. MPI scaling has been tested up to 6000 cores, and PTHREAD scaling up to 510 cores, though the latter requires careful tuning (scaling tests 2008-2016).

The program can use all topological symmetries, S(z) symmetry, and spin inversion to reduce the matrix size. This reduces the needed computing resources by a linear factor.

Since 2015/2016, CPU vector extensions (SIMD: SSE2, AVX2) are supported for better performance when doing symmetry operations on bit representations of the quantum spins.

The results are very reliable because the package has been used in scientific work since 1995. A low-latency, high-bandwidth network and low-latency memory are needed for best performance on large-scale clusters.


  • Groundstate of the S=1/2 Heisenberg AFM on a N=42 kagome biggest sub-matrix computed (Sz=1 k=Pi/7 size=36.7e9, nnz=41.59, v2.56 cplx8, using partly non-blocking hybrid code on supermuc.phase1 10400cores(650 nodes, 2 tasks/node, 8cores/task, 2hyperthreads/core, 4h), matrix_storage=0.964e6nz/s/core SpMV=6.58e6nz/s/core Feb2017)
  • Groundstate of the S=1/2 Heisenberg AFM on a N=42 linear chain computed (E0/Nw=-0.22180752, Hsize = 3.2e9, v2.38, Jan2009) using 900 Nodes of a SiCortex SC5832 700MHz 4GB RAM/Node (320min).
    Update: N=41 Hsize = 6.6e9, E0/Nw=-0.22107343 16*(16cores+256GB+IB)*32h matrix stored, v2.41 Oct2011).
  • Groundstate of the S=1/2 Heisenberg AFM on a N=42 square lattice computed (E0 = -28.43433834, Hsize = 1602437797, ((7,3),(0,6)), v2.34, Apr2008) using 23 nodes of 2*DualOpteron-2.2GHz with 4GB RAM each via 1Gb-eth (92 cores, usage=80%, ca. 60GB RAM, 80MB/s BW, 250h/100It).
  • Program is ready for cluster (MPI and Pthread can be used at the same time, see the performance graphic) and can again use memory as storage media for performance measurement (Dec07).
  • Groundstate of the S=1/2 Heisenberg AFM on a N=40 square lattice computed (E0 = -27.09485025, Hsize = 430909650, v1.9.3, Jan2002).
  • Groundstate of the S=1/2 J1-J2-Heisenberg AFM on a N=40 square lattice J2=0.5, zero-momentum space: E0= -19.96304839, Hsize = 430909650 (15GB memory, 185GB disk, v2.23, 60 iterations, 210h, Altix-330 IA64-1.5GHz, 2 CPUs, GCC-3.3, Jan06)
  • Groundstate of the S=1/2 Heisenberg AFM on a N=39 triangular lattice computed (E0 = -21.7060606, Hsize = 589088346, v2.19, Jan2004).
  • Largest complex Matrix: Hsize=1.2e9 (26GB memory, 288GB disk, v2.19 Jul2003), 90 iterations: 374h alpha-1GHz (with limited disk data rate, 4 CPUs, til4_36)
  • Largest real Matrix: Hsize=1.3e9 (18GB memory, 259GB disk, v2.21 Apr2004), 90 iterations: real=40h cpu=127h sys=9% alpha-1.15GHz (8 CPUs, til9_42z7) 


    Verify download using: gpg --verify spinpack-2.55.tgz.asc spinpack-2.55.tgz



    • gunzip -c spinpack-xxx.tgz | tar -xf - # xxx is the version number
    • cd spinpack; ./configure --mpt
    • make test # to test the package and create exe path
    • # edit src/config.h exe/daten.def for your needs (see models/*.c)
    • make
    • cd exe; ./spin



    The documentation is available in the doc path. Most parts of the documentation have now been rewritten in English.

    If you still find parts written in German, or out-of-date documentation, send me an email with a short hint about where to find that part, and I will rewrite it as soon as I can.

    Please see doc/history.html for the latest changes. You can find documentation about speed in the package, or an older version on this spinpack-speed-page.

    Most Important Function:

    The most time-consuming function is b_smallest in hilbert.c. It computes the representative of a set of symmetry-equivalent spin configurations (bit patterns) from any member of that set.

    It also returns a phase factor and the orbit length. It would be great progress if the performance of this function could be improved. Ideas are welcome.

    One of my motivations for using FPGAs in 2009 was the FPGA/VHDL compiler. These Xilinx tools were so slow, badly scaling, and buggy that code generation and debugging was really no fun, and a much better FPGA toolchain was needed for HPC; all of that has since been fixed by updates.

    In 2015-05 I added a software Benes network to benefit from AVX2, but it looks like this is still not the maximum available speed (hyper-threading shows nearly a factor of 2; does the bitmask fall out of the L1 cache?).

    Examples for open access

    Please use these data for your own work or to verify my data. Questions and corrections are welcome. If you find data or explanations missing here, please send me a note.


    Frequently asked questions (FAQ):

     Q: I try to diagonalize a 4-spin system, but I do not get the full spectrum. Why?
     A: Spinpack is designed to handle big systems. Therefore it uses as many
        symmetries as it can. The very small 4-spin system has a very special
        symmetry which makes it equivalent to a 2-spin system built from two
        s=1 spins. Spinpack uses this symmetry automatically to give you the
        possibility of emulating s=1 (or s=3/2, etc.) spin systems by pairs
        of s=1/2 spins. If you want to switch this off, edit src/config.h and change

    This picture shows a small sample of a possible Hilbert matrix. The non-zero elements are shown as black pixels (v2.33, Feb 2008, kago36z14j2).

    Hilbert matrix N=36 s=1/2 kago lattice

    This picture shows a small sample of a possible Hilbert matrix. The non-zero elements are shown as black (J1) and gray (J2) pixels (v2.42, Nov 2011, j1j2-chain N=18 Sz=0 k=0). The configuration space is sorted by the J1 Ising-model energy to reveal the structure of the matrix. Ising energy ranges are shown as slightly grayed areas.

    Hilbert matrix for N=18 s=1/2 quantum chain

    Ground-state energy scaling for finite-size spin-1/2 AFM chains, N=4..40, using up to 300GB of memory to store the N=39 sparse matrix and 245 CPU-hours (2011, src=lc.gpl).
    ground state s=1/2-AFM-LC

    Author: Joerg Schulenburg, Uni-Magdeburg, 2008-2016
