DarkCoin FPGA Mining Co-op?

glamorgoblin

New Member
May 24, 2014
Hey,
So I've mined BTC, LTC, DOGE, etc. using FPGA HW in the past (I'm an FPGA/Verilog developer by trade). I've only worked on a small scale as a hobby though, and was/am not looking to productize anything. Just nerding out on cryptos. All the FPGA mining I've done so far has been SHA256 or Scrypt, but I'm curious if anyone is interested in cooperating on an X11 mining FPGA design. I'm quite comfortable with the Verilog code, a novice with the SW side, and a noob when it comes to the network side. Are there any other crypto-nerds out there that want to tinker?

I have FPGA HW that I would be willing to share to facilitate cooperative development. Ideally I would find a SW-God who could abstract the existing DarkCoin C code into some register-friendly pseudo-code and provide some dumps of nonces proceeding through each of the different stages and substages. I could do the target-specific Verilog coding and validate simulations against the dumps. I assume we could do something similar to what the open source SHA256 and Scrypt FPGA guys do for interfacing to stratum/getwork.

My current FPGA HW can support 100MH/s for SHA256 mining, and (assuming that none of the X11-specific hashes are significantly more difficult or require intensive memory like Scrypt) I would expect a similar hashrate for X11. If the 11 hashes can't fit into a single FPGA fully pipelined, though, the hashrate might be halved or quartered to make room for all the logic.

Just tossing it out there. What say ye?
 

Lzeppelin

Member
Feb 27, 2014
Sounds great! I'm a SW developer but unfortunately I'm not a code god. All this register and pipeline talk is giving me a flashback to my assembly lab and I'm breaking out in a cold sweat. :tongue:
 

glamorgoblin

New Member
May 24, 2014
LZ,

Hey, maybe it's a good match then, because every time I've tried to look through the C code I get flashbacks to my CE classes and break into a raging fit of apathy. Have you looked at the code over at https://github.com/ig0tik3d/darkcoin-cpuminer-1.2c? That's what I've been looking at until my eyes glaze over. I always get lost in code that has been written with "Good" coding style. If it were sloppy and hacky (the way I fumble through SW efforts) I'd probably be able to grasp it more easily. All the different levels of abstraction, data types, linking of a gazillion different file types ... sigh.

What REALLY helped when I did the Scrypt FPGA design was the document at https://tools.ietf.org/html/draft-josefsson-scrypt-kdf-01. That showed the entire algorithm in generic pseudo code and provided some example data streams of hashes progressing through the core. It was platform agnostic, simplified, and reasonably portable to Verilog. It seems something like that could be created for X11 by tinkering with the github code. Am I oversimplifying?
 

glamorgoblin

New Member
May 24, 2014
I've received a few PMs about this post, so I thought I'd give a status update here instead of replying individually.

During the evaluation of the candidates for SHA-3, evidently lots of universities developed generic FPGA code for each of the X11 hashes to benchmark their performance in real hardware. Since all 11 hashes in X11 were considered for the competition, they're all out there and can be found at https://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html. This is exactly what I was looking for, and I think I can get what I need from there. It would still be nice to have a dump of nonces/digests progressing through the DarkCoin X11 specific hash chain so I can double-check my work as I go. There are some implementation-specific details of the hashes that I still have to figure out.

If you understand what I'm asking for when I say "a dump of the nonces/digests progressing through the DarkCoin X11 specific hash chain" and are interested in a cooperative effort where we both potentially wind up with X11 FPGA miners in the end, please let me know. If the above statement doesn't make sense, though, the chances of a meaningful partnership aren't very good.
 

glamorgoblin

New Member
May 24, 2014
Well, my current mining rig is a sea of Altera EP2S90's. These are pretty old parts (Stratix II), and are really expensive compared to more modern FPGAs of similar (or even far superior) performance. So, I'll leave it to the reader to do the math related to cost if a more appropriate part were used. I have all the FPGA horsepower I want and will simply retarget my current HW from scrypt mining to X11 mining when/if a solution is found, so I haven't done much thinking about optimization of cost/power/speed. My HW architecture is already set in stone, so these characteristics will just be whatever they will be.
 

esuncloud

New Member
May 31, 2014
Any idea what FPGA might be large enough to support X11?
I just added up the slices required for each algorithm, and the sum is slightly larger than the total slice count of the Spartan-6 used in the popular miners.
However, since normally only about 80% of the slices can actually be used, it could be even worse, so I am wondering about combining the two Spartans on an Icarus board.
The good thing about reusing Icarus is that many of these boards are already available, but the bad thing is that I do not have an Icarus or Lancelot. It should be noted that many other FPGA miners do not have interconnect pins between their FPGAs.
 

atavacron

Member
Apr 27, 2014
Other than a dual Spartan 6 board what single FPGA would be large enough, maybe a Virtex 5 or 6? By the way, how many slices did you calculate that the X11 algo would need?
 

glamorgoblin

New Member
May 24, 2014
The FPGA resource utilization numbers from the universities are for flat implementations. What you can see from Bitcoin (SHA256) FPGA coding though is that there are techniques to trade off speed for gates in an FPGA. If any part of any hash algorithm repeats the same type of operation at multiple points, the duplicate logic can be eliminated if the hash rate is slowed down enough to allow sharing of a single instance of the common logic. I haven't looked far enough into the university code yet to see how much opportunity there is for this type of technique, but I'd be surprised if the design can't be maneuvered into a reasonably sized FPGA with enough work in this area. That's why I listed a 100MH/s rate with the possibility of halving or quartering. Halving will result from sharing common logic across 2 paths. Quartering will result from sharing common logic across 4 paths.
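A toy Python model of the speed-for-gates trade described above may help. It assumes a made-up 32-bit round function (not any real X11 round) and exactly one clock per pass through the hardware. Folding the 4 toy rounds onto fewer shared instances gives the same digest, but a new nonce can only enter every `fold` clocks. The 100 MHz clock is likewise an assumption, picked so that "fully pipelined" lines up with the 100MH/s figure mentioned earlier.

```python
# Toy model of sharing ("folding") identical logic.
# ASSUMPTIONS: round_unit is a made-up 32-bit step, NOT a real X11 round,
# and every pass through the shared hardware takes exactly one clock.
MASK32 = 0xFFFFFFFF
ROUND_KEYS = [0x243F6A88, 0x85A308D3, 0x13198A2E, 0x03707344]  # 4 toy rounds


def round_unit(x, k):
    """One hypothetical round: add a key, then rotate left by 7 bits."""
    x = (x + k) & MASK32
    return ((x << 7) | (x >> 25)) & MASK32


def hash_unrolled(x):
    """Unrolled: one physical copy per round, i.e. 4 instances of round_unit."""
    for k in ROUND_KEYS:
        x = round_unit(x, k)
    return x, len(ROUND_KEYS)  # (digest, instances of round_unit needed)


def hash_folded(x, fold):
    """Folded: len(ROUND_KEYS) // fold instances, each reused `fold` times."""
    if len(ROUND_KEYS) % fold:
        raise ValueError("fold must divide the number of rounds")
    instances = len(ROUND_KEYS) // fold
    for p in range(fold):  # the data loops back through the same instances
        for k in ROUND_KEYS[p * instances:(p + 1) * instances]:
            x = round_unit(x, k)
    return x, instances


def hashrate(clock_hz, fold):
    """A new nonce can only enter the shared logic once every `fold` clocks."""
    return clock_hz / fold


for fold in (1, 2, 4):
    digest, instances = hash_folded(0x12345678, fold)
    assert digest == hash_unrolled(0x12345678)[0]  # same answer, less logic
    print(fold, instances, hashrate(100e6, fold))
```

Sharing across 2 paths halves both the instance count and the hashrate. Sharing across 4 paths quarters both. The catch in real hardware is the control logic that steers data back through the shared instances, which this model ignores.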

Interconnected FPGAs would work, but it would be difficult to find a place in the hash chain to break up the design without drastically affecting the overall hashrate. The intermediate hashes are likely 1Kb each or larger, so even if the FPGA interconnect can somehow support 1Gb/s (rather unlikely), this still limits the overall hashrate to 1MH/s if the interconnect is fully saturated. We're still far better off using the common-logic technique described above than stretching X11 across board interconnects.
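Here is the interconnect arithmetic written out as a quick sketch. It assumes every hash sends exactly one intermediate value across the board-to-board link, and it treats 1Kb as 1024 bits. Every stage in the X11 chain outputs a 512-bit digest. If only a digest had to cross the link, the bound would double, but it would still sit roughly 50 times below 100MH/s.

```python
# Back-of-envelope bound for splitting the hash chain across two FPGAs.
# ASSUMPTION: every hash sends exactly one intermediate value across the link.
def max_hashrate(link_bits_per_s, bits_per_hash):
    """Hashes per second the link can carry when it is fully saturated."""
    return link_bits_per_s / bits_per_hash


print(max_hashrate(1e9, 1024))  # 1 Kb per hash: ~0.98 MH/s
print(max_hashrate(1e9, 512))   # one 512-bit stage digest per hash: ~1.95 MH/s
```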
 

atavacron

Member
Apr 27, 2014
Oh, OK. I think I've got it now. Thanks for the explanation. The more I study FPGAs, the more fascinating I find them.
 

esuncloud

New Member
May 31, 2014
One more thing: the unrolled pipeline is used to increase speed at the cost of resources, right?
We have the resource problem because we want to implement 11 algorithms in one chip, so either optimization is required or we just get a bigger FPGA.
 

flipme

New Member
Apr 27, 2014
I hope this becomes a serious approach. How about
- A crowd funded or shareholder financed open source project for an X11 mining box, under the DRK umbrella.
- Running on off-the-shelf, freely available FPGA and control hardware. I have close to no knowledge about FPGAs.
Which boards would suit best? Would this one work? http://www.opalkelly.com/products/xem7350/
Controlled by a Raspberry Pi?
What hash rates could be achieved?
 

glamorgoblin

New Member
May 24, 2014
The SHA256 FPGA designs (bitcoin) use the concepts of "unrolled" and "pipelined" logic as well. That's their terminology for design approaches that don't share any common logic anywhere, so no part of the process has to wait for shared logic to become available and the entire machine can run as fast as possible. Folks with smaller FPGAs can still mine, just at lower hash rates, while those with big FPGAs can implement the fully unrolled design and get the most bang for their buck. It looks like there are examples of this in the SHA-3 candidate code from the universities as well, although they might be calling it "folding" rather than "rolling".
 

atavacron

Member
Apr 27, 2014
That makes sense to me. If you don't have to wait for a logic block to become "not busy", then your speed will be much quicker.
So if there is unused space on the FPGA, one can place multiple pipelines for parallel processing? How do you determine how many logic blocks or pipelines can fit on a given FPGA? I would suppose it's more of an art form, because you can implement the same logic in multiple ways and the size could differ.
 

glamorgoblin

New Member
May 24, 2014
20
2
3
Sizing into an FPGA is REALLY tough. It requires intimate knowledge of the hash function to know what pieces can be shared without corrupting the results. Kramble's Scrypt code on github is a great example of shared logic where more than a dozen passes through the SHA-256 function are required according to the scrypt definition, but there only exists one SHA-256 logic block in the design that gets used over and over. There is a rather sophisticated control mechanism required to manage it all and keep everything straight, but as far as resource utilization is concerned it uses very few gates and very little power.

The "pipeline" concept isn't really a parallel logic approach as you've described. A "fully pipelined" design has organized the entire function into logical steps without any interdependence between steps. So even though it may take 4 steps (for example) to execute the function for any given nonce, at any given time there are 4 nonces in the "pipeline". When nonce1 completes step1, it moves on to step2 and nonce2 begins step1. This results in a machine where the number of required steps is irrelevant. The output is simply a constant stream of hashes since a new nonce is always just completing the last step in the process.
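To see why the number of steps stops mattering once the pipeline is full, here is a clock-by-clock Python sketch of the 4-step example above. It assumes every step takes exactly one clock and nothing ever stalls. It records the clock on which each nonce finishes the last step.

```python
from collections import deque


def run_pipeline(nonces, num_stages):
    """Clock-by-clock model of a fully pipelined core.

    ASSUMPTION: each step takes exactly one clock and never stalls.
    Every clock, a new nonce enters step 1 and everything else moves one step on.
    Returns (clock, nonce) pairs for the clock on which each nonce finishes the last step.
    """
    stages = [None] * num_stages  # stages[0] is step 1
    pending = deque(nonces)
    finished = []
    clock = 0
    while pending or any(s is not None for s in stages):
        clock += 1
        incoming = pending.popleft() if pending else None
        stages = [incoming] + stages[:-1]  # the nonce that finished last clock drops off the end
        if stages[-1] is not None:
            finished.append((clock, stages[-1]))
    return finished


print(run_pipeline([1, 2, 3, 4, 5, 6], num_stages=4))
# [(4, 1), (5, 2), (6, 3), (7, 4), (8, 5), (9, 6)]
```

After a 4-clock fill, one nonce finishes on every clock. With a deeper pipeline (say 11 stages), only the fill latency grows. The steady-state output rate stays at one hash per clock.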
 

atavacron

Member
Apr 27, 2014
45
16
48
Thanks glamorgoblin. That clears up one of the stumbling blocks I have been trying to wrap my head around.
 

glamorgoblin

New Member
May 24, 2014
20
2
3
Again sorry to the folks that PM'ed me and I haven't responded. I'm going to vomit everything relevant into a single post here instead of replying individually.

It sounds like there are a lot of HW/FPGA-centric folks that want to be involved. That's cool. The first thing we should be looking for is a true X11 specification. The web sleuthing I've done so far turned up nothing. I've found things called "specification", but they're just marketing fluff that doesn't help developers in the least. What would be ideal is a document that spells out explicitly the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc. If anyone knows of such a document, please chime in.

Barring the discovery of such a document though we can use the existing open source Darkcoin SW miner code to reverse engineer these details. This is how this thread started. I was hoping for someone with better SW skills than myself to help with the initial definition (reverse engineering) of the SW implemented hashchain so that we could duplicate it in HW.

So, maybe it would be best to make a "Statement of Work" to say exactly what we're after. If we find an interested SW developer we could show our appreciation by pooling altcoins, giving away FPGA mining HW, beer, weed, percentage of FPGA mined Darkcoins in the future, or some such thing. Here's how I would define the required SW task:

- Start mining Darkcoins using the open source code at https://github.com/elmad/darkcoin-cpuminer-1.3-avx-aes (or any other open source code that exposes the X11 algorithm within). This will require compiling the source code, creating a Darkcoin wallet to point to, and connecting to a mining pool. This requires no special HW and can be done on any PC. It will be slow on a standard PC, but speed at this stage is irrelevant. This stage is intended for simple data gathering, and working on a standard PC is probably optimal.
- Work through the source code and specify the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc.
- Modify the code to dump data streams of nonces moving through the hashchain. For a given input digest, show all intermediate hash values for a sequence of nonces. Verify the integrity of the modified X11 hashchain by verifying accepted pool shares. This dataset will be important as it can be used to verify the FPGA implementation at a low level. We can split the FPGA work into blocks and use this test data to verify a block at a time.
- Modify the open source getwork scripts currently used on other altcoins to interface to Darkcoin specific FPGA HW. This might be over a serial connection or through an Altera USB Blaster cable. This is likely not as arduous as it sounds because it won't need to be much different from what's out there for other altcoins and it can rely on a proxy_miner type program if necessary to simplify things.

If there exists such a SW developer with the skills to do the above, by all means ... reply to this post. State your demands!
 

Sbatto

New Member
Jun 2, 2014
Nice workflow summary, I will happily include myself in such a funds/beer/weed pooling!
 

esuncloud

New Member
May 31, 2014
Here are the hash variants you want:
blake512->bmw512->groestl512->skein512->jh512->keccak512->luffa512->cubehash512->shavite512->simd512->echo512
By the way, you FPGA guys should start preparing 64-bit-word hash code, because much of the open source code is only 32-bit.
 

Sbatto

New Member
Jun 2, 2014
Are you able to break it down for me a little more? Do you need to hash all 11 algos until you solve each? If so, do you submit 11 nonces?
 

fusecavator

Member
Jun 4, 2014
glamorgoblin said:
- Work through the source code and specify the format of the input digest to the hash chain, the exact order of the hashes, which variants of the hashes are used, etc.
- Modify the code to dump data streams of nonces moving through the hashchain. For a given input digest, show all intermediate hash values for a sequence of nonces. Verify the integrity of the modified X11 hashchain by verifying accepted pool shares. This dataset will be important as it can be used to verify the FPGA implementation at a low level. We can split the FPGA work into blocks and use this test data to verify a block at a time.
I worked off the darkcoin block verification code rather than that miner, as I figured a straightforward implementation would be easier to look at than a highly optimized one.

The data being hashed is the block header (80 bytes total). It is in the following format:
(4 bytes little endian) version
(32 bytes little endian) previous block hash
(32 bytes little endian) merkle root hash
(4 bytes little endian) time
(4 bytes little endian) bits (the compact encoding of the difficulty target)
(4 bytes little endian) nonce
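The layout above can be sketched in a few lines of Python with `struct`. This sketch assumes the two 32-byte hashes are passed in the byte-reversed order block explorers display them, so they are flipped back here. The test values are block 79695's from the x11dump example below. Printing `hdr.hex()` should reproduce the "Combined for hashing" line there.

```python
import struct


def pack_header(version, prev_hash_hex, merkle_root_hex, ntime, nbits, nonce):
    """Build the 80-byte header that is fed into blake512.

    ASSUMPTION: the two 32-byte hashes are given as block explorers display them,
    i.e. byte-reversed relative to how they sit in memory, so they are flipped here.
    """
    prev_hash = bytes.fromhex(prev_hash_hex)[::-1]
    merkle_root = bytes.fromhex(merkle_root_hex)[::-1]
    header = (struct.pack("<I", version) + prev_hash + merkle_root
              + struct.pack("<III", ntime, nbits, nonce))
    assert len(header) == 80  # 4 + 32 + 32 + 4 + 4 + 4
    return header


hdr = pack_header(
    2,
    "000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91",
    "9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828",
    1401847865, 453957317, 6893042,
)
print(hdr.hex())
```

The nonce is the last 4 bytes (offset 76), so a miner only has to rewrite that field per attempt.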

This chunk of data is fed into blake512. The blake512 hash (just the resulting hash; none of the header is used again) is hashed with bmw512. The same thing continues through the whole chain, with the hash resulting from the previous hashing function being hashed by the next algorithm. The chain is blake512 -> bmw512 -> groestl512 -> skein512 -> jh512 -> keccak512 -> luffa512 -> cubehash512 -> shavite512 -> simd512 -> echo512. Only the first 32 bytes (256 bits) of the echo512 hash are used; the 2nd half of the hash is just discarded.
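The chaining and dumping can be sketched as a small Python harness that logs each stage's input and output, the way the x11dump output below does. Python's `hashlib` has none of the 11 X11 functions. `hashlib.sha3_512` is FIPS 202 SHA-3, not the original Keccak used here. So `sha512` and 64-byte `blake2b` are alternated purely as labeled stand-ins to exercise the plumbing. Real implementations would have to be swapped in before the dump is usable as FPGA test vectors.

```python
import hashlib


def run_chain(header, stages):
    """Hash `header` through `stages` in order, logging every stage.

    Each stage hashes only the previous stage's output; the header is never reused.
    Returns (first 32 bytes of the final digest, list of (name, input_hex, output_hex)).
    """
    dump = []
    data = header
    for name, fn in stages:
        out = fn(data)
        dump.append((name, data.hex(), out.hex()))
        data = out
    return data[:32], dump


X11_ORDER = ["blake512", "bmw512", "groestl512", "skein512", "jh512", "keccak512",
             "luffa512", "cubehash512", "shavite512", "simd512", "echo512"]


# STAND-INS ONLY: these are NOT the X11 functions, just 512-bit stdlib hashes.
def _sha512(b):
    return hashlib.sha512(b).digest()


def _blake2b(b):
    return hashlib.blake2b(b, digest_size=64).digest()


STANDIN_STAGES = [(f"{name} [stand-in]", _sha512 if i % 2 == 0 else _blake2b)
                  for i, name in enumerate(X11_ORDER)]

result, dump = run_chain(bytes(80), STANDIN_STAGES)
for name, inp, out in dump:
    print(name, inp[:16], "->", out[:16])
```

Only the first stage sees the 80-byte header. Every later stage consumes a 64-byte digest, which fixes the input width of stages 2 through 11 in hardware.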

For dumping values for testing, I whipped up a program that takes the header values from the command line and dumps out each step of the way. Disclaimer: really shitty code, and a lot of stuff copied and pasted from the darkcoin source. https://mega.co.nz/#!fMRVBSIa!VH7eGk9iyg2mPdA-MdAGeTGs5nm8GsFnm7_VpnPgpl4 Pool share verification should be unnecessary, as one can plug in values from the blockchain and check that the result matches the block hash shown in the block explorer.

Example:
for block 79695
block explorer (click raw block to see the important values): http://chainz.cryptoid.info/drk/block.dws?79695.htm
program command (bits is taken as an integer, since I was getting lazy, in case you can't figure out where 453957317 came from):
Code:
x11dump.exe 2 000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91 9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828 1401847865 453957317 6893042
and the output is:
Code:
nVersion:       2
hashPrevBlock:  000000000003ef7c942336b52405cb8cba63848e74762f892de100bf645f7a91
hashMerkleRoot: 9013a9db46bd1872c1b95ee12add669d631d32853fdc80b1643189947ee19828
nTime:          1401847865
nBits:          453957317
nNonce:         6893042
Combined for hashing:
02000000917a5f64bf00e12d892f76748e8463ba8ccb0524b53623947cef0300000000002898e17e94893164b180dc3f85321d639d66dd2ae15eb9c17218bd46dba9139039808e53c5d60e1bf22d6900
Hash 1: blake512
input: 02000000917a5f64bf00e12d892f76748e8463ba8ccb0524b53623947cef0300000000002898e17e94893164b180dc3f85321d639d66dd2ae15eb9c17218bd46dba9139039808e53c5d60e1bf22d6900
output: a3d4ca17aefae732402b4a236d0ba5818fb9263cea3ab731d6e0e5ad4338906fd6035fa803931ecc27f66c11b2699e2d0f2da3a3e9cf93f064f6fed0c49ac031

Hash 2: bmw512
input: a3d4ca17aefae732402b4a236d0ba5818fb9263cea3ab731d6e0e5ad4338906fd6035fa803931ecc27f66c11b2699e2d0f2da3a3e9cf93f064f6fed0c49ac031
output: 89c3c3217f1ddda9307773b0f02b317966f2e881b0138417b35cbf74dd67bdec593e3eec98669c4ef05a2b0889179bab174cf16e19b57e64cc20ccd8b4e92a35

Hash 3: groestl512
input: 89c3c3217f1ddda9307773b0f02b317966f2e881b0138417b35cbf74dd67bdec593e3eec98669c4ef05a2b0889179bab174cf16e19b57e64cc20ccd8b4e92a35
output: c5753e3735813ceeb8d6cd566cf482f374ae13b7bd9cf4ad896ba53c726e52c2299bc21b60aa2b7d9dafb35d160031137d0451643f8b96cd2eedbbf7ede2c691

Hash 4: skein512
input: c5753e3735813ceeb8d6cd566cf482f374ae13b7bd9cf4ad896ba53c726e52c2299bc21b60aa2b7d9dafb35d160031137d0451643f8b96cd2eedbbf7ede2c691
output: 3374d75a22434b825e5fe49f0f9615d837b779d6beaef99e2ee18218732be69da97bc14c4373bffe791026684b5203a1cdf4cff3c129bd328e72db34f9f11fc1

Hash 5: jh512
input: 3374d75a22434b825e5fe49f0f9615d837b779d6beaef99e2ee18218732be69da97bc14c4373bffe791026684b5203a1cdf4cff3c129bd328e72db34f9f11fc1
output: 3600ae5de6b0cd7e67ea5f8ccc14b3bdd8794dc315d303aa8b2b2c5547d409b6175e096a8502f2b8072c7750428422b0b74a4e6640149583b89bed7f9bcbab86

Hash 6: keccak512
input: 3600ae5de6b0cd7e67ea5f8ccc14b3bdd8794dc315d303aa8b2b2c5547d409b6175e096a8502f2b8072c7750428422b0b74a4e6640149583b89bed7f9bcbab86
output: 8b292eac29e627290ef3e919373a8f191f5baf5da7e0f4402acdcb7cef37b9ec20c71569eb5b63c5ce2edec9fa5c7b1ebaa687fc6c28bdfbce8d77d23bec1ed7

Hash 7: luffa512
input: 8b292eac29e627290ef3e919373a8f191f5baf5da7e0f4402acdcb7cef37b9ec20c71569eb5b63c5ce2edec9fa5c7b1ebaa687fc6c28bdfbce8d77d23bec1ed7
output: f21851164bb075bc598e3a6587420b606e6906f183a9b94d713e393026a74fa58239adef113b4ce633b1fb2c106b2d713442a27653abfc2d7c738a134f4eedbf

Hash 8: cubehash512
input: f21851164bb075bc598e3a6587420b606e6906f183a9b94d713e393026a74fa58239adef113b4ce633b1fb2c106b2d713442a27653abfc2d7c738a134f4eedbf
output: ea7a9fcdcb5c4fe53ed239b1a468005ba3f4f4a4fd1a12752f6f71cccbda5d06601059d324104a28bc945a9cd2fc690db986e5caeb82676b1f021b593d8c459a

Hash 9: shavite512
input: ea7a9fcdcb5c4fe53ed239b1a468005ba3f4f4a4fd1a12752f6f71cccbda5d06601059d324104a28bc945a9cd2fc690db986e5caeb82676b1f021b593d8c459a
output: e3ec7fb3adc45af9b0ad7e02a55dc39477ccb2b5a15c1fa71fe2c3f499d9ef8037fdc75436c59cddcced300d640b348758b9ad3f941fc7316e997e3df9cb843e

Hash 10: simd512
input: e3ec7fb3adc45af9b0ad7e02a55dc39477ccb2b5a15c1fa71fe2c3f499d9ef8037fdc75436c59cddcced300d640b348758b9ad3f941fc7316e997e3df9cb843e
output: d46905bf6b915d5d88d35b5aee5e448eb658ad1ca9f5904b90fe3abe32355aa072e38b7e7e5721443b88beedf09d23af022adea932b16dbca64e201c8de7f1a6

Hash 11: echo512
input: d46905bf6b915d5d88d35b5aee5e448eb658ad1ca9f5904b90fe3abe32355aa072e38b7e7e5721443b88beedf09d23af022adea932b16dbca64e201c8de7f1a6
output: c347cb8077e2cb7ce01b99d56e91d916588761d510d8352f3c2f01000000000019a3e6d8c882a0be029f08f8c869ad2508ddf67cf19941b6337922ae14f485bb

trimmed:
Hash: 0000000000012f3c2f35d810d561875816d9916ed5991be07ccbe27780cb47c3
The ending hash might not look like the echo512 output at first, but that's due to endianness. All the input/output values for each hash are dumped one byte at a time, exactly in the order they sit in RAM, but where the 256-bit hashes are displayed, the bytes are flipped around.
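The trimming and byte flip can be checked with a short Python sketch that reproduces the displayed hash from the echo512 output above. It also compares the hash against the target encoded in nBits. This rests on an assumption: that Darkcoin keeps Bitcoin's compact nBits encoding, with the top byte as exponent and the low 23 bits as mantissa. The sign bit is ignored here.

```python
def display_hash(echo512_out_hex):
    """Keep the first 32 bytes of the echo512 output and byte-reverse them for display."""
    return bytes.fromhex(echo512_out_hex)[:32][::-1].hex()


def target_from_bits(nbits):
    """Expand compact nBits into a 256-bit target (Bitcoin-style encoding, sign bit ignored)."""
    exponent = nbits >> 24
    mantissa = nbits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


echo_out = ("c347cb8077e2cb7ce01b99d56e91d916588761d510d8352f3c2f010000000000"
            "19a3e6d8c882a0be029f08f8c869ad2508ddf67cf19941b6337922ae14f485bb")
shown = display_hash(echo_out)
print(shown)  # 0000000000012f3c2f35d810d561875816d9916ed5991be07ccbe27780cb47c3
print(int(shown, 16) <= target_from_bits(453957317))  # True
```

For hardware, this means the FPGA only needs the first 256 bits of echo512 to decide whether a nonce is worth reporting. The byte flip is purely a display convention.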
 

Sbatto

New Member
Jun 2, 2014
Fusecavator, thank you so much. Everything I needed. This means an FPGA could run all 11 hashes at the same time, with throughput limited by the slowest of the 11 algos. Awesome! Let's start with just getting 1 hash at a time going.
 

bodhi

New Member
Jun 4, 2014
Thanks much fusecavator =)

With that out of the way, I'm a software dev and total newb to FPGAs but I have a Virtex-5 LX50. Think I can fit X11 on it?

Also, not sure what all I can do to help with this effort but as it progresses I'd like to help.
 

glamorgoblin

New Member
May 24, 2014
bodhi,

Well, fusecavator did almost everything in the original statement of work. The only thing he didn't do is:

"Modify the open source getwork scripts currently used on other altcoins to interface to Darkcoin specific FPGA HW. This might be over a serial connection or through an Altera USB Blaster cable. This is likely not as arduous as it sounds because it won't need to be much different from what's out there for other altcoins and it can rely on a proxy_miner type program if necessary to simplify things."

If you're comfortable with Python and Tcl, maybe you'd be interested in looking into how to interface an FPGA miner to a mining pool? My hardware uses an Altera USB Blaster interface to communicate between the FPGAs and the PC host. Does anyone else on the forum use a different kind of interface we should consider supporting as well? I think the open source projects all support the Blaster interface, so that's probably the lowest common denominator. Other interfaces could be added after the fact.

Have a look at the scripts used for https://github.com/kramble/FPGA-Litecoin-Miner and https://github.com/progranism/Open-Source-FPGA-Bitcoin-Miner. They show how to interface the USB Blaster HW to a getwork pool server. We could simply modify these scripts for Darkcoin assuming the getwork interface is similar and then look for a pool with a getwork interface, or look for/create a stratum_proxy to use, or try to get fancy and add stratum support from the start.
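Whatever transport is used (serial or USB Blaster), the host-side glue mostly amounts to two steps. First, hand the FPGA everything except the nonce. Second, splice the returned nonce back into the 80-byte header laid out earlier in the thread, so the host can re-verify it before submitting. Below is a minimal Python sketch of that step. The framing (76 bytes out, 4-byte little-endian nonce back) and the function names are hypothetical. They are not the Icarus protocol or the format used by the kramble or progranism scripts. Talking to the pool (getwork or stratum) is left out.

```python
import struct

# HYPOTHETICAL host <-> FPGA framing, for illustration only:
#   host -> FPGA: the first 76 header bytes (everything except the nonce)
#   FPGA -> host: a 4-byte little-endian nonce the FPGA believes meets the target


def make_work_packet(header80):
    """Strip the nonce field (bytes 76..79) off an 80-byte header."""
    if len(header80) != 80:
        raise ValueError("block header must be 80 bytes")
    return header80[:76]


def rebuild_header(work_packet, nonce_reply):
    """Splice the returned nonce back in at byte offset 76 so the host can
    re-hash the full header in software before submitting the share."""
    if len(work_packet) != 76 or len(nonce_reply) != 4:
        raise ValueError("unexpected packet size")
    return work_packet + nonce_reply


reply = struct.pack("<I", 6893042)  # stands in for what the FPGA would send back
header = rebuild_header(make_work_packet(bytes(80)), reply)
print(len(header), reply.hex(), struct.unpack("<I", header[76:])[0])
```

Re-hashing on the host is cheap, and it catches corrupted replies before they become rejected shares.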

Does any of this sound interesting?