DarkCoin FPGA Mining Co-op?

glamorgoblin

New Member
May 24, 2014
20
2
3
I would imagine the counters are for keeping track of round iterations, but I haven't looked at Blake much yet. I've looked a little bit at the Skein hash and it has counters to keep track of rounds and to know when to mux in feedback data or new data. I've been caught up in end-of-school-year stuff lately and haven't done much other than peeking at Skein. There's a bit of a disconnect there because the University code for Skein (as well as the 10 other hashes I expect) does a good job of demonstrating the 256 bit implementation, but there aren't a lot of clues how to extrapolate to 512 for X11. There are also some "optional" implementation details for Skein that are unclear as to whether they exist in X11 or not. This is going to take a while.
 

fusecavator

Member
Jun 4, 2014
40
38
58
There are also some "optional" implementation details for Skein that are unclear as to whether they exist in X11 or not. This is going to take a while.
The darkcoin code directly uses sphlib ( http://www.saphir2.com/sphlib/ ) for its hashes, so the documentation can likely clear up those issues. The page isn't loading as I write this, but it was working for me not too long ago, so it's probably just temporary downtime. There actually is a warning about that on that page:
*************************************************************************
IMPORTANT NOTE: for users of the previous version (sphlib-2.1)
--------------------------------------------------------------
BLAKE, Groestl, JH, Keccak and Skein have been updated, to match the "tweaked" specifications published for the third round of the SHA-3 competition. Thus, these function now return distinct values from what they were producing previously. Also, for Skein with a 224-bit or 256-bit output, the size of the context structure has changed, so calling code must be recompiled as well.
*************************************************************************
I'm guessing darkcoin is using the updated version 3, but I'll compare the source later (I don't have sphlib-3 on this computer and can't download it while the site is down, but I've got it stored elsewhere).
 

Sbatto

New Member
Jun 2, 2014
11
0
1
I haven't looked into Skein enough yet; Blake, however, just widens every word from 32 bits (for 256) to 64 bits (for 512).
 

glamorgoblin

It looks like Skein's 512 (as well as all other Skein implementations) is based on repeated 64 bit adder entities. Going to 512 from 256 just doubles the number of adders. The only missing piece then is how to tie in the tweak calc (which remains the same size for all widths) to a wider round width. I'm getting there.
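
To make the "just doubles the number of adders" point concrete, here is a minimal C sketch of one layer of MIX operations, the add/rotate/XOR step at the core of Skein's Threefish cipher. The names `rotl64`, `mix`, and `mix_layer` are made up for illustration, the rotation amounts are supplied by the caller rather than taken from the spec tables, and the word permutation and key/tweak injection between rounds are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rotate a 64-bit word left by r bits (0 < r < 64). */
uint64_t rotl64(uint64_t x, unsigned r)
{
    assert(r > 0 && r < 64);
    return (x << r) | (x >> (64 - r));
}

/* One MIX: a 64-bit add, a rotate, and an XOR on a pair of words. */
void mix(uint64_t *x0, uint64_t *x1, unsigned r)
{
    *x0 += *x1;                   /* the 64-bit adder */
    *x1 = rotl64(*x1, r) ^ *x0;   /* rotate, then XOR with the sum */
}

/* One layer of MIX operations over a state of `words` 64-bit words.
 * words = 4 (256-bit state) uses 2 adders; words = 8 (512-bit) uses 4.
 * rot[] holds one caller-supplied rotation amount per word pair. */
void mix_layer(uint64_t *state, size_t words, const unsigned *rot)
{
    for (size_t i = 0; i < words / 2; i++)
        mix(&state[2 * i], &state[2 * i + 1], rot[i]);
}
```

In hardware terms each `mix` call would be one adder entity, so the same layer instantiated for 8 words instead of 4 gives twice as many adders side by side.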
 

glamorgoblin

atavacron, great find for the hash functions on github. I think the Java implementations will translate much more easily to Verilog than C. I'll spend some time digging through there. Why don't you, Sbatto, and fusecavator put a crypto coin address in your forum signatures so we can give you more than just likes for gems like this?
 

crowning

Well-known Member
May 29, 2014
1,414
1,997
183
Alpha Centauri Bc
Does anyone know what the counters t0 and t1 are in the blake algo? https://131002.net/blake/blake.pdf, Glamorgoblin, have you got any of the algos working? maybe we can work on different algos and combine?
typedef unsigned long sph_u64;
#define SPH_C64(x) ((sph_u64)(x ## UL))
T0 = SPH_C64(0xFFFFFFFFFFFFFC00)
T1 = 0xFFFFFFFFFFFFFFFF

So they are just constants for our purpose here.
Is that what you wanted to know?
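
For anyone wiring this up in hardware: in the BLAKE spec, t0 and t1 are the low and high 64-bit halves of a 128-bit counter of message bits hashed so far (padding excluded), and a block that holds only padding uses a counter of 0. The sphlib values quoted above look like -1024 as a 128-bit two's complement number, which, if I read the code right, becomes 0 once sphlib adds the 1024-bit block size before compressing. For a fixed-length mining input they are therefore constants. A minimal sketch, assuming the message is shorter than 2^64 bits so t1 stays 0; `blake512_t0` is a made-up helper name, not an sphlib function:

```c
#include <assert.h>
#include <stdint.h>

enum { BLAKE512_BLOCK_BITS = 1024 };

/* Value of t0 for the block at `block_index` (0-based) when hashing a
 * message of `msg_bits` bits. Assumes msg_bits < 2^64, so t1 is always 0. */
uint64_t blake512_t0(uint64_t msg_bits, uint64_t block_index)
{
    /* Guard against overflow in the multiplication and addition below. */
    assert(block_index <= UINT64_MAX / BLAKE512_BLOCK_BITS - 1);

    uint64_t start = block_index * BLAKE512_BLOCK_BITS;
    if (start >= msg_bits)
        return 0;                 /* block holds padding only */

    uint64_t end = start + BLAKE512_BLOCK_BITS;
    return (end < msg_bits) ? end : msg_bits;   /* message bits hashed so far */
}
```

For an 80-byte (640-bit) header, assuming that is what the first X11 stage hashes, the single block gets t0 = 640 and t1 = 0.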
 

glamorgoblin

So, I've poked through the X11 hashes enough now to get the feeling that it will take a LARGE FPGA to fit it all in. Even with a large FPGA it will probably take a fair amount of rolling or folding to squeeze everything down. That got me thinking about a "practical" FPGA board architecture for X11. If anyone is developing a custom board for X11 FPGA work consider this approach:

One FPGA sized to fit just two instances of the largest hash machine in the X11 hash chain. Use one of the more recent FPGAs that support dynamic reconfiguration. Attach wide and fast DDR3 or equivalent memory externally. Connect a small microcontroller to the configuration port of the FPGA. Partition the FPGA into two dynamically reconfigurable hash spaces (slots A and B). First the micro programs the A slot with the Blake hash machine and loads the initial header. The Blake machine runs a sequence of nonces, storing the intermediate hashes in the external RAM. It should be able to store 2K hashes in the external RAM. While the A-Blake machine is running, the processor programs the BMW machine into slot B. Once RAM is full, the B slot starts processing the hashes in memory, overwriting the Blake hashes in external memory with BMW hashes. While BMW is running, the processor reconfigures the A slot with the Groestl machine (where Blake used to be). After BMW is finished, the A slot overwrites the BMW hashes with Groestl hashes and B gets reconfigured with Skein. This continues until all the hashes have executed.
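
The ping-pong sequencing above can be sketched from the micro's point of view. This is only an illustration in C: the function names are made up, and the step-counting scheme is my own assumption. Since X11 has an odd number of stages, counting steps globally means the next batch's Blake lands in slot B, which keeps the alternation going across batches.

```c
#include <assert.h>
#include <stddef.h>

/* X11 chain order; each stage consumes the previous stage's output. */
const char *const X11_STAGES[] = {
    "blake", "bmw", "groestl", "skein", "jh", "keccak",
    "luffa", "cubehash", "shavite", "simd", "echo"
};
enum { X11_STAGE_COUNT = sizeof X11_STAGES / sizeof X11_STAGES[0] };

/* Hash machine that runs at global step `step` (steps keep counting across batches). */
const char *algo_for_step(unsigned long step)
{
    return X11_STAGES[step % X11_STAGE_COUNT];
}

/* Slot that executes at `step`; the other slot is being reprogrammed meanwhile. */
char slot_for_step(unsigned long step)
{
    return (step % 2 == 0) ? 'A' : 'B';
}

/* Image the micro should load into the idle slot while `step` is executing. */
const char *algo_to_preload(unsigned long step)
{
    assert(step + 1 > step);   /* no wrap-around of the step counter */
    return algo_for_step(step + 1);
}
```

Step 0 runs Blake in A while BMW is loaded into B, step 3 runs Skein in B, and step 11 starts the next batch with Blake in B.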

This approach requires approximately 1/6 of the FPGA gates of a full implementation. It would run at about 1/10 the speed of a full single-device implementation, but with the exponential price curve of FPGAs it could wind up at 1/20 or 1/50 of the cost. You could build multiple instances of this and still come out dollars ahead for an equivalent hash rate.

I'm going to target whatever X11 solution I get to my existing HW, but if anyone is developing custom X11 FPGA HW, please let me know. I'd be very interested in seeing how it goes.
 

ray

New Member
Jun 11, 2014
1
0
1
Hi there, saw you guys made some progress here! In the meantime, could anyone point me in some direction on implementing the Keccak SHA-3 algo on my Virtex-5? I'm totally new to FPGAs and just can't find where to start. Thanks a lot for any help!
 

atavacron

Member
Apr 27, 2014
45
16
48
Hi there, saw you guys made some progress here! In the meantime, could anyone point me in some direction on implementing the Keccak SHA-3 algo on my Virtex-5? I'm totally new to FPGAs and just can't find where to start. Thanks a lot for any help!
Hi Ray,

I'm in the same boat, trying to learn to program FPGAs. I ordered a Virtex-5 dev kit that should be here soon. If you find out how to do it please share. I'll do the same of course.
 

Sbatto

typedef unsigned long sph_u64;
#define SPH_C64(x) ((sph_u64)(x ## UL))
T0 = SPH_C64(0xFFFFFFFFFFFFFC00)
T1 = 0xFFFFFFFFFFFFFFFF

So they are just constants for our purpose here.
Is that what you wanted to know?
that's perfect, thanks!
I think you're right, glamorgoblin: my fully parallel Blake512 implementation took up the majority of my Cyclone IV. I don't quite understand your idea, though. Would the micro be there just to reprogram the FPGA between each algo?
 

glamorgoblin

I think you're right, glamorgoblin: my fully parallel Blake512 implementation took up the majority of my Cyclone IV. I don't quite understand your idea, though. Would the micro be there just to reprogram the FPGA between each algo?
Right, the micro would have its own flash memory with a bunch of partial FPGA images in it. It would have to supervise the routine: sequence through the hashes and then parse through the resulting 2K hashes to look for hits to submit. Some new FPGAs support partial, on-the-fly reprogramming. With those devices the micro could ping-pong images into the FPGA. While one is executing the other is programming ... then swap. The intermediate hash states are just stored in the external RAM and run through the new hash algo after reprogramming. I don't know of a simple eval board that could support this, though. It requires a processor, flash, an FPGA, and dedicated external DRAM for the FPGA. Not expensive, but also not something you find lying around.

Does your Blake512 implementation result in the same hash as that provided by fusecavator when given the same input? Are you using Verilog or VHDL? University code or your own? I'm almost done with Skein myself.
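
On the micro "parsing through the resulting hashes to look for hits": a minimal C sketch of that scan, under assumptions that are mine rather than anything settled in this thread. It assumes each final hash sits in RAM as a 64-byte record, that only the first 32 bytes are compared, and that both hash and target are treated as little-endian 256-bit integers in the usual Bitcoin-style way. `hash_meets_target` and `find_first_hit` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { HASH_BYTES = 64, TARGET_BYTES = 32 };

/* Returns 1 if the little-endian 256-bit value in hash[0..31] is <= target. */
int hash_meets_target(const uint8_t *hash, const uint8_t *target)
{
    assert(hash != NULL && target != NULL);
    for (int i = TARGET_BYTES - 1; i >= 0; i--) {   /* most significant byte first */
        if (hash[i] < target[i]) return 1;
        if (hash[i] > target[i]) return 0;
    }
    return 1;   /* equal to the target counts as a hit */
}

/* Scans `count` stored hash records; returns the index of the first hit,
 * or `count` if no record meets the target. */
size_t find_first_hit(const uint8_t *hashes, size_t count, const uint8_t *target)
{
    for (size_t i = 0; i < count; i++)
        if (hash_meets_target(hashes + i * HASH_BYTES, target))
            return i;
    return count;
}
```

If the records are stored in nonce order, the returned index added to the first nonce of the batch would give the nonce to submit.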
 

glamorgoblin

Ray and Atavacron,

There is a VHDL implementation of Keccak linked from the page at http://keccak.noekeon.org/. Look for the link on the right called "Hardware implementation in VHDL". This likely isn't the exact variant used in X11, but should be a great starting point for tweaking.
 

Sbatto

Right, the micro would have its own flash memory with a bunch of partial FPGA images in it. It would have to supervise the routine: sequence through the hashes and then parse through the resulting 2K hashes to look for hits to submit. Some new FPGAs support partial, on-the-fly reprogramming. With those devices the micro could ping-pong images into the FPGA. While one is executing the other is programming ... then swap. The intermediate hash states are just stored in the external RAM and run through the new hash algo after reprogramming. I don't know of a simple eval board that could support this, though. It requires a processor, flash, an FPGA, and dedicated external DRAM for the FPGA. Not expensive, but also not something you find lying around.

Does your Blake512 implementation result in the same hash as that provided by fusecavator when given the same input? Are you using Verilog or VHDL? University code or your own? I'm almost done with Skein myself.
How many cycles does it take to reprogram an FPGA?

I haven't got the padding going yet, but the hash is correct for a 1024-bit message. I'm going to try and get Groestl going so I can at least mine something in the meantime.

EDIT: I forgot! It's my own code, in Verilog. How about you?
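
For reference on the padding step: BLAKE-512 appends a 1 bit, then zeros, then another 1 bit, then the message length as a 128-bit big-endian bit count, so that the total is a multiple of 1024 bits. A minimal C sketch for the single-block case only, assuming a byte-aligned message of at most 111 bytes (an 80-byte block header, if that is what gets hashed, fits); `blake512_pad_single_block` is a made-up name, not an sphlib function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { BLAKE512_BLOCK_BYTES = 128, BLAKE512_MAX_SINGLE_BLOCK_MSG = 111 };

/* Builds the one padded 128-byte block for a byte-aligned message of at
 * most 111 bytes. Longer messages need more blocks and are not handled. */
void blake512_pad_single_block(const uint8_t *msg, size_t len, uint8_t *block)
{
    assert(len <= BLAKE512_MAX_SINGLE_BLOCK_MSG);
    uint64_t bits = (uint64_t)len * 8;

    memset(block, 0, BLAKE512_BLOCK_BYTES);
    memcpy(block, msg, len);
    block[len] |= 0x80;   /* first padding bit: a single 1 right after the message */
    block[111] |= 0x01;   /* closing 1 bit just before the length field */

    /* 128-bit big-endian bit length in bytes 112..127; the high 64 bits stay 0. */
    for (int i = 0; i < 8; i++)
        block[127 - i] = (uint8_t)(bits >> (8 * i));
}
```

For an 80-byte input this puts 0x80 at byte 80, 0x01 at byte 111, and 0x0280 (640 bits) in the last two bytes. A full 1024-bit message leaves no room in its own block, so its padding goes into a second block.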
 

glamorgoblin

How many cycles does it take to reprogram an FPGA?
It depends on the FPGA, but if there's an external DDR device you'll have quite a bit of time to work with. I did the math wrong by the way. 1Gb worth of external RAM can hold 2M hashes, not 2K. The micro would have as much time as it takes to fully address all of the external RAM to reprogram the offside slot. Hashes would finish in 2M blocks rather than at regular intervals, but the overall hash rate would average out at the pool.
I haven't got the padding going yet, but the hash is correct for a 1024-bit message. I'm going to try and get Groestl going so I can at least mine something in the meantime.
What can you mine with just groestl? I'm mining LTC with my FPGA rig in the meantime, but that's only marginally profitable. If there's something better I'll consider it too.
EDIT: I forgot! It's my own code, in Verilog. How about you?
Yes, Verilog. I hate VHDL, but that seems to be what all the universities like. Yuk. Such an inefficient language for digital logic. Sigh.
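
The corrected buffer figure above is easy to check, and the same numbers give the time the micro has to reprogram the idle slot. A small C sketch; the function names are made up, and the 10 MH/s rate in the note below is purely a placeholder, not a measured figure.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size hashes that fit in a RAM of the given size (both in bits). */
uint64_t hashes_that_fit(uint64_t ram_bits, uint64_t hash_bits)
{
    assert(hash_bits > 0);
    return ram_bits / hash_bits;
}

/* Seconds one slot spends working through a full buffer, which is the time
 * available to reprogram the other slot. */
double reconfig_budget_seconds(uint64_t hash_count, double hashes_per_second)
{
    assert(hashes_per_second > 0.0);
    return (double)hash_count / hashes_per_second;
}
```

With 1 Gb = 2^30 bits and 512-bit intermediate hashes, `hashes_that_fit` gives 2,097,152, i.e. the 2M figure. At a hypothetical 10 MH/s per stage, the reprogramming budget would be about 0.21 s per swap.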
 

hyphenx

New Member
Jun 12, 2014
1
0
1
It's been a while since I've coded, but I've got a Xilinx Kintex-7 FPGA KC705 Evaluation Kit that I could make available for testing.
 

Sbatto

Hi, wouldn't this board fit quite well?
http://www.dinigroup.com/new/DNBFC_S12_PCIe.php
I just inquired about prices, have no idea how much such gear costs.

This page has quite some code for the SHA-3 candidates
https://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html
I think you probably could fit it on that board, full parallel. Your only limitation would be propagation errors. Let us know what you get for the quote, I've got a feeling it's gonna be $8000. Thanks for the link!

It depends on the FPGA, but if there's an external DDR device you'll have quite a bit of time to work with. I did the math wrong by the way. 1Gb worth of external RAM can hold 2M hashes, not 2K. The micro would have as much time as it takes to fully address all of the external RAM to reprogram the offside slot. Hashes would finish in 2M blocks rather than at regular intervals, but the overall hash rate would average out at the pool.
I get it now: each slot would keep hashing (plus DDR interfacing) for as long as it takes to program the offline side. That's wicked, I'll have to give it a crack.
What can you mine with just groestl? I'm mining LTC with my FPGA rig in the meantime, but that's only marginally profitable. If there's something better I'll consider it too.
There's Diamond coin and groestlcoin. They each have dismal net hashrates and volume; however, it's probably the most profitable move for us ATM. I've got my last exam on Monday, so I'll smash out the Verilog for the hash on Tuesday. I imagine I will hit a wall at interfacing with the PC and, in turn, the network, especially running at 1H/cycle. Are you able to help out with that at all? I was thinking I would just use the bitcoin FPGA miner to do it.
 

glamorgoblin

I imagine I will hit a wall at interfacing with the PC and, in turn, the network, especially running at 1H/cycle. Are you able to help out with that at all? I was thinking I would just use the bitcoin FPGA miner to do it.
Sure, I can help. I've tinkered with the PC side scripts for BTC, LTC, and DOGE. I'm assuming though that you have a USB Blaster connected FPGA rig? That's what I'm most familiar with.
It looks like the Groestl mining pools support the older getwork protocols which is a plus too, since the scripts will port more easily. Once you get it all working cleanly with getwork, you can either call it a day or try to get one of the stratum proxies to work with it. I'm a little leery of the mining proxies. Seems like a perfect opportunity for someone to write a .exe that gives you 95% of your shares and just happens to attribute the other 5% to the author's account without telling you. Why mine when you can write a proxy script and skim off of hundreds of other miners? Of course there's much much worse that an .exe downloaded from God knows where could do as well.

PM me with details of GroestlCoin like header size/format. Also, if you're using the USB Blaster you'll need to insert Groestl specific probes/sources as virtual wires. Let me know the format of those in your Verilog and I'll see what I have that might match.
 

flipme

New Member
Apr 27, 2014
17
3
3
I think you probably could fit it on that board, full parallel. Your only limitation would be propagation errors. Let us know what you get for the quote, I've got a feeling it's gonna be $8000. Thanks for the link!
Thanks, they just replied without a quote and offered to talk about what's really needed.
How much memory would be required for each core?
As it has 13 FPGAs, would X13 fit on it also, if the main dispatcher runs an algo task aside?

I'd like to run a calculation for a complete mining machine, based on that board.
Another idea would be a combi-box: A miner with a masternode included. Plug and play.
 

Sbatto

Thanks, they just replied without a quote and offered to talk about what's really needed.
How much memory would be required for each core?
As it has 13 FPGAs, would X13 fit on it also, if the main dispatcher runs an algo task aside?

I'd like to run a calculation for a complete mining machine, based on that board.
Another idea would be a combi-box: A miner with a masternode included. Plug and play.
Not sure if you mean external or internal memory; I'll assume external. It depends on how you implement it. If it's fully combinatorial, you wouldn't need much memory at all (if any) apart from the controller chip. The main thing is that you would need a lot of logic elements per algo to do this: Blake took 80,000 for me and it isn't the largest algo.
If you did each algo pipelined, say computing x-amount of hashes at a time, then you would need enough memory to store x-amount of hashes.

If you can fit x11 on that board, I'd say the additional 2 algos would also fit.

That would be wicked if you could have a controller that just programs the FPGAs once they're powered up.
 

esuncloud

New Member
May 31, 2014
7
1
3
How about designing an ASIC for darkcoin at the same time? Are any experienced ASIC designers here interested?
 

esuncloud

We could do this with 0.11 um or even 0.18 um technology, and finish the design first.
The MPW fee could be affordable with a small IPO and pre-orders for a full-mask run later.
However, the risk is still high, because the Darkcoin team may change the mining algorithm anytime.
 

esuncloud

Sure, I can help. I've tinkered with the PC side scripts for BTC, LTC, and DOGE. I'm assuming though that you have a USB Blaster connected FPGA rig? That's what I'm most familiar with.
It looks like the Groestl mining pools support the older getwork protocols which is a plus too, since the scripts will port more easily. Once you get it all working cleanly with getwork, you can either call it a day or try to get one of the stratum proxies to work with it. I'm a little leery of the mining proxies. Seems like a perfect opportunity for someone to write a .exe that gives you 95% of your shares and just happens to attribute the other 5% to the author's account without telling you. Why mine when you can write a proxy script and skim off of hundreds of other miners? Of course there's much much worse that an .exe downloaded from God knows where could do as well.

PM me with details of GroestlCoin like header size/format. Also, if you're using the USB Blaster you'll need to insert Groestl specific probes/sources as virtual wires. Let me know the format of those in your Verilog and I'll see what I have that might match.
Any update on the GroestlCoin FPGA miner? It looks like a good starting point for X11. However, it should be noted that GroestlCoin will switch to PoS after 150000, in about a month.
I am still working on upgrading the following Groestl Verilog code to 512 bits:
https://www.rcis.aist.go.jp/files/special/SASEBO/SHA3-ja/Grostl.zip
Have you got working Groestl512 Verilog code integrated into the FPGA miner yet? Meanwhile, we may need another dump program for GroestlCoin, which uses double Groestl512.
Maybe fusecavator will be a better person who could do this for us?