DarkCoin FPGA Mining Co-op?

jonpry

New Member
Jun 17, 2014
I have one of those SoC FPGA devices, which at first seems like it could be a fit for your plan to reprogram the FPGA for every hash. However, I think you need to redo some of the math here. The board I have has a 32-bit 400/800 DDR interface, which comes out to 25.6 Gbit/s. Since the result of each hash is 512 bits, this gives you roughly 50M hashes into and out of memory. Each hash algorithm has to read the previous hash and then write its output, halving the bandwidth, so 25M raw hashes. Divided by 11 algorithms, this gives a best-case performance of 2.3 MH/s, aka quite a bit less than a 750 Ti. This also puts into question the idea of fully unrolling the hashes, since 25 MH/s per algorithm should only require 2-4 unrolls. Obviously there is a lot to be gained by fitting multiple hashes into the chip at one time and avoiding the memory.
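
A quick sanity check of that arithmetic as a small C program, using only the figures above (32-bit bus, 800 MT/s, 512-bit hash state, 11 chained algorithms):

    /* Sanity check of the memory-bandwidth math: 32-bit DDR bus at
     * 800 MT/s, 512-bit hash state, read + write per algorithm,
     * 11 chained algorithms (X11). */
    #include <stdio.h>

    int main(void) {
        const double bus_bits   = 32.0;    /* DDR data-bus width */
        const double transfers  = 800e6;   /* 400 MHz DDR -> 800 MT/s */
        const double state_bits = 512.0;   /* size of each hash result */
        const double num_algos  = 11.0;    /* X11 chain length */

        double bandwidth = bus_bits * transfers;   /* 25.6 Gbit/s */
        double one_way   = bandwidth / state_bits; /* ~50M states/s */
        double raw       = one_way / 2.0;          /* read + write halves it */
        double chain     = raw / num_algos;        /* best case ~2.3 MH/s */

        printf("bandwidth: %.1f Gbit/s\n", bandwidth / 1e9);
        printf("one-way:   %.0f M states/s\n", one_way / 1e6);
        printf("raw:       %.0f M states/s\n", raw / 1e6);
        printf("per chain: %.2f MH/s\n", chain / 1e6);
        return 0;
    }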

I have only looked into Blake and BMW so far. They are both wide Merkle-Damgard hash constructions. If all the other hashes are similar, it should be possible to make some kind of construction where the inner loops of the hash use a type of automaton, so that resources like these huge numbers of 64-bit adders can be reused for different algorithms. In such a case maybe all the hashes could fit in a single "generic hash" module. If this module could run at more than 100 MHz, then total performance could be as high as 10 MH/s, which imho is a lot better than what you get going through memory.
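
To sketch the resource-sharing idea in software terms (the opcode set, encodings and register indices below are made up purely for illustration, not taken from any real design):

    /* Made-up micro-op format for the "generic hash" idea: each
     * algorithm becomes a fixed micro-program over one shared register
     * file, so a single set of 64-bit adders/rotators serves them all. */
    #include <stdint.h>

    typedef enum { OP_ADD, OP_XOR, OP_ROTL } opcode_t;

    typedef struct {
        opcode_t op;
        uint8_t  dst, a, b;  /* register-file indices */
        uint8_t  imm;        /* rotate amount, 0 < imm < 64 */
    } uop_t;

    /* One datapath step; in hardware this would be a pipeline stage
     * reading and writing a block-RAM-backed register file. */
    static void step(uint64_t r[16], const uop_t *u) {
        switch (u->op) {
        case OP_ADD:  r[u->dst] = r[u->a] + r[u->b]; break;
        case OP_XOR:  r[u->dst] = r[u->a] ^ r[u->b]; break;
        case OP_ROTL: r[u->dst] = (r[u->a] << u->imm) |
                                  (r[u->a] >> (64 - u->imm)); break;
        }
    }

    /* Fragment resembling the start of a BLAKE-512 G step (a 32-bit
     * rotation is the same left or right on a 64-bit lane): */
    static const uop_t g_fragment[] = {
        { OP_ADD,  0, 0, 1, 0  },  /* a = a + b      */
        { OP_XOR,  3, 3, 0, 0  },  /* d = d ^ a      */
        { OP_ROTL, 3, 3, 0, 32 },  /* d = rot(d, 32) */
    };

    int main(void) {
        uint64_t r[16] = { 1, 2, 3, 4 };
        for (unsigned i = 0; i < sizeof g_fragment / sizeof g_fragment[0]; i++)
            step(r, &g_fragment[i]);
        return 0;
    }

In hardware the micro-program would sit in fixed ROM and the register file would map onto block RAM, which is where the savings over a fully unrolled implementation would come from.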
 

alnoor1231

New Member
Jun 16, 2014
jonpry said: "... If this module could run at more than 100 MHz, then total performance could be as high as 10 MH/s ..."
So you're saying that it's possible to get a whole 10 MH/s out of my old Spartan?
 

jonpry

New Member
Jun 17, 2014
My chip is a Cyclone V with 45k ALMs, 166k registers and 5570 300 MHz block RAMs. It still isn't clear that my design is small enough, or that you could fit enough of them inside. You would need a very large Spartan-6 to even think about it. Essentially, what I am proposing is a type of CPU. This processor will have a fixed program memory, no branching capabilities and, I think, no memory access; just operations on a hybrid memory/register file. Instructions will be optimized for hashing, so for example on Keccak we could have an instruction that calculates parity on 5 register locations in parallel.
Having a lot of logic sitting around that is not used every cycle can end up creating a lot of waste, so where there is enough space it might be best to have a different CPU for each algorithm. In that case it starts to sound a lot like the usual implementation of a hash. However, I think it is not: none of these FPGA hash implementations make use of features like block RAM, so they require much more register usage to achieve the same amount of pipelining.
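
As a concrete example of such an instruction, this is the column parity that Keccak's theta step computes, written as a plain C reference model; the five XORed locations are exactly the five register reads the instruction would do in parallel:

    /* Reference model of the proposed parity instruction: Keccak's
     * theta step starts by XOR-ing the five lanes of each column,
     * C[x] = A[x,0] ^ A[x,1] ^ A[x,2] ^ A[x,3] ^ A[x,4]. */
    #include <stdint.h>

    /* Keccak state: 5x5 lanes of 64 bits, stored as A[x + 5*y]. */
    void theta_parity(const uint64_t A[25], uint64_t C[5]) {
        for (int x = 0; x < 5; x++)
            C[x] = A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20];
    }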
 

sharnalk

New Member
Jun 18, 2014
Hi there,
I'm new here as you can see, and am very interested in the development of an X11 FPGA, but since I'm only a software dev, I can't really help in my "state". Where do you think I should start to get into it? Are there any specifications/documentation about X11? Mining software?
 

alnoor1231

New Member
Jun 16, 2014
From what I understood, it's possible to reprogram the Spartan 6 to hash X11, but it's not worth it? Sigh... Well, if anyone is in need of an old FPGA, hit me up. Otherwise, I'm putting my old miner back into storage to collect dust, lol.
 

mattmct

Member
Mar 13, 2014
I just stumbled onto this thread. Very interesting indeed; I'll be watching closely. Most of it is way over my head, but I find it very interesting.
 

guppysb

New Member
Jul 17, 2014
I've got a Nexys 2 board with a Spartan-3E 500.
It has been a while since I did any Xilinx work, like 3 years ago; mostly OOP software dev now.
 

fusecavator

Member
Jun 4, 2014
Sooo, anyone have any progress on an X11 FPGA??
It was determined that the SHA-3 candidates are too complicated for affordable FPGAs. Even single-hash PoW functions using SHA-3 candidates can't be sufficiently unrolled to produce high enough hash rates. Implementing all 11 used in X11 would require a massive FPGA, and the results would be poor. IIRC, testing showed that Skein running on a Stratix board only managed to produce about 10% of the hashrate of a Radeon 7950, so there really isn't any profit to be made unless you're already sitting on a large FPGA farm of very expensive boards.
 

vertoe

Three of Nine
Mar 28, 2014
fusecavator said: "It was determined that the SHA-3 candidates are too complicated for affordable FPGAs. ..."
Mmmmh... I've talked to an FPGA dev who told me the opposite. You can solve each of the 11 algos in a separate chip/core and parallelize the work. But I've no idea if that's worth the try if you only want FPGAs. I think he was working on ASIC designs.
 

aTg

New Member
Jul 29, 2014
Isn't it possible to interconnect the four Spartan-6 XC6SLX150 FPGAs on the board designed by ZTEX?

 

fusecavator

Member
Jun 4, 2014
vertoe said: "You can solve each of the 11 algos in a separate chip/core and parallelize the work. ..."
Space is at a premium on FPGAs. Unlike GPUs, which process the same instruction on many cores at a time, each part of the FPGA only operates on one thing at a time, essentially having a stream of data going through the logic. To get good performance, code has to be unrolled when possible, since otherwise one part will hold up the next piece of data while it runs the same section multiple times. The 11 algos would be separate sections and would run in parallel like you were told; the problem is fitting them unrolled enough to get decent performance. You end up having to make major performance/size tradeoffs, and the performance goes to shit. There might be high-end FPGAs that could fit it, but they would be extremely costly.
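
Here is the tradeoff as a software analogy (the round function is an invented stand-in, not a real hash): the rolled loop corresponds to one round circuit reused over and over, while the unrolled version corresponds to one copy of the circuit per round, so data can stream through as a pipeline.

    /* Rolled vs. unrolled. In an FPGA the rolled form maps to one
     * round circuit reused ROUNDS times (small area, one result every
     * ROUNDS cycles); the unrolled form maps to ROUNDS copies in a
     * pipeline (ROUNDS times the area, one result per cycle once the
     * pipe is full). */
    #include <stdint.h>

    #define ROUNDS 4   /* kept small so the unrolled version fits here */

    static uint64_t round_fn(uint64_t s, int r) {  /* stand-in round */
        return (s * 0x9E3779B97F4A7C15ULL) ^ (uint64_t)r;
    }

    /* Rolled: minimal area, low throughput. */
    uint64_t hash_rolled(uint64_t s) {
        for (int r = 0; r < ROUNDS; r++)
            s = round_fn(s, r);
        return s;
    }

    /* Unrolled: in hardware, each line becomes its own pipeline stage. */
    uint64_t hash_unrolled(uint64_t s) {
        s = round_fn(s, 0);
        s = round_fn(s, 1);
        s = round_fn(s, 2);
        s = round_fn(s, 3);
        return s;
    }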

I don't know very much about ASIC designs, but I'm guessing you have more flexibility regarding design size, your cost per chip would be much lower, and you'd draw less electricity, so those problems likely wouldn't affect ASICs as much.
 

aTg

New Member
Jul 29, 2014
fusecavator said: "You end up having to make major performance/size tradeoffs, and the performance goes to shit. ..."
Is it really not possible to build a cluster of 11 FPGAs in parallel, each dedicated to one algorithm?
 

fusecavator

Member
Jun 4, 2014
aTg said: "Is it really not possible to build a cluster of 11 FPGAs in parallel, each dedicated to one algorithm?"
Space isn't just a problem for fitting all the algorithms on one board. Even fitting the individual algorithms (512-bit SHA-3 candidates) alone on a board gets poor performance, as they can't be unrolled enough and still fit on affordable boards.
 

Ignition75

Active Member
May 25, 2014
The more I think about this, the more I'm convinced the Scrypt ASIC manufacturers are hiring shills to keep the price of LTC up until they can move all their hardware...
 

crowning

Well-known Member
May 29, 2014
Ignition75 said: "...the Scrypt ASIC manufacturers are hiring shills to keep the price of LTC up until they can move all their hardware..."
How ASIC manufacturers work:


  1. Announce fast ASIC
  2. Collect pre-orders
  3. When there's enough money from the pre-orders, hire a Chinese ASIC company to develop that thing [*]
  4. When development is finished and units are available, mine with them until the difficulty is too high
  5. Tell the people who have pre-ordered that there's a technical problem, so delivery is postponed
  6. Goto 4.) until people threaten to sue you
  7. Deliver or file bankruptcy
  8. Goto 1.
[*] Once the units are finished, the Chinese ASIC company does steps 4. - 6. internally until the manufacturer threatens to sue them
 

vertoe

Three of Nine
Mar 28, 2014
I agree with @crowning. I never had a single piece of mining hardware that paid off, except for GPUs.

CPUs and GPUs were great because they don't lose their value within a few months the way ASICs do.
 
Jun 11, 2014
GAWMiners Hashlets might be a game changer. Or they might be full of shit.

I will find out soon enough.

Supposedly, the Hashlets will be able to mine X11 in the next few weeks/months.

I have maybe 50 MH/s in Hashlets ready to mine the fuck out of Darkcoin if or when that happens. In the meantime, they've been chugging away at scrypt coins, and it looks like I will have ROI in about 2 months.

They didn't do pre-orders on these things. They just started selling them straight up with instant activation.
 

ilia_2s

Active Member
Oct 3, 2015
Hello!

Can anyone post source code for a full X11 implementation on an FPGA?
I am interested in running benchmarks on it.