Dash Labs - GPU Accelerator Design Specifications

Discussion in 'Development Tech Discussion' started by eduffield222, Aug 22, 2017.

  1. eduffield222

    eduffield222 New Member

    We want to build a fully functional prototype masternode, equipped with a GPU and software capable of offloading the transaction elliptic curve cryptography checks from the dash-core daemon to the dash-core-gpu implementation.

    This is going to consist of a few pieces:

    - A GPU implementation of the EC cryptographic code
    - A bridge, with a lock-safe, multi-threaded implementation inside the block-checking code of dash-core
    - Configuration data added to dash-core to allow switching from stand-alone block checking to accelerated block checking
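
    To make the bridge and the configuration switch concrete, here is a rough sketch of the shape they could take. It is in Python only for brevity; the real thing would live in dash-core's C++. Every name in it (`VerifyBridge`, `BACKENDS`, `fake_verify`, the "cpu" / "gpu" option values) is hypothetical, and `fake_verify` is a stand-in that does no real cryptography. The idea shown: worker threads submit signature checks under a lock, the bridge hands full batches to whichever backend the config selected, and the block is valid only if every batch passes.

```python
import threading
from typing import Callable, List, Sequence, Tuple

# (pubkey, signature, message hash); the layout is an assumption for this sketch
SigCheck = Tuple[bytes, bytes, bytes]
BatchVerifier = Callable[[Sequence[SigCheck]], List[bool]]


def fake_verify(batch: Sequence[SigCheck]) -> List[bool]:
    """Stand-in backend: NOT real crypto, it only compares the signature bytes."""
    return [sig == b"ok" for _pubkey, sig, _msg_hash in batch]


# Hypothetical config values. Both map to the stand-in here; a real build
# would map "cpu" to the existing verifier and "gpu" to the accelerator.
BACKENDS = {"cpu": fake_verify, "gpu": fake_verify}


class VerifyBridge:
    """Collects checks from many threads and verifies them in batches."""

    def __init__(self, backend: BatchVerifier, batch_size: int = 1024):
        self._backend = backend
        self._batch_size = batch_size
        self._lock = threading.Lock()
        self._pending: List[SigCheck] = []
        self._all_ok = True

    def submit(self, check: SigCheck) -> None:
        with self._lock:
            self._pending.append(check)
            if len(self._pending) < self._batch_size:
                return
            # Take ownership of the full batch while still holding the lock.
            batch, self._pending = self._pending, []
        self._run(batch)  # verify outside the lock so other threads keep going

    def _run(self, batch: Sequence[SigCheck]) -> None:
        ok = all(self._backend(batch))
        with self._lock:
            self._all_ok = self._all_ok and ok

    def finish(self) -> bool:
        """Flush the remainder. Call only after all submitting threads joined."""
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self._run(batch)
        with self._lock:
            return self._all_ok


bridge = VerifyBridge(BACKENDS["gpu"], batch_size=4)
for _ in range(10):
    bridge.submit((b"pubkey", b"ok", b"hash"))
block_valid = bridge.finish()
```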

    Please help to determine the best implementation strategy to add this to dash-core. We can have the conversation here and rework the implementation strategy to suit the requirements discovered below.

    Thanks for the help!
     
  2. doodlefax

    doodlefax New Member

    We should start by profiling a single CPU core processing blocks of increasing size with various script types.
    That will give us useful baseline measurements, and optimistic figures for multicore scaling.
    I guess we're most interested in gauging what hardware is required for what tx throughput.

    Like you say, the next step is to multi-thread the block-checking code, aiming for near-linear scaling with
    the number of cores (maybe a bit more with hyper-threading, then likely less than linear with overheads).
    One complication here could be tx-chaining through UTXOs within the block, as those chains would need
    to be processed serially - the txes may need to be filtered to extract independent work for parallel processing,
    either on-the-fly or by preprocessing into sub-block batches.
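
    A minimal sketch of the preprocessing option, assuming a block is just an ordered list of txes with their inputs (the `Tx` type and `batch_by_dependency` are made up for illustration, not dash-core code). Each tx is given a level one above the deepest in-block parent it spends from, so everything at the same level is independent and can be checked in parallel, while the levels themselves run in order.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Tx(NamedTuple):
    txid: str
    inputs: List[Tuple[str, int]]  # (previous txid, output index)


def batch_by_dependency(block_txs: List[Tx]) -> List[List[str]]:
    """Group txids into batches with no in-block dependencies inside a batch.

    Assumes the block lists a parent before any tx that spends its outputs.
    """
    level: Dict[str, int] = {}
    batches: Dict[int, List[str]] = defaultdict(list)
    for tx in block_txs:
        # Only parents that are in this block matter; inputs that spend
        # older UTXOs are already settled and impose no ordering.
        parent_levels = [level[prev] for prev, _ in tx.inputs if prev in level]
        lvl = 1 + max(parent_levels) if parent_levels else 0
        level[tx.txid] = lvl
        batches[lvl].append(tx.txid)
    return [batches[i] for i in range(len(batches))]


block = [
    Tx("a", [("x", 0)]),            # spends an older UTXO
    Tx("b", [("a", 0)]),            # spends a's output: must follow a
    Tx("c", [("y", 1)]),            # independent of a and b
    Tx("d", [("b", 0), ("c", 0)]),  # needs both b and c
]
batches = batch_by_dependency(block)  # [["a", "c"], ["b"], ["d"]]
```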

    From what Mr Wright said, it sounds like nChain have taken this step and tested on Intel's Xeon Phi
    (aka Knights Landing, 68-core x86). I can't recall the throughput figure he mentioned (it was impressive).
    I don't know if that work is open source. Is anyone his friend?

    Whatever, we could replicate that work (with ASU assist?) and test on more modest multicore hardware.
    That will give us multicore scaling figures and more realistic projections for throughput.

    The next step, to offload the EC crypto to an accelerator, needs pretty large modifications.
    We want to estimate the cost / benefit for projected loads. Maybe strong multicore is sufficient.
    My feeling is that GPU acceleration should be a win. It ain't easy. But we don't do these things...

    Let's say that 80% of work is in `secp256k1_ecdsa_verify` and that it's suitable for GPU.
    Then, say that it is possible to accelerate by 8x (including cost of CPU <-> GPU comms).
    The total throughput improvement is then around 3x (Amdahl's law: 1 / (0.2 + 0.8/8) ≈ 3.3).
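
    For anyone who wants to play with those two guesses, the estimate is just Amdahl's law. A quick sketch (the 80% and 8x are the assumed figures from above, not measurements, and `amdahl_speedup` is only a helper name for this post):

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the work runs `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Assumed figures from the post: 80% of the time in ECDSA verification,
# 8x faster on the GPU including CPU <-> GPU transfer cost.
speedup = amdahl_speedup(0.8, 8.0)  # 1 / (0.2 + 0.1), about 3.33

# Upper bound if the accelerated part took no time at all: 1 / 0.2 = 5x.
ceiling = 1.0 / (1.0 - 0.8)

# How the result moves if profiling shows a different verification share.
for share in (0.6, 0.8, 0.9, 0.95):
    print(f"share={share:.2f}  speedup={amdahl_speedup(share, 8.0):.2f}x")
```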

    I have an idea of what we'd need to do and can follow up here or elsewhere.
     
  3. almuheet

    almuheet New Member

    Thanks, bro!
     
