Welcome to the Dash Forum!

Please sign up to discuss the most innovative cryptocurrency!

Evaluating the privacy of PrivateSend

Discussion in 'Projects' started by Antti Kaikkonen, Mar 21, 2018.

  1. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    The goal of this project is to measure how private PrivateSend is
    • for different number of rounds
    • for different number of inputs in a PrivateSend transaction
    • on average
    • in the worst case (can we identify PrivateSend transactions with weak privacy)

    How:

    For a given PrivateSend transaction with k inputs
    1. Find all paths from the PrivateSend transaction to all create denominations transactions that can be reached after a given number of mixing rounds
    2. Find all k-combinations of paths such that in each combination the paths are not allowed to contain same transaction outputs
    3. Now we have all possible mixing combinations for the PrivateSend transaction and we can start analysing the combinations.
    Analyzing the combinations (examples):
    • If we find zero combinations then we can conclude that the PrivateSend transaction didn't use the number of rounds we assumed
    • If we have 1 combination then the PrivateSend transaction wasn't private for the assumed number of rounds (but it could be private for a different number of rounds)
    • If we find that in all of the combinations one of the PrivateSend inputs always maps to the same create denominations transaction then the PrivateSend transaction wasn't private for the assumed number of rounds (but it could be private for a different number of rounds)
    • We could also count the number of possible wallets for each of the PrivateSend inputs and use the minimum as a measure of how private the transaction was
    I don't expect anyone to understand what I mean yet, but I'm planning to create some diagrams etc to help with that.

    I have already completed step 1. I also created an algorithm for step 2, but I haven't tested it with real data yet. I think it will be possible compute all the combinations for a small number of rounds but it might be computationally infeasible if we have many inputs combined with for example 8 rounds.
     
    #1 Antti Kaikkonen, Mar 21, 2018
    Last edited: May 11, 2018
    • Like Like x 5
    • Informative Informative x 1
    • Useful Useful x 1
    • Old Old x 1
  2. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Today I decided to search the blockchain for privatesend inputs that can't be using more than 2 rounds. I found several such transactions.

    Here are 3 examples:
    1:
    v1.png
    dashradar import json

    2: Only 3 possible mixing sources (create-denominations transactions). Actually only 2 because there is only one connection to one of the create-denominations tx so the other PrivateSend input has to go to one of the two other ones.
    v2.png
    dashradar import json

    3:
    v3.png
    dashradar import json

    All the inputs of the mixing transactions in the images are expanded and there is no path to a 3rd round. Basically the first example has 4 possible create-denominations transactions. Second example has only 3 2 and the last example has 7.

    The data in the json links can be imported to https://dashradar.com to reconstruct the graphs.
     
    #2 Antti Kaikkonen, Mar 22, 2018
    Last edited: Jan 15, 2019
    • Like Like x 5
  3. jimbursch

    jimbursch Active Member

    Joined:
    Mar 5, 2017
    Messages:
    825
    Likes Received:
    487
    Trophy Points:
    133
    @UdjinM6 -- can you evaluate what @Antti Kaikkonen is reporting?
     
  4. demo

    demo Well-known Member

    Joined:
    Apr 23, 2016
    Messages:
    3,116
    Likes Received:
    262
    Trophy Points:
    153
    Dash Address:
    XnpT2YQaYpyh7F9twM6EtDMn1TCDCEEgNX
    I was wondering, if you expand all graphs, is it possible to discover the recipient address of a privatesend transaction?
     
  5. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    PrivateSend does not hide the recipient address or amount. You can simply expand the outputs of a private send transaction to find the address.. Or
    simply use a block explorer.
     
    • Useful Useful x 1
  6. jimbursch

    jimbursch Active Member

    Joined:
    Mar 5, 2017
    Messages:
    825
    Likes Received:
    487
    Trophy Points:
    133
    Hi @Antti Kaikkonen

    I found your report in Bugcrowd and replied. My understanding is that 2 rounds of mixing, while weak for privacy, is a user's choice and does not pose a security risk. There may be use-cases where 2 rounds of mixing is adequate. But I could be wrong in my limited knowledge.
     
  7. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Hi. I replied you in bugcrowd. Here is the same reply:

    For a normal PrivateSend transaction input it is not possible to determine the number of rounds used just by analyzing the blockchain. However, in the examples I provided it was possible determine that no more than two rounds was used. This was caused by some of the mixing transactions having inputs only from create denominations transactions. As a result of knowing that the PrivateSend inputs were using only 2 rounds it is easy to determine all the possible mixing sources (only two in one of the examples). If there was paths to more mixing rounds, then someone analyzing the blockchain would have to assume 2-8 rounds were used and there would be multiple orders of magnitude more possible mixing sources. So even using only 2 rounds could be a lot more private than it currently is in some cases.
     
  8. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    In my opinion at least the user should receive a warning before sending if his/her PrivateSend transaction could be tracked to only two possible mixing sources.
     
    • Useful Useful x 1
  9. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    As I suspected, computing all the possible mixing combinations for a PrivateSend transaction is computationally really hard. For example I managed to do it for a privatesend transaction with 3 inputs using 4 rounds. That resulted in ~10 billion combinations and took something like 30 minutes using all 8 cpu cores of my computer. Analysing 5 rounds might already take years.

    For that reason I developed a method which repeatedly generates random combinations and then analyses how many times different create denominations transactions are reached from each of the PrivateSend inputs.

    I tested it on the PrivateSend transaction 59d51690d4b56ddbf1e393fa8d3a49bcfc3247f270f36be3b6ee411802666cba using 4 rounds and 1 million random combinations. Here is the result: https://pastebin.com/raw/sTemVnbY

    1b8c10b8437102810650d1e35e639938218512919c5c53ba36e9f54cfce3e904 and 50b2089d6d0805b4699134f9d4a3f4c89e6d9f278cd9066e96a697e35eefe320 are the inputs to the PrivateSend transaction and the transaction ids below them are all create denominations transactions. The number after "=" is the probability of a PrivateSend input coming from that create denominations transaction. For example 0.12035 should mean that there is ~12% probability that it is one of the sources of that PrivateSend transaction.
     
    #9 Antti Kaikkonen, Apr 8, 2018
    Last edited: Apr 8, 2018
    • Like Like x 1
    • Useful Useful x 1
  10. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    I changed the output format of the Monte Carlo simulation defined in the previous post.

    The new output format simply lists txids of create denominations transaction and the probability that at least one of the privatesend inputs is originated from that transaction.

    Here is the the result for 59d51690d4b56ddbf1e393fa8d3a49bcfc3247f270f36be3b6ee411802666cba with 4 rounds assumed: https://pastebin.com/zv2v8NhB

    For example 911d8271516aab449dd108b002fe82e3757c2bf181040ee7718f088c871c96da=0.25711422641816556 means that 911d8271516aab449dd108b002fe82e3757c2bf181040ee7718f088c871c96da appeared in 25.71% of the one million randomly generated combinations.

    I'm planning to implement "address clustering" a.k.a "guesstimated wallets" so that create denominations transactions belonging to the same wallet can be combined and the program could list the wallets and associated probabilities instead. I'm also planning to create a budget proposal to fund implementing all this in the dashradar website.
     
    • Like Like x 1
    • Useful Useful x 1
  11. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    I made two diagrams in an attempt to better explain what I'm doing.

    In both diagrams I use an example of a PrivateSend transaction with 3 inputs: red, blue and orange. The remaining graph contains all the possible 2 round paths to PrivateSend create denominations transactions.

    In the first diagram all 24 of the possible mixing combinations are displayed. In each combinations, multiple inputs are allowed travel through same mixing transaction, but they are not allowed to share common edges because that would mean that a single transaction output was spent twice. It is practically possible compute all the possible mixing combinations when the number of PrivateSend inputs is low and/or the number of mixing rounds analyzed is low.

    The second diagram shows how it is possible to generate a single random mixing combination. This process can be quickly repeated a lot of times and then the randomly generated combinations can be analyzed to determine if the PrivateSend transaction is secure.

    In this specific example both methods would come to the conclusion that the PrivateSend transaction is not secure if it used two rounds, because in all of the mixing combinations shown in the first diagram, there is at least one input mapped to the create denominations transaction in the middle.
     

    Attached Files:

    #11 Antti Kaikkonen, Apr 19, 2018
    Last edited: Apr 19, 2018
    • Like Like x 2
    • Informative Informative x 2
  12. strophy

    strophy Administrator
    Dash Core Team Dash Support Group Moderator

    Joined:
    Feb 13, 2016
    Messages:
    579
    Likes Received:
    327
    Trophy Points:
    133
    Hi @Antti Kaikkonen I confess I don't fully understand the technicalities of what you are reporting, but I wonder if it is the same thing as was reported here:

    https://arxiv.org/pdf/1709.02489.pdf

    In Section 3.2 these guys describe a method for a cluster intersection attack on PrivateSend, is this similar to your method? Would this attack be practically possible against 2 rounds of mixing?
     
  13. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    The cluster intersection attack would only consider the mixing combinations where all the mixing sources (create denominations transactions) are originated from the same address cluster. The address clusters a.k.a. guestimated wallets can be computed by using address clustering heuristics.

    I believe that the cluster intersection attack can also produce false positives. For example if I mix a large amount of Dash from a single source address then it can easily be possible that all PrivateSend inputs of someone else's transaction can be traced back to that address.

    In other words I believe that the cluster intersection method can provide circumstantial evidence at best. For example we might be able to say that:
    if privatesend transaction "xyz" used 2 rounds and the mixing was originated from a single address cluster, then the address cluster containing addresses {a1, a2 ...} is the source. But if one of the two assumptions is wrong then that might not be the case.
     
    • Like Like x 2
    • Useful Useful x 2
  14. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Update:

    I made a demo website to demonstrate the PrivateSend analysis tool. Here is a few example PrivateSend transactions analyzed:

    https://demo.dashradar.com/?txid=d055b2a298522124a6ad00acf03c280efdb0ace409ade2bff6ef4b4ff159a1f2
    https://demo.dashradar.com/?txid=3b26a21ee041e5cafec3171d2bdc3e49252c12c9ae439ddf6f586bc30938aeb6
    https://demo.dashradar.com/?txid=80e15776ed7e21158fc887d43bf85733017254bc196051a314df7412a2b6665f
    https://demo.dashradar.com/?txid=e42a68729d9aaab4f11dfde7fd21a0c3c959e3dbaa9d3c64f5e406b568c199c7
    https://demo.dashradar.com/?txid=ee30fb04d2526c7f02266ec83b6f95f2054456e6145ead532af9e92e587b1e73
    https://demo.dashradar.com/?txid=8c62533b8aa604613905eada0c27a54d92f1bc9d97ffce183d894bb21051f8a1

    It contains all 25250 PrivateSend transactions between blocks 500000 and 862348. Here is the full list of included PrivateSend transactions: list.

    How to interpret the results:
    For example in the 2 rounds tab of d055b2a29.. it says that the create denominations transaction f63bb317.. appeared in 100% of the randomly generated mixing combinations. This means that if the PrivateSend transaction used 2 rounds, then that create denominations transaction is almost guaranteed to be one of the mixing sources. But it is also possible that the PrivateSend transaction used more than 2 rounds (analyzed in other tabs). It is also possible that the PrivateSend transaction used a different number of rounds for different inputs (not included in the analysis). It is also possible that there exists a 2-round mixing combination that doesn't contain that create denominations transaction but it is so unlikely that it didn't appear in any of the 107017 simulations.

    If the number of randomly simulated mixings is low then the results can be inaccurate. If it is 0 then that means that most likely the PrivateSend transaction didn't use that number of rounds for all inputs.


    Currently there are many transactions with a low number of simulated mixings. That is because it takes a lot of processing to run the simulations for 25250 transactions and 7 different number of rounds and I have been only been able run them for ~24 hours using 8 cpu cores so far.

    If you are a masternode and think that it is important to analyze the privacy of PrivateSend then please vote on my proposal: https://www.dashcentral.org/p/DashRadar-development-2
     
    • Like Like x 4
    • Winner Winner x 1
  15. AndyDark

    AndyDark Well-known Member

    Joined:
    Sep 10, 2014
    Messages:
    347
    Likes Received:
    691
    Trophy Points:
    163
    nice analysis!
     
  16. jeffh

    jeffh Member

    Joined:
    May 8, 2017
    Messages:
    107
    Likes Received:
    45
    Trophy Points:
    78
    This is great work @Antti Kaikkonen !!

    Looking forward to seeing what comes next out of. I wonder if there's any way to offload this process to the GPU from your CPU, would it not be better suited to the task?
     
  17. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Thanks!

    Thanks for the feedback. That thing has crossed my mind also but I have never done any GPU programming so I don't know if graph traversal type algorithms are possible to implement in a gpu.

    I have been thinking of more ways to measure privacy of PrivateSend transactions. One of them is finding the smallest possible set of create denominations transactions in which at least one has to be a mixing source. For example it might be possible say that one of {create_denoms_1, create_denoms_2, ... create_denoms_n} has to be a mixing source without assuming anything about the number of rounds used. More transactions in the set is obviously means better privacy. I tried that method once and got to 11 create denominations transactions for a specific PrivateSend transaction, but it took too long before it could find a larger set or conclude that a larger set doesn't exist, so I gave up.

    I will also be adding address clustering functionality so that some create denominations transactions belonging in the same wallet can be combined.
     
  18. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Updated the analysis data on the demo website. Now each transaction has been analyzed for at least 70 seconds (10 for each round). Transactions still have varying number of simulations because it is slower to find valid PrivateSend mixings when there is a large amount of inputs or when the inputs have many common potential mixing transactions.
     
  19. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    I made some charts to see how the number of inputs in a given denomination affects privacy.

    The series (0.01, 0.1, 1.0 and 10.0) represent privatesend transactions that contain inputs only from the specific denomination.

    The y-axis (Traceability) is the average percentage of the most often appearing mixing source(create denominations transaction).

    The x-axis is the number of inputs.

    There is a lot of variance when the number of inputs is high because PrivateSend transactions that use only one denomination and have many inputs are quite rare.

    8-rounds.png 7-rounds.png 6-rounds.png 5-rounds.png 4-rounds.png 3-rounds.png 2-rounds.png
    Next I'm probably going to study how much it affects privacy if many PrivateSend inputs are from the same transaction or time period. I think that the privacy will be stronger for PrivateSend transactions where the inputs are many blocks apart from each other.

    If you are a masternode then please vote on my proposal here!
     
    #19 Antti Kaikkonen, May 18, 2018
    Last edited: May 18, 2018
    • Like Like x 3
    • Informative Informative x 1
  20. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    8-rounds-2.png 7-rounds-2.png 6-rounds-2.png 5-rounds-2.png 4-rounds-2.png 3-rounds-2.png 2-rounds-2.png
    2-input PrivateSend transaction charts above. I would have expected PrivateSend transactions where both inputs are from the same block to have significantly worse privacy but that doesn't seem to be the case. Sometimes it is even the opposite. I guess there might be too little data to analyze 2-input transactions reliably.
     
    #20 Antti Kaikkonen, May 18, 2018
    Last edited: May 18, 2018
    • Like Like x 1
  21. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Hey Dashcoiner.

    Can you provide some links to the other studies that you are talking about? In my opinion even if no transactions can't be de-anonymized it doesn't mean that PrivateSend is flawless and there is no room for improvements.

    Have you seen any other analysis that can give you the most probable mixing sources even when a transaction can't be de-anonymized?

    Did you check my analysis of the de-anonymization contest transaction?
     
    #21 Antti Kaikkonen, May 19, 2018
    Last edited: May 19, 2018
  22. akhavr

    akhavr Active Member

    Joined:
    Oct 11, 2014
    Messages:
    692
    Likes Received:
    355
    Trophy Points:
    133
    Can you explain why you named the percentage axis a Traceability? I'm afraid, I don't get it :(
     
  23. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Thanks for asking and no need to be afraid because I'm not very good at explaining stuff..

    For example, for this transaction the value would be 0.5394215404817816 for 2 rounds, 0.25030889621087315 for 3 rounds etc. So when there is multiple transactions analyzed in the charts, the average of those values was used. For example that transaction has two 0.01 inputs so it would be included in the blue lines at the x-point of 2 inputs. In the column charts it would be included in the blue 1-500 height difference columns because the two inputs (this and this) have a height difference of 8 blocks.
     
  24. akhavr

    akhavr Active Member

    Joined:
    Oct 11, 2014
    Messages:
    692
    Likes Received:
    355
    Trophy Points:
    133
    Perhaps I'm not caffeinated enough this morning yet, but I still don't get.

    Can you define traceability maybe? To my imagination it is a value, correlated to the probability of identification of a (set) of source txids for any defined output, after mixing. (As you might see, I'm also not good at explaining ;) )
     
  25. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Among the set of all possible mixing sources (create denominations transactions) of a PrivateSend transaction it is the probability of the most probable mixing source (create denominations transactions). The probabilities are calculated by repeatedly performing random walks from the PrivateSend transaction in the PrivateSend graph with the limitation that transaction outputs can only be traveled once in multi-input PrivateSend transactions. And the PrivateSend graph is a DAG where the root node is the PrivateSend transaction, leaf nodes are create denominations transactions and the intermediate nodes are mixing transactions.

    So for example, when a PrivateSend transaction has a "traceability" of 0.5 for a given number of rounds, it means that it's possible guess one of the mixing sources and with 50% probability the guess would be correct if the assumed number of rounds is correct. In reality it is almost never possible the know the number of rounds with a few rare exceptions.

    Maybe there is a better word to describe what I mean. And by the way I should have used "Average traceability" in the charts because they are all averages.
     
    • Informative Informative x 1
  26. akhavr

    akhavr Active Member

    Joined:
    Oct 11, 2014
    Messages:
    692
    Likes Received:
    355
    Trophy Points:
    133
    Does that mean, that 1 in graphs above mean 100% probability of the correct guess, given the assumed number of rounds is correct?
     
  27. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    Yes. It means that at least one of the mixing sources appeared in all of the randomly simulated mixings. But of course it is easily possible that a different number of rounds was actually used. It is also possible that different inputs were mixing using a different number of rounds.
     
    • Like Like x 1
  28. Antti Kaikkonen

    Antti Kaikkonen Active Member

    Joined:
    Jun 20, 2017
    Messages:
    250
    Likes Received:
    164
    Trophy Points:
    103
    Dash Address:
    XnZdwT1w2kGeH6RujwoyJ7BBNrukdyTBRB
    #28 Antti Kaikkonen, Jun 8, 2018
    Last edited: Jun 8, 2018
    • Like Like x 1
    • Informative Informative x 1

Share This Page