
Evolution question; DashDrive

camosoul

Well-known member
I was reading again...

Say someone is a really big dick. Or an accident happens...

1) Just the right MNs, or a really large chunk of MNs, go offline. Say an earthquake hits, the datacenter burns down, gets nuked, a meteor lands on it...

2) Or, worse, an MN sharing service (which also never votes) decides to be a prick and just dumps 1/3 of the network.

3) Or an MN just goes offline temporarily due to circumstances beyond the operator's control.

How does the network handle the loss of all redundant shard data?

In the case of instance 2, it is not merely probable, but certain, that there will be pieces of data lost entirely.

How do we discourage that behavior?

Can the collateral be more than locked? Can bad actors be stripped of their 1000 DASH? A proper shutdown process whereby data is moved and secured and then the 1000 DASH is unlocked to allow the node to shut down gracefully without penalty?

But then how does this impact those affected by Acts of God, or whose datacenter's switch blows out? Your node goes down in an earthquake, your house burns up due to a ruptured gas line... Oh, and all your Dash is now gone, because in the fight for your life and the loss of all your worldly possessions you just didn't get around to setting up a new node... Way to add insult to injury... Assuming there is a temporary reprieve at all... And the data is lost for good either way...

What if some malicious entity with money to burn, say, tax money... What stops such an entity from setting up a major fraction of the MN capacity, and deliberately scuttling it? They don't care. It's not their money anyway, all stolen...

DASH could be perceived as a real threat to power. BitClones are toys that can't be taken seriously due to flaws the userbase doesn't understand, and the devs see no reason to fix because the userbase isn't complaining... So BitClones are minor threats. DASH needs more than just a fiscal guard against an attacker that doesn't care about the price because its evil empire is finally facing a real threat for the first time, ever... Desperate and Evil is a combination that doesn't give a fuck how much it costs.

Am I not properly understanding sharding? Will it be more like a massively striped RAID? Details.... I need details.

What assures the integrity/authenticity of the data? Not mining? So, doesn't this also prove that mining is vestigial, and that other metrics can be used to validate and secure the blockchain if they can do the same for other data? Is this why nobody wants to talk about it every time I bring it up? Don't want to panic the whiner miners by confirming or denying their logical, inevitable, and impending extinction?
 
I've had these thoughts as well. It would be nice to get some reassurance there....
 
Hey @camosoul

First thing I would say is that we probably don't need to implement full distributed storage for Evolution V1. Its main function is to let the system scale once the total dataset can't be stored on individual nodes and needs to be sharded across them, which won't be the case for a while, so we can move faster by implementing a basic version of DashDrive first and upgrading it later.

So the current plan for DashDrive V1 is to use the Sentinel-based approach, which functions more like a mempool with local persistent storage of the full dataset (minimized to just the user data set by the system; it's not a 'drive' where users can upload their own files) and uses quorum-based consensus.

Thinking longer term about the distributed implementation, it's a good question how data will survive large correlated failures. Generally, durability depends on the replication mechanism and the replication factor: the strategy for reading, writing, and restoring copies of each blob of sharded data across the network, and how many nodes the system is configured to maintain those replicas on.

Centralized distributed storage systems have low replication factors and low durability. For example, HDFS usually runs R=3 (3 copies of data), which starts to lose data if just 1% of nodes vanish simultaneously across a randomly replicated cluster: https://www.usenix.org/system/files/conference/atc13/atc13-cidon.pdf.
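To see why random R=3 replication fares so badly under correlated failure, here's a back-of-envelope sketch (my own illustration, assuming fully random replica placement; the function name and the 10-million-block figure are made up for the example):

```python
# Back-of-envelope: random replication with R copies per block.
# If a random fraction f of nodes fails simultaneously, a block is
# lost only if all R of its randomly placed replicas land in the
# failed set, i.e. with probability ~f**R per block.
def p_cluster_loses_data(f: float, r: int, num_blocks: int) -> float:
    """Probability that at least one block loses all of its replicas."""
    p_block_lost = f ** r
    return 1 - (1 - p_block_lost) ** num_blocks

# 1% simultaneous node failure, R=3, a cluster holding 10 million blocks:
p = p_cluster_loses_data(0.01, 3, 10_000_000)
```

The per-block loss probability is tiny (10^-6 here), but with millions of blocks the cluster almost certainly loses *something*, which is the effect the Copysets paper above analyzes.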

P2P storage systems like DashDrive will use more sophisticated replication strategies and much higher replication factors. For example, with 60 copies of data on a 10,000-node network using erasure codes, a system can survive up to 70% correlated node failure, and this can be tweaked by changing the config: https://gnunet.org/sites/default/files/10.1.1.102.9992.pdf.
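To see how erasure coding changes the picture, here's a sketch under simplifying assumptions: each fragment lives on a distinct node, and a correlated failure knocks out a random 70% of nodes. The n=60 fragments / any-10-reconstruct split is illustrative, not the actual DashDrive parameters:

```python
from math import comb

def p_object_survives(n: int, k: int, f: float) -> float:
    """With an (n, k) erasure code, an object survives if at least k of
    its n fragments survive. Each fragment's node fails with probability f."""
    p_live = 1 - f
    return sum(comb(n, i) * p_live**i * f**(n - i) for i in range(k, n + 1))

# 60 fragments, any 10 reconstruct the object, 70% of nodes down:
p = p_object_survives(60, 10, 0.70)
```

Even with 70% of the network gone, an expected 18 of the 60 fragments remain, comfortably above the 10 needed, so survival probability stays high; durability is tuned by changing n and k rather than brute-force copying.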

So surviving massive correlated failures is a tradeoff between durability and performance: the more copies of data we keep, the more bandwidth, processing, and storage is required to access, modify, and maintain the desired cluster graph, but we can decide what % failure we can handle.

Usually P2P storage systems are modeled stochastically, for example using continuous-time Markov chain processes to estimate the probability of system-wide data loss within a certain time based on individual node behavior, and then configured to the desired durability analytically. This takes more factors into consideration, like data availability and restoration, and is more realistic. There's a nice explanation here: http://technical-ramblings.blogspot.com/2015/11/calculating-durability-and-availability.html
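The kind of Markov-chain model the linked post describes can be sketched as a tiny continuous-time Markov chain over replica counts. This is a generic textbook model, not DashDrive's actual design: assume R=3 replication, each replica fails at rate λ, repair restores one replica at rate μ, and the mean time to data loss (MTTDL) is the expected time to reach zero replicas:

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def mttdl_triplicate(lam: float, mu: float) -> float:
    """Mean time to data loss for 3x replication, modeled as a CTMC on the
    number of live replicas (state 0 = data lost, absorbing). For each
    transient state i, the expected hitting times satisfy Q_i . T = -1."""
    Q = [
        [-(lam + mu), mu, 0.0],          # 1 replica: fail -> lost, repair -> 2
        [2 * lam, -(2 * lam + mu), mu],  # 2 replicas: fail -> 1, repair -> 3
        [0.0, 3 * lam, -3 * lam],        # 3 replicas: fail -> 2
    ]
    T = solve3(Q, [-1.0, -1.0, -1.0])
    return T[2]  # expected time to absorption starting fully replicated

# Each replica fails ~once a year (lam=1/yr); repair takes ~a day (mu=365/yr):
years = mttdl_triplicate(lam=1.0, mu=365.0)
```

With a one-day repair time the model gives an MTTDL of roughly 22,000 years, and you can read off how durability scales with repair speed, which is the "configure the system analytically" part.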

Therefore we can choose the durability we want, but it needs to be balanced against other aspects of the implementation that aren't fully decided yet, and it's not being worked on right now anyway since, under the current plan, the early versions won't implement the full distributed storage mechanism.
 
What if we were to supplement with a pair of centralized backups, used only when the decentralized system loses a data point? It could be hosted in three data centers: a pair within about 100 km of each other (maybe Switzerland and Germany) keeping real-time copies, and a third in another country (on the other side of the earth) providing hourly snapshots.
 

Yes, something like that was mentioned as an option in the design before. Not sure about the implementation though, and it would need to be decentralized (like getting paid to run backup nodes of the full data).
 
There could be a full backup retention option, encrypted, on a remote drive...

Refer to my previous Optional Proportional Services idea...

There could be the "active" shared network storage, plus a form of incentivized hot archival storage that specifically excludes the shards held by the associated MN?

This way, if the MN goes down, we know for a fact the data exists elsewhere, by specific exclusion.

Shard X is on MN 1. MN 1 also offers a downlink to 8 TB of NAS at MN 1's operator's house. Include everything that can fit on that 8 TB, minus Shard X. Thus, if MN 1 disappears, we know for absolute fact that Shard X can be found somewhere else.
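The exclusion idea above could be sketched like this (my own illustration; the function name, capacity parameter, and node/shard labels are hypothetical, not part of any proposed DashDrive design):

```python
def assign_archive(node_shards: dict, capacity: int) -> dict:
    """Give each archival node a list of shards to mirror, explicitly
    excluding the shards it already serves as a masternode, so losing a
    node can never take out both the primary and its own backup copy."""
    all_shards = set().union(*node_shards.values())
    archive = {}
    for node, own in node_shards.items():
        eligible = sorted(all_shards - own)   # exclude the node's own shards
        archive[node] = eligible[:capacity]   # fill up to available space
    return archive

# MN 1 holds shard X, so its 2-slot archive holds everything *except* X:
plan = assign_archive({"mn1": {"X"}, "mn2": {"Y"}, "mn3": {"Z"}}, capacity=2)
```

By construction, the archive hosted alongside MN 1 never contains MN 1's own shard, which is exactly the "known fact of the data elsewhere" guarantee.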

Nodes could perform traceroute and ping on each other to determine whether they might be proximal, and make data storage choices based on that data...
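That ping/traceroute idea could feed a placement heuristic along these lines (entirely hypothetical: the 2 ms threshold, the co-location test, and the greedy selection are illustrative assumptions, not anything in DashDrive):

```python
def likely_colocated(rtt_ms: float, shared_route: bool, threshold_ms: float = 2.0) -> bool:
    """Heuristic: a sub-threshold round-trip time, or a shared traceroute
    prefix, suggests two nodes may sit in the same facility."""
    return rtt_ms < threshold_ms or shared_route

def pick_replica_nodes(candidates, rtt, shared, want=3):
    """Greedily pick replica hosts, skipping any candidate that looks
    co-located with a node already chosen, based on pairwise measurements."""
    chosen = []
    for node in candidates:
        if all(not likely_colocated(rtt[(node, c)], shared[(node, c)]) for c in chosen):
            chosen.append(node)
        if len(chosen) == want:
            break
    return chosen

# "b" pings "a" in 0.5 ms (same datacenter?), so it gets skipped:
rtt = {("b", "a"): 0.5, ("c", "a"): 40.0, ("c", "b"): 45.0,
       ("d", "a"): 80.0, ("d", "c"): 70.0}
shared = {k: False for k in rtt}
picked = pick_replica_nodes(["a", "b", "c", "d"], rtt, shared)
```

The point is that replica placement can use measured network distance to avoid putting all copies of a shard in one blast radius, which directly addresses the earthquake/datacenter scenarios earlier in the thread.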
 
These models are generally based on static resources. MNs are constantly fluctuating... A dynamic adaptation may result in severe bandwidth suck as it constantly redefines itself...
 
Yah, we were talking about this when Evan first came up with the idea. The solutions he favored back then (it's been a few months, and you know how plans change) were: 1) the budget could pay for a few massive storage servers that keep an entire copy of all the shards, and 2) everyone could keep a copy of their own info in the cloud or on their own computers (only their information, which would keep it small). Those are the two I remember, and the idea was to use both, so redundancy, redundancy, redundancy. If all versions of a shard should disappear at once, 5 new MNs would be sent the shard from the backup storage servers.
 
I don't like to hear anything about a backup server. Centralization at its finest, ladies and germs.

1. we could have the budget pay for a few massive storage servers that keep an entire copy of all the shards

The shards should be thrice redundant across the network. I can already hear the trolls if that's the way we end up going. :(
 
I would like to run a full backup storage server or two if the budget can pay me some DASH. I would run one from a 1 Gbps (up to 10 Gbps is available here) home fiber connection in Asia at my actual physical location, and another server through a VPN at a second Asian site. All data would be on encrypted RAIDs. @TanteStefana is the DASH foundation/DAO really interested in this? I think it's good to have at least one full backup that isn't at a commercial datacenter. Could you please point me to how to submit a proposal like this to the budgeting system?
 