Lunes, Enero 30, 2012
Lunes, Enero 16, 2012
DATA STRUCTURE TASK
Case 4
1. What is hash collision?
2. What really happens during a hash collision? Include an image.
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.
Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume
that hash collisions—different keys that map to the same hash value—will occur and must be accommodated in some way.
3. What are the ways and methods to resolve hash collision? Explain Each
Here are some examples of data fields and how we might achieve the above goals. Note that we'll just give the "raw" hash function here, and look separately at how to actually build hash functions into a class.
ub4 additive( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = hash+key[i]; return (hash % prime); } This takes 5n+3 instructions. There is no mixing step. The combining step handles one byte at a time. Bytes are treated commutatively, and every bit affects on average two output bits (one directly, and an average of one more due to carries). Bytes above the low-order byte of hash are affected only by carries. One of the instructions is mod, which can many cycles -- tens (on a MIPS) or hundreds (on a SPARC). It forces the use of prime table lengths, which makes it hard to dynamically increase the table size. Also, the prime can't be much bigger than one byte because the original hash value won't be much bigger than one byte. Knuth described this hash, except he required the key to be an array of words, not an array of bytes. The rotating hash, also seen in commercial code:
ub4 rotating( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = (hash<<5)^(hash>>27)^key[i]; return (hash % prime); } This is the same as the additive hash, except it has a mixing step (a barrelshift by 5) and the combining step is exclusive-or instead of addition. Speed is 8n+3 instructions. Every input bit affects exactly one output bit (because of the exclusive-or instead of addition). Because of the mixing step, before the % prime, hash is equally likely to be any 32-bit value. This lets any size prime be used. Most input bytes are not treated commutatively. This hash was also described by Knuth , except he required the key to be an array of words, not an array of bytes. Pearson's hash
char pearson( char *key, size_t len, char tab[256]) { char hash; size_t i; for (i=0, hash=0; i<len; ++i) hash=tab[hash^key[i]]; return (hash); }
1. What is hash collision?
When considering duplications technologies, other vendors and some analysts bring up the bogeyman of hash collisions.
So, what is a hash collision, and is it really a concern for data safety in a system like Permabit Enterprise Archive?
The short version is that deduplicating systems that use cryptographic hashes use those hashes to generate shorter “fingerprints” to uniquely identify each piece of data, and determine if that data already exists in the system. The trouble is, by a mathematical rule called the “pigeonhole principle”, you can’t uniquely map any possible files or file chunk to a shorter fingerprint. Statistically, there are multiple possible files that have the same hash. 2. What really happens during a hash collision? Include an image.
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys (e.g., a person's name), to their associated values (e.g., their telephone number). Thus, a hash table implements an associative array. The hash function is used to transform the key into the index (the hash) of an array element (the slot or bucket) where the corresponding value is to be sought.
Ideally, the hash function should map each possible key to a unique slot index, but this ideal is rarely achievable in practice (unless the hash keys are fixed; i.e. new entries are never added to the table after it is created). Instead, most hash table designs assume
that hash collisions—different keys that map to the same hash value—will occur and must be accommodated in some way.
3. What are the ways and methods to resolve hash collision? Explain Each
Basic hash function guidelines
What we basically want to achieve with a hash function is as follows:- we want to combine all data values that vary into the hash code;
- we want the bits of the data that vary most randomly to affect the lower bits of the hash code;
- given a set of typical random keys, we want hash codes where as many bits as possible have roughly 50-50 chance of being a 0 or 1 (in other words are "as randomly distributed as possible"), especially in the lower bits;
- we want the hash function to execute quickly: in other words, it should be a "couple of lines of code" written with a few simple arithmetic operations.
Here are some examples of data fields and how we might achieve the above goals. Note that we'll just give the "raw" hash function here, and look separately at how to actually build hash functions into a class.
Object fields | Example hash code | Comments |
---|---|---|
Two ints, each fairly randomly distributed between 0 and a fairly large number2. | x ^ y | XORing two numbers with roughly random distribution results in another number still with roughly random distribution3, but which now depends on the two values. |
Two objects whose classes already have good hash functions (for example Strings or generally other standard library classes). | x.hashCode() ^ y.hashCode() | If the two classes already have good (randomly distributed) hash code functions, then you can treat the two hash codes as "randomly distributed integers" and XOR them. |
Two ints that have a biased distribution. | (x * p) ^ y Where p is a prime number4 and/or odd number close to a power of 2, such as 17 or 33. | Multiplying one of the numbers (or, put another way, shifting plus adding/subtracting from itself) will help to ensure that the "biased" bits of one number interact with the more "random" bits of the other, so that the bias is "ironed out" over more of the bits. |
enums | Treat as "biased ints": (x.hashCode() * 17) ^ y.hashCode() | Enums typically have low values (favour the lower bits), so they should be combined with a multiplication to spread them across a wider bit range of the hash code. |
Two doubles or floats. | Wrap the primitive in a Float or Double object then call its hashCode() method. new Float(x).hashCode() ^ new Double(y).hashCode() (new Double(x).hashCode() >> 13) ^ new Double(y).hashCode() | For efficiency (but the JIT compiler may remove the need) you could directly copy the code from Double.hashCode() or Float.hashCode() directly into your hash function. See the hashCode() method of Point2D for an example of this. Hash codes of floats and doubles, in particular those representing powers of 2 (including negative powers of 2: 1/2, 1/4 etc), typically have more variation in the upper bits than in the lower ones. The HashMap's secondary hash code alleviates this problem somewhat, but it may be worth combining the hash codes by shifting one of them right by several bits so that more variation ends up in the lower bits. |
public int hashCode() { int hc = cachedHashCode; if (hc == 0) { String varstr = var1 + var2 + var3; hc = varstr.hashCode(); } return hc; }
Case 5
......Give at least three scenarios in the real world wherein HASH COLLISION can happen.
Include an image.
A distributed hash table (DHT) is a class of a decentralized distributed system that provides a lookup service similar to a hash table; (key, value) pairs are stored in a DHT, and any participating node can efficiently retrieve the value associated with a given key. Responsibility for maintaining the mapping from keys to values is distributed among the nodes, in such a way that a change in the set of participants causes a minimal amount of disruption. This allows a DHT to scale to extremely large numbers of nodes and to handle continual node arrivals, departures, and failures.
......Give at least three specific software/program/application wherein HASHING and HASH COLLISION might occur. Include the name of the software/program/application, description and a screenshot.
The additive hash, seen in commercial code:
ub4 additive( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = hash+key[i]; return (hash % prime); } This takes 5n+3 instructions. There is no mixing step. The combining step handles one byte at a time. Bytes are treated commutatively, and every bit affects on average two output bits (one directly, and an average of one more due to carries). Bytes above the low-order byte of hash are affected only by carries. One of the instructions is mod, which can many cycles -- tens (on a MIPS) or hundreds (on a SPARC). It forces the use of prime table lengths, which makes it hard to dynamically increase the table size. Also, the prime can't be much bigger than one byte because the original hash value won't be much bigger than one byte. Knuth described this hash, except he required the key to be an array of words, not an array of bytes. The rotating hash, also seen in commercial code:
ub4 rotating( char *key, size_t len, ub4 prime) { ub4 hash; size_t i; for (hash=(ub4)len, i=0; i<len; ++i) hash = (hash<<5)^(hash>>27)^key[i]; return (hash % prime); } This is the same as the additive hash, except it has a mixing step (a barrelshift by 5) and the combining step is exclusive-or instead of addition. Speed is 8n+3 instructions. Every input bit affects exactly one output bit (because of the exclusive-or instead of addition). Because of the mixing step, before the % prime, hash is equally likely to be any 32-bit value. This lets any size prime be used. Most input bytes are not treated commutatively. This hash was also described by Knuth , except he required the key to be an array of words, not an array of bytes. Pearson's hash
char pearson( char *key, size_t len, char tab[256]) { char hash; size_t i; for (i=0, hash=0; i<len; ++i) hash=tab[hash^key[i]]; return (hash); }
Lunes, Enero 9, 2012
HCSD Data Structure Case Studies
Case Study 1
To aid the fight against computer viruses and other types of malicious software, many security advisory organizations and developers of anti-virus software compile and publish lists of viruses.
The compilation of a unified list of viruses is made difficult because of naming. When a new virus appears, the rush begins to identify and understand it as well as develop appropriate counter-measures to stop its propagation. Along the way, a name is attached to the virus. As the developers of anti-virus software compete partly based on how quickly they react to the new threat, they usually study and name the viruses independently. By the time the virus is identified, many names denote the same virus.
Another source of ambiguity in names is that sometimes a virus initially identified as a completely new virus is found to be a variation of an earlier known virus.There are many viruses and it is can fix using anti viruses.
The compilation of a unified list of viruses is made difficult because of naming. When a new virus appears, the rush begins to identify and understand it as well as develop appropriate counter-measures to stop its propagation. Along the way, a name is attached to the virus. As the developers of anti-virus software compete partly based on how quickly they react to the new threat, they usually study and name the viruses independently. By the time the virus is identified, many names denote the same virus.
Another source of ambiguity in names is that sometimes a virus initially identified as a completely new virus is found to be a variation of an earlier known virus.There are many viruses and it is can fix using anti viruses.
Case Study 2
Computer hacking is the practice of modifying computer hardware and software to accomplish a goal outside of the creator’s original purpose. People who engage in computer hacking activities are often called hackers.Since the word “hack” has long been used to describe someone who is incompetent at his/her profession, some hackers claim this term is offensive and fails to give appropriate recognition to their skills.
Computer hacking is most common among teenagers and young adults, although there are many older hackers as well. Many hackers are true technology buffs who enjoy learning more about how computers work and consider computer hacking an “art” form. They often enjoy programming and have expert-level skills in one particular program. For these individuals, computer hacking is a real life application of their problem-solving skills. It’s a chance to demonstrate their abilities, not an opportunity to harm others.
Bad hackers is making a virus to indicate all cases to make a wrong doing. They build a viruses to spread to web site or to company to hack the server using their talent of programming.The all hackers are secured and it is safety to as to make them a hack to the public.
Case Study 3
a.Silicon Valley Programmer-This people are all group of programming that are testing or examine how to create a real robots.They success in there experiment but I think there is an errors to the robot cause of death by the operator.
b.I make a rules that indicate of safety first.I make a laboratory for the robots that the user is easily program the robot.First I make my own uniform that can protect the body and using programming I also chick first my codes in my data base before I run it, to make it safety.
c.I think it is hard to select witch is I make my codes fake.Because first is my name is involved in the case and I can prove to my self as a good programmer i can fix the codes using the previous codes that I use.
Computer hacking is most common among teenagers and young adults, although there are many older hackers as well. Many hackers are true technology buffs who enjoy learning more about how computers work and consider computer hacking an “art” form. They often enjoy programming and have expert-level skills in one particular program. For these individuals, computer hacking is a real life application of their problem-solving skills. It’s a chance to demonstrate their abilities, not an opportunity to harm others.
Bad hackers is making a virus to indicate all cases to make a wrong doing. They build a viruses to spread to web site or to company to hack the server using their talent of programming.The all hackers are secured and it is safety to as to make them a hack to the public.
Case Study 3
a.Silicon Valley Programmer-This people are all group of programming that are testing or examine how to create a real robots.They success in there experiment but I think there is an errors to the robot cause of death by the operator.
b.I make a rules that indicate of safety first.I make a laboratory for the robots that the user is easily program the robot.First I make my own uniform that can protect the body and using programming I also chick first my codes in my data base before I run it, to make it safety.
c.I think it is hard to select witch is I make my codes fake.Because first is my name is involved in the case and I can prove to my self as a good programmer i can fix the codes using the previous codes that I use.
Mag-subscribe sa:
Mga Post (Atom)