/* * Copyright (C) 2011 The Guava Authors * * Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except * in compliance with the License. You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software distributed under the License * is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express * or implied. See the License for the specific language governing permissions and limitations under * the License. */ package com.google.common.hash; import java.nio.charset.Charset; import com.google.common.annotations.Beta; import com.google.common.primitives.Ints; /** * A hash function is a collision-averse pure function that maps an arbitrary * block of data to a number called a hash code. * *

Definition

* *

* Unpacking this definition: * *

* *

* Summarizing the last two points: "equal yield equal always; unequal * yield unequal often." This is the most important characteristic of all * hash functions. * *

Desirable properties

* *

* A high-quality hash function strives for some subset of the following * virtues: * *

* *

Providing input to a hash function

* *

* The primary way to provide the data that your hash function should act on is * via a {@link Hasher}. Obtain a new hasher from the hash function using * {@link #newHasher}, "push" the relevant data into it using methods like * {@link Hasher#putBytes(byte[])}, and finally ask for the {@code HashCode} * when finished using {@link Hasher#hash}. (See an {@linkplain #newHasher * example} of this.) * *

* If all you want to hash is a single byte array, string or {@code long} value, * there are convenient shortcut methods defined directly on * {@link HashFunction} to make this easier. * *

* Hasher accepts primitive data types, but can also accept any Object of type * {@code * T} provided that you implement a {@link Funnel Funnel} to specify how to * "feed" data from that object into the function. (See * {@linkplain Hasher#putObject an example} of this.) * *

* Compatibility note: Throughout this API, multibyte values are always * interpreted in little-endian order. That is, hashing the byte array * {@code {0x01, 0x02, 0x03, 0x04}} is equivalent to hashing the {@code int} * value {@code * 0x04030201}. If this isn't what you need, methods such as * {@link Integer#reverseBytes} and {@link Ints#toByteArray} will help. * *

Relationship to {@link Object#hashCode}

* *

* Java's baked-in concept of hash codes is constrained to 32 bits, and provides * no separation between hash algorithms and the data they act on, so alternate * hash algorithms can't be easily substituted. Also, implementations of * {@code hashCode} tend to be poor-quality, in part because they end up * depending on other existing poor-quality {@code hashCode} * implementations, including those in many JDK classes. * *

* {@code Object.hashCode} implementations tend to be very fast, but have weak * collision prevention and no expectation of bit dispersion. This leaves * them perfectly suitable for use in hash tables, because extra collisions * cause only a slight performance hit, while poor bit dispersion is easily * corrected using a secondary hash function (which all reasonable hash table * implementations in Java use). For the many uses of hash functions beyond data * structures, however, {@code Object.hashCode} almost always falls short -- * hence this library. * * @author Kevin Bourrillion * @since 11.0 */ @Beta public interface HashFunction { /** * Begins a new hash code computation by returning an initialized, stateful * {@code * Hasher} instance that is ready to receive data. Example: * *

	 * {
	 * 	@code
	 *
	 * 	HashFunction hf = Hashing.md5();
	 * 	HashCode hc = hf.newHasher().putLong(id).putBoolean(isActive).hash();
	 * }
	 * 
*/ Hasher newHasher(); /** * Begins a new hash code computation as {@link #newHasher()}, but provides a * hint of the expected size of the input (in bytes). This is only important for * non-streaming hash functions (hash functions that need to buffer their whole * input before processing any of it). */ Hasher newHasher(int expectedInputSize); /** * Shortcut for {@code newHasher().putInt(input).hash()}; returns the hash code * for the given {@code int} value, interpreted in little-endian byte order. The * implementation might perform better than its longhand equivalent, but * should not perform worse. * * @since 12.0 */ HashCode hashInt(int input); /** * Shortcut for {@code newHasher().putLong(input).hash()}; returns the hash code * for the given {@code long} value, interpreted in little-endian byte order. * The implementation might perform better than its longhand equivalent, * but should not perform worse. */ HashCode hashLong(long input); /** * Shortcut for {@code newHasher().putBytes(input).hash()}. The implementation * might perform better than its longhand equivalent, but should not * perform worse. */ HashCode hashBytes(byte[] input); /** * Shortcut for {@code newHasher().putBytes(input, off, len).hash()}. The * implementation might perform better than its longhand equivalent, but * should not perform worse. * * @throws IndexOutOfBoundsException if {@code off < 0} or * {@code off + len > bytes.length} or * {@code len < 0} */ HashCode hashBytes(byte[] input, int off, int len); /** * Shortcut for {@code newHasher().putUnencodedChars(input).hash()}. The * implementation might perform better than its longhand equivalent, but * should not perform worse. Note that no character encoding is performed; the * low byte and high byte of each {@code char} are hashed directly (in that * order). * * @since 15.0 (since 11.0 as hashString(CharSequence)). */ HashCode hashUnencodedChars(CharSequence input); /** * Shortcut for {@code newHasher().putString(input, charset).hash()}. Characters * are encoded using the given {@link Charset}. The implementation might * perform better than its longhand equivalent, but should not perform worse. */ HashCode hashString(CharSequence input, Charset charset); /** * Shortcut for {@code newHasher().putObject(instance, funnel).hash()}. The * implementation might perform better than its longhand equivalent, but * should not perform worse. * * @since 14.0 */ HashCode hashObject(T instance, Funnel funnel); /** * Returns the number of bits (a multiple of 32) that each hash code produced by * this hash function has. */ int bits(); }