Friday, May 30, 2014



Hello folks, I have a very large array with several million entries of type Int64. I would like to search these numbers quickly to see whether a given value already exists in the array. I think a hash table would probably be suitable for this, and I have found one here. Silly question: how do I use the hash table? As a TIntegerDictionary, it expects Add(key, data) with an Int64/Cardinal as the key and a pointer as the data. Does that mean I create my array as usual and then pass the index of the current element as the key and a pointer to the current array element as the data? Or does the hash table replace the array? I have never used hash tables and do not really understand the principle.
Stefan
"However, the reference-based implementation of the standard object model, in combination with the complex syntactic dereferencing rules in an object-oriented API, acts as a stumbling block."
A hash table works roughly as follows:
- For each input (an Int64 in your case), a hash is computed (a pseudo hash function; in most cases it yields a number in the range of 2^8 or 2^16).
- This hash is used as an array index. There is an array with a fixed size of 2^8 or 2^16 (the maximum hash value). The array elements are in most cases singly linked lists, and the data is appended to the matching list.
If you now look for input xyz, you first compute its hash, use it as the index, and then iterate through the list until you find the element.
Edit: Oh, and binary search might also be appropriate here, provided the data is sorted. With exactly 1,000,000 records you need only about 20 comparisons (= log2(1,000,000)).
Edit2: You could even combine the two and use a two-dimensional array, where you access the first dimension with the hash index and run the binary search over the second!
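The description above can be sketched in Delphi roughly as follows. This is only an illustration under assumptions, not the implementation of any particular library: the bucket count of 2^16, the folding hash in BucketIndex, and the names AddValue, ContainsValue and SortedContains are all made up for this example. The second half shows the binary search alternative on a sorted array.

```pascal
program ChainedHashSketch;

{$APPTYPE CONSOLE}

const
  BucketCount = 1 shl 16; // 2^16 buckets (assumed size)

type
  PNode = ^TNode;
  TNode = record
    Value: Int64;
    Next: PNode;
  end;

var
  // One singly linked list per bucket; globals start out as nil.
  Buckets: array[0..BucketCount - 1] of PNode;

// Hypothetical hash: fold the 64-bit value down to 16 bits.
function BucketIndex(Value: Int64): Integer;
var
  U: UInt64;
begin
  U := UInt64(Value);
  U := U xor (U shr 32);
  U := U xor (U shr 16);
  Result := Integer(U and (BucketCount - 1));
end;

// Hash the value, then walk only the list stored in that bucket.
function ContainsValue(Value: Int64): Boolean;
var
  Node: PNode;
begin
  Result := False;
  Node := Buckets[BucketIndex(Value)];
  while Node <> nil do
  begin
    if Node^.Value = Value then
    begin
      Result := True;
      Exit;
    end;
    Node := Node^.Next;
  end;
end;

// Returns False if the value was already stored.
// Nodes are never freed here; a real implementation would release them.
function AddValue(Value: Int64): Boolean;
var
  Node: PNode;
  Index: Integer;
begin
  Result := not ContainsValue(Value);
  if Result then
  begin
    Index := BucketIndex(Value);
    New(Node);
    Node^.Value := Value;
    Node^.Next := Buckets[Index];
    Buckets[Index] := Node;
  end;
end;

// Binary search; Sorted must be in ascending order.
function SortedContains(const Sorted: array of Int64; Value: Int64): Boolean;
var
  Lo, Hi, Mid: Integer;
begin
  Result := False;
  Lo := 0;
  Hi := High(Sorted);
  while (Lo <= Hi) and not Result do
  begin
    Mid := Lo + (Hi - Lo) div 2;
    if Sorted[Mid] = Value then
      Result := True
    else if Sorted[Mid] < Value then
      Lo := Mid + 1
    else
      Hi := Mid - 1;
  end;
end;

begin
  AddValue(42);
  Writeln(ContainsValue(42));                    // TRUE
  Writeln(ContainsValue(7));                     // FALSE
  Writeln(SortedContains([10, 20, 30, 40], 30)); // TRUE
end.
```

The hash lookup only touches one bucket, whereas the binary search halves the remaining range on every step, which is where the figure of about 20 comparisons for 1,000,000 sorted records comes from.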
Quote: "Hello folks, I have a very large array with several million entries of type Int64. I would like to search these numbers quickly to see whether a given value already exists in the array. I think a hash table would probably be suitable for this, and I have found one here."
Delphi 2009 provides hash tables in Generics.Collections. You can, for example, use the class TDictionary<Key, Value>:
type
  TMyInt64HashTable = TDictionary<Int64, Boolean>;
...
// Hash table with an initial capacity of five million entries
Table := TMyInt64HashTable.Create(5000000);
...
Table.Add(1, True);
if Table.ContainsKey(1) then
  ...
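For illustration, the snippet above could be expanded into a small complete program that checks for duplicates while inserting. This is a sketch under assumptions: the function CountDuplicates, its parameters, and the test data (I mod Modulus instead of real values) are hypothetical and only serve to give a predictable result. On newer Delphi versions the unit may have to be written as System.Generics.Collections.

```pascal
program DictionaryDuplicateSketch;

{$APPTYPE CONSOLE}

uses
  Generics.Collections;

type
  TMyInt64HashTable = TDictionary<Int64, Boolean>;

// Inserts ValueCount numbers and counts how many were already present.
// The numbers are generated as I mod Modulus (assumed demo data).
function CountDuplicates(ValueCount, Modulus: Integer): Integer;
var
  Table: TMyInt64HashTable;
  I: Integer;
  X: Int64;
begin
  Result := 0;
  // Reserving the capacity up front avoids repeated regrowing of the table.
  Table := TMyInt64HashTable.Create(ValueCount);
  try
    for I := 1 to ValueCount do
    begin
      X := I mod Modulus;
      if Table.ContainsKey(X) then
        Inc(Result)            // seen before: one lookup instead of a scan
      else
        Table.Add(X, True);    // the Boolean value is only a placeholder
    end;
  finally
    Table.Free;
  end;
end;

begin
  Writeln(CountDuplicates(1000000, 500000)); // 500000
end.
```

Each number costs one lookup and at most one insertion, so the total work grows roughly linearly with the number of entries instead of comparing every new value against all earlier ones.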
If you only want to know whether the number is already in your array, your best option is a Bloom filter. A negative result is absolutely reliable: the number is then 100% not in the array. A positive result can be wrong with a low probability, i.e. you have to verify it with another, somewhat more expensive method to be sure. Bloom filters are very fast. In any case, this gives good results. Just search for the term (e.g. on Wikipedia).
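A minimal Bloom filter along these lines might look like the following in Delphi. Everything specific here is an assumption for the sake of the sketch: the filter size of 2^20 bits, the three probes per value, the three moduli used as very simple hash functions, and the names BitIndex, AddValue and MightContain. Real data with regular patterns would need better hash functions and a size tuned to the number of entries.

```pascal
program BloomSketch;

{$APPTYPE CONSOLE}

const
  BitCount = 1 shl 20; // filter size in bits (assumed; tune to the data)
  HashCount = 3;       // probes per value (assumed)
  // Three distinct moduli just below BitCount; illustrative choices only.
  Moduli: array[0..HashCount - 1] of Integer = (1048573, 1048571, 1048567);

var
  // Bit array packed into bytes; globals start out zeroed.
  Bits: array[0..BitCount div 8 - 1] of Byte;

// Hypothetical hash number K: the value modulo the K-th modulus.
function BitIndex(Value: Int64; K: Integer): Integer;
begin
  Result := Integer(UInt64(Value) mod UInt64(Moduli[K]));
end;

// Sets one bit per hash function.
procedure AddValue(Value: Int64);
var
  K, Bit: Integer;
begin
  for K := 0 to HashCount - 1 do
  begin
    Bit := BitIndex(Value, K);
    Bits[Bit shr 3] := Bits[Bit shr 3] or (1 shl (Bit and 7));
  end;
end;

// False: the value was definitely never added.
// True: the value was probably added and must be confirmed elsewhere.
function MightContain(Value: Int64): Boolean;
var
  K, Bit: Integer;
begin
  Result := True;
  for K := 0 to HashCount - 1 do
  begin
    Bit := BitIndex(Value, K);
    if (Bits[Bit shr 3] and (1 shl (Bit and 7))) = 0 then
    begin
      Result := False;
      Exit;
    end;
  end;
end;

var
  I: Integer;
begin
  for I := 1 to 1000 do
    AddValue(I);
  Writeln(MightContain(500));  // TRUE (added values are always reported)
  Writeln(MightContain(5000)); // FALSE with this data and these moduli
end.
```

Only a True answer needs the slower second check, for example a lookup in the hash table or the array itself.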
A Bloom filter could be placed upstream of the hash map and, in the case of a collision (not unlikely with millions of different values), avoid having to search the list in a hash map entry. Conversely, with a small filter and a large hash map you could benefit somewhat from the cache and from not constantly having to swap in, at random, memory pages that belong to the hash map. So that could well help. For further considerations it would be interesting to know what percentage of the values you want to check are actually in the array. If you still have trouble picturing how hash maps are used: they implement an associative array (as used everywhere in scripting languages).
Quote: "A Bloom filter could be placed upstream of the hash map and, in the case of a collision (not unlikely with millions of different values), avoid having to search the list in a hash map entry."
Good hash maps scale, i.e. the number of collisions then stays relatively low.
Thank you for your answers! My array has 25 million entries, and although a linear search is quite OK for fewer than 1 million entries if it is programmed efficiently, it is a disaster for several million entries.
for i := 0 to arraysize - 1 do  // arraysize = 25 million
begin
  array1[i] := int64_x;  // any number, for example a random number
  for j := 0 to i - 1 do
    if int64_x = array1[j] then
      found := true;
end;

The computational effort grows quadratically: O(n^2). I let the program run for 24 hours and it got to 8 million. I then tried mjustin's proposal and used Generics.Collections. The TDictionary from the example worked perfectly, and the time to O(n
