Wednesday, May 28, 2014



Is there anybody out there who has not scratched their head when it comes to indexing? Once in a while we all ask ourselves how to index a very large table with dozens of columns. This is not PostgreSQL-specific; it will happen in any kind of database system.
The natural question which arises is: Which columns REALLY need an index? In many cases programmers will know which ones are searched most often, but what if the combinations used in queries are totally random? What if filtering on the 10th column is as likely as filtering on the 15th or the 29th? Indexing ALL columns is usually not an option: it is far too expensive, and write performance would degrade far too much. This is true not just for PostgreSQL but for any other relational database out there.
Over the weekend I had some time to take a closer look at Oleg Bartunov's "bloom" filter implementation. The bloom filter package was written some years ago but was never ported to a recent version of PostgreSQL. Fortunately it was not too hard to teach the module some PostgreSQL 9.2, so I gave it a try. As far as performance and space are concerned, the results are impressive.
How does it work? Internally a bloom filter is based on a bit array which is calculated for a set of columns inside a row. It allows fast and efficient pre-filtering of data. The main advantage is that the index itself is really small.
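To make the idea more tangible, here is a minimal Python sketch of a per-row bloom signature. It is not the code of the PostgreSQL module; the signature length, the number of bits per column, the SHA-256 based hashing and all function names are assumptions chosen purely for illustration.

```python
import hashlib

SIG_BITS = 80      # signature length in bits (assumed value)
BITS_PER_COL = 2   # bits set per column value (assumed value)


def column_bits(col, value):
    """Return the bit mask for one (column, value) pair."""
    mask = 0
    for seed in range(BITS_PER_COL):
        digest = hashlib.sha256(f"{seed}:{col}:{value}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % SIG_BITS)
    return mask


def signature(row):
    """OR together the bit masks of all columns present in `row`."""
    sig = 0
    for col, value in row.items():
        sig |= column_bits(col, value)
    return sig


def search(rows, signatures, condition):
    """Pre-filter by signature, then recheck candidates against the real row."""
    wanted = signature(condition)
    # A row can only match if every bit of the query signature is set.
    candidates = [i for i, sig in enumerate(signatures)
                  if (sig & wanted) == wanted]
    # Signatures can collide, so candidates must be rechecked.
    matches = [rows[i] for i in candidates
               if all(rows[i][c] == v for c, v in condition.items())]
    return matches, len(candidates) - len(matches)


# Hypothetical test data: three integer columns a, b and c.
rows = [{"a": i, "b": i % 50, "c": i % 30} for i in range(1000)]
signatures = [signature(r) for r in rows]

# Any combination of columns can be searched with the same signatures.
matches, false_positives = search(rows, signatures, {"c": 20, "b": 15})
```

The signature check can produce false positives but never false negatives, which is why the candidates are compared with the real row afterwards. The same set of signatures serves any combination of columns, because a query only tests the bits belonging to the columns it filters on.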
The index is then used to fetch data for any kind of combination:

    SET enable_seqscan TO off;
    EXPLAIN ANALYZE SELECT * FROM tab WHERE c=20 AND b=15;

                                    QUERY PLAN
    ---------------------------------------------------------------------------
     Bitmap Heap Scan on tab  (cost=0.00..4.02 rows=1 width=12) (actual time=0.972..0.972 rows=0 loops=1)
       Recheck Cond: ((c = 20) AND (b = 15))
       Rows Removed by Index Recheck: 1
       ->  Bitmap Index Scan on bloomidx  (cost=0.00..0.00 rows=1 width=0) (actual time=0.963..0.963 rows=1 loops=1)
             Index Cond: ((c = 20) AND (b = 15))
     Total runtime: 1.056 ms
    (6 rows)
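Note that the index is used even though there is no filter on the "a" column. With a multi-column btree this would generally not be the case, because a btree is only efficient when the leading column is part of the filter. A bloom filter has no such restriction, which makes a lot of sense if you have many different columns.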
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is a consultant and CEO of the company "Cybertec Schönig & Schönig GmbH" (www.postgresq-support.de), which has served countless customers around the globe.