


Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL















84















I had to write a simple query that looks for people's names that start with a B or a D:



SELECT s.name 
FROM spelers s
WHERE s.name LIKE 'B%' OR s.name LIKE 'D%'
ORDER BY 1


Is there a way to rewrite this so that it performs better, for example by avoiding OR and/or LIKE?



































  • Why are you trying to rewrite? Performance? Neatness? Is s.name indexed?

    – Martin Smith
    Jan 15 '12 at 11:29













  • I want to write for performance, s.name is not indexed.

    – Lucas Kauffman
    Jan 15 '12 at 11:36






  • 8





    Well, as you are searching without leading wildcards and not selecting any additional columns, an index on name could be useful here if you care about performance.

    – Martin Smith
    Jan 15 '12 at 11:39
















postgresql performance index regular-expression pattern-matching














edited Jul 24 '13 at 21:09









Daniel Serodio











asked Jan 15 '12 at 11:24









Lucas Kauffman























8 Answers


















143














Your query is pretty much the optimum. Syntax won't get much shorter, query won't get much faster:



SELECT name
FROM spelers
WHERE name LIKE 'B%' OR name LIKE 'D%'
ORDER BY 1;


If you really want to shorten the syntax, use a regular expression with branches:



...
WHERE name ~ '^(B|D).*'


Or slightly faster, with a character class:



...
WHERE name ~ '^[BD].*'


A quick test without index yields faster results than for SIMILAR TO in either case for me.

With an appropriate B-Tree index in place, LIKE wins this race by orders of magnitude.



Read the basics about pattern matching in the manual.



Index for superior performance



If you are concerned with performance, create an index like this for bigger tables:



CREATE INDEX spelers_name_special_idx ON spelers (name text_pattern_ops);


This makes this kind of query faster by orders of magnitude. Special considerations apply for locale-specific sort order. Read more about operator classes in the manual. If you are using the standard "C" locale (most people don't), a plain index (with the default operator class) will do.



Such an index is only good for left-anchored patterns (matching from the start of the string).
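A minimal sketch of how one might check this, assuming the spelers table and the text_pattern_ops index from above exist. The second pattern is a made-up example, and on a small table the planner may prefer a sequential scan either way:

```sql
-- Left-anchored pattern: the planner can turn this into a range
-- condition on the text_pattern_ops index.
EXPLAIN SELECT name FROM spelers WHERE name LIKE 'B%';

-- Leading wildcard (hypothetical pattern): not left-anchored, so this
-- B-tree index cannot be used to narrow the search.
EXPLAIN SELECT name FROM spelers WHERE name LIKE '%son';
```

Comparing the two plans on your own data shows whether the index is actually chosen.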



SIMILAR TO or regular expressions with basic left-anchored expressions can use this index, too. But not with branches (B|D) or character classes [BD] (at least in my tests on PostgreSQL 9.0).



Trigram matches or text search use special GIN or GiST indexes.



Overview of pattern matching operators




  • LIKE (~~) is simple and fast but limited in its capabilities.
    ILIKE (~~*) is the case-insensitive variant.

    pg_trgm extends index support for both.


  • ~ (regular expression match) is powerful but more complex and may be slow for anything more than basic expressions.


  • SIMILAR TO is just pointless. A peculiar halfbreed of LIKE and regular expressions. I never use it. Explanation below.


  • % is the "similarity" operator, provided by the additional module pg_trgm.


  • @@ is the text search operator. See below.



pg_trgm - trigram matching



Beginning with PostgreSQL 9.1, you can use the extension pg_trgm to provide index support for any LIKE / ILIKE pattern (and, since 9.3, simple regexp patterns with ~) using a GIN or GiST index.



Details, example and links:




  • How is LIKE implemented?


pg_trgm also provides the "similarity" operator %
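A rough sketch of trigram indexing and the similarity operator, assuming the spelers table from the question. The index name and the search strings are hypothetical:

```sql
-- Requires the pg_trgm extension (PostgreSQL 9.1 or later).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Hypothetical index name; gin_trgm_ops is the GIN operator class
-- shipped with pg_trgm.
CREATE INDEX spelers_name_trgm_idx ON spelers USING gin (name gin_trgm_ops);

-- A pattern that is not left-anchored can use the trigram index.
SELECT name FROM spelers WHERE name ILIKE '%auff%';

-- Similarity search with the % operator, best matches first.
SELECT name, similarity(name, 'Kaufman') AS sim
FROM   spelers
WHERE  name % 'Kaufman'
ORDER  BY sim DESC;
```

Trigram indexes tend to be larger and more expensive to maintain than a plain B-tree index, so for purely left-anchored searches the B-tree index above remains the better fit.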



Text search



Text search is a special type of pattern matching with separate infrastructure and index types. It uses dictionaries and stemming and is a great tool to find words in documents, especially for natural languages.



Prefix matching is also supported:




  • Get partial match from GIN indexed TSVECTOR column


As well as phrase search since Postgres 9.6:




  • How to search hyphenated words in PostgreSQL full text search?


Consider the introduction in the manual and the overview of operators and functions.
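A small sketch of both features, assuming the spelers table from the question. The search terms are made up, and the 'simple' configuration is chosen here only because names should not be stemmed:

```sql
-- Prefix matching: ':*' after a lexeme matches any word starting with it.
SELECT name
FROM   spelers
WHERE  to_tsvector('simple', name) @@ to_tsquery('simple', 'kauf:*');

-- Phrase search (PostgreSQL 9.6 or later): the words must be adjacent
-- and in this order.
SELECT to_tsvector('english', 'pattern matching in PostgreSQL')
       @@ phraseto_tsquery('english', 'pattern matching') AS is_match;
```

For larger tables, the first query would normally be backed by a GIN index on the same to_tsvector('simple', name) expression.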



Additional tools for fuzzy string matching



The additional module fuzzystrmatch offers some more options, but performance is generally inferior to all of the above.



In particular, various implementations of the levenshtein() function may be instrumental.
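As an illustration only, assuming the spelers table from the question and a hypothetical search term, a Levenshtein filter might look like this:

```sql
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Names within an edit distance of 2 from the search term, closest first.
-- The function is evaluated for every row, so this does not scale well.
SELECT name, levenshtein(name, 'Kaufman') AS distance
FROM   spelers
WHERE  levenshtein(name, 'Kaufman') <= 2
ORDER  BY distance;
```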



Why are regular expressions (~) always faster than SIMILAR TO?



The answer is simple. SIMILAR TO expressions are rewritten into regular expressions internally. So, for every SIMILAR TO expression, there is at least one faster regular expression (that saves the overhead of rewriting the expression). There is no performance gain in using SIMILAR TO ever.



And simple expressions that can be done with LIKE (~~) are faster with LIKE anyway.



SIMILAR TO is only supported in PostgreSQL because it ended up in early drafts of the SQL standard. They still haven't gotten rid of it. But there are plans to remove it and include regexp matches instead - or so I heard.



EXPLAIN ANALYZE reveals it. Just try with any table yourself!



EXPLAIN ANALYZE SELECT * FROM spelers WHERE name SIMILAR TO 'B%';


Reveals:



...  
Seq Scan on spelers (cost= ...
Filter: (name ~ '^(?:B.*)$'::text)


SIMILAR TO has been rewritten with a regular expression (~).



Ultimate performance for this particular case



But EXPLAIN ANALYZE reveals more. Try, with the afore-mentioned index in place:



EXPLAIN ANALYZE SELECT * FROM spelers WHERE name ~ '^B.*';


Reveals:



...
-> Bitmap Heap Scan on spelers (cost= ...
Filter: (name ~ '^B.*'::text)
-> Bitmap Index Scan on spelers_name_special_idx (cost= ...
Index Cond: ((name ~>=~ 'B'::text) AND (name ~<~ 'C'::text))


Internally, with an index that is not locale-aware (text_pattern_ops or using locale C) simple left-anchored expressions are rewritten with these text pattern operators: ~>=~, ~<=~, ~>~, ~<~. This is the case for ~, ~~ or SIMILAR TO alike.



The same is true for indexes on varchar types with varchar_pattern_ops or char with bpchar_pattern_ops.



So, applied to the original question, this is the fastest possible way:



SELECT name
FROM spelers
WHERE (name ~>=~ 'B' AND name ~<~ 'C')
   OR (name ~>=~ 'D' AND name ~<~ 'E')
ORDER BY 1;


Of course, if you should happen to search for adjacent initials, you can simplify further:



WHERE  name ~>=~ 'B' AND name ~<~ 'D'   -- strings starting with B or C


The gain over plain use of ~ or ~~ is tiny. If performance isn't your paramount requirement, you should just stick with the standard operators - arriving at what you already have in the question.
































  • The OP doesn't have an index on name, but do you happen to know, if they did, whether their original query would involve 2 range seeks and the SIMILAR TO version a scan?

    – Martin Smith
    Jan 15 '12 at 11:43








  • 2





    @MartinSmith: A quick test with EXPLAIN ANALYZE shows 2 bitmap index scans. Multiple bitmap index scans can be combined rather quickly.

    – Erwin Brandstetter
    Jan 15 '12 at 11:46













  • Thanks. So would there be any mileage in replacing the OR with UNION ALL, or replacing name LIKE 'B%' with name >= 'B' AND name < 'C' in Postgres?

    – Martin Smith
    Jan 15 '12 at 11:59






  • 1





    @MartinSmith: UNION won't but, yes, combining the ranges into one WHERE clause will speed up the query. I have added more to my answer. Of course, you have to take your locale into account. Locale-aware search is always slower.

    – Erwin Brandstetter
    Jan 15 '12 at 12:29








  • 2





    @a_horse_with_no_name: I expect not. The new capabilities of pg_trgm with GIN indexes are a treat for generic text search. A search anchored at the start is already faster than that.

    – Erwin Brandstetter
    Jan 17 '12 at 22:44



















11














How about adding a column to the table. Depending on your actual requirements:



person_name_start_with_B_or_D (Boolean)

person_name_start_with_char CHAR(1)

person_name_start_with VARCHAR(30)


PostgreSQL (before version 12, which added stored generated columns) doesn't support computed columns in base tables a la SQL Server, but the new column can be maintained via a trigger. Obviously, this new column would be indexed.



Alternatively, an index on an expression would give you the same, cheaper. E.g.:



CREATE INDEX spelers_name_initial_idx ON spelers (left(name, 1)); 


Queries that match the expression in their conditions can utilize this index.
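A sketch of such a query, assuming the expression index on left(name, 1) shown above exists (left() requires PostgreSQL 9.1 or later):

```sql
-- The condition repeats the indexed expression left(name, 1) verbatim,
-- so the planner can consider the expression index.
SELECT name
FROM   spelers
WHERE  left(name, 1) IN ('B', 'D')
ORDER  BY name;
```

A condition written differently, such as name LIKE 'B%', does not match the expression and would not use this particular index.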



This way, the performance hit is taken when the data is created or amended, so may only be appropriate for a low activity environment (i.e. much fewer writes than reads).







































    8














    You could try



    SELECT s.name
    FROM spelers s
    WHERE s.name SIMILAR TO '(B|D)%'
    ORDER BY s.name


    I've no idea whether or not either the above or your original expression are sargable in Postgres though.



    If you create the suggested index, I would also be interested to hear how this compares with the other options.



    SELECT name
    FROM spelers
    WHERE name >= 'B' AND name < 'C'
    UNION ALL
    SELECT name
    FROM spelers
    WHERE name >= 'D' AND name < 'E'
    ORDER BY name


























    • 1





      It worked and I got a cost of 1.19 where I had 1.25. Thanks !

      – Lucas Kauffman
      Jan 15 '12 at 11:41



















    2














    What I have done in the past, faced with a similar performance issue, is to increment the ASCII character of the last letter, and do a BETWEEN. You then get the best performance, for a subset of the LIKE functionality. Of course, it only works in certain situations, but for ultra-large datasets where you're searching on a name for instance, it makes performance go from abysmal to acceptable.
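One possible reading of this technique, sketched with a hypothetical prefix 'Bra' against the spelers table from the question. A half-open range is used here instead of BETWEEN, because BETWEEN includes both bounds and would also match a name that is exactly 'Brb':

```sql
-- The upper bound is the prefix with its last character incremented
-- ('a' -> 'b'), so the range covers the strings that start with 'Bra'.
SELECT name
FROM   spelers
WHERE  name >= 'Bra' AND name < 'Brb';
```

This is only equivalent to LIKE 'Bra%' when comparisons are made byte by byte, as with the "C" locale. Under other collations the range may include or miss rows.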





































      2














      Very old question, but I found another fast solution to this problem:



      SELECT s.name 
      FROM spelers s
      WHERE ascii(s.name) in (ascii('B'),ascii('D'))
      ORDER BY 1


      This works because the function ascii() looks only at the first character of the string.

























      • 1





        Does this use an index on (name)?

        – ypercubeᵀᴹ
        Nov 25 '17 at 12:56



















      2














      For checking of initials, I often use casting to "char" (with the double quotes). It's not portable, but very fast. Internally, it simply detoasts the text and returns the first character, and "char" comparison operations are very fast because the type is 1-byte fixed length:



      SELECT s.name 
      FROM spelers s
      WHERE s.name::"char" =ANY( ARRAY[ "char" 'B', 'D' ] )
      ORDER BY 1


      Note that casting to "char" is faster than the ascii() solution by @Sole021, but it is not UTF8 compatible (or compatible with any other multi-byte encoding, for that matter). It simply returns the first byte, so it should only be used where the comparison is against plain old 7-bit ASCII characters.







































        1














        There are two methods not mentioned yet for dealing with such cases:





        1. partial (or partitioned - if created for full range manually) index - most useful when only a subset of data is required (for example during some maintenance or temporary for some reporting):



          CREATE INDEX ON spelers (name) WHERE name LIKE 'B%';


        2. partitioning the table itself (using the first character as partitioning key) - this technique is especially worth considering in PostgreSQL 10+ (less painful partitioning) and 11+ (partition pruning during query execution).



        Moreover, if the data in a table is sorted, one can benefit from using BRIN index (over the first character).
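A rough sketch of the second method, under the assumption that a new table is created for it. The table and partition names are hypothetical, and the DEFAULT partition needs PostgreSQL 11 or later:

```sql
-- Declarative list partitioning on the first character (PostgreSQL 10+).
CREATE TABLE spelers_part (
    name text NOT NULL
) PARTITION BY LIST (left(name, 1));

CREATE TABLE spelers_part_b PARTITION OF spelers_part FOR VALUES IN ('B');
CREATE TABLE spelers_part_d PARTITION OF spelers_part FOR VALUES IN ('D');

-- Catch-all partition for every other initial (PostgreSQL 11+).
CREATE TABLE spelers_part_other PARTITION OF spelers_part DEFAULT;

-- Partitions can only be skipped when the condition uses the same
-- expression as the partition key.
SELECT name FROM spelers_part WHERE left(name, 1) IN ('B', 'D');
```

Whether this pays off depends on table size and on how evenly the initials are distributed, so it is worth checking with EXPLAIN on real data.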







































          -4














          Probably faster to do a single character comparison:



          SUBSTR(s.name,1,1)='B' OR SUBSTR(s.name,1,1)='D'


























          • 1





            Not really. column LIKE 'B%' will be more efficient than using substring function on the column.

            – ypercubeᵀᴹ
            Jan 13 '16 at 15:55












          Your Answer








          StackExchange.ready(function() {
          var channelOptions = {
          tags: "".split(" "),
          id: "182"
          };
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function() {
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled) {
          StackExchange.using("snippets", function() {
          createEditor();
          });
          }
          else {
          createEditor();
          }
          });

          function createEditor() {
          StackExchange.prepareEditor({
          heartbeatType: 'answer',
          autoActivateHeartbeat: false,
          convertImagesToLinks: false,
          noModals: true,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          imageUploader: {
          brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
          contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
          allowUrls: true
          },
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          });


          }
          });














          draft saved

          draft discarded


















          StackExchange.ready(
          function () {
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f10694%2fpattern-matching-with-like-similar-to-or-regular-expressions-in-postgresql%23new-answer', 'question_page');
          }
          );

          Post as a guest















          Required, but never shown

























          8 Answers
          8






          active

          oldest

          votes








          8 Answers
          8






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes









          143














          Your query is pretty much the optimum. Syntax won't get much shorter, query won't get much faster:



          SELECT name
          FROM spelers
          WHERE name LIKE 'B%' OR name LIKE 'D%'
          ORDER BY 1;


          If you really want to shorten the syntax, use a regular expression with branches:



          ...
          WHERE name ~ '^(B|D).*'


          Or slightly faster, with a character class:



          ...
          WHERE name ~ '^[BD].*'


          A quick test without index yields faster results than for SIMILAR TO in either case for me.

          With an appropriate B-Tree index in place, LIKE wins this race by orders of magnitude.



          Read the basics about pattern matching in the manual.



          Index for superior performance



          If you are concerned with performance, create an index like this for bigger tables:



          CREATE INDEX spelers_name_special_idx ON spelers (name text_pattern_ops);


          Makes this kind of query faster by orders of magnitude. Special considerations apply for locale-specific sort order. Read more about operator classes in the manual. If you are using the standard "C" locale (most people don't), a plain index (with default operator class) will do.



          Such an index is only good for left-anchored patterns (matching from the start of the string).



          SIMILAR TO or regular expressions with basic left-anchored expressions can use this index, too. But not with branches (B|D) or character classes [BD] (at least in my tests on PostgreSQL 9.0).



          Trigram matches or text search use special GIN or GiST indexes.



          Overview of pattern matching operators




          • LIKE (~~) is simple and fast but limited in its capabilities.
            ILIKE (~~*) the case insensitive variant.

            pg_trgm extends index support for both.


          • ~ (regular expression match) is powerful but more complex and may be slow for anything more than basic expressions.


          • SIMILAR TO is just pointless. A peculiar halfbreed of LIKE and regular expressions. I never use it. Explanation below.


          • % is the "similarity" operator, provided by the additional module pg_trgm.


          • @@ is the text search operator. See below.



          pg_trgm - trigram matching



          Beginning with PostgreSQL 9.1 you can facilitate the extension pg_trgm to provide index support for any LIKE / ILIKE pattern (and simple regexp patterns with ~) using a GIN or GiST index.



          Details, example and links:




          • How is LIKE implemented?


          pg_trgm also provides the "similarity" operator %



          Text search



          Is a special type of pattern matching with separate infrastructure and index types. It uses dictionaries and stemming and is a great tool to find words in documents, especially for natural languages.



          Prefix matching is also supported:




          • Get partial match from GIN indexed TSVECTOR column


          As well as phrase search since Postgres 9.6:




          • How to search hyphenated words in PostgreSQL full text search?


          Consider the introduction in the manual and the overview of operators and functions.



          Additional tools for fuzzy string matching



          The additional module fuzzystrmatch offers some more options, but performance is generally inferior to all of the above.



          In particular, various implementations of the levenshtein() function may be instrumental.



          Why are regular expressions (~) always faster than SIMILAR TO?



          The answer is simple. SIMILAR TO expressions are rewritten into regular expressions internally. So, for every SIMILAR TO expression, there is at least one faster regular expression (that saves the overhead of rewriting the expression). There is no performance gain in using SIMILAR TO ever.



          And simple expressions that can be done with LIKE (~~) are faster with LIKE anyway.



          SIMILAR TO is only supported in PostgreSQL because it ended up in early drafts of the SQL standard. They still haven't gotten rid of it. But there are plans to remove it and include regexp matches instead - or so I heard.



          EXPLAIN ANALYZE reveals it. Just try with any table yourself!



          EXPLAIN ANALYZE SELECT * FROM spelers WHERE name SIMILAR TO 'B%';


          Reveals:



          ...  
          Seq Scan on spelers (cost= ...
          Filter: (name ~ '^(?:B.*)$'::text)


          SIMILAR TO has been rewritten with a regular expression (~).



          Ultimate performance for this particular case



          But EXPLAIN ANALYZE reveals more. Try, with the afore-mentioned index in place:



          EXPLAIN ANALYZE SELECT * FROM spelers WHERE name ~ '^B.*;


          Reveals:



          ...
          -> Bitmap Heap Scan on spelers (cost= ...
          Filter: (name ~ '^B.*'::text)
          -> Bitmap Index Scan on spelers_name_text_pattern_ops_idx (cost= ...
          Index Cond: ((prod ~>=~ 'B'::text) AND (prod ~<~ 'C'::text))


          Internally, with an index that is not locale-aware (text_pattern_ops or using locale C) simple left-anchored expressions are rewritten with these text pattern operators: ~>=~, ~<=~, ~>~, ~<~. This is the case for ~, ~~ or SIMILAR TO alike.



          The same is true for indexes on varchar types with varchar_pattern_ops or char with bpchar_pattern_ops.



          So, applied to the original question, this is the fastest possible way:



          SELECT name
          FROM spelers
          WHERE name ~>=~ 'B' AND name ~<~ 'C'
          OR name ~>=~ 'D' AND name ~<~ 'E'
          ORDER BY 1;


          Of course, if you should happen to search for adjacent initials, you can simplify further:



          WHERE  name ~>=~ 'B' AND name ~<~ 'D'   -- strings starting with B or C


          The gain over plain use of ~ or ~~ is tiny. If performance isn't your paramount requirement, you should just stick with the standard operators - arriving at what you already have in the question.






          share|improve this answer


























          • The OP doesn't have an index on name but do you happen to know, if they did, would their original query involve 2 range seeks and similar a scan?

            – Martin Smith
            Jan 15 '12 at 11:43








          • 2





            @MartinSmith: A quick test with EXPLAIN ANALYZE shows 2 bitmap index scans. Multiple bitmap index scans can be combined rather quickly.

            – Erwin Brandstetter
            Jan 15 '12 at 11:46













          • Thanks. So would there be any milage with replacing the OR with UNION ALL or replacing name LIKE 'B%' with name >= 'B' AND name <'C' in Postgres?

            – Martin Smith
            Jan 15 '12 at 11:59






          • 1





            @MartinSmith: UNION won't but, yes, combining the ranges into one WHERE clause will speed up the query. I have added more to my answer. Of course, you have to take your locale into account. Locale-aware search is always slower.

            – Erwin Brandstetter
            Jan 15 '12 at 12:29








          • 2





            @a_horse_with_no_name: I expect not. The new capabilities of pg_tgrm with GIN indexes are a treat for generic text search. A search anchored at the start is already faster than that.

            – Erwin Brandstetter
            Jan 17 '12 at 22:44
















          143














          Your query is pretty much the optimum. Syntax won't get much shorter, query won't get much faster:



          SELECT name
          FROM spelers
          WHERE name LIKE 'B%' OR name LIKE 'D%'
          ORDER BY 1;


          If you really want to shorten the syntax, use a regular expression with branches:



          ...
          WHERE name ~ '^(B|D).*'


          Or slightly faster, with a character class:



          ...
          WHERE name ~ '^[BD].*'


          A quick test without index yields faster results than for SIMILAR TO in either case for me.

          With an appropriate B-Tree index in place, LIKE wins this race by orders of magnitude.



          Read the basics about pattern matching in the manual.



          Index for superior performance



          If you are concerned with performance, create an index like this for bigger tables:



          CREATE INDEX spelers_name_special_idx ON spelers (name text_pattern_ops);


          Makes this kind of query faster by orders of magnitude. Special considerations apply for locale-specific sort order. Read more about operator classes in the manual. If you are using the standard "C" locale (most people don't), a plain index (with default operator class) will do.



          Such an index is only good for left-anchored patterns (matching from the start of the string).



          SIMILAR TO or regular expressions with basic left-anchored expressions can use this index, too. But not with branches (B|D) or character classes [BD] (at least in my tests on PostgreSQL 9.0).



          Trigram matches or text search use special GIN or GiST indexes.



          Overview of pattern matching operators




          • LIKE (~~) is simple and fast but limited in its capabilities.
            ILIKE (~~*) the case insensitive variant.

            pg_trgm extends index support for both.


          • ~ (regular expression match) is powerful but more complex and may be slow for anything more than basic expressions.


          • SIMILAR TO is just pointless. A peculiar halfbreed of LIKE and regular expressions. I never use it. Explanation below.


          • % is the "similarity" operator, provided by the additional module pg_trgm.


          • @@ is the text search operator. See below.



          pg_trgm - trigram matching



          Beginning with PostgreSQL 9.1 you can facilitate the extension pg_trgm to provide index support for any LIKE / ILIKE pattern (and simple regexp patterns with ~) using a GIN or GiST index.



          Details, example and links:




          • How is LIKE implemented?


          pg_trgm also provides the "similarity" operator %



          Text search



          Is a special type of pattern matching with separate infrastructure and index types. It uses dictionaries and stemming and is a great tool to find words in documents, especially for natural languages.



          Prefix matching is also supported:




          • Get partial match from GIN indexed TSVECTOR column


          As well as phrase search since Postgres 9.6:




          • How to search hyphenated words in PostgreSQL full text search?


          Consider the introduction in the manual and the overview of operators and functions.



          Additional tools for fuzzy string matching



          The additional module fuzzystrmatch offers some more options, but performance is generally inferior to all of the above.



          In particular, various implementations of the levenshtein() function may be instrumental.
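
          A short sketch of levenshtein() in use, assuming the fuzzystrmatch extension is installed and the table spelers(name text) from the question exists. The search term 'Bakker' and the distance limit of 2 are arbitrary choices for illustration:

```sql
-- Assumption: fuzzystrmatch is available on the server.
CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;

-- Assumption: stand-in for the table from the question, created only if missing.
CREATE TABLE IF NOT EXISTS spelers (name text);

-- Edit distance between two strings.
SELECT levenshtein('kitten', 'sitting');  -- 3

-- Names within two edits of a search term, closest first.
-- The function is evaluated for every row, so this does not scale like an index lookup.
SELECT name, levenshtein(name, 'Bakker') AS dist
FROM   spelers
WHERE  levenshtein(name, 'Bakker') <= 2
ORDER  BY dist, name;

-- levenshtein_less_equal() can stop early once the distance exceeds the given limit.
SELECT name
FROM   spelers
WHERE  levenshtein_less_equal(name, 'Bakker', 2) <= 2;
```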



          Why are regular expressions (~) always faster than SIMILAR TO?



          The answer is simple. SIMILAR TO expressions are rewritten into regular expressions internally. So, for every SIMILAR TO expression, there is at least one faster regular expression (that saves the overhead of rewriting the expression). There is no performance gain in using SIMILAR TO ever.



          And simple expressions that can be done with LIKE (~~) are faster with LIKE anyway.



          SIMILAR TO is only supported in PostgreSQL because it ended up in early drafts of the SQL standard. They still haven't gotten rid of it. But there are plans to remove it and include regexp matches instead - or so I heard.



          EXPLAIN ANALYZE reveals it. Just try with any table yourself!



          EXPLAIN ANALYZE SELECT * FROM spelers WHERE name SIMILAR TO 'B%';


          Reveals:



          ...  
          Seq Scan on spelers (cost= ...
          Filter: (name ~ '^(?:B.*)$'::text)


          SIMILAR TO has been rewritten with a regular expression (~).



          Ultimate performance for this particular case



          But EXPLAIN ANALYZE reveals more. Try, with the aforementioned index in place:



          EXPLAIN ANALYZE SELECT * FROM spelers WHERE name ~ '^B.*';


          Reveals:



          ...
          -> Bitmap Heap Scan on spelers (cost= ...
          Filter: (name ~ '^B.*'::text)
          -> Bitmap Index Scan on spelers_name_text_pattern_ops_idx (cost= ...
          Index Cond: ((prod ~>=~ 'B'::text) AND (prod ~<~ 'C'::text))


          Internally, with an index that is not locale-aware (text_pattern_ops or using locale C) simple left-anchored expressions are rewritten with these text pattern operators: ~>=~, ~<=~, ~>~, ~<~. This is the case for ~, ~~ or SIMILAR TO alike.



          The same is true for indexes on varchar types with varchar_pattern_ops or char with bpchar_pattern_ops.



          So, applied to the original question, this is the fastest possible way:



          SELECT name
          FROM spelers
          WHERE (name ~>=~ 'B' AND name ~<~ 'C')
          OR    (name ~>=~ 'D' AND name ~<~ 'E')
          ORDER BY 1;


          Of course, if you should happen to search for adjacent initials, you can simplify further:



          WHERE  name ~>=~ 'B' AND name ~<~ 'D'   -- strings starting with B or C


          The gain over plain use of ~ or ~~ is tiny. If performance isn't your paramount requirement, you should just stick with the standard operators - arriving at what you already have in the question.






          • The OP doesn't have an index on name, but do you happen to know: if they did, would their original query involve 2 range seeks and the SIMILAR TO version a scan?

            – Martin Smith
            Jan 15 '12 at 11:43








          • 2





            @MartinSmith: A quick test with EXPLAIN ANALYZE shows 2 bitmap index scans. Multiple bitmap index scans can be combined rather quickly.

            – Erwin Brandstetter
            Jan 15 '12 at 11:46













          • Thanks. So would there be any mileage in replacing the OR with UNION ALL or replacing name LIKE 'B%' with name >= 'B' AND name < 'C' in Postgres?

            – Martin Smith
            Jan 15 '12 at 11:59






          • 1





            @MartinSmith: UNION won't but, yes, combining the ranges into one WHERE clause will speed up the query. I have added more to my answer. Of course, you have to take your locale into account. Locale-aware search is always slower.

            – Erwin Brandstetter
            Jan 15 '12 at 12:29








          • 2





            @a_horse_with_no_name: I expect not. The new capabilities of pg_trgm with GIN indexes are a treat for generic text search. A search anchored at the start is already faster than that.

            – Erwin Brandstetter
            Jan 17 '12 at 22:44






















          edited 3 mins ago

























          answered Jan 15 '12 at 11:38









          Erwin Brandstetter


























          11














          How about adding a column to the table? Depending on your actual requirements:



          person_name_start_with_B_or_D (Boolean)

          person_name_start_with_char CHAR(1)

          person_name_start_with VARCHAR(30)


          PostgreSQL doesn't support computed columns in base tables a la SQL Server but the new column can be maintained via trigger. Obviously, this new column would be indexed.
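
          A minimal sketch of the trigger-maintained variant, assuming the table spelers(name text) from the question. The column, function, trigger and index names are hypothetical:

```sql
-- Assumption: stand-in for the table from the question, created only if missing.
CREATE TABLE IF NOT EXISTS spelers (name text);

-- Hypothetical helper column holding the first character of name.
ALTER TABLE spelers ADD COLUMN name_initial char(1);

-- Trigger function that keeps the helper column in sync with name.
CREATE OR REPLACE FUNCTION spelers_set_name_initial()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.name_initial := left(NEW.name, 1);
   RETURN NEW;
END
$func$;

CREATE TRIGGER spelers_name_initial_trg
BEFORE INSERT OR UPDATE OF name ON spelers
FOR EACH ROW EXECUTE PROCEDURE spelers_set_name_initial();

-- Backfill rows that existed before the trigger was created.
UPDATE spelers SET name_initial = left(name, 1);

-- Index the helper column and query it directly.
CREATE INDEX spelers_name_initial_col_idx ON spelers (name_initial);

SELECT name
FROM   spelers
WHERE  name_initial IN ('B', 'D')
ORDER  BY name;
```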



          Alternatively, an index on an expression would give you the same, cheaper. E.g.:



          CREATE INDEX spelers_name_initial_idx ON spelers (left(name, 1)); 


          Queries that match the expression in their conditions can utilize this index.
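
          For illustration, a query shaped to match the indexed expression might look like this. It assumes the index spelers_name_initial_idx on left(name, 1) has been created as shown; whether the planner actually uses it depends on table size and statistics:

```sql
-- The WHERE clause repeats the indexed expression left(name, 1) verbatim,
-- so the planner can consider the expression index.
SELECT name
FROM   spelers
WHERE  left(name, 1) IN ('B', 'D')
ORDER  BY name;

-- Check the chosen plan on your own data.
EXPLAIN
SELECT name
FROM   spelers
WHERE  left(name, 1) IN ('B', 'D');
```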



          This way, the performance hit is taken when the data is created or amended, so this may only be appropriate for a low-activity environment (i.e. far fewer writes than reads).














              edited Jan 16 '12 at 14:28









              Erwin Brandstetter











              answered Jan 16 '12 at 8:40









              onedaywhen























                  8














                  You could try



                  SELECT s.name
                  FROM spelers s
                  WHERE s.name SIMILAR TO '(B|D)%'
                  ORDER BY s.name


                  I have no idea whether either the above or your original expression is sargable in Postgres, though.



                  If you create the suggested index, I would also be interested to hear how this compares with the other options.



                  SELECT name
                  FROM spelers
                  WHERE name >= 'B' AND name < 'C'
                  UNION ALL
                  SELECT name
                  FROM spelers
                  WHERE name >= 'D' AND name < 'E'
                  ORDER BY name


























                  • 1





                    It worked and I got a cost of 1.19 where I had 1.25. Thanks !

                    – Lucas Kauffman
                    Jan 15 '12 at 11:41
























                  edited Jan 15 '12 at 12:06

























                  answered Jan 15 '12 at 11:37









                  Martin Smith



















                  2














                  What I have done in the past, faced with a similar performance issue, is to increment the ASCII character of the last letter, and do a BETWEEN. You then get the best performance, for a subset of the LIKE functionality. Of course, it only works in certain situations, but for ultra-large datasets where you're searching on a name for instance, it makes performance go from abysmal to acceptable.
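
                  A sketch of that idea, under a few assumptions: the table spelers(name text) from the question, a hypothetical prefix 'Bra', and a collation where byte order matches the prefix semantics (for example the C locale). A half-open range is used here instead of BETWEEN, because BETWEEN includes the upper bound itself:

```sql
-- Assumption: stand-in for the table from the question, created only if missing.
CREATE TABLE IF NOT EXISTS spelers (name text);

-- Prefix 'Bra': the upper bound is the prefix with its last character incremented.
SELECT name
FROM   spelers
WHERE  name >= 'Bra'
AND    name <  'Brb'
ORDER  BY name;

-- Computing the upper bound from an arbitrary prefix p:
-- all but the last character, plus the last character's code point + 1.
SELECT left(p, -1) || chr(ascii(right(p, 1)) + 1) AS upper_bound
FROM  (SELECT 'Bra'::text AS p) AS sub;  -- Brb
```

                  With a locale-aware collation the plain comparison operators may not line up with LIKE 'Bra%' for every string; the text pattern operators (~>=~, ~<~) discussed in the accepted answer avoid that.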
















                      answered Jan 19 '12 at 18:23









                      Mel Padden























                          2














                          Very old question, but I found another fast solution to this problem:



                          SELECT s.name
                          FROM spelers s
                          WHERE ascii(s.name) IN (ascii('B'), ascii('D'))
                          ORDER BY 1;


                          This works because the function ascii() looks only at the first character of the string.
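A plain b-tree index on (name) cannot, as far as I know, serve a predicate written on ascii(name), but an expression index can. The following is only a sketch: the index name spelers_name_ascii_idx is made up, and on a small table the planner may still prefer a sequential scan.

```sql
-- Expression index on the code of the first character (index name is hypothetical).
CREATE INDEX spelers_name_ascii_idx ON spelers (ascii(name));

-- The predicate has to use the same expression for the planner to consider the index.
EXPLAIN
SELECT s.name
FROM spelers s
WHERE ascii(s.name) IN (ascii('B'), ascii('D'))
ORDER BY 1;
```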

























                          • 1





                            Does this use an index on (name)?

                            – ypercubeᵀᴹ
                            Nov 25 '17 at 12:56


























                          answered Nov 25 '17 at 12:55









                          Sole021



















                          2














                          For checking initials, I often cast to "char" (with the double quotes). It's not portable, but it is very fast. Internally, it simply detoasts the text and returns the first byte, and "char" comparisons are very fast because the type has a fixed length of 1 byte:



                          SELECT s.name
                          FROM spelers s
                          WHERE s.name::"char" = ANY (ARRAY["char" 'B', 'D'])
                          ORDER BY 1;


                          Note that casting to "char" is faster than the ascii() solution by @Sole021, but it is not UTF-8 compatible (nor compatible with any other encoding, for that matter): it simply returns the first byte, so it should only be used when the comparison is against plain old 7-bit ASCII characters.
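To check the speed claim against your own data, one option is to compare the cast with the corresponding left-anchored LIKE using EXPLAIN ANALYZE. This is only a sketch against the spelers table from the question; the timings depend on table size, caching and whatever indexes exist.

```sql
-- Cast-based test of the first byte.
EXPLAIN ANALYZE
SELECT s.name
FROM spelers s
WHERE s.name::"char" = ANY (ARRAY["char" 'B', 'D']);

-- Corresponding left-anchored LIKE test, for comparison.
EXPLAIN ANALYZE
SELECT s.name
FROM spelers s
WHERE s.name LIKE ANY (ARRAY['B%', 'D%']);
```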














                              edited Apr 10 '18 at 1:14

























                              answered Apr 10 '18 at 1:01









                              Ziggy Crueltyfree Zeitgeister























                                  1














                                  There are two methods not mentioned yet for dealing with such cases:





                                  1. A partial index (or a "partitioned" one, if partial indexes are created manually to cover the full range). This is most useful when only a subset of the data is required (for example, during maintenance or temporarily for reporting):



                                    CREATE INDEX ON spelers (name) WHERE name LIKE 'B%';


                                  2. Partitioning the table itself (using the first character as the partitioning key). This technique is especially worth considering in PostgreSQL 10+ (less painful partitioning) and 11+ (partition pruning during query execution).



                                  Moreover, if the data in the table is physically sorted, one can benefit from a BRIN index (over the first character).
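A rough sketch of the second method and of the BRIN idea, using declarative partitioning. The table spelers_part and its partition names are hypothetical, the column list is reduced to name for brevity, and the DEFAULT partition requires PostgreSQL 11 or later. For pruning to apply, the query presumably has to filter on the same expression as the partition key.

```sql
-- Hypothetical partitioned variant of the table, keyed on the first character.
CREATE TABLE spelers_part (
    name text NOT NULL
) PARTITION BY LIST ((left(name, 1)));

CREATE TABLE spelers_part_b PARTITION OF spelers_part FOR VALUES IN ('B');
CREATE TABLE spelers_part_d PARTITION OF spelers_part FOR VALUES IN ('D');
-- Catch-all for every other initial (DEFAULT partitions need PostgreSQL 11+).
CREATE TABLE spelers_part_other PARTITION OF spelers_part DEFAULT;

-- Filter on the partition key expression so the planner can prune partitions.
SELECT s.name
FROM spelers_part s
WHERE left(s.name, 1) IN ('B', 'D')
ORDER BY 1;

-- BRIN index over the first character of the original, physically sorted table.
CREATE INDEX ON spelers USING brin (left(name, 1));
```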














                                      edited May 18 '18 at 23:28

























                                      answered May 18 '18 at 23:16









                                      Tomasz Pala























                                          -4














                                          It is probably faster to do a single-character comparison:



                                          SUBSTR(s.name, 1, 1) = 'B' OR SUBSTR(s.name, 1, 1) = 'D'


























                                          • 1





                                            Not really. column LIKE 'B%' will be more efficient than using substring function on the column.

                                            – ypercubeᵀᴹ
                                            Jan 13 '16 at 15:55
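To illustrate the comment: a left-anchored LIKE can use a b-tree index on the column itself, whereas wrapping the column in SUBSTR() hides it from such an index. A sketch, assuming a database with a non-C locale (hence the text_pattern_ops operator class) and a hypothetical index name:

```sql
-- b-tree index usable for left-anchored patterns regardless of locale
-- (index name is made up).
CREATE INDEX spelers_name_pattern_idx ON spelers (name text_pattern_ops);

SELECT s.name
FROM spelers s
WHERE s.name LIKE 'B%'
   OR s.name LIKE 'D%'
ORDER BY 1;
```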
























                                          edited Jan 13 '16 at 15:54









                                          ypercubeᵀᴹ










                                          answered Jan 13 '16 at 15:13









                                          user2653985

























