postgresql - Is there a logically equivalent and efficient version of this query without using a CTE?


I have a query on a PostgreSQL 9.2 system that takes about 20s in its normal form but only ~120ms when using a CTE.

I simplified both queries for brevity.

Here is the normal form (takes about 20s):

    SELECT *
    FROM tablea
    WHERE (columna = 1 OR columnb = 2) AND
        atype = 35 AND
        aid IN (1, 2, 3)
    ORDER BY modified_at DESC
    LIMIT 25;

Here is the explain for this query: http://explain.depesz.com/s/2v8

The CTE form (about 120ms):

    WITH raw AS (
        SELECT *
        FROM tablea
        WHERE (columna = 1 OR columnb = 2) AND
            atype = 35 AND
            aid IN (1, 2, 3)
    )
    SELECT *
    FROM raw
    ORDER BY modified_at DESC
    LIMIT 25;

Here is the explain for the CTE: http://explain.depesz.com/s/uxy

Simply moving the ORDER BY to the outer part of the query reduces the cost by 99%.

I have two questions: 1) is there a way to construct the first query without using a CTE such that it is logically equivalent and more performant, and 2) why is there a difference in performance, and how is the planner determining how to fetch the data?

Regarding the questions above, are there additional statistics or other planner hints that would improve the performance of the first query?


Edit: Taking away the LIMIT causes the query to use a heap scan as opposed to an index scan backwards. Without the LIMIT the query completes in 40ms.

After seeing the effect of the LIMIT I tried LIMIT 1, LIMIT 2, etc. The query performs in under 100ms when using LIMIT 1 and takes 10s+ with LIMIT > 1.

After thinking about this some more, question 2 boils down to: why does the planner use an index scan backwards in one case and a bitmap heap scan + sort in another, logically equivalent case? And how can I "help" the planner use the efficient plan in both cases?


Update: I accepted Craig's answer because it was comprehensive and helpful. The way I ended up solving the problem was by using a query that is practically equivalent, though not logically equivalent. At the root of the issue was the index scan backwards of the index on modified_at. To tell the planner that this was not a good idea, I added a predicate of the form WHERE modified_at >= now() - interval '1 year'. This included enough data for the application but prevented the planner from going down the backwards index scan path.

This was a lower-impact solution that avoided the need to rewrite the queries using either a sub-query or a CTE. YMMV.
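For illustration, here is a sketch of what the simplified query from the top of the question might look like with that predicate added. Only the form of the predicate is given above; where it sits in the real queries and the one-year window being suitable for other workloads are assumptions.

```sql
-- Sketch only: the simplified query from the question plus the extra
-- range predicate on modified_at. The one-year window is the value
-- mentioned in the update; pick whatever range your application needs.
SELECT *
FROM tablea
WHERE (columna = 1 OR columnb = 2)
  AND atype = 35
  AND aid IN (1, 2, 3)
  AND modified_at >= now() - interval '1 year'
ORDER BY modified_at DESC
LIMIT 25;
```

Note that this changes the result set whenever matching rows are older than the cutoff, which is why it is only practically equivalent.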

Here's why this is happening. The following explanation holds up to at least 9.3 (if you're reading this and are on a newer version, please check to make sure it hasn't changed):

PostgreSQL doesn't optimize across CTE boundaries. Each CTE clause is run in isolation and its results are consumed by other parts of the query. So a query like:

    WITH blah AS (
        SELECT * FROM some_table
    )
    SELECT *
    FROM blah
    WHERE id = 4;

will cause the full inner query to be executed. PostgreSQL won't "push down" the id = 4 qualification into the inner query. CTEs are "optimization fences" in that regard, which can be both good and bad: it lets you override the planner when you want to, but it prevents you from using CTEs as simple syntactic cleanup for a deeply nested FROM subquery chain if you need push-down.

If you rephrase the above as:

    SELECT *
    FROM (SELECT * FROM some_table) AS blah
    WHERE id = 4;

using a sub-query in FROM instead of a CTE, Pg will push the qual down into the subquery and it'll all run nice and quickly.
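One way to check which behaviour your own server has is to compare the plans for the two forms. This is a sketch against the placeholder table some_table used above, assuming it has an id column. On the 9.x versions discussed here I would expect the first plan to show a CTE Scan with the filter applied on top, and the second to show the filter applied directly to some_table. On PostgreSQL 12 and later, a CTE that is non-recursive, free of side effects and referenced only once is inlined by default, and the old fence behaviour has to be requested with AS MATERIALIZED.

```sql
-- CTE form: on 9.x the id = 4 filter is applied after the CTE is computed.
EXPLAIN
WITH blah AS (
    SELECT * FROM some_table
)
SELECT *
FROM blah
WHERE id = 4;

-- Sub-query form: the planner can flatten this and push id = 4 down.
EXPLAIN
SELECT *
FROM (SELECT * FROM some_table) AS blah
WHERE id = 4;

-- PostgreSQL 12+ only: explicitly keep the CTE as an optimization fence.
-- This syntax is a parse error on older versions, so run it separately.
EXPLAIN
WITH blah AS MATERIALIZED (
    SELECT * FROM some_table
)
SELECT *
FROM blah
WHERE id = 4;
```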

As you have discovered, this can also work to your benefit when the query planner makes a poor decision. It appears that in your case a backward index scan of the table is immensely more expensive than a bitmap or index scan of two smaller indexes followed by a filter and sort, but the planner doesn't think it will be, so it plans the query to scan the index.

When you use the CTE, it can't push the ORDER BY into the inner query, so you're overriding its plan and forcing it to use what it thinks is an inferior execution plan, but one that turns out to be much better.

There's a nasty workaround that can be used in these situations called the OFFSET 0 hack, but you should only use it if you can't figure out a way to make the planner do the right thing. If you do have to use it, please boil the problem down to a self-contained test case and report it to the PostgreSQL mailing list as a possible query planner bug.

Instead, I recommend first looking at why the planner is making the wrong decision.
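For reference, the OFFSET 0 hack applied to the simplified query from the question would look roughly like the sketch below. Adding OFFSET 0 to the sub-query stops the planner from flattening it into the outer query, so the ORDER BY and LIMIT cannot be pushed inside. The alias raw is just carried over from the CTE in the question.

```sql
-- Sketch of the OFFSET 0 hack: the sub-query is planned on its own
-- because OFFSET 0 prevents it from being pulled up into the outer query.
SELECT *
FROM (
    SELECT *
    FROM tablea
    WHERE (columna = 1 OR columnb = 2)
      AND atype = 35
      AND aid IN (1, 2, 3)
    OFFSET 0
) AS raw
ORDER BY modified_at DESC
LIMIT 25;
```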

The first candidate is stats / estimates problems, and sure enough, when I look at your problematic query plan there's a factor of 3500 mis-estimation of the expected result rows. That's big, but not impossibly big, though it's more interesting that you actually get only one row where the planner is expecting a non-trivial row set. That doesn't help us much, though; if anything, the row count being lower than expected means that choosing to use the index was a better choice than expected.

The main issue looks like it's not using the smaller, more selective indexes sierra_kilo and papa_lima because it sees the ORDER BY and thinks that it'll save more time by doing a backward index scan and avoiding the sort than it really does. That makes sense given that there's only one matching row to sort! If it got the expected 3500 rows then it might've made more sense to avoid the sort, though that's still a fairly small rowset to just sort in memory.
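If you want to dig into the estimates yourself, something along these lines is a reasonable starting point. It is only a sketch: the column names come from the simplified query in the question, the statistics target of 1000 is an arbitrary illustrative value, and there is no guarantee that better statistics change the plan in this particular case.

```sql
-- Refresh the planner statistics for the table.
ANALYZE tablea;

-- Inspect what the planner currently believes about the filtered columns.
SELECT attname, n_distinct, null_frac, correlation
FROM pg_stats
WHERE tablename = 'tablea'
  AND attname IN ('columna', 'columnb', 'atype', 'aid', 'modified_at');

-- Optionally collect more detailed statistics for one column, then re-analyze.
-- 1000 is an illustrative value, not a recommendation.
ALTER TABLE tablea ALTER COLUMN aid SET STATISTICS 1000;
ANALYZE tablea;
```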

Do you set any parameters like enable_seqscan, etc? If you do, unset them; they're for testing only and totally inappropriate for production use. If you aren't using the enable_ params, I think it's worth raising this on the PostgreSQL mailing list pgsql-performance. The anonymized plans make this a bit difficult, though, since there's no guarantee that the identifiers in one plan refer to the same objects in the other plan, and they don't match what you wrote in the query in the question. You'll want to produce a hand-done version where everything matches up before asking on the mailing list.

There's a fair chance that you'll need to provide the real values for anyone to be able to help. If you don't want to do that on a public mailing list, there's another option available. (I should note that I work for one of them, per my profile.)
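A quick way to check whether any of the enable_ parameters have been changed from their defaults is to look at pg_settings. This is a sketch; a source other than 'default' means the value was set somewhere, such as postgresql.conf, the database, the role, or the session.

```sql
-- List the planner enable_* switches and where each value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name LIKE 'enable%';

-- Put a session-level override back to the configured value.
RESET enable_seqscan;
```

RESET only undoes a change made in the current session; values set in postgresql.conf or with ALTER DATABASE / ALTER ROLE have to be removed there.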

