Хабр Курсы для всех
РЕКЛАМА
Практикум, Хекслет, SkyPro, авторские курсы — собрали всех и попросили скидки. Осталось выбрать!
Nested Loop (cost=2.42..95.38 rows=11 width=5) (actual time=4.633..11.737 rows=5 loops=1) Buffers: shared hit=26 read=23 CTE r -> Recursive Union (cost=0.29..2.42 rows=11 width=48) (actual time=4.184..10.575 rows=5 loops=1) Buffers: shared hit=10 read=19 CTE b -> Result (cost=0.28..0.29 rows=1 width=0) (actual time=2.340..2.340 rows=1 loops=1) Buffers: shared hit=2 read=9 InitPlan 1 (returns $1) -> Limit (cost=0.19..0.24 rows=1 width=4) (actual time=1.121..1.122 rows=1 loops=1) Buffers: shared hit=2 read=4 -> Index Scan Backward using ttbl_pkey on ttbl t (cost=0.00..47048.93 rows=1000990 width=4) (actual time=0.830..1.116 rows=5 loops=1) Filter: (is_active AND (id <> ALL ('{1,3,10,89,99,22,24,25,28,30}'::integer[]))) Buffers: shared hit=2 read=4 InitPlan 2 (returns $2) -> Limit (cost=0.00..0.05 rows=1 width=4) (actual time=1.207..1.207 rows=1 loops=1) Buffers: shared read=5 -> Index Scan using ttbl_pkey on ttbl t (cost=0.00..49551.43 rows=1000990 width=4) (actual time=1.205..1.205 rows=1 loops=1) Index Cond: (id IS NOT NULL) Filter: (is_active AND (id <> ALL ('{1,3,10,89,99,22,24,25,28,30}'::integer[]))) Buffers: shared read=5 -> Subquery Scan on "*SELECT* 1" (cost=0.00..0.18 rows=1 width=12) (actual time=4.179..4.181 rows=1 loops=1) Buffers: shared hit=3 read=15 -> Limit (cost=0.00..0.17 rows=1 width=12) (actual time=4.178..4.179 rows=1 loops=1) Buffers: shared hit=3 read=15 -> Nested Loop (cost=0.00..55313.90 rows=333663 width=12) (actual time=4.175..4.175 rows=1 loops=1) Join Filter: (t.id >= (b.min + ((((b.max - b.min))::double precision * random()))::integer)) Buffers: shared hit=3 read=15 -> CTE Scan on b (cost=0.00..0.02 rows=1 width=8) (actual time=2.342..2.342 rows=1 loops=1) Buffers: shared hit=2 read=9 -> Seq Scan on ttbl t (cost=0.00..26952.50 rows=1000990 width=4) (actual time=0.015..0.907 rows=1368 loops=1) Filter: (is_active AND (id <> ALL ('{1,3,10,89,99,22,24,25,28,30}'::integer[]))) Buffers: shared hit=1 read=6 -> Limit (cost=0.00..0.17 rows=1 width=48) (actual time=1.273..1.274 rows=1 loops=5) Buffers: shared hit=7 read=4 -> Nested Loop (cost=0.00..173819.35 rows=1000980 width=48) (actual time=1.272..1.272 rows=1 loops=5) Join Filter: ((t.id <> ALL (r.a)) AND (t.id > (r.min + ((((r.max - r.min))::double precision * random()))::integer))) Buffers: shared hit=7 read=4 -> WorkTable Scan on r (cost=0.00..0.25 rows=3 width=44) (actual time=0.005..0.005 rows=1 loops=5) Filter: ((n + 1) < 5) -> Materialize (cost=0.00..35868.45 rows=1000990 width=4) (actual time=0.007..0.740 rows=1012 loops=4) Buffers: shared hit=7 read=4 -> Seq Scan on ttbl t (cost=0.00..26952.50 rows=1000990 width=4) (actual time=0.019..1.442 rows=2335 loops=1) Filter: (is_active AND (id <> ALL ('{1,3,10,89,99,22,24,25,28,30}'::integer[]))) Buffers: shared hit=7 read=4 -> CTE Scan on r (cost=0.00..0.22 rows=11 width=4) (actual time=4.189..10.591 rows=5 loops=1) Buffers: shared hit=10 read=19 -> Index Scan using ttbl_pkey on ttbl t (cost=0.00..8.42 rows=1 width=5) (actual time=0.221..0.223 rows=1 loops=5) Index Cond: (id = r.id) Buffers: shared hit=16 read=4 Total runtime: 12.185 ms
SELECT FirstName, LastName
FROM Person
TABLESAMPLE (100 ROWS) ;
TABLESAMPLE (10 PERCENT)
SELECT *
FROM ttbl
WHERE id not in(1, 3, 10, 89, 99, 22, 24, 25, 28, 30)
OFFSET (random()* 1000)::int
LIMIT 1@GrabConfig(systemClassLoader=true)
@Grab('postgresql:postgresql:9.0-801.jdbc4')
import groovy.sql.Sql
def sql = Sql.newInstance('jdbc:postgresql://localhost/test', 'postgres', '123', 'org.postgresql.Driver')
def random = new Random()
def t0 = System.currentTimeMillis()
def v = []
def (min, max) = sql.firstRow("select min(id), max(id) from ttbl")
def nums = (0..100).collect { (min + random.nextInt(max - min + 1)) as String }
def strNums = nums.join(',')
sql.eachRow("select id from ttbl where id in ($strNums)".toString()) {
v += it.id
}
println v
def t1 = System.currentTimeMillis()
println "Time: ${t1 - t0}"
Вот эта версия работает на совершенно любом распределении и работает корректно.
Но на сильно разреженных таблицах конечно не быстрая:
тело запросаWITH RECURSIVE range AS ( SELECT min(id) AS min, max(id) AS max FROM table1 ), r AS ( SELECT array[]::integer[] AS res UNION ALL SELECT CASE WHEN (id IS NULL) THEN res ELSE res||id END FROM (SELECT (min + (max - min) * random())::int AS rnd,res FROM r JOIN range ON true) AS _r LEFT JOIN table1 ON id=rnd AND id <> all(res) WHERE id IS NULL OR coalesce(array_length(res,1),0)<=10 ) SELECT unnest(res) FROM ( SELECT res FROM r ORDER BY array_length(res,1) DESC NULLS LAST LIMIT 1 ) AS t;
WITH RECURSIVE
minID AS (SELECT min(id) minId FROM table1),
maxID AS (SELECT max(id) maxId FROM table1),
r as (
select first 1 id, minId, maxId, cast('~'||id||'~' as varchar(100)) a, 0 n from table1, minID, maxId
where id > minId + (maxId - minId) * rand()
union all
select first 1 m.id, minId, maxId, a||m.id||'~', n + 1 from table1 m, r
where m.id > minId + (maxId - minId) * rand() and r.a not containing '~'||m.id||'~' and r.n + 1 < 3
)
SELECT t.id FROM table1 t, r WHERE r.id = t.id;
// Сгенерировать $number случайных чисел от 1 до $count (количеств записей в таблице)
$random = array();
while (count($random) < $number ) {
$r = rand(1, $count);
if(array_search($r, $random) === false) $random[] = $r;
}
$random = implode(',', $random);
SET @r=0;
SELECT *, @r:=@r+1 AS r FROM table1 HAVING r IN ($random)
WHERE id=(SELECT (min+range*random())::int) AND NOT id=ANY(res))WHERE id IN (SELECT (min+range*random())::int FROM generate_series(1,1000)) AND NOT id=ANY(res))WITH RECURSIVE
r AS (
SELECT array[]::integer[] AS res,min(id) AS min, max(id)-min(id) AS range FROM table1
UNION ALL
SELECT res||ARRAY(SELECT id FROM table1 WHERE id IN (SELECT (min+range*random())::int FROM generate_series(1,1000)) AND NOT id=ANY(res)), min, range
FROM r
WHERE
coalesce(array_length(res,1),0)<1000
)
SELECT unnest(res) FROM (
SELECT res FROM r ORDER BY array_length(res,1) DESC NULLS LAST LIMIT 1
) AS t LIMIT 1000;
SELECT res||ARRAY(SELECT id FROM table1 WHERE id IN (SELECT (min+range*random())::int FROM generate_series(1,1000)) AND NOT id=ANY(res)), min, range
FROM r
WHERE coalesce(array_length(res,1),0)<1000
… версия которая на коротких списках работает приблизительно так же как и предыдущая а вот на списках в 1000 случайных работает в 3 раза быстрее и главное требует на 3 порядка меньше памяти для работы (предыдущая версия требовала на 1000 случайных значений при 0.01 заполнении больше 200MB текущая в 1MB влезает и работает за 150ms)
Postgres. Выборка N случайных записей