我有以下查询,我想从数据库中获取 100 个项目,但 host_id
多次出现在 urls
表中,我想每个 host_id
从该表中最多获取 10 个唯一行。
select *
from urls
join hosts using(host_id)
Ở đâu
(
last_run_date is null
or last_run_date <= date_sub(curdate(), interval 30 day)
)
and ignore_url != 1
limit 100
所以,我想:
- 最大结果 = 100
- 每个主机的最大行数 = 10
我不确定我需要做什么才能完成这项任务。有没有办法在没有子查询的情况下做到这一点?
主机表
CREATE TABLE `hosts` (
`host_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`host` VARCHAR(50) NOT NULL,
`last_fetched` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
`ignore_host` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`host_id`),
UNIQUE INDEX `host` (`host`)
)
网址表
CREATE TABLE `urls` (
`url_id` INT(10) UNSIGNED NOT NULL AUTO_INCREMENT,
`parent_url_id` INT(10) UNSIGNED NOT NULL,
`scheme` VARCHAR(5) NOT NULL,
`host_id` INT(10) UNSIGNED NOT NULL,
`path` VARCHAR(500) NOT NULL,
`query` VARCHAR(500) NOT NULL,
`date_found` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
`last_run_date` DATETIME NULL DEFAULT NULL,
`ignore_url` TINYINT(1) UNSIGNED NOT NULL,
PRIMARY KEY (`url_id`),
UNIQUE INDEX `host_path_query` (`host_id`, `path`, `query`)
)
就是这样(我希望)
我无法测试我的真实情况。我没有数据。请测试一下并给我一点 ping。
LỰA CHỌN *
TỪ (
LỰA CHỌN
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
TỪ
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) NHƯ t
LEFT JOIN HOSTS USING(host_id)
WHERE t.nr < 11
LIMIT 100;
好的,
第一:
我只选择包含您的查询的行,然后对其进行排序通过host_id和时间
LỰA CHỌN
u.*
TỪ
urls u
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
thứ hai
我添加变量nrVàlasthost并在选择中设置它。现在我对每一行进行计数,如果 host_id 发生更改,则将其重置为 1。所以我得到一个每个 host_id 从 1 到 n 的行列表
选择 @nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr, 你.*, @lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
从 网址你, (选择@nr:= 4,@lasthost:= -1)作为tmp 在哪里 ( 最后运行日期为空 或者last_run_date <= date_sub(curdate(), INTERVAL 30 DAY) ) 并且ignore_url!= 1 按主机 ID、上次运行日期排序
第三
我把这个查询放在一个新的选择中,这样我就可以加入你的第二个表,并将结果限制为仅少于 11 行,并将结果限制为 100
LỰA CHỌN *
TỪ (
LỰA CHỌN
@nr:=IF(@lasthost = host_id, @nr+1, 1) AS nr,
u.*,
@lasthost:=IF(@lasthost = host_id, @lasthost, host_id) AS lasthost
TỪ
urls u,
( SELECT @nr:=4, @lasthost:=-1 ) AS tmp
WHERE (
last_run_date IS NULL
OR last_run_date <= date_sub(curdate(), INTERVAL 30 DAY)
)
AND ignore_url != 1
ORDER BY host_id, last_run_date
) NHƯ t
LEFT JOIN HOSTS USING(host_id)
WHERE t.nr < 11
LIMIT 100;
仅此而已
Tôi là một lập trình viên xuất sắc, rất giỏi!