用户行为分析 (SQL)
表A:用户浏览视频日志 user_behavior: date, user_id, video_id, start_time, end_time 表B:视频信息 video_info: video_id, video_duration 表C:用户信息 user_info: user_id, gender
问题: (1)某一天(如 20200310),观看不同视频个数的前 5 名 user_id ; (2)观看超过 50 个不同视频的女性用户中,完整观看率最高的 10 个 user_id。
五个知识点:group by; 排序; join 的用法 ;条件筛选 ;业务理解:如何定义「完整观看率」
答案:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
select
user_id,
count(*) as video_cnt
from
(
select
user_id,
video_id
from
user_behavior
where
date = '20200310'
group by
user_id,
video_id
) a
group by
user_id
order by
video_cnt desc
limit 5;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
select
user_id,
count(*) as video_cnt,
sum(is_full) * 1.0/count(*) as full_ratio
from
(
select
a.user_id,
a.video_id,
case
when a.max_time >= c.duration then 1
else 0
end as is_full
from
(
select
user_id,
video_id,
max(end_time -start_time) as max_time
from
user_behavior
where
date = '20200310'
group by
user_id,
video_id
) a
inner join (
select
user_id,
gender
from
user_info
where
gender = 'female'
) b on a.user_id = b.user_id
left outer join video_info c on a.video_id = c.video_id
) a
group by
user_id
having
video_cnt >= 50
order by
full_ratio desc
limit
10
统计数据表中每个小时记录数
例如:数据表
1
2
3
4
5
6
7
id p_date
1 2017/08/29 00:10:10
2 2017/08/29 01:10:10
3 2017/08/29 01:10:10
4 2017/08/29 02:10:10
5 2017/08/29 01:10:10
……
答案:
1
2
3
4
5
6
7
SELECT
DATE_FORMAT(p_date, "%Y-%m-%d %H" ),
count(*)
FROM
video.cdn_quality_log
group by
DATE_FORMAT(p_date, "%Y-%m-%d %H" ) ;
如何建立联合索引
1
2
SELECT * FROM `table` WHERE a IN (1,2,3) and b > 1;
SELECT * FROM `table` WHERE a > 1 ORDER BY b;