需求描述
统计硅谷影音视频网站的常规指标,各种TopN 指标:
- 统计视频观看数Top10
- 统计视频类别热度Top10
- 统计视频观看数Top20 所属类别以及类别包含的Top20 的视频个数
- 统计视频观看数Top50 所关联视频的所属类别Rank
- 统计每个类别中的视频热度Top10
- 统计每个类别中视频流量Top10
- 统计上传视频最多的用户Top10 以及他们上传的观看次数在前20 视频
- 统计每个类别视频观看数Top10
数据结构
视频表
字段 | 备注 | 详细描述 |
---|---|---|
video id | 视频唯一id | 11 位字符串 |
uploader | 视频上传者 | 上传视频的用户名String |
age | 视频年龄 | 视频在平台上的整数天 |
category | 视频类别 | 上传视频指定的视频分类 |
length | 视频长度 | 整形数字标识的视频长度 |
views | 观看次数 | 视频被浏览的次数 |
rate | 视频评分 | 满分5 分 |
ratings | 流量 | 视频的流量,整型数字 |
conments | 评论数 | 一个视频的整数评论数 |
related ids | 相关视频id | 相关视频的id,最多20 个 |
用户表
字段 | 备注 | 字段类型 |
---|---|---|
uploader | 上传者用户名 | string |
videos | 上传视频数 | int |
friends | 朋友数量 | int |
答案
- 统计视频观看数Top10
select videoId, views
from gulivideo_orc
order by views desc
limit 10;
dMH0bHeiRNg 42513417
0XxI-hvPRRA 20282464
1dmVU08zVpA 16087899
RB-wUgnyGv0 15712924
QjA5faZF1A8 15256922
-_CSo1gOd48 13199833
49IDp76kjPw 11970018
tYnn51C3X_w 11823701
pv5zWaTEVkI 11672017
D2kJZOfq7zk 11184051
- 统计视频类别热度Top10
select category_name, count(*) category_count
from
(select videoId, category_name
from gulivideo_orc lateral view explode(category) tmp_category as category_name) t1
group by category_name
order by category_count desc
limit 10;
- 统计视频观看数Top20 所属类别以及类别包含的Top20 的视频个数
select category_name, count(*) category_count
from
(select videoId, category_name
from
(select videoId, views, category
from gulivideo_orc
order by views desc
limit 20) t1 lateral view explode(category) tmp_category as category_name) t2
group by category_name
order by category_count desc;
- 统计视频观看数Top50 所关联视频的所属类别Rank
select category_name, count(*) category_count
from
(select explode(category) category_name
from
(select category
from
(select related_id
from
(select relatedId, views
from gulivideo_orc
order by views desc
limit 50) t1
lateral view explode(relatedId) tmp_related as related_id
group by related_id) t2
join gulivideo_orc orc on t2.related_id=orc.videoId) t3 ) t4
group by category_name
order by category_count desc;
- 统计每个类别中的视频热度Top10
select video.videoId, video.views
from
(select uploader, videos
from gulivideo_user_orc
order by videos desc
limit 10) t1
join gulivideo_orc video
on t1.uploader=video.uploader
order by views desc
limit 20;
统计每个类别中视频流量Top10
统计上传视频最多的用户Top10 以及他们上传的观看次数在前20 视频
统计每个类别视频观看数Top10
1.给每一种类别根据视频观看数添加rank值(倒序)
select categoryId, videoId, views,
rank() over(partition by categoryId order by views desc) rk
from gulivideo_category;
2.过滤前十
select categoryId, videoId, views
from
(select categoryId, videoId, views,
rank() over(partition by categoryId order by views desc) rk
from gulivideo_category) t1
where rk<=10;