1. Cgroup v2 Architecture
1.1 Basic Concepts
bash
# Check whether the system has cgroup v2 mounted
$ mount | grep cgroup2
cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime)
# Check the kernel configuration
$ grep CGROUP /boot/config-$(uname -r)
CONFIG_CGROUPS=y
CONFIG_CGROUP_SCHED=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_CGROUP_BPF=y
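The same check can be done from a program. The sketch below assumes the usual mount point `/sys/fs/cgroup` and relies on the fact that `cgroup.controllers` is present in cgroup v2 directories but not in v1 hierarchies; the function name `is_cgroup_v2` is made up for this example.

```python
from pathlib import Path


def is_cgroup_v2(mount_point="/sys/fs/cgroup"):
    """Return True if mount_point looks like a cgroup v2 (unified) hierarchy."""
    # cgroup.controllers is a cgroup v2 interface file; v1 hierarchies lack it.
    return (Path(mount_point) / "cgroup.controllers").is_file()
```

Calling `is_cgroup_v2()` with no argument checks the default mount point; pass another path if the hierarchy is mounted elsewhere.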
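1.2 Directory Structure
plaintext
cgroup v2 directory hierarchy (representative interface files; on a real system the limit files cpu.max, io.max, and memory.max, as well as memory.current, appear in non-root cgroups rather than at the root itself):
/sys/fs/cgroup/
├── cpu.max
├── cpu.pressure
├── cpu.stat
├── io.max
├── io.pressure
├── io.stat
├── memory.current
├── memory.max
├── memory.pressure
└── system.slice/
    ├── ssh.service
    ├── docker.service
    └── application.service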
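Which interface files show up in a directory depends on which controllers are enabled there. As a rough sketch, the helper below (the name `read_controllers` is an assumption for this example) reads `cgroup.controllers` and `cgroup.subtree_control`, both of which hold a space-separated list of controller names.

```python
from pathlib import Path


def read_controllers(cgroup_path="/sys/fs/cgroup"):
    """Return (available, enabled_for_children) as two sets of controller names."""
    base = Path(cgroup_path)
    # cgroup.controllers: controllers available to this cgroup
    available = set((base / "cgroup.controllers").read_text().split())
    # cgroup.subtree_control: controllers switched on for its child cgroups
    enabled = set((base / "cgroup.subtree_control").read_text().split())
    return available, enabled
```

A controller that is available but not enabled for children will not expose its files (for example `cpu.max`) in the child directories.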
2. CPU Resource Management
2.1 CPU Configuration Parameters
python
class CPUController:
    """Thin wrapper around the cgroup v2 CPU controller files."""

    def __init__(self, cgroup_path):
        self.path = cgroup_path

    def set_cpu_weight(self, weight):
        """Set the relative CPU weight (1-10000, default 100)."""
        with open(f"{self.path}/cpu.weight", "w") as f:
            f.write(str(weight))

    def set_cpu_max(self, max_quota, period):
        """Set the CPU bandwidth limit: max_quota microseconds of runtime per period."""
        with open(f"{self.path}/cpu.max", "w") as f:
            f.write(f"{max_quota} {period}")

    def get_cpu_stat(self):
        """Return the counters in cpu.stat as a dict."""
        stats = {}
        with open(f"{self.path}/cpu.stat", "r") as f:
            for line in f:
                key, value = line.split()
                stats[key] = int(value)
        return stats
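One practical use of the `cpu.stat` counters is to see how often a limit actually bites. The sketch below takes a dict shaped like the one `get_cpu_stat` returns and computes the share of enforcement periods in which the cgroup was throttled, using the `nr_periods` and `nr_throttled` keys that `cpu.stat` reports when the CPU controller is enabled. The function name and the sample numbers are invented for illustration.

```python
def throttle_ratio(stats):
    """Fraction of enforcement periods in which the cgroup was throttled."""
    periods = stats.get("nr_periods", 0)
    if periods == 0:
        # No bandwidth limit has been enforced yet (or the keys are absent).
        return 0.0
    return stats.get("nr_throttled", 0) / periods


# Made-up sample shaped like the dict returned by get_cpu_stat()
sample = {"usage_usec": 1200000, "nr_periods": 200,
          "nr_throttled": 50, "throttled_usec": 300000}
print(f"{throttle_ratio(sample):.0%}")  # prints "25%"
```

A persistently high ratio suggests that `cpu.max` is set too low for the workload.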
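2.2 CPU Limit Example
bash
# Run as root. Enable the controllers for child cgroups
# (skip this if they are already listed in cgroup.subtree_control)
$ echo "+cpu +memory +io" > /sys/fs/cgroup/cgroup.subtree_control
# Create the cgroup
$ mkdir -p /sys/fs/cgroup/myapp
# Limit CPU to 50% of one core (50 ms of runtime per 100 ms period)
$ echo "50000 100000" > /sys/fs/cgroup/myapp/cpu.max
# Set the CPU weight (100 is the default)
$ echo "100" > /sys/fs/cgroup/myapp/cpu.weight
# Move a process into the cgroup
$ echo $PID > /sys/fs/cgroup/myapp/cgroup.procs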
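The two numbers written to `cpu.max` are a quota and a period in microseconds, so a percentage has to be converted first. A minimal sketch of that arithmetic, where the helper name `cpu_max_value` and the 100 ms default period are assumptions for this example:

```python
def cpu_max_value(percent, period_us=100_000):
    """Build the string for cpu.max from a limit in percent of one CPU.

    percent=None means "no limit", which cpu.max expresses as "max".
    """
    if percent is None:
        return f"max {period_us}"
    if percent <= 0:
        raise ValueError("percent must be positive")
    quota_us = int(period_us * percent / 100)
    return f"{quota_us} {period_us}"


print(cpu_max_value(50))   # prints "50000 100000"
print(cpu_max_value(800))  # prints "800000 100000"
```

Values above 100 allow more than one core, so 800 corresponds to eight cores.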
3. Memory Resource Management
3.1 Memory Limit Configuration
python
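class MemoryController:
    """Thin wrapper around the cgroup v2 memory controller files."""

    def __init__(self, cgroup_path):
        self.path = cgroup_path

    def set_memory_limit(self, limit_bytes):
        """Set the hard memory limit (memory.max)."""
        with open(f"{self.path}/memory.max", "w") as f:
            f.write(str(limit_bytes))

    def set_memory_high(self, high_bytes):
        """Set the soft limit (memory.high); above it the cgroup is throttled and reclaimed."""
        with open(f"{self.path}/memory.high", "w") as f:
            f.write(str(high_bytes))

    def handle_oom(self):
        """OOM policy: kill all processes in the cgroup together (memory.oom.group)."""
        with open(f"{self.path}/memory.oom.group", "w") as f:
            f.write("1")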
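Reading the limits back needs one extra case: `memory.max` and `memory.high` contain either a byte count or the literal string `max` when no limit is set. A small sketch, with `read_limit` being a hypothetical helper name:

```python
from pathlib import Path


def read_limit(cgroup_path, filename):
    """Read a limit file such as memory.max or memory.high.

    Returns the limit in bytes, or None when the file contains "max" (unlimited).
    """
    raw = (Path(cgroup_path) / filename).read_text().strip()
    return None if raw == "max" else int(raw)
```

For example, `read_limit("/sys/fs/cgroup/myapp", "memory.max")` would return the configured hard limit, or `None` for an unlimited cgroup.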
3.2 Memory Monitoring
bash
# Show memory usage
$ cat /sys/fs/cgroup/myapp/memory.current
$ cat /sys/fs/cgroup/myapp/memory.stat
# Watch memory pressure
$ cat /sys/fs/cgroup/myapp/memory.pressure
# Show OOM events
$ cat /sys/fs/cgroup/myapp/memory.events
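The pressure files use the PSI format: one line per kind (`some`, `full`) followed by `key=value` fields. A minimal parsing sketch follows; the function name and the sample text are made up, and the field layout is assumed to be the standard `avg10`/`avg60`/`avg300`/`total` set.

```python
def parse_pressure(text):
    """Parse PSI text into {"some": {...}, "full": {...}} with float values."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


# Made-up sample in the format of memory.pressure
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=0\n"
)
print(parse_pressure(sample)["some"]["avg10"])  # prints 1.5
```

The same function works for `cpu.pressure` and `io.pressure`, since they share the format.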
4. IO Resource Management
4.1 IO Limit Configuration
python
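class IOController:
    """Thin wrapper around the cgroup v2 IO controller files."""

    def __init__(self, cgroup_path):
        self.path = cgroup_path

    def configure_io(self, device, rbps, wbps):
        """Set read/write bandwidth limits in bytes per second; device is "MAJ:MIN"."""
        rules = f"{device} rbps={rbps} wbps={wbps}"
        with open(f"{self.path}/io.max", "w") as f:
            f.write(rules)

    def set_io_weight(self, weight):
        """Set the IO weight."""
        with open(f"{self.path}/io.weight", "w") as f:
            f.write(str(weight))

    def monitor_io(self):
        """Return the current IO statistics."""
        with open(f"{self.path}/io.stat", "r") as f:
            stats = f.read()
        return self.parse_io_stats(stats)

    @staticmethod
    def parse_io_stats(stats):
        """Minimal parser: io.stat text to {device: {key: value}}, values kept as strings."""
        return {
            parts[0]: dict(field.split("=", 1) for field in parts[1:])
            for parts in (line.split() for line in stats.splitlines())
            if parts
        }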
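The counters in `io.stat` are cumulative, so throughput has to be derived from two samples. The sketch below assumes both samples have already been parsed into dictionaries of the form `{device: {"rbytes": int, "wbytes": int}}`; the function name `io_rates` is hypothetical.

```python
def io_rates(prev, curr, interval_s):
    """Per-device read/write throughput in bytes/s between two io.stat samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    rates = {}
    for device, now in curr.items():
        before = prev.get(device)
        if before is None:
            # Device first seen in the newer sample: no baseline to compare with.
            continue
        rates[device] = {
            "read_bps": (now["rbytes"] - before["rbytes"]) / interval_s,
            "write_bps": (now["wbytes"] - before["wbytes"]) / interval_s,
        }
    return rates
```

Comparing these rates against the `rbps` and `wbps` values in `io.max` shows how close a cgroup runs to its limit.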
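4.2 IO Control Example
bash
# Limit read and write bandwidth to 2 MiB/s on device 252:1
# (use the MAJ:MIN of the whole disk, as shown by lsblk, not of a partition)
$ echo "252:1 rbps=2097152 wbps=2097152" > /sys/fs/cgroup/myapp/io.max
# Set the IO weight
$ echo "100" > /sys/fs/cgroup/myapp/io.weight
# Monitor IO usage
$ cat /sys/fs/cgroup/myapp/io.stat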
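The byte values in an `io.max` rule are easy to get wrong by hand. A small sketch that builds the rule from MiB/s figures, with `None` mapped to `max` (no limit); the helper name `io_max_rule` is an assumption for this example.

```python
MIB = 1024 * 1024


def io_max_rule(device, read_mib_s=None, write_mib_s=None):
    """Build an io.max line such as "252:1 rbps=2097152 wbps=2097152"."""
    def fmt(limit):
        # io.max accepts the literal "max" to mean "no limit"
        return "max" if limit is None else str(int(limit * MIB))

    return f"{device} rbps={fmt(read_mib_s)} wbps={fmt(write_mib_s)}"


print(io_max_rule("252:1", 2, 2))  # prints "252:1 rbps=2097152 wbps=2097152"
```

Passing only one of the two limits leaves the other direction unrestricted.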
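5. Practical Scenarios
5.1 Limiting a Web Service
plaintext
Web service resource configuration:
Resource         Limit      Description
CPU              200%       At most 2 cores
Memory           2 GB       Hard memory limit
Memory (soft)    1.5 GB     Reclaim threshold
IO read          50 MB/s    Disk read throughput
IO write         30 MB/s    Disk write throughput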
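The table above can be translated into concrete file contents. The sketch below makes two assumptions: sizes are read as binary units (GiB, MiB), and the block device is `252:1`, which is only a placeholder. The function name is also made up.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def web_service_settings(device="252:1"):
    """Map the web-service table to cgroup v2 file contents.

    The device number is a placeholder; sizes are interpreted as GiB/MiB.
    """
    return {
        "cpu.max": "200000 100000",           # 200% = 2 cores
        "memory.max": str(2 * GIB),           # hard limit
        "memory.high": str(int(1.5 * GIB)),   # soft limit / reclaim threshold
        "io.max": f"{device} rbps={50 * MIB} wbps={30 * MIB}",
    }


for name, value in web_service_settings().items():
    print(name, "=", value)
```

Keeping `memory.high` below `memory.max` gives the kernel room to reclaim before the hard limit triggers the OOM killer.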
5.2 Database Resource Isolation
bash
# Create a cgroup for the database
$ mkdir -p /sys/fs/cgroup/mysql
# CPU limit: at most 8 cores
$ echo "800000 100000" > /sys/fs/cgroup/mysql/cpu.max
# Memory limit: 16 GiB
$ echo "17179869184" > /sys/fs/cgroup/mysql/memory.max
# IO limit: 200 MiB/s read, 100 MiB/s write
$ echo "252:1 rbps=209715200 wbps=104857600" > /sys/fs/cgroup/mysql/io.max
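The same sequence of writes can be scripted. This is a minimal sketch, assuming it runs as root and that the relevant controllers are enabled in the parent cgroup; `apply_settings` is a hypothetical helper, not part of any library.

```python
from pathlib import Path


def apply_settings(cgroup_path, settings):
    """Create the cgroup directory if needed and write each {file: value} pair."""
    base = Path(cgroup_path)
    # On cgroupfs, creating a directory creates the cgroup.
    base.mkdir(parents=True, exist_ok=True)
    for filename, value in settings.items():
        (base / filename).write_text(f"{value}\n")
```

With the values from the shell example, a call would look like `apply_settings("/sys/fs/cgroup/mysql", {"cpu.max": "800000 100000", "memory.max": 17179869184, "io.max": "252:1 rbps=209715200 wbps=104857600"})`.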
6. Monitoring and Tuning
6.1 Performance Monitoring
python
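import logging
from pathlib import Path

logger = logging.getLogger(__name__)

class CgroupMonitor:
    def __init__(self, cgroup_path):
        self.path = cgroup_path
        self.metrics = {
            'cpu': self.monitor_cpu,
            'memory': self.monitor_memory,
            'io': self.monitor_io,
        }

    # Minimal collectors: each returns the raw contents of one interface file.
    def monitor_cpu(self):
        return Path(self.path, "cpu.stat").read_text()

    def monitor_memory(self):
        return Path(self.path, "memory.current").read_text()

    def monitor_io(self):
        return Path(self.path, "io.stat").read_text()

    def collect_metrics(self):
        """Collect all metrics; a failing collector is logged and skipped."""
        results = {}
        for name, monitor in self.metrics.items():
            try:
                results[name] = monitor()
            except Exception as e:
                logger.error(f"Failed to collect {name} metrics: {e}")
        return results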
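Raw counters become useful once they are turned into rates. As an illustration, the sketch below derives average CPU usage from two readings of the `usage_usec` counter in `cpu.stat`; the function name and the sampling scheme are assumptions, not part of the monitor class above.

```python
def cpu_utilization(prev_usage_usec, curr_usage_usec, interval_s):
    """Average number of CPUs used over the interval (1.0 == one full core)."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    used_usec = curr_usage_usec - prev_usage_usec
    return used_usec / (interval_s * 1_000_000)


# 5 s of CPU time consumed over a 10 s window
print(cpu_utilization(1_000_000, 6_000_000, 10))  # prints 0.5
```

Dividing the result by the quota-to-period ratio from `cpu.max` gives usage relative to the configured limit.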
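6.2 Alert Configuration
yaml
# Prometheus alerting rules (metric names depend on the exporter in use)
groups:
  - name: cgroup_alerts
    rules:
      - alert: HighCPUUsage
        # Assuming cgroup_cpu_usage_seconds is a cumulative counter, as its
        # name suggests, alert on its per-second rate rather than its raw value
        expr: rate(cgroup_cpu_usage_seconds[5m]) > 0.9
        for: 5m
        labels:
          severity: warning
      - alert: MemoryNearLimit
        expr: cgroup_memory_usage_bytes / cgroup_memory_max_bytes > 0.9
        for: 5m
        labels:
          severity: warning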
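The `for: 5m` clause means the condition has to hold continuously for five minutes before the alert fires. The sketch below is a simplified model of that behavior, not Prometheus's actual implementation; the function name and the `(timestamp, value)` sample format are assumptions.

```python
def should_fire(samples, threshold, hold_s):
    """Return True if value > threshold held continuously for at least hold_s.

    samples: list of (timestamp_seconds, value) tuples in time order.
    """
    start = None  # timestamp at which the current breach began
    for timestamp, value in samples:
        if value > threshold:
            if start is None:
                start = timestamp
        else:
            start = None  # condition cleared, so the hold timer resets
    return start is not None and samples[-1][0] - start >= hold_s


steady = [(0, 0.95), (150, 0.92), (300, 0.93)]
dipped = [(0, 0.95), (150, 0.50), (300, 0.93)]
print(should_fire(steady, 0.9, 300))  # prints True
print(should_fire(dipped, 0.9, 300))  # prints False
```

The second series never fires because the dip at 150 s resets the timer, which is how `for` filters out short spikes.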
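7. Troubleshooting Common Problems
7.1 Troubleshooting Checklist
CPU
- Check the cpu.max setting
- Review cpu.pressure
- Analyze cpu.stat
Memory
- Check memory.current
- Analyze OOM events
- Review memory pressure
IO
- Check the io.max setting
- Analyze io.pressure
- Monitor IO latency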
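The "analyze OOM events" step on the checklist can be partly automated. `memory.events` is a flat `key value` file; the sketch below parses it and pulls out the counters most relevant to OOM analysis (`max`, `oom`, `oom_kill`). The function names and the sample text are invented for illustration.

```python
def parse_flat_keyed(text):
    """Parse "key value" lines (as in memory.events) into a dict of ints."""
    return {
        key: int(value)
        for key, value in (line.split() for line in text.splitlines() if line.strip())
    }


def oom_summary(events):
    """Pick the memory.events counters most relevant to OOM analysis."""
    return {
        "hit_limit": events.get("max", 0),     # times memory.max was reached
        "oom": events.get("oom", 0),           # times the cgroup ran out of memory
        "oom_kill": events.get("oom_kill", 0), # processes killed by the OOM killer
    }


# Made-up sample in the format of memory.events
sample = "low 0\nhigh 12\nmax 3\noom 1\noom_kill 1\n"
print(oom_summary(parse_flat_keyed(sample)))
```

A nonzero `oom_kill` count means processes in the cgroup were actually killed, which usually calls for raising `memory.max` or reducing the workload's footprint.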
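7.2 Best Practices
plaintext
Optimization recommendations:
1. CPU configuration
   - Set weights according to each application's characteristics
   - Leave enough CPU headroom
   - Avoid overly tight limits
2. Memory configuration
   - Set a reasonable soft limit (memory.high)
   - Configure an OOM policy (memory.oom.group)
   - Monitor memory pressure
3. IO configuration
   - Set limits per device
   - Configure appropriate weights
   - Monitor IO latency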