
Initial commit: anomaly detection module

Xmia 1 month ago
commit
9eed84e8f6

+ 444 - 0
README.md

@@ -0,0 +1,444 @@
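+# Wind Turbine Anomaly Detection Project
+
+An anomaly detection system for wind turbine operation, organised per turbine model. It covers the full pipeline: data labelling, model training, inference, and writing results to a database.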
+
+---
+
+## Directory structure
+
+```
+anomaly_detection/
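+├── config.py             # Paths and hyperparameters (all tunable parameters live here)
+├── data_loader.py        # Parquet reading, tolerant of missing measurement points
+├── labeler.py            # Status labelling (running / curtailed / shutdown / sensor anomaly)
+├── train.py              # Training entry point (per turbine model)
+├── detect.py             # Inference entry point (per-point results written to the database; 9 detectors run in parallel)
+├── saved_models/         # Trained models are saved here automatically (one subdirectory per turbine model)
+│   └── {model_name}/
+│       ├── wind_power_curve.pkl
+│       ├── wind_power_scatter.pkl
+│       ├── yaw_static.pkl
+│       ├── yaw_twist.pkl
+│       ├── pitch_regulation.pkl
+│       ├── pitch_coord.pkl
+│       ├── pitch_min.pkl
+│       ├── ctrl_power_quality.pkl
+│       ├── ctrl_op_state.pkl
+│       └── model_stats.pkl       # Statistics saved at training time and loaded directly at inference
+└── models/
+    ├── wind_power.py     # Wind speed-power anomalies (PowerCurve: per-point binned z-score / Scatter: IsolationForest)
+    ├── yaw.py            # Yaw system anomalies (IsolationForest with short/long rolling-window features)
+    ├── pitch.py          # Pitch system anomalies (adaptive LOF / IsolationForest)
+    └── control_params.py # Overall operating-state anomalies (IsolationForest)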
+```
+
+---
+
+## Usage
+
+### Step 1. Install dependencies
+
+```bash
+pip install pyarrow scikit-learn joblib pandas numpy
+```
+
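+### Step 2. Configure paths and parameters
+
+Edit `config.py` and change at least the following three settings (marked TODO):
+
+```python
+# Parquet data root; layout: {PARQUET_ROOT}/{model_name}/{farm_name}/{turbine_name}.parquet
+PARQUET_ROOT   = Path("/your/data/path")
+
+# Model save directory; a subdirectory per turbine model is created automatically after training
+MODEL_SAVE_DIR = Path("/your/model/path")
+
+# Results database; defaults to a SQLite file path, can be replaced with a database connection string in production
+DB_PATH        = "/your/result.db"
+```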
+
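+Required parquet directory layout:
+```
+PARQUET_ROOT/
+└── EN156-3300/          ← turbine model name (model_name)
+    └── some_farm/       ← wind farm name (farm_name)
+        ├── 001.parquet  ← turbine name (turbine_name = 001)
+        └── 002.parquet
+```
+
+Each parquet file must contain a `data_time` column (timestamp); detection filters the target date's data on this column.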
+
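The layout above maps directly onto the three identifiers used throughout the project. A minimal sketch of that mapping, assuming POSIX-style paths; `parse_turbine_path` and the `farm_a` folder are hypothetical names used only for illustration and are not part of the repository:

```python
from pathlib import Path


def parse_turbine_path(root: Path, file_path: Path) -> dict:
    """Split {root}/{model_name}/{farm_name}/{turbine_name}.parquet into its parts.

    Hypothetical helper: the repository's own listing logic lives in data_loader.py.
    """
    # relative_to() drops the root, leaving exactly three path components.
    model_name, farm_name, file_name = file_path.relative_to(root).parts
    return {
        "model_name": model_name,
        "farm_name": farm_name,
        "turbine_name": Path(file_name).stem,  # "001.parquet" -> "001"
    }


info = parse_turbine_path(
    Path("/your/data/path"),
    Path("/your/data/path/EN156-3300/farm_a/001.parquet"),
)
print(info)
```

A file placed at any other depth makes the three-way unpacking fail, which is a cheap way to notice a misplaced parquet file early.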
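+Other optional adjustments (all have defaults, so the project runs without changing them):
+- `ISO_CONTAMINATION`: expected anomaly ratio, default 0.01 (1%); can be raised moderately when data quality is poor
+- `WIND_VALID_MIN/MAX`: valid power-generation wind speed range, default 3~25 m/s
+- `STATUS_MODEL_SPECIAL_RULES`: per-model special labelling rules, e.g. for a model with a different cut-in wind speed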
+
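+### Step 3. Train the models
+
+Run from the `anomaly_detection/` directory:
+
+```bash
+# List all available turbine models
+python train.py --list
+
+# Train all turbine models
+python train.py
+
+# Train only the specified turbine model
+python train.py --model EN156-3300
+```
+
+Example training output (log messages are printed in Chinese):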
+```
+==================================================
+开始训练机型: EN156-3300
+  [自适应统计] 额定: 3300.0kW | 基准桨距: 1.23°
+  [训练数据] wind_power_curve: 运行85000 + 传感器异常可用1200 = 86200 行
+  [风速功率] 功率曲线模型已保存
+  ...
+机型 EN156-3300 训练完成,模型保存至: /your/model/path/EN156-3300
+```
+
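+> Detectors whose required measurement points are missing are skipped automatically; training of the other detectors is unaffected.
+
+### Step 4. Run detection
+
+By default, detection reads the **previous day's** data (filtered on the `data_time` column):
+
+```bash
+# Detect all turbine models (yesterday's data)
+python detect.py
+
+# Detect only the specified turbine model
+python detect.py --model EN156-3300
+
+# Detect a specific date
+python detect.py --date 2026-02-24
+```
+
+Example detection output (log messages are printed in Chinese):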
+```
+结果数据库: /your/result.db
+检测日期:   2026-02-24
+
+机型 EN156-3300: 共 50 台风机,检测日期 2026-02-24
+  检测: 某风场 / 001
+  某风场/001: 总144 | 运行108 | 传感器异常6 | 停机/限功率30
+    [wind_power_curve] 检测 110 点,异常 2 点
+    [wind_power_scatter] 检测 110 点,异常 5 点
+    [yaw_static] 检测 108 点,异常 3 点
+    ...
+```
+
+### Step 5. Inspect the results
+
+Results are written to the SQLite database specified by `DB_PATH`, in table `anomaly_points`; each row corresponds to one timestamp × detector pair:
+
+```python
+import sqlite3
+import pandas as pd
+
+conn = sqlite3.connect("/your/result.db")
+
+# All anomalous points for one turbine on one day
+df = pd.read_sql("""
+    SELECT data_time, detector, result_type, is_anomaly, anomaly_score, anomaly_label
+    FROM anomaly_points
+    WHERE turbine_name = '001'
+      AND data_time LIKE '2026-02-24%'
+      AND is_anomaly = 1
+    ORDER BY data_time
+""", conn)
+
+# Anomaly counts per turbine for one day (grouped by detector)
+df_summary = pd.read_sql("""
+    SELECT model_name, farm_name, turbine_name, detector,
+           COUNT(*) AS total_points,
+           SUM(is_anomaly) AS anomaly_count,
+           ROUND(1.0 * SUM(is_anomaly) / COUNT(*), 4) AS anomaly_ratio,
+           MIN(anomaly_score) AS min_score
+    FROM anomaly_points
+    WHERE data_time LIKE '2026-02-24%'
+      AND result_type = 'anomaly_detection'
+    GROUP BY model_name, farm_name, turbine_name, detector
+    ORDER BY anomaly_ratio DESC
+""", conn)
+
+# Sensor-anomaly points
+df_sensor = pd.read_sql("""
+    SELECT data_time, farm_name, turbine_name, detector, anomaly_label
+    FROM anomaly_points
+    WHERE result_type = 'sensor_anomaly'
+      AND data_time LIKE '2026-02-24%'
+    ORDER BY turbine_name, data_time
+""", conn)
+
+conn.close()
+```
+
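+### Step 6. Retrain (model tuning)
+
+```python
+# config.py
+
+# Adjust the anomaly ratio (raise it when the data really contains many anomalous points)
+ISO_CONTAMINATION = 0.02
+
+# Adjust the valid wind speed range (for models with a different cut-in wind speed)
+WIND_VALID_MIN = 2.5
+
+# Adjust labelling thresholds
+STATUS_CURTAIL_PITCH_OFFSET = 5.0  # stricter curtailment criterion
+```
+
+After changing these values, run `python train.py` again. Retraining overwrites existing model files; there is no need to delete them manually.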
+
+---
+
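+## Labelling logic
+
+Labelling is implemented in `labeler.py`. Status priority, from highest to lowest:
+
+| Priority | Status | Condition |
+|--------|------|----------|
+| 1 (highest) | Sensor anomaly (`传感器异常-xxx异常`) | Any sensor-anomaly column is True |
+| 2 | Shutdown (`停机`) | Power value is valid and ≤ max(10 kW, rated × 0.5%) |
+| 3 | Curtailed (`限功率`) | Power within [2%, 95%] of rated and pitch angle > baseline + 3° |
+| 4 (default) | Running (`运行`) | All remaining points |
+
+Sensor anomaly types:
+
+| Anomaly column | Meaning |
+|--------|------|
+| d_val_power | Power out of range (> rated × 1.25 or < rated × -0.1) |
+| d_val_wind | Wind speed out of range (> 75 or < -2 m/s) |
+| d_val_pitch | Pitch angle out of range (> 105° or < -10°) |
+| d_val_spd | Rotational speed out of range (> historical max × 1.25, where the historical max is the 99.9% quantile `spd_limit`) |
+| d_val_torque | Torque out of range (> historical max × 1.25, where the historical max is the 99.9% quantile `torque_limit`, or < -2000) |
+| d_logic_wind_pwr | Wind-power logic paradox (wind speed < 0.1 but power > 100 kW) |
+| d_logic_torque_pwr | Speed-torque logic paradox (power is present but torque is near 0) |
+
+All thresholds are configurable in `config.py`, and special rules can be added per turbine model (`STATUS_MODEL_SPECIAL_RULES`).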
+
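The priority order in the table can be read as a chain of early returns. The sketch below is a pure-Python illustration for a single point, not the code in `labeler.py` (which works on whole DataFrames and emits the Chinese status strings); `label_point` and its English return values are hypothetical, and the default ratios are the ones quoted in the tables above.

```python
def label_point(p_active, pitch, sensor_flags, rated_power, baseline_pitch,
                shutdown_ratio=0.005, curtail_low=0.02, curtail_high=0.95,
                pitch_offset=3.0):
    """Return the status of one sample, checking conditions from highest priority down."""
    # 1. Sensor anomaly wins over everything else.
    flagged = [name for name, is_bad in sensor_flags.items() if is_bad]
    if flagged:
        return "sensor_anomaly:" + ",".join(flagged)
    # 2. Shutdown: power at or below max(10 kW, rated * 0.5%).
    if p_active <= max(10.0, rated_power * shutdown_ratio):
        return "shutdown"
    # 3. Curtailed: mid-range power combined with a pitch angle above baseline + offset.
    in_band = rated_power * curtail_low <= p_active <= rated_power * curtail_high
    if in_band and pitch > baseline_pitch + pitch_offset:
        return "curtailed"
    # 4. Default.
    return "running"


clean = {"d_val_power": False, "d_val_wind": False}
print(label_point(5.0, 1.0, clean, 3300.0, 1.23))      # low power
print(label_point(1500.0, 8.0, clean, 3300.0, 1.23))   # pitched out at mid power
print(label_point(1500.0, 1.5, clean, 3300.0, 1.23))   # normal production
```

Because each check returns as soon as it matches, a point with an out-of-range power reading is never mistaken for a shutdown, which is the reason sensor anomalies sit at the top of the order.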
+---
+
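+## How model_stats.pkl is stored and loaded
+
+`model_stats.pkl` holds the adaptive statistical thresholds for one turbine model. It is generated at training time and loaded directly at inference, so the inference stage does not have to re-read the full dataset.
+
+### At training time (train.py)
+
+```
+load_model_type()          ← load all parquet data for the turbine model
+    │
+    ▼
+get_model_statistics()     ← compute adaptive statistics
+    │   p_max_observed  = 99.5% quantile of p_active (approximates rated power)
+    │   baseline_pitch  = median of pitch_ang_act_1 within 20%~60% of rated power (baseline pitch angle)
+    │   spd_limit       = 99.9% quantile of gen_spd (reference for the speed upper limit)
+    │   torque_limit    = 99.9% quantile of actual_torque (reference for the torque upper limit)
+    │
+    ▼
+joblib.dump(stats, model_dir / "model_stats.pkl")   ← saved to the model's directory
+```
+
+### At inference time (detect.py)
+
+```
+detect_model_type()
+    │
+    ├── model_stats.pkl exists?
+    │       YES → joblib.load(stats_path)          ← loaded directly, takes milliseconds
+    │       NO  → load_model_type() + get_model_statistics()
+    │              (fallback; retraining to generate the pkl is recommended)
+    │
+    ▼
+detect_turbine()
+    │
+    ▼
+label_dataframe(df_raw, model_stats, model_name)   ← stats are used to compute labelling thresholds
+```
+
+### Purpose of the stats fields
+
+| Field | Source | Purpose |
+|------|------|------|
+| p_max_observed | 99.5% quantile of p_active | Power range check (upper limit ×1.25, lower limit ×-0.1), shutdown threshold (×0.5%), curtailment band (×2%~95%) |
+| baseline_pitch | Median pitch angle in the mid-load band | Curtailment check (baseline pitch angle + 3° offset) |
+| spd_limit | 99.9% quantile of gen_spd | Speed range check (upper limit ×1.25) |
+| torque_limit | 99.9% quantile of actual_torque | Torque range check (upper limit ×1.25), speed-torque logic paradox check |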
+
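The four statistics are simple quantiles, so they can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the body of `get_model_statistics()`: the function name `model_statistics` is hypothetical, and the 20%~60% mid-load band is assumed here to be taken relative to `p_max_observed`.

```python
import numpy as np


def model_statistics(p_active, pitch_1, gen_spd, torque):
    """Compute the per-model adaptive statistics described in the table above."""
    p_active = np.asarray(p_active, dtype=float)
    pitch_1 = np.asarray(pitch_1, dtype=float)
    # The 99.5% quantile is robust to a few spikes and approximates rated power.
    p_max = float(np.quantile(p_active, 0.995))
    # Assumption: the mid-load band is 20%..60% of the observed maximum.
    mid_load = (p_active >= 0.2 * p_max) & (p_active <= 0.6 * p_max)
    return {
        "p_max_observed": p_max,
        "baseline_pitch": float(np.median(pitch_1[mid_load])),
        "spd_limit": float(np.quantile(gen_spd, 0.999)),
        "torque_limit": float(np.quantile(torque, 0.999)),
    }


# Synthetic data: evenly spaced power, speed and torque, constant pitch angle.
stats = model_statistics(
    p_active=np.linspace(0.0, 3300.0, 1001),
    pitch_1=np.full(1001, 1.2),
    gen_spd=np.linspace(0.0, 1800.0, 1001),
    torque=np.linspace(0.0, 20000.0, 1001),
)
print(stats)
```

Using quantiles instead of the raw maximum keeps a single corrupt reading from inflating every threshold that is later derived from these values.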
+---
+
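+## Training data selection rules
+
+Each detector is trained only on clean data that is relevant to it:
+
+- **Running** data: all included in training
+- **Sensor anomaly** data: included when the measurement points this detector cares about show no anomaly
+- **Shutdown / curtailed** data: excluded from training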
+
+```python
+DETECTOR_SENSOR_COLS = {
+    "wind_power_curve":   ["d_val_wind", "d_val_power", "d_logic_wind_pwr"],
+    "wind_power_scatter": ["d_val_wind", "d_val_power", "d_logic_wind_pwr"],
+    "yaw_static":         [],   # no associated sensor columns; all sensor-anomaly data is included
+    "yaw_twist":          [],
+    "pitch_regulation":   ["d_val_pitch"],
+    "pitch_coord":        ["d_val_pitch", "d_val_spd", "d_val_power"],
+    "pitch_min":          ["d_val_pitch"],
+    "ctrl_power_quality": ["d_val_power"],
+    "ctrl_op_state":      ["d_val_power", "d_val_spd", "d_val_pitch"],
+}
+```
+
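To make the three rules concrete, here is a small pure-Python sketch that applies them to a list of row dictionaries. The function `select_training_rows`, the English status values, and the two-entry mapping are hypothetical simplifications; the real project filters labelled DataFrames.

```python
# Subset of the mapping above, enough for the illustration.
DETECTOR_SENSOR_COLS = {
    "pitch_min": ["d_val_pitch"],
    "yaw_static": [],
}


def select_training_rows(rows, detector):
    """Keep running rows plus sensor-anomaly rows that are clean for this detector."""
    relevant = DETECTOR_SENSOR_COLS.get(detector, [])
    selected = []
    for row in rows:
        if row["status"] == "running":
            selected.append(row)
        elif row["status"] == "sensor_anomaly":
            # Usable only if none of the columns this detector depends on is flagged.
            if not any(row.get(col, False) for col in relevant):
                selected.append(row)
        # Shutdown and curtailed rows are never used for training.
    return selected


rows = [
    {"status": "running"},
    {"status": "sensor_anomaly", "d_val_pitch": True},
    {"status": "sensor_anomaly", "d_val_wind": True},
    {"status": "shutdown"},
]
print(len(select_training_rows(rows, "pitch_min")))   # pitch-flagged row is dropped
print(len(select_training_rows(rows, "yaw_static")))  # no associated columns, both kept
```

The same routing decision is reused at detection time, where the rejected sensor-anomaly rows are reported as `sensor_anomaly` instead of being discarded.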
+---
+
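+## Detection data routing
+
+At detection time the day's data is likewise labelled first and then routed by status:
+
+```
+Day's data (parquet)
+    │
+    ▼ label_dataframe()
+    ├── Shutdown / curtailed ──→ skipped, no record is output
+    │
+    ├── Running ───────────────→ sent straight to model prediction
+    │                            result_type = anomaly_detection
+    │
+    └── Sensor anomaly
+            │
+            ├── Anomaly on a measurement point this detector cares about?
+            │       YES → marked directly as a sensor anomaly
+            │              result_type = sensor_anomaly, is_anomaly = 1
+            │              anomaly_label = 传感器异常-xxx异常
+            │
+            └── No anomaly on the points it cares about (or no associated points)
+                    └──→ sent to model prediction
+                           result_type = anomaly_detection
+```
+
+The sensor columns each detector cares about (`DETECTOR_SENSOR_COLS`) decide where sensor-anomaly data goes:
+
+| Detector | Associated sensor columns | Handling of sensor-anomaly data |
+|--------|-------------|------------------|
+| wind_power_curve | d_val_wind, d_val_power, d_logic_wind_pwr | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+| wind_power_scatter | d_val_wind, d_val_power, d_logic_wind_pwr | Same as above |
+| yaw_static | (none) | All sensor-anomaly data → model prediction |
+| yaw_twist | (none) | All sensor-anomaly data → model prediction |
+| pitch_regulation | d_val_pitch | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+| pitch_coord | d_val_pitch, d_val_spd, d_val_power | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+| pitch_min | d_val_pitch | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+| ctrl_power_quality | d_val_power | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+| ctrl_op_state | d_val_power, d_val_spd, d_val_pitch | Anomaly in an associated column → sensor_anomaly; otherwise → model prediction |
+
+> The 9 detectors run in parallel through `ThreadPoolExecutor`; the main thread then writes all rows to the database in one batch, which avoids SQLite lock conflicts.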
+
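The "parallel workers, single writer" pattern from the note above can be sketched with the standard library alone. This is a simplified illustration: `run_detector` is a stand-in that flags values above 10, the three-column table is a reduced version of `anomaly_points`, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_detector(name, points):
    """Stand-in detector: returns rows only, never touches the database."""
    return [(name, data_time, int(value > 10)) for data_time, value in points]


points = [("2026-02-24 00:00:00", 3.0), ("2026-02-24 00:10:00", 12.0)]
detector_names = ["wind_power_curve", "yaw_static", "pitch_min"]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE anomaly_points (detector TEXT, data_time TEXT, is_anomaly INTEGER)"
)

all_rows = []
with ThreadPoolExecutor(max_workers=len(detector_names)) as executor:
    futures = [executor.submit(run_detector, n, points) for n in detector_names]
    for future in as_completed(futures):
        # Results are collected in the main thread as each worker finishes.
        all_rows.extend(future.result())

# Only the main thread writes, in a single batch and a single commit.
conn.executemany("INSERT INTO anomaly_points VALUES (?,?,?)", all_rows)
conn.commit()

total, anomalies = conn.execute(
    "SELECT COUNT(*), SUM(is_anomaly) FROM anomaly_points"
).fetchone()
print(total, anomalies)
```

Keeping the connection out of the worker threads also sidesteps `sqlite3`'s default restriction that a connection is used only by the thread that created it.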
+---
+
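+## Detection modules
+
+### 1. Wind speed-power anomalies (wind_power.py)
+
+Both training and prediction use only data from the valid power-generation range (`WIND_VALID_MIN=3.0` ~ `WIND_VALID_MAX=25.0` m/s), which excludes noise from the cut-in and cut-out segments.
+
+| Detector | Algorithm | Features | Detection target |
+|--------|------|------|----------|
+| PowerCurveDetector | z-score (per point, banded thresholds) | (mean, std) computed per wind-speed bin; per-point power deviation | Power-curve shift or single-point power anomalies; threshold relaxed at low wind speed (4σ), 3σ at medium wind speed, tightened at high wind speed (2.5σ) |
+| ScatterDetector | IsolationForest | (wind_spd, p_active, p_active/wind_spd³, optional ambient_temp) | Single points that stray from the normal scatter cloud, capturing low wind-energy utilisation; the temperature feature reflects the effect of air density |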
+
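The binned z-score idea behind `PowerCurveDetector` can be sketched with the standard library. This is an illustration, not the code in `models/wind_power.py`: the helper names are hypothetical, and the wind-speed boundaries between the low, medium and high bands (6 and 11 m/s here) are assumptions, since the README gives only the σ limits.

```python
from collections import defaultdict
from statistics import mean, pstdev

BIN_WIDTH = 0.5  # m/s, matching WIND_BIN_WIDTH


def sigma_limit(wind):
    """Banded threshold; the 6 and 11 m/s boundaries are assumed for illustration."""
    if wind < 6.0:
        return 4.0
    if wind < 11.0:
        return 3.0
    return 2.5


def fit_bins(samples):
    """Learn (mean, std) of power for each wind-speed bin from (wind, power) pairs."""
    bins = defaultdict(list)
    for wind, power in samples:
        bins[int(wind // BIN_WIDTH)].append(power)
    return {b: (mean(v), pstdev(v)) for b, v in bins.items() if len(v) >= 2}


def zscore(stats, wind, power):
    """Deviation of one point from its bin mean, in standard deviations."""
    entry = stats.get(int(wind // BIN_WIDTH))
    if entry is None or entry[1] == 0:
        return None  # unseen bin or zero spread: no score can be computed
    return (power - entry[0]) / entry[1]


def is_anomaly(stats, wind, power):
    z = zscore(stats, wind, power)
    return z is not None and abs(z) > sigma_limit(wind)


train = [(8.1, 1000.0), (8.2, 1010.0), (8.3, 990.0), (8.4, 1000.0)]
stats = fit_bins(train)
print(is_anomaly(stats, 8.2, 1005.0), is_anomaly(stats, 8.2, 700.0))
```

Scoring each point against its own bin is what lets the detector flag a single bad sample, rather than only a shift of the whole curve.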
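+### 2. Yaw system anomalies (yaw.py)
+
+IsolationForest replaces the original DBSCAN: it natively separates fit from predict, is more stable when the daily inference batch is small, and provides an anomaly score. `wind_dir` is not used (its data risks being swapped with `yaw_err`); only `yaw_ang` and `twist_ang` are relied on.
+
+| Detector | Algorithm | Features | Detection target |
+|--------|------|------|----------|
+| StaticYawDetector | IsolationForest | yaw_ang, short-window rolling mean/std (2 hours, 12 points), long-window rolling mean (12 hours, 72 points) | Instantaneous yaw-angle deviation (short window) and sustained slow drift (long window) |
+| CableTwistDetector | IsolationForest | twist_ang, its absolute value, its rate of change | Cable twist angle over limit or changing abnormally (cable not untwisted in time) |
+
+### 3. Pitch system anomalies (pitch.py)
+
+LOF's `n_neighbors` is adapted to the training set size: `max(5, min(20, len(train)//50))`, which avoids degradation on small samples. `PitchCoordDetector` filters out points with rotor speed < 5 rpm so that noise in the power/speed ratio is not amplified at low speed.
+
+| Detector | Algorithm | Features | Detection target |
+|--------|------|------|----------|
+| PitchRegulationDetector | LOF (novelty=True) | Setpoint-minus-actual deviation of the three blades, inconsistency across the three blades (std), mean/inconsistency of pitch speed (optional) | Excessive pitch regulation error or blades out of sync |
+| PitchCoordDetector | LOF (novelty=True) | Mean/inconsistency of the three blades, rotor_spd, p_active and derived ratios (only points with rotor_spd > 5 rpm) | Abnormal coordination between pitch, rotor speed and power |
+| MinPitchDetector | IsolationForest | Minimum, mean and range of the three blade angles | Minimum pitch angle outside its normal range |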
+
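As a worked example of the adaptive `n_neighbors` formula quoted above, the helper below (a hypothetical wrapper around that one expression) shows how the value moves between its floor of 5 and its cap of 20 as the training set grows.

```python
def adaptive_n_neighbors(n_train: int) -> int:
    """One neighbour per 50 training rows, clamped to the range [5, 20]."""
    return max(5, min(20, n_train // 50))


for n_train in (100, 600, 5000):
    print(n_train, adaptive_n_neighbors(n_train))
```

With 100 rows the raw value 100 // 50 = 2 is lifted to the floor of 5, with 600 rows it is 12, and from 1000 rows upward it stays at the cap of 20.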
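+### 4. Overall operating-state anomalies (control_params.py)
+
+| Detector | Algorithm | Features | Detection target |
+|--------|------|------|----------|
+| PowerQualityDetector | IsolationForest | Actual/theoretical power deviation ratio, power factor, three-phase current/voltage imbalance, frequency deviation | Electrical-side power quality anomalies (three-phase imbalance, frequency deviation, low power factor) |
+| OperationStateDetector | IsolationForest | p_active, gen_spd, mean/inconsistency of the three blades, twist_ang, power/speed ratio | Mechanical-side operating state deviating as a whole from the normal pattern |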
+
+---
+
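+## Detection output
+
+One row is output per timestamp × detector; `result_type` distinguishes two kinds:
+
+| result_type | Trigger | is_anomaly | anomaly_score | anomaly_label |
+|-------------|----------|-----------|---------------|---------------|
+| anomaly_detection | Running data, plus sensor-anomaly data whose relevant points are clean, after model prediction | 0 or 1 | Model output score | NULL |
+| sensor_anomaly | Sensor-anomaly data with an anomaly on a point this detector cares about | Always 1 | NULL | 传感器异常-xxx异常 |
+
+Shutdown / curtailed data produces no records.
+
+The `predict` output of every detector has the **same length** as its input (the original index is preserved); points with missing features are output as `anomaly=False, score=NaN`.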
+
+---
+
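+## Results database schema
+
+Table name: `anomaly_points`
+
+| Field | Type | Description |
+|------|------|------|
+| data_time | TEXT | Data timestamp (from the parquet `data_time` column) |
+| model_name | TEXT | Turbine model name |
+| farm_name | TEXT | Wind farm name |
+| turbine_name | TEXT | Turbine name |
+| detector | TEXT | Detector name |
+| result_type | TEXT | anomaly_detection / sensor_anomaly |
+| is_anomaly | INTEGER | 1 = anomalous, 0 = normal |
+| anomaly_score | REAL | Anomaly score (lower is more anomalous; NULL for sensor_anomaly) |
+| anomaly_label | TEXT | Sensor anomaly label (NULL for anomaly_detection) |
+| detect_time | TEXT | Time of this detection run |
+
+Index: `(turbine_name, data_time)`, which speeds up queries by turbine and time. Deleting old records also matches on `farm_name`, so that turbines with the same name in different wind farms do not delete each other's rows.
+
+> SQLite is used at present; for production, `init_db` and `_bulk_insert` in `detect.py` can be swapped for PostgreSQL/MySQL.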
+
+---
+
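+## Anomaly score
+
+`anomaly_score` is described as coming from each algorithm's `score_samples()` method and expresses how well a point fits the normal distribution: **the lower, the more anomalous**. Note that for scikit-learn's IsolationForest the range quoted below, roughly [-0.5, 0.5], matches `decision_function()`; `score_samples()` itself returns values in roughly [-1, 0), so check the detector code to see which one is stored.
+
+| Algorithm | Detectors | Score range | Notes |
+|------|--------|----------|------|
+| z-score | wind_power_curve | (-∞, +∞) | Number of standard deviations by which power deviates from the bin mean; anomalous when \|z\|>4 at low wind speed, \|z\|>3 at medium wind speed, \|z\|>2.5 at high wind speed. With this range, anomalies are judged on \|z\| rather than on low values alone |
+| IsolationForest | wind_power_scatter, yaw_static, yaw_twist, pitch_min, ctrl_* | about [-0.5, 0.5] | Based on average path length; values below -0.1 start to deserve attention |
+| LOF | pitch_regulation, pitch_coord | about (-∞, -1] | Negated local outlier factor; -1 means fully normal |
+
+In practice, use `is_anomaly=1` as the primary criterion and `anomaly_score` to rank priority (the lower the score, the sooner the point should be investigated).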
+
+---
+
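+## Configuration parameters
+
+All parameters live in `config.py`:
+
+| Parameter | Default | Description |
+|------|--------|------|
+| ISO_CONTAMINATION | 0.01 | Expected anomaly ratio for IsolationForest/LOF |
+| ISO_N_ESTIMATORS | 100 | Number of IsolationForest trees |
+| WIND_BIN_WIDTH | 0.5 | Wind speed bin width (m/s) |
+| WIND_VALID_MIN | 3.0 | Lower bound of the valid power-generation wind speed (m/s) |
+| WIND_VALID_MAX | 25.0 | Upper bound of the valid power-generation wind speed (m/s) |
+| STATUS_POWER_UPPER_RATIO | 1.25 | Upper-limit multiplier for the power range check |
+| STATUS_CURTAIL_PITCH_OFFSET | 3.0 | Pitch angle offset for curtailment (°) |
+| STATUS_MODEL_SPECIAL_RULES | {...} | Per-model special labelling rules |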

+ 0 - 0
__init__.py


+ 90 - 0
config.py

@@ -0,0 +1,90 @@
+from pathlib import Path
+
+# ============================================================
+# Path configuration (adjust to your environment)
+# ============================================================
+
+# Parquet data root; layout: {PARQUET_ROOT}/{model_name}/{farm_name}/{turbine_name}.parquet
+# TODO: replace with the actual path
+PARQUET_ROOT = Path("/Volumes/P3/大唐标准化数据/stander_parquet")
+
+# Model save directory, one subdirectory per turbine model
+# TODO: replace with the actual path
+MODEL_SAVE_DIR = Path("/Users/xmia/Desktop/ZN/项目/大唐25-26/Dtest_code/code_abnormal/anomaly_detection/saved_models")
+
+# Results database path (SQLite; can be replaced with another database connection string)
+# TODO: replace with the actual database location
+DB_PATH = "/Users/xmia/Desktop/ZN/项目/大唐25-26/Dtest_code/code_abnormal/anomaly_detection/anomaly_results.db"
+
+# ============================================================
+# Common IsolationForest parameters
+# ============================================================
+ISO_CONTAMINATION = 0.01   # expected anomaly ratio; can be tuned per turbine model
+ISO_RANDOM_STATE  = 42
+ISO_N_ESTIMATORS  = 100
+
+# ============================================================
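+# Status labelling parameters
+# ============================================================
+
+# Power range multipliers (relative to rated power)
+STATUS_POWER_UPPER_RATIO  = 1.25   # upper-limit multiplier for power
+STATUS_POWER_LOWER_RATIO  = -0.10  # lower-limit multiplier for power
+STATUS_SHUTDOWN_RATIO     = 0.005  # shutdown power threshold (fraction of rated power), at least 10 kW
+STATUS_CURTAIL_LOW_RATIO  = 0.02   # curtailment lower bound (fraction of rated power)
+STATUS_CURTAIL_HIGH_RATIO = 0.95   # curtailment upper bound (fraction of rated power)
+STATUS_CURTAIL_PITCH_OFFSET = 3.0  # curtailment pitch offset (baseline pitch angle + this value)
+
+# Wind speed range thresholds (m/s)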
+STATUS_WIND_MAX = 75.0
+STATUS_WIND_MIN = -2.0
+
+# Pitch angle range thresholds (°)
+STATUS_PITCH_MAX = 105.0
+STATUS_PITCH_MIN = -10.0
+
+# Speed/torque range multipliers
+STATUS_SPD_UPPER_RATIO    = 1.25
+STATUS_TORQUE_UPPER_RATIO = 1.25
+STATUS_TORQUE_LOWER_ABS   = -2000.0
+
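+# Logic paradox thresholds
+STATUS_LOGIC_WIND_MIN     = 0.1    # wind speed below this value with high power is a logic anomaly
+STATUS_LOGIC_POWER_MIN    = 100.0  # power lower bound paired with the wind speed threshold above
+
+# Per-model special rules: {model_name: [(wind_thresh, power_thresh), ...]}
+# A logic anomaly is additionally flagged when wind speed < wind_thresh but power > power_thresh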
+STATUS_MODEL_SPECIAL_RULES = {
+    "EN156-3300": [(2.7, 450)],
+}
+
+# ============================================================
+# Wind speed-power binning parameters
+# ============================================================
+WIND_BIN_WIDTH = 0.5   # m/s
+WIND_BIN_MIN   = 0.0
+WIND_BIN_MAX   = 25.0
+
+# Valid power-generation wind speed range (filters out cut-in/cut-out noise)
+WIND_VALID_MIN = 3.0   # m/s, cut-in wind speed
+WIND_VALID_MAX = 25.0  # m/s, cut-out wind speed
+
+# ============================================================
+# Measurement point column names (matching the parquet field names)
+# ============================================================
+COL_WIND_SPD         = "wind_spd"
+COL_P_ACTIVE         = "p_active"
+COL_YAW_ANG          = "yaw_ang"
+COL_TWIST_ANG        = "twist_ang"
+COL_PITCH_SET_1      = "pitch_ang_set_1"
+COL_PITCH_SET_2      = "pitch_ang_set_2"
+COL_PITCH_SET_3      = "pitch_ang_set_3"
+COL_PITCH_ACT_1      = "pitch_ang_act_1"
+COL_PITCH_ACT_2      = "pitch_ang_act_2"
+COL_PITCH_ACT_3      = "pitch_ang_act_3"
+COL_PITCH_SPD_1      = "pitch_spd_1"
+COL_PITCH_SPD_2      = "pitch_spd_2"
+COL_PITCH_SPD_3      = "pitch_spd_3"
+COL_ROTOR_SPD        = "rotor_spd"
+COL_THEORY_P_ACTIVE  = "theory_p_active"
+COL_AMBIENT_TEMP     = "ambient_temp"

+ 110 - 0
data_loader.py

@@ -0,0 +1,110 @@
+import os
+from pathlib import Path
+from typing import List, Optional
+import pandas as pd
+import pyarrow.parquet as pq
+from config import PARQUET_ROOT
+
+
+def list_model_types() -> List[str]:
+    """Return the names of all turbine-model folders under PARQUET_ROOT."""
+    return [p.name for p in PARQUET_ROOT.iterdir() if p.is_dir()]
+
+
+def list_turbines(model_name: str) -> List[dict]:
+    """
+    Return a list of records for every turbine of the given model.
+    Each record: {model_name, farm_name, turbine_name, path}
+    """
+    records = []
+    model_root = PARQUET_ROOT / model_name
+    if not model_root.exists():
+        return records
+    for farm_dir in model_root.iterdir():
+        if not farm_dir.is_dir():
+            continue
+        for pq_file in farm_dir.glob("*.parquet"):
+            if pq_file.name.startswith("._"):
+                continue
+            records.append({
+                "model_name":   model_name,
+                "farm_name":    farm_dir.name,
+                "turbine_name": pq_file.stem,
+                "path":         pq_file,
+            })
+    return records
+
+
+def load_turbine(
+    path: Path,
+    required_cols: List[str],
+    optional_cols: Optional[List[str]] = None,
+) -> Optional[pd.DataFrame]:
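+    """
+    Read one turbine's parquet file, tolerating missing measurement points.
+
+    - required_cols: if any of these columns is missing, the file is skipped and None is returned
+    - optional_cols: read when present, ignored otherwise
+    - Returns a DataFrame with only the columns that actually exist, with dropna applied on required_cols
+    """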
+    try:
+        pq_file = pq.ParquetFile(path)
+        schema_names = set(pq_file.schema.names)
+    except Exception as e:
+        print(f"[WARN] 无法读取 schema {path.name}: {e}")
+        return None
+
+    missing_required = [c for c in required_cols if c not in schema_names]
+    if missing_required:
+        print(f"[SKIP] {path.name} 缺少必要测点: {missing_required}")
+        return None
+
+    cols_to_read = list(dict.fromkeys(required_cols))  # de-duplicate, keep order
+    if optional_cols:
+        cols_to_read += [c for c in dict.fromkeys(optional_cols) if c in schema_names and c not in cols_to_read]
+
+    try:
+        df = pq_file.read(columns=cols_to_read).to_pandas()
+    except Exception as e:
+        print(f"[WARN] 读取数据失败 {path.name}: {e}")
+        return None
+
+    # Convert to numeric types (skipping the timestamp column), then drop rows with nulls in required columns
+    numeric_cols = [c for c in cols_to_read if c != "data_time"]
+    for col in numeric_cols:
+        df[col] = pd.to_numeric(df[col], errors='coerce')
+    df = df.dropna(subset=required_cols)
+
+    if df.empty:
+        print(f"[SKIP] {path.name} 必要测点全为空值")
+        return None
+
+    return df
+
+
+def load_model_type(
+    model_name: str,
+    required_cols: List[str],
+    optional_cols: Optional[List[str]] = None,
+) -> pd.DataFrame:
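+    """
+    Concatenate the data of all turbines of one model into a single DataFrame (used for training).
+    farm_name / turbine_name columns are added automatically for traceability.
+    """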
+    frames = []
+    for rec in list_turbines(model_name):
+        df = load_turbine(rec["path"], required_cols, optional_cols)
+        if df is None:
+            continue
+        df = df.copy()
+        df["farm_name"]    = rec["farm_name"]
+        df["turbine_name"] = rec["turbine_name"]
+        frames.append(df)
+
+    if not frames:
+        print(f"[WARN] 机型 {model_name} 无有效数据(所需列: {required_cols})")
+        return pd.DataFrame()
+
+    result = pd.concat(frames, ignore_index=True)
+    print(f"[INFO] 机型 {model_name} 加载完成,共 {len(result)} 行,{len(frames)} 台风机")
+    return result

+ 333 - 0
detect.py

@@ -0,0 +1,333 @@
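+"""
+Inference entry point: read the previous day's parquet data and output per-point anomaly detection results.
+
+Data reading:
+  - Each parquet is updated daily; path: {PARQUET_ROOT}/{model_name}/{farm_name}/{turbine_name}.parquet
+  - On read, keep rows whose data_time falls within [yesterday 00:00, today 00:00)
+
+Detection logic:
+  1. Label (running / curtailed / shutdown / sensor anomaly)
+  2. Shutdown / curtailed → skipped, no output
+  3. Running data → model prediction, per-point is_anomaly + score
+  4. Sensor-anomaly data:
+       anomaly on a point the detector cares about → per-point output with result_type='sensor_anomaly', anomaly_label=label
+       no anomaly on the points the detector cares about → model prediction, per-point output
+
+Output table anomaly_points (per point):
+  data_time, model_name, farm_name, turbine_name, detector,
+  result_type, is_anomaly, anomaly_score, anomaly_label, detect_time
+
+Usage:
+    python detect.py                    # detect all turbines of all models (previous day's data)
+    python detect.py --model MODEL_NAME # detect only the specified model
+    python detect.py --date 2026-02-24  # specify the date (default: yesterday)
+"""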
+import argparse
+import sqlite3
+from concurrent.futures import ThreadPoolExecutor, as_completed
+from datetime import datetime, timedelta
+from pathlib import Path
+
+import joblib
+import pandas as pd
+import numpy as np
+
+from config import MODEL_SAVE_DIR, DB_PATH
+from data_loader import list_model_types, list_turbines, load_turbine
+from labeler import get_model_statistics, label_dataframe, DETECTOR_SENSOR_COLS
+from models.wind_power import PowerCurveDetector, ScatterDetector
+from models.yaw import StaticYawDetector, CableTwistDetector
+from models.pitch import PitchRegulationDetector, PitchCoordDetector, MinPitchDetector
+from models.control_params import PowerQualityDetector, OperationStateDetector
+
+
+# ── Database ───────────────────────────────────────────────────────────────────
+
+def init_db(db_path: str) -> sqlite3.Connection:
+    conn = sqlite3.connect(db_path)
+    conn.execute("""
+        CREATE TABLE IF NOT EXISTS anomaly_points (
+            id            INTEGER PRIMARY KEY AUTOINCREMENT,
+            data_time     TEXT,     -- data timestamp (from the parquet data_time column)
+            model_name    TEXT,     -- turbine model
+            farm_name     TEXT,     -- wind farm
+            turbine_name  TEXT,     -- turbine
+            detector      TEXT,     -- detector name
+            result_type   TEXT,     -- 'anomaly_detection' | 'sensor_anomaly'
+            is_anomaly    INTEGER,  -- 1=anomalous, 0=normal (always 1 for sensor_anomaly)
+            anomaly_score REAL,     -- anomaly score (lower is more anomalous; NULL for sensor_anomaly)
+            anomaly_label TEXT,     -- anomaly label (sensor anomaly type for sensor_anomaly, NULL for anomaly_detection)
+            detect_time   TEXT      -- time of this detection run
+        )
+    """)
+    conn.execute("""
+        CREATE INDEX IF NOT EXISTS idx_ap_turbine_time
+        ON anomaly_points (turbine_name, data_time)
+    """)
+    conn.commit()
+    return conn
+
+
+def _bulk_insert(conn, rows: list):
+    """Bulk insert; rows is a list of tuples."""
+    if not rows:
+        return
+    conn.executemany(
+        "INSERT INTO anomaly_points VALUES (NULL,?,?,?,?,?,?,?,?,?,?)",
+        rows
+    )
+    conn.commit()
+
+
+def _delete_existing(conn, turbine_name: str, farm_name: str, target_date: str):
+    """Delete this turbine's existing records for the date before writing, so reruns do not create duplicates."""
+    conn.execute(
+        "DELETE FROM anomaly_points WHERE turbine_name=? AND farm_name=? AND data_time LIKE ?",
+        (turbine_name, farm_name, f"{target_date}%"),
+    )
+    conn.commit()
+
+
+# ── Date filtering ─────────────────────────────────────────────────────────────
+
+def _filter_date(df: pd.DataFrame, target_date) -> pd.DataFrame:
+    """
+    Keep the rows whose data_time falls on target_date (a datetime.date object).
+    If the data_time column does not exist, the data is returned unfiltered.
+    """
+    if "data_time" not in df.columns:
+        return df
+    dt = pd.to_datetime(df["data_time"], errors="coerce")
+    mask = dt.dt.date == target_date
+    return df[mask & dt.notna()].copy()
+
+
+# ── Core detection function (per-point output) ─────────────────────────────────
+
+def _run_detector(
+    model_name: str, farm_name: str, turbine_name: str,
+    model, detector_name: str,
+    df_running: pd.DataFrame,
+    df_sensor: pd.DataFrame,
+    detect_time: str,
+) -> list:
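+    """
+    Run detection for a single detector and return a list of rows (written to the database by the caller).
+    model: an already loaded detector object (preloaded per turbine model to avoid repeated joblib.load per turbine).
+    """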
+    if model is None:
+        return []
+
+    rows = []
+    sensor_cols = DETECTOR_SENSOR_COLS.get(detector_name, [])
+
+    # ── Handle sensor-anomaly data ────────────────────────────────────────────
+    if not df_sensor.empty:
+        existing_sc = [c for c in sensor_cols if c in df_sensor.columns]
+        if existing_sc:
+            has_anom = df_sensor[existing_sc].any(axis=1)
+        else:
+            has_anom = pd.Series(False, index=df_sensor.index)
+
+        # Relevant sensor anomaly present → mark directly
+        df_direct = df_sensor[has_anom]
+        if not df_direct.empty:
+            dt_col = df_direct["data_time"].astype(str) if "data_time" in df_direct.columns \
+                     else pd.Series("", index=df_direct.index)
+            st_col = df_direct["status"].astype(str) if "status" in df_direct.columns \
+                     else pd.Series("传感器异常", index=df_direct.index)
+            for t, st in zip(dt_col.values, st_col.values):
+                rows.append((
+                    t, model_name, farm_name, turbine_name, detector_name,
+                    "sensor_anomaly", 1, None, st, detect_time,
+                ))
+
+        # No relevant sensor anomaly → add to the model detection queue
+        df_sensor_clean = df_sensor[~has_anom]
+    else:
+        df_sensor_clean = pd.DataFrame()
+
+    # ── Merge the data that needs model prediction ────────────────────────────
+    frames = [f for f in [df_running, df_sensor_clean] if not f.empty]
+    if frames:
+        df_detect = pd.concat(frames, ignore_index=False)
+        try:
+            result = model.predict(df_detect)
+            # Align timestamps by index, so a dropna inside predict cannot shift rows out of place
+            time_map = df_detect["data_time"].astype(str).to_dict() if "data_time" in df_detect.columns \
+                       else {i: "" for i in df_detect.index}
+            is_anom_col = result["anomaly"].astype(int) if "anomaly" in result.columns \
+                          else pd.Series(0, index=result.index)
+            score_col = result["score"] if "score" in result.columns \
+                        else pd.Series(np.nan, index=result.index)
+            for idx in result.index:
+                sc_val = score_col.loc[idx]
+                sc = float(sc_val) if pd.notna(sc_val) else None
+                rows.append((
+                    time_map.get(idx, ""),
+                    model_name, farm_name, turbine_name, detector_name,
+                    "anomaly_detection",
+                    int(is_anom_col.loc[idx]),
+                    sc,
+                    None,
+                    detect_time,
+                ))
+            n_anom = int(is_anom_col.sum())
+            print(f"    [{detector_name}] 检测 {len(result)} 点,异常 {n_anom} 点")
+        except Exception as e:
+            print(f"    [{detector_name}] 检测失败: {e}")
+
+    return rows
+
+
+# ── Single-turbine detection ───────────────────────────────────────────────────
+
+def detect_turbine(rec: dict, models: dict, conn: sqlite3.Connection,
+                   model_stats: dict, target_date, detect_time: str):
+    """models: {detector_name: loaded_model_object}, preloaded per turbine model."""
+    mn, fn, tn, path = rec["model_name"], rec["farm_name"], rec["turbine_name"], rec["path"]
+
+    all_cols = [
+        "data_time",
+        "wind_spd", "p_active", "gen_spd", "actual_torque",
+        "yaw_ang", "twist_ang",
+        "pitch_ang_set_1", "pitch_ang_set_2", "pitch_ang_set_3",
+        "pitch_ang_act_1", "pitch_ang_act_2", "pitch_ang_act_3",
+        "pitch_spd_1", "pitch_spd_2", "pitch_spd_3",
+        "rotor_spd", "theory_p_active", "p_reactive", "grid_freq",
+        "grid_ia", "grid_ib", "grid_ic",
+        "grid_ua", "grid_ub", "grid_uc",
+        "ambient_temp",
+    ]
+    df_raw = load_turbine(path, required_cols=["p_active"], optional_cols=all_cols)
+    if df_raw is None:
+        return
+
+    df_raw = _filter_date(df_raw, target_date)
+    if df_raw.empty:
+        print(f"  {fn}/{tn}: 无 {target_date} 数据,跳过")
+        return
+
+    # Remove existing records before writing
+    _delete_existing(conn, tn, fn, str(target_date))
+
+    labeled   = label_dataframe(df_raw, model_stats, mn)
+    df_run    = labeled[labeled["status"] == "运行"].copy()
+    df_sensor = labeled[labeled["status"].str.startswith("传感器异常")].copy()
+
+    total = len(labeled)
+    print(f"  {fn}/{tn}: 总{total} | 运行{len(df_run)} | "
+          f"传感器异常{len(df_sensor)} | 停机/限功率{total-len(df_run)-len(df_sensor)}")
+
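+    # Run the 9 detectors in parallel (model objects are read-only, hence thread-safe); the main thread does all writes to avoid SQLite lock conflicts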
+    DETECTOR_NAMES = [
+        "wind_power_curve", "wind_power_scatter",
+        "yaw_static", "yaw_twist",
+        "pitch_regulation", "pitch_coord", "pitch_min",
+        "ctrl_power_quality", "ctrl_op_state",
+    ]
+    all_rows = []
+    with ThreadPoolExecutor(max_workers=min(9, len(DETECTOR_NAMES))) as executor:
+        futures = {
+            executor.submit(
+                _run_detector, mn, fn, tn,
+                models.get(det_name), det_name,
+                df_run, df_sensor, detect_time
+            ): det_name
+            for det_name in DETECTOR_NAMES
+        }
+        for future in as_completed(futures):
+            det_name = futures[future]
+            try:
+                all_rows.extend(future.result())
+            except Exception as e:
+                print(f"    [{det_name}] 并行检测异常: {e}")
+    _bulk_insert(conn, all_rows)
+
+
+# ── Model-level detection ──────────────────────────────────────────────────────
+
+def detect_model_type(model_name: str, conn: sqlite3.Connection,
+                      target_date, detect_time: str):
+    model_dir = MODEL_SAVE_DIR / model_name
+    if not model_dir.exists():
+        print(f"[SKIP] 机型 {model_name} 无已训练模型,请先运行 train.py")
+        return
+
+    turbines = list_turbines(model_name)
+    print(f"\n机型 {model_name}: 共 {len(turbines)} 台风机,检测日期 {target_date}")
+
+    # ── Load model_stats (prefer the pkl, fall back to the full dataset) ──
+    stats_path = model_dir / "model_stats.pkl"
+    if stats_path.exists():
+        model_stats = joblib.load(stats_path)
+        print(f"  [stats] 从 model_stats.pkl 加载")
+    else:
+        from data_loader import load_model_type
+        df_all = load_model_type(model_name, required_cols=["p_active"],
+                                 optional_cols=["gen_spd", "actual_torque", "pitch_ang_act_1"])
+        if df_all.empty:
+            print(f"[SKIP] 机型 {model_name} 无数据")
+            return
+        model_stats = get_model_statistics(df_all)
+        del df_all
+        print(f"  [stats] 从全量数据计算(建议重新训练以生成 model_stats.pkl)")
+
+    # ── Preload all models once per turbine model (one joblib.load, reused by every turbine) ──
+    PKL_MAP = {
+        "wind_power_curve":   ("wind_power_curve.pkl",   PowerCurveDetector.load),
+        "wind_power_scatter": ("wind_power_scatter.pkl", ScatterDetector.load),
+        "yaw_static":         ("yaw_static.pkl",         StaticYawDetector.load),
+        "yaw_twist":          ("yaw_twist.pkl",          CableTwistDetector.load),
+        "pitch_regulation":   ("pitch_regulation.pkl",   PitchRegulationDetector.load),
+        "pitch_coord":        ("pitch_coord.pkl",        PitchCoordDetector.load),
+        "pitch_min":          ("pitch_min.pkl",           MinPitchDetector.load),
+        "ctrl_power_quality": ("ctrl_power_quality.pkl", PowerQualityDetector.load),
+        "ctrl_op_state":      ("ctrl_op_state.pkl",      OperationStateDetector.load),
+    }
+    models = {}
+    for det_name, (pkl_file, load_fn) in PKL_MAP.items():
+        pkl_path = model_dir / pkl_file
+        if pkl_path.exists():
+            try:
+                models[det_name] = load_fn(pkl_path)
+            except Exception as e:
+                print(f"  [WARN] 加载 {pkl_file} 失败: {e}")
+    print(f"  [模型] 预加载 {len(models)}/{len(PKL_MAP)} 个检测器")
+
+    for rec in turbines:
+        print(f"  检测: {rec['farm_name']} / {rec['turbine_name']}")
+        detect_turbine(rec, models, conn, model_stats, target_date, detect_time)
+
+
+# ── Main flow ──────────────────────────────────────────────────────────────────
+
+def main():
+    parser = argparse.ArgumentParser(description="风机异常检测推理(逐点输出)")
+    parser.add_argument("--model", type=str, default=None, help="指定机型名称")
+    parser.add_argument("--date",  type=str, default=None,
+                        help="指定检测日期 YYYY-MM-DD,默认为昨天")
+    args = parser.parse_args()
+
+    if args.date:
+        target_date = datetime.strptime(args.date, "%Y-%m-%d").date()
+    else:
+        target_date = (datetime.now() - timedelta(days=1)).date()
+
+    detect_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+    conn = init_db(DB_PATH)
+    print(f"结果数据库: {DB_PATH}")
+    print(f"检测日期:   {target_date}")
+
+    if args.model:
+        detect_model_type(args.model, conn, target_date, detect_time)
+    else:
+        for mt in list_model_types():
+            detect_model_type(mt, conn, target_date, detect_time)
+
+    conn.close()
+    print("\n检测完成。")
+
+
+if __name__ == "__main__":
+    main()

+ 194 - 0
labeler.py

@@ -0,0 +1,194 @@
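+"""
+Status labelling module
+
+Provides:
+  1. get_model_statistics()  - per-model adaptive thresholds such as rated power and baseline pitch angle
+  2. label_dataframe()       - label a DataFrame: running / curtailed / shutdown / sensor anomaly (传感器异常-xxx异常)
+  3. DETECTOR_SENSOR_COLS    - measurement points each detector depends on, used by detect.py to route sensor anomalies
+
+Status priority (high → low): sensor anomaly > shutdown > curtailed > running
+
+Naming of sensor-anomaly columns:
+  d_val_power   power value out of range
+  d_val_wind    wind speed value out of range
+  d_val_pitch   pitch value out of range
+  d_val_spd     rotational speed value out of range
+  d_val_torque  torque value out of range
+  d_logic_wind_pwr    wind-power logic paradox
+  d_logic_torque_pwr  speed-torque logic paradox
+"""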
+import pandas as pd
+import numpy as np
+from typing import Dict, Optional
+
+from config import (
+    STATUS_POWER_UPPER_RATIO, STATUS_POWER_LOWER_RATIO,
+    STATUS_SHUTDOWN_RATIO, STATUS_CURTAIL_LOW_RATIO,
+    STATUS_CURTAIL_HIGH_RATIO, STATUS_CURTAIL_PITCH_OFFSET,
+    STATUS_WIND_MAX, STATUS_WIND_MIN,
+    STATUS_PITCH_MAX, STATUS_PITCH_MIN,
+    STATUS_SPD_UPPER_RATIO, STATUS_TORQUE_UPPER_RATIO, STATUS_TORQUE_LOWER_ABS,
+    STATUS_LOGIC_WIND_MIN, STATUS_LOGIC_POWER_MIN,
+    STATUS_MODEL_SPECIAL_RULES,
+)
+
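+# ── Sensor-anomaly columns each detector depends on ───────────────────────────
+# detect.py uses this mapping to check whether any measurement a detector needs is flagged in sensor-anomaly rows.
+# If flagged → output the sensor-anomaly label directly; otherwise → run anomaly detection as usual.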
+DETECTOR_SENSOR_COLS: Dict[str, list] = {
+    "wind_power_curve":  ["d_val_wind", "d_val_power", "d_logic_wind_pwr"],
+    "wind_power_scatter":["d_val_wind", "d_val_power", "d_logic_wind_pwr"],
+    "yaw_static":        [],   # yaw_ang has no sensor-anomaly column, so rows are never skipped outright
+    "yaw_twist":         [],
+    "pitch_regulation":  ["d_val_pitch"],
+    "pitch_coord":       ["d_val_pitch", "d_val_spd", "d_val_power"],
+    "pitch_min":         ["d_val_pitch"],
+    "ctrl_power_quality": ["d_val_power"],
+    "ctrl_op_state":      ["d_val_power", "d_val_spd", "d_val_pitch"],
+}
+
+# Sensor-anomaly column → human-readable label (used in the output)
+SENSOR_COL_LABEL: Dict[str, str] = {
+    "d_val_power":          "功率值异常",
+    "d_val_wind":           "风速值异常",
+    "d_val_pitch":          "变桨值异常",
+    "d_val_spd":            "转速值异常",
+    "d_val_torque":         "扭矩值异常",
+    "d_logic_wind_pwr":     "风速功率逻辑异常",
+    "d_logic_torque_pwr":   "转速扭矩逻辑异常",
+}
+
+
+def get_model_statistics(df: pd.DataFrame) -> dict:
+    """Compute adaptive statistical thresholds for a turbine model (pass the model's full dataset)."""
+    stats: dict = {}
+    stats["p_max_observed"] = df["p_active"].quantile(0.995)
+    stats["torque_limit"] = (
+        df["actual_torque"].quantile(0.999) if "actual_torque" in df.columns else None
+    )
+    stats["spd_limit"] = (
+        df["gen_spd"].quantile(0.999) if "gen_spd" in df.columns else None
+    )
+
+    partial_mask = (
+        (df["p_active"] > stats["p_max_observed"] * 0.2) &
+        (df["p_active"] < stats["p_max_observed"] * 0.6)
+    )
+    if partial_mask.any() and "pitch_ang_act_1" in df.columns:
+        stats["baseline_pitch"] = df.loc[partial_mask, "pitch_ang_act_1"].median()
+    else:
+        stats["baseline_pitch"] = 0.0
+
+    print(
+        f"  [自适应统计] 额定: {stats['p_max_observed']:.1f}kW"
+        f" | 基准桨距: {stats['baseline_pitch']:.2f}°"
+    )
+    return stats
+
+
+def label_dataframe(df_input: pd.DataFrame, stats: dict, model_name: str) -> pd.DataFrame:
+    """
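+    Label a DataFrame and return a copy with these new columns:
+      - d_val_*  / d_logic_*  : boolean sensor-anomaly flags
+      - sensor_anomaly_tags   : comma-joined sensor-anomaly labels (empty string when none)
+      - status                : 运行 (running) / 限功率 (curtailed) / 停机 (shutdown) / 传感器异常-xxx异常 (sensor anomaly)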
+    """
+    df = df_input.copy()
+    P_MAX      = stats["p_max_observed"]
+    PITCH_BASE = stats["baseline_pitch"]
+
+    # ── 1. Sensor-anomaly columns ─────────────────────────────────────────────
+    df["d_val_power"] = (
+        (df["p_active"] > P_MAX * STATUS_POWER_UPPER_RATIO) |
+        (df["p_active"] < P_MAX * STATUS_POWER_LOWER_RATIO)
+    )
+
+    if "wind_spd" in df.columns:
+        df["d_val_wind"] = (df["wind_spd"] > STATUS_WIND_MAX) | (df["wind_spd"] < STATUS_WIND_MIN)
+    else:
+        df["d_val_wind"] = False
+
+    pitch_cols = [c for c in df.columns if "pitch_ang_act" in c]
+    df["d_val_pitch"] = False
+    for col in pitch_cols:
+        df["d_val_pitch"] |= (df[col] > STATUS_PITCH_MAX) | (df[col] < STATUS_PITCH_MIN)
+
+    df["d_val_spd"] = False
+    if stats["spd_limit"] and "gen_spd" in df.columns:
+        df["d_val_spd"] = (
+            (df["gen_spd"] > stats["spd_limit"] * STATUS_SPD_UPPER_RATIO) |
+            (df["gen_spd"] < -200)
+        )
+
+    df["d_val_torque"] = False
+    if stats["torque_limit"] and "actual_torque" in df.columns:
+        df["d_val_torque"] = (
+            (df["actual_torque"] > stats["torque_limit"] * STATUS_TORQUE_UPPER_RATIO) |
+            (df["actual_torque"] < STATUS_TORQUE_LOWER_ABS)
+        )
+
+    # ── 2. Logic-contradiction columns ────────────────────────────────────────
+    if "wind_spd" in df.columns:
+        df["d_logic_wind_pwr"] = (
+            (df["wind_spd"] < STATUS_LOGIC_WIND_MIN) &
+            (df["p_active"] > STATUS_LOGIC_POWER_MIN)
+        )
+        for wind_thresh, pwr_thresh in STATUS_MODEL_SPECIAL_RULES.get(model_name, []):
+            df["d_logic_wind_pwr"] |= (
+                (df["wind_spd"] < wind_thresh) & (df["p_active"] > pwr_thresh)
+            )
+    else:
+        df["d_logic_wind_pwr"] = False
+
+    df["d_logic_torque_pwr"] = False
+    if stats["torque_limit"] and "actual_torque" in df.columns:
+        df["d_logic_torque_pwr"] = (
+            (df["p_active"] > P_MAX * 0.1) &
+            (df["actual_torque"].abs() < stats["torque_limit"] * 0.01)
+        )
+
+    # ── 3. Collect sensor-anomaly tags ────────────────────────────────────────
+    all_sensor_cols = list(SENSOR_COL_LABEL.keys())
+    existing = [c for c in all_sensor_cols if c in df.columns]
+
+    # Build the label matrix column by column with np.where, then join per row (avoids apply(axis=1))
+    if existing:
+        tag_matrix = np.where(
+            df[existing].values,
+            np.array([SENSOR_COL_LABEL[c] for c in existing]),
+            "",
+        )
+        df["sensor_anomaly_tags"] = pd.Series(
+            [",".join(t for t in row if t) for row in tag_matrix],
+            index=df.index,
+        )
+    else:
+        df["sensor_anomaly_tags"] = ""
+
+    # ── 4. Business status labeling ───────────────────────────────────────────
+    any_sensor_anomaly = df[existing].any(axis=1)
+
+    df["status"] = "运行"
+
+    shutdown_thresh = max(10.0, P_MAX * STATUS_SHUTDOWN_RATIO)
+    mask_shutdown = (~df["d_val_power"]) & (df["p_active"] <= shutdown_thresh)
+    df.loc[mask_shutdown, "status"] = "停机"
+
+    if "pitch_ang_act_1" in df.columns:
+        mask_curtail = (
+            (df["status"] != "停机") &
+            (~df["d_val_power"]) &
+            (~df["d_val_pitch"]) &
+            (df["p_active"] > P_MAX * STATUS_CURTAIL_LOW_RATIO) &
+            (df["p_active"] < P_MAX * STATUS_CURTAIL_HIGH_RATIO) &
+            (df["pitch_ang_act_1"] > (PITCH_BASE + STATUS_CURTAIL_PITCH_OFFSET))
+        )
+        df.loc[mask_curtail, "status"] = "限功率"
+
+    # Sensor anomaly overrides every other status (highest priority)
+    mask_sensor = any_sensor_anomaly
+    df.loc[mask_sensor, "status"] = (
+        "传感器异常-" + df.loc[mask_sensor, "sensor_anomaly_tags"]
+    )
+
+    return df
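
Note on `label_dataframe()` above: the status priority (sensor anomaly > shutdown > curtailed > running) may be easier to follow for a single row. The sketch below restates that rule in plain Python. The threshold defaults are hypothetical placeholders, since the real `STATUS_*` values live in `config.py` and are not shown in this part of the diff; the function name `classify_status` is also an assumption.

```python
from typing import List


def classify_status(
    p_active: float,
    pitch: float,
    sensor_tags: List[str],
    p_max: float,
    pitch_base: float,
    shutdown_ratio: float = 0.01,   # hypothetical stand-in for STATUS_SHUTDOWN_RATIO
    curtail_low: float = 0.2,       # hypothetical stand-in for STATUS_CURTAIL_LOW_RATIO
    curtail_high: float = 0.9,      # hypothetical stand-in for STATUS_CURTAIL_HIGH_RATIO
    pitch_offset: float = 2.0,      # hypothetical stand-in for STATUS_CURTAIL_PITCH_OFFSET
) -> str:
    """Return the status string for one row, highest priority first."""
    # 1. Any sensor-anomaly tag wins over every other status.
    if sensor_tags:
        return "传感器异常-" + ",".join(sensor_tags)
    # 2. Shutdown: power at or below max(10 kW, p_max * ratio).
    if p_active <= max(10.0, p_max * shutdown_ratio):
        return "停机"
    # 3. Curtailed: partial power while pitch sits well above the baseline.
    in_band = p_max * curtail_low < p_active < p_max * curtail_high
    if in_band and pitch > pitch_base + pitch_offset:
        return "限功率"
    # 4. Everything else counts as normal running.
    return "运行"
```

Because a row without sensor tags has `d_val_power` and `d_val_pitch` both false, the extra `~d_val_*` guards in the vectorized code reduce to the conditions shown here.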

+ 0 - 0
models/__init__.py


BIN
models/__pycache__/__init__.cpython-39.pyc


BIN
models/__pycache__/control_params.cpython-39.pyc


BIN
models/__pycache__/pitch.cpython-39.pyc


BIN
models/__pycache__/wind_power.cpython-39.pyc


BIN
models/__pycache__/yaw.cpython-39.pyc


+ 192 - 0
models/control_params.py

@@ -0,0 +1,192 @@
+"""
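+Module 4: operating-parameter anomaly detection
+
+Algorithm: IsolationForest
+  - More stable than OneClassSVM on high-dimensional sparse features, and faster to train
+  - Both detectors have fairly high feature dimensionality (many electrical measurements), so IF fits better
+
+Sub-detectors:
+  A. PowerQualityDetector    - power-quality anomalies
+     Measurements: p_active, theory_p_active, p_reactive, grid_freq,
+                   grid_ia/ib/ic, grid_ua/ub/uc
+     Detects: three-phase imbalance, low power factor, frequency deviation, theoretical vs. actual power deviation
+
+  B. OperationStateDetector  - operating-state anomalies
+     Measurements: p_active, gen_spd, pitch_ang_act_1/2/3, twist_ang
+     Detects: overall deviation of the speed / power / pitch / cable-twist state from the normal pattern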
+"""
+import pandas as pd
+import numpy as np
+from sklearn.ensemble import IsolationForest
+from sklearn.preprocessing import StandardScaler
+import joblib
+from pathlib import Path
+
+from config import (
+    ISO_CONTAMINATION, ISO_RANDOM_STATE, ISO_N_ESTIMATORS,
+    COL_P_ACTIVE, COL_ROTOR_SPD,
+    COL_PITCH_ACT_1, COL_PITCH_ACT_2, COL_PITCH_ACT_3,
+    COL_TWIST_ANG,
+)
+
+_GRID_CURR = ["grid_ia", "grid_ib", "grid_ic"]
+_GRID_VOLT = ["grid_ua", "grid_ub", "grid_uc"]
+
+
+# ── A. Power-quality detector ──────────────────────────────────────────────────
+
+class PowerQualityDetector:
+    """
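+    Feature engineering:
+      - theoretical vs. actual power deviation ratio (p_diff_ratio)
+      - approximate power factor (p_active / sqrt(p_active^2 + p_reactive^2))
+      - three-phase current imbalance (std/mean)
+      - three-phase voltage imbalance (std/mean)
+      - grid frequency deviation (grid_freq - 50)
+    Every measurement is optional: present ones become features, missing ones are skipped.
+    fit() raises unless at least two of these features can be built.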
+    """
+    def __init__(self, contamination: float = ISO_CONTAMINATION):
+        self.scaler = StandardScaler()
+        self.model  = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        feat = {}
+
+        # Theoretical vs. actual power deviation ratio
+        if "theory_p_active" in df.columns and COL_P_ACTIVE in df.columns:
+            denom = df["theory_p_active"].replace(0, np.nan)
+            feat["p_diff_ratio"] = (df[COL_P_ACTIVE] - df["theory_p_active"]) / denom
+
+        # Approximate power factor (needs active and reactive power)
+        if COL_P_ACTIVE in df.columns and "p_reactive" in df.columns:
+            apparent = np.sqrt(df[COL_P_ACTIVE] ** 2 + df["p_reactive"] ** 2)
+            feat["power_factor"] = df[COL_P_ACTIVE] / apparent.replace(0, np.nan)
+
+        # Three-phase current imbalance
+        curr_cols = [c for c in _GRID_CURR if c in df.columns]
+        if len(curr_cols) >= 2:
+            curr = df[curr_cols]
+            mean_c = curr.mean(axis=1).replace(0, np.nan)
+            feat["curr_imbalance"] = curr.std(axis=1) / mean_c
+
+        # Three-phase voltage imbalance
+        volt_cols = [c for c in _GRID_VOLT if c in df.columns]
+        if len(volt_cols) >= 2:
+            volt = df[volt_cols]
+            mean_v = volt.mean(axis=1).replace(0, np.nan)
+            feat["volt_imbalance"] = volt.std(axis=1) / mean_v
+
+        # Frequency deviation from the 50 Hz nominal
+        if "grid_freq" in df.columns:
+            feat["freq_dev"] = df["grid_freq"] - 50.0
+
+        if not feat:
+            return pd.DataFrame()
+
+        return pd.DataFrame(feat, index=df.index).replace([np.inf, -np.inf], np.nan).dropna()
+
+    def fit(self, df: pd.DataFrame) -> "PowerQualityDetector":
+        feat = self._features(df)
+        if feat.empty or len(feat.columns) < 2:
+            raise ValueError("功率质量特征不足(至少需要 p_active + 一个辅助测点)")
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "PowerQualityDetector":
+        return joblib.load(path)
+
+
+# ── B. Operating-state detector ────────────────────────────────────────────────
+
+class OperationStateDetector:
+    """
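+    Feature engineering:
+      - p_active (active power)
+      - gen_spd (generator speed)
+      - mean and spread (std) of the three blades' actual pitch angles
+      - twist_ang (cable twist angle)
+      - power / speed ratio (a proxy for torque)
+      - mean pitch angle × speed (coordination feature)
+    p_active is required; every other measurement is optional.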
+    """
+    def __init__(self, contamination: float = ISO_CONTAMINATION):
+        self.scaler = StandardScaler()
+        self.model  = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        if COL_P_ACTIVE not in df.columns:
+            return pd.DataFrame()
+
+        feat = {COL_P_ACTIVE: df[COL_P_ACTIVE]}
+
+        if COL_ROTOR_SPD in df.columns:
+            feat[COL_ROTOR_SPD] = df[COL_ROTOR_SPD]
+            spd_safe = df[COL_ROTOR_SPD].replace(0, np.nan)
+            feat["p_per_spd"] = df[COL_P_ACTIVE] / spd_safe
+
+        # Three-blade features
+        act_cols = [c for c in [COL_PITCH_ACT_1, COL_PITCH_ACT_2, COL_PITCH_ACT_3]
+                    if c in df.columns]
+        if act_cols:
+            pitch_df = df[act_cols]
+            feat["pitch_mean"] = pitch_df.mean(axis=1)
+            if len(act_cols) >= 2:
+                feat["pitch_std"] = pitch_df.std(axis=1)
+            if COL_ROTOR_SPD in df.columns:
+                feat["pitch_x_spd"] = feat["pitch_mean"] * df[COL_ROTOR_SPD]
+
+        if COL_TWIST_ANG in df.columns:
+            feat[COL_TWIST_ANG] = df[COL_TWIST_ANG]
+
+        result = pd.DataFrame(feat, index=df.index)
+        return result.replace([np.inf, -np.inf], np.nan).dropna()
+
+    def fit(self, df: pd.DataFrame) -> "OperationStateDetector":
+        feat = self._features(df)
+        if feat.empty or len(feat.columns) < 2:
+            raise ValueError("运行状态特征不足(至少需要 p_active + 一个辅助测点)")
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "OperationStateDetector":
+        return joblib.load(path)
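
Note on `PowerQualityDetector._features()` above: two of the derived features are simple enough to check by hand. The sketch below computes the phase imbalance (sample standard deviation over mean, matching the pandas default `ddof=1`) and the approximate power factor for a single reading, using only the standard library. The function names are assumptions for illustration; returning `None` for a zero denominator stands in for the `replace(0, np.nan)` guard in the original.

```python
import math
import statistics
from typing import Optional, Sequence


def phase_imbalance(values: Sequence[float]) -> Optional[float]:
    """Sample std divided by mean across phases; None when the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return None  # the DataFrame version yields NaN here and drops the row
    return statistics.stdev(values) / mean


def power_factor(p_active: float, p_reactive: float) -> Optional[float]:
    """p / sqrt(p^2 + q^2); None when apparent power is zero."""
    apparent = math.hypot(p_active, p_reactive)
    if apparent == 0:
        return None
    return p_active / apparent
```

A perfectly balanced reading gives an imbalance of 0, and the power factor is signed: it turns negative when `p_active` is negative, just as in the vectorized code.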

+ 232 - 0
models/pitch.py

@@ -0,0 +1,232 @@
+"""
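+Module 3: pitch-system anomaly detection
+
+Algorithm: LocalOutlierFactor (LOF)
+  - Local-density based, suited to local anomalies such as blade-to-blade inconsistency
+  - novelty=True separates fit from predict (fit on training data, predict on new data at inference)
+  - n_neighbors is adaptive: max(5, min(n_neighbors, len(train) // 50)), to avoid degenerate behaviour on small samples
+
+Sub-detectors:
+  A. PitchRegulationDetector - pitch regulation anomalies (setpoint vs. actual, 3 blades)
+  B. PitchCoordDetector      - pitch / speed / power coordination anomalies
+  C. MinPitchDetector        - minimum pitch angle anomalies (keeps IsolationForest, which suits distribution-level anomalies)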
+"""
+import pandas as pd
+import numpy as np
+from sklearn.neighbors import LocalOutlierFactor
+from sklearn.ensemble import IsolationForest
+from sklearn.preprocessing import StandardScaler
+import joblib
+from pathlib import Path
+
+from config import (
+    COL_PITCH_SET_1, COL_PITCH_SET_2, COL_PITCH_SET_3,
+    COL_PITCH_ACT_1, COL_PITCH_ACT_2, COL_PITCH_ACT_3,
+    COL_PITCH_SPD_1, COL_PITCH_SPD_2, COL_PITCH_SPD_3,
+    COL_ROTOR_SPD, COL_P_ACTIVE,
+    ISO_CONTAMINATION, ISO_RANDOM_STATE, ISO_N_ESTIMATORS,
+)
+
+PITCH_SET_COLS = [COL_PITCH_SET_1, COL_PITCH_SET_2, COL_PITCH_SET_3]
+PITCH_ACT_COLS = [COL_PITCH_ACT_1, COL_PITCH_ACT_2, COL_PITCH_ACT_3]
+PITCH_SPD_COLS = [COL_PITCH_SPD_1, COL_PITCH_SPD_2, COL_PITCH_SPD_3]
+
+
+# ── A. Pitch regulation detector (LOF) ─────────────────────────────────────────
+
+class PitchRegulationDetector:
+    """
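+    Features:
+      - per-blade (setpoint - actual) error
+      - spread (std) of the three blades' actual angles
+      - mean and spread of the three blades' pitch speeds (if pitch_spd_1/2/3 exist)
+    LOF detects local-density anomalies, which suits out-of-sync blades.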
+    """
+    def __init__(self, n_neighbors: int = 20, contamination: float = ISO_CONTAMINATION):
+        self.n_neighbors   = n_neighbors
+        self.contamination = contamination
+        self.scaler        = StandardScaler()
+        self.model         = LocalOutlierFactor(
+            n_neighbors=n_neighbors,
+            contamination=contamination,
+            novelty=True,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        feat = {}
+        for i, (s, a) in enumerate(zip(PITCH_SET_COLS, PITCH_ACT_COLS), 1):
+            if s in df.columns and a in df.columns:
+                feat[f"err_{i}"] = df[s] - df[a]
+            elif a in df.columns:
+                # Setpoint missing: this all-NaN column makes the dropna() below discard every row
+                feat[f"err_{i}"] = pd.Series(np.nan, index=df.index)
+        act_cols = [c for c in PITCH_ACT_COLS if c in df.columns]
+        if len(act_cols) >= 2:
+            feat["act_std"] = df[act_cols].std(axis=1)
+        # Pitch-speed features (optional)
+        spd_cols = [c for c in PITCH_SPD_COLS if c in df.columns]
+        if len(spd_cols) >= 2:
+            spd_df = df[spd_cols]
+            feat["spd_mean"] = spd_df.mean(axis=1)
+            feat["spd_std"]  = spd_df.std(axis=1)
+        return pd.DataFrame(feat, index=df.index).dropna()
+
+    def fit(self, df: pd.DataFrame) -> "PitchRegulationDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("变桨调节特征为空,检查测点是否存在")
+        # Adaptive n_neighbors: keep k from being too large for a small sample, which degrades LOF
+        adaptive_k = max(5, min(self.n_neighbors, len(feat) // 50))
+        if adaptive_k != self.n_neighbors:
+            self.model = LocalOutlierFactor(
+                n_neighbors=adaptive_k,
+                contamination=self.contamination,
+                novelty=True,
+            )
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "PitchRegulationDetector":
+        return joblib.load(path)
+
+
+# ── B. Pitch / speed / power coordination detector (LOF) ───────────────────────
+
+class PitchCoordDetector:
+    """
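+    Features: rotor_spd, p_active, pitch angle, and derived ratios.
+    When at least two blade angles exist, their mean and spread (std) replace
+    the single-blade pitch_ang_act_1 to capture coordination across blades.
+    LOF detects local deviations in how the three quantities relate.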
+    """
+    REQUIRED = [COL_PITCH_ACT_1, COL_ROTOR_SPD, COL_P_ACTIVE]
+
+    def __init__(self, n_neighbors: int = 20, contamination: float = ISO_CONTAMINATION):
+        self.n_neighbors   = n_neighbors
+        self.contamination = contamination
+        self.scaler        = StandardScaler()
+        self.model         = LocalOutlierFactor(
+            n_neighbors=n_neighbors,
+            contamination=contamination,
+            novelty=True,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        if COL_ROTOR_SPD not in df.columns or COL_P_ACTIVE not in df.columns:
+            return pd.DataFrame()
+        d = df[[COL_ROTOR_SPD, COL_P_ACTIVE]].copy()
+        # At low rotor speed p_per_spd amplifies noise heavily, so those rows are filtered out
+        d = d[d[COL_ROTOR_SPD] > 5.0]
+        # Blade-consistency features (prefer all available blades)
+        act_cols = [c for c in PITCH_ACT_COLS if c in df.columns]
+        if len(act_cols) >= 2:
+            pitch_df = df[act_cols].loc[d.index]
+            d["pitch_mean"] = pitch_df.mean(axis=1)
+            d["pitch_std"]  = pitch_df.std(axis=1)
+        elif COL_PITCH_ACT_1 in df.columns:
+            d["pitch_mean"] = df[COL_PITCH_ACT_1].loc[d.index]
+        else:
+            return pd.DataFrame()
+        d = d.dropna()
+        d["p_per_spd"]   = d[COL_P_ACTIVE] / d[COL_ROTOR_SPD]
+        d["pitch_x_spd"] = d["pitch_mean"] * d[COL_ROTOR_SPD]
+        return d.dropna()
+
+    def fit(self, df: pd.DataFrame) -> "PitchCoordDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("变桨协调特征为空,检查测点是否存在")
+        # Adaptive n_neighbors
+        adaptive_k = max(5, min(self.n_neighbors, len(feat) // 50))
+        if adaptive_k != self.n_neighbors:
+            self.model = LocalOutlierFactor(
+                n_neighbors=adaptive_k,
+                contamination=self.contamination,
+                novelty=True,
+            )
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "PitchCoordDetector":
+        return joblib.load(path)
+
+
+# ── C. Minimum pitch angle detector (IsolationForest) ──────────────────────────
+
+class MinPitchDetector:
+    """
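+    Features: min, mean and range of the three blades' actual angles.
+    Keeps IsolationForest: minimum pitch angle is a global distribution anomaly, which IF suits better.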
+    """
+    def __init__(self, contamination: float = ISO_CONTAMINATION):
+        self.scaler = StandardScaler()
+        self.model  = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        act_cols = [c for c in PITCH_ACT_COLS if c in df.columns]
+        if not act_cols:
+            return pd.DataFrame()
+        d = df[act_cols].dropna()
+        return pd.DataFrame({
+            "min_pitch":   d.min(axis=1),
+            "mean_pitch":  d.mean(axis=1),
+            "range_pitch": d.max(axis=1) - d.min(axis=1),
+        }, index=d.index)
+
+    def fit(self, df: pd.DataFrame) -> "MinPitchDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("最小桨距角特征为空,检查测点是否存在")
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "MinPitchDetector":
+        return joblib.load(path)
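
Note on the adaptive `n_neighbors` in `PitchRegulationDetector.fit()` and `PitchCoordDetector.fit()` above: the rule is a clamp, `max(5, min(n_neighbors, n_samples // 50))`. The standalone helper below only illustrates how k scales with the training size; the function name `adaptive_k` is an assumption and does not exist in the commit.

```python
def adaptive_k(n_samples: int, n_neighbors: int = 20) -> int:
    """Clamp the LOF neighbour count: about one neighbour per 50 samples,
    never below 5 and never above the configured n_neighbors."""
    return max(5, min(n_neighbors, n_samples // 50))


if __name__ == "__main__":
    # Print k for a few training sizes to show where the clamp takes effect.
    for n in (100, 500, 1000, 5000):
        print(n, adaptive_k(n))
```

Under this rule any training set of 1000 rows or more uses the configured default of 20, and anything under 300 rows falls back to the floor of 5.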

+ 178 - 0
models/wind_power.py

@@ -0,0 +1,178 @@
+"""
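+Module 1: wind speed / power anomaly detection
+
+Improvements:
+  - PowerCurveDetector now outputs per point: training stores each bin's normal range (mean ± σ),
+    and predict maps every point to its bin and uses the z-score as the per-point anomaly score
+  - ScatterDetector adds an ambient-temperature feature (if the ambient_temp column exists),
+    because air density affects power output
+  - Training and prediction are restricted to the valid generating range (WIND_VALID_MIN ~ WIND_VALID_MAX)
+  - Bins are filtered by sample count; bins with too few samples are excluded from training
+
+Sub-detectors:
+  A. PowerCurveDetector  - power-curve anomalies from binned statistics (per-point z-score)
+  B. ScatterDetector     - wind speed / power anomalies on the raw scatter (IsolationForest)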
+"""
+import numpy as np
+import pandas as pd
+from sklearn.ensemble import IsolationForest
+from sklearn.preprocessing import StandardScaler
+import joblib
+from pathlib import Path
+
+from config import (
+    COL_WIND_SPD, COL_P_ACTIVE, COL_AMBIENT_TEMP,
+    WIND_BIN_WIDTH, WIND_BIN_MIN, WIND_BIN_MAX,
+    ISO_CONTAMINATION, ISO_RANDOM_STATE, ISO_N_ESTIMATORS,
+    WIND_VALID_MIN, WIND_VALID_MAX,
+)
+
+# Minimum samples per bin; bins below this are excluded from training
+_MIN_BIN_COUNT = 30
+
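+# Power-deviation threshold (in standard deviations), by wind-speed band
+# Low wind fluctuates more, so the threshold is looser; high wind sits near rated power, so it is tighter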
+_SIGMA_LOW  = 4.0   # < 8 m/s
+_SIGMA_MID  = 3.0   # 8~16 m/s
+_SIGMA_HIGH = 2.5   # > 16 m/s
+
+
+def _get_sigma(wind_spd: float) -> float:
+    if wind_spd < 8.0:
+        return _SIGMA_LOW
+    elif wind_spd <= 16.0:
+        return _SIGMA_MID
+    else:
+        return _SIGMA_HIGH
+
+
+def _bin_wind_speed(series: pd.Series) -> pd.Series:
+    bins = np.arange(WIND_BIN_MIN, WIND_BIN_MAX + WIND_BIN_WIDTH, WIND_BIN_WIDTH)
+    return pd.cut(series, bins=bins, labels=False)
+
+
+def _filter_valid_wind(df: pd.DataFrame) -> pd.DataFrame:
+    """Keep only the valid generating wind-speed range, dropping noise near cut-in and cut-out."""
+    mask = (df[COL_WIND_SPD] >= WIND_VALID_MIN) & (df[COL_WIND_SPD] <= WIND_VALID_MAX)
+    return df[mask]
+
+
+# ── A. Power-curve detector (per-point output) ─────────────────────────────────
+
+class PowerCurveDetector:
+    """
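+    Training: bin by wind speed and store each bin's (mean, std) as the normal range.
+    Prediction: map each point to its bin, compute the power deviation as a z-score,
+          and flag it against a wind-band-specific sigma (looser at low wind, tighter at high wind).
+          The output has the same length as the input.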
+    """
+    def __init__(self):
+        self.bin_stats: pd.DataFrame = pd.DataFrame()
+
+    def fit(self, df: pd.DataFrame) -> "PowerCurveDetector":
+        d = _filter_valid_wind(df[[COL_WIND_SPD, COL_P_ACTIVE]].copy())
+        d["wind_bin"] = _bin_wind_speed(d[COL_WIND_SPD])
+        stats = (
+            d.groupby("wind_bin")[COL_P_ACTIVE]
+            .agg(mean_power="mean", std_power="std", count="count")
+            .reset_index()
+            .dropna()
+        )
+        stats = stats[stats["count"] >= _MIN_BIN_COUNT]
+        # Fall back to the global std when a bin's std is 0
+        global_std = d[COL_P_ACTIVE].std()
+        stats["std_power"] = stats["std_power"].replace(0, global_std)
+        if len(stats) < 3:
+            raise ValueError("功率曲线有效分箱不足")
+        self.bin_stats = stats
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        # Keep the original index so the result has the same length as the input
+        out = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+
+        valid_mask = (
+            df[COL_WIND_SPD].notna() & df[COL_P_ACTIVE].notna() &
+            (df[COL_WIND_SPD] >= WIND_VALID_MIN) & (df[COL_WIND_SPD] <= WIND_VALID_MAX)
+        )
+        d = df.loc[valid_mask, [COL_WIND_SPD, COL_P_ACTIVE]].copy()
+        if d.empty:
+            return out
+
+        d["wind_bin"] = _bin_wind_speed(d[COL_WIND_SPD])
+        bin_map = self.bin_stats.set_index("wind_bin")[["mean_power", "std_power"]]
+        d["mean_power"] = d["wind_bin"].map(bin_map["mean_power"])
+        d["std_power"]  = d["wind_bin"].map(bin_map["std_power"])
+
+        has_stat = d["mean_power"].notna() & d["std_power"].notna()
+        d_stat = d.loc[has_stat].copy()
+        z = (d_stat[COL_P_ACTIVE] - d_stat["mean_power"]) / d_stat["std_power"]
+        sigma_arr = d_stat[COL_WIND_SPD].map(_get_sigma)
+        out.loc[d_stat.index, "score"]   = z.values
+        out.loc[d_stat.index, "anomaly"] = (z.abs() > sigma_arr).values
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "PowerCurveDetector":
+        return joblib.load(path)
+
+
+# ── B. Scatter detector ────────────────────────────────────────────────────────
+
+class ScatterDetector:
+    """
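+    Anomaly detection on (wind_spd, p_active) pairs within the valid generating range.
+    Adds a p_active / wind_spd^3 feature (an approximate Cp) to capture deviations in wind-energy capture.
+    Optional: if the data has an ambient_temp column, temperature is added as a feature (air density affects power).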
+    """
+    def __init__(self, contamination=ISO_CONTAMINATION):
+        self.contamination = contamination
+        self.scaler = StandardScaler()
+        self.model  = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+        self._has_temp = False
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        cols = [COL_WIND_SPD, COL_P_ACTIVE]
+        has_temp = COL_AMBIENT_TEMP in df.columns
+        if has_temp:
+            cols.append(COL_AMBIENT_TEMP)
+        d = _filter_valid_wind(df[cols].copy()).dropna()
+        wind3 = d[COL_WIND_SPD] ** 3
+        d["cp_proxy"] = d[COL_P_ACTIVE] / wind3.replace(0, np.nan)
+        if has_temp:
+            # Colder air is denser, so power should be higher at the same wind speed
+            d["temp"] = d[COL_AMBIENT_TEMP]
+        return d.dropna()
+
+    def fit(self, df: pd.DataFrame) -> "ScatterDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("散点特征为空,检查风速功率数据")
+        self._has_temp = COL_AMBIENT_TEMP in feat.columns
+        X = self.scaler.fit_transform(feat.select_dtypes(include=[np.number]))
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat.select_dtypes(include=[np.number]))
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "ScatterDetector":
+        return joblib.load(path)
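
Note on `PowerCurveDetector` above: the per-point score is a z-score against the statistics of the point's wind-speed bin. The sketch below shows the same idea with the standard library only, under two stated simplifications: bins are left-closed (`floor(wind / bin_width)`) rather than the right-closed intervals `pd.cut` produces, and zero-variance bins are skipped instead of falling back to the global std. The function names and parameters are assumptions for illustration.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

BinStats = Dict[int, Tuple[float, float]]  # bin index -> (mean power, std power)


def fit_bins(points: List[Tuple[float, float]], bin_width: float, min_count: int) -> BinStats:
    """Group (wind, power) points by wind bin and keep bins with enough samples."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for wind, power in points:
        grouped[math.floor(wind / bin_width)].append(power)
    stats: BinStats = {}
    for bin_idx, powers in grouped.items():
        if len(powers) < max(2, min_count):
            continue  # too few samples for a usable std
        std = statistics.stdev(powers)
        if std > 0:  # simplification: the original substitutes the global std here
            stats[bin_idx] = (statistics.mean(powers), std)
    return stats


def z_score(stats: BinStats, wind: float, power: float, bin_width: float) -> Optional[float]:
    """Return (power - bin mean) / bin std, or None when the bin was not trained."""
    entry = stats.get(math.floor(wind / bin_width))
    if entry is None:
        return None
    mean, std = entry
    return (power - mean) / std
```

A point is then flagged when `abs(z)` exceeds the sigma for its wind band, which is what `_get_sigma` supplies in the original module.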

+ 128 - 0
models/yaw.py

@@ -0,0 +1,128 @@
+"""
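+Module 2: yaw-system anomaly detection
+
+Algorithm: IsolationForest
+  - Replaces the earlier DBSCAN; supports separate fit/predict natively
+  - DBSCAN clusters are unstable on small daily inference batches; IF is more robust
+  - Provides an anomaly score, so results can be ranked by severity
+
+Sub-detectors:
+  A. StaticYawDetector   - static yaw angle (yaw_ang) anomaly detection
+  B. CableTwistDetector  - cable twist angle (twist_ang) anomaly detection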
+"""
+import numpy as np
+import pandas as pd
+from sklearn.ensemble import IsolationForest
+from sklearn.preprocessing import StandardScaler
+import joblib
+from pathlib import Path
+
+from config import (
+    COL_YAW_ANG, COL_TWIST_ANG,
+    ISO_CONTAMINATION, ISO_RANDOM_STATE, ISO_N_ESTIMATORS,
+)
+
+
+# ── A. Static yaw detector ─────────────────────────────────────────────────────
+
+class StaticYawDetector:
+    """
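+    Features: yaw_ang, short-window rolling mean/std (2 hours), long-window rolling mean (12 hours).
+    The short window captures momentary deviations; the long window captures slow sustained drift.
+    IsolationForest flags yaw angles that persistently deviate from the normal distribution.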
+    """
+    WINDOW_SHORT = 12   # 12 samples at 10-min sampling = 2 hours
+    WINDOW_LONG  = 72   # 72 samples at 10-min sampling = 12 hours
+
+    def __init__(self, contamination: float = ISO_CONTAMINATION):
+        self.scaler = StandardScaler()
+        self.model = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        s = df[COL_YAW_ANG].copy()
+        feat = pd.DataFrame({
+            "yaw_ang":        s,
+            "roll_mean_short": s.rolling(self.WINDOW_SHORT, min_periods=1).mean(),
+            "roll_std_short":  s.rolling(self.WINDOW_SHORT, min_periods=1).std().fillna(0),
+            "roll_mean_long":  s.rolling(self.WINDOW_LONG,  min_periods=1).mean(),
+        }, index=df.index)
+        return feat.dropna(subset=["yaw_ang"])
+
+    def fit(self, df: pd.DataFrame) -> "StaticYawDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("偏航特征为空,检查 yaw_ang 测点")
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "StaticYawDetector":
+        return joblib.load(path)
+
+
+# ── B. Cable-twist detector ────────────────────────────────────────────────────
+
+class CableTwistDetector:
+    """
+    Features: twist_ang, its absolute value, and its sample-to-sample change.
+    IsolationForest flags abnormal cable-twist deviations.
+    """
+    def __init__(self, contamination: float = ISO_CONTAMINATION):
+        self.scaler = StandardScaler()
+        self.model = IsolationForest(
+            n_estimators=ISO_N_ESTIMATORS,
+            contamination=contamination,
+            random_state=ISO_RANDOM_STATE,
+        )
+
+    def _features(self, df: pd.DataFrame) -> pd.DataFrame:
+        s = df[COL_TWIST_ANG].copy()
+        feat = pd.DataFrame({
+            "twist_ang": s,
+            "abs_twist": s.abs(),
+            "delta":     s.diff().fillna(0),
+        }, index=df.index)
+        return feat.dropna(subset=["twist_ang"])
+
+    def fit(self, df: pd.DataFrame) -> "CableTwistDetector":
+        feat = self._features(df)
+        if feat.empty:
+            raise ValueError("Cable twist features are empty; check the twist_ang signal")
+        X = self.scaler.fit_transform(feat)
+        self.model.fit(X)
+        return self
+
+    def predict(self, df: pd.DataFrame) -> pd.DataFrame:
+        out  = pd.DataFrame({"anomaly": False, "score": np.nan}, index=df.index)
+        feat = self._features(df)
+        if feat.empty:
+            return out
+        X = self.scaler.transform(feat)
+        out.loc[feat.index, "anomaly"] = self.model.predict(X) == -1
+        out.loc[feat.index, "score"]   = self.model.score_samples(X)
+        return out
+
+    def save(self, path: Path):
+        joblib.dump(self, path)
+
+    @classmethod
+    def load(cls, path: Path) -> "CableTwistDetector":
+        return joblib.load(path)
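
For illustration, the feature construction and scoring flow of `CableTwistDetector` can be exercised on synthetic data. This is a minimal sketch, not the module itself: the hyperparameter values (100 trees, seed 42) and the synthetic angles are assumptions standing in for `ISO_N_ESTIMATORS`, `ISO_RANDOM_STATE`, and real SCADA data, and `twist_features` is a hypothetical stand-alone copy of `_features`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import StandardScaler

COL_TWIST_ANG = "twist_ang"  # assumed column name; the real one is defined in config.py


def twist_features(df: pd.DataFrame) -> pd.DataFrame:
    """Build the three features used by the detector: value, absolute value, first difference."""
    s = df[COL_TWIST_ANG]
    feat = pd.DataFrame(
        {"twist_ang": s, "abs_twist": s.abs(), "delta": s.diff().fillna(0)},
        index=df.index,
    )
    # Rows without a twist angle cannot be scored and are dropped
    return feat.dropna(subset=["twist_ang"])


# Synthetic training data (assumption): twist angles scattered around 0 degrees
rng = np.random.default_rng(0)
train = pd.DataFrame({COL_TWIST_ANG: rng.normal(0.0, 30.0, size=2000)})

scaler = StandardScaler()
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=42)
model.fit(scaler.fit_transform(twist_features(train)))

# One ordinary point, one extreme point, one missing value
test = pd.DataFrame({COL_TWIST_ANG: [5.0, 900.0, np.nan]})

# Same output convention as predict(): defaults first, then fill the scorable rows
out = pd.DataFrame({"anomaly": False, "score": np.nan}, index=test.index)
feat = twist_features(test)
X = scaler.transform(feat)
out.loc[feat.index, "anomaly"] = model.predict(X) == -1
out.loc[feat.index, "score"] = model.score_samples(X)
print(out)
```

Rows with a missing `twist_ang` keep the defaults (`anomaly=False`, `score=NaN`), and lower `score_samples` values indicate more abnormal points.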

+ 223 - 0
train.py

@@ -0,0 +1,223 @@
+"""
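+Training entry point: iterate over the data by turbine model type, train all anomaly detection models, and save them.
+
+Training data selection rules (symmetric with the detection logic):
+  - '运行' (running) rows: all used for training
+  - '传感器异常' (sensor anomaly) rows: used when none of the signals this detector relies on is flagged
+    Example: actual_torque is flagged but wind_spd/p_active are normal → usable for training the wind speed-power models
+  - '停机' (shutdown) / '限功率' (power-limited) rows: excluded from training
+
+Usage:
+    python train.py                       # train all model types
+    python train.py --model MODEL_NAME    # train only the given model type
+    python train.py --list                # list available model types and exit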
+"""
+import argparse
+from pathlib import Path
+
+import joblib
+import pandas as pd
+
+from config import MODEL_SAVE_DIR
+from data_loader import list_model_types, load_model_type
+from labeler import get_model_statistics, label_dataframe, DETECTOR_SENSOR_COLS
+from models.wind_power import PowerCurveDetector, ScatterDetector
+from models.yaw import StaticYawDetector, CableTwistDetector
+from models.pitch import PitchRegulationDetector, PitchCoordDetector, MinPitchDetector
+from models.control_params import PowerQualityDetector, OperationStateDetector
+
+
+def get_model_dir(model_name: str) -> Path:
+    d = MODEL_SAVE_DIR / model_name
+    d.mkdir(parents=True, exist_ok=True)
+    return d
+
+
+def _filter_for_detector(labeled: pd.DataFrame, detector_name: str) -> pd.DataFrame:
+    """
+    Return the rows usable for training this detector:
+      - status == '运行' (running)
+      - status starts with '传感器异常' (sensor anomaly) and all sensor-anomaly
+        flag columns this detector relies on are False
+    """
+    sensor_cols = DETECTOR_SENSOR_COLS.get(detector_name, [])
+    existing_sc = [c for c in sensor_cols if c in labeled.columns]
+
+    # Rows labeled as running
+    df_run = labeled[labeled["status"] == "运行"]
+
+    # Sensor-anomaly rows; na=False treats a missing status as a non-match
+    df_sensor = labeled[labeled["status"].str.startswith("传感器异常", na=False)]
+    if existing_sc and not df_sensor.empty:
+        # Keep only rows where none of this detector's signals is flagged
+        no_anom = ~df_sensor[existing_sc].any(axis=1)
+        df_sensor = df_sensor[no_anom]
+    # With no associated sensor columns (e.g. yaw), all sensor-anomaly rows are kept
+
+    # Note: sensor-anomaly rows are appended after the running rows, so the
+    # result does not preserve the original row order
+    result = pd.concat([df_run, df_sensor], ignore_index=True)
+    print(f"  [training data] {detector_name}: running {len(df_run)} + usable sensor-anomaly {len(df_sensor)}"
+          f" = {len(result)} rows")
+    return result
+
+
+def train_wind_power(labeled: pd.DataFrame, model_dir: Path):
+    for cls, fname, det_name, label in [
+        (PowerCurveDetector, "wind_power_curve.pkl",   "wind_power_curve",   "power curve"),
+        (ScatterDetector,    "wind_power_scatter.pkl", "wind_power_scatter", "scatter"),
+    ]:
+        df = _filter_for_detector(labeled, det_name)
+        if df.empty:
+            print(f"  [wind power] {label}: skipped (no usable data)")
+            continue
+        try:
+            cls().fit(df).save(model_dir / fname)
+            print(f"  [wind power] {label}: model saved")
+        except Exception as e:
+            print(f"  [wind power] {label}: training failed: {e}")
+
+
+def train_yaw(labeled: pd.DataFrame, model_dir: Path):
+    configs = [
+        ("yaw_ang",   "yaw_static.pkl", "yaw_static", "static yaw",  StaticYawDetector),
+        ("twist_ang", "yaw_twist.pkl",  "yaw_twist",  "cable twist", CableTwistDetector),
+    ]
+    for req_col, fname, det_name, label, cls in configs:
+        if req_col not in labeled.columns:
+            print(f"  [yaw] {label}: skipped (missing column {req_col})")
+            continue
+        df = _filter_for_detector(labeled, det_name)
+        if df.empty:
+            print(f"  [yaw] {label}: skipped (no usable data)")
+            continue
+        try:
+            cls().fit(df).save(model_dir / fname)
+            print(f"  [yaw] {label}: model saved")
+        except Exception as e:
+            print(f"  [yaw] {label}: training failed: {e}")
+
+
+def train_pitch(labeled: pd.DataFrame, model_dir: Path):
+    # A. Regulation anomaly & C. minimum pitch angle
+    for cls, fname, det_name, label in [
+        (PitchRegulationDetector, "pitch_regulation.pkl", "pitch_regulation", "regulation"),
+        (MinPitchDetector,        "pitch_min.pkl",        "pitch_min",        "minimum pitch angle"),
+    ]:
+        if "pitch_ang_act_1" not in labeled.columns:
+            print(f"  [pitch] {label}: skipped (missing pitch_ang_act_1)")
+            continue
+        df = _filter_for_detector(labeled, det_name)
+        if df.empty:
+            print(f"  [pitch] {label}: skipped (no usable data)")
+            continue
+        try:
+            cls().fit(df).save(model_dir / fname)
+            print(f"  [pitch] {label}: model saved")
+        except Exception as e:
+            print(f"  [pitch] {label}: training failed: {e}")
+
+    # B. Coordination anomaly
+    required = ["pitch_ang_act_1", "rotor_spd", "p_active"]
+    if not all(c in labeled.columns for c in required):
+        print("  [pitch] coordination: skipped (missing required columns)")
+        return
+    df2 = _filter_for_detector(labeled, "pitch_coord")
+    if df2.empty:
+        print("  [pitch] coordination: skipped (no usable data)")
+        return
+    try:
+        PitchCoordDetector().fit(df2).save(model_dir / "pitch_coord.pkl")
+        print("  [pitch] coordination: model saved")
+    except Exception as e:
+        print(f"  [pitch] coordination: training failed: {e}")
+
+
+def train_control_params(labeled: pd.DataFrame, model_dir: Path):
+    # A. Power quality detector
+    df = _filter_for_detector(labeled, "ctrl_power_quality")
+    if not df.empty:
+        try:
+            PowerQualityDetector().fit(df).save(model_dir / "ctrl_power_quality.pkl")
+            print("  [operating state] power quality: model saved")
+        except Exception as e:
+            print(f"  [operating state] power quality: training failed: {e}")
+    else:
+        print("  [operating state] power quality: skipped (no usable data)")
+
+    # B. Combined operating state detector
+    df2 = _filter_for_detector(labeled, "ctrl_op_state")
+    if not df2.empty:
+        try:
+            OperationStateDetector().fit(df2).save(model_dir / "ctrl_op_state.pkl")
+            print("  [operating state] combined operating state: model saved")
+        except Exception as e:
+            print(f"  [operating state] combined operating state: training failed: {e}")
+    else:
+        print("  [operating state] combined operating state: skipped (no usable data)")
+
+
+def train_one(model_name: str):
+    print(f"\n{'='*50}")
+    print(f"Start training model type: {model_name}")
+    model_dir = get_model_dir(model_name)
+
+    # ── Load the full data set once (superset of all columns) ──
+    print("  [data] loading full data set...")
+    _optional = [
+        "wind_spd", "gen_spd", "actual_torque",
+        "pitch_ang_set_1", "pitch_ang_set_2", "pitch_ang_set_3",
+        "pitch_ang_act_1", "pitch_ang_act_2", "pitch_ang_act_3",
+        "pitch_spd_1", "pitch_spd_2", "pitch_spd_3",
+        "rotor_spd", "yaw_ang", "twist_ang",
+        "theory_p_active", "p_reactive", "grid_freq",
+        "grid_ia", "grid_ib", "grid_ic",
+        "grid_ua", "grid_ub", "grid_uc",
+        "ambient_temp",
+    ]
+    df_raw = load_model_type(model_name, required_cols=["p_active"], optional_cols=_optional)
+    if df_raw.empty or "p_active" not in df_raw.columns:
+        print("  [data] skipped (no data or p_active missing)")
+        return
+
+    stats = get_model_statistics(df_raw)
+    labeled = label_dataframe(df_raw, stats, model_name)
+    if labeled.empty:
+        print("  [data] empty after labeling, skipped")
+        return
+
+    # Save stats for inference so the full data set does not have to be reloaded
+    joblib.dump(stats, model_dir / "model_stats.pkl")
+    print("  [data] model_stats.pkl saved")
+
+    train_wind_power(labeled, model_dir)
+    train_yaw(labeled, model_dir)
+    train_pitch(labeled, model_dir)
+    train_control_params(labeled, model_dir)
+    print(f"Model type {model_name} trained; models saved to: {model_dir}")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Train wind turbine anomaly detection models")
+    parser.add_argument("--model", type=str, default=None,
+                        help="Model type to train; all model types are trained if omitted")
+    parser.add_argument("--list", action="store_true", help="List all available model types and exit")
+    args = parser.parse_args()
+
+    if args.list:
+        model_types = list_model_types()
+        print(f"Found {len(model_types)} model types:")
+        for i, mt in enumerate(model_types, 1):
+            print(f"  {i}. {mt}")
+        return
+
+    if args.model:
+        train_one(args.model)
+    else:
+        model_types = list_model_types()
+        print(f"Found {len(model_types)} model types: {model_types}")
+        for mt in model_types:
+            train_one(mt)
+    print("\nAll training finished.")
+
+
+if __name__ == "__main__":
+    main()
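
The training data selection rule described in the `train.py` docstring can be checked on a toy frame. This is a sketch under stated assumptions: the flag column names (`wind_spd_anom`, `p_active_anom`), the `DETECTOR_SENSOR_COLS` contents, and the text after the `传感器异常` prefix are hypothetical, since the real ones are defined in `labeler.py`. The function below is a simplified stand-in for `_filter_for_detector` that uses a single boolean mask.

```python
import pandas as pd

# Hypothetical mapping: detector name -> sensor-anomaly flag columns it relies on.
# A detector missing from the mapping (e.g. yaw) has no associated flag columns.
DETECTOR_SENSOR_COLS = {"wind_power_curve": ["wind_spd_anom", "p_active_anom"]}


def filter_for_detector(labeled: pd.DataFrame, detector_name: str) -> pd.DataFrame:
    cols = [c for c in DETECTOR_SENSOR_COLS.get(detector_name, []) if c in labeled.columns]
    status = labeled["status"]
    is_run = status == "运行"  # running
    is_sensor = status.str.startswith("传感器异常", na=False)  # sensor anomaly
    if cols:
        # Drop sensor-anomaly rows where any signal this detector needs is flagged
        is_sensor &= ~labeled[cols].any(axis=1)
    return labeled[is_run | is_sensor]


labeled = pd.DataFrame({
    "status": ["运行", "传感器异常:actual_torque", "传感器异常:wind_spd", "停机", "限功率"],
    "wind_spd_anom": [False, False, True, False, False],
    "p_active_anom": [False, False, False, False, False],
})

kept_wind = filter_for_detector(labeled, "wind_power_curve")
kept_yaw = filter_for_detector(labeled, "yaw_static")
print(kept_wind.index.tolist())  # [0, 1]
print(kept_yaw.index.tolist())   # [0, 1, 2]
```

The wind power detector drops the row whose `wind_spd` is flagged, while the yaw detector, which has no associated flag columns here, keeps every sensor-anomaly row. Shutdown and power-limited rows are excluded in both cases.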