Retrieval metrics as decision surfaces
Viewing IR metrics as thresholded integrals.
Claim/Concept. Many IR metrics can be seen as thresholded integrals over relevance curves.
Example. Average Precision:
\mathrm{AP@k}=\frac{1}{\min(k,R)}\sum_{i=1}^{k} P(i)\cdot \mathrm{rel}(i).
Open questions - How does calibration drift affect AP comparability across runs?