Version Space #
The version space of a concept class C given a labeled sample S is the
subset of C whose concepts agree with S on every observed point — the
classical "concepts still consistent with the data" of Mitchell (1977) and
Angluin (1980).
Main definitions #
VersionSpace C S: the subset ofCwhose concepts agree withSon every sample point.IsConsistent A C: a learner is consistent withCif its output always lies in the version space at the received sample.empiricalMiscount h S: number of sample points whereherrs ([DecidableEq β]).empiricalMeasure S: the uniform Dirac mixture over the sample.empiricalError h S: the empirical distribution's mass on the disagreement set.Realizable C S: some concept inClabels every sample point correctly.
Main results #
versionSpace_subset,versionSpace_empty_sample,versionSpace_reindex,versionSpace_antitone,versionSpace_mono_C: structural properties.mem_versionSpace_iff_empiricalMiscount_zero: combinatorial bridge.mem_versionSpace_iff_empiricalError_zero: measure-theoretic bridge.IsConsistent.empiricalMiscount_eq_zero,IsConsistent.empiricalError_eq_zero: consistent learners achieve zero error / miscount on every sample.mem_versionSpace_of_realizable,Realizable.versionSpace_nonempty: realizable samples give non-empty version spaces.ae_mem_versionSpace_of_realizable: under iid sampling from a realizable joint distribution, the target concept lies in the version space almost surely.
References #
Version Space #
The version space of a concept class C given a labeled sample S:
the set of concepts in C whose labels agree with S on every observed point.
Equations
Instances For
Membership in the version space unfolds to concept membership plus per-sample consistency.
The version space is a subset of the original concept class.
Version space on the empty sample equals the whole concept class.
Version space reindexing. For any reindexing f : Fin m → Fin n, the
version space on S is contained in the version space on the reindexed sample
S ∘ f.
Version space antitonicity. Given a sample of size n and m ≤ n, the
version space on all n observations is a subset of the version space on the
first m observations. Special case of versionSpace_reindex with
f := Fin.castLE hmn.
Version space is monotone in the concept class.
Empirical Error #
The empirical miscount of a hypothesis h on a labeled sample S: the
number of sample points where h predicts incorrectly.
Equations
- Cslib.MachineLearning.PACLearning.empiricalMiscount h S = {i : Fin m | h (S i).1 ≠ (S i).2}.card
Instances For
The empirical distribution of a labeled sample: the uniform mixture of
Dirac measures at each sample point. Equals the zero measure when m = 0.
Equations
- Cslib.MachineLearning.PACLearning.empiricalMeasure S = if _hm : m = 0 then 0 else (↑m)⁻¹ • ∑ i : Fin m, MeasureTheory.Measure.dirac (S i)
Instances For
The empirical 0-1 error of h on S: the empirical distribution's
mass on the disagreement set.
Equations
Instances For
Version-space membership equals concept-class membership plus zero empirical miscount (combinatorial bridge).
Version-space membership equals concept-class membership plus zero empirical error (measure-theoretic bridge).
The empirical 0-1 error equals the empirical miscount divided by the sample size.
Consistent Learners #
A learner is consistent with the concept class C if, on every labeled
sample it receives, its output hypothesis lies in the version space of C at
that sample — i.e. the output is in C and agrees with every observed
labeled pair.
Equations
Instances For
A consistent learner's output is always in the concept class.
A consistent learner's output agrees with the sample on every observed point.
A consistent learner has zero empirical miscount on every sample.
A consistent learner has zero empirical error on every sample.
Realizable case #
A labeled sample S is realizable by concept class C if some concept
in C labels every sample point correctly.
Equations
- Cslib.MachineLearning.PACLearning.Realizable C S = ∃ c ∈ C, ∀ (i : Fin m), (S i).2 = c (S i).1
Instances For
Realizable version-space nonemptiness. If a target concept c lies in
C and the sample S is labeled by c (i.e. every (S i).2 = c (S i).1),
then c itself lies in the version space VersionSpace C S.
A realizable sample has nonempty version space.
Probabilistic Realizable #
Under iid sampling from the realizable joint distribution induced by
c ∈ C and a probability measure P on α, the target concept c lies in
the version space almost surely.