Documentation

Cslib.MachineLearning.PACLearning.VersionSpace

Version Space #

The version space of a concept class C given a labeled sample S is the subset of C whose concepts agree with S on every observed point — the classical "concepts still consistent with the data" of Mitchell (1977) and Angluin (1980).

Main definitions #

Main results #

References #

Version Space #

def Cslib.MachineLearning.PACLearning.VersionSpace {α : Type u_1} {β : Type u_2} {m : } (C : ConceptClass α β) (S : LabeledSample α β m) :

The version space of a concept class C given a labeled sample S: the set of concepts in C whose labels agree with S on every observed point.

Equations
Instances For
    theorem Cslib.MachineLearning.PACLearning.mem_versionSpace_iff {α : Type u_1} {β : Type u_2} {m : } {C : ConceptClass α β} {S : LabeledSample α β m} {h : αβ} :
    h VersionSpace C S h C ∀ (i : Fin m), h (S i).1 = (S i).2

    Membership in the version space unfolds to concept membership plus per-sample consistency.

    theorem Cslib.MachineLearning.PACLearning.versionSpace_subset {α : Type u_1} {β : Type u_2} {m : } (C : ConceptClass α β) (S : LabeledSample α β m) :

    The version space is a subset of the original concept class.

    Version space on the empty sample equals the whole concept class.

    theorem Cslib.MachineLearning.PACLearning.versionSpace_reindex {α : Type u_1} {β : Type u_2} {m n : } (f : Fin mFin n) (C : ConceptClass α β) (S : LabeledSample α β n) :

    Version space reindexing. For any reindexing f : Fin m → Fin n, the version space on S is contained in the version space on the reindexed sample S ∘ f.

    theorem Cslib.MachineLearning.PACLearning.versionSpace_antitone {α : Type u_1} {β : Type u_2} {m n : } (hmn : m n) (C : ConceptClass α β) (S : LabeledSample α β n) :

    Version space antitonicity. Given a sample of size n and m ≤ n, the version space on all n observations is a subset of the version space on the first m observations. Special case of versionSpace_reindex with f := Fin.castLE hmn.

    theorem Cslib.MachineLearning.PACLearning.versionSpace_mono_C {α : Type u_1} {β : Type u_2} {m : } {C C' : ConceptClass α β} (hCC' : C C') (S : LabeledSample α β m) :

    Version space is monotone in the concept class.

    Empirical Error #

    def Cslib.MachineLearning.PACLearning.empiricalMiscount {α : Type u_1} {β : Type u_2} [DecidableEq β] {m : } (h : αβ) (S : LabeledSample α β m) :

    The empirical miscount of a hypothesis h on a labeled sample S: the number of sample points where h predicts incorrectly.

    Equations
    Instances For
      noncomputable def Cslib.MachineLearning.PACLearning.empiricalMeasure {α : Type u_1} {β : Type u_2} [MeasurableSpace α] [MeasurableSpace β] {m : } (S : LabeledSample α β m) :

      The empirical distribution of a labeled sample: the uniform mixture of Dirac measures at each sample point. Equals the zero measure when m = 0.

      Equations
      Instances For
        noncomputable def Cslib.MachineLearning.PACLearning.empiricalError {α : Type u_1} {β : Type u_2} [MeasurableSpace α] [MeasurableSpace β] {m : } (h : αβ) (S : LabeledSample α β m) :

        The empirical 0-1 error of h on S: the empirical distribution's mass on the disagreement set.

        Equations
        Instances For
          theorem Cslib.MachineLearning.PACLearning.mem_versionSpace_iff_empiricalMiscount_zero {α : Type u_1} {β : Type u_2} [DecidableEq β] {m : } {C : ConceptClass α β} {S : LabeledSample α β m} {h : αβ} :

          Version-space membership equals concept-class membership plus zero empirical miscount (combinatorial bridge).

          Version-space membership equals concept-class membership plus zero empirical error (measure-theoretic bridge).

          theorem Cslib.MachineLearning.PACLearning.empiricalError_eq_div {α : Type u_1} {β : Type u_2} [DecidableEq β] [MeasurableSpace α] [MeasurableSpace β] [MeasurableSingletonClass α] [MeasurableSingletonClass β] {m : } (hm : 0 < m) (h : αβ) (S : LabeledSample α β m) :

          The empirical 0-1 error equals the empirical miscount divided by the sample size.

          Consistent Learners #

          def Cslib.MachineLearning.PACLearning.IsConsistent {α : Type u_1} {β : Type u_2} {m : } (A : Learner α β m) (C : ConceptClass α β) :

          A learner is consistent with the concept class C if, on every labeled sample it receives, its output hypothesis lies in the version space of C at that sample — i.e. the output is in C and agrees with every observed labeled pair.

          Equations
          Instances For
            theorem Cslib.MachineLearning.PACLearning.IsConsistent.output_mem_conceptClass {α : Type u_1} {β : Type u_2} {m : } {A : Learner α β m} {C : ConceptClass α β} (hA : IsConsistent A C) (S : LabeledSample α β m) :
            A S C

            A consistent learner's output is always in the concept class.

            theorem Cslib.MachineLearning.PACLearning.IsConsistent.output_agrees {α : Type u_1} {β : Type u_2} {m : } {A : Learner α β m} {C : ConceptClass α β} (hA : IsConsistent A C) (S : LabeledSample α β m) (i : Fin m) :
            A S (S i).1 = (S i).2

            A consistent learner's output agrees with the sample on every observed point.

            theorem Cslib.MachineLearning.PACLearning.IsConsistent.empiricalMiscount_eq_zero {α : Type u_1} {β : Type u_2} [DecidableEq β] {m : } {A : Learner α β m} {C : ConceptClass α β} (hA : IsConsistent A C) (S : LabeledSample α β m) :

            A consistent learner has zero empirical miscount on every sample.

            A consistent learner has zero empirical error on every sample.

            Realizable case #

            def Cslib.MachineLearning.PACLearning.Realizable {α : Type u_1} {β : Type u_2} {m : } (C : ConceptClass α β) (S : LabeledSample α β m) :

            A labeled sample S is realizable by concept class C if some concept in C labels every sample point correctly.

            Equations
            Instances For
              theorem Cslib.MachineLearning.PACLearning.mem_versionSpace_of_realizable {α : Type u_1} {β : Type u_2} {m : } {C : ConceptClass α β} {c : αβ} (hc : c C) (S : LabeledSample α β m) (hS : ∀ (i : Fin m), (S i).2 = c (S i).1) :

              Realizable version-space nonemptiness. If a target concept c lies in C and the sample S is labeled by c (i.e. every (S i).2 = c (S i).1), then c itself lies in the version space VersionSpace C S.

              A realizable sample has nonempty version space.

              Probabilistic Realizable #

              theorem Cslib.MachineLearning.PACLearning.ae_mem_versionSpace_of_realizable {α : Type u_1} {β : Type u_2} [MeasurableSpace α] [MeasurableSpace β] {C : ConceptClass α β} {c : αβ} (hc : c C) (hcm : Measurable c) (hG : MeasurableSet {p : α × β | p.2 = c p.1}) (P : MeasureTheory.Measure α) [MeasureTheory.IsProbabilityMeasure P] (m : ) :
              ∀ᵐ (S : LabeledSample α β m) MeasureTheory.Measure.pi fun (x : Fin m) => MeasureTheory.Measure.map (fun (x : α) => (x, c x)) P, c VersionSpace C S

              Under iid sampling from the realizable joint distribution induced by c ∈ C and a probability measure P on α, the target concept c lies in the version space almost surely.