Technical Reports (CIS)

Document Type

Technical Report

Date of this Version

April 1989


University of Pennsylvania Department of Computer and Information Science Technical Report No. MS-CIS-89-25.


We characterize the learnability and non-learnability of subsets of N^m called 'semilinear sets', with respect to Valiant's distribution-free learning model. In formal-language terms, semilinear sets are exactly the class of 'letter-counts' (or Parikh images) of regular sets. We show that the class of semilinear sets of dimensions 1 and 2 is learnable when the integers are encoded in unary. We complement this result with negative results of several different sorts, relying on hardness assumptions of varying strength: from P ≠ NP and RP ≠ NP to the hardness of learning DNF. We show that the minimal consistent concept problem for this class is NP-complete, which confirms that our learnability result is non-trivial. We also show that, with respect to the binary encoding of integers, the corresponding 'prediction' problem is already as hard as that of DNF for a class of subsets of N^m much simpler than semilinear sets. This work thus identifies an interesting class of countably infinite concepts for which the questions of learnability have been nearly completely characterized. In doing so, we demonstrate how various proof techniques developed by Pitt and Valiant [14], Blumer et al. [3], and Pitt and Warmuth [16] can be fruitfully applied in the context of formal languages.
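To make the concept class concrete, here is a minimal illustrative sketch that is not taken from the report. It uses the standard definitions: a linear set in N^m is {c + n_1 p_1 + ... + n_k p_k : n_i ≥ 0} for a base vector c and period vectors p_i, a semilinear set is a finite union of linear sets, and the Parikh image of a word is its vector of letter counts. The function names (`parikh`, `in_linear_set`, `in_semilinear_set`) are hypothetical. The membership test is a brute-force search, which is feasible only for small coordinates (as with unary-encoded inputs). It illustrates what a concept is, not the learning algorithm of the report.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]
LinearSet = Tuple[Vector, List[Vector]]  # (base c, periods p_1..p_k)


def parikh(word: str, alphabet: Sequence[str]) -> Vector:
    """Letter-count (Parikh) vector of `word` w.r.t. the ordered `alphabet`."""
    counts = Counter(word)
    return tuple(counts[a] for a in alphabet)


def _solve(residual: Vector, periods: List[Vector]) -> bool:
    """Is `residual` a non-negative integer combination of `periods`?"""
    if not any(residual):
        return True  # all remaining coefficients can be 0
    if not periods:
        return False
    p, rest = periods[0], periods[1:]
    # Largest n with n*p <= residual componentwise (p is nonzero here).
    max_n = min(r // pj for r, pj in zip(residual, p) if pj > 0)
    for n in range(max_n + 1):
        if _solve(tuple(r - n * pj for r, pj in zip(residual, p)), rest):
            return True
    return False


def in_linear_set(x: Vector, base: Vector, periods: List[Vector]) -> bool:
    """Decide x = base + sum n_i * p_i with all n_i >= 0 (brute force)."""
    residual = tuple(xi - ci for xi, ci in zip(x, base))
    if any(r < 0 for r in residual):
        return False
    # Zero period vectors add nothing, so they can be dropped.
    nonzero = [p for p in periods if any(p)]
    return _solve(residual, nonzero)


def in_semilinear_set(x: Vector, linear_sets: List[LinearSet]) -> bool:
    """A semilinear set is a finite union of linear sets."""
    return any(in_linear_set(x, c, ps) for c, ps in linear_sets)
```

For example, the regular language (ab)*a* over the alphabet (a, b) has the Parikh image {(x, y) : x ≥ y}, which is the linear set with base (0, 0) and periods (1, 1) and (1, 0). In dimension 1, the language a(aa)* has the odd numbers as its image, which is the linear set with base (1,) and period (2,).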



Date Posted: 03 January 2008