Selection of data-generating experiments identifiability and expected P-values

Yannis G. Yatracos, Yau Mathematical Sciences Center, Tsinghua University, Beijing; Beijing Institute of Mathematical Sciences and Applications

Machine Learning mathscidoc:2206.41013

In a Data-Generating Experiment (DGE), the data, X, is often obtained either from a Black-Box with inputs θ and Y, or from a quantile function or a learning machine, f(Y, θ); θ is an unknown element of a metric space (Θ, ρ) and Y is random. If X has an intractable or unknown c.d.f., F_θ, non-identifiability of θ cannot be confirmed and, when present, it limits, among other things, the predictive accuracy of the learned model, f(Y, \hat{θ}), where \hat{θ} is an estimate of θ. In Machine Learning, non-identifiability of θ is ubiquitous and its extent is a criterion for selecting a learning machine. Empirical indices, EDI and PPVI, are introduced using P-values of Kolmogorov-Smirnov tests: i) to confirm almost surely, using generated data, the discrimination of θ from θ^∗, namely that the Kolmogorov distance, d_K(F_θ, F_{θ^∗}), is positive; ii) to confirm identifiability of θ ∈ Θ by repeating i) for θ^∗ in a sieve of Θ, since neighboring parameter values are in practice indistinguishable; and iii) most important, to compare EDI-graphs of DGEs, preferring more discrimination and less non-identifiability among parameters, and to select one DGE to use. In applications, EDI-graphs confirm non-identifiability in mixture models and in models parametrised with sums of parameters. EDI and PPVI explain why Tukey's g-and-h model (DGE1) has better g-discrimination than the g-and-k model (DGE2), unless the sample size is extremely large; here h_0 = k_0. EDI-graphs indicate that Normal learning machines have better parameter discrimination than Sigmoid learning machines and their parameters are non-identifiable.
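The idea in i), using Kolmogorov-Smirnov P-values on generated data to check whether θ can be discriminated from θ^∗, can be illustrated with a rough sketch. This is not the paper's EDI or PPVI, whose definitions the abstract does not give; it only averages two-sample KS P-values over repeated simulated samples. The toy DGE X = a + b + Y with Y standard normal, the function names, the sample sizes, and the use of the asymptotic Kolmogorov series for the P-value are all assumptions made for illustration. The toy DGE is one instance of a model parametrised with a sum of parameters.

```python
import math
import random


def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: sup_t |F_n(t) - G_m(t)|."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        t = min(xs[i], ys[j])
        # Advance both empirical c.d.f.s past every observation <= t.
        while i < n and xs[i] <= t:
            i += 1
        while j < m and ys[j] <= t:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic P-value from the Kolmogorov series (an approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # The tail probability equals 1 to many digits here, and the
        # alternating series converges slowly, so return 1 directly.
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def dge(theta, n, rng):
    """Hypothetical toy DGE: X = a + b + Y, Y ~ N(0, 1), theta = (a, b)."""
    a, b = theta
    return [a + b + rng.gauss(0.0, 1.0) for _ in range(n)]


def mean_ks_pvalue(theta, theta_star, n=200, reps=200, seed=0):
    """Average KS P-value between samples generated under theta and theta_star."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = dge(theta, n, rng)
        x_star = dge(theta_star, n, rng)
        total += ks_pvalue(ks_statistic(x, x_star), n, n)
    return total / reps


# (1, 2) and (2, 1) have the same sum, so the two c.d.f.s coincide.
p_same_sum = mean_ks_pvalue((1.0, 2.0), (2.0, 1.0))
# (1, 2) and (1, 3) have different sums, so the c.d.f.s differ.
p_diff_sum = mean_ks_pvalue((1.0, 2.0), (1.0, 3.0))
print(p_same_sum, p_diff_sum)
```

In this sketch, a mean P-value that stays far from zero for θ ≠ θ^∗ is consistent with the two parameter values being indistinguishable from data, while a mean P-value near zero suggests that d_K(F_θ, F_{θ^∗}) is positive. Repeating the comparison over a grid of θ^∗ values would correspond loosely to the sieve in ii).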
Uploaded 2022-06-21 16:40:30 by yatracos.
Yannis G. Yatracos. Selection of data-generating experiments identifiability and expected P-values. 2022.