Machine Learning at the Flatiron Institute Seminar: David Hogg
Title: Is good machine learning bad for science?
Abstract: ML is now baked into many scientific projects. I give two example contexts in which even very well-trained, very high-performance machine-learning methods can bias or compromise scientific projects. In one, a downstream user of the output of a discriminative classification or regression will generally find that their scientific results are strongly (sometimes catastrophically) biased. This is not because the discriminative method is wrong; it is because it outputs posterior properties, usually with a prior that cannot be removed or accounted for. In another, a project that makes use of an emulator to speed computations will be faced with a verification problem that is either impossible to execute or else strongly confirmation-biased. There may be technical solutions to both of these problems; I will state the open questions as precisely as I can.
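
The first example in the abstract can be illustrated with a toy simulation. What follows is a minimal sketch under stated assumptions, not material from the talk: the Gaussian labels, the unit-variance feature noise, the one-feature least-squares regression, and the helper names `simulate` and `fit_line` are all hypothetical choices made for illustration. The regression is trained on a population whose true labels are centered at 0, then applied to a new population centered at 2. On its training distribution the fitted line is a good predictor, yet a downstream user who averages its outputs to estimate the new population's mean gets an answer pulled toward the training mean, because each output behaves like a posterior mean under the training-set prior.

```python
import random
import statistics


def simulate(n, true_mean, rng, label_sd=1.0, noise_sd=1.0):
    """Draw true labels y and noisy features x = y + noise (toy assumption)."""
    y = [rng.gauss(true_mean, label_sd) for _ in range(n)]
    x = [yi + rng.gauss(0.0, noise_sd) for yi in y]
    return x, y


def fit_line(x, y):
    """Ordinary least squares for y ~ a * x + b with a single feature."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Train the discriminative regression on a population centered at 0.
x_train, y_train = simulate(20000, 0.0, rng)
a, b = fit_line(x_train, y_train)

# Apply it to a new population whose true labels are centered at 2.
x_new, y_new = simulate(20000, 2.0, rng)
y_pred = [a * xi + b for xi in x_new]

# A downstream user averages the predictions to infer the population mean.
true_population_mean = statistics.fmean(y_new)
inferred_population_mean = statistics.fmean(y_pred)

print(f"fitted slope a            = {a:.3f}")
print(f"true population mean      = {true_population_mean:.3f}")
print(f"mean of regression output = {inferred_population_mean:.3f}")
```

With equal label and noise variances, the fitted slope should come out near 0.5, so the averaged predictions land roughly halfway between the training mean and the true mean of the new population. The shrinkage is what makes the regression accurate on its training distribution, and the downstream user cannot undo it without knowing the implicit prior.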