Evaluation Metrics 101
Building an End to End Machine Learning Solution is hard. It would cost you tons of resources and time. More often ML component of your project becomes relatively small compared to the other aspects of it. Nevertheless, the actual value generated by the solution is still reliant on this component. Therefore the decision on "whether to productionalize a model?" is heavily swayed by the performance of the model. But how do you measure the performance of the model even before people start using it? Trusty KPIs are not yet available, so how do you decide if a model is worthy of becoming an end to end solution? Model Performance Metrics The answer to the question above is quite simple. You test the model with questions, to which you already know the answers. Then you arrive at a number that is meaningful for the task that your model is performing. Depending on the context you may want this number to be higher or lower. Where do you get these queries and corresponding ground truth