Correctness and completeness are typically used to evaluate software quality, and the idea extends to just about any task. Correctness reflects whether the program works as expected for all test inputs. Completeness refers to whether all features requested in the requirements are available. In software testing parlance, a program is expected to be correct and complete to the greatest extent possible.
This terminology sounds strikingly similar to precision and recall. In the Information Retrieval community, precision is defined as the fraction of retrieved results that are relevant. Recall is the fraction of all relevant results in the system that were retrieved.
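These two definitions can be sketched as set operations. A minimal illustration, using made-up document identifiers:

```python
def precision(retrieved, relevant):
    """Fraction of retrieved results that are relevant."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)

def recall(retrieved, relevant):
    """Fraction of all relevant results that were retrieved."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)

retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc3", "doc5"}

p = precision(retrieved, relevant)  # 2 of 4 retrieved are relevant -> 0.5
r = recall(retrieved, relevant)     # 2 of 3 relevant were retrieved -> 0.667
```

Note that the two measures share a numerator (the relevant results that were retrieved) and differ only in the denominator.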
In our research work on ANNE, we tag code snippets with natural language phrases. For example, “[ ]” is tagged as “array”. In this context, instead of the information retrieval term precision, we chose “correctness” for the tagging task; it is more intuitive since we are not searching for anything here. Similarly, completeness refers to whether all entities are discovered and tagged, which makes more sense than recall.
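To make the distinction concrete, here is a hypothetical evaluation of such a tagging task. The snippets and tags below are invented for illustration and are not drawn from ANNE's actual data:

```python
# gold maps each code snippet to its expected natural-language tag;
# predicted holds the tags the system actually produced.
gold = {"[ ]": "array", "{ }": "block", "->": "pointer access"}
predicted = {"[ ]": "array", "->": "dereference"}  # "{ }" was never tagged

# Correctness: of the entities we did tag, how many got the right tag?
correct = sum(1 for snip, tag in predicted.items() if gold.get(snip) == tag)
correctness = correct / len(predicted)     # 1 of 2 tags is right -> 0.5

# Completeness: how many of the entities were discovered and tagged at all?
completeness = len(predicted) / len(gold)  # 2 of 3 entities tagged -> 0.667
```

The parallel to the retrieval measures is visible in the denominators: correctness divides by what was tagged, completeness by what should have been tagged.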
The Machine Learning community, especially classifier builders, uses “accuracy”, which is not exactly the same as a combination of precision and recall such as the F-score. Say you are developing a binary classifier to predict whether a tumor is malignant or benign. You would calculate accuracy as the fraction of correct classifications over the entire set that was classified. This sounds much like precision, with correct classifications playing the role of relevant results; the only difference is that we do not have a restricted set of results. For instance, there is no accuracy@10 the way we have precision@10.
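A minimal sketch of that calculation, with made-up labels for the tumor example:

```python
# Accuracy for a hypothetical malignant/benign classifier: correct
# predictions over everything classified, with no cutoff like the
# top-10 window used by precision@10.
y_true = ["malignant", "benign", "benign", "malignant", "benign"]
y_pred = ["malignant", "benign", "malignant", "malignant", "benign"]

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)  # 4 of 5 predictions match -> 0.8
```

Because every item is classified, the "retrieved set" here is simply the whole dataset.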
I believe the ideas are similar; only the terminology differs. Depending on the task at hand, we apply our best judgment to select the most appropriate measure.