I support the use of well-designed algorithms to achieve positive results in a variety of fields. Following a lot of deserved bad publicity for the Ofqual 2020 results algorithm, I intend to demonstrate that this was due to bad implementation, not simply because an algorithm was used.

Sources:
[A] Awarding results summer 2020 https://assets.publishing.service.g...fications_in_summer_2020_-_interim_report.pdf
[D] Data requirements https://assets.publishing.service.g...on_of_results_in_summer_2020_inc._Annex_G.pdf
Plus my own school's results statements.

The intention:
The intention of the algorithm was to take a measure of the school's previous performance and adjust it for the potential of this year's cohort, producing an 'expected' set of grades for the school. This could then be compared with the teacher rankings and predictions to adjust grades for individual students. The algorithm was NOT allocating grades to students or evaluating their performance; it was measuring and adjusting for over-optimism from teachers. This is what our school maths department would have done if we had been given the task. It's probably the least bad method available.

The method (as applied to A Level Maths):

1. Start by averaging the previous 3 years' results (2017-2019) to give %A*, %A, %B, %C, %D, … for the subject at that school. [A, 8.2.1, p85] [D, Annex B, p8] [D, X1, p31]

2. Create a matrix showing the chance of obtaining each A Level grade for each decile of GCSE APS, based on 2019 data only, for the whole country. [A, 8.2.2, p86] [D, Annex D, p20] [D, X2, p32; X3, p33] In other words, group GCSE students into ability bands and, for each band, find a %A*, %A, %B…

3. For the school, use the matrix to calculate the grades it should have achieved in 2019 (by counting how many students fall into each band and adding up their chances of achieving each A Level grade) [A, 8.2.3, p89] [D, Annex D, p22] [D, X4, p34], and the grades it should have achieved in 2020 [A, 8.2.4, p90] [D, X5, p35], based only on the GCSE APS of the cohorts at the school.

4. Subtract one from the other to make an adjustment. (There is an extra step here, dealing with students who could not be matched, which I have left out.)

5. Apply this adjustment to the previous results. [A, 8.2.6, p93] [D, X7, p37] This gives you an expected 2020 %A*, %A, %B…

6. Allocate grades to students based on their rank within the school. [A, 8.2.7, p94]

7. Allocate a score based on how far up or down each rank the students were. [A, 8.2.8, p95] [D, X8, p38]

8. Compare all the scores nationally, calculate grade boundaries from the desired percentage at each grade [A, 8.2.9, p97], and allocate students their grades.
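To make the mechanics concrete, here is a toy sketch of steps 1-6 in Python. This is my reading of the documents, not Ofqual's code: all the numbers, variable names and function names are mine, the grade scale is cut down to A/B/C, the GCSE APS deciles are reduced to three bands, and the national re-ranking in steps 7-8 is ignored.

```python
# Toy sketch of steps 1-6 above, as I read them from [A] and [D].
# Everything here is illustrative: invented numbers, grades cut to A/B/C,
# and three GCSE ability bands instead of deciles of GCSE APS.
import numpy as np

GRADES = ["A", "B", "C"]

# Step 2: national matrix - chance of each A Level grade for each GCSE band
# (in the real model, built from 2019 data for the whole country).
NATIONAL_MATRIX = {
    "high": np.array([0.60, 0.30, 0.10]),
    "mid":  np.array([0.25, 0.50, 0.25]),
    "low":  np.array([0.10, 0.30, 0.60]),
}

def matrix_prediction(cohort):
    """Step 3: expected grade distribution for a cohort, given the share of
    its students in each GCSE band."""
    return sum(share * NATIONAL_MATRIX[band] for band, share in cohort.items())

def expected_2020_distribution(results_2017_to_2019, cohort_2019, cohort_2020):
    """Steps 1, 4 and 5: average three years of results, then shift them by the
    difference between the matrix predictions for the 2020 and 2019 cohorts."""
    historical = np.mean(results_2017_to_2019, axis=0)                             # step 1
    adjustment = matrix_prediction(cohort_2020) - matrix_prediction(cohort_2019)   # step 4
    return historical + adjustment                                                 # step 5

def allocate_by_rank(expected_dist, n_students):
    """Step 6 (simplified): hand grades down the teacher's rank order so the
    school's overall distribution matches the expected one."""
    counts = np.rint(expected_dist * n_students).astype(int)
    grades = [g for g, c in zip(GRADES, counts) for _ in range(c)]
    return (grades + [GRADES[-1]] * n_students)[:n_students]  # absorb rounding

# Invented example: a school's last three years of results and cohort profiles.
results = [np.array([0.20, 0.45, 0.35]),   # 2017: %A, %B, %C
           np.array([0.25, 0.50, 0.25]),   # 2018
           np.array([0.35, 0.45, 0.20])]   # 2019
cohort_2019 = {"high": 0.40, "mid": 0.40, "low": 0.20}
cohort_2020 = {"high": 0.30, "mid": 0.50, "low": 0.20}

dist = expected_2020_distribution(results, cohort_2019, cohort_2020)
print(np.round(dist, 2))            # expected 2020 %A, %B, %C for the school
print(allocate_by_rank(dist, 10))   # grades handed down the teacher's rank order
```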
The result:
This gave us the worst set of results in the school's recorded history, for a cohort within the range of starting ability of those in the last 3 years. This is clearly a wrong answer.

Where are the flaws?
The big flaw is the mismatch between the number of years used to measure prior attainment (3 years) and the number used to measure cohort ability (1 year). Performance is measured over 3 years of results, but the ability level used to adjust those results comes from a single year, one third of the students who achieved those results. To emphasise: we are starting with a set of results and adjusting them based on the ability of A GROUP OF STUDENTS WHO DO NOT REPRESENT THOSE RESULTS. You can justify using 2017-2019 data with 2019 abilities if you assume there is NO difference between cohorts. You can justify making an ability adjustment for 2020 if you assume there IS a difference between cohorts. When you do both at once, you have a CONTRADICTION built into your model.

To illustrate the effect: three schools with identical average prior performance and identical average cohort ability will get three different predictions!

School A: results 2017 poor, 2018 average, 2019 good; cohorts 2017 poor, 2018 average, 2019 good. Average overall for 2017-2019, average 2020 cohort: results adjusted down (the good 2019 cohort becomes average).

School B: results 2017 good, 2018 average, 2019 poor; cohorts 2017 good, 2018 average, 2019 poor. Average overall for 2017-2019, average 2020 cohort: results adjusted up (the poor 2019 cohort becomes average).

School C: results 2017 good, 2018 poor, 2019 average; cohorts 2017 good, 2018 poor, 2019 average. Average overall for 2017-2019, average 2020 cohort: results stay the same (the average 2019 cohort stays average).

(The postscript at the end of this post runs these three schools through the toy sketch above.)

[The same issue applies to all A Level subjects to differing degrees – D, Annex D, p22 – all start with 3 years of data, and none use 3 years of data for the ability adjustment.]

The final stage of the model – ranking all students nationally, calculating cut-offs and allocating grades – should not have been needed. After the first two stages, schools already have grades allocated on past performance and the ability of the cohort, and that should have produced a correct set of national grades. To the extent that grades had to be altered further (and ours were, substantially), that points to the bad implementation of the first two stages.

Other flaws have been highlighted elsewhere, and clearly the sense-checking of individual centres' results failed. But the above is why the core of the model, for standard centres, gave wrong results.

I sympathise with whoever, or whichever group of people, was tasked with designing and implementing the algorithm. It was a massive task. But the reason for early collaboration with the data holders (schools) and for using expert help (the RSS, universities) is that when you get it wrong, everyone can see.

I'm reasonably confident that I've read the documents correctly, but happy to be corrected if there are any mistakes.
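P.S. To put toy numbers on the three-school example, here is a small self-contained snippet (repeating the invented matrix from the sketch above so it runs on its own) that pushes schools A, B and C through the same calculation. Everything in it is illustrative; 'poor', 'average' and 'good' are just three points on the toy scale.

```python
# Schools A, B and C from the example above, run through the same toy model.
# Invented numbers only - this is my reading of the method, not Ofqual's code.
import numpy as np

MATRIX = {"high": np.array([0.60, 0.30, 0.10]),   # chance of A, B, C by GCSE band
          "mid":  np.array([0.25, 0.50, 0.25]),
          "low":  np.array([0.10, 0.30, 0.60])}

COHORT = {"poor":    {"high": 0.10, "mid": 0.30, "low": 0.60},
          "average": {"high": 0.25, "mid": 0.50, "low": 0.25},
          "good":    {"high": 0.60, "mid": 0.30, "low": 0.10}}

def predict(cohort_name):
    """Matrix prediction for a cohort, from its share of students per band."""
    return sum(s * MATRIX[b] for b, s in COHORT[cohort_name].items())

# Assume each year's results matched that year's cohort ability exactly.
RESULTS = {name: predict(name) for name in COHORT}

def expected_2020(history, cohort_2019, cohort_2020):
    """Average of the 2017-2019 results, shifted by the difference between the
    matrix predictions for the 2020 cohort and the 2019 cohort only."""
    return (np.mean([RESULTS[year] for year in history], axis=0)
            + predict(cohort_2020) - predict(cohort_2019))

schools = {  # results/cohorts for 2017, 2018, 2019; then the 2019 and 2020 cohorts
    "A": (["poor", "average", "good"], "good",    "average"),
    "B": (["good", "average", "poor"], "poor",    "average"),
    "C": (["good", "poor", "average"], "average", "average"),
}

for name, (history, c2019, c2020) in schools.items():
    print(name, np.round(expected_2020(history, c2019, c2020), 2))

# Rough output (%A, %B, %C):
#   A  [0.17 0.41 0.42]  adjusted down
#   B  [0.42 0.41 0.17]  adjusted up
#   C  [0.31 0.37 0.31]  unchanged
# All three have identical 2017-2019 average results and identical average
# cohort ability; only the 2019 cohort feeds the ability adjustment.
```

With these invented numbers, School A ends up with well under half the top grades of School B, purely because of which year its strongest cohort happened to sit in.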