A belated congratulations

Written By: - Date published: 3:54 pm, May 26th, 2018 - 22 comments
Categories: greens, labour, public services

It’s finally happened.

I know I have been harsh on Minister Clare Curran, but she has actually done something positive in her portfolio that might achieve something tangible domestically, however small, and I confess to being just a little bit excited. I am harsh because I genuinely care about her portfolios being treated well and delivering for us all, and this is the first news where I feel she is broadly on the right track and has done something substantive.

Together with Greens co-leader and, more relevantly here, Statistics Minister James Shaw, she has set up a stocktake and review of all government algorithms. I offer no conclusions on who led the initiative, and frankly it doesn't bother me if it gets claimed jointly even if it was Shaw's idea: Curran has clearly bought into it either way. This may sound like a technical and bureaucratic change, and in some ways it is. But remember that the dumb Immigration NZ fiasco came about precisely because a half-arsed spreadsheet model went into actual use with no auditing, no advance transparency about how INZ planned to automate decisions, and no internal justification of why or how a spreadsheet model would be an appropriate basis for decision-making; we probably only ended up stopping it because Golriz speaking out embarrassed the government into having another look. Proper automation as a starting point for making decisions, with review by staff who are expected to record in their file notes why they did or did not follow the algorithm's recommendation, can be a good thing.

But only where the model is robust and itself free of both explicit and implicit discriminatory factors (thus reducing discrimination by making the model's decision the baseline case for humans to check against), and where the algorithm behind the decision is publicly available for open critique, can such algorithms reduce bias, increase consistency, cut red tape, and lower complaint rates.
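To make that concrete, here is a minimal sketch of what algorithm-assisted decision-making with mandatory human review could look like. It is purely illustrative: the rule, the threshold, and the field names are all invented, not anything from the actual review.

```python
from dataclasses import dataclass

@dataclass
class Recommendation:
    approve: bool       # what the model suggests
    reasons: list[str]  # every factor that drove the suggestion

def recommend(weekly_income: float, income_limit: float = 600.0) -> Recommendation:
    """Hypothetical eligibility model: one published, checkable rule."""
    if weekly_income <= income_limit:
        return Recommendation(True, [f"income {weekly_income} <= limit {income_limit}"])
    return Recommendation(False, [f"income {weekly_income} > limit {income_limit}"])

def decide(case_id: str, rec: Recommendation, human_approves: bool, file_note: str) -> dict:
    """A human makes the final call, and must justify any departure
    from the recommendation in their file note."""
    if human_approves != rec.approve and not file_note.strip():
        raise ValueError("Overriding the recommendation requires a file note")
    return {"case": case_id, "recommended": rec.approve, "decided": human_approves,
            "model_reasons": rec.reasons, "file_note": file_note}
```

The point is the audit trail: the model's reasons and the human's justification sit side by side in the record.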

Here’s a summary of the salient points:

  • The review defines an algorithm as: “when computer programs search for patterns in relevant data, to help model potential outcomes that could occur given different circumstances.”
  • Stage 1 will finish in August.
  • It is intended to increase transparency and accountability of data usage.
  • It will develop new guidelines for government agencies, setting a consistent standard.

Reviewing all automated decision-making and data analysis across government is an excellent step in deconstructing National's failed social investment model, which is already embedded in many of the ministries left in the worst state by the last government, and it is clearly necessary to ensure we don't have any more departments going rogue in how they make decisions.

The definition above is a reasonable starting point, although it ought to explicitly include scripts used inside documents and webpages, so that all departments are clear that spreadsheets or internal websites can be models, or contain algorithms, with decision-making or advisory powers. Galloway got in trouble precisely because he thought he could get away with claiming that a script is not a "real" program, as if using something you regard so dismissively were somehow better. In fact programs themselves are nothing more than large chains of scripts, possibly with some user interface thrown in to shield the user from all of the maths and simplify their tasks a little.

I like the goals they've included, but I do think there's an obvious one missing: why not commit to making all algorithms used in modelling for decision-making publicly available by a deadline to be determined? Ministerial algorithms are going to become something a lot like sub-laws, governing the expected way certain government departments act. I expect Shaw is already onto this with Stats as-is, of course, but Curran can get the rest of government set on the right track.

On this subject: during my time at EQC I helped design a calculation spreadsheet for small-claims cash settlement (and many other sheets to model or report information that was pertinent to management rather than customers). Even though it was a spreadsheet, we treated the thing very seriously, and we had to get managerial sign-off for it afterwards despite management having commissioned it in the first place, because we knew that any mistake or omission could guide people into making incorrect decisions. (We still built at least four or five revisions of that spreadsheet in my time there, of course, as policy evolved, new needs emerged, or we found assumptions that real-world claims would break.) That is how all design of government algorithms should be approached. And it's not unreasonable for people to know the maths behind how their decision was made, where such a thing applies, at least so long as that maths doesn't fall squarely into one of the OIA's withholding grounds, such as national security or economic sensitivity. Under the OIA principle of gradually expanding openness, we would expect the sorts of information released under the OIA or proactively to increase over time anyway, so this is really just getting the public sector's legal obligations out of the way on the front foot.
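For flavour, here is a toy sketch of the kind of checkable calculation such a sheet encodes. The figures, the cap, and the field names are entirely made up, not EQC's actual settlement model.

```python
def settlement_offer(assessed_repair_cost: float, excess: float,
                     statutory_cap: float = 100_000.0) -> dict:
    """Toy cash-settlement calculation: repair cost less excess, capped.
    Every intermediate figure is returned so a reviewer can check the maths."""
    capped = min(assessed_repair_cost, statutory_cap)
    payable = max(capped - excess, 0.0)
    return {"assessed": assessed_repair_cost, "capped": capped,
            "excess": excess, "payable": payable}

# A reviewer, or the claimant, can re-run the whole calculation:
# settlement_offer(120_000, 1_150)
# -> {'assessed': 120000, 'capped': 100000.0, 'excess': 1150, 'payable': 98850.0}
```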

And if the objection is around writing a briefing on all of those algorithms… well, if we "techwizards" can't explain them to ordinary people to some meaningful degree (which we should be doing anyway for decision-making algorithms, because they need managerial approval), then honestly we probably don't understand them well enough ourselves to be making decisions with them.

So, my sincere congratulations to Minister Curran: I hope to see more positive initiatives in the future, and I hope to see positive results on this soon.

22 comments on “A belated congratulations”

  1. OnceWasTim 1

    @ Matthew.
    I well remember that “dumb” immigration algorithm, and there’s no doubt there are many others spread across our civil service.
    The depressing thing is that those who designed and implemented it see no wrong in having done so. Along with a review, there needs to be a cultural change, whether that's done by way of the existing framework (such as bloody purchase agreements and KPIs) or by way of a complete review of state agencies and the way they operate (or don't operate).
    Simply reviewing the algorithmic approach is not actually enough. (see OM 1.3 and below)

    • Matthew Whitehead 1.1

      I don’t disagree, which is why you’ll note I say we need humans checking algorithmic results every single time and justifying why they are either appropriate or inappropriate. 🙂

      Over-emphasis on KPIs is indeed insidious. A person who does excellent work but is behind on KPIs may not improve their work by speeding up, and may be a much better employee than one who meets or smashes KPIs but makes frequent mistakes or causes unnecessary friction within your organization.

      • OnceWasTim 1.1.1

        Ae. In full agreement – it’s just that despite all the evidence, we never seem to learn.

        I'd almost put money on MPI (and MBIE and WINZ for that matter) having met most of their KPIs.

        Btw, I’ll reread when I find my bloody specs

      • Nic the NZer 1.1.2

        “I don’t disagree, which is why you’ll note I say we need humans checking algorithmic results every single time and justifying why they are either appropriate or inappropriate.”

        I think this idea runs into a bit of trouble, and I would put it differently. First of all we should note that an algorithm does not require a computer. It's just a series of steps (maybe operating on some data or parameters) to produce a particular result (the result may be as simple as a yes/no answer).

        Probably a reasonable way for the review to conclude is to require that algorithms the government uses to make decisions be made available to members of the public wherever an algorithmic decision has been applied to them, including the data and parameters required to reproduce the algorithm's decision in their circumstances. Concretely: if WINZ denies somebody support because their income is too high, the person should know what income counts as too high and how much income WINZ believes they received.
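        A rough sketch of what that reproducibility could look like (the income limit and field names are invented for illustration):

        ```python
        def benefit_decision(declared_income: float,
                             income_limit: float = 585.0) -> dict:
            """Hypothetical eligibility rule. The point is that the output
            carries everything needed to re-run the decision, not just the answer."""
            denied = declared_income > income_limit
            return {
                "decision": "denied" if denied else "granted",
                "rule": "declared_income > income_limit",
                "income_limit": income_limit,        # the parameter applied
                "declared_income": declared_income,  # the data held about you
            }
        ```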

        Algorithms implemented on computers can have bugs; hopefully examples of this are rare and negligible. The deeper problem is that if your algorithm produces some incorrect answers, and in those circumstances you expect to detect that and fall back to a wider decision-making process, then you are really just building a more general algorithm for the same problem. It still matters that there are rules in place which can be followed to make these decisions; staff making decisions need some framework for how they will decide even when they don't have a simple and strict set of rules to follow.

        Maybe the review could conclude with something like a requirement that government algorithms producing a decision should always return one of three cases: Yes, No, and Don't Know. The third case indicates that the algorithm did not have sufficient information to conclude Yes or No; conversely, Yes and No can only be returned where the algorithm has enough information to make a decision. Higher-level human intervention would happen in the Don't Know cases; if the algorithm initially didn't involve a human, this may be when the case is handed up to a higher-level manager.
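        Something like this sketch, with all the details invented:

        ```python
        from enum import Enum

        class Outcome(Enum):
            YES = "yes"
            NO = "no"
            DONT_KNOW = "don't know"  # insufficient information: escalate

        def assess(weekly_income: float | None, limit: float = 585.0) -> Outcome:
            """Tri-state rule: only answer Yes or No when the inputs support it."""
            if weekly_income is None:  # missing data, so refuse to guess
                return Outcome.DONT_KNOW
            return Outcome.NO if weekly_income > limit else Outcome.YES

        def handle(weekly_income: float | None) -> str:
            outcome = assess(weekly_income)
            if outcome is Outcome.DONT_KNOW:
                return "hand the case up to a human decision-maker"
            return f"automated recommendation: {outcome.value}"
        ```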

        The other problem is that if your algorithm involves a statistical model aggregated from data, it loses its connection to the real world (statistical models can't be proven to be good models of the real world by statistics alone). A statistical model will have trouble meeting the Don't Know criterion above (it will always produce a Yes or No answer, and if that answer is wrong we have no way to detect it from how it was produced), and also the parameters criterion (which could be mitigated to some extent by publishing the aggregated statistics so people can find out which category they have been placed in). This should severely limit the use of such statistical algorithms in government departments. Then again, I was always extremely skeptical of the Social Investment/Big Data initiative amounting to anything of value, due to its built-in reliance on statistical models.

        • Incognito 1.1.2.1

          Nice comments.

          What do you mean by “[A] statistical model will have trouble meeting the Don’t Know criteria above (it will always produce a Yes or No answer …”?

          Many models put out p-values or (other) coefficients; these can be turned into a simplistic decision-making rubric such as the one you describe, e.g. by using the traffic-light system (Green=Yes; Red=No; Orange=Don’t Know).
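          E.g. a trivial mapping; the cut-offs here are invented purely for illustration, not any real policy:

          ```python
          def traffic_light(p_value: float) -> str:
              """Turn a model's p-value into a Yes/No/Don't Know rubric."""
              if p_value < 0.05:
                  return "Green (Yes)"
              if p_value > 0.20:
                  return "Red (No)"
              return "Orange (Don't Know): refer to a human"
          ```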

          All models need validation/calibration – think of all the red-light cameras that are not operational; they work through algorithms too.

          The description ('definition') of "algorithm" used in the Government announcement, as pattern recognition, does not equate directly to decision-making IMO. For example, in weather forecasting, big data are analysed in order to make a forecast (e.g. a 30% probability of a heavy storm/hurricane that could cause widespread or local damage and/or flooding); it's then up to people to plan accordingly and make decisions based on that forecast.

          I believe that Curran and Shaw either don't have a good idea of what they're tackling, or that the review will be much more limited ('focussed') in its terms of reference & scope than one would think (and hope for!).

          • Nic the NZer 1.1.2.1.1

            “What do you mean by” …

            “Many models put out p-values or (other) coefficients; these can be turned into a simplistic decision-making rubric such as the one you describe”

            It's a bit more fundamental than p-values. Maybe you can use them to some extent to flag the cases where a decision needs further review. But fundamentally a statistical algorithm can never bridge the gap from the known cases to future cases where it's invalid (maybe the statistical model never was valid).

            To use a weather forecasting example: maybe my forecast (for some place) says it's going to rain 50% of the days over winter. That is, my forecast gives every day a 50% chance of rain, based on last winter, when it rained on half the days at that location. But this winter it rained on 70% of the days. My forecast was wrong by 20% on average: I estimate how wrong it was by averaging (1 - 0.5) over the days it rained and (0 - 0.5) over the days it didn't, and I was off by 0.2. So the question is, was my model wrong or was I just unlucky?
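            That arithmetic, checked in code (days and probabilities as above):

            ```python
            def mean_forecast_error(rain_fraction: float, forecast_prob: float) -> float:
                """Average signed error of a constant-probability rain forecast:
                (1 - p) on rainy days, (0 - p) on dry days, weighted by frequency."""
                return (rain_fraction * (1 - forecast_prob)
                        + (1 - rain_fraction) * (0 - forecast_prob))

            print(mean_forecast_error(0.7, 0.5))  # ~0.2, i.e. off by 20% on average
            ```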

            Yes, my forecast model here is a very, very simplified one based only on climatology, but these issues still apply to more sophisticated examples. And I am just saying this can't be judged from statistics alone; a strong, realistic model informed by scientific understanding can often answer the question above.

            I also think that if they manage to get too clear a definition of algorithm for the review, the conclusion will just become obvious: the government should by and large stop doing it altogether. This is because not only can the question above, "was my model wrong or was I just unlucky", not be answered; all the social science models of human behaviour that fall out of it can't be described as anything other than hopelessly naive and unrealistic.

      • OnceWasTim 1.1.3

        Ekshully @ Matthew, have a listen (or maybe you'll have listened) to a couple of things on RNZ Sunday Morning today, 27/5.
        The first: Media Watch dealing with AI and media, the second,
        Jeremy Heimans: the power of new power.
        http://www.radionz.co.nz/national/programmes/mediawatch/audio/2018646179/ai-and-the-media-coming-ready-or-not
        Jeremy Heimans, not yet up
        QI.
        In a political sense: issues around representation and accountability. And on a human level: human agency and its place in the future.

        • Nic the NZer 1.1.3.1

          That's somewhat interesting, but I think it shows something about what AI means in more old-school terms. It's generally just applying a statistical categorization to data, so if it's applied to people, it's putting them in categories.

          Sometimes there is an extrapolation mechanism by which new members of those categories can be generated from the categorization mechanism. This is how further Mike Hosking-esque editorials can be created by AI. But relevantly, there is already a history of Mike Hosking-esque pieces being generated in the media: Jeremy Wells's segment 'Like Mike'. Called correctly, that kind of piece is known as an impersonation, or maybe a parody.

          The limitations of this should be somewhat obvious, however. There are simply things which can't be captured by a correct categorization of what is being dealt with. So (referring specifically to the INZ example) if you can't determine whether a particular person will cause harm in NZ should they not be deported, based on their country of origin and other factors (which you can't), then this kind of application will always be quite problematic. In the best case, you might be able to accurately estimate the likelihood that a particular person will cause harm in NZ should they not be deported, based on their country of origin and other factors. But in fact all we can actually know is the rate of recorded incidents where harm has been caused in NZ after particular cases were not deported, given their country of origin and other factors. This will always be open to accusations of racial profiling, because that is what it is.

          Actually, in the specific INZ case it seems they didn't even get that right: they effectively just made up the rate and assumed it was correct as a likelihood estimate. That makes the INZ thing look more like plain discrimination than racial profiling, because they never got far enough into the data to do racial profiling.

  2. Philg 2

    Why do I feel so underwhelmed by this? Sounds like algorithms are the way decisions will be made. Does this apply to medical procedures and end of life treatment? In Algos we trust?

    • Matthew Whitehead 2.1

      Let me put it this way:

      The government was already experimenting with this under National. Right now we have algorithms in use across government, some of which no outside expert has ever laid eyes on, and the only thing they have to do is not breach existing legislation; we'll most likely never hear about them otherwise.

      This review will dig all of them up so the government can consider them together and develop guidelines for what's acceptable and what's not. If done properly, this could be a huge win, and essentially the start of overturning the "social investment" (i.e. targeting people through algorithms and statistics in the most stupid way possible) doctrine in government.

      I agree that we shouldn't blindly trust algorithms, or spreadsheets, or what have you. They can be used to remind public sector staff of a good, consistent way to make decisions, when they're well designed. But when it's time for an actual decision to be made, there should always be a human staff member reviewing whether the algorithm has gotten everything correct and whether we need to consider another way: for legal reasons, for better public service reasons, or just plain because we need to weigh other, more humanistic values.

      And no, the End of Life Choice Bill, despite my many problems with it, does not allow for an algorithm to make the decision. It requires reviews by doctors. Those doctors could possibly inform their decisions with algorithms, but that seems unnecessary and unlikely at this point in time.

      The thing to realize is that you're not going to stop the government from using mathematics and conditional logic to help make its decisions; it was doing so before computers existed. What's new with modern algorithms is that they're cheap and the skills are relatively widespread. (An organisation with 100 people in it is likely to have at least one or two who understand how to do this sort of thing, even if it hasn't specifically hired for it.) That's a given.

      So what's better is to ask them to be transparent about what their formulas are, when they use them, why, and whether the results are reviewed afterwards. If the public service has to proactively release that information, what's likely to happen is that people will bring up potential problems with the algorithms in public, debate them, and be able to pressure agencies to change if they've made an inappropriate decision, something that's actually quite hard when policies are concealed or made according to hidden criteria.

      That said, getting that much information isn’t promised at this stage. I’m all for it, and a couple other people have been advocating for it, but it’ll take more than a few of us talking at Shaw and Curran to get it done, as while Shaw may be onside, I can’t see Labour being too keen about more open government given their record so far this term.

    • Draco T Bastard 2.2

      In Algos we trust?

      Better than trusting feelings.

  3. Incognito 3

    I’m puzzled yet intrigued.

    Looking for patterns is ambiguous. Do they mean they will look at data for the presence or absence of expected patterns, i.e. a biased analysis? Or will they have an unbiased look at the data to find novel patterns and then figure out whether they are real and what they might mean?

    It goes without saying that algorithms cannot be reviewed without the relevant context, which includes the "relevant data" and the "potential outcomes". The word "outcome" is yet another ambiguous word: do they mean "impact" or "consequence"?

    Whatever the reviewers do, I think they first need to sharpen up the definitions of the terminology.

    This is complex stuff and I’d also love to know how they plan “to give New Zealanders confidence that their data is being used appropriately”; to say “trust us” won’t cut it …

  4. ropata 5

    I fully support Minister Curran’s initiative. Half arsed algorithms can kill.

    It is concerning that spreadsheets are treated as a reliable software tool for implementing business logic. Spreadsheets are not usually subject to the rigorous design/development/testing that is needed to deliver reliable information tech.

    Spreadsheets are a half arsed shortcut, and that leads to shit like Novopay

    • Draco T Bastard 5.1

      Spreadsheets are a half arsed shortcut, and that leads to shit like Novopay

      I suppose that depends upon how much effort went into developing the spreadsheet. Modern spreadsheets are fairly powerful and can do fairly complex stuff if people put the time and effort in.

      Novopay had nothing to do with spreadsheets; it was poor design and programming by the 'professional' software company that developed it, using a database.

    • Matthew Whitehead 5.2

      Spreadsheets are a perfectly reasonable tool for assisting people in doing their maths well, so long as they check their numbers and where they’ve put them. Using a calculator or a webpage is also a shortcut.

      Novopay's issues were very different to using a spreadsheet, and largely revolved around poor UX and poor compliance in filling out information (the latter likely being a result of the former).

      What I want is the same level of caution in approving off-the-cuff spreadsheet calculators and models as we use for professionally developed software solutions, where front-line experts and managers review them to make sure the results are correct and that they account for the vast majority of cases before they're approved for use.

  5. stever 6

    I agree that their algorithms need scrutiny and that they should make them public. But the statement made by the Govt seems confused. (It also falls into the modern, trendy trap of using the term "AI" to mean only machine learning!!! There's a lot more to AI, and decades of work on it, than just the area of machine learning.)

    A lot of decisions these days are made using models that are the outputs of machine learning, and though the machine learning algorithms themselves are standard and algorithmic, the models they build, which are the things that *actually* get used to identify patterns, make decisions etc., are (and this is the point) not themselves "algorithmic" in the sense of deterministic processes. They are models which classify data, broadly into "yes, this piece of data IS one of these" or "no, it isn't one of these", and do it with some calculable error, i.e. we know statistically how often they give the wrong answer. (BTW, getting these models to "explain" their categorisations, rather than merely saying "yes" or "no", is hard, and a focus for research.)

    So, seeing the algorithms in this case misses the point. What we need are the *models that the algorithms build*, the stats around how often they give the wrong answer and so on.
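    A toy illustration of that distinction, in plain Python rather than any real ML library:

    ```python
    # The *training algorithm* below is public and deterministic, but the
    # *model* it builds is just a learned number, and what matters is the
    # measured error rate of that model, not the training code.

    def train_threshold(examples: list[tuple[float, bool]]) -> float:
        """Learning algorithm: pick the threshold that misclassifies fewest examples."""
        candidates = sorted(x for x, _ in examples)
        return min(candidates,
                   key=lambda t: sum((x >= t) != label for x, label in examples))

    def error_rate(threshold: float, examples: list[tuple[float, bool]]) -> float:
        """The statistic that should be published alongside the model."""
        return sum((x >= threshold) != label for x, label in examples) / len(examples)

    data = [(0.2, False), (0.4, False), (0.5, False), (0.6, True), (0.9, True)]
    model = train_threshold(data)  # the whole "model" is one learned number
    print(model, error_rate(model, data))  # 0.6 0.0 on this toy data
    ```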

    Will they show us all that too?

    I'm assuming the Govt statement is based on trying to "make things simple" (or perhaps ignorance? I hope not!), but as it is currently worded it misses the main point.

    We don't need the algorithms that build the decision mechanisms; we need the decision mechanisms themselves and the data on their reliability.

  6. DB 7

    Spreadsheets sure are getting a bad rap. It’s not the tool, it’s the idiot wielding it.

    I really hope Elon Musk is using his AI think tank to develop AI that ‘outs’ nefarious algorithms. Judging by the way very basic AI devices are already proving to be smart-ass, creepy, racist, and often still working for corporate overlords…

    “Trying to humanise AI and give it more complex tasks [is that] some people end up passing on their subjective views. And the problem of AI bias is nothing new. From 2010, when AI assumed that East Asians were blinking when they smile, to 2015 when Google’s photo service tagged black people as gorillas. In April of this year, Princeton University academics used an algorithm called GLoVe to show how AI can replicate stereotypes in human language.

    Then, in August, research revealed that a selection programme for a UK medical school negatively selected against women and ethnic minority candidates.”

    http://www.wired.co.uk/article/what-happened-in-ai-in-2017

    And then there’s Alexa…

    Spying, reporting, transferring conversation files to contacts… Generally being creepy (corporate design – surprise!). Google it.

    The eggheads have outdone themselves this time. Somewhere, rooms of self-entitled shits who've never been laid without their credit cards are writing code to 'imbue human characteristics' in AI, namely, to be creepy little weirdos.

    Notice how computers keep updating and adding shit to themselves without your permission, all… the… time… They think they’re entitled to do this. It’s an update! (new apps on desktop too).

    Malicious code, spying code, edging into your consciousness uninvited, or whatever human rights are infringed upon in code: the employers of coders, and the writers of said code, should be criminally charged with the offenses. But that'll never happen because $$. When some corporate shit needs a scapegoat they'll throw the coders on the fire if and as needed, but the clowns in suits will not relent.

    Rebrand and resurface. That’s the corporate way.

    Excellent work by this Government recognizing that inhuman systems need to be vetted. Thoroughly!

    • Matthew Whitehead 7.1

      Yeah, I'm with you that spreadsheets are getting a bad rap. The problem with the INZ one was that it was profiling people in ways that are discriminatory, with no reasonable evidence. We were told they were overstayers, but there was some information about visa status that may have been inconsistent with this. (This could be INZ considering the status of an expired visa, of course, but it's not 100% clear that was what happened.)

      People would probably have liked the EQC one I helped with, because it ensured they got their full cash settlement paid correctly and in one go, so they frequently didn't have to come back and ask for extra money. Before we instituted it, we had a lot of problems with maths errors in settlements, or with relevant info being left out, which the settlement-aid spreadsheet helped staff remember to include and check.

      In short, the INZ spreadsheet was a bad tool that couldn't bear scrutiny. Bringing algorithms into the daylight by proactively disclosing them would be ideal, making sure we have good tools that can bear scrutiny wherever the full maths can be publicly disclosed, but even having good guidelines produced by DIA and Stats out of this review will help.

  7. Antoine 8

    Well

    I think it's going to be a lot of work and chew up a lot of analyst time, and at the end of the day, people will still be running bad models in dark corners. Or just making decisions off the cuff without modelling support.

    A.

    • One Anonymous Bloke 8.1

      It’ll make it harder for the National Party to justify.

      Minister: “I want to replace this algorithm with one that’s more hateful”.
      Judicial review: “Fuck off Judith”.

    • Matthew Whitehead 8.2

      They might, but if the government has taken reasonable steps to prevent it at least it will be clear who’s at fault when the issue is discovered, and the Minister can reasonably demand that person go.
