• @tal
    3 months ago

    The officials, quoted in an extensive investigation by the online publication jointly run by Palestinians and Israelis, said that the AI-based tool was called “Lavender” and was known to have a 10% error rate.

    So, there’s pretty much no information to decipher what it’s actually doing. But I think that one could at least use a human baseline. For a human in a similar role, assuming that a human can approximate whatever it’s doing, what’s the error rate?

    • FuglyDuck
      3 months ago

      So, there’s pretty much no information to decipher what it’s actually doing. But I think that one could at least use a human baseline. For a human in a similar role, assuming that a human can approximate whatever it’s doing, what’s the error rate?

      The Verge had a piece on it:

      Lavender was trained to identify “features” associated with Hamas operatives, including being in a WhatsApp group with a known militant, changing cellphones every few months, or changing addresses frequently. That data was then used to rank other Palestinians in Gaza on a 1–100 scale based on how similar they were to the known Hamas operatives in the initial dataset.
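      To make the 1–100 ranking concrete, here is a minimal Python sketch. It is an assumption, not how Lavender actually works (neither article says): each person is a hypothetical vector of 0/1 features like the ones quoted, the known examples are averaged into a centroid, and everyone else is scored by cosine similarity to that centroid, mapped onto a 1–100 scale.

```python
from math import sqrt

# Hypothetical 0/1 features mirroring the quoted examples; the real feature set is not public.
FEATURES = ["shares_group_with_known", "frequent_phone_change", "frequent_address_change"]

def centroid(profiles):
    """Average feature vector of the known examples."""
    n = len(profiles)
    return [sum(p[i] for p in profiles) / n for i in range(len(FEATURES))]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def score_1_to_100(profile, ref):
    """Map similarity in [0, 1] (features are non-negative) onto a 1-100 scale."""
    return 1 + round(99 * cosine(profile, ref))

known = [[1, 1, 0], [1, 0, 1], [1, 1, 1]]   # toy "known" examples
ref = centroid(known)
population = {"p1": [1, 1, 1], "p2": [0, 0, 0], "p3": [0, 1, 0]}
scores = {name: score_1_to_100(vec, ref) for name, vec in population.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

      With this toy data, p1 (all three habits) ranks first and p2 (none of them) gets the floor score of 1. Note that nothing here checks whether a habit actually means anything; it only measures resemblance to whoever was put in the known list.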

      Basically, they’re looking at habits and social connections, and the AI scores people by how closely they match the known operatives.

      Part of the problem?

      To build the Lavender system, information on known Hamas and Palestinian Islamic Jihad operatives was fed into a dataset — but, according to one source who worked with the data science team that trained Lavender, so was data on people loosely affiliated with Hamas, such as employees of Gaza’s Internal Security Ministry. “I was bothered by the fact that when Lavender was trained, they used the term ‘Hamas operative’ loosely, and included people who were civil defense workers in the training dataset,” the source told +972.

      Shit data in, shit data out.
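      A toy Python sketch of the "shit data in" point, using a hypothetical centroid-similarity scorer (an assumption; the real model isn't public). Adding loosely affiliated civil workers to the "positive" training set shifts the reference profile, so a civil worker who shares none of the operative habits goes from zero similarity to a clearly nonzero score.

```python
from math import sqrt

# Hypothetical features: [shares_group_with_known, frequent_phone_change, works_for_ministry]
def centroid(rows):
    """Column-wise average of the training rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

clean_positives = [[1, 1, 0], [1, 1, 0], [1, 0, 0]]
# Same list, plus civil workers loosely labeled as "operatives".
noisy_positives = clean_positives + [[0, 0, 1], [0, 0, 1]]

civil_worker = [0, 0, 1]  # matches none of the operative habits
clean_score = cosine(civil_worker, centroid(clean_positives))
noisy_score = cosine(civil_worker, centroid(noisy_positives))
```

      The scoring code is identical in both cases; only the labels changed.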

      • @tal
        3 months ago

        Hmm.

        I believe that law enforcement has done this sort of thing for a long time: building databases to look for correlating factors and relationships among people. And it sounds like they’re explicitly writing out the criteria; otherwise they probably wouldn’t be able to rattle them off. So I kind of doubt that they’re using machine learning to find new criteria.

        If I had to guess from your text, what they did was have people come up with every criterion they could think of that’s likely to indicate that someone is in Hamas. Then they took some database of known Hamas figures, ran their classifiers against it, and let the system figure out the weighting for each of those criteria. I don’t know whether that last bit is standard practice in law enforcement software for identifying likely suspects, but I can believe that it might be.

        “AI” might be a slightly ambitious term to use for that. I have used SpamAssassin, which uses Bayesian classifiers to identify spam, for decades. It does something comparable, but I don’t think that people have generally called SpamAssassin “AI”.
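        For comparison, a minimal sketch of the SpamAssassin-style approach described above: Bernoulli naive Bayes, which learns a log-likelihood-ratio weight for each hand-written criterion from labeled examples. The names `fit` and `log_odds`, the smoothing parameter `alpha`, and the toy data are all hypothetical; this is not SpamAssassin’s actual code or anyone’s targeting system.

```python
from math import log

def fit(rows, labels, alpha=1.0):
    """Learn per-criterion weights from labeled 0/1 rows (Bernoulli naive Bayes).

    Assumes both classes are present; alpha is Laplace smoothing.
    """
    pos = [r for r, y in zip(rows, labels) if y == 1]
    neg = [r for r, y in zip(rows, labels) if y == 0]
    prior = log(len(pos) / len(neg))  # prior log-odds of the positive class
    weights = []
    for i in range(len(rows[0])):
        p = (sum(r[i] for r in pos) + alpha) / (len(pos) + 2 * alpha)  # P(criterion | pos)
        q = (sum(r[i] for r in neg) + alpha) / (len(neg) + 2 * alpha)  # P(criterion | neg)
        # Weight added when the criterion is present, and when it is absent.
        weights.append((log(p / q), log((1 - p) / (1 - q))))
    return prior, weights

def log_odds(model, row):
    """Sum the prior and each criterion's weight; > 0 means 'more likely positive'."""
    prior, weights = model
    return prior + sum(w_on if x else w_off for x, (w_on, w_off) in zip(row, weights))

rows = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
labels = [1, 1, 1, 0, 0, 0, 0]
model = fit(rows, labels)
```

        A criterion present in every positive and no negative (column 0 here) gets the largest weight; weaker correlations get smaller weights, i.e. they count as softer evidence.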

        • FuglyDuck
          3 months ago

          So it’s machine learning.

          They have lots of data: social media footprint, addresses, names of friends and coworkers, who they call, where they go for coffee or happy hour after work; quite literally everything they can get on these guys.

          Then they give it a list of known operatives and tell the machine to look for patterns that are consistent across all of them (like switching burner cell phones every so often).

          It even weights lower strength correlations as softer evidence.

          They then run that against everyone they have in their database, and it spits out the people who match the same patterns.

          As for it being artificial intelligence: it is, just not general AI (like Data in Star Trek, R2-D2 in Star Wars, or Kryten in Red Dwarf). These systems are more like idiot savants: very good at this one task and suck at literally anything else.

          The problem is mostly the shit data it was trained on, plus the assumption that it would always be right. It can recognize patterns, but there’s always some natural variation in a pattern.