• MacN'Cheezus
    324 days ago

    LLaVA and BakLLaVA are two models you can run through Ollama that can not only extract text but also describe what's happening on screen.
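For anyone who wants to try that, here's a rough sketch of sending a screenshot to a local Ollama instance over its HTTP API. It assumes Ollama is running on its default port (11434) and that you've already pulled the `llava` model; the function names and the default prompt are just made up for illustration.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for /api/generate; images are sent base64-encoded."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def describe_image(path: str, prompt: str = "Describe this screenshot and transcribe any text.") -> str:
    """Send the image at `path` to the model and return its text answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Swapping `model="bakllava"` into `build_payload` should work the same way, provided that model has been pulled too.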

    Using tesseract-ocr, as the other guy suggested, is probably simpler and less resource-intensive, though.
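If plain text extraction is all you need, something like this should be enough. It's a minimal sketch that shells out to the `tesseract` command-line tool, assuming it's installed and on your PATH along with the English language data; the helper names are hypothetical.

```python
import shutil
import subprocess


def tesseract_command(image_path: str, lang: str = "eng") -> list:
    """Build the CLI call; 'stdout' as the output base prints the text instead of writing a file."""
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract does not appear to be installed or on PATH")
    result = subprocess.run(
        tesseract_command(image_path, lang),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return result.stdout.strip()
```

Usage would be something like `print(ocr_image("screenshot.png"))`. Unlike the vision models, this only gives you the text, not a description of the image.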