All the posts about Reddit blocking everyone except Google and Brave got me thinking: what if SearXNG were federated? That is, when data is retrieved via a provider’s API, that data is then federated to all other instances.

It would spread the API load out amongst instances, removing the API bottlenecks that come from search providers.

It would allow for more anonymous search, since users could cycle between instances and get the same results.

Geographic bias would be a thing of the past.

Other than ActivityPub overhead and storage, which could be reduced by federating text-only content, I fail to see any downside.
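To make the idea concrete, here is a minimal in-memory sketch of what I mean. It is not SearXNG code and it does not speak ActivityPub: `Instance`, `fetch_from_provider`, `receive`, and `build_network` are all hypothetical names, and the "federation" is just a direct method call between objects. It only illustrates the flow: one instance pays for the upstream API call, pushes the text-only results to its peers, and every other instance then answers the same query from its local copy.

```python
from dataclasses import dataclass, field


def normalize(query: str) -> str:
    """Collapse case and whitespace so equivalent queries share one cache key."""
    return " ".join(query.lower().split())


@dataclass(eq=False)
class Instance:
    """Hypothetical search instance with a federated result cache (assumption)."""
    name: str
    peers: list["Instance"] = field(default_factory=list)
    cache: dict[str, list[str]] = field(default_factory=dict)
    upstream_calls: int = 0

    def fetch_from_provider(self, query: str) -> list[str]:
        # Stand-in for a real provider API call; returns text-only results.
        self.upstream_calls += 1
        return [f"result {i} for '{query}'" for i in range(3)]

    def receive(self, key: str, results: list[str]) -> None:
        # Called by a peer that federates its results to this instance.
        self.cache.setdefault(key, results)

    def search(self, query: str) -> list[str]:
        key = normalize(query)
        if key not in self.cache:
            # Cache miss: hit the provider once, then share with every peer.
            results = self.fetch_from_provider(key)
            self.cache[key] = results
            for peer in self.peers:
                peer.receive(key, results)
        return self.cache[key]


def build_network(names: list[str]) -> list[Instance]:
    """Create instances that are all peered with each other."""
    instances = [Instance(n) for n in names]
    for inst in instances:
        inst.peers = [p for p in instances if p is not inst]
    return instances


if __name__ == "__main__":
    a, b, c = build_network(["a", "b", "c"])
    first = a.search("Reddit  API limits")
    second = b.search("reddit api limits")
    print(first == second)
    print(a.upstream_calls + b.upstream_calls + c.upstream_calls)
```

In this toy setup the second search on a different instance returns the same results without a second provider call. A real version would also need expiry, trust between instances, and an actual transport, none of which is modelled here.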

Thoughts?

  • seang96@spgrn.com
    6 months ago

    I really want to use this, but from what I’ve read it basically requires a minimum of 20-30GB of RAM to be performant. The documentation also appears to be a mess and highly outdated. I’d also want to cluster it internally while still connecting with outside peers, which seems possible, but the large resource requirement makes that less feasible with my setup.

    • BuelldozerA
      6 months ago

      “basically requires a minimum of 20-30GB of RAM to be performant.”

      That’s odd, the project page states 256 Megabytes and practically speaking that’s nothing. Where did you find 20-30G? Are you sure you’re not confusing the memory requirement with the suggested free hard drive space?

      Even if it does need 32GB of RAM to perform well, that’s not a very high hurdle. 32GB of used DDR4 can be had for less than $75. Toss that in an old 8th- or 9th-gen Core i5 desktop, install your preferred flavor of Linux, add Docker, and you’re off to the races.

      • seang96@spgrn.com
        6 months ago

        Like I said, the documentation is out of date. If you read their forums, you’ll see quite a few posts about it. YaCy Grid sounds perfect for me, since it runs as a bunch of microservices and I have a cluster of mini PCs. The only problem is that YaCy Grid unfortunately lacks the distributed P2P part.

    • Im_old@lemmy.world
      6 months ago

      I’ve run it in containers and it never used that many resources. The whole server (running a few dozen containers) had 32GB, and it wasn’t impacted in any noticeable way.

    • hendrik@palaver.p3x.de
      6 months ago

      That is misinformation. It doesn’t need anywhere close to that amount of RAM. It’s pretty much like other web apps, and I used to run it on an old computer. It’ll fill up your hard disk, though, if you allow it to.

      • seang96@spgrn.com
        6 months ago

        There also seem to be a lot of settings, so perhaps they had it misconfigured. It’s also Java, so I wouldn’t put it past such a monolith of a Java program to require that much to be performant. Perhaps I’ll try a cluster of them and see how it fares.

      • seang96@spgrn.com
        6 months ago

        Well, the initial setup was definitely interesting. I didn’t want to expose 8090 and wanted it behind a web proxy. I finally got that working and actually received my first remote crawl overnight. I had to change to 80/443 internally so it would map correctly for P2P connections; the public port setting apparently doesn’t cut it. I kind of dislike how the whole setup micromanages CPU load, but otherwise it doesn’t seem atrocious, for a new peer at least. I guess this and the web proxy problems are awkward mostly because of the age of the software.