SilenWang/ReviewGPT

ReviewGPT

Researchers need to read a large amount of literature every day to keep up with the latest progress, but research results are even more fragmented than Linux distributions, which slows research down. This project aims to use ChatGPT to help with literature retrieval and with tasks during reading, so that this work becomes faster and more efficient.

Demo

A demo is available on Hugging Face; an OpenAI API key is required to use it.

  • Screen: (demo screenshot)
  • Summarise: (demo screenshot)
  • Study: (demo screenshots)

ToDo

  • Frontend:
    • A basic but usable app
    • Set the API key from the frontend
    • Add a download button for the raw parsed data (JSON)
    • Implement the content summarization function
    • The About page
    • Add usage instructions
    • Paper reading page
  • Backend:
    • Call the ChatGPT API for content summarization
    • Call the ChatGPT API to judge whether a paper should be included (screening for meta-analysis)
    • Use Biopython to fetch bibliographic information and abstracts from PubMed
    • Save and package the raw parsed data
      • ~~Data security issue: check whether the returned ID could leak the API key~~
    • Repeat the inclusion judgment multiple times to check whether the result is stable
    • RIS file upload and parsing support
    • Support for models other than ChatGPT
      • ChatGLM
      • MOSS
      • LLaMA
    • Add a function for reading a single paper
    • Add APIs for existing features
    • Improve the PDF parsing module in the Study feature, changing the parsing unit from page to paragraph
  • Reference learning:
    • Study ResearchGPT and add similar functionality
    • Study chatPaper and add similar functionality
    • Try to build something like chatPDF
  • Others:
    • English README
    • Dockerfile for building a container image
    • A Hugging Face demo
    • Add error handling for network tasks, following chatPaper's approach
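
For the RIS upload item, a minimal parsing sketch might look like the following. It assumes the common RIS line layout "TAG  - value" (two-character tag, two spaces, hyphen, space) with "ER" closing each record; the function name `parse_ris` is hypothetical, and wrapped continuation lines are simply skipped, which a real exporter's output may not tolerate.

```python
import re
from typing import Dict, List

# Assumed RIS line layout: two-character tag, two spaces, "-", optional space, value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text: str) -> List[Dict[str, List[str]]]:
    """Hypothetical helper: parse RIS text into records mapping each tag to all its values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        match = RIS_LINE.match(raw.rstrip())
        if not match:
            continue  # blank or continuation lines are ignored in this sketch
        tag, value = match.group(1), match.group(2).strip()
        if tag == "ER":  # end of record
            if current:
                records.append(current)
            current = {}
        else:
            # Tags such as AU can repeat, so values are collected in a list.
            current.setdefault(tag, []).append(value)
    if current:  # tolerate a final record with no ER line
        records.append(current)
    return records


if __name__ == "__main__":
    sample = "TY  - JOUR\nAU  - Smith, J.\nAU  - Lee, K.\nTI  - A study of X\nER  - \n"
    for record in parse_ris(sample):
        print(record.get("TI"), record.get("AU"))
```

Keeping every value as a list avoids special-casing repeatable tags like authors or keywords.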

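The ToDo item about repeating the inclusion judgment could be approached with a majority vote. This is only a sketch: `repeated_judgment` is a hypothetical helper, `judge` stands in for whatever function wraps the ChatGPT call, and the "include"/"exclude" labels are assumed rather than taken from the project.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, Tuple


def repeated_judgment(judge: Callable[[str], str], abstract: str, n: int = 5) -> Tuple[str, float]:
    """Hypothetical helper: ask `judge` about the same abstract n times.

    Returns the majority verdict and the fraction of runs that agreed with it;
    a low fraction suggests the model's answer for this paper is unstable.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    verdicts = [judge(abstract) for _ in range(n)]
    verdict, count = Counter(verdicts).most_common(1)[0]
    return verdict, count / n


if __name__ == "__main__":
    # Hypothetical stub standing in for a ChatGPT call: replays canned answers.
    canned = cycle(["include", "include", "exclude", "include", "include"])
    verdict, agreement = repeated_judgment(lambda _abstract: next(canned), "Example abstract", n=5)
    print(verdict, agreement)  # include 0.8
```

Papers whose agreement falls below a chosen threshold could then be flagged for manual review.
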
Code Interpretation

  • According to ChatGPT, ResearchGPT works as follows:
    • Extract the file's text page by page
    • Call text-embedding-ada-002 to compute an embedding for each page
    • Embed the question the same way and compute its similarity with each page's embedding
    • Send the 3 pages most similar to the question to the ChatGPT API to interpret the paper
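
The retrieval steps above can be sketched as cosine-similarity ranking over page embeddings. To keep the example offline and runnable, `toy_embed` is a hypothetical word-count stand-in for text-embedding-ada-002, and `top_k_pages` is an illustrative name, not ResearchGPT's actual code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_pages(question: str, pages: List[str], embed: Callable[[str], Vector], k: int = 3) -> List[int]:
    """Return indices of the k pages whose embeddings are most similar to the question's."""
    question_vec = embed(question)
    scores = [(cosine_similarity(question_vec, embed(page)), i) for i, page in enumerate(pages)]
    scores.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep page order
    return [i for _, i in scores[:k]]


# Hypothetical stand-in for text-embedding-ada-002: word counts over a tiny fixed vocabulary.
VOCAB = ["gene", "protein", "cancer", "mouse", "survival", "method"]


def toy_embed(text: str) -> Vector:
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]


if __name__ == "__main__":
    pages = [
        "gene expression in cancer",
        "mouse model method",
        "survival analysis of cancer patients",
        "protein folding",
    ]
    print(top_k_pages("cancer survival", pages, toy_embed))  # [2, 0, 1]
```

With real embeddings, each page would be embedded once and cached rather than re-embedded for every question.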

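The ToDo list's goal of changing the Study feature's parsing unit from page to paragraph could start from a split like this one. It assumes the extracted page text marks paragraph breaks with blank lines, which many PDF extractors do not guarantee; `split_into_paragraphs` is a hypothetical name, and paragraphs that continue across a page break are not merged.

```python
import re
from typing import List, Tuple


def split_into_paragraphs(pages: List[str], min_chars: int = 1) -> List[Tuple[int, str]]:
    """Hypothetical helper: split each page on blank lines into (page_number, paragraph) pairs."""
    chunks: List[Tuple[int, str]] = []
    for page_no, text in enumerate(pages, start=1):
        for block in re.split(r"\n\s*\n", text):
            paragraph = " ".join(block.split())  # join wrapped lines and collapse whitespace
            if len(paragraph) >= min_chars:  # drop empty or too-short fragments
                chunks.append((page_no, paragraph))
    return chunks


if __name__ == "__main__":
    pages = ["Intro line one\ncontinues here.\n\nSecond paragraph.", "Methods.\n  \nResults here."]
    for page_no, paragraph in split_into_paragraphs(pages):
        print(page_no, paragraph)
```

Keeping the page number with each paragraph lets answers still cite where in the PDF a passage came from.
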
Problems

  • This project was initially built with Pynecone, but several problems affected its usability and appearance, so it was eventually switched to Gradio.
    • Pynecone keeps consuming CPU after startup.
    • The file upload function is not very user-friendly: the upload must be triggered by a button or another element (I have not found a way to implement drag-and-drop upload).
    • After a file is uploaded, performing any other operation causes the displayed file name to be lost.