2025-11-05 12:35:09 -05:00


📚 Research Digest

Automated daily digest of arXiv research papers, with smart filtering, a mobile-friendly interface, and AI-powered summaries.

Fetch, filter, and browse the latest research papers tailored to your interests. Desktop grid view for deep reading, mobile feed for quick scrolling.


Features

  • 🎯 Smart Filtering - Keyword-based relevance scoring across custom research interests
  • 📱 Mobile Feed - Swipeable, full-screen card interface optimized for phones
  • 🖥️ Desktop Grid - Multi-column layout with rich metadata and difficulty badges
  • 🧠 AI Summaries - Auto-generated plain-language explanations using Hugging Face transformers
  • 🔄 Deduplication - Never see the same paper twice with built-in tracking
  • ⚙️ Configurable - JSON-based settings for interests, filters, and preferences
  • 📦 Archive - Auto-saves daily digests with a browsable index
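For illustration, keyword-based relevance scoring can be as simple as counting whole-word keyword hits in each paper's title and abstract. This is a rough sketch, not necessarily what main.py does: the `title`/`abstract` dict keys, the `relevance_score` and `top_papers` names, and the 2x title weight are all hypothetical.

```python
import re
from typing import Dict, List


def relevance_score(title: str, abstract: str, keywords: List[str], title_weight: int = 2) -> int:
    """Hypothetical scorer: case-insensitive whole-word keyword hits, title hits weighted."""
    score = 0
    for keyword in keywords:
        # re.escape treats the keyword literally; \b stops "graph" from matching "paragraph".
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        score += title_weight * len(pattern.findall(title))
        score += len(pattern.findall(abstract))
    return score


def top_papers(papers: List[Dict[str, str]], keywords: List[str], limit: int) -> List[Dict[str, str]]:
    """Keep papers with at least one hit, highest score first, at most `limit` of them."""
    scored = [(relevance_score(p["title"], p["abstract"], keywords), p) for p in papers]
    hits = [item for item in scored if item[0] > 0]
    hits.sort(key=lambda item: item[0], reverse=True)  # stable sort: ties keep input order
    return [paper for _, paper in hits[:limit]]
```

Title hits are weighted higher on the assumption that a keyword in the title is a stronger signal than one buried in the abstract.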

🖼️ Screenshots

Desktop View

[Desktop demo screenshot]

Mobile Feed

[Mobile demo screenshot]


🚀 Quick Start

Windows

  1. Clone & Run

    git clone https://github.com/yourusername/research-digest.git
    cd research-digest
    run_digest.bat
    
  2. On first run, the script automatically:

    • Creates virtual environment
    • Installs dependencies
    • Fetches papers
    • Generates HTML digests
  3. Open in a browser:

    • latest.html - Most recent digest
    • index.html - Browse all archives
    • tiktok_feed.html - Mobile-optimized feed

Linux/macOS

git clone https://github.com/yourusername/research-digest.git
cd research-digest
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py
python generate_index.py

⚙️ Configuration

Edit config.json to customize:

{
  "interests": {
    "Your Research Area": {
      "query": "cat:cs.LG OR cat:cs.AI",
      "keywords": ["keyword1", "keyword2", "keyword3"]
    }
  },
  "settings": {
    "papers_per_interest": 10,
    "recent_days": 7,
    "summary_max_length": 160
  }
}
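A minimal sketch of reading this file while falling back to the defaults from the settings table below. The `load_config` and `fetch_count` helpers are hypothetical, and treating `fetch_multiplier` as "papers_per_interest × multiplier candidates per query" is an assumption about its meaning.

```python
import json
from typing import Any, Dict

# Defaults copied from the "Available Settings" table in this README.
DEFAULT_SETTINGS: Dict[str, Any] = {
    "papers_per_interest": 10,
    "recent_days": 7,
    "fallback_days": 90,
    "summary_max_length": 160,
    "fetch_multiplier": 5,
}


def load_config(path: str = "config.json") -> Dict[str, Any]:
    """Hypothetical loader: read config.json and fill missing settings with defaults."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    # Values present in the file win; absent keys fall back to DEFAULT_SETTINGS.
    config["settings"] = {**DEFAULT_SETTINGS, **config.get("settings", {})}
    for name, interest in config.get("interests", {}).items():
        if "query" not in interest:
            raise ValueError(f"interest {name!r} has no 'query'")
        interest.setdefault("keywords", [])
    return config


def fetch_count(settings: Dict[str, Any]) -> int:
    """Assumed meaning of fetch_multiplier: candidates requested before filtering."""
    return settings["papers_per_interest"] * settings["fetch_multiplier"]
```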

Available Settings

Setting               Default  Description
papers_per_interest   10       Papers to fetch per interest
recent_days           7        Look-back window in days (0 = all time)
fallback_days         90       Wider look-back window in days, used if too few recent papers are found
summary_max_length    160      Maximum summary length in characters
fetch_multiplier      5        Over-fetch factor, so filtering has more candidates to choose from
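One way `recent_days` and `fallback_days` could interact, sketched under assumptions: each paper is a dict with a hypothetical timezone-aware `published` datetime, and the fallback triggers when fewer than `papers_per_interest` papers fall inside the recent window. The real trigger in main.py may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional

Paper = Dict[str, Any]


def within_days(papers: List[Paper], days: int, now: datetime) -> List[Paper]:
    """Papers published in the last `days` days; 0 means no limit (all time)."""
    if days == 0:
        return list(papers)
    cutoff = now - timedelta(days=days)
    return [p for p in papers if p["published"] >= cutoff]


def select_window(papers: List[Paper], settings: Dict[str, int], now: Optional[datetime] = None) -> List[Paper]:
    """Try recent_days first, widening to fallback_days if too few papers qualify."""
    now = now or datetime.now(timezone.utc)
    recent = within_days(papers, settings["recent_days"], now)
    # recent_days == 0 already means "all time", so there is nothing wider to fall back to.
    if settings["recent_days"] == 0 or len(recent) >= settings["papers_per_interest"]:
        return recent
    # Assumed trigger: fewer recent papers than papers_per_interest.
    return within_days(papers, settings["fallback_days"], now)
```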

📖 arXiv Query Syntax

Use arXiv category codes in queries:

  • cat:cs.LG - Machine Learning
  • cat:cs.CV - Computer Vision
  • cat:cs.CL - Computation & Language (NLP)
  • cat:cs.AI - Artificial Intelligence
  • cat:cs.CR - Cryptography & Security
  • cat:cs.DC - Distributed Computing

Combine with OR/AND: cat:cs.LG OR cat:cs.AI

Full category list
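These queries go into the `search_query` parameter of the public arXiv API. The sketch below only builds the request URL (it does not fetch anything). `search_query`, `start`, `max_results`, `sortBy`, and `sortOrder` are documented arXiv API parameters, while the helper names are hypothetical. The response is an Atom XML feed.

```python
from urllib.parse import urlencode

ARXIV_API = "https://export.arxiv.org/api/query"


def combine_categories(categories, op="OR"):
    """Join category codes into a query such as 'cat:cs.LG OR cat:cs.AI'."""
    return f" {op} ".join(f"cat:{code}" for code in categories)


def build_query_url(query, max_results, start=0):
    """Build an arXiv API URL returning the newest submissions first."""
    params = {
        "search_query": query,
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"
```

For example, `build_query_url(combine_categories(["cs.LG", "cs.AI"]), 50)` produces a URL containing `search_query=cat%3Acs.LG+OR+cat%3Acs.AI`.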


🔧 Advanced Usage

Automated Daily Digests

Windows Task Scheduler:

  1. Open Task Scheduler
  2. Create Basic Task → Daily → 7:00 AM
  3. Action: Start Program → C:\path\to\run_digest.bat

Linux/macOS Cron:

0 7 * * * cd /path/to/research-digest && ./venv/bin/python main.py && ./venv/bin/python generate_index.py

Sync to Mobile (Syncthing)

  1. Install Syncthing on your PC and phone
  2. Share the project folder between them
  3. Open the HTML files directly on your phone

Reset Seen Papers

python reset_seen_papers.py
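Deduplication can be sketched as a persisted set of arXiv IDs. This assumes seen_papers.json holds a plain JSON list of IDs and that papers carry a hypothetical `id` key. The format of the real file and the behaviour of reset_seen_papers.py may differ. Version suffixes are stripped so a revised paper (`v2`) is not shown again.

```python
import json
import os
import re
from typing import Any, Dict, List, Set

SEEN_FILE = "seen_papers.json"


def base_id(arxiv_id: str) -> str:
    """Drop a trailing version suffix so 2511.00001v2 counts as 2511.00001."""
    return re.sub(r"v\d+$", "", arxiv_id)


def load_seen(path: str = SEEN_FILE) -> Set[str]:
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def save_seen(seen: Set[str], path: str = SEEN_FILE) -> None:
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen), f, indent=2)


def filter_unseen(papers: List[Dict[str, Any]], seen: Set[str]) -> List[Dict[str, Any]]:
    """Return papers not seen before, recording them in `seen` as we go."""
    fresh = []
    for paper in papers:
        key = base_id(paper["id"])
        if key not in seen:  # also drops repeats across interests within one run
            seen.add(key)
            fresh.append(paper)
    return fresh


def reset_seen(path: str = SEEN_FILE) -> None:
    """Hypothetical equivalent of resetting the tracker: store an empty list."""
    save_seen(set(), path)
```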

📂 Project Structure

research-digest/
├── config.json              # Configuration (edit this!)
├── main.py                  # Core paper fetcher
├── generate_index.py        # Archive browser generator
├── generate_tiktok_feed.py  # Mobile feed generator
├── run_digest.bat           # Windows launcher
├── requirements.txt         # Python dependencies
├── latest.html              # Latest digest (auto-generated)
├── index.html               # Archive browser (auto-generated)
├── tiktok_feed.html         # Mobile feed (auto-generated)
├── seen_papers.json         # Deduplication tracker
└── arxiv_archive/           # Daily archives
    ├── arxiv_digest_20251101.html
    └── ...
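A sketch of how the archive layout above could be written and read back. It follows the `arxiv_digest_YYYYMMDD.html` naming shown in the tree. `publish_digest` and `list_archives` are hypothetical helpers, and copying the digest into latest.html is an assumption about how that file is produced.

```python
import re
from datetime import date, datetime
from pathlib import Path
from typing import List, Tuple

NAME_RE = re.compile(r"arxiv_digest_(\d{8})\.html")


def archive_path(day: date, archive_dir: Path) -> Path:
    """arxiv_archive/arxiv_digest_YYYYMMDD.html for the given day."""
    return archive_dir / f"arxiv_digest_{day:%Y%m%d}.html"


def publish_digest(html: str, day: date, root: Path) -> Path:
    """Write the day's digest to the archive and refresh latest.html."""
    archive_dir = root / "arxiv_archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_path(day, archive_dir)
    path.write_text(html, encoding="utf-8")
    (root / "latest.html").write_text(html, encoding="utf-8")  # overwrite the "latest" copy
    return path


def list_archives(archive_dir: Path) -> List[Tuple[date, Path]]:
    """(date, path) pairs for an index page, newest first."""
    entries = []
    for path in archive_dir.glob("arxiv_digest_*.html"):
        match = NAME_RE.fullmatch(path.name)
        if not match:
            continue
        try:
            day = datetime.strptime(match.group(1), "%Y%m%d").date()
        except ValueError:
            continue  # skip names with impossible dates, e.g. 20251399
        entries.append((day, path))
    entries.sort(key=lambda entry: entry[0], reverse=True)
    return entries
```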

🛠️ Requirements

  • Python 3.8+
  • Dependencies: transformers, torch, requests
  • Disk Space: ~2GB for model, ~10MB per digest
  • Internet: Required for arXiv API and first-time model download

📝 License

MIT License - see LICENSE file for details


🤝 Contributing

Contributions welcome! Ideas:

  • Additional paper sources (bioRxiv, SSRN, etc.)
  • Browser extension for direct syncing
  • Custom ML models for better summaries
  • Export to Notion/Obsidian/Roam

🙏 Acknowledgments

  • arXiv for the open research repository
  • Hugging Face for transformer models
  • Inspired by modern feed UIs and research workflows

Built with ❤️ for researchers who want to stay current without drowning in papers