initial commit

commit 9c4ee28270
2025-11-05 12:35:09 -05:00
15 changed files with 4347 additions and 0 deletions

.gitignore vendored Normal file

@@ -0,0 +1,40 @@
# Virtual Environment
venv/
env/
ENV/
.venv/
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
# ArXiv Downloads
arxiv_archive/
*.pdf
# Generated Files
seen_papers.json
latest.html
index.html
# Syncthing
.stfolder/
.stignore
# IDE / Editor
.vscode/
.idea/
*.swp
*.swo
*~
# OS
.DS_Store
Thumbs.db
desktop.ini
# Logs
*.log

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md Normal file

@@ -0,0 +1,203 @@
![Python](https://img.shields.io/badge/python-3.8+-blue.svg)
![License](https://img.shields.io/badge/license-MIT-green.svg)
![arXiv](https://img.shields.io/badge/arXiv-API-red.svg)
![Platform](https://img.shields.io/badge/platform-windows%20%7C%20linux%20%7C%20macos-lightgrey.svg)
# 📚 Research Digest
**Automated daily research paper digest from arXiv with smart filtering, mobile-friendly interface, and AI-powered summaries.**
Fetch, filter, and browse the latest research papers tailored to your interests. Desktop grid view for deep reading, mobile feed for quick scrolling.
---
## ✨ Features
- **🎯 Smart Filtering** - Keyword-based relevance scoring across custom research interests
- **📱 Mobile Feed** - Swipeable, full-screen card interface optimized for phones
- **🖥️ Desktop Grid** - Multi-column layout with rich metadata and difficulty badges
- **🧠 AI Summaries** - Auto-generated layman explanations using transformers
- **🔄 Deduplication** - Never see the same paper twice with built-in tracking
- **⚙️ Configurable** - JSON-based settings for interests, filters, and preferences
- **📦 Archive** - Auto-saves daily digests with browsable index
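The relevance scoring in the feature list boils down to counting keyword hits across a paper's title and abstract. A minimal sketch of the idea (function and field names here are illustrative, not the actual `main.py` implementation):

```python
def relevance_score(paper, keywords):
    """Count case-insensitive keyword hits in a paper's title and abstract."""
    text = f"{paper.get('title', '')} {paper.get('summary', '')}".lower()
    return sum(1 for kw in keywords if kw.lower() in text)

# Papers are then ranked by score, highest first:
papers = [
    {"title": "Edge AI on Microcontrollers", "summary": "We study quantization..."},
    {"title": "A History of Typewriters", "summary": "Mechanical keyboards..."},
]
ranked = sorted(papers, key=lambda p: relevance_score(p, ["edge", "quantization"]),
                reverse=True)
```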
---
## 🖼️ Screenshots
### Desktop View
![Desktop Demo](desktop_demo.png)
### Mobile Feed
![Mobile Demo](mobile_demo.png)
---
## 🚀 Quick Start
### Windows
1. **Clone & Run**
```bash
git clone https://github.com/yourusername/research-digest.git
cd research-digest
run_digest.bat
```
2. **First run automatically:**
- Creates virtual environment
- Installs dependencies
- Fetches papers
- Generates HTML digests
3. **Open in browser:**
- `latest.html` - Most recent digest
- `index.html` - Browse all archives
- `tiktok_feed.html` - Mobile-optimized feed
### Linux/macOS
```bash
git clone https://github.com/yourusername/research-digest.git
cd research-digest
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python main.py
python generate_index.py
```
---
## ⚙️ Configuration
Edit `config.json` to customize:
```json
{
"interests": {
"Your Research Area": {
"query": "cat:cs.LG OR cat:cs.AI",
"keywords": ["keyword1", "keyword2", "keyword3"]
}
},
"settings": {
"papers_per_interest": 10,
"recent_days": 7,
"summary_max_length": 160
}
}
```
### Available Settings
| Setting | Default | Description |
|---------|---------|-------------|
| `papers_per_interest` | 10 | Papers to fetch per category |
| `recent_days` | 7 | Look-back window in days (0 = all time) |
| `fallback_days` | 90 | Extended search if few results |
| `summary_max_length` | 160 | Max characters for summaries |
| `fetch_multiplier` | 5 | Over-fetch for better filtering |
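These settings are plain keys under `"settings"` in `config.json`; a hedged sketch of reading them with the defaults from the table (the real `main.py` may load them differently):

```python
import json

# Defaults mirror the table above.
DEFAULTS = {
    "papers_per_interest": 10,
    "recent_days": 7,
    "fallback_days": 90,
    "summary_max_length": 160,
    "fetch_multiplier": 5,
}

def load_settings(path="config.json"):
    """Merge user settings over defaults; a missing file yields pure defaults."""
    try:
        with open(path, encoding="utf-8") as f:
            user = json.load(f).get("settings", {})
    except FileNotFoundError:
        user = {}
    return {**DEFAULTS, **user}
```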
---
## 📖 arXiv Query Syntax
Use arXiv category codes in queries:
- `cat:cs.LG` - Machine Learning
- `cat:cs.CV` - Computer Vision
- `cat:cs.CL` - Computation & Language (NLP)
- `cat:cs.AI` - Artificial Intelligence
- `cat:cs.CR` - Cryptography & Security
- `cat:cs.DC` - Distributed Computing
Combine with `OR`/`AND`: `cat:cs.LG OR cat:cs.AI`
[Full category list](https://arxiv.org/category_taxonomy)
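These category expressions go into the arXiv API's `search_query` parameter. A small sketch of building a request URL (the endpoint and parameter names are the public arXiv API; the helper itself is illustrative):

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_query_url(query, max_results=10):
    """Build an arXiv API URL for the newest papers matching `query`."""
    params = {
        "search_query": query,
        "start": 0,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "descending",
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_query_url("cat:cs.LG OR cat:cs.AI", max_results=20)
# The response is an Atom XML feed, parseable with feedparser or xml.etree.
```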
---
## 🔧 Advanced Usage
### Automated Daily Digests
**Windows Task Scheduler:**
1. Open Task Scheduler
2. Create Basic Task → Daily → 7:00 AM
3. Action: Start Program → `C:\path\to\run_digest.bat`
**Linux/macOS Cron:**
```bash
0 7 * * * cd /path/to/research-digest && ./venv/bin/python main.py && ./venv/bin/python generate_index.py
```
### Sync to Mobile (Syncthing)
1. Install [Syncthing](https://syncthing.net/) on PC and phone
2. Share project folder
3. Access HTML files directly on phone
### Reset Seen Papers
```bash
python reset_seen_papers.py
```
---
## 📂 Project Structure
```
research-digest/
├── config.json # Configuration (edit this!)
├── main.py # Core paper fetcher
├── generate_index.py # Archive browser generator
├── generate_tiktok_feed.py # Mobile feed generator
├── run_digest.bat # Windows launcher
├── requirements.txt # Python dependencies
├── latest.html # Latest digest (auto-generated)
├── index.html # Archive browser (auto-generated)
├── tiktok_feed.html # Mobile feed (auto-generated)
├── seen_papers.json # Deduplication tracker
└── arxiv_archive/ # Daily archives
├── arxiv_digest_20251101.html
└── ...
```
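`seen_papers.json` is essentially a persisted set of arXiv IDs. A sketch of the deduplication cycle (helper names are hypothetical, not the actual `main.py` API):

```python
import json
from pathlib import Path

SEEN_FILE = Path("seen_papers.json")

def load_seen():
    """Return the set of arXiv IDs already shown in past digests."""
    if SEEN_FILE.exists():
        return set(json.loads(SEEN_FILE.read_text(encoding="utf-8")))
    return set()

def filter_new(papers, seen):
    """Keep only papers whose arXiv ID has not been seen before."""
    return [p for p in papers if p["id"] not in seen]

def save_seen(seen, new_papers):
    """Persist the union of old and newly shown IDs."""
    seen |= {p["id"] for p in new_papers}
    SEEN_FILE.write_text(json.dumps(sorted(seen)), encoding="utf-8")
```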
---
## 🛠️ Requirements
- **Python 3.8+**
- **Dependencies:** `transformers`, `torch`, `requests`
- **Disk Space:** ~2GB for model, ~10MB per digest
- **Internet:** Required for arXiv API and first-time model download
---
## 📝 License
MIT License - see [LICENSE](LICENSE) file for details
---
## 🤝 Contributing
Contributions welcome! Ideas:
- Additional paper sources (bioRxiv, SSRN, etc.)
- Browser extension for direct syncing
- Custom ML models for better summaries
- Export to Notion/Obsidian/Roam
---
## 🙏 Acknowledgments
- [arXiv](https://arxiv.org/) for the open research repository
- [Hugging Face](https://huggingface.co/) for transformer models
- Inspired by modern feed UIs and research workflows
---
**Built with ❤️ for researchers who want to stay current without drowning in papers**

SETUP_GUIDE.md Normal file

@@ -0,0 +1,198 @@
# 📱 Syncthing + Daily arXiv Digest Setup Guide
## 🎯 What This Does
- Automatically runs your arXiv digest **every morning at 7 AM**
- Archives each day's report in `arxiv_archive/`
- Creates `latest.html` for quick access
- Generates `index.html` to browse all past reports
- Syncs everything to your phone via Syncthing
---
## ⚙️ Step 1: Set Up Windows Task Scheduler
### Option A: Quick Setup (Copy-Paste This)
1. Press `Win + R`, type `taskschd.msc`, press Enter
2. Click **"Create Basic Task"** in the right panel
3. Fill in:
- **Name:** `arXiv Daily Digest`
- **Description:** `Fetches daily research papers and syncs to phone`
4. **Trigger:** Select "Daily"
- Start date: Today
- Start time: **7:00 AM**
- Recur every: **1 days**
5. **Action:** Select "Start a program"
- **Program/script:** `C:\Users\Admin\python\1aResearch\run_digest.bat`
- **Start in:** `C:\Users\Admin\python\1aResearch`
6. Check **"Open the Properties dialog"** at the end
7. In Properties:
- Go to **Conditions** tab
- ✅ Check "Start only if the following network connection is available" → Select "Any connection"
- ❌ Uncheck "Start the task only if the computer is on AC power"
8. Click **OK**
### Option B: Advanced Settings
If you want to run it at startup instead:
- Change Trigger to **"At log on"**
- Add a 2-minute delay: In Properties → Triggers → Edit → Delay task for: **2 minutes**
---
## 📂 Step 2: Set Up Syncthing
### On Your PC:
1. Open Syncthing web UI (usually `http://localhost:8384`)
2. Click **"Add Folder"**
- **Folder Path:** `C:\Users\Admin\python\1aResearch`
- **Folder Label:** `arXiv Research`
- **Folder ID:** `arxiv-research` (auto-generated)
3. Go to **"Sharing"** tab
4. Click **"Add Device"** and enter your phone's Device ID
### On Your Phone:
1. Install **Syncthing** from Play Store / App Store
2. Open app → **Add Device** → Scan QR code from PC
3. Accept the folder share request (`arXiv Research`)
4. Set sync folder location (e.g., `/storage/emulated/0/arXiv/`)
### What Gets Synced:
```
1aResearch/
├── latest.html ← Most recent digest (quick access)
├── index.html ← Browse all reports
└── arxiv_archive/
├── arxiv_digest_20251101.html
├── arxiv_digest_20251102.html
└── ... (daily backups)
```
---
## 📱 Step 3: View on Your Phone
### Method 1: Direct File Access
1. Open your phone's file manager
2. Navigate to the Syncthing folder (e.g., `arXiv/`)
3. Open `latest.html` with any browser
4. Open `index.html` to browse past reports
### Method 2: Use a Local HTML Viewer App
Install **"HTML Viewer"** or **"WebView Tester"** from the app store:
- Point it to your Syncthing folder
- Bookmark `latest.html` for instant access
### Method 3: Create a Home Screen Shortcut (Android)
1. Open `latest.html` in Chrome
2. Menu → **"Add to Home screen"**
3. Name it "arXiv Digest"
4. Now you have one-tap access!
---
## 🧪 Testing Your Setup
### Test the Batch Script:
```batch
:: Double-click run_digest.bat, or run in Command Prompt:
cd C:\Users\Admin\python\1aResearch
run_digest.bat
```
Expected output:
```
Running arXiv digest...
🔍 Fetching papers for: Efficient ML / Edge AI
→ Found 5 papers
...
✨ HTML digest saved to arxiv_archive\arxiv_digest_20251101.html
📄 Latest digest saved to latest.html
Generating index page...
📑 Index page generated with 1 reports
Done! All files updated.
```
### Test Syncthing Sync:
1. Create/edit any file in `C:\Users\Admin\python\1aResearch`
2. Check your phone's Syncthing folder
3. File should appear within seconds
### Test Task Scheduler:
1. Open Task Scheduler
2. Find "arXiv Daily Digest"
3. Right-click → **"Run"**
4. Watch it execute
---
## 🎨 Customization Ideas
### Change Run Time:
Edit the Task Scheduler trigger to your preferred time (e.g., 6 AM, 9 PM)
### Change Number of Papers:
Edit `main.py` line 21:
```python
PAPERS_PER_INTEREST = 10 # Fetch 10 instead of 5
```
### Add More Interest Areas:
Edit `main.py` lines 13-19 and add more queries:
```python
INTERESTS = {
"Your Topic": 'abs:"your keywords" OR ti:"your topic"',
# ... existing topics
}
```
### Sync Only HTML Files (Save Space):
In Syncthing → Folder → **Ignore Patterns**, add:
```
!/arxiv_archive/*.html
!/latest.html
!/index.html
*
```
---
## 🔧 Troubleshooting
### Task Scheduler doesn't run:
- Check Windows Event Viewer: `Win + X` → Event Viewer → Task Scheduler logs
- Ensure "Run whether user is logged on or not" is selected
- Make sure network connection is available
### Syncthing not syncing:
- Check both devices are connected to the same network (or internet)
- Verify Device IDs match
- Check folder status in Syncthing UI (should say "Up to Date")
### Python script fails:
- Test manually: `cd C:\Users\Admin\python\1aResearch && venv\Scripts\activate && python main.py`
- Check arXiv API rate limits (3-second delays are built in)
- Ensure internet connection is active
### Old reports taking up space:
Create a cleanup script to delete reports older than 30 days:
```python
# cleanup_old.py
import os, glob, time
for f in glob.glob("arxiv_archive/*.html"):
if os.path.getmtime(f) < time.time() - 30*86400:
os.remove(f)
```
---
## 🎉 You're All Set!
Every morning at 7 AM:
1. ✅ Script fetches latest papers
2. ✅ Generates beautiful HTML report
3. ✅ Archives it with date
4. ✅ Updates index page
5. ✅ Syncs to your phone
6. ✅ Read cutting-edge research over coffee!
**Enjoy your automated research digest! 🚀**

arxiv_digest_20251101.html Normal file

@@ -0,0 +1,985 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>arXiv Digest • 2025-11-01</title>
<style>
* { box-sizing: border-box; }
:root {
--bg: #0f0f0f;
--text: #e8e8e8;
--muted: #999;
--border: #2a2a2a;
--card-bg: #1a1a1a;
--link: #6ba3ff;
--accent: #ff6b6b;
--green: #51cf66;
--yellow: #ffd43b;
--red: #ff6b6b;
--layman-bg: #1f2937;
--layman-border: #60a5fa;
}
body {
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
line-height: 1.5;
color: var(--text);
background: var(--bg);
margin: 0;
padding: 1rem;
}
.container {
max-width: 1600px;
margin: 0 auto;
}
header {
text-align: center;
padding: 2rem 1rem 3rem;
border-bottom: 2px solid var(--border);
margin-bottom: 2rem;
}
h1 {
font-weight: 900;
font-size: 2.5rem;
margin: 0;
background: linear-gradient(135deg, var(--accent), #ffa94d);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}
.meta {
color: var(--muted);
font-size: 0.95rem;
margin-top: 0.5rem;
letter-spacing: 0.5px;
}
.interest-section {
margin-bottom: 3rem;
}
.interest-header {
display: flex;
align-items: center;
gap: 0.8rem;
margin-bottom: 1.2rem;
padding: 0.8rem 1rem;
background: var(--card-bg);
border-radius: 12px;
border-left: 4px solid var(--accent);
}
.interest-title {
font-size: 1.3rem;
margin: 0;
font-weight: 700;
color: var(--text);
}
.papers-grid {
display: grid;
grid-template-columns: repeat(auto-fill, minmax(380px, 1fr));
gap: 1.2rem;
}
.paper {
background: var(--card-bg);
border: 1px solid var(--border);
border-radius: 12px;
padding: 1.2rem;
transition: all 0.2s ease;
position: relative;
display: flex;
flex-direction: column;
height: 100%;
}
.paper:hover {
border-color: var(--accent);
transform: translateY(-2px);
box-shadow: 0 8px 24px rgba(255, 107, 107, 0.15);
}
.paper-header {
display: flex;
justify-content: space-between;
align-items: flex-start;
gap: 0.8rem;
margin-bottom: 0.8rem;
}
.difficulty-badge {
padding: 0.3rem 0.7rem;
border-radius: 20px;
font-size: 0.7rem;
font-weight: 700;
white-space: nowrap;
flex-shrink: 0;
}
.paper h3 {
font-size: 1.05rem;
margin: 0 0 0.8rem 0;
font-weight: 700;
line-height: 1.4;
color: var(--text);
}
.layman-box {
background: var(--layman-bg);
border-left: 3px solid var(--layman-border);
padding: 0.7rem 0.9rem;
margin-bottom: 0.8rem;
border-radius: 6px;
font-size: 0.88rem;
line-height: 1.5;
color: #94a3b8;
font-style: italic;
}
.summary {
color: var(--muted);
margin-bottom: 1rem;
font-size: 0.88rem;
line-height: 1.6;
flex-grow: 1;
}
.paper-footer {
display: flex;
justify-content: space-between;
align-items: center;
padding-top: 0.8rem;
border-top: 1px solid var(--border);
margin-top: auto;
}
.category-tag {
background: #1e3a5f;
color: #60a5fa;
padding: 0.25rem 0.65rem;
border-radius: 15px;
font-size: 0.75rem;
font-weight: 600;
}
.date {
color: var(--muted);
font-size: 0.75rem;
}
.links {
display: flex;
gap: 1rem;
margin-top: 0.8rem;
}
.links a {
color: var(--link);
text-decoration: none;
font-size: 0.85rem;
font-weight: 600;
transition: color 0.2s;
}
.links a:hover {
color: var(--accent);
}
.footer {
text-align: center;
margin-top: 4rem;
padding: 2rem;
color: var(--muted);
font-size: 0.85rem;
border-top: 1px solid var(--border);
}
@media (max-width: 768px) {
.papers-grid {
grid-template-columns: 1fr;
}
h1 {
font-size: 2rem;
}
}
</style>
</head>
<body>
<div class="container">
<header>
<h1>arXiv Research Digest</h1>
<div class="meta">November 01, 2025 • 45 papers across 5 interests</div>
</header>
<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">Efficient ML / Edge AI</h2>
</div>
<div class="papers-grid">
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Inference-Cost-Aware Dynamic Tree Construction for Efficient Inference in Large Language Models</h3>
<div class="layman-box">💡 This research tackles the problem of language AI.</div>
<div class="summary">Large Language Models (LLMs) face significant inference latency challenges stemming from their autoregressive design and large size. To address this, speculative decoding emerges as a solution, enabling the simultaneous generation and validation of multiple tokens.</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26577v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26577.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Distilling Multilingual Vision-Language Models: When Smaller Models Stay Multilingual</h3>
<div class="layman-box">💡 This research shrinks multilingual language AI into smaller models.</div>
<div class="summary">Knowledge distillation (KD) demonstrates promising results in transferring knowledge from larger to smaller VLMs. Applying KD in multilingual settings is an underexplored area. We study five distillation formulations across CLIP and SigLIP2.</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26271v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26271.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>STAR: A Privacy-Preserving, Energy-Efficient Edge AI Framework for Human Activity Recognition via Wi-Fi CSI in Mobile and Pervasive Computing Environments</h3>
<div class="layman-box">💡 This research presents techniques for privacy-preserving AI.</div>
<div class="summary">Human Activity Recognition (HAR) via Wi-Fi Channel State Information (CSI) presents a privacy-preserving, contactless sensing approach suitable for smart homes, healthcare monitoring, and mobile IoT systems.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26148v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26148.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods</h3>
<div class="layman-box">💡 This research explores running AI locally on devices for computer vision.</div>
<div class="summary">Knowledge distillation (KD) is an effective method for model compression and transferring knowledge between models. However, its effect on a model's robustness against spurious correlations that degrade performance on out-of-distribution data remains underexplored. This study investigates the effect of knowledge distillation on the transferability of "debiasing" capabilities from teacher models to student models.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26038v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26038.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>An Agentic Framework for Rapid Deployment of Edge AI Solutions in Industry 5.0</h3>
<div class="layman-box">💡 This research simplifies deploying AI at the edge.</div>
<div class="summary">We present a novel framework for Industry 5.0 that simplifies the deployment of AI models on edge devices in various industrial settings. The design reduces latency and avoids external data transfer by enabling local inference and real-time processing.</div>
<div class="paper-footer">
<span class="category-tag">cs.AI</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25813v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25813.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Energy-Efficient Autonomous Driving with Adaptive Perception and Robust Decision</h3>
<div class="layman-box">💡 This research explores techniques in machine learning.</div>
<div class="summary">Autonomous driving is an emerging technology that is expected to bring significant social, economic, and environmental benefits. However, these benefits come with rising energy consumption by computation engines limiting the driving range of vehicles, especially electric ones. Perception computing is typically the most power-intensive component, as it relies on deep learning models to extract environmental features. To address these challenges, we propose an energy-efficient autonomous driving framework, called EneAD.</div>
<div class="paper-footer">
<span class="category-tag">cs.AI</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25205v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25205.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms</h3>
<div class="layman-box">💡 This research makes edge computing more efficient.</div>
<div class="summary">While machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency. In practice, neural networks must not only operate efficiently but also provide reliable predictions under distributional shifts or unseen data. This work advances resource-efficient and robust inference for both conventional and Bayesian neural networks.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-28</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.24951v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.24951.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations</h3>
<div class="layman-box">💡 This research compresses computer vision models.</div>
<div class="summary">Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing cost while maintaining accuracy. In visual applications, where large-scale image models are widely used, KD enables efficient deployment.</div>
<div class="paper-footer">
<span class="category-tag">cs.CV</span>
<span class="date">2025-10-28</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.24116v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.24116.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>A Survey on Efficient Vision-Language-Action Models</h3>
<div class="layman-box">💡 This research presents techniques for computer vision.</div>
<div class="summary">Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. While these models have demonstrated remarkable generalist capabilities, deployment is severely hampered by the substantial computational and data requirements.</div>
<div class="paper-footer">
<span class="category-tag">cs.CV</span>
<span class="date">2025-10-27</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.24795v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.24795.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Rethinking Inference Placement for Deep Learning across Edge and Cloud Platforms: A Multi-Objective Optimization Perspective and Future Directions</h3>
<div class="layman-box">💡 This research explores running AI locally on devices for language AI.</div>
<div class="summary">Edge intelligent applications like VR/AR and language-model-based chatbots have become widespread with the rapid expansion of IoT and mobile devices. But constrained edge devices often cannot serve the increasingly large and complex deep learning (DL) models. Research aims to balance accuracy, computation delay, transmission delay, and privacy concerns.</div>
<div class="paper-footer">
<span class="category-tag">cs.DC</span>
<span class="date">2025-10-27</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.22909v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.22909.pdf" target="_blank">PDF ↗</a>
</div>
</article>
</div>
</section>
<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">Privacy-Preserving ML</h2>
</div>
<div class="papers-grid">
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Non-Convex Over-the-Air Heterogeneous Federated Learning: A Bias-Variance Trade-off</h3>
<div class="layman-box">💡 This research studies distributed machine learning over wireless channels.</div>
<div class="summary">Over-the-air (OTA) federated learning (FL) has been well recognized as a scalable paradigm that exploits the waveform superposition of the wireless multiple-access channel. We develop novel OTA-FL SGD updates that allow a structured, time-invariant model bias while facilitating reduced-variance updates.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26722v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26722.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟡 Advanced</span>
</div>
<h3>On Purely Private Covariance Estimation</h3>
<div class="layman-box">💡 This research presents techniques for privacy-preserving AI.</div>
<div class="summary">We present a simple perturbation mechanism for the release of $d$-dimensional covariance matrices under pure differential privacy. For large datasets with at least $n \geq d^2/\varepsilon$ elements, our mechanism recovers the provably optimal Frobenius norm error guarantees of \cite{nikolov2023private}.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26717v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26717.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Tight Differentially Private PCA via Matrix Coherence</h3>
<div class="layman-box">💡 This research makes privacy-preserving AI more efficient.</div>
<div class="summary">We revisit the task of computing the span of the top $r$ singular vectors $u_1, \ldots, u_r$ of a matrix under differential privacy. We show that a simple and efficient algorithm -- based on singular value decomposition and standard perturbation mechanisms -- returns a private rank-$r$ approximation whose error depends only on the coherence of the input.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26679v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26679.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>UnifiedFL: A Dynamic Unified Learning Framework for Equitable Federation</h3>
<div class="layman-box">💡 This research protects data privacy in collaborative learning.</div>
<div class="summary">Federated learning (FL) has emerged as a key paradigm for collaborative model training across multiple clients without sharing raw data. We propose UnifiedFL, a dynamic federated learning framework that represents heterogeneous local networks as nodes and edges in a directed model graph optimized by a shared graph neural network.</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26350v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26350.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>PEEL: A Poisoning-Exposing Encoding Theoretical Framework for Local Differential Privacy</h3>
<div class="layman-box">💡 This research defends data privacy mechanisms against poisoning attacks.</div>
<div class="summary">Local Differential Privacy (LDP) is a widely adopted privacy-protection model in the Internet of Things. However, existing defenses either incur prohibitive resource overheads or rely on domain-specific prior knowledge. We propose PEEL, a Poisoning-Exposing Encoding theoretical framework for LDP, which departs from resource- or prior-dependent countermeasures. PEEL amplifies stealthy poisoning effects by re-encoding LDP-perturbed data via sparsification, normalization, and low-rank projection.</div>
<div class="paper-footer">
<span class="category-tag">cs.CR</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26102v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26102.pdf" target="_blank">PDF ↗</a>
</div>
</article>
</div>
</section>
<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">Creative AI / Emotion</h2>
</div>
<div class="papers-grid">
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Contribution-Guided Asymmetric Learning for Robust Multimodal Fusion under Imbalance and Noise</h3>
<div class="layman-box">💡 This research improves emotion AI.</div>
<div class="summary">Contribution-Guided Asymmetric Learning (CAL) aims to enhance the contribution of high-contribution modalities while compressing weak modalities to increase their contribution. CAL has shown outstanding performance in imbalanced fusion tasks and noise robustness tests. CAL is based on a modality contribution metric W^m combining the information quantity I(m) and confidence D(m).</div>
<div class="paper-footer">
<span class="category-tag">cs.MM</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26289v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26289.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models</h3>
<div class="layman-box">💡 This research presents techniques for speech processing.</div>
<div class="summary">Recent advances in speech foundation models (SFMs) have enabled the direct processing of spoken language from raw audio. This capability allows SFMs to be exposed to rich paralinguistic variations embedded in the input speech signal. One under-explored dimension of this variation is voice quality, encompassing phonation types such as creaky and breathy voice.</div>
<div class="paper-footer">
<span class="category-tag">eess.AS</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25577v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25577.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech</h3>
<div class="layman-box">💡 This research achieves better language AI.</div>
<div class="summary"> Advances in spoken language processing have driven the development of spoken language models . We evaluate four SLMs on the task of speech emotion recognition using a dataset of emotionally incongruent speech samples . Results indicate that SLMs rely predominantly on textual semantics rather than speech emotion to perform the task .</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25054v2" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25054.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition</h3>
<div class="layman-box">💡 This research understanding emotions in emotion AI.</div>
<div class="summary"> Multimodal emotion recognition is crucial for future human-computer interaction . However accurate emotion recognition still faces significant challenges due to differences between different modalities and the difficulty of characterizing unimodal emotional information . A hybrid network model based on multipath cross-modal interaction (MCIHN) is proposed .</div>
<div class="paper-footer">
<span class="category-tag">cs.CV</span>
<span class="date">2025-10-28</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.24827v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.24827.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Emotion-Coherent Reasoning for Multimodal LLMs via Emotional Rationale Verifier</h3>
<div class="layman-box">💡 This research understanding emotions in language AI.</div>
<div class="summary"> Emotional Rationale Verifier (ERV) and an Explanation Reward are novel approaches to predicting emotions . Authors propose a novel approach: the ERV and an explanation reward . Their method significantly improves explanation-prediction consistency and explanation emotion accuracy .</div>
<div class="paper-footer">
<span class="category-tag">cs.AI</span>
<span class="date">2025-10-27</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.23506v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.23506.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Emotion Recognition with Minimal Wearable Sensing: Multi-domain Feature, Hybrid Feature Selection, and Personalized vs. Generalized Ensemble Model Analysis</h3>
<div class="layman-box">💡 This research proposes a method for edge computing.</div>
<div class="summary"> Negative emotions are linked to the onset of neurodegenerative diseases and dementia . Physiological signals from wearable devices offer a promising noninvasive method for continuous emotion monitoring . The method is designed for deployment in resource-constrained systems, such as Internet of Things .</div>
<div class="paper-footer">
<span class="category-tag">cs.HC</span>
<span class="date">2025-10-26</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.22498v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.22498.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis</h3>
<div class="layman-box">💡 This research explores techniques in emotion AI.</div>
<div class="summary"> LUNA (Latent Unified Network Architecture) is a self-supervised foundation model that reconciles disparate electrode geometries while scaling linearly -- not quadratically -- with channel count . LUNA compresses multi-channel EEG into a fixed-size, topology-agnostic latent space via learned queries and cross-attention . It demonstrates highly competitive performance across several benchmarks .</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-25</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.22257v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.22257.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Multi-dataset Joint Pre-training of Emotional EEG Enables Generalizable Affective Computing</h3>
<div class="layman-box">💡 This research presents techniques for emotion AI.</div>
<div class="summary"> The method outperforms state-of-the-art large-scale EEG models by an average of 4.57% in AUROC for few-shot emotion recognition and 11.92% in accuracy for zero-shot generalization to a new dataset .</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-25</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.22197v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.22197.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>SentiMaithili: A Benchmark Dataset for Sentiment and Reason Generation for the Low-Resource Maithili Language</h3>
<div class="layman-box">💡 This research presents techniques for language AI.</div>
<div class="summary"> Maithili is an Indo-Aryan language spoken by more than 13 million people in the Purvanchal region of India . It is valued for its rich linguistic structure and cultural significance .</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-25</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.22160v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.22160.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>REVE: A Foundation Model for EEG -- Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects</h3>
<div class="layman-box">💡 This research reduces computer vision.</div>
<div class="summary"> Foundation models have transformed AI by reducing reliance on task-specific data through large-scale pretraining . While successful in language and vision, their adoption in EEG has lagged due to the heterogeneity of public datasets . Existing EEG foundation models struggle to generalize across these variations, often restricting pretraining to a single setup . We present REVE (Representation for EEG with Versatile Embeddings) a pretrained model .</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-24</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.21585v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.21585.pdf" target="_blank">PDF ↗</a>
</div>
</article>
</div>
</section>
<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">Lightweight Systems</h2>
</div>
<div class="papers-grid">
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Vectorized Context-Aware Embeddings for GAT-Based Collaborative Filtering</h3>
<div class="layman-box">💡 This research enhances language AI.</div>
<div class="summary"> Recommender systems often struggle with data sparsity and cold-start scenarios . This paper presents a Graph Attention Network (GAT) based Collaborative Filtering (CF) framework enhanced with context aware embeddings .</div>
<div class="paper-footer">
<span class="category-tag">cs.IR</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26461v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26461.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟡 Advanced</span>
</div>
<h3>On neighborhoods of embedded toroidal and Hopf manifolds and their foliations</h3>
<div class="layman-box">💡 This research running AI on low-power devices for edge computing.</div>
<div class="summary"> In this article, we give completely new examples of embedded complex manifolds the germ of neighborhood of which is holomorphically equivalent to the zero section in its normal bundle . The first set of examples is composed of connected abelian complex Lie groups, embedded in some complex manifold $M$. The second set is $n$-dimensional Hopf manifolds, embedded as hypersurfaces .</div>
<div class="paper-footer">
<span class="category-tag">math.CV</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26454v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26454.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Scales++: Compute Efficient Evaluation Subset Selection with Cognitive Scales Embeddings</h3>
<div class="layman-box">💡 This research makes more efficient language AI.</div>
<div class="summary"> The prohibitive cost of evaluating large language models (LLMs) on comprehensive benchmarks necessitates the creation of small yet representative data subsets that enable efficient assessment while retaining predictive fidelity . Current methods for this task operate under a model-centric paradigm, selecting benchmarking items based on the collective performance of existing models . Such approaches are limited by large upfront costs, an inability to immediately handle new benchmarks (`cold-start'), and the fragile assumption that future models will share the failure patterns of their predecessors .</div>
<div class="paper-footer">
<span class="category-tag">cs.AI</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26384v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26384.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟡 Advanced</span>
</div>
<h3>From Embedding to Control: Representations for Stochastic Multi-Object Systems</h3>
<div class="layman-box">💡 This research achieves better machine learning.</div>
<div class="summary"> This paper studies how to achieve accurate modeling and effective control in stochastic nonlinear dynamics with multiple interacting objects . Non-uniform interactions and random topologies make this task challenging .</div>
<div class="paper-footer">
<span class="category-tag">eess.SY</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26344v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26344.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🔴 Theory-Heavy</span>
</div>
<h3>Sharp embeddings and existence results for Logarithmic $p$-Laplacian equations with critical growth</h3>
<div class="layman-box">💡 This research explores techniques in machine learning.</div>
<div class="summary"> In this paper, we derive a new $p$-Logarithmic Sobolev inequality and optimal continuous and compact embeddings into Orlicz-type spaces of the function space associated with the logarathmic $p$.-Laplacian . By employing the method of the Nehari manifold, we prove the existence of a nontrivial weak solution . We conduct an asymptotic analysis of a weighted nonlocal, nonlinear problem governed by the fractional</div>
<div class="paper-footer">
<span class="category-tag">math.AP</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26286v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26286.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Accretion rates of stellar-mass compact objects embedded in AGN discs</h3>
<div class="layman-box">💡 This research running AI on low-power devices for edge computing.</div>
<div class="summary"> Stellar-mass compact objects (COs) embedded in active galactic nucleus (AGN) discs are commonly assumed to accrete via Bondi or Bondi-Hoyle-Lyttleton prescriptions . We show that differential rotation in AGN discs can impart non-negligible angular momentum, in which case accretion proceeds through a viscous disc rather than Bondi/BHL flow .</div>
<div class="paper-footer">
<span class="category-tag">astro-ph.HE</span>
<span class="date">2025-10-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.26111v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.26111.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟡 Advanced</span>
</div>
<h3>An explicit formula of the limit of the heat kernel measures on the spheres embedded in $\R^\infty$</h3>
<div class="layman-box">💡 This research explores techniques in machine learning.</div>
<div class="summary"> We show that the heat kernel measures based at the north pole of the spheres converges to a Gaussian measure in $R^\infty$ We also find an explicit formula for this measure .</div>
<div class="paper-footer">
<span class="category-tag">math.PR</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25855v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25855.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟡 Advanced</span>
</div>
<h3>Tight Spherical Embeddings (Updated Version)</h3>
<div class="layman-box">💡 This research explores techniques in machine learning.</div>
<div class="summary"> This is an updated version of a paper which appeared in the proceedings of the 1979 Berlin Colloquium on Global Differential Geometry . The main result of this paper is that every compact isoparametric hypersurface $M^n \subset S^{n+1} is tight .</div>
<div class="paper-footer">
<span class="category-tag">math.DG</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25611v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25611.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Enhanced quality factors at resonance in acoustofluidic cavities embedded in matched elastic metamaterials</h3>
<div class="layman-box">💡 This research enhances machine learning.</div>
<div class="summary"> We show that by embedding liquid-filled acoustofluidic cavities in a metamaterial, the quality factor of the cavity at selected acoustic resonance modes can be enhanced by 2 to 3 orders of magnitude .</div>
<div class="paper-footer">
<span class="category-tag">physics.flu-dyn</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25527v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25527.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Hierarchical Physics-Embedded Learning for Spatiotemporal Dynamical Systems</h3>
<div class="layman-box">💡 This research explores techniques in edge computing.</div>
<div class="summary"> Modeling complex spatiotemporal dynamics, particularly in far-from-equilibrium systems, remains a challenge in science . The governing partial differential equations (PDEs) for these systems are often intractable to derive from first principles .</div>
<div class="paper-footer">
<span class="category-tag">cs.LG</span>
<span class="date">2025-10-29</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.25306v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.25306.pdf" target="_blank">PDF ↗</a>
</div>
</article>
</div>
</section>
<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">Offline-First / Local AI</h2>
</div>
<div class="papers-grid">
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>SBASH: a Framework for Designing and Evaluating RAG vs. Prompt-Tuned LLM Honeypots</h3>
<div class="layman-box">💡 This research explores techniques in language AI.</div>
<div class="summary"> Honeypots are decoy systems used for gathering valuable threat intelligence . Maximising attacker engagement is essential to their utility . Research has highlighted that context-awareness is necessary to increase engagement . Large Language Models (LLMs) have been shown as one approach to increase context awareness .</div>
<div class="paper-footer">
<span class="category-tag">cs.CR</span>
<span class="date">2025-10-24</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.21459v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.21459.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse Engineering</h3>
<div class="layman-box">💡 This research improves language AI.</div>
<div class="summary"> Large Language Models offer potential for improving RE efficiency through automated comprehension and commenting . Cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities . REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3\% over its base model .</div>
<div class="paper-footer">
<span class="category-tag">cs.CR</span>
<span class="date">2025-10-23</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.20975v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.20975.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs</h3>
<div class="layman-box">💡 This research achieves better language AI.</div>
<div class="summary"> Mobile agents rely on Large Language Models (LLMs) to plan and execute tasks on smartphone user interfaces . While cloud-based LLMs achieve high task accuracy, they require uploading the full UI state at every step . In contrast, local LLMs avoid UI uploads but suffer from limited capacity, resulting in lower task success rates . CORE comprises three key components: (1) layout-aware block partitioning, (2) Co-planning) and Co-decision-making .</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-17</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.15455v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.15455.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>LLM-guided Hierarchical Retrieval</h3>
<div class="layman-box">💡 This research explores techniques in language AI.</div>
<div class="summary"> Modern IR systems are increasingly tasked with answering complex, multi-faceted queries that require deep reasoning . We introduce LATTICE, a hierarchical retrieval framework that enables an LLM to reason over and navigate large corpora with logarithmic search complexity .</div>
<div class="paper-footer">
<span class="category-tag">cs.IR</span>
<span class="date">2025-10-15</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.13217v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.13217.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>COSTAR-A: A prompting framework for enhancing Large Language Model performance on Point-of-View questions</h3>
<div class="layman-box">💡 This research enhances language AI.</div>
<div class="summary"> COSTAR-A is a novel prompt engineering framework that enhances the existing COSTAR method . COSTAR stands for Context, Objective, Style, Tone, Audience, and Response, by adding the 'Answer' component at the end .</div>
<div class="paper-footer">
<span class="category-tag">cs.CL</span>
<span class="date">2025-10-14</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.12637v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.12637.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models</h3>
<div class="layman-box">💡 This research explores techniques in language AI.</div>
<div class="summary"> Software vulnerabilities remain a persistent risk, yet static and dynamic analyses often overlook structural dependencies that shape insecure behaviors . Viewing programs as heterogeneous graphs, we capture control- and data-flow relations as complex interaction networks . Our hybrid framework combines these graph representations with light-weight (<4B) local LLMs .</div>
<div class="paper-footer">
<span class="category-tag">cs.SE</span>
<span class="date">2025-10-11</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.10321v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.10321.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>Open WebUI: An Open, Extensible, and Usable Interface for AI Interaction</h3>
<div class="layman-box">💡 This research presents techniques for language AI.</div>
<div class="summary"> The toolkit is designed to be open (open-source and local), extensible ( plugin support and users can interact with multiple models) The extensibility is enabled through a two-pronged plugin architecture and a community platform for sharing, importing, and adapting extensions .</div>
<div class="paper-footer">
<span class="category-tag">cs.HC</span>
<span class="date">2025-10-02</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.02546v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.02546.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>DualTune: Decoupled Fine-Tuning for On-Device Agentic Systems</h3>
<div class="layman-box">💡 This research protecting data privacy in language AI.</div>
<div class="summary"> Large Language Models (LLMs) consistently underperform compared to frontier models in tool calling scenarios . We propose "decoupled fine-tuning" to create dedicated LoRA adapters for tool selection and tool-specific argument generation using separate loss masking for each of the subtasks . DualTune is an inference framework that leverages the LRA adapters created using decoupled fines-tune to perform efficient agent orchestration with the help of local models .</div>
<div class="paper-footer">
<span class="category-tag">cs.AI</span>
<span class="date">2025-09-30</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2510.00229v2" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2510.00229.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>SecureFixAgent: A Hybrid LLM Agent for Automated Python Static Vulnerability Repair</h3>
<div class="layman-box">💡 This research automatically finding language AI.</div>
<div class="summary"> Static analysis tools like Bandit are effective at vulnerability detection but suffer from high false positives and lack repair capabilities . Large Language Models (LLMs) can suggest fixes but often hallucinate changes and lack self-validation . We present SecureFixAgent, a hybrid repair framework integrating Bandit with lightweight local LLMs in an iterative detect-repair-validate loop .</div>
<div class="paper-footer">
<span class="category-tag">cs.CR</span>
<span class="date">2025-09-18</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2509.16275v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2509.16275.pdf" target="_blank">PDF ↗</a>
</div>
</article>
<article class="paper">
<div class="paper-header">
<span class="difficulty-badge">🟢 Applied</span>
</div>
<h3>PrivWeb: Unobtrusive and Content-aware Privacy Protection For Web Agents</h3>
<div class="layman-box">💡 This research protecting data privacy in language AI.</div>
<div class="summary"> PrivWeb is a trusted add-on on web agents that anonymizes private information on interfaces according to user preferences . It features privacy categorization and adaptive notifications that selectively pauses tasks for user control over information collection for highly sensitive information . PrivWeb reduces perceived privacy risks with no associated increase in cognitive effort, and resulted in higher overall satisfaction .</div>
<div class="paper-footer">
<span class="category-tag">cs.HC</span>
<span class="date">2025-09-15</span>
</div>
<div class="links">
<a href="http://arxiv.org/abs/2509.11939v1" target="_blank">Abstract ↗</a>
<a href="https://arxiv.org/pdf/2509.11939.pdf" target="_blank">PDF ↗</a>
</div>
</article>
</div>
</section>
<div class="footer">
✨ Generated automatically • Powered by arXiv API
</div>
</div>
</body>
</html>

84
config.json Normal file
View File

@@ -0,0 +1,84 @@
{
"interests": {
"Efficient ML / Edge AI": {
"query": "cat:cs.LG OR cat:cs.CV OR cat:cs.CL",
"keywords": [
"efficient",
"edge",
"compression",
"quantization",
"pruning",
"distillation",
"inference",
"lightweight",
"mobile",
"accelerat"
]
},
"Privacy-Preserving ML": {
"query": "cat:cs.CR OR cat:cs.LG",
"keywords": [
"privacy",
"federated",
"differential",
"secure",
"encrypted",
"confidential",
"private",
"anonymi"
]
},
"Creative AI / Emotion": {
"query": "cat:cs.AI OR cat:cs.SD OR cat:cs.HC",
"keywords": [
"emotion",
"generative",
"creative",
"music",
"affective",
"sentiment",
"art",
"design",
"audio",
"synthesis"
]
},
"Lightweight Systems": {
"query": "cat:cs.DC OR cat:cs.AR",
"keywords": [
"embedded",
"iot",
"edge",
"resource",
"constrained",
"microcontroller",
"low-power",
"sensor",
"device"
]
},
"Offline-First / Local AI": {
"query": "cat:cs.LG",
"keywords": [
"local",
"device",
"mobile",
"offline",
"on-device",
"edge",
"browser",
"client-side",
"standalone"
]
}
},
"settings": {
"papers_per_interest": 10,
"summary_max_length": 160,
"recent_days": 7,
"fallback_days": 90,
"min_papers_threshold": 5,
"fetch_multiplier": 5,
"user_agent": "ResearchDigestBot/1.0 (github.com/wedsmoker)"
}
}
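
The config above pairs each interest with an arXiv category `query` and a list of keyword stems (e.g. "accelerat", "anonymi"). As a minimal sketch of how these fields could drive a fetch, assuming the public arXiv export API and hypothetical helper names (the repo's actual fetch script may construct its request differently):

```python
import urllib.parse

def build_arxiv_url(interest, max_results=50):
    # Assumed mapping onto the arXiv export API; sort newest-first so
    # the digest can apply its recent_days window afterwards.
    params = {
        "search_query": interest["query"],
        "sortBy": "submittedDate",
        "sortOrder": "descending",
        "max_results": max_results,
    }
    return "http://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)

def matches_keywords(text, keywords):
    # Plain substring match, so stems like "accelerat" catch both
    # "accelerate" and "acceleration".
    lowered = text.lower()
    return any(stem in lowered for stem in keywords)

interest = {"query": "cat:cs.LG", "keywords": ["offline", "on-device", "accelerat"]}
print(build_arxiv_url(interest))
print(matches_keywords("Accelerating On-Device Inference", interest["keywords"]))
```

Keyword filtering happens client-side because the arXiv query string only selects categories; the stems then narrow the fetched batch (hence the `fetch_multiplier` setting).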

BIN
desktop_demo.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 126 KiB

205
generate_index.py Normal file
View File

@@ -0,0 +1,205 @@
"""Generate an index.html page to browse all archived digests."""
import os
from datetime import datetime
import glob
def generate_index():
archive_dir = "arxiv_archive"
# Get all digest files
if os.path.exists(archive_dir):
digest_files = sorted(glob.glob(os.path.join(archive_dir, "arxiv_digest_*.html")), reverse=True)
else:
digest_files = []
# Parse dates and create entries
entries = []
for filepath in digest_files:
filename = os.path.basename(filepath)
# Extract date from filename: arxiv_digest_20251101.html
date_str = filename.replace("arxiv_digest_", "").replace(".html", "")
try:
date_obj = datetime.strptime(date_str, "%Y%m%d")
formatted_date = date_obj.strftime("%B %d, %Y")
day_of_week = date_obj.strftime("%A")
entries.append({
'filename': filename,
'date': formatted_date,
'day': day_of_week,
'date_obj': date_obj
})
except ValueError:
continue
html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>arXiv Digest Archive</title>
<style>
* {{ box-sizing: border-box; }}
body {{
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: #0f0f0f;
color: #e8e8e8;
margin: 0;
padding: 2rem;
}}
.container {{
max-width: 900px;
margin: 0 auto;
}}
header {{
text-align: center;
margin-bottom: 3rem;
padding-bottom: 2rem;
border-bottom: 2px solid #2a2a2a;
}}
h1 {{
font-weight: 900;
font-size: 2.5rem;
margin: 0 0 0.5rem 0;
background: linear-gradient(135deg, #ff6b6b, #ffa94d);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}}
.subtitle {{
color: #999;
font-size: 1rem;
}}
.latest-link {{
display: inline-block;
background: #ff6b6b;
color: white;
padding: 1rem 2rem;
border-radius: 8px;
text-decoration: none;
font-weight: 700;
margin-bottom: 3rem;
transition: transform 0.2s, box-shadow 0.2s;
}}
.latest-link:hover {{
transform: translateY(-2px);
box-shadow: 0 8px 24px rgba(255, 107, 107, 0.3);
}}
.archive-list {{
list-style: none;
padding: 0;
}}
.archive-item {{
background: #1a1a1a;
border: 1px solid #2a2a2a;
border-radius: 10px;
padding: 1.5rem;
margin-bottom: 1rem;
transition: all 0.2s;
}}
.archive-item:hover {{
border-color: #ff6b6b;
transform: translateX(5px);
}}
.archive-item a {{
text-decoration: none;
color: #e8e8e8;
display: flex;
justify-content: space-between;
align-items: center;
}}
.date-info {{
display: flex;
flex-direction: column;
gap: 0.3rem;
}}
.date-main {{
font-size: 1.2rem;
font-weight: 700;
color: #6ba3ff;
}}
.date-day {{
font-size: 0.9rem;
color: #999;
}}
.arrow {{
font-size: 1.5rem;
color: #ff6b6b;
}}
.no-reports {{
text-align: center;
color: #999;
padding: 3rem;
}}
.stats {{
text-align: center;
margin-top: 3rem;
padding-top: 2rem;
border-top: 1px solid #2a2a2a;
color: #999;
font-size: 0.9rem;
}}
</style>
</head>
<body>
<div class="container">
<header>
<h1>📚 arXiv Digest Archive</h1>
<p class="subtitle">Browse your daily research digests</p>
</header>
<div style="text-align: center;">
<a href="latest.html" class="latest-link">📰 View Latest Digest</a>
</div>
<h2 style="margin-bottom: 1.5rem; color: #e8e8e8;">Past Reports</h2>
"""
if entries:
html += ' <ul class="archive-list">\n'
for entry in entries:
html += f""" <li class="archive-item">
<a href="arxiv_archive/{entry['filename']}">
<div class="date-info">
<div class="date-main">{entry['date']}</div>
<div class="date-day">{entry['day']}</div>
</div>
<div class="arrow">→</div>
</a>
</li>
"""
html += ' </ul>\n'
else:
html += ' <div class="no-reports">No archived reports yet. Run the digest script to generate your first report!</div>\n'
html += f"""
<div class="stats">
{len(entries)} report{"s" if len(entries) != 1 else ""} archived • Updated {datetime.now().strftime("%B %d, %Y at %I:%M %p")}
</div>
</div>
</body>
</html>
"""
with open("index.html", 'w', encoding='utf-8') as f:
f.write(html)
print(f"📑 Index page generated with {len(entries)} reports")
if __name__ == "__main__":
generate_index()

generate_tiktok_feed.py Normal file
@@ -0,0 +1,512 @@
import json
import random
from datetime import datetime
def interleave_papers_by_interest(all_papers_by_interest):
"""
Interleave papers round-robin style across interests.
Returns a flat list cycling through: Interest1[0], Interest2[0], ..., Interest1[1], Interest2[1], ...
"""
# Shuffle papers within each interest category
for interest_name in all_papers_by_interest:
random.shuffle(all_papers_by_interest[interest_name])
# Interleave round-robin
interleaved = []
interest_names = list(all_papers_by_interest.keys())
max_papers = max(len(papers) for papers in all_papers_by_interest.values()) if all_papers_by_interest else 0
for i in range(max_papers):
for interest_name in interest_names:
papers = all_papers_by_interest[interest_name]
if i < len(papers):
# Add interest category to paper data
papers[i]['interest_category'] = interest_name
interleaved.append(papers[i])
return interleaved
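The round-robin behavior the docstring describes can be sketched in isolation. A minimal, self-contained demo (the interest names and one-key paper dicts below are fabricated for illustration; the real function mutates the papers it is given in the same way):

```python
import random

def interleave(by_interest):
    # Shuffle within each interest, then cycle interests taking one paper per pass.
    for name in by_interest:
        random.shuffle(by_interest[name])
    out, names = [], list(by_interest.keys())
    longest = max((len(p) for p in by_interest.values()), default=0)
    for i in range(longest):
        for name in names:
            if i < len(by_interest[name]):
                paper = by_interest[name][i]
                paper['interest_category'] = name  # tag paper with its interest
                out.append(paper)
    return out

random.seed(0)  # only to make the demo repeatable
feed = interleave({
    "Edge AI": [{"title": "A"}, {"title": "B"}],
    "NLP": [{"title": "C"}],
})
# Interests alternate: an Edge AI paper, the NLP paper, then the remaining Edge AI paper.
print([p["interest_category"] for p in feed])
```

Whatever the shuffle order within each interest, the category sequence is fixed by the round-robin pass, which is what keeps the feed from clumping one topic together.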
def generate_tiktok_html(interleaved_papers):
"""Generate self-contained TikTok-style feed HTML with embedded data."""
papers_json = json.dumps(interleaved_papers, indent=2, ensure_ascii=False)
date_str = datetime.now().strftime('%B %d, %Y')
html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=no"/>
<title>Research Feed • {date_str}</title>
<style>
* {{
box-sizing: border-box;
margin: 0;
padding: 0;
}}
:root {{
--bg: #000000;
--text: #ffffff;
--muted: #a0a0a0;
--card-bg: #1a1a1a;
--border: #2a2a2a;
--accent: #ff6b6b;
--heart-red: #ff4458;
--layman-bg: #1f2937;
--layman-border: #60a5fa;
}}
body {{
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
background: var(--bg);
color: var(--text);
overflow-x: hidden;
-webkit-font-smoothing: antialiased;
}}
#feed-container {{
height: 100vh;
overflow-y: scroll;
scroll-snap-type: y mandatory;
-webkit-overflow-scrolling: touch;
padding-top: 60px; /* Space for fixed header */
}}
.paper-card {{
min-height: 100vh;
scroll-snap-align: start;
scroll-snap-stop: always;
display: flex;
flex-direction: column;
justify-content: center;
padding: 2rem 1.5rem;
position: relative;
border-bottom: 1px solid var(--border);
}}
.interest-badge {{
display: inline-block;
background: var(--accent);
color: white;
padding: 0.4rem 0.9rem;
border-radius: 20px;
font-size: 0.7rem;
font-weight: 700;
text-transform: uppercase;
letter-spacing: 0.5px;
margin-bottom: 1rem;
}}
.difficulty-badge {{
display: inline-block;
padding: 0.3rem 0.7rem;
border-radius: 15px;
font-size: 0.7rem;
font-weight: 600;
margin-left: 0.5rem;
}}
.paper-title {{
font-size: 1.5rem;
font-weight: 800;
line-height: 1.3;
margin-bottom: 1rem;
color: var(--text);
}}
.layman-box {{
background: var(--layman-bg);
border-left: 3px solid var(--layman-border);
padding: 1rem;
margin-bottom: 1rem;
border-radius: 8px;
font-size: 0.95rem;
line-height: 1.6;
color: #94a3b8;
font-style: italic;
}}
.summary {{
color: var(--muted);
font-size: 0.95rem;
line-height: 1.6;
margin-bottom: 1.5rem;
}}
.paper-meta {{
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 1rem;
color: var(--muted);
font-size: 0.8rem;
}}
.category-tag {{
background: #1e3a5f;
color: #60a5fa;
padding: 0.3rem 0.7rem;
border-radius: 15px;
font-size: 0.75rem;
font-weight: 600;
}}
.links {{
display: flex;
gap: 1.5rem;
margin-bottom: 2rem;
}}
.links a {{
color: #6ba3ff;
text-decoration: none;
font-size: 0.9rem;
font-weight: 600;
display: flex;
align-items: center;
gap: 0.3rem;
}}
.links a:active {{
color: var(--accent);
}}
/* Fixed header with export button */
.fixed-header {{
position: fixed;
top: 0;
left: 0;
right: 0;
background: rgba(0, 0, 0, 0.95);
backdrop-filter: blur(10px);
padding: 0.8rem 1.5rem;
display: flex;
align-items: center;
justify-content: space-between;
border-bottom: 1px solid var(--border);
z-index: 200;
}}
.like-counter {{
display: flex;
align-items: center;
gap: 0.5rem;
color: var(--muted);
font-size: 0.9rem;
}}
.export-button {{
background: linear-gradient(135deg, var(--accent), #ffa94d);
color: white;
border: none;
padding: 0.6rem 1.2rem;
border-radius: 20px;
font-weight: 700;
font-size: 0.85rem;
cursor: pointer;
transition: all 0.2s ease;
-webkit-tap-highlight-color: transparent;
opacity: 0.5;
pointer-events: none;
}}
.export-button.active {{
opacity: 1;
pointer-events: auto;
}}
.export-button.active:active {{
transform: scale(0.95);
}}
.like-button {{
position: fixed;
bottom: 2rem;
right: 1.5rem;
width: 60px;
height: 60px;
border-radius: 50%;
background: rgba(26, 26, 26, 0.9);
border: 2px solid var(--border);
display: flex;
align-items: center;
justify-content: center;
font-size: 1.8rem;
cursor: pointer;
transition: all 0.2s ease;
z-index: 100;
-webkit-tap-highlight-color: transparent;
}}
.like-button:active {{
transform: scale(0.9);
}}
.like-button.liked {{
background: var(--heart-red);
border-color: var(--heart-red);
animation: heartbeat 0.3s ease;
}}
@keyframes heartbeat {{
0%, 100% {{ transform: scale(1); }}
50% {{ transform: scale(1.2); }}
}}
.scroll-indicator {{
position: fixed;
bottom: 1rem;
left: 50%;
transform: translateX(-50%);
color: var(--muted);
font-size: 0.8rem;
animation: bounce 2s infinite;
z-index: 50;
}}
@keyframes bounce {{
0%, 100% {{ transform: translateX(-50%) translateY(0); }}
50% {{ transform: translateX(-50%) translateY(-10px); }}
}}
.hide-indicator {{
display: none;
}}
</style>
</head>
<body>
<!-- Fixed Header with Export Button -->
<div class="fixed-header">
<div class="like-counter">
<span>♥</span>
<span><span id="likeCount">0</span> liked</span>
</div>
<button class="export-button" id="exportButton">Export Likes</button>
</div>
<div id="feed-container"></div>
<div class="like-button" id="likeButton">
<span id="heartIcon">♡</span>
</div>
<div class="scroll-indicator" id="scrollIndicator">
↓ Scroll to explore
</div>
<script>
// ============================================
// EMBEDDED PAPERS DATA
// ============================================
const papers = {papers_json};
// ============================================
// STATE MANAGEMENT
// ============================================
let likes = JSON.parse(localStorage.getItem('tiktok_likes') || '{{}}');
let currentPaperIndex = 0;
// ============================================
// RENDER FEED
// ============================================
function renderFeed() {{
const container = document.getElementById('feed-container');
papers.forEach((paper, index) => {{
const card = document.createElement('div');
card.className = 'paper-card';
card.dataset.index = index;
card.innerHTML = `
<div class="interest-badge">${{paper.interest_category}}</div>
<div class="difficulty-badge">${{paper.difficulty}}</div>
<h1 class="paper-title">${{paper.title}}</h1>
<div class="layman-box">💡 ${{paper.layman}}</div>
<div class="summary">${{paper.summary}}</div>
<div class="paper-meta">
<span class="category-tag">${{paper.category}}</span>
<span class="date">${{paper.published}}</span>
</div>
<div class="links">
<a href="${{paper.link}}" target="_blank">Abstract ↗</a>
<a href="${{paper.pdf_link}}" target="_blank">PDF ↗</a>
</div>
`;
container.appendChild(card);
}});
}}
// ============================================
// LIKE SYSTEM
// ============================================
function getCurrentPaper() {{
const windowHeight = window.innerHeight;
// Find which paper is currently in view
const cards = document.querySelectorAll('.paper-card');
for (let i = 0; i < cards.length; i++) {{
const rect = cards[i].getBoundingClientRect();
if (rect.top >= -windowHeight/2 && rect.top < windowHeight/2) {{
return i;
}}
}}
return 0;
}}
function toggleLike() {{
const paperIndex = getCurrentPaper();
const paper = papers[paperIndex];
const arxivId = paper.arxiv_id;
const heartIcon = document.getElementById('heartIcon');
const likeButton = document.getElementById('likeButton');
if (likes[arxivId]) {{
// Unlike
delete likes[arxivId];
heartIcon.textContent = '♡';
likeButton.classList.remove('liked');
}} else {{
// Like
likes[arxivId] = {{
arxiv_id: arxivId,
title: paper.title,
abstract_url: paper.link,
category: paper.category,
interest_category: paper.interest_category,
liked_date: new Date().toISOString(),
difficulty: paper.difficulty
}};
heartIcon.textContent = '♥';
likeButton.classList.add('liked');
}}
// Save to localStorage
localStorage.setItem('tiktok_likes', JSON.stringify(likes));
// Update counter and export button
updateCounter();
updateExportButton();
}}
function updateLikeButton() {{
const paperIndex = getCurrentPaper();
const paper = papers[paperIndex];
const heartIcon = document.getElementById('heartIcon');
const likeButton = document.getElementById('likeButton');
if (likes[paper.arxiv_id]) {{
heartIcon.textContent = '♥';
likeButton.classList.add('liked');
}} else {{
heartIcon.textContent = '♡';
likeButton.classList.remove('liked');
}}
}}
function updateCounter() {{
const count = Object.keys(likes).length;
document.getElementById('likeCount').textContent = count;
}}
function updateExportButton() {{
const exportButton = document.getElementById('exportButton');
if (Object.keys(likes).length > 0) {{
exportButton.classList.add('active');
}} else {{
exportButton.classList.remove('active');
}}
}}
// ============================================
// EXPORT LIKES
// ============================================
function exportLikes() {{
const likedPapers = Object.values(likes);
// Calculate category preferences
const preferences = {{}};
likedPapers.forEach(paper => {{
const cat = paper.interest_category;
preferences[cat] = (preferences[cat] || 0) + 1;
}});
const exportData = {{
liked_papers: likedPapers,
preferences: preferences,
export_date: new Date().toISOString(),
total_likes: likedPapers.length
}};
const blob = new Blob([JSON.stringify(exportData, null, 2)], {{
type: 'application/json'
}});
const url = URL.createObjectURL(blob);
const a = document.createElement('a');
a.href = url;
a.download = `arxiv_likes_${{new Date().toISOString().split('T')[0]}}.json`;
document.body.appendChild(a);
a.click();
document.body.removeChild(a);
URL.revokeObjectURL(url);
}}
// ============================================
// EVENT LISTENERS
// ============================================
document.getElementById('likeButton').addEventListener('click', toggleLike);
document.getElementById('exportButton').addEventListener('click', exportLikes);
// Update like button when scrolling
document.getElementById('feed-container').addEventListener('scroll', () => {{
updateLikeButton();
// Hide scroll indicator after first scroll
const scrollIndicator = document.getElementById('scrollIndicator');
if (document.getElementById('feed-container').scrollTop > 50) {{
scrollIndicator.classList.add('hide-indicator');
}}
}});
// ============================================
// INITIALIZATION
// ============================================
renderFeed();
updateLikeButton();
updateCounter();
updateExportButton();
</script>
</body>
</html>
"""
return html
def save_tiktok_feed(all_papers_by_interest, filename='tiktok_feed.html'):
"""
Generate and save TikTok-style feed from papers data.
Called by main.py after fetching papers.
"""
# Interleave papers round-robin
interleaved = interleave_papers_by_interest(all_papers_by_interest)
print(f"\n🔄 Interleaved {len(interleaved)} papers across {len(all_papers_by_interest)} interests")
# Generate HTML
html = generate_tiktok_html(interleaved)
# Save file
with open(filename, 'w', encoding='utf-8') as f:
f.write(html)
print(f"✨ TikTok feed saved to {filename}")
print("📱 Sync with your phone and open in browser!")

main.py Normal file
@@ -0,0 +1,724 @@
import os
import time
import json
import xml.etree.ElementTree as ET
import requests
from transformers import pipeline
from datetime import datetime, timedelta
from generate_tiktok_feed import save_tiktok_feed
# ======================
# CONFIGURATION
# ======================
def load_config():
"""Load configuration from config.json file."""
config_file = "config.json"
# Default configuration (fallback)
default_config = {
"interests": {
"Efficient ML / Edge AI": {
"query": 'cat:cs.LG OR cat:cs.CV OR cat:cs.CL',
"keywords": ['efficient', 'edge', 'compression', 'quantization', 'pruning', 'distillation', 'inference', 'lightweight', 'mobile', 'accelerat']
}
},
"settings": {
"papers_per_interest": 10,
"summary_max_length": 160,
"recent_days": 7,
"fallback_days": 90,
"min_papers_threshold": 5,
"fetch_multiplier": 5,
"user_agent": "ResearchDigestBot/1.0 (github.com/wedsmoker)"
}
}
if os.path.exists(config_file):
try:
with open(config_file, 'r', encoding='utf-8') as f:
config = json.load(f)
print(f"✅ Loaded configuration from {config_file}")
return config
except Exception as e:
print(f"⚠️ Error loading config file: {e}. Using defaults.")
return default_config
else:
print(f"⚠️ {config_file} not found. Using default configuration.")
return default_config
# Load configuration
config = load_config()
INTERESTS = config.get('interests', {})
settings = config.get('settings', {})
PAPERS_PER_INTEREST = settings.get('papers_per_interest', 10)
SUMMARY_MAX_LENGTH = settings.get('summary_max_length', 160)
USER_AGENT = settings.get('user_agent', 'ResearchDigestBot/1.0')
# Date filtering: Only fetch papers from the last N days (set to 0 to disable)
RECENT_DAYS = settings.get('recent_days', 7)
FALLBACK_DAYS = settings.get('fallback_days', 90)
MIN_PAPERS_THRESHOLD = settings.get('min_papers_threshold', 5)
FETCH_MULTIPLIER = settings.get('fetch_multiplier', 5)
# Deduplication: Track papers we've already shown
SEEN_PAPERS_FILE = "seen_papers.json"
# Initialize summarizer (optional)
try:
summarizer = pipeline(
"summarization",
model="sshleifer/distilbart-cnn-12-6",
device=-1
)
except Exception as e:
print(f"⚠️ Summarizer unavailable ({e}). Using raw abstracts.")
summarizer = None
# ======================
# DEDUPLICATION HELPERS
# ======================
def load_seen_papers():
"""Load the set of previously seen paper IDs."""
if os.path.exists(SEEN_PAPERS_FILE):
try:
with open(SEEN_PAPERS_FILE, 'r') as f:
data = json.load(f)
return set(data.get('seen_ids', []))
except Exception as e:
print(f"⚠️ Error loading seen papers: {e}")
return set()
def save_seen_papers(seen_ids):
"""Save the set of seen paper IDs."""
try:
with open(SEEN_PAPERS_FILE, 'w') as f:
json.dump({
'seen_ids': list(seen_ids),
'last_updated': datetime.now().isoformat()
}, f, indent=2)
except Exception as e:
print(f"⚠️ Error saving seen papers: {e}")
def get_date_filter(days=None):
"""Generate date filter for arXiv query (last N days)."""
if days is None:
days = RECENT_DAYS
if days <= 0:
return ""
end_date = datetime.now()
start_date = end_date - timedelta(days=days)
# arXiv date format: YYYYMMDD0000 to YYYYMMDD2359
date_filter = f"submittedDate:[{start_date.strftime('%Y%m%d')}0000 TO {end_date.strftime('%Y%m%d')}2359]"
return date_filter
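As a sanity check of the query fragment `get_date_filter` builds, here is a minimal sketch with the end date pinned so the output is fixed (the real code uses `datetime.now()`):

```python
from datetime import datetime, timedelta

def date_filter(end, days):
    # arXiv range syntax: submittedDate:[YYYYMMDDHHMM TO YYYYMMDDHHMM]
    start = end - timedelta(days=days)
    return (f"submittedDate:[{start.strftime('%Y%m%d')}0000"
            f" TO {end.strftime('%Y%m%d')}2359]")

print(date_filter(datetime(2025, 11, 5), 7))
# submittedDate:[202510290000 TO 202511052359]
```

This fragment is ANDed onto the interest query before it is sent, so a malformed range would silently return zero results — worth checking once by hand.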
# ======================
# ARXIV FETCH & PARSE
# ======================
def fetch_arxiv_papers(query, max_results=5, days_back=None):
url = "http://export.arxiv.org/api/query"
# Add date filter if configured
date_filter = get_date_filter(days_back)
if date_filter:
# Combine user query with date filter using AND
query = f"({query}) AND {date_filter}"
params = {
"search_query": query,
"start": 0,
"max_results": max_results,
"sortBy": "submittedDate",
"sortOrder": "descending"
}
headers = {"User-Agent": USER_AGENT}
try:
response = requests.get(url, params=params, headers=headers, timeout=20)
response.raise_for_status()
return response.text
except Exception as e:
print(f"❌ Error fetching query '{query}': {e}")
return None
def parse_papers(xml_data):
if not xml_data:
return []
try:
root = ET.fromstring(xml_data)
except ET.ParseError:
return []
namespace = {'atom': 'http://www.w3.org/2005/Atom'}
papers = []
for entry in root.findall('atom:entry', namespace):
title_elem = entry.find('atom:title', namespace)
summary_elem = entry.find('atom:summary', namespace)
id_elem = entry.find('atom:id', namespace)
published_elem = entry.find('atom:published', namespace)
if None in (title_elem, summary_elem, id_elem):
continue
title = ' '.join(title_elem.text.strip().split())
summary = ' '.join(summary_elem.text.strip().split())
link = id_elem.text
published = published_elem.text.split('T')[0] if published_elem is not None else "Unknown"
# Extract arXiv ID
arxiv_id = link.split('/abs/')[-1].split('v')[0]
# Get primary category
primary_cat_elem = entry.find('.//{http://arxiv.org/schemas/atom}primary_category')
category = primary_cat_elem.get('term') if primary_cat_elem is not None else "unknown"
papers.append({
'title': title,
'summary': summary,
'link': link,
'pdf_link': f"https://arxiv.org/pdf/{arxiv_id}.pdf",
'arxiv_id': arxiv_id,
'category': category,
'published': published
})
return papers
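To see the Atom parsing above in action without hitting the API, here is a self-contained sketch against a hand-written feed entry (the XML and arXiv ID below are fabricated; the namespace and field access mirror `parse_papers`):

```python
import xml.etree.ElementTree as ET

XML = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2501.00001v1</id>
    <title>Example  Title</title>
    <summary>An example abstract.</summary>
    <published>2025-01-01T00:00:00Z</published>
  </entry>
</feed>"""

ns = {'atom': 'http://www.w3.org/2005/Atom'}
entry = ET.fromstring(XML).find('atom:entry', ns)
link = entry.find('atom:id', ns).text
arxiv_id = link.split('/abs/')[-1].split('v')[0]   # strip the version suffix
title = ' '.join(entry.find('atom:title', ns).text.split())  # collapse whitespace
print(arxiv_id, '|', title)
# 2501.00001 | Example Title
```

The whitespace collapse matters because arXiv titles often arrive with embedded newlines and double spaces from the feed's line wrapping.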
def summarize_abstract(abstract):
if summarizer is None:
return abstract[:SUMMARY_MAX_LENGTH] + ("..." if len(abstract) > SUMMARY_MAX_LENGTH else "")
try:
if len(abstract.split()) < 15:
return abstract
result = summarizer(
abstract,
max_length=min(SUMMARY_MAX_LENGTH, 142),
min_length=30,
truncation=True
)
return result[0]['summary_text']
except Exception as e:
return abstract[:SUMMARY_MAX_LENGTH] + "..."
def calculate_relevance_score(paper, keywords):
"""Calculate relevance score based on keyword matches in title and abstract."""
title_lower = paper['title'].lower()
abstract_lower = paper['summary'].lower()
score = 0
matched_keywords = []
for keyword in keywords:
keyword_lower = keyword.lower()
# Title matches are worth more
if keyword_lower in title_lower:
score += 3
matched_keywords.append(keyword)
# Abstract matches
elif keyword_lower in abstract_lower:
score += 1
matched_keywords.append(keyword)
# Bonus for multiple keyword matches
if len(matched_keywords) > 2:
score += len(matched_keywords) - 2
paper['relevance_score'] = score
paper['matched_keywords'] = matched_keywords
return score
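A toy run of the scoring rule above (title hit = 3 points, abstract hit = 1, plus a bonus point per match beyond the second; the paper dict and keywords are fabricated for the demo):

```python
def score(paper, keywords):
    title, abstract = paper['title'].lower(), paper['summary'].lower()
    s, matched = 0, []
    for kw in keywords:
        k = kw.lower()
        if k in title:          # title matches are worth more
            s += 3; matched.append(kw)
        elif k in abstract:
            s += 1; matched.append(kw)
    if len(matched) > 2:        # multi-match bonus
        s += len(matched) - 2
    return s

demo = {'title': 'Efficient Quantization for Edge Inference',
        'summary': 'We study pruning for mobile deployment.'}
kws = ['efficient', 'quantization', 'pruning', 'edge']
print(score(demo, kws))
# 3 title hits (9) + 1 abstract hit (1) + bonus for 4 matches (2) = 12
```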
def estimate_difficulty(abstract, category):
"""Estimate paper difficulty using heuristic keyword analysis."""
abstract_lower = abstract.lower()
# Theory-heavy indicators
complexity_words = ['theoretical', 'proof', 'theorem', 'convergence', 'optimal',
'asymptotic', 'lemma', 'proposition', 'rigorous', 'formalism']
# Applied/practical indicators
applied_words = ['system', 'framework', 'application', 'dataset', 'benchmark',
'implementation', 'experiment', 'empirical', 'practical']
# Math-heavy categories
math_categories = ['math.', 'stat.', 'quant-ph']
# Calculate score
score = sum(1 for w in complexity_words if w in abstract_lower)
score -= sum(0.5 for w in applied_words if w in abstract_lower)
# Category bonus
if any(cat in category for cat in math_categories):
score += 1
# Determine difficulty level
if score > 2:
return "🔴 Theory-Heavy"
elif score > 0.5:
return "🟡 Advanced"
else:
return "🟢 Applied"
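The difficulty heuristic is easiest to trust after a worked example. A minimal sketch of the same scoring (abstracts fabricated; emoji labels dropped for brevity):

```python
def difficulty(abstract, category):
    a = abstract.lower()
    theory = ['theoretical', 'proof', 'theorem', 'convergence', 'optimal',
              'asymptotic', 'lemma', 'proposition', 'rigorous', 'formalism']
    applied = ['system', 'framework', 'application', 'dataset', 'benchmark',
               'implementation', 'experiment', 'empirical', 'practical']
    # +1 per theory word, -0.5 per applied word, +1 for math-heavy categories
    s = sum(1 for w in theory if w in a) - sum(0.5 for w in applied if w in a)
    if any(c in category for c in ('math.', 'stat.', 'quant-ph')):
        s += 1
    return 'Theory-Heavy' if s > 2 else 'Advanced' if s > 0.5 else 'Applied'

print(difficulty('We give a rigorous proof of a convergence theorem.', 'math.OC'))
# 4 theory hits + 1 category bonus = 5 → Theory-Heavy
print(difficulty('We benchmark a practical system implementation on a dataset.', 'cs.LG'))
# 5 applied hits → -2.5 → Applied
```

Being substring-based, the check also counts stems inside longer words ('optimal' inside 'suboptimal'), which is crude but cheap and runs with no model.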
def generate_layman_context(title, abstract):
"""Generate simple layman explanation using keyword extraction and templates."""
abstract_lower = abstract.lower()
# Extract key action words and concepts
action_map = {
'improv': 'improves',
'reduc': 'reduces',
'enhanc': 'enhances',
'optimi': 'optimizes',
'acceler': 'speeds up',
'efficient': 'makes more efficient',
'novel': 'introduces a new approach to',
'outperform': 'works better than existing methods for',
'achiev': 'achieves better',
'propose': 'proposes a method for',
'present': 'presents techniques for',
'address': 'tackles the problem of',
'privacy': 'protecting data privacy in',
'federated': 'distributed machine learning across',
'emotion': 'understanding emotions in',
'embedded': 'running AI on low-power devices for',
'edge': 'running AI locally on devices for',
'compression': 'making models smaller for',
'inference': 'faster predictions in',
'generative': 'creating new content with',
'detection': 'automatically finding',
'classification': 'categorizing',
'prediction': 'forecasting'
}
# Find first matching action
action = "explores techniques in"
for keyword, phrase in action_map.items():
if keyword in abstract_lower[:300]: # Check first part of abstract
action = phrase
break
# Extract domain
domain = "machine learning"
if "language model" in abstract_lower or "llm" in abstract_lower or "nlp" in abstract_lower:
domain = "language AI"
elif "vision" in abstract_lower or "image" in abstract_lower or "visual" in abstract_lower:
domain = "computer vision"
elif "speech" in abstract_lower or "audio" in abstract_lower:
domain = "speech processing"
elif "privacy" in abstract_lower or "federated" in abstract_lower:
domain = "privacy-preserving AI"
elif "edge" in abstract_lower or "embedded" in abstract_lower or "device" in abstract_lower:
domain = "edge computing"
elif "emotion" in abstract_lower or "affective" in abstract_lower:
domain = "emotion AI"
return f"This research {action} {domain}."
# ======================
# HTML OUTPUT
# ======================
def save_html_digest(all_papers_by_interest, filename=None):
# Create archive directory if it doesn't exist
archive_dir = "arxiv_archive"
if not os.path.exists(archive_dir):
os.makedirs(archive_dir)
if filename is None:
date_str = datetime.now().strftime('%Y%m%d')
filename = os.path.join(archive_dir, f"arxiv_digest_{date_str}.html")
# Also save as latest.html for easy syncing
latest_file = "latest.html"
html = f"""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<title>arXiv Digest • {datetime.now().strftime('%Y-%m-%d')}</title>
<style>
* {{ box-sizing: border-box; }}
:root {{
--bg: #0f0f0f;
--text: #e8e8e8;
--muted: #999;
--border: #2a2a2a;
--card-bg: #1a1a1a;
--link: #6ba3ff;
--accent: #ff6b6b;
--green: #51cf66;
--yellow: #ffd43b;
--red: #ff6b6b;
--layman-bg: #1f2937;
--layman-border: #60a5fa;
}}
body {{
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
line-height: 1.5;
color: var(--text);
background: var(--bg);
margin: 0;
padding: 1rem;
}}
.container {{
max-width: 1600px;
margin: 0 auto;
}}
header {{
text-align: center;
padding: 2rem 1rem 3rem;
border-bottom: 2px solid var(--border);
margin-bottom: 2rem;
}}
h1 {{
font-weight: 900;
font-size: 2.5rem;
margin: 0;
background: linear-gradient(135deg, var(--accent), #ffa94d);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}}
.meta {{
color: var(--muted);
font-size: 0.95rem;
margin-top: 0.5rem;
letter-spacing: 0.5px;
}}
.interest-section {{
margin-bottom: 3rem;
}}
.interest-header {{
display: flex;
align-items: center;
gap: 0.8rem;
margin-bottom: 1.2rem;
padding: 0.8rem 1rem;
background: var(--card-bg);
border-radius: 12px;
border-left: 4px solid var(--accent);
}}
.interest-title {{
font-size: 1.3rem;
margin: 0;
font-weight: 700;
color: var(--text);
}}
.papers-grid {{
display: grid;
grid-template-columns: repeat(auto-fill, minmax(380px, 1fr));
gap: 1.2rem;
}}
.paper {{
background: var(--card-bg);
border: 1px solid var(--border);
border-radius: 12px;
padding: 1.2rem;
transition: all 0.2s ease;
position: relative;
display: flex;
flex-direction: column;
height: 100%;
}}
.paper:hover {{
border-color: var(--accent);
transform: translateY(-2px);
box-shadow: 0 8px 24px rgba(255, 107, 107, 0.15);
}}
.paper-header {{
display: flex;
justify-content: space-between;
align-items: flex-start;
gap: 0.8rem;
margin-bottom: 0.8rem;
}}
.difficulty-badge {{
padding: 0.3rem 0.7rem;
border-radius: 20px;
font-size: 0.7rem;
font-weight: 700;
white-space: nowrap;
flex-shrink: 0;
}}
.paper h3 {{
font-size: 1.05rem;
margin: 0 0 0.8rem 0;
font-weight: 700;
line-height: 1.4;
color: var(--text);
}}
.layman-box {{
background: var(--layman-bg);
border-left: 3px solid var(--layman-border);
padding: 0.7rem 0.9rem;
margin-bottom: 0.8rem;
border-radius: 6px;
font-size: 0.88rem;
line-height: 1.5;
color: #94a3b8;
font-style: italic;
}}
.summary {{
color: var(--muted);
margin-bottom: 1rem;
font-size: 0.88rem;
line-height: 1.6;
flex-grow: 1;
}}
.paper-footer {{
display: flex;
justify-content: space-between;
align-items: center;
padding-top: 0.8rem;
border-top: 1px solid var(--border);
margin-top: auto;
}}
.category-tag {{
background: #1e3a5f;
color: #60a5fa;
padding: 0.25rem 0.65rem;
border-radius: 15px;
font-size: 0.75rem;
font-weight: 600;
}}
.date {{
color: var(--muted);
font-size: 0.75rem;
}}
.links {{
display: flex;
gap: 1rem;
margin-top: 0.8rem;
}}
.links a {{
color: var(--link);
text-decoration: none;
font-size: 0.85rem;
font-weight: 600;
transition: color 0.2s;
}}
.links a:hover {{
color: var(--accent);
}}
.footer {{
text-align: center;
margin-top: 4rem;
padding: 2rem;
color: var(--muted);
font-size: 0.85rem;
border-top: 1px solid var(--border);
}}
@media (max-width: 768px) {{
.papers-grid {{
grid-template-columns: 1fr;
}}
h1 {{
font-size: 2rem;
}}
}}
</style>
</head>
<body>
<div class="container">
<header>
<h1>arXiv Research Digest</h1>
<div class="meta">{datetime.now().strftime('%B %d, %Y')} • {sum(len(p) for p in all_papers_by_interest.values())} papers across {len(all_papers_by_interest)} interests</div>
</header>
"""
for interest_name, papers in all_papers_by_interest.items():
html += f"""<section class="interest-section">
<div class="interest-header">
<span>🔬</span>
<h2 class="interest-title">{interest_name}</h2>
</div>
"""
if not papers:
html += ' <p>No recent papers found.</p>\n'
else:
html += ' <div class="papers-grid">\n'
for paper in papers:
html += f""" <article class="paper">
<div class="paper-header">
<span class="difficulty-badge">{paper['difficulty']}</span>
</div>
<h3>{paper['title']}</h3>
<div class="layman-box">💡 {paper['layman']}</div>
<div class="summary">{paper['summary']}</div>
<div class="paper-footer">
<span class="category-tag">{paper['category']}</span>
<span class="date">{paper['published']}</span>
</div>
<div class="links">
<a href="{paper['link']}" target="_blank">Abstract ↗</a>
<a href="{paper['pdf_link']}" target="_blank">PDF ↗</a>
</div>
</article>
"""
html += ' </div>\n'
html += "</section>\n"
html += """ <div class="footer">
✨ Generated automatically • Powered by arXiv API
</div>
</div>
</body>
</html>
"""
# Save archived version
with open(filename, 'w', encoding='utf-8') as f:
f.write(html)
print(f"✨ HTML digest saved to {filename}")
# Also save as latest.html for quick access
with open(latest_file, 'w', encoding='utf-8') as f:
f.write(html)
print(f"📄 Latest digest saved to {latest_file}")
# ======================
# MAIN EXECUTION
# ======================
if __name__ == "__main__":
# Load previously seen papers
seen_papers = load_seen_papers()
print(f"📋 Loaded {len(seen_papers)} previously seen papers")
if RECENT_DAYS > 0:
print(f"📅 Fetching papers from last {RECENT_DAYS} days")
else:
print("📅 Fetching all available papers (no date filter)")
all_papers = {}
new_papers_count = 0
duplicate_count = 0
for interest_name, interest_config in INTERESTS.items():
query = interest_config['query']
keywords = interest_config['keywords']
print(f"\n🔍 Fetching papers for: {interest_name}")
xml_data = fetch_arxiv_papers(query, PAPERS_PER_INTEREST * FETCH_MULTIPLIER) # Fetch more to filter
papers = parse_papers(xml_data) if xml_data else []
print(f" → Found {len(papers)} papers")
# Filter out duplicates and calculate relevance
fresh_papers = []
for p in papers:
if p['arxiv_id'] not in seen_papers:
# Store original abstract for analysis
original_abstract = p['summary']
# Calculate relevance score FIRST (before summarization)
calculate_relevance_score(p, keywords)
# Estimate difficulty level (use ORIGINAL abstract before summarization)
p['difficulty'] = estimate_difficulty(original_abstract, p['category'])
# Generate layman context (use ORIGINAL abstract for better keyword extraction)
p['layman'] = generate_layman_context(p['title'], original_abstract)
# Generate summary (do this last to avoid losing original abstract)
p['summary'] = summarize_abstract(original_abstract)
fresh_papers.append(p)
else:
duplicate_count += 1
# Sort by relevance score (highest first)
fresh_papers.sort(key=lambda x: x['relevance_score'], reverse=True)
# Take top N papers
top_papers = fresh_papers[:PAPERS_PER_INTEREST]
# Mark these papers as seen
for p in top_papers:
seen_papers.add(p['arxiv_id'])
new_papers_count += 1
all_papers[interest_name] = top_papers
print(f"{len(top_papers)} new papers (from {len(fresh_papers)} candidates, skipped {len(papers) - len(fresh_papers)} duplicates)")
if top_papers:
print(f" 📊 Relevance scores: {[p['relevance_score'] for p in top_papers]}")
# FALLBACK: If we didn't get enough papers, try wider date range (only 1 extra request)
if len(top_papers) < MIN_PAPERS_THRESHOLD and FALLBACK_DAYS > RECENT_DAYS:
print(f" 🔄 Low yield, trying fallback search (last {FALLBACK_DAYS} days)...")
time.sleep(3) # Respect rate limit before fallback request
xml_data_fallback = fetch_arxiv_papers(query, PAPERS_PER_INTEREST * FETCH_MULTIPLIER, days_back=FALLBACK_DAYS)
papers_fallback = parse_papers(xml_data_fallback) if xml_data_fallback else []
print(f" → Found {len(papers_fallback)} papers in fallback")
# Process fallback papers
fallback_fresh = []
for p in papers_fallback:
if p['arxiv_id'] not in seen_papers:
original_abstract = p['summary']
calculate_relevance_score(p, keywords)
p['difficulty'] = estimate_difficulty(original_abstract, p['category'])
p['layman'] = generate_layman_context(p['title'], original_abstract)
p['summary'] = summarize_abstract(original_abstract)
fallback_fresh.append(p)
# Sort fallback papers by relevance
fallback_fresh.sort(key=lambda x: x['relevance_score'], reverse=True)
# Add top fallback papers to fill quota
needed = PAPERS_PER_INTEREST - len(top_papers)
additional_papers = fallback_fresh[:needed]
for p in additional_papers:
seen_papers.add(p['arxiv_id'])
new_papers_count += 1
top_papers.extend(additional_papers)
all_papers[interest_name] = top_papers
print(f" ✨ After fallback: {len(top_papers)} total papers")
# Be kind: 3-second delay between queries (arXiv recommendation)
time.sleep(3)
# Save updated seen papers
save_seen_papers(seen_papers)
print(f"\n📊 Summary:")
print(f" • Total new papers: {new_papers_count}")
print(f" • Total duplicates skipped: {duplicate_count}")
print(f" • Total tracked papers: {len(seen_papers)}")
save_html_digest(all_papers)
save_tiktok_feed(all_papers)
print("\n✅ Done! Open the HTML files in your browser.")

mobile_demo.png Normal file
Binary file not shown (PNG, 223 KiB).

requirements.txt Normal file
@@ -0,0 +1,4 @@
transformers==4.46.2
torch==2.5.1
torchvision==0.20.1
requests==2.32.3

reset_seen_papers.py Normal file
@@ -0,0 +1,17 @@
"""
Reset the seen_papers.json file to start fresh.
Run this if you want to see papers again that were previously shown.
"""
import os
SEEN_PAPERS_FILE = "seen_papers.json"
if os.path.exists(SEEN_PAPERS_FILE):
# Backup old file
backup_file = SEEN_PAPERS_FILE.replace('.json', '_backup.json')
os.replace(SEEN_PAPERS_FILE, backup_file)  # overwrite any previous backup
print(f"✅ Backed up old file to {backup_file}")
print("✅ Reset complete! Next run will show all papers as fresh.")
else:
print("No seen_papers.json file found. Nothing to reset.")

run_digest.bat Normal file
@@ -0,0 +1,50 @@
@echo off
REM ArXiv Digest Runner - Sets up environment and runs the script
cd /d "%~dp0"
REM Check if virtual environment exists
if not exist "venv\" (
echo Virtual environment not found. Creating one...
python -m venv venv
if errorlevel 1 (
echo Error creating virtual environment!
echo Make sure Python is installed and available in PATH.
pause
exit /b 1
)
echo Virtual environment created successfully.
echo Installing dependencies...
call venv\Scripts\activate.bat
python -m pip install --upgrade pip
pip install -r requirements.txt
if errorlevel 1 (
echo Error installing dependencies!
pause
exit /b 1
)
echo Dependencies installed successfully.
) else (
call venv\Scripts\activate.bat
)
echo Running arXiv digest...
python main.py
if errorlevel 1 (
echo Error running main script!
pause
goto :end
)
echo Generating index page...
python generate_index.py
if errorlevel 1 (
echo Error generating index!
pause
goto :end
)
echo Done! All files updated.
pause
:end
deactivate

tiktok_feed.html Normal file
File diff suppressed because it is too large.