AI-powered video editing automation system that intelligently merges A-Roll with contextually relevant B-Roll footage through a single API call. Uses Gemini VLM for visual analysis and implements intelligent placement algorithms for optimal narrative flow.

Himanshu Gupta
Full Stack AI Engineer
About
6 months of experience. AI & Data Science student, hands-on builder. Shipped real-world AI systems like ClipSync (video pipeline), Medical Scheduling Agent (multi-API), and Math Mentor (multi-agent RAG). Python Developer Intern at EspoMedia: OCR, data workflows, model accuracy. I design APIs, integrate models, and ship scalable systems, and I am especially interested in agent workflows and production AI. Open to freelance, contract, and full-time roles.
Experience & Ventures
EspoMedia
Jan 2026 - Present
Python Developer Intern
Python Developer Intern in the Tech Department, working on OCR pipelines, ML/DL model training, and data automation:
• Assist in building OCR pipelines using EasyOCR, Tesseract, OpenCV, and preprocessing techniques
• Support training and fine-tuning ML/DL models using PyTorch/TensorFlow and YOLO via Roboflow
• Write Python scripts for data cleaning, automation, and dataset preparation
• Use FFmpeg for video/frame extraction and media processing tasks
• Work with MongoDB to store and manage OCR/model outputs
• Test model outputs, document issues, and help improve accuracy and performance
• Collaborate with the team, maintain clean code, and follow Git-based workflows
Selected Work
Math Mentor
Autonomous reasoning system designed to solve high-school and undergraduate level mathematics problems with high reliability. Unlike typical chat interfaces, this application decouples semantic understanding from deterministic computation. The system accepts multimodal inputs (text, image, audio) and employs a Human-in-the-Loop (HITL) workflow to handle ambiguity before it propagates to the solver.
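To make the "decoupled" idea concrete, here is a minimal sketch, not the actual Math Mentor code. It assumes an upstream language-model stage has already produced a `ParsedProblem` (a hypothetical structure holding an expression string and a list of open ambiguities). The names `ParsedProblem`, `solve`, `run_pipeline`, and `ask_human` are illustrative assumptions. The solver stage uses only exact arithmetic over a restricted syntax tree, so the same input always yields the same answer, and the human-in-the-loop gate runs before the solver sees anything ambiguous.

```python
import ast
import operator
from dataclasses import dataclass, field
from fractions import Fraction
from typing import Callable


@dataclass
class ParsedProblem:
    """Hypothetical output of the semantic (model-driven) stage."""
    expression: str
    ambiguities: list[str] = field(default_factory=list)


# Only these operators are allowed; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def solve(expression: str) -> Fraction:
    """Deterministic stage: exact rational arithmetic, no model calls."""
    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expression, mode="eval"))


def run_pipeline(
    parsed: ParsedProblem,
    ask_human: Callable[[ParsedProblem], str],
) -> Fraction:
    """HITL gate: resolve ambiguity before it can reach the solver."""
    expression = parsed.expression
    if parsed.ambiguities:
        # The reviewer returns a corrected, unambiguous expression.
        expression = ask_human(parsed)
    return solve(expression)


if __name__ == "__main__":
    clear = ParsedProblem("1/2 + 1/3")
    print(run_pipeline(clear, ask_human=lambda p: p.expression))
```

In this sketch the reviewer callback is only invoked when the parsing stage flags something, so unambiguous problems pass straight through to the solver.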
Medical Scheduling Agent
Autonomous conversational AI agent that streamlines patient booking through natural language. Features LangGraph state management, multi-API orchestration with Groq API (Llama 3 70B), and Calendly integration for intelligent appointment scheduling.
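The graph-style state management mentioned above can be illustrated in plain Python. This is a rough sketch of the general pattern only: it does not use LangGraph, and it does not call Groq or Calendly. The node names (`collect`, `book`), the state keys, and the `run` helper are all hypothetical, and the booking step is a stub standing in for a real scheduling API call.

```python
from typing import Callable, Dict

END = "end"
REQUIRED_FIELDS = ("patient_name", "requested_slot")

State = Dict[str, object]


def collect(state: State) -> State:
    """Node: record which required booking details are still missing."""
    missing = [key for key in REQUIRED_FIELDS if not state.get(key)]
    return {"missing": missing}


def book(state: State) -> State:
    """Node: stub for the scheduling API call (assumed, not a real client)."""
    return {"confirmed": True}


NODES: Dict[str, Callable[[State], State]] = {"collect": collect, "book": book}


def next_node(current: str, state: State) -> str:
    """Router: conditional edge out of each node."""
    if current == "collect":
        # Stop and wait for more user input if anything is missing.
        return END if state["missing"] else "book"
    return END


def run(initial: State, start: str = "collect") -> State:
    """Walk the graph, merging each node's partial update into the state."""
    state = dict(initial)
    current = start
    while current != END:
        state.update(NODES[current](state))
        current = next_node(current, state)
    return state


if __name__ == "__main__":
    print(run({"patient_name": "A. Patient"}))
    print(run({"patient_name": "A. Patient", "requested_slot": "2026-02-01T10:00"}))
```

Each node returns only the keys it changes, which keeps the shared state as the single source of truth across conversation turns.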
Open Source Contributions
Blog
GitHub Activity
Stack
Frontend
React.js · Next.js · TypeScript · JavaScript · Tailwind CSS
Backend
Node.js · Express.js · Python · FastAPI
Database
MongoDB · PostgreSQL · SQLite · Supabase
Languages
Python · JavaScript · TypeScript · C++ · C · SQL
Tools
Git · GitHub · FFmpeg · Streamlit · Pandas · NumPy
AI
TensorFlow · PyTorch · OpenCV · LangGraph · Groq API · Gemini API · NLP