Available for new opportunities


Monu Kumar

I design and build high-throughput backend systems that serve millions of users, from real-time personalization engines to distributed subscription platforms and AI gateways. I obsess over latency, reliability, and elegant architecture.

5M+
Users Served
42%
Throughput Gain
99.99%
System Uptime
10×
Latency Reduction
Java / Scala · Apache Kafka · Distributed Systems
Experience

Where I've Shipped

3+ years building production systems at scale across India's largest matrimonial platform and graduate research at UIC.

01
Info Edge — Jeevansathi.com

2 roles · Jun 2021 – Jul 2024

Senior Software Engineer

Full-time
Apr 2022 – Jul 2024 · Noida, India

Led backend engineering for Jeevansathi.com, one of India's largest matrimonial platforms. Owned the distributed subscription platform, billing infrastructure, and personalization services at 5M+ user scale.

  • Architected scalable distributed microservices for a subscription platform serving 5M+ users, achieving 42% throughput improvement and 35% revenue growth via caching and load-balanced services
  • Designed fault-tolerant recurring billing integrations with Apple and BillDesk using JWT-based authentication and asymmetric key exchange, driving 2× growth in premium subscriptions within a quarter
  • Built low-latency personalization and ranking services using Collaborative Filtering and XGBoost, processing 2M+ prospects daily, increasing premium conversions by 8% and sales efficiency by 15%
  • Established production observability and alerting (Prometheus, Grafana), reducing MTTR by 60% and sustaining 99.99% uptime while supporting on-call incident response
  • Mentored 3 junior engineers and led backend microservices with TDD, unit/integration tests, and documentation
5M+ Users
42% Throughput ↑
35% Revenue ↑
2× Premium Subs
Java · Spring Boot · Kafka · Redis · Aerospike · Elasticsearch · XGBoost · Prometheus · Grafana · AWS · Docker · Kubernetes

Software Engineer

Full-time
Jun 2021 – Mar 2022 · Noida, India

Built core search, notification, and CRM infrastructure powering daily matchmaking for millions of users. Led a major search migration from Solr to Elasticsearch and re-architected the order management system.

  • Migrated from Solr to Elasticsearch, enabling near-real-time indexing at 6M+ reads/day (1000+ TPS)
  • Reduced P99 search latency from 6000+ ms to 600 ms via shard reconfiguration and query optimization, a 10× improvement
  • Built an event-driven notification service (Kafka, APNS, FCM) for real-time and scheduled alerts with Azkaban
  • Designed RESTful APIs and a React-based full-stack CRM platform serving 200K+ DAUs for subscription lifecycle and payment management
  • Re-architected order workflow from PHP monolith to Spring Boot microservices using RabbitMQ and Redis
6M+ Reads/Day
1000+ TPS
10× Latency ↓
200K+ DAUs
Java · Spring Boot · Elasticsearch · Solr · Kafka · RabbitMQ · Redis · React · PostgreSQL · Azkaban · APNS · FCM

Graduate Teaching Assistant

Part-time
Jun 2025 – Present · Chicago, IL

TA for graduate-level data engineering courses. Designed hands-on labs, led applied engineering projects, and mentored students on distributed data pipelines, LLM workflows, and cloud infrastructure.

  • Designed hands-on labs in PySpark and Apache Airflow, enabling distributed, fault-tolerant ETL data pipelines
  • Led applied engineering projects integrating AWS RDS, Python web scraping, and LLM-based ingestion workflows processing 10K+ records per batch
10K+ Records/Batch
2 Courses
PySpark · Apache Airflow · AWS RDS · Python · LLMs · ETL · PostgreSQL
Education

University of Illinois Chicago

Master of Science, Computer Science

Aug 2024 – May 2026 (Expected) · Chicago, IL
GPA: 3.86 / 4.0

National Institute of Technology Raipur

Bachelor of Technology, Information Technology

Aug 2017 – May 2021 · Raipur, India
GPA: 8.63 / 10
Projects

What I've Built

Case studies in distributed systems, real-time data pipelines, and high-scale infrastructure. Not just code — engineered for impact.

AI over SMS — Distributed AI Gateway

open-source

Event-driven system enabling offline AI access via SMS

Aug 2025 – Jan 2026
0ms
Context Loss
Async
LLM Pipeline
Multi-lang
Java + Python

Problem

Billions of people lack reliable internet access but still need AI capabilities. Traditional AI interfaces require stable HTTP connections, making them unusable in low-connectivity regions.

Solution

Designed an event-driven gateway that routes AI queries over SMS using Spring Boot, Kafka, Redis, and Twilio. Stateful conversations are maintained in Redis. Asynchronous Kafka pipelines enable cross-language (Java/Python) AI processing.

Architecture

SMS (Twilio) → Spring Boot Gateway → Kafka → Python AI Worker (Ollama/Bedrock) → Redis (conversation state) → Response back via Twilio SMS
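As a rough sketch of the stateful-conversation piece, here is the session logic simulated with an in-memory dict (class name and TTL are illustrative, not the production code); the real gateway would keep this in Redis, keyed by phone number, with a key expiry:

```python
import time

# Illustrative stand-in for the Redis-backed conversation store.
# A dict plays the role of Redis; the TTL mimics key expiry.
class ConversationStore:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # phone -> (expires_at, [turns])

    def append_turn(self, phone, role, text):
        now = time.time()
        expires_at, turns = self._store.get(phone, (0, []))
        if now > expires_at:          # expired session: start fresh
            turns = []
        turns.append({"role": role, "text": text})
        self._store[phone] = (now + self.ttl, turns)

    def context(self, phone):
        """Turn history the AI worker prepends to each prompt."""
        now = time.time()
        expires_at, turns = self._store.get(phone, (0, []))
        return turns if now <= expires_at else []

store = ConversationStore()
store.append_turn("+15550001111", "user", "What is Kafka?")
store.append_turn("+15550001111", "assistant", "A distributed event log.")
print(len(store.context("+15550001111")))  # 2: both turns survive across SMS messages
```

Because every SMS arrives as an independent webhook, this per-number history is what makes multi-turn conversation possible at all over a stateless channel.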

Impact

  • Stateful conversation handling with Redis-backed caching, zero context loss across SMS turns
  • Asynchronous Kafka pipelines reduced redundant LLM calls significantly
  • Cross-language processing: Java gateway + Python AI workers on same Kafka bus
  • Deployable on minimal infrastructure, accessible in low-bandwidth environments
Spring Boot · Apache Kafka · Redis · Twilio · Python · Java · Ollama · AWS Bedrock · Docker

LLM Inference Platform

open-source

Hybrid model orchestration: local + managed cloud models

Aug 2024 – Dec 2024
2
LLM Backends
gRPC
Unified API
Multi-turn
Context

Problem

Teams want to use both local (private) and cloud LLMs depending on query sensitivity and cost. Switching between models requires different APIs and destroys conversation context.

Solution

Built a hybrid LLM orchestration layer integrating local (Ollama) and managed (AWS Bedrock) models via a unified gRPC API. Persistent multi-turn conversation storage with seamless model switching.

Architecture

gRPC API → Orchestrator → {Ollama (local) | AWS Bedrock (cloud)} → PostgreSQL (conversation store) → EC2 deployment (ECR, S3, Lambda)
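The zero-client-change switching comes from a provider abstraction behind the unified API. A minimal sketch, with both backends stubbed (the real ones would wrap the Ollama HTTP API and the Bedrock runtime client; all names here are illustrative):

```python
from abc import ABC, abstractmethod

# Illustrative provider abstraction: clients see one interface,
# and the backend is chosen per request.
class ModelBackend(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OllamaBackend(ModelBackend):
    def generate(self, prompt):
        return f"[local] {prompt}"    # stub for the local Ollama call

class BedrockBackend(ModelBackend):
    def generate(self, prompt):
        return f"[cloud] {prompt}"    # stub for the managed Bedrock call

class Orchestrator:
    def __init__(self, backends):
        self.backends = backends

    def chat(self, prompt, tier="local"):
        # Routing on sensitivity/cost tier; the client code never changes.
        return self.backends[tier].generate(prompt)

orch = Orchestrator({"local": OllamaBackend(), "cloud": BedrockBackend()})
print(orch.chat("summarize this", tier="local"))   # [local] summarize this
print(orch.chat("summarize this", tier="cloud"))   # [cloud] summarize this
```

Keeping the interface identical across tiers is what lets conversation context persist through a model switch: the conversation store never needs to know which backend answered.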

Impact

  • Unified gRPC API abstracts model provider — zero client changes when switching Ollama ↔ Bedrock
  • Persistent multi-turn conversation storage with PostgreSQL
  • Containerized deployment on EC2 via ECR and S3
  • Benchmarked latency and cost tradeoffs across model tiers
Python · gRPC · Ollama · AWS Bedrock · AWS Lambda · EC2 · ECR · S3 · PostgreSQL · Docker

TacoDB — Relational Database Engine

open-source

B-Tree indexing, buffer pool management, and query operators from scratch

Jan 2025 – Apr 2025
O(log n)
Index Lookup
Multi-M
Tuple Scale
4
Join Types

Problem

Understanding database internals deeply — how real databases handle storage, indexing, and query execution at the systems level.

Solution

Engineered a relational database engine from scratch: B-Tree indexing, clock-based buffer pool management, and modular storage architecture. Implemented core query operators including Merge Join, Index Loop Join, Aggregation, and External Sort.

Architecture

SQL Parser → Query Planner → Operators (Merge Join, Index Loop Join, Aggregation, Sort) → Buffer Pool (clock eviction) → B-Tree Index → Disk I/O (page-based storage)
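The clock (second-chance) eviction at the heart of the buffer pool fits in a short sketch. This is a simplified Python illustration of the policy, not TacoDB's C++ code: each page gets a reference bit on access, and the clock hand clears bits until it finds a cold frame to evict.

```python
# Simplified clock (second-chance) eviction policy.
class ClockBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []     # page ids in frame order
        self.ref = {}        # page id -> reference bit
        self.hand = 0

    def access(self, page_id):
        if page_id in self.ref:               # hit: set reference bit
            self.ref[page_id] = True
            return "hit"
        if len(self.frames) < self.capacity:  # free frame available
            self.frames.append(page_id)
            self.ref[page_id] = True
            return "miss"
        # Sweep: clear reference bits until a cold page is found.
        while self.ref[self.frames[self.hand]]:
            self.ref[self.frames[self.hand]] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand]
        del self.ref[victim]
        self.frames[self.hand] = page_id
        self.ref[page_id] = True
        self.hand = (self.hand + 1) % self.capacity
        return f"evicted {victim}"

pool = ClockBufferPool(capacity=2)
pool.access(1); pool.access(2)
print(pool.access(3))   # "evicted 1": the sweep clears both bits, then takes frame 0
print(pool.access(2))   # "hit": page 2 survived on its second chance
```

The appeal of clock over strict LRU is that a hit only flips a bit, with no list reordering, which matters when every tuple read goes through the pool.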

Impact

  • B-Tree indexing with O(log n) lookup — validated on multi-million tuple workloads
  • Clock-based buffer pool for efficient memory management with configurable page sizes
  • Full query operator suite: Merge Join, Index Loop Join, Aggregation, External Sort
  • Tested with GoogleTest and gdb — validated correctness at scale
C++ · B-Tree · Buffer Pool · Merge Join · External Sort · GoogleTest · gdb

Distributed ML Training Pipeline

open-source

LLM encoding, embedding & semantic analysis on Hadoop + AWS EMR

Sep 2024 – Dec 2024
65GB+
Corpus Size
EMR
Cluster
Parallel
Training
DL4J
Framework

Problem

Processing and training ML models on a 65GB+ text corpus on a single machine was infeasible: memory constraints, serial execution, and unpredictable runtimes blocked experimentation at scale.

Solution

Built a distributed ML training pipeline on AWS EMR using Hadoop and Spark for large-scale text processing and embedding generation. Used DeepLearning4j for parallelized model training with integrated metrics tracking. Optimized data partitioning and execution plans for predictable cluster-level performance.

Architecture

HDFS (65GB+ corpus) → Spark ETL (partitioned text processing) → DeepLearning4j (parallelized training) → Embedding generation → S3 (model artifacts + metrics)

Impact

  • Processed a 65GB+ text corpus with predictable, linearly-scaling cluster performance
  • Parallelized model training across AWS EMR cluster — eliminated serial bottlenecks
  • Integrated metrics tracking and management via DeepLearning4j parameter server
  • Optimized Spark data partitioning strategy for minimal shuffle overhead
Java · Apache Spark · Hadoop · HDFS · AWS EMR · DeepLearning4j · S3
System Design

Deep Dives

How I think about designing systems at scale: the trade-offs, key decisions, and lessons learned from production at Jeevansathi.

Case Study · Production

Subscription Platform at 5M+ Users

How we scaled Jeevansathi's billing infrastructure and drove 35% revenue growth

The Challenge

Design a subscription management system that handles 5M+ concurrent users, integrates with Apple IAP and BillDesk, ensures fault-tolerant billing with exactly-once semantics, and maintains high throughput under peak load, all while supporting complex recurring billing rules across subscription tiers.

Design Principles

Event-Driven Architecture

Subscription lifecycle events (created, upgraded, cancelled, expired, renewed) are published to Kafka topics. Downstream services (notifications, analytics, CRM) subscribe independently, giving fully decoupled fanout.
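The pattern, reduced to its essentials in an in-process sketch (plain callbacks stand in for Kafka consumer groups; topic and field names are illustrative): one publish, N independent subscribers, and the publisher never knows who is listening.

```python
from collections import defaultdict

# In-process stand-in for topic-based fanout. Real consumers would be
# Kafka consumer groups with their own offsets and retry behavior.
class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:   # each consumer is independent
            handler(event)

bus = EventBus()
notified, analytics = [], []
bus.subscribe("subscription.events", notified.append)
bus.subscribe("subscription.events", analytics.append)

# The subscription service publishes once; every downstream sees the event.
bus.publish("subscription.events", {"type": "renewed", "user_id": 42})
print(notified, analytics)
```

What Kafka adds over this toy version is exactly what the case study relies on: durability, replay, and the ability to add a seventh consumer without touching the publisher.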

Layered Caching Strategy

Redis for hot subscription state (sub-millisecond reads). Aerospike as the persistent fast store for write-heavy workloads. PostgreSQL as the audit log. This three-tier approach delivered the 42% throughput gain.
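A read-through sketch of the three tiers, with dicts standing in for Redis, Aerospike, and PostgreSQL (illustrative only; the production path also handles writes, TTLs, and invalidation):

```python
# Illustrative three-tier read path: hot cache -> fast persistent store
# -> system of record, promoting the record upward on each miss.
class TieredSubscriptionStore:
    def __init__(self):
        self.redis, self.aerospike, self.postgres = {}, {}, {}

    def get(self, user_id):
        if user_id in self.redis:                  # tier 1: hot, sub-ms
            return self.redis[user_id], "redis"
        if user_id in self.aerospike:              # tier 2: persistent fast store
            self.redis[user_id] = self.aerospike[user_id]   # promote
            return self.redis[user_id], "aerospike"
        row = self.postgres[user_id]               # tier 3: audit log / truth
        self.aerospike[user_id] = row              # populate both caches
        self.redis[user_id] = row
        return row, "postgres"

store = TieredSubscriptionStore()
store.postgres[7] = {"tier": "premium", "status": "active"}
print(store.get(7)[1])   # "postgres": cold read, warms both tiers
print(store.get(7)[1])   # "redis": subsequent hot read
```

The throughput win comes from the second call: once a subscription record is hot, the database never sees that read again until the entry is invalidated.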

JWT + Asymmetric Key Billing

Apple and BillDesk integrations use JWT-based authentication with asymmetric key exchange for webhook verification. This prevents replay attacks and ensures billing events are cryptographically authenticated.

Key Technical Decisions

Kafka for subscription events instead of direct service calls

Why: 6+ downstream consumers (notifications, analytics, CRM, fraud detection) need subscription events. Kafka provides durable fanout, replay capability, and consumer group isolation — impossible with synchronous calls.

Trade-off: Adds operational complexity of Kafka cluster management and eventual consistency between services.

Aerospike over Redis for persistent billing state

Why: Redis is volatile by default and expensive at 5M-record scale. Aerospike provides Redis-like sub-millisecond latency with native persistence, multi-GB capacity, and secondary index support.

Trade-off: Aerospike has steeper learning curve and more complex operational runbooks vs. Redis.

Asymmetric keys for Apple IAP webhook verification

Why: Apple sends billing webhooks with signed JWTs. Using asymmetric verification (public key from Apple's JWKS endpoint) eliminates shared-secret rotation risk and prevents billing event forgery.

Trade-off: Requires periodic public key refresh and adds latency for JWKS endpoint calls (mitigated with caching).

Scale Numbers

5M+
Active users
2M+
Daily billing events
42%
Throughput improvement
35%
Revenue growth
2×
Premium sub growth
99.99%
System uptime

Lessons Learned

01.

Billing idempotency is non-negotiable: we caught 3 duplicate charge scenarios in staging with chaos testing that would have been real customer incidents in production.

02.

Apple IAP webhook ordering is not guaranteed. Building idempotent handlers that process events out-of-order was critical.
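The shape of such a handler, sketched in Python with illustrative field names: dedupe on event id, then apply an event only if it is newer than the state already held, so a late-arriving earlier event can never clobber a later one.

```python
# Illustrative idempotent, order-tolerant webhook handler. Field names
# ("id", "ts", "status") are hypothetical, not Apple's actual payload.
class SubscriptionState:
    def __init__(self):
        self.status = "none"
        self.last_event_ts = 0
        self.seen_event_ids = set()

    def handle(self, event):
        if event["id"] in self.seen_event_ids:    # duplicate delivery
            return "skipped: duplicate"
        self.seen_event_ids.add(event["id"])
        if event["ts"] <= self.last_event_ts:     # older event arrived late
            return "skipped: stale"
        self.status = event["status"]
        self.last_event_ts = event["ts"]
        return f"applied: {self.status}"

state = SubscriptionState()
print(state.handle({"id": "e2", "ts": 200, "status": "cancelled"}))  # applied: cancelled
print(state.handle({"id": "e1", "ts": 100, "status": "renewed"}))    # skipped: stale
print(state.handle({"id": "e2", "ts": 200, "status": "cancelled"}))  # skipped: duplicate
```

The key property is that any delivery order and any number of retries converge to the same final state, which is what makes the handler safe against both duplicates and reordering.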

03.

MTTR improvement (a 60% reduction via Prometheus/Grafana) delivered more ROI per engineer-hour than almost any other infrastructure investment.

Java · Spring Boot · Kafka · Aerospike · Redis · PostgreSQL · Prometheus · Grafana · JWT · Apple IAP · BillDesk · Kubernetes
Tech Stack

Tools of the Trade

A full-stack view of my engineering toolkit — from distributed systems primitives to cloud infrastructure.

Expert
Proficient
Familiar

Languages

Java
Expert
Python
Expert
Scala
Proficient
C++
Proficient
Go
Familiar
TypeScript
Proficient

Frameworks & APIs

Spring Boot
Expert
Kafka / Kafka Streams
Expert
gRPC
Proficient
RabbitMQ
Proficient
Akka
Familiar
React
Proficient

Data & Search

Elasticsearch
Expert
Redis
Expert
PostgreSQL / MySQL
Expert
Aerospike
Expert
MongoDB / Cassandra
Proficient
Solr
Proficient

Cloud & Infra

AWS (EC2, S3, Lambda, RDS)
Proficient
Docker
Expert
Kubernetes
Proficient
Apache Spark
Proficient
Hadoop / Airflow
Proficient
Linux
Expert

Observability & ML

Prometheus
Proficient
Grafana
Proficient
XGBoost / Collaborative Filtering
Proficient
Ollama / AWS Bedrock
Familiar
PySpark
Proficient
LLM Orchestration
Familiar

Frontend

React
Proficient
Next.js
Proficient
TypeScript
Proficient
Tailwind CSS
Proficient
Framer Motion
Familiar
Streamlit
Proficient

Currently exploring

ClickHouse · Flink · Go · LLM Infrastructure · Ray
Research

Publication

Peer-reviewed research on deep learning applied to medical imaging.

Unlocking COVID-19 Patterns: Exploring Deep Learning Models for Precise Recognition and Classification of CT Images

View Paper
International Journal of Science and Research Archive · Jul 28, 2023 · DOI: 10.30574/ijsra.2023.9.2.0597

Proposed three deep CNN architectures (AlexNet, InceptionV3, VGG19) for COVID-19 diagnosis from CT scan images using the HUST-19 dataset (13,980 images). InceptionV3 achieved 99.95% test accuracy with precision, recall, and F1-score of 1.0 — demonstrating the potential of deep learning for rapid, reliable COVID-19 classification.

InceptionV3: 99.95% Accuracy · 13,980 CT Scan Images · Peer Reviewed
Community

Giving Back

Co-founding an NGO, building for social impact, and teaching digital skills to underserved communities.

Co-Founder & Developer

Website
Magadh Mission FoundationApr 2020 – Jul 2024 · 4 yrs 4 mos

Co-founded a nonprofit focused on digital inclusion. Built and maintained the organization's website for outreach and engagement. Volunteered time teaching underprivileged children basic computer skills, hygiene awareness, and digital literacy in Delhi.

Contact

Let's Build Together

I'm open to backend roles focused on distributed systems, platform engineering, or data infrastructure. If you're building something hard — I'd love to hear about it.

Chicago, IL · Open to opportunities
Download Resume
M
monu.dev
Built with Next.js · Tailwind · Framer Motion
© 2026 Monu Kumar