Available for new opportunities


Monu Kumar

I design and build high-throughput backend systems that serve millions of users, from real-time personalization engines to distributed subscription platforms and AI gateways. I obsess over latency, reliability, and elegant architecture.

5M+
Users Served
42%
Throughput Gain
99.99%
System Uptime
10×
Latency Reduction
Java / Scala · Apache Kafka · Distributed Systems
Experience

Where I've Shipped

3+ years building production systems at scale across India's largest matrimonial platform and graduate research at UIC.

01
Info Edge — Jeevansathi.com

2 roles · Jun 2021 – Jul 2024

Senior Software Engineer

Full-time
Apr 2022 – Jul 2024 · Noida, India

Led backend engineering for Jeevansathi.com, one of India's largest matrimonial platforms. Owned the distributed subscription platform, billing infrastructure, and personalization services at 5M+ user scale.

  • Architected scalable distributed microservices for a subscription platform serving 5M+ users, achieving 42% throughput improvement and 35% revenue growth via caching and load-balanced services
  • Designed fault-tolerant recurring billing integrations with Apple and BillDesk using JWT-based authentication and asymmetric key exchange, driving 2× growth in premium subscriptions within a quarter
  • Built low-latency personalization and ranking services using Collaborative Filtering and XGBoost, processing 2M+ prospects daily, increasing premium conversions by 8% and sales efficiency by 15%
  • Established production observability and alerting (Prometheus, Grafana), reducing MTTR by 60% and sustaining 99.99% uptime while supporting on-call incident response
  • Mentored 3 junior engineers and led backend microservices with TDD, unit/integration tests, and documentation
5M+ Users
42% Throughput ↑
35% Revenue ↑
2× Premium Subs
Java · Spring Boot · Kafka · Redis · Aerospike · Elasticsearch · XGBoost · Prometheus · Grafana · AWS · Docker · Kubernetes

Software Engineer

Full-time
Jun 2021 – Mar 2022 · Noida, India

Built core search, notification, and CRM infrastructure powering daily matchmaking for millions of users. Led a major search migration from Solr to Elasticsearch and re-architected the order management system.

  • Migrated from Solr to Elasticsearch, enabling near-real-time indexing at 6M+ reads/day (1000+ TPS)
  • Reduced P99 search latency from 6000+ ms to 600 ms via shard reconfiguration and query optimization, a 10× improvement
  • Built an event-driven notification service (Kafka, APNS, FCM) for real-time and scheduled alerts with Azkaban
  • Designed RESTful APIs and a React-based full-stack CRM platform serving 200K+ DAUs for subscription lifecycle and payment management
  • Re-architected order workflow from PHP monolith to Spring Boot microservices using RabbitMQ and Redis
6M+ Reads/Day
1000+ TPS
10× Latency ↓
200K+ DAUs
Java · Spring Boot · Elasticsearch · Solr · Kafka · RabbitMQ · Redis · React · PostgreSQL · Azkaban · APNS · FCM

Graduate Teaching Assistant

Part-time
Jun 2025 – Present · Chicago, IL

TA for graduate-level data engineering courses. Designed hands-on labs, led applied engineering projects, and mentored students on distributed data pipelines, LLM workflows, and cloud infrastructure.

  • Designed hands-on labs in PySpark and Apache Airflow, enabling distributed, fault-tolerant ETL data pipelines
  • Led applied engineering projects integrating AWS RDS, Python web scraping, and LLM-based ingestion workflows processing 10K+ records per batch
10K+ Records/Batch
2 Courses
PySpark · Apache Airflow · AWS RDS · Python · LLMs · ETL · PostgreSQL
Education

University of Illinois Chicago

Master of Science, Computer Science

Aug 2024 – May 2026 (Expected) · Chicago, IL
GPA: 3.86 / 4.0

National Institute of Technology Raipur

Bachelor of Technology, Information Technology

Aug 2017 – May 2021 · Raipur, India
GPA: 8.63 / 10
Projects

What I've Built

Case studies in distributed systems, real-time data pipelines, and high-scale infrastructure. Not just code — engineered for impact.

AI over SMS — Distributed AI Gateway

open-source

Event-driven system enabling offline AI access via SMS

Aug 2025 – Jan 2026
0ms
Context Loss
Async
LLM Pipeline
Multi-lang
Java + Python

Problem

Billions of people lack reliable internet access but still need AI capabilities. Traditional AI interfaces require stable HTTP connections, making them unusable in low-connectivity regions.

Solution

Designed an event-driven gateway that routes AI queries over SMS using Spring Boot, Kafka, Redis, and Twilio. Stateful conversations are maintained in Redis. Asynchronous Kafka pipelines enable cross-language (Java/Python) AI processing.

Architecture

SMS (Twilio) → Spring Boot Gateway → Kafka → Python AI Worker (Ollama/Bedrock) → Redis (conversation state) → Response back via Twilio SMS
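As a rough sketch of the stateful-conversation piece, here is the session logic simulated with an in-memory dict (class name and TTL are illustrative, not the production code); the real gateway would keep this in Redis, keyed by phone number, with a key expiry:

```python
import time

# Illustrative stand-in for the Redis-backed conversation store.
# A dict plays the role of Redis; the TTL mimics key expiry.
class ConversationStore:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self._store = {}  # phone -> (expires_at, [turns])

    def append_turn(self, phone, role, text):
        now = time.time()
        expires_at, turns = self._store.get(phone, (0, []))
        if now > expires_at:          # expired session: start fresh
            turns = []
        turns.append({"role": role, "text": text})
        self._store[phone] = (now + self.ttl, turns)

    def context(self, phone):
        """Turn history the AI worker prepends to each prompt."""
        now = time.time()
        expires_at, turns = self._store.get(phone, (0, []))
        return turns if now <= expires_at else []

store = ConversationStore()
store.append_turn("+15550001111", "user", "What is Kafka?")
store.append_turn("+15550001111", "assistant", "A distributed event log.")
print(len(store.context("+15550001111")))  # 2: both turns survive across SMS messages
```

Because every SMS arrives as an independent webhook, this per-number history is what makes multi-turn conversation possible at all over a stateless channel.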

Impact

  • Stateful conversation handling with Redis-backed caching, zero context loss across SMS turns
  • Asynchronous Kafka pipelines reduced redundant LLM calls significantly
  • Cross-language processing: Java gateway + Python AI workers on same Kafka bus
  • Deployable on minimal infrastructure, accessible in low-bandwidth environments
Spring Boot · Apache Kafka · Redis · Twilio · Python · Java · Ollama · AWS Bedrock · Docker

LLM Inference Platform

open-source

Hybrid model orchestration: local + managed cloud models

Aug 2024 – Dec 2024
2
LLM Backends
gRPC
Unified API
Multi-turn
Context

Problem

Teams want to use both local (private) and cloud LLMs depending on query sensitivity and cost. Switching between models requires different APIs and destroys conversation context.

Solution

Built a hybrid LLM orchestration layer integrating local (Ollama) and managed (AWS Bedrock) models via a unified gRPC API. Persistent multi-turn conversation storage with seamless model switching.

Architecture

gRPC API → Orchestrator → {Ollama (local) | AWS Bedrock (cloud)} → PostgreSQL (conversation store) → EC2 deployment (ECR, S3, Lambda)
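The zero-client-change switching comes from a provider abstraction behind the unified API. A minimal sketch, with both backends stubbed (the real ones would wrap the Ollama HTTP API and the Bedrock runtime client; all names here are illustrative):

```python
from abc import ABC, abstractmethod

# Illustrative provider abstraction: clients see one interface,
# and the backend is chosen per request.
class ModelBackend(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...

class OllamaBackend(ModelBackend):
    def generate(self, prompt):
        return f"[local] {prompt}"    # stub for the local Ollama call

class BedrockBackend(ModelBackend):
    def generate(self, prompt):
        return f"[cloud] {prompt}"    # stub for the managed Bedrock call

class Orchestrator:
    def __init__(self, backends):
        self.backends = backends

    def chat(self, prompt, tier="local"):
        # Routing on sensitivity/cost tier; the client code never changes.
        return self.backends[tier].generate(prompt)

orch = Orchestrator({"local": OllamaBackend(), "cloud": BedrockBackend()})
print(orch.chat("summarize this", tier="local"))   # [local] summarize this
print(orch.chat("summarize this", tier="cloud"))   # [cloud] summarize this
```

Keeping the interface identical across tiers is what lets conversation context persist through a model switch: the conversation store never needs to know which backend answered.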

Impact

  • Unified gRPC API abstracts model provider — zero client changes when switching Ollama ↔ Bedrock
  • Persistent multi-turn conversation storage with PostgreSQL
  • Containerized deployment on EC2 via ECR and S3
  • Benchmarked latency and cost tradeoffs across model tiers
Python · gRPC · Ollama · AWS Bedrock · AWS Lambda · EC2 · ECR · S3 · PostgreSQL · Docker

TacoDB — Relational Database Engine

open-source

B-Tree indexing, buffer pool management, and query operators from scratch

Jan 2025 – Apr 2025
O(log n)
Index Lookup
Multi-M
Tuple Scale
4
Join Types

Problem

Understanding database internals deeply — how real databases handle storage, indexing, and query execution at the systems level.

Solution

Engineered a relational database engine from scratch: B-Tree indexing, clock-based buffer pool management, and modular storage architecture. Implemented core query operators including Merge Join, Index Loop Join, Aggregation, and External Sort.

Architecture

SQL Parser → Query Planner → Operators (Merge Join, Index Loop Join, Aggregation, Sort) → Buffer Pool (clock eviction) → B-Tree Index → Disk I/O (page-based storage)
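The clock (second-chance) eviction at the heart of the buffer pool fits in a short sketch. This is a simplified Python illustration of the policy, not TacoDB's C++ code: each page gets a reference bit on access, and the clock hand clears bits until it finds a cold frame to evict.

```python
# Simplified clock (second-chance) eviction policy.
class ClockBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = []     # page ids in frame order
        self.ref = {}        # page id -> reference bit
        self.hand = 0

    def access(self, page_id):
        if page_id in self.ref:               # hit: set reference bit
            self.ref[page_id] = True
            return "hit"
        if len(self.frames) < self.capacity:  # free frame available
            self.frames.append(page_id)
            self.ref[page_id] = True
            return "miss"
        # Sweep: clear reference bits until a cold page is found.
        while self.ref[self.frames[self.hand]]:
            self.ref[self.frames[self.hand]] = False
            self.hand = (self.hand + 1) % self.capacity
        victim = self.frames[self.hand]
        del self.ref[victim]
        self.frames[self.hand] = page_id
        self.ref[page_id] = True
        self.hand = (self.hand + 1) % self.capacity
        return f"evicted {victim}"

pool = ClockBufferPool(capacity=2)
pool.access(1); pool.access(2)
print(pool.access(3))   # "evicted 1": the sweep clears both bits, then takes frame 0
print(pool.access(2))   # "hit": page 2 survived on its second chance
```

The appeal of clock over strict LRU is that a hit only flips a bit, with no list reordering, which matters when every tuple read goes through the pool.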

Impact

  • B-Tree indexing with O(log n) lookup — validated on multi-million tuple workloads
  • Clock-based buffer pool for efficient memory management with configurable page sizes
  • Full query operator suite: Merge Join, Index Loop Join, Aggregation, External Sort
  • Tested with GoogleTest and gdb — validated correctness at scale
C++ · B-Tree · Buffer Pool · Merge Join · External Sort · GoogleTest · gdb

Distributed ML Training Pipeline

open-source

LLM encoding, embedding & semantic analysis on Hadoop + AWS EMR

Sep 2024 – Dec 2024
65GB+
Corpus Size
EMR
Cluster
Parallel
Training
DL4J
Framework

Problem

Processing and training ML models on a 65GB+ text corpus on a single machine was infeasible: memory constraints, serial execution, and unpredictable runtimes blocked experimentation at scale.

Solution

Built a distributed ML training pipeline on AWS EMR using Hadoop and Spark for large-scale text processing and embedding generation. Used DeepLearning4j for parallelized model training with integrated metrics tracking. Optimized data partitioning and execution plans for predictable cluster-level performance.

Architecture

HDFS (65GB+ corpus) → Spark ETL (partitioned text processing) → DeepLearning4j (parallelized training) → Embedding generation → S3 (model artifacts + metrics)

Impact

  • Processed a 65GB+ text corpus with predictable, linearly-scaling cluster performance
  • Parallelized model training across AWS EMR cluster — eliminated serial bottlenecks
  • Integrated metrics tracking and management via DeepLearning4j parameter server
  • Optimized Spark data partitioning strategy for minimal shuffle overhead
Java · Apache Spark · Hadoop · HDFS · AWS EMR · DeepLearning4j · S3
System Design

Deep Dives

How I think about designing systems at scale: the trade-offs, key decisions, and lessons learned from production at Jeevansathi.

Case Study · Production

Subscription Platform at 5M+ Users

How we scaled Jeevansathi's billing infrastructure and drove 35% revenue growth

The Challenge

Design a subscription management system that handles 5M+ concurrent users, integrates with Apple IAP and BillDesk, ensures fault-tolerant billing with exactly-once semantics, and maintains high throughput under peak load, all while supporting complex recurring billing rules across subscription tiers.

Design Principles

Event-Driven Architecture

Subscription lifecycle events (created, upgraded, cancelled, expired, renewed) are published to Kafka topics. Downstream services (notifications, analytics, CRM) subscribe independently, giving fully decoupled fanout.
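The pattern, reduced to its essentials in an in-process sketch (plain callbacks stand in for Kafka consumer groups; topic and field names are illustrative): one publish, N independent subscribers, and the publisher never knows who is listening.

```python
from collections import defaultdict

# In-process stand-in for topic-based fanout. Real consumers would be
# Kafka consumer groups with their own offsets and retry behavior.
class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:   # each consumer is independent
            handler(event)

bus = EventBus()
notified, analytics = [], []
bus.subscribe("subscription.events", notified.append)
bus.subscribe("subscription.events", analytics.append)

# The subscription service publishes once; every downstream sees the event.
bus.publish("subscription.events", {"type": "renewed", "user_id": 42})
print(notified, analytics)
```

What Kafka adds over this toy version is exactly what the case study relies on: durability, replay, and the ability to add a seventh consumer without touching the publisher.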

Layered Caching Strategy

Redis for hot subscription state (sub-millisecond reads). Aerospike as the persistent fast store for write-heavy workloads. PostgreSQL as the audit log. This three-tier approach delivered the 42% throughput gain.
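A read-through sketch of the three tiers, with dicts standing in for Redis, Aerospike, and PostgreSQL (illustrative only; the production path also handles writes, TTLs, and invalidation):

```python
# Illustrative three-tier read path: hot cache -> fast persistent store
# -> system of record, promoting the record upward on each miss.
class TieredSubscriptionStore:
    def __init__(self):
        self.redis, self.aerospike, self.postgres = {}, {}, {}

    def get(self, user_id):
        if user_id in self.redis:                  # tier 1: hot, sub-ms
            return self.redis[user_id], "redis"
        if user_id in self.aerospike:              # tier 2: persistent fast store
            self.redis[user_id] = self.aerospike[user_id]   # promote
            return self.redis[user_id], "aerospike"
        row = self.postgres[user_id]               # tier 3: audit log / truth
        self.aerospike[user_id] = row              # populate both caches
        self.redis[user_id] = row
        return row, "postgres"

store = TieredSubscriptionStore()
store.postgres[7] = {"tier": "premium", "status": "active"}
print(store.get(7)[1])   # "postgres": cold read, warms both tiers
print(store.get(7)[1])   # "redis": subsequent hot read
```

The throughput win comes from the second call: once a subscription record is hot, the database never sees that read again until the entry is invalidated.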

JWT + Asymmetric Key Billing

Apple and BillDesk integrations use JWT-based authentication with asymmetric key exchange for webhook verification. This prevents replay attacks and ensures billing events are cryptographically authenticated.

Key Technical Decisions

Kafka for subscription events instead of direct service calls

Why: 6+ downstream consumers (notifications, analytics, CRM, fraud detection) need subscription events. Kafka provides durable fanout, replay capability, and consumer group isolation — impossible with synchronous calls.

Trade-off: Adds operational complexity of Kafka cluster management and eventual consistency between services.

Aerospike over Redis for persistent billing state

Why: Redis is volatile by default and expensive at 5M-record scale. Aerospike provides Redis-like sub-millisecond latency with native persistence, multi-GB capacity, and secondary index support.

Trade-off: Aerospike has steeper learning curve and more complex operational runbooks vs. Redis.

Asymmetric keys for Apple IAP webhook verification

Why: Apple sends billing webhooks with signed JWTs. Using asymmetric verification (public key from Apple's JWKS endpoint) eliminates shared-secret rotation risk and prevents billing event forgery.

Trade-off: Requires periodic public key refresh and adds latency for JWKS endpoint calls (mitigated with caching).

Scale Numbers

5M+
Active users
2M+
Daily billing events
42%
Throughput improvement
35%
Revenue growth
2×
Premium sub growth
99.99%
System uptime

Lessons Learned

01.

Billing idempotency is non-negotiable: we caught 3 duplicate charge scenarios in staging with chaos testing that would have been real customer incidents in production.

02.

Apple IAP webhook ordering is not guaranteed. Building idempotent handlers that process events out-of-order was critical.
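The shape of such a handler, sketched in Python with illustrative field names: dedupe on event id, then apply an event only if it is newer than the state already held, so a late-arriving earlier event can never clobber a later one.

```python
# Illustrative idempotent, order-tolerant webhook handler. Field names
# ("id", "ts", "status") are hypothetical, not Apple's actual payload.
class SubscriptionState:
    def __init__(self):
        self.status = "none"
        self.last_event_ts = 0
        self.seen_event_ids = set()

    def handle(self, event):
        if event["id"] in self.seen_event_ids:    # duplicate delivery
            return "skipped: duplicate"
        self.seen_event_ids.add(event["id"])
        if event["ts"] <= self.last_event_ts:     # older event arrived late
            return "skipped: stale"
        self.status = event["status"]
        self.last_event_ts = event["ts"]
        return f"applied: {self.status}"

state = SubscriptionState()
print(state.handle({"id": "e2", "ts": 200, "status": "cancelled"}))  # applied: cancelled
print(state.handle({"id": "e1", "ts": 100, "status": "renewed"}))    # skipped: stale
print(state.handle({"id": "e2", "ts": 200, "status": "cancelled"}))  # skipped: duplicate
```

The key property is that any delivery order and any number of retries converge to the same final state, which is what makes the handler safe against both duplicates and reordering.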

03.

MTTR improvement (a 60% reduction via Prometheus/Grafana) delivered more ROI per engineer-hour than almost any other infrastructure investment.

Java · Spring Boot · Kafka · Aerospike · Redis · PostgreSQL · Prometheus · Grafana · JWT · Apple IAP · BillDesk · Kubernetes
Tech Stack

Tools of the Trade

A full-stack view of my engineering toolkit — from distributed systems primitives to cloud infrastructure.

Expert
Proficient
Familiar

Languages

Java
Expert
Python
Expert
Scala
Proficient
C++
Proficient
Go
Familiar
TypeScript
Proficient

Frameworks & APIs

Spring Boot
Expert
Kafka / Kafka Streams
Expert
gRPC
Proficient
RabbitMQ
Proficient
Akka
Familiar
React
Proficient

Data & Search

Elasticsearch
Expert
Redis
Expert
PostgreSQL / MySQL
Expert
Aerospike
Expert
MongoDB / Cassandra
Proficient
Solr
Proficient

Cloud & Infra

AWS (EC2, S3, Lambda, RDS)
Proficient
Docker
Expert
Kubernetes
Proficient
Apache Spark
Proficient
Hadoop / Airflow
Proficient
Linux
Expert

Observability & ML

Prometheus
Proficient
Grafana
Proficient
XGBoost / Collaborative Filtering
Proficient
Ollama / AWS Bedrock
Familiar
PySpark
Proficient
LLM Orchestration
Familiar

Frontend

React
Proficient
Next.js
Proficient
TypeScript
Proficient
Tailwind CSS
Proficient
Framer Motion
Familiar
Streamlit
Proficient

Currently exploring

ClickHouse · Flink · Go · LLM Infrastructure · Ray
Research

Publication

Peer-reviewed research on deep learning applied to medical imaging.

Unlocking COVID-19 Patterns: Exploring Deep Learning Models for Precise Recognition and Classification of CT Images

View Paper
International Journal of Science and Research Archive · Jul 28, 2023 · DOI: 10.30574/ijsra.2023.9.2.0597

Proposed three deep CNN architectures (AlexNet, InceptionV3, VGG19) for COVID-19 diagnosis from CT scan images using the HUST-19 dataset (13,980 images). InceptionV3 achieved 99.95% test accuracy with precision, recall, and F1-score of 1.0 — demonstrating the potential of deep learning for rapid, reliable COVID-19 classification.

InceptionV3: 99.95% Accuracy · 13,980 CT Scan Images · Peer Reviewed
Community

Giving Back

Co-founding an NGO, building for social impact, and teaching digital skills to underserved communities.

Co-Founder & Developer

Website
Magadh Mission FoundationApr 2020 – Jul 2024 · 4 yrs 4 mos

Co-founded a nonprofit focused on digital inclusion. Built and maintained the organization's website for outreach and engagement. Volunteered time teaching underprivileged children basic computer skills, hygiene awareness, and digital literacy in Delhi.

Contact

Let's Build Together

I'm open to backend roles focused on distributed systems, platform engineering, or data infrastructure. If you're building something hard — I'd love to hear about it.

Chicago, IL · Open to opportunities
Download Resume
M
monu.dev
Built with Next.js · Tailwind · Framer Motion
© 2026 Monu Kumar