MBA2026
final tech round
ctoat@9ai
Candidate Context
I have a B.Tech degree in CSE and am currently pursuing an M.Tech in CSE.
Interview Transcript
Q1
Can you explain a real-world system you have designed or would design, focusing on scalability and reliability?
Ans
Sure. I’ll explain the design of a Document Processing and Workflow Automation System, similar to what is used in mortgage processing or enterprise document ingestion pipelines.
At a high level, the system ingests documents (PDFs, images, emails), extracts structured data using AI models, validates the data against business rules, and pushes it to downstream systems like CRMs or databases.
1. High-Level Architecture
The system is divided into four major layers:
Ingestion Layer
Processing Layer
Validation & Business Logic Layer
Persistence & Integration Layer
Each layer is independently scalable and loosely coupled.
2. Ingestion Layer
This layer handles:
File uploads (via UI or API)
Email ingestion
SFTP or cloud storage ingestion
Design choices:
Use a REST API backed by a load balancer
Files are immediately stored in object storage (e.g., S3-compatible storage)
Metadata is saved in a database
A message is pushed to a message queue (Kafka / RabbitMQ / SQS)
Why this works well:
Upload is fast and non-blocking
Processing is asynchronous
System remains responsive even under heavy load
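The flow above can be sketched in a few lines. This is a minimal in-memory stand-in: `object_store`, `metadata_db`, and `message_queue` are hypothetical placeholders for S3-compatible storage, the metadata database, and Kafka/RabbitMQ/SQS respectively.

```python
import uuid
from collections import deque

# In-memory stand-ins for the real infrastructure
# (object storage, metadata DB, message queue).
object_store = {}
metadata_db = {}
message_queue = deque()

def handle_upload(filename: str, payload: bytes) -> str:
    """Accept an upload: persist the raw file, record metadata,
    enqueue a processing message, and return immediately."""
    doc_id = str(uuid.uuid4())
    object_store[doc_id] = payload                       # durable blob first
    metadata_db[doc_id] = {"filename": filename,
                           "status": "PENDING"}          # metadata row
    message_queue.append({"doc_id": doc_id})             # async hand-off
    return doc_id                                        # fast, non-blocking

doc_id = handle_upload("loan_application.pdf", b"%PDF-1.7 ...")
print(metadata_db[doc_id]["status"])  # PENDING until a worker picks it up
```

The key property is that the request path does no heavy work: the upload handler only writes the blob, records a row, and enqueues a message, so response time stays flat under load.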
3. Processing Layer
This is the most compute-heavy part.
Responsibilities:
OCR (text extraction)
Image preprocessing
AI/ML inference
Entity extraction
Design approach:
Workers are containerized (Docker)
Deployed on Kubernetes with horizontal pod autoscaling
Each worker consumes messages from the queue
Key optimizations:
Batch inference where possible
GPU scheduling for ML-heavy workloads
Retry logic with exponential backoff
If a worker fails mid-processing, the message is re-queued, ensuring at-least-once processing.
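The retry-with-backoff logic a worker wraps around each message can be sketched as below; `flaky_ocr` is a hypothetical handler used only to demonstrate a transient failure recovering on retry.

```python
import time

def process_with_retry(message, handler, max_attempts=4, base_delay=0.01):
    """Run handler(message); retry transient failures with exponential
    backoff. If every attempt fails, re-raise so the caller can
    re-queue the message (at-least-once delivery)."""
    for attempt in range(max_attempts):
        try:
            return handler(message)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # caller re-queues the message
            time.sleep(base_delay * (2 ** attempt))  # 0.01s, 0.02s, 0.04s...

# Demo: a handler that fails twice before succeeding.
calls = {"n": 0}
def flaky_ocr(message):
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient OCR failure")
    return f"extracted text from {message['doc_id']}"

result = process_with_retry({"doc_id": "doc-1"}, flaky_ocr)
print(result)  # extracted text from doc-1
```

Because delivery is at-least-once, handlers should also be idempotent: reprocessing the same `doc_id` twice must not corrupt downstream state.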
Q2
How would you optimize database performance as the system grows?
Ans
Database optimization needs to be addressed at multiple levels, not just indexing.
1. Query Optimization
Analyze slow queries using query plans
Avoid SELECT *
Use proper indexes based on access patterns
Introduce composite indexes where necessary
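The composite-index point can be made concrete with SQLite (stdlib `sqlite3`); the `documents` schema here is a hypothetical example, but `EXPLAIN QUERY PLAN` is real SQLite syntax that shows whether a query uses the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE documents (
    tenant_id INTEGER, doc_type TEXT, created_at TEXT, body TEXT)""")

# Composite index matching the access pattern:
# queries always filter by tenant first, then by document type.
conn.execute("CREATE INDEX idx_tenant_type ON documents (tenant_id, doc_type)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT created_at FROM documents WHERE tenant_id = ? AND doc_type = ?",
    (42, "mortgage")).fetchall()
print(plan[0][3])  # e.g. "SEARCH documents USING INDEX idx_tenant_type ..."
```

Note that column order in the composite index matters: `(tenant_id, doc_type)` also serves queries filtering on `tenant_id` alone, but not queries filtering on `doc_type` alone.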
2. Schema Design
Normalize where consistency is critical
Denormalize where read performance matters
Separate hot and cold data
3. Scaling Strategy
Vertical scaling initially
Read replicas for heavy read workloads
Sharding by logical tenant or document type when needed
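Tenant-based sharding comes down to a deterministic routing function; a minimal sketch, assuming a fixed shard count (resharding would need consistent hashing, which is out of scope here):

```python
import hashlib

NUM_SHARDS = 4  # hypothetical; fixed for this sketch

def shard_for(tenant_id: str) -> int:
    """Deterministically map a tenant to a shard. All of a tenant's
    documents land on the same shard, so per-tenant queries stay local."""
    digest = hashlib.sha256(tenant_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS

print(shard_for("tenant-acme"), shard_for("tenant-acme"))  # same shard twice
```

Using a hash of the tenant ID rather than a range avoids hot shards when tenant IDs are assigned sequentially.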
4. Caching
Redis for frequently accessed metadata
Cache invalidation via events
Time-based expiry for derived data
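The combination of event-driven invalidation and time-based expiry can be sketched as a small cache-aside class; in production Redis (`SETEX`/`DEL`) plays this role, but the semantics are the same.

```python
import time

class TTLCache:
    """Cache-aside with time-based expiry for derived data,
    plus explicit invalidation for event-driven updates."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:  # expired: treat as a miss
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def invalidate(self, key):
        """Called from an event handler, e.g. on document-updated."""
        self._store.pop(key, None)

cache = TTLCache(ttl_seconds=30.0)
cache.set("meta:doc-1", {"pages": 3})
print(cache.get("meta:doc-1"))  # {'pages': 3}
```

The split of duties matters: events invalidate data that changed, while the TTL bounds staleness for derived data whose source events may be missed.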
5. Async Writes
Offload non-critical writes to background workers
Use eventual consistency for analytics data
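Offloading non-critical writes can be sketched with a queue and a background thread; `analytics_log` stands in for the analytics store, and the `join()` call here exists only so the demo can observe the result (request handlers would never wait).

```python
import queue
import threading

analytics_log = []          # stand-in for the analytics store
write_queue = queue.Queue()

def background_writer():
    """Drain non-critical writes off the request path.
    Analytics rows become visible eventually, not synchronously."""
    while True:
        record = write_queue.get()
        if record is None:   # shutdown sentinel
            break
        analytics_log.append(record)
        write_queue.task_done()

worker = threading.Thread(target=background_writer, daemon=True)
worker.start()

# The request path only enqueues and moves on.
write_queue.put({"event": "document_processed", "doc_id": "doc-1"})
write_queue.join()           # demo only: wait so we can inspect the result
print(analytics_log)
```

The trade-off is explicit: the request path never blocks on the analytics write, at the cost of a window during which the analytics view lags the source of truth.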
Q3
How do you ensure clean and maintainable code in a large team?
Ans
I focus on process, tooling, and discipline.
1. Code Standards
Enforced linting and formatting
Shared coding guidelines
Consistent folder structure
2. Testing Strategy
Unit tests for core logic
Integration tests for workflows
Contract tests for APIs
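As a minimal illustration of the unit-test layer, here is a hypothetical `extract_amount` parser (invented for this example) with a plain-assert test; in practice the same function would run under pytest.

```python
import re

def extract_amount(text: str) -> float:
    """Pull a dollar amount like '$1,250.00' out of free text."""
    match = re.search(r"\$([\d,]+(?:\.\d{2})?)", text)
    if match is None:
        raise ValueError("no amount found")
    return float(match.group(1).replace(",", ""))

# Unit test for the core logic: small, fast, no I/O.
def test_extract_amount():
    assert extract_amount("Loan amount: $1,250.00 approved") == 1250.00

test_extract_amount()
```

Keeping parsers like this free of I/O is what makes the unit layer fast enough to run on every commit; the integration and contract layers then cover the wiring.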
3. Code Reviews
Mandatory peer reviews
Emphasis on readability, not just correctness
Encourage questioning design decisions
4. Documentation
Architecture diagrams
API contracts
Inline comments only where logic is non-obvious
5. Ownership Culture
Clear service ownership
On-call rotation
Post-incident reviews without blame
Q4
Any final thoughts?
Ans
Good systems are not just about writing code—they’re about designing for failure, scale, and human collaboration. A system that is easy to understand, monitor, and evolve will always outperform one that is merely “technically impressive.”