Table of Contents
1. Java vs Python (compiled vs interpreted)
2. Kafka Idempotency - Producer and Consumer
3. Kafka Consumer - significance of max polling time?
4. Kafka - how to avoid duplicate messages?
5. Kafka - Transactions management
6. Kafka Consumer - how does a consumer commit the offset of a processed message? Why is auto-commit not good for critical systems?
7. Kafka Idempotency - Can we use the Outbox Pattern for P/C idempotency?
8. With 3 brokers, kafka.producer.acks=all, min.insync.replicas=2 - how many broker acks to wait for?
9. Kafka Producer - tuning for durability, throughput, and exactly-once delivery
10. Web server vs application server? Who hosts static pages?
11. In a Docker container, does a Spring app use an embedded server?
12. Simple – Program vs Process
13. Simple – Process vs Threads
14. Simple – join() vs yield()
16. Simple – Why is a solid-state drive fast (SSD vs HDD)?
17. Simple – How does DNS lookup work?
18. Simple – Clearing vs settlement?
19. Simple – Why is Kafka fast?
20. Simple – Why is Redis fast?
21. Simple – Strongly vs Weakly Typed Language
22. Simple – SPA vs Multi Page application
23. Simple – Concurrency vs Parallelism
24. Simple – Latest Java version
25. Medium – How to diagnose a mysterious process that's taking too much CPU, memory, IO, etc?
26. Medium – Lambda/Functional Programming
27. Medium – Terminal and non-terminal operations in Streams
1. Java vs Python (compiled vs interpreted)
Is Java always faster? Not necessarily - performance depends on multiple factors: how the program is written, the hardware, the workload, etc.
Performance here means speed (execution time) and memory usage.
In general, compiled languages offer better performance than interpreted ones.
Java
Speed - Java combines compilation and interpretation (JIT)
o Source code is compiled to bytecode once; the JVM then uses a JIT compiler to turn hot bytecode
into machine code at runtime, which makes long-running programs performant.
Code => Bytecode => Machine Code
o Strong focus on scalability and security, better for large scale applications
Speed - Java has static typing
o Variables declared type at compile time, hence compiler checks expressions/variables before it
generates any runtime code. No runtime surprises.
o Variables must be declared with predefined types. Reliable, easier to maintain.
Memory - Java has GC
o Mature GC, efficient memory usage.
o Memory behavior is more predictable due to static typing and explicit declarations.
Java is verbose
o Large Java programs take longer to write.
Use Cases
o Java is preferred for large-scale enterprise applications, high-performance systems, and Android
applications due to its robustness, performance, and maintainability.
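A minimal sketch of the compile-time checking described above - the compiler rejects a type mismatch before any code runs (class and method names are illustrative):

```java
// Static typing: the compiler verifies types before generating runtime code.
public class TypingDemo {
    public static int square(int x) {
        return x * x;
    }
    public static void main(String[] args) {
        System.out.println(square(5)); // 25
        // square("5");  // would NOT compile: incompatible types (String vs int)
    }
}
```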
Python
Speed - Python is an interpreted language
o Executes code line by line, translating to machine code (via bytecode in CPython) as it goes, hence generally slower than Java.
o Code => Bytecode => Interpreted execution
Speed - Python has dynamic typing
o Variables need no declared type at compile time; types are checked at runtime.
o Surprises show up at runtime as unexpected behaviour and can cause crashes.
Python programs can be developed quickly
o Python has a huge standard library and ecosystem, hence faster to develop programs.
Memory - Python also has GC
o Less efficient GC (reference counting plus a cycle collector) than Java's.
o Dynamic typing can lead to higher memory usage from the overhead of maintaining dynamic types
Use Cases
o Python is preferred for rapid development, scripting, data analysis, machine learning, and web
development due to its simplicity, readability, and extensive libraries.
2. Kafka Idempotency - Producer and Consumer
Definition
An idempotent operation is one that can be performed many times without producing a different result.
Idempotency can be applied at both ends - producer and consumer.
Idempotency at Producer (Easy)
Kafka does support idempotent producers.
kafka.producer.enable.idempotence=true
Kafka has support for idempotence through Idempotent Producer / Exactly Once semantics.
Each producer assigned a unique Producer Id (PID). Each message produced by an idempotent producer is
assigned a sequence number.
When the acks configuration is set to all, Kafka will ensure that all in-sync replicas have acknowledged the
message before it is considered successfully sent.
With idempotence enabled, max.in.flight.requests.per.connection may be at most 5; ordering is still
preserved because the broker uses the per-message sequence numbers to detect gaps and duplicates.
o Producer => sends message with a unique sequence number.
o Failure: no ack received by the producer.
o Producer retries => resends the message with the same sequence number.
o Broker rejects the message with an already-seen sequence number as a duplicate.
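The settings above can be sketched as plain java.util.Properties so the snippet runs without a Kafka installation; with a real producer these would be passed to new KafkaProducer<>(props):

```java
import java.util.Properties;

// Sketch of the idempotent-producer settings discussed above.
public class IdempotentProducerConfig {
    public static Properties build() {
        Properties props = new Properties();
        props.put("enable.idempotence", "true"); // broker de-duplicates by PID + sequence number
        props.put("acks", "all");                // wait for all in-sync replicas
        props.put("max.in.flight.requests.per.connection", "5"); // <= 5 keeps ordering with idempotence
        return props;
    }
    public static void main(String[] args) {
        System.out.println(build());
    }
}
```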
Idempotency at Consumer (Medium)
Kafka does NOT inherently support idempotent (exactly once) consumers.
3 delivery semantics in Kafka:
o At-most-once: messages may be lost but are never processed more than once (offsets committed
before processing).
o At-least-once: messages are guaranteed to be processed at least once, which may lead to duplicates
(offsets committed after processing).
o Exactly-once: achieved through Kafka's transactional capabilities and requires careful
management of producers and consumers.
Steps when message is received in kafka consumer
o Step 1: The consumer pulls the message M successfully from the Kafka's topic.
o Step 2: The consumer tries to execute the job and the job returns successfully.
o Step 3: The consumer commits the message's offset to the Kafka brokers.
If the consumer crashes at any step before the commit, the offset is not committed and the message will be redelivered.
Achieve consumer idempotency - track processed messages in a DB, or use a transaction boundary (offsets committed only when the transaction commits).
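A hypothetical sketch of the "use a DB" option above: track processed message IDs so a redelivered message is skipped. A HashSet stands in for the database table here; in a real system the ID check and the side effect would share one DB transaction.

```java
import java.util.HashSet;
import java.util.Set;

// Idempotent consumer sketch: duplicates are detected by message ID and skipped.
public class DedupConsumer {
    private final Set<String> processedIds = new HashSet<>(); // stand-in for a DB table
    private int effectCount = 0;                              // counts real side effects

    public boolean process(String messageId) {
        if (!processedIds.add(messageId)) {
            return false; // duplicate delivery: skip, so the outcome stays idempotent
        }
        effectCount++;    // real work would happen here
        return true;
    }
    public int effects() { return effectCount; }

    public static void main(String[] args) {
        DedupConsumer c = new DedupConsumer();
        c.process("m-1");
        c.process("m-1"); // redelivery after a crash-and-retry
        System.out.println(c.effects()); // 1
    }
}
```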
3. Kafka Consumer - significance of max polling time?
max.poll.interval.ms
Kafka consumer setting; default is 5 minutes.
The consumer must call poll() regularly; this is the maximum allowed gap between polls before the consumer is
assumed dead and its partitions are rebalanced to other consumers.
If your consumer is doing heavy processing for each message or batch of messages (e.g., long-running I/O or
computations), it might not be able to call poll() within the specified interval. In such cases, Kafka considers
the consumer dead and triggers a rebalance, causing the consumer to lose its partition assignment.
For long-running tasks, raise the interval, e.g. max.poll.interval.ms=600000 (10 minutes).
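A sketch of the consumer settings for slow processing (values are examples, not defaults):

```properties
max.poll.interval.ms=600000   # allow up to 10 minutes between poll() calls before rebalance
max.poll.records=50           # optionally fetch fewer records per poll so each batch finishes sooner
```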
4. Kafka - how to avoid duplicate messages?
Combination of solutions
Idempotent producer
Idempotent consumer
Transaction management
5. Kafka - Transactions management
Transactions for both Producer, Consumer
Allows producers and consumers to ensure that messages are sent and received in a consistent, reliable manner
Application needing strong consistency.
Producer Transactions
Producers can start a transaction, send multiple records, and then either commit the transaction or abort it.
producer.beginTransaction(); => producer.send(...); => producer.commitTransaction();
Guarantees that consumers only see complete, committed, consistent messages.
Consumer Transactions
Consumers can read from topics that have been produced with transactions. They only see the committed
messages, ensuring that they don’t process incomplete or partial data.
For consumers, transaction management ensures that offsets are committed only after messages have been
processed, avoiding the duplicate processing that plain "at least once" semantics allows.
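The flow above can be illustrated with a hypothetical in-memory stand-in (the real API is beginTransaction()/send()/commitTransaction() on a KafkaProducer configured with a transactional.id): uncommitted sends stay invisible to a READ_COMMITTED reader, and commit makes them visible atomically.

```java
import java.util.ArrayList;
import java.util.List;

// In-memory sketch of the transactional producer flow - not the Kafka client itself.
public class TxProducerSketch {
    private final List<String> committed = new ArrayList<>(); // visible to consumers
    private final List<String> pending = new ArrayList<>();   // current open transaction

    public void beginTransaction()  { pending.clear(); }
    public void send(String msg)    { pending.add(msg); }
    public void commitTransaction() { committed.addAll(pending); pending.clear(); }
    public void abortTransaction()  { pending.clear(); }

    // What a READ_COMMITTED consumer would see
    public List<String> readCommitted() { return List.copyOf(committed); }

    public static void main(String[] args) {
        TxProducerSketch p = new TxProducerSketch();
        p.beginTransaction();
        p.send("debit");
        p.abortTransaction();        // failure path: nothing becomes visible
        p.beginTransaction();
        p.send("debit");
        p.send("credit");
        p.commitTransaction();       // both records become visible atomically
        System.out.println(p.readCommitted()); // [debit, credit]
    }
}
```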
6. Kafka Consumer - how does a consumer commit the offset of a
processed message? Why is auto-commit not good for critical
systems?
Consumer can commit either automatically or manually
Automatically (Default is auto-commit)
enable.auto.commit = true
auto.commit.interval.ms = 5s, interval at which Kafka commits the offsets. Kafka will commit the latest offsets 5
seconds after the consumer polls the messages.
(-) Does not check whether the consumer has finished processing the polled messages before committing.
(-) Risk of message loss: if the consumer application crashes after the commit but before processing
completes, the message cannot be reconsumed.
With auto-commit, there is no guarantee that a message is fully processed before its offset is committed.
Auto-commit is more suitable for simple, stateless applications where occasional message loss or reprocessing
is acceptable.
Manually
enable.auto.commit=false
If your application requires guarantees (like at-least-once or exactly-once processing), disable auto-commit
Manually commit offset using commitSync() or commitAsync() only after the message is fully processed.
Option 1: Keep a processed-message ID in the DB, track which messages have been processed, and only then acknowledge.
Option 2: Use transactions; create a transaction boundary inside the consumer's processing. Offsets are
committed only when the transaction fully commits (or rolled back with it). The consumer sets the property
spring.kafka.consumer.isolation-level=READ_COMMITTED
Using Kafka's transaction management capabilities can reduce the complexity of managing idempotency in
consumer applications by removing the need for additional tracking in a database.
While Kafka's transaction management can significantly simplify idempotency and state management in many
scenarios, there are cases where maintaining a separate database is still necessary, such as long-running
processes or complex state transitions.
7. Kafka Idempotency - Can we use OutboxPattern for P/C
idempotency?
Outbox is not necessary for idempotency, but it can complement the Idempotent Producer.
The Outbox Pattern is not typically used to ensure idempotency on the Kafka producer or consumer side,
although it can be used as a complement to Kafka's built-in Idempotent Producer.
Purpose of Outbox
Ensure atomicity between Kafka and the DB: a failure in Kafka after the DB commit would otherwise cause inconsistency.
Outbox Pattern involves adding an outbox table to your DB that stores messages that need to be published to
Kafka. When a DB transaction is committed, a message is inserted into the outbox table. A separate process
then reads messages from the outbox table and publishes them to Kafka.
Messages are not lost once the database transaction is committed; even if Kafka fails, publishing can be retried from the outbox.
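A hypothetical sketch of the pattern: the business write and the outbox insert share one DB transaction (simulated here by a single method), and a separate relay process drains the outbox to the broker. All three stores are in-memory stand-ins for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Outbox Pattern sketch: business row + outbox row commit together, relay publishes later.
public class OutboxSketch {
    private final List<String> ordersTable = new ArrayList<>();   // business table
    private final Queue<String> outboxTable = new ArrayDeque<>(); // outbox table
    private final List<String> broker = new ArrayList<>();        // stand-in for Kafka

    // One atomic DB transaction: the order and its event are persisted together.
    public void placeOrder(String order) {
        ordersTable.add(order);
        outboxTable.add("OrderPlaced:" + order);
    }

    // Relay process: read outbox rows, publish to the broker, delete on success.
    public void relay() {
        String event;
        while ((event = outboxTable.poll()) != null) {
            broker.add(event);
        }
    }

    public List<String> brokerMessages() { return List.copyOf(broker); }

    public static void main(String[] args) {
        OutboxSketch s = new OutboxSketch();
        s.placeOrder("order-1");
        s.relay();
        System.out.println(s.brokerMessages()); // [OrderPlaced:order-1]
    }
}
```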
Use other kafka features for idempotency
Idempotency on the Kafka producer side, can use Kafka's built-in idempotency guarantees
For idempotency on the Kafka consumer side, combine offset management with deduplication or transactions;
consumer groups and offsets alone do not guarantee exactly-once processing.
The Outbox Pattern helps ensure atomicity (and, with deduplication keys, idempotency) on the producer side.
8. With 3 brokers, kafka.producer.acks=all,
producer.in.sync.replicas=2. how many broker acks to wait?
Definition
producer.acks=all: how many acks the producer waits for from the Kafka cluster (leader and followers)
before considering the write successful. With acks=all, every in-sync replica must have received and
persisted the message.
min.insync.replicas=2: the minimum number of replicas (including the leader) that must acknowledge the message
for the write to succeed; otherwise the broker rejects it. Stronger durability guarantees, because at least 2 replicas are in sync.
Answer: with 3 brokers, acks=all waits for every replica currently in the ISR - at least 2 (leader + 1 in-sync follower), and all 3 when all replicas are in sync.
Problem with acks=2:
The key issue with acks=2 is that it does not check if the follower is an in-sync replica (ISR).
It simply waits for 2 brokers (the leader + one follower) to acknowledge the write, but the follower could be
lagging behind the leader.
The follower might acknowledge the write even if it hasn't fully replicated the latest data yet.
acks=all + min.insync.replicas=2
acks=2 => leader + 1 follower (could be lagging)
acks=all + min.insync.replicas=2 => leader + 1 in-sync follower (must be up to date with the leader)
ISR significance
ISR ensures that a certain number of replicas are up-to-date with the leader before considering a write
successful. Data Consistency
Why do we need min.insync.replicas=2 - can't we achieve the same with kafka.producer.acks=2?
When you set acks=2, the producer will only wait for acknowledgments from 2 replicas, including the leader.
Both insync.replicas and acks are needed to achieve a balance between performance and reliability.
insync.replicas ensures that a minimum number of replicas are in sync with the leader, while acks determines
the level of acknowledgment the producer requires.
Setting acks=all along with an appropriate value for insync.replicas provides strong guarantees of data durability
and fault tolerance.
9. Kafka Producer - tuning for durability, throughput, and
exactly-once delivery
Increase durability, avoid data loss
min.insync.replicas=2
producer.acks=all
Increase Throughput
max.in.flight.requests.per.connection=5
o Maximum number of unacknowledged requests that can be in flight (i.e., sent but not yet
acknowledged) per connection to a Kafka broker.
o Producer can send up to 5 requests to a broker without waiting for the responses of the previous
requests.
o If acks=all is used with multiple in-flight requests, retries could in principle reorder messages:
some requests fail and are retried while others are still pending. With idempotence enabled, the
broker uses the per-message sequence numbers to keep messages in order even across such retries.
Ensure uniqueness/one time delivery by producer
enable.idempotence=true
o Ensures that each message is delivered exactly once, even in the presence of retries.
o Achieved by assigning a unique identifier to each message and ensuring that the broker only
processes each message once.
o When idempotence is enabled, the producer assigns a unique sequence number to each message.
The broker uses this sequence number to ensure that each message is processed exactly once, even
if the message is retried.
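The settings discussed in this section, collected as one illustrative producer configuration (min.insync.replicas is actually a broker/topic setting, shown here for completeness):

```properties
acks=all                                  # durability: wait for all in-sync replicas
enable.idempotence=true                   # exactly-once delivery per partition
max.in.flight.requests.per.connection=5   # throughput: up to 5 unacknowledged requests
min.insync.replicas=2                     # broker/topic config: at least 2 replicas in sync
```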
10. Web server vs application server? who hosts static pages?
Web servers:
Primary Role: Serves static content such as HTML files, CSS, JavaScript, images, and videos directly to the user’s
browser.
Can handle many concurrent client connections, making it ideal for serving static resources.
Nginx is commonly used as a web server due to its efficient handling of static content and its ability to manage a
large number of concurrent connections.
Application Server:
Primary Role: Processes dynamic content, executes server-side logic, and handles complex business processes.
Interacts with databases, processes inputs, and executes the core logic of the application.
How They Work Together:
Client request travels from the browser over the (possibly slow) internet.
Web server (Nginx) serves static content directly (often from cache).
If the request is for dynamic content (e.g., an API request), Nginx forwards it to the appropriate application
server (e.g., a Java server running on Tomcat, a Python server using Flask/Django).
Application server runs the business logic, builds the dynamic response, and sends it back to Nginx.
Web server (Nginx) returns the response to the client's browser over the internet.
This separation allows each server to specialize in what it does best: the web server (Nginx) efficiently handles static
content and client connections, while the application server focuses on processing complex dynamic requests and
business logic.
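A minimal sketch of this split in Nginx configuration (paths and the upstream address are assumptions, not from the original notes):

```nginx
server {
    listen 80;
    location /static/ {
        root /var/www/app;          # static files served directly by Nginx
    }
    location /api/ {
        proxy_pass http://app:8080; # dynamic requests forwarded to the application server
    }
}
```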
11. In docker container, spring app do we use embedded
server?
Embedded servers (Tomcat/Jetty/Undertow):
Embedded server in a Docker container is common and widely accepted in production environments, especially
with Spring Boot applications.
Spring Boot applications typically include an embedded server like Tomcat, Jetty, or Undertow. This server
performs the roles of both a web server and an application server:
Work as both web and app server
As a Web Server:
- Handles HTTP requests: the embedded server listens on a port and dispatches HTTP requests to handlers
- Serves static content: HTML, CSS, JavaScript, and images packaged in the static folder
As an Application Server:
- Processes Dynamic Content: ES can execute servlets, filters, and server-side components like controller
- Supports Java EE Features: Although not a full Java EE application server, it provides key features needed for
enterprise applications, such as servlet handling, session management, and integration with Spring
frameworks.
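Because the jar embeds Tomcat, the container image needs only a JRE - a minimal Dockerfile sketch (image tag and jar name are assumptions):

```dockerfile
FROM eclipse-temurin:17-jre
COPY target/app.jar /app.jar
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "/app.jar"]
```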
12. Simple – Program vs Process
Program
An executable file containing a set of instructions and passively stored on disk.
One program can have multiple processes. For example, the Chrome browser (program) creates a different
process for every single tab.
Process
A program is in execution.
When a program is loaded into the memory and becomes active, the program becomes a process.
The process requires some essential resources such as registers, program counter, and stack.
13. Simple – Process vs Threads
Definition
An independent program with its own memory space and resources.
A smaller unit within a process, sharing the same resources with other threads.
Isolation
Processes are isolated; P1 does not share memory with P2
Threads within the same process share memory and resources
Communication
Processes need IPC mechanisms
Threads communicate easily through shared memory: wait(), notify(), etc.
Concurrency
Processes run independently, possibly in parallel
Threads run concurrently within a process, as concurrent tasks
14. Simple – join() vs yield()
Both involve the current thread giving up control to other threads
join()
Wait for another thread to complete its execution before the current thread continues
main(){
t1.join();
}
The main thread waits for t1 to complete
yield()
Temporarily pauses the current thread, hinting to the scheduler that other runnable threads may be given the CPU
main(){
t1.start(); Thread.yield();
}
The main thread gives up the CPU if another thread is ready to run
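The two calls above in a runnable sketch (class and method names are illustrative): join() guarantees the worker finished, while yield() is only a scheduling hint.

```java
// join() makes the main thread wait; yield() merely hints the scheduler.
public class JoinDemo {
    static int result = 0;

    public static int compute() {
        Thread t1 = new Thread(() -> result = 42);
        t1.start();
        try {
            t1.join();       // block here until t1 finishes
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.yield();      // hint only: scheduler may or may not switch threads
        return result;       // guaranteed 42 because of join()
    }

    public static void main(String[] args) {
        System.out.println(compute()); // 42
    }
}
```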
15. Simple – Why is a solid-state drive fast (SSD vs HDD)?
An SSD typically reads around 10 times faster and writes around 20 times faster than an HDD.
An SSD is a flash-memory based data storage device. Bits are stored in cells made of floating-gate transistors.
SSDs are made entirely of electronic components; there are no moving or mechanical parts as in an HDD.
No moving parts in an SSD
SSDs have no mechanical components like the spinning platters and read/write heads found in HDDs. This absence of
moving parts allows SSDs to access data almost instantly, eliminating seek times and rotational delays.
Flash Memory
SSDs use NAND flash memory for data storage. Flash memory is non-volatile and retains data even when the
power is off. This enables rapid data access without the need for spinning up disks.
16. Simple – How does DNS lookup work?
To achieve better scalability, the DNS servers are organized in a hierarchical tree structure.
The browser looks up the IP address for the domain with a domain name system (DNS) lookup - data is cached at
different layers: browser cache, OS cache, local network cache and ISP cache.
3 basic levels of DNS servers
Root name server (.). It stores the IP addresses of Top-Level Domain (TLD) name servers. There are 13 logical
root name servers globally.
TLD name server. It stores the IP addresses of authoritative name servers. There are several types of TLD
names. For example, generic TLD (.com, .org), country code TLD (.us), test TLD (.test).
Authoritative name server. Actual answers to the DNS query. Register authoritative name servers with domain
name registrar such as GoDaddy, Namecheap, etc.
18. Simple – Clearing vs settlement?
Clearing
Verifying and confirming the details of a financial transaction between two parties, ensuring agreement on
transaction specifics.
Customer A (from Bank A) wants to pay $100 to Customer B (from Bank B). They initiate the transaction through
their respective banks. The clearing process takes place within the banks and involves validating transaction
details. Bank A and Bank B confirm that Customer A has sufficient funds, and the transaction details match. This
ensures both parties agree on the payment.
Settlement
Actual transfer of funds or assets between parties after clearing has occurred
Post clearing completion
Bank A transfers $100 from Customer A's account to Bank B. Simultaneously, Bank B credits Customer B's
account with $100. This exchange of funds between banks represents the settlement of the transaction.
19. Simple – Why is Kafka fast?
Traditional Data Copying
In many data processing systems, when data is read from a source (e.g., a file or a network socket), it is typically
copied from the source buffer to an application buffer in memory. Similarly, when data is written to a
destination (e.g., a Kafka topic), it is copied from the application buffer to the destination buffer.
Zero Copy
Zero copy optimizes this process by eliminating the intermediate copy step. Instead of copying data from the
source buffer to the application buffer and then to the destination buffer, zero copy mechanisms allow the data
to be transferred directly from the source to the destination without intermediate copies.
Zero copy is a shortcut to save the multiple data copies between application context and kernel context. This
approach brings down the time by approximately 65%.
Other reasons Kafka is fast:
Distributed architecture and partitioning spread load across brokers.
Sequential disk I/O and heavy use of the OS page cache make log appends and reads fast.
Batching (and compression) of messages amortizes network and disk overhead.
20. Simple – Why is REDIS fast?
RAM based
Redis stores data entirely in RAM, eliminating disk I/O; RAM access is orders of magnitude faster than disk I/O.
Single-threaded event loop
Redis's core command processing is single-threaded and uses an event-driven architecture that minimizes context-switching overhead.
This single-threaded design simplifies synchronization and allows for extremely low-latency operations.
Efficient lower-level data structures.
Redis offers a variety of efficient, built-in data structures such as strings, lists, sets, hashes, and more.
These structures are finely tuned for speed and memory efficiency.
21. Simple – Strongly vs Weakly Typed Language
Describe how programming language handle data types and type conversion
Strongly typed language
Strict rules enforced regarding data types.
Variables are tied to a data type; explicit casting is required.
E.g., Java, C++
Weakly typed language
Flexible data types.
The language performs implicit type conversions; no explicit casting required.
E.g., JavaScript, PHP, Perl
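A small sketch of the strict-typing point above: Java requires an explicit cast to narrow a double to an int, where a weakly typed language would convert implicitly (class name is illustrative).

```java
// Explicit casting in a strongly typed language.
public class CastDemo {
    public static int narrow(double d) {
        return (int) d;   // explicit cast required; `int x = d;` would not compile
    }
    public static void main(String[] args) {
        System.out.println(narrow(3.9)); // 3 (truncates toward zero)
    }
}
```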
22. Simple – SPA vs Multi Page application
SPA
Loads a single page and dynamically updates content as the user interacts
The user visits one page, and all further content is loaded via JavaScript
Usually implemented with a JavaScript framework like React or Angular
E.g., YouTube, Gmail
MPA
Loads a separate HTML page for each interaction
23. Simple – Concurrency vs Parallelism
Concurrency
Dealing with multiple tasks at once by interleaving them; on a single core the simultaneity is an illusion created by the scheduler. It is a conceptual design of the program that makes task order independent.
Parallelism
Tasks literally execute at the same time on multiple computing resources, such as multiple cores.
24. Simple – Latest Java version
19 (at the time of writing)
Preallocated HashMaps - new factory method
Map<K,V> map = new HashMap<>(100): can it hold 100 entries without rehashing? No - HashMap rounds the
capacity up to a power of two (128 here), and with the default 0.75 load factor it resizes once the size
exceeds 128 x 0.75 = 96 entries.
To hold 100 entries without rehashing, the initial capacity must be at least 100 / 0.75 = 134.
In Java 19, HashMap.newHashMap(100) does this sizing for you, without the caller caring about rehashing.
Virtual threads (preview)
Scale via the runtime library's scheduler, not OS context switching
Thread.ofVirtual().start(runnable), or Thread.ofVirtual().factory() to obtain a ThreadFactory
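The presizing arithmetic can be sketched directly (HashMap.newHashMap performs the equivalent calculation internally; that method itself needs Java 19+, but the arithmetic below runs on any JDK):

```java
// With the default 0.75 load factor, a map holding n entries without rehashing
// needs an initial capacity of at least ceil(n / 0.75).
public class PresizeDemo {
    public static int requiredCapacity(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75);
    }
    public static void main(String[] args) {
        System.out.println(requiredCapacity(100)); // 134, not 100
    }
}
```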
25. Medium – How to diagnose a mysterious process that’s
taking too much CPU, memory, IO, etc?
vmstat
reports information about processes, memory, paging, block IO, traps, and CPU activity.
iostat
reports CPU and input/output statistics of the system.
netstat
displays statistical data related to IP, TCP, UDP, and ICMP protocols.
lsof
lists open files of the current system.
pidstat
monitors the utilization of system resources by all or specified processes, including CPU, memory, device IO, task switching,
threads, etc.
26. Medium – Lambda/Functional Programming
Lambda Expression (LE)
A short block of code (a function) which takes parameters and returns a value
Lambda expressions express instances of functional interfaces
Functional Interface
- An interface with a single abstract method is called a functional interface
Features of Lambda Expression
Immutability
Pure functions
Pure function
Output/return based only on input
No side effect of state change
Java Streams
Sequence of values
Declarative style
Way of processing collections. Specifying what we want, leaving to scheduler how to do, whether parallel or not
(+) Readability
(+) Encourages immutability
Lambda Expression vs Anonymous Class
Class vs methods
- Anonymous classes => creating instances of classes without explicitly defining a named class.
- Lambda expressions => way to represent anonymous functions/methods, typically for functional
interfaces, providing shorter syntax for single-method interfaces.
Instantiation and declaration
- Anonymous class can be instantiated, and we can declare instance variables
- Lambda expressions cannot be instantiated, and we cannot declare instance variables
Use case
- Anonymous class: add or extend functionality inline,
e.g., button.addActionListener(new ActionListener() { public void actionPerformed(ActionEvent e) { ... } })
- Lambda: concise/cleaner way to iterate or process stream values
List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
numbers.forEach(n -> System.out.println(n));
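The contrast above in one runnable sketch: the same Runnable written both ways behaves identically (class and method names are illustrative).

```java
// Anonymous class vs lambda for the same single-method interface.
public class LambdaVsAnon {
    public static String viaAnonymousClass() {
        final StringBuilder sb = new StringBuilder();
        Runnable r = new Runnable() {          // anonymous class: full declaration
            @Override
            public void run() { sb.append("ran"); }
        };
        r.run();
        return sb.toString();
    }

    public static String viaLambda() {
        StringBuilder sb = new StringBuilder();
        Runnable r = () -> sb.append("ran");   // lambda: same behavior, shorter syntax
        r.run();
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(viaAnonymousClass().equals(viaLambda())); // true
    }
}
```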
Functional programming vs OOPS
Functional programming is the form of programming that attempts to avoid changing state and mutable data.
In a functional program, the output of a function should always be the same, given the same exact inputs to the
function.
How to achieve in Java
Lambda Expression
Functional Interface
Stream API
Immutability
Functional vs OOPS
Declarative vs Imperative
- Functional programming follows declarative approach
- Functional programming uses immutable data to tell the program exactly what to do.
- OOPS tells the program how to achieve results through objects altering the program's state.
Immutability vs Mutability
- Functional: Fundamental unit is function and input. Data is immutable
- OOPS: Fundamental elements and objects and methods. Data state changes.
27. Medium – Terminal and non-terminal operations in
Streams
Terminal
Operates on a stream but does NOT produce another stream; it terminates the pipeline
E.g., forEach(), collect(), reduce()
Non-Terminal (intermediate)
Operations that produce another stream and do not terminate the pipeline themselves
E.g., map(), filter(), sorted()
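A runnable sketch of the distinction (class and method names are illustrative): map() and filter() are intermediate and lazy; collect() is the terminal operation that consumes the stream and produces a result.

```java
import java.util.List;
import java.util.stream.Collectors;

// Intermediate operations build the pipeline; the terminal operation executes it.
public class StreamOpsDemo {
    public static List<Integer> squaresOfEvens(List<Integer> nums) {
        return nums.stream()
                .filter(n -> n % 2 == 0)       // intermediate: returns a new Stream
                .map(n -> n * n)               // intermediate: returns a new Stream
                .collect(Collectors.toList()); // terminal: produces the result
    }
    public static void main(String[] args) {
        System.out.println(squaresOfEvens(List.of(1, 2, 3, 4))); // [4, 16]
    }
}
```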
map() vs flatMap()
map() is used for transformation.
Transforms each element of a stream, producing another stream.
E.g., List<String> names = Arrays.asList("abc", "bks");
names.stream().map(String::toUpperCase) // TRANSFORMATION
flatMap() is used for transformation + flattening.
Turns a stream of streams into a single flat stream.
List<List<String>> nestedList = Arrays.asList(Arrays.asList("a", "c"), Arrays.asList("b",
"k"));
List<String> flatList = nestedList.stream()
.flatMap(list -> list.stream())
.collect(Collectors.toList());
// Result: [a, c, b, k]