Ilya Verbitskiy

API Rate Limiting by IP in ASP.NET Core

Ilya Verbitskiy — Tue, 20 Oct 2020 00:00:00 GMT

When I build a REST API, I often want to control the frequency of user requests to prevent the API from being abused. A common approach is to enforce a rate limit on the number of API calls coming from an IP address over a period of time. An IP rate limit can help lower the risk of DoS attacks and make scraping your website via a REST API a bit more complicated. In this article, I will show you how to implement rate limiting in ASP.NET Core 3.1. It is relatively easy nowadays. Let's see.

First, let's create a Web API project and implement the test API. Run your favorite shell and create a new project using the dotnet tool.

dotnet new webapi -o RateLimiting

The test API has two endpoints: http://localhost:5000/sample/time and http://localhost:5000/sample/status.

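using System;
using Microsoft.AspNetCore.Mvc;

// The response type is not shown in the original listing.
// This minimal definition is an assumption inferred from how it is used below.
public class TimeResponse
{
    public DateTime Time { get; set; }
}

[ApiController]
[Route("[controller]")]
public class SampleController : ControllerBase
{
    // GET /sample/time returns the server's current local time.
    [HttpGet]
    [Route("time")]
    public TimeResponse GetTime()
    {
        var response = new TimeResponse { Time = DateTime.Now };
        return response;
    }

    // GET /sample/status always returns HTTP 200 OK.
    [HttpGet]
    [Route("status")]
    public IActionResult GetStatus()
    {
        return Ok("OK");
    }
}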

The Time API returns the server's current local time. The Status API always returns HTTP 200 OK.

Let's implement the following requirements:

The Time API allows only 2 requests/minute per IP. This rate does not make sense in the real world, but it is OK for testing purposes to see the actual rate limiting errors.
The Status API has no restrictions.

ASP.NET Core has a solution already. The library is called AspNetCoreRateLimit. It is an open-source project hosted on GitHub and available on NuGet. AspNetCoreRateLimit adds rate limit support to ASP.NET Core applications based on the user's IP address or Client ID. In this article, I focus on IP-based rate limits. You can find the details of Client ID rate limits in the project's wiki.

Go ahead and add the library to the test project.

dotnet add package AspNetCoreRateLimit

The library comes with the IpRateLimitMiddleware middleware, which should be configured in the project's Startup.cs file.

public void ConfigureServices(IServiceCollection services)
{
    // 1. add in-memory cache to store rate limit counters and ip rules
    services.AddMemoryCache();

    // 2. load general configuration from appsettings.json
    services.Configure<IpRateLimitOptions>(Configuration.GetSection("IpRateLimiting"));

    // 4. inject counter and rules stores
    services.AddSingleton<IIpPolicyStore, MemoryCacheIpPolicyStore>();
    services.AddSingleton<IRateLimitCounterStore, MemoryCacheRateLimitCounterStore>();

    services.AddControllers();

    // 5. the clientId/clientIp resolvers use IHttpContextAccessor.
    services.AddSingleton<IHttpContextAccessor, HttpContextAccessor>();

    // 6. AspNetCoreRateLimit configuration (resolvers, counter key builders)
    services.AddSingleton<IRateLimitConfiguration, RateLimitConfiguration>();
}

public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
{
    if (env.IsDevelopment())
    {
        app.UseDeveloperExceptionPage();
    }

    app.UseHttpsRedirection();

    // 7. enable AspNetCoreRateLimit middleware
    app.UseIpRateLimiting();

    app.UseRouting();

    app.UseAuthorization();

    app.UseEndpoints(endpoints =>
    {
        endpoints.MapControllers();
    });
}

Keep in mind that you should enable an ASP.NET Core cache. I use the simplest in-memory cache in the example. Also, you need to register IHttpContextAccessor to get the client's IP address.

The next step is adding the IpRateLimiting configuration section to your appsettings.json.

"IpRateLimiting": {
    "EnableEndpointRateLimiting": false,
    "StackBlockedRequests": false,
    "RealIpHeader": "X-Real-IP",
    "HttpStatusCode": 429,
    "GeneralRules": [
        {
            "Endpoint": "*",
            "Period": "1m",
            "Limit": 2
        }
    ]
}

If EnableEndpointRateLimiting is set to false, then the limits apply globally. It means that both GET and POST requests to any endpoint count towards the same 2 requests/minute limit. If EnableEndpointRateLimiting is set to true, then the limits apply to each endpoint separately, as in {HTTP_Verb}{PATH}. It means that a user can make 2 GET requests and 2 POST requests per minute to the http://localhost:5000/sample/time API.
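As a sketch of the endpoint-specific mode, assuming the same IpRateLimiting section as above, the fragment below turns EnableEndpointRateLimiting on and scopes the rule to the Time API only. The Endpoint value follows the verb:path format that the EndpointWhitelist option uses later in this article. Treat it as a starting point rather than a tested configuration.

```json
"IpRateLimiting": {
    "EnableEndpointRateLimiting": true,
    "StackBlockedRequests": false,
    "RealIpHeader": "X-Real-IP",
    "HttpStatusCode": 429,
    "GeneralRules": [
        {
            "Endpoint": "*:/sample/time",
            "Period": "1m",
            "Limit": 2
        }
    ]
}
```

With a rule like this, only requests that match /sample/time should be counted, so other endpoints are not limited unless they get their own rules.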

The RealIpHeader is used to extract the client IP when your Kestrel server is behind a reverse proxy. For example, NGINX is commonly configured to pass the client address in the X-Real-IP header.
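NGINX does not add this header on its own, so the proxy has to be told to send it. A minimal sketch of such a proxy location, assuming Kestrel listens on http://localhost:5000 as in this article (the catch-all location is my assumption):

```nginx
location / {
    # Forward the original client address so the application can read it
    # from the header named in RealIpHeader.
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://localhost:5000;
}
```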

You will find more information in the project's wiki.

Everything is ready. Let's run the application and test it.

dotnet run

curl -v http://localhost:5000/sample/time

If the request passes the rate limit, you should see the following HTTP headers in the response.

X-Rate-Limit-Limit: 1m
X-Rate-Limit-Remaining: 0
X-Rate-Limit-Reset: 2020-10-20T09:13:10.6507500Z

Otherwise, the request is blocked, and you will get HTTP 429 Too Many Requests status code.

HTTP/1.1 429 Too Many Requests
Retry-After: 57
API calls quota exceeded! maximum admitted 2 per 1m.

The Retry-After HTTP header tells you that you can retry the API call in 57 seconds. You can customize the response by changing the HttpStatusCode and QuotaExceededMessage options in the IpRateLimiting configuration section.

The rate limit is applied to all APIs in the application, and you cannot use the Status API more than twice in a minute.

curl -v http://localhost:5000/sample/status

HTTP/1.1 429 Too Many Requests
Retry-After: 57
API calls quota exceeded! maximum admitted 2 per 1m.

Let's implement the second requirement: the Status API has no restrictions. You should add the EndpointWhitelist option to the IpRateLimiting configuration section and restart the application.

"EndpointWhitelist": [ "*:/sample/status" ]

Now, if you request http://localhost:5000/sample/status, you will not see X-Rate-Limit-* HTTP headers anymore.
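For orientation, this is how the whole section might look with the whitelist in place, assuming the other settings from earlier in the article stay unchanged:

```json
"IpRateLimiting": {
    "EnableEndpointRateLimiting": false,
    "StackBlockedRequests": false,
    "RealIpHeader": "X-Real-IP",
    "HttpStatusCode": 429,
    "EndpointWhitelist": [ "*:/sample/status" ],
    "GeneralRules": [
        {
            "Endpoint": "*",
            "Period": "1m",
            "Limit": 2
        }
    ]
}
```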

AspNetCoreRateLimit has a lot more advanced scenarios like Client ID rate limits, etc., and I highly recommend checking their documentation.

API Rate Limiting by IP with NGINX

Ilya Verbitskiy — Sun, 17 Jan 2021 00:00:00 GMT

Last time I wrote about HTTP request rate limiting in ASP.NET Core. It works well when you are hosting a simple REST API or website on the Kestrel web server. But you can achieve even more by hosting your application behind a reverse proxy server. The most popular one nowadays is NGINX. According to W3Techs, it hosted 32.7% of websites at the beginning of 2021.

Before showing you how to set up the NGINX rate limits, I would like to discuss why Kestrel is not enough and why you may use a reverse proxy server. Kestrel is an excellent high-performance web-server. Well, it is even at the top of TechEmpower Web Framework Benchmarks. Kestrel supports HTTPS, HTTP/2 and comes with .NET Core. But, as with most built-in web-servers on the market, it does not support more advanced features like load balancing. Also, Kestrel is a bit slow when hosting static files. I had performance boosts for a few projects in the past by moving static content out of the ASP.NET Core application to NGINX and IIS. Of course, moving to a CDN, e.g., Amazon CloudFront, will give you the best performance.

Let's reuse the API I built in the previous article:

[ApiController]
[Route("[controller]")]
public class SampleController : ControllerBase
{
    [HttpGet]
    [Route("time")]
    public TimeResponse GetTime()
    {
        var response = new TimeResponse { Time = DateTime.Now };
        return response;
    }

    [HttpGet]
    [Route("status")]
    public IActionResult GetStatus()
    {
        return Ok("OK");
    }
}

Our deployment will include a public NGINX proxy server that passes traffic to the .NET API. The next step is to prepare a Dockerfile that builds and runs our application on .NET 5.

# build the app
FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src

COPY . .
RUN dotnet restore
RUN dotnet publish -c release -o /app --no-restore

# build the final image
FROM mcr.microsoft.com/dotnet/aspnet:5.0
WORKDIR /app
COPY --from=build /app ./
ENTRYPOINT ["dotnet", "RateLimiting.dll"]

The next step is to run the NGINX proxy server. I usually use the official NGINX image from Docker Hub. By default, it is ready to serve static content, but you can customize it by placing your default.conf or any domain-specific configuration files in the /etc/nginx/conf.d folder.

server {
    listen       80;
    server_name  localhost;

    location /time {
        proxy_pass http://rate_api/sample/time;
    }

    location /status {
        proxy_pass http://rate_api/sample/status;
    }
}

Here is the Dockerfile that builds the NGINX image for the project.

FROM nginx
WORKDIR /etc/nginx/conf.d/
COPY default.conf .

The final step is to create a Docker Compose file to run the services.

version: "3.9"
services:
  web:
    build: ./nginx
    ports:
      - "8080:80"
    links:
      - rate_api
  rate_api:
    build: ./RateLimiting

Let's implement the requirements from the previous article:

The Time API allows only two requests/minute per IP. This rate does not make sense in the real world, but it is OK for testing purposes to see the actual rate limiting errors.
The Status API has no restrictions.

NGINX supports three kinds of limits:

The number of connections per IP address.
The request rate, e.g., the total number of requests per IP address per second.
The download speed per client connection.
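This article concentrates on the request rate, but for completeness here is a rough sketch of the other two limits. The zone name per_ip_conn, the /downloads/ location, and the numbers are hypothetical values picked for illustration. The fragment assumes it is included in the http context, as files in /etc/nginx/conf.d are.

```nginx
# Hypothetical zone name and values, for illustration only.
limit_conn_zone $binary_remote_addr zone=per_ip_conn:10m;

server {
    listen       80;
    server_name  localhost;

    location /downloads/ {
        # At most 5 concurrent connections per client IP address.
        limit_conn per_ip_conn 5;
        # Cap the response transfer rate at 100 KB/s per connection.
        limit_rate 100k;
    }
}
```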

You configure the request rate limit using the limit_req_zone and limit_req directives. First, you use limit_req_zone to define a key (usually an IP address), a shared memory zone that stores the state of each key, including how often it has made requests, and the expected request rate. The request rate is specified in requests per second (r/s), or in requests per minute (r/m) if a rate of less than one request per second is desired.

The next step is to apply the desired rate limit to a route, e.g., the /time endpoint, using the limit_req directive within a location context.

limit_req_zone $binary_remote_addr zone=time_api:10m rate=2r/m;

server {
    ...
    location /time {
        limit_req zone=time_api;
    }
    ...
}

In the example above, I allocated 10 MB to keep the request counters per IP address. According to the NGINX documentation, a one-megabyte zone can keep about 16 thousand 64-byte states. Note that I used the $binary_remote_addr variable as a key instead of $remote_addr, which also holds the client's IP address. The reason is that $binary_remote_addr holds the binary representation of the IP address, which requires less memory and is more efficient.

Sometimes you may want to test the limits first before enabling them on a production server. You can do that by adding the limit_req_dry_run on; directive to your context.

location /time {
    limit_req zone=time_api;
    limit_req_dry_run on;
    ...
}

Once an IP address reaches the limit, you should see the error messages in NGINX logs, but the request will pass through anyway.

web_1       | 172.27.0.1 - - [17/Jan/2021:13:29:58 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"
web_1       | 2021/01/17 13:29:59 [error] 28#28: *3 limiting requests, dry run, excess: 0.963 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"
web_1       | 172.27.0.1 - - [17/Jan/2021:13:29:59 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"
web_1       | 2021/01/17 13:30:00 [error] 28#28: *5 limiting requests, dry run, excess: 0.918 by zone "time_api", client: 172.27.0.1, server: localhost, request: "GET /time HTTP/1.1", host: "localhost:8080"
web_1       | 172.27.0.1 - - [17/Jan/2021:13:30:00 +0000] "GET /time HTTP/1.1" 200 55 "-" "curl/7.74.0" "-"

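By default, once the number of requests exceeds the specified rate, NGINX responds with the 503 Service Temporarily Unavailable error. The status code can be changed with the limit_req_status directive.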

$ curl -v http://localhost:8080/time
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:8080...
* Connected to localhost (::1) port 8080 (#0)
> GET /time HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 503 Service Temporarily Unavailable
< Server: nginx/1.19.6
< Date: Sun, 17 Jan 2021 13:37:35 GMT
< Content-Type: text/html
< Content-Length: 197
< Connection: keep-alive
<
{ [197 bytes data]
100   197  100   197    0     0  19700      0 --:--:-- --:--:-- --:--:-- 21888<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body>
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx/1.19.6</center>
</body>
</html>

* Connection #0 to host localhost left intact

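Sometimes you may want to keep the requests beyond the allowed limit in a queue and execute them later. It is doable with the burst parameter of the limit_req directive, which sets how many excess requests NGINX queues and then forwards at the configured rate. Requests beyond the burst size are rejected.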

The final NGINX configuration file looks as follows:

limit_req_zone $binary_remote_addr zone=time_api:10m rate=2r/m;

server {
    listen       80;
    server_name  localhost;

    location /time {
        limit_req zone=time_api burst=10;
        proxy_pass http://rate_api/sample/time;
    }

    location /status {
        proxy_pass http://rate_api/sample/status;
    }
}
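One thing to keep in mind with the configuration above: at 2r/m NGINX releases roughly one queued request every 30 seconds, so the last of 10 queued requests may wait around five minutes. If that is too long for your clients, a possible variation (a sketch, not part of the demo project) is to add the nodelay parameter and return 429 instead of 503:

```nginx
limit_req_zone $binary_remote_addr zone=time_api:10m rate=2r/m;

server {
    listen       80;
    server_name  localhost;

    location /time {
        # Serve up to 10 excess requests immediately instead of delaying them,
        # and reject anything beyond that.
        limit_req zone=time_api burst=10 nodelay;
        # Return 429 Too Many Requests instead of the default 503.
        limit_req_status 429;
        proxy_pass http://rate_api/sample/time;
    }
}
```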

Now you can run the services on your own and test them.

$ docker-compose up -d
Creating network "nginx_demo_default" with the default driver
Creating nginx_demo_rate_api_1 ... done
Creating nginx_demo_web_1      ... done
$ curl -v http://localhost:8080/time
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:8080...
* Connected to localhost (::1) port 8080 (#0)
> GET /time HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 200 OK
< Server: nginx/1.19.6
< Date: Sun, 17 Jan 2021 13:44:50 GMT
< Content-Type: application/json; charset=utf-8
< Transfer-Encoding: chunked
< Connection: keep-alive
<
{ [55 bytes data]
100    44    0    44    0     0   3142      0 --:--:-- --:--:-- --:--:--  3142{"time":"2021-01-17T13:44:50.7810712+00:00"}
* Connection #0 to host localhost left intact

In this article, I showed how you can set up an ASP.NET Core REST API behind the NGINX proxy, run it using Docker and Docker Compose, and use NGINX built-in features to set up request rate limits. Please consult the official NGINX documentation if you need more details on its rate limiting capabilities, or message me if you are looking for a consultant to help with a project.

Building AI-Native Engineering Teams Without Losing Engineering Discipline

Ilya Verbitskiy — Sat, 23 May 2026 00:00:00 GMT

How small, senior teams can ship in weeks — without turning the codebase into a liability

AI is changing the economics of software delivery. The visible change is speed: engineers can now generate code, tests, documentation, refactors, migrations, prototypes, and debugging hypotheses far faster than before. But the bigger change is not speed alone. The bigger change is that software development is moving from a human-only production system to a human-plus-agent production system.

That changes the operating model.

A traditional engineering team is organized around people writing, reviewing, testing, and shipping code. An AI-assisted team adds coding tools to this existing workflow. An AI-native team goes further. It redesigns the workflow so that AI participates across planning, design, implementation, testing, review, documentation, deployment, and operations.

This distinction matters because the bottleneck is moving. In many teams, writing code is no longer the slowest part of the process. The slower and more valuable work is deciding what should be built, making that intent unambiguous, constraining implementation, verifying correctness, and keeping the system coherent as the amount of generated change increases.

The winning AI-native teams will not be the teams that generate the most code. They will be the teams that can turn business intent into precise specifications, domain models, tests, architectural constraints, and production systems faster than competitors while preserving engineering discipline.

This is the difference between “vibe coding” and AI-native engineering.

Vibe coding is useful for exploration. It helps founders, product managers, and engineers move from idea to prototype with remarkable speed. For early discovery, that is valuable. But production software has a longer life than the prompt that created it. It must be operated, debugged, extended, secured, audited, and understood by people who were not present when the first version was generated.

That is where discipline returns.

The paradox of AI-native engineering is that the faster the team moves, the more discipline it needs. Not more bureaucracy. Not a heavyweight process. Not architecture theatre. But sharper engineering discipline: clearer domain language, stronger specifications, test-first development, architectural boundaries, automated validation, human review at the right points, and a curated knowledge base that both humans and AI can use.

The goal is not to slow teams down. The goal is to build teams that can ship meaningful products in weeks, not months, without accumulating architectural debt so quickly that the second product becomes harder than the first.

AI changes engineering velocity, but not engineering responsibility.

For the last decade, most software organizations optimized around developer throughput. They adopted better frameworks, cloud platforms, CI/CD, DevOps, infrastructure-as-code, reusable design systems, and agile delivery practices. AI changes the equation again by compressing many implementation tasks that used to consume large parts of the engineering calendar.

An AI-native team can ask agents to draft a service, generate test cases, refactor a module, explain unfamiliar code, create migration scripts, identify edge cases, write documentation, summarize logs, or propose implementation plans. This does not mean all of that output is production-ready. It means the cost of producing a first draft has fallen dramatically.

That creates a new form of leverage. A small team can now cover more surface area than before. A three- or four-person team with strong product judgment, engineering fundamentals, and AI fluency can often outperform a much larger traditional team, especially in early-stage product development.

But there is a dangerous misunderstanding here. AI-native does not mean “replace engineers with agents.” It means engineers increasingly design the system of work in which agents operate.

In a traditional workflow, a developer receives a task, writes code, tests it, reviews it, and ships it. In an AI-native workflow, the developer spends more time defining the problem, shaping the domain model, writing the specification, designing the verification strategy, reviewing generated output, and deciding whether the implementation fits the system.

That does not make engineering easier. It makes weak engineering more visible.

If a team has unclear requirements, weak architecture, poor tests, inconsistent coding standards, no security discipline, and tribal knowledge scattered across Slack, tickets, and people’s heads, AI will not fix that. It will amplify it. The team will ship faster, but it will also generate mistakes faster.

AI amplifies the team's operating system. Good teams get faster. Undisciplined teams get messier.

Faster teams can create architectural debt faster

The most common failure mode in AI-assisted development is not that the AI writes obviously broken code. The obvious problems are usually caught. The deeper failure mode is that the AI writes plausible code that subtly changes behavior, introduces inconsistent patterns, violates architectural boundaries, mishandles edge cases, or makes assumptions nobody reviewed.

This happens because AI fills gaps.

When intent is not written down, the model infers it. It chooses retry behavior, error handling, naming, state transitions, validation rules, dependency patterns, and abstractions based on the prompt, the codebase, and its training. Those decisions may look reasonable in isolation but still be wrong for the product, the architecture, or the business domain.

This is why AI-native engineering requires a stronger system of record for intent.

In many traditional teams, the real intent behind a feature lives in conversations, ticket comments, product intuition, and code review discussions. That is already risky with human-only delivery. With AI-generated implementation, it becomes a structural problem because the “author” of new behavior may not understand the unstated reasoning behind the system.

A prompt is not a system of record. A chat thread is not architecture. A generated diff is not intent.

If a team wants AI-native speed without long-term damage, it needs durable artifacts that clearly express intent for humans and agents to reuse. These artifacts do not need to be heavy. In fact, they must be lightweight enough to maintain. But they need to exist.

The AI-native team’s first discipline, therefore, is not coding. It is making intent explicit.

What engineering discipline still means

Engineering discipline in AI-native teams is not nostalgia for old processes. It is not an argument for heavy Scrum, large architecture committees, or months of upfront design. It is the minimum structure required to make fast work safe.

The core disciplines remain familiar. Architecture discipline keeps the system coherent as it grows. Testing discipline verifies that the generated code behaves correctly. Review discipline catches errors before production. Operational discipline ensures the code can be deployed, monitored, patched, and maintained. Security discipline prevents generated change from expanding the attack surface.

This is the natural path of maturation for any powerful engineering movement. Early excitement focuses on speed and accessibility. Sustainable practice adds quality, architecture, operations, and security. Agile matured this way. Cloud matured this way. AI-native engineering is maturing the same way.

The early phase is excitement. The mature phase is discipline.

For founders and CTOs, this matters because AI can create an illusion of progress. A prototype that looks impressive in week two may hide structural weaknesses that become expensive in month six. The question is not “Can the team generate a working demo?” The better question is: “Can the team generate a working product that remains understandable, secure, testable, and extensible after ten more iterations?”

That requires an engineering system, not just tools.

Domain-Driven Design becomes the language between humans and AI

Domain-Driven Design should become one of the foundations of serious AI-native engineering.

DDD is often misunderstood as an enterprise architecture technique or something only large organizations need. In reality, its value becomes even more important when AI is involved because DDD gives humans and AI a shared language for the business.

The central concept is ubiquitous language: a precise vocabulary shared by domain experts, product people, engineers, and now AI agents. Terms such as “Order,” “Policy,” “Claim,” “Settlement,” “Subscription,” “Entitlement,” “Risk Score,” or “Invoice Adjustment” should mean the same thing in conversations, specifications, tests, code, and documentation.

This matters because AI performs better when the team gives it structured, domain-specific context. A vague prompt such as “fix the payment logic” leaves too much room for interpretation. A domain-aware instruction, such as “Update the Billing context so the Invoice aggregate applies LateFeePolicy only after the grace period has expired,” gives the model a much stronger frame.

Bounded contexts are equally important. AI agents struggle when asked to reason across too much code, too many concepts, and too many responsibilities at once. DDD helps by slicing the system into coherent domains with clear boundaries. The model does not need to understand the entire company to implement a change in Billing, Identity, Inventory, Scheduling, or Compliance. It needs the relevant context, rules, interfaces, and constraints for that bounded context.

This is not only good architecture. It is good context engineering.

AI-native teams should therefore treat DDD as a practical communication system. The domain model describes the business concepts. The bounded context defines where those concepts apply. The aggregate protects invariants and transactional consistency. The ubiquitous language keeps humans and AI aligned. The tests express expected behavior in domain terms. The spec turns business intent into an implementable contract.
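To make "the aggregate protects invariants" a little more concrete, here is a minimal C# sketch. The Billing namespace, the Subscription class, and the seat rule are hypothetical, invented for illustration only. The point is that every change goes through the aggregate, so an agent working in this context has a single place where the rule lives.

```csharp
using System;
using System.Collections.Generic;

namespace Billing // hypothetical bounded context
{
    // Hypothetical aggregate root. All seat changes go through this class,
    // so the invariant "assigned seats never exceed purchased seats" always holds.
    public sealed class Subscription
    {
        private readonly HashSet<string> _assignedUsers = new HashSet<string>();

        public Subscription(int purchasedSeats)
        {
            if (purchasedSeats <= 0)
                throw new ArgumentOutOfRangeException(nameof(purchasedSeats));
            PurchasedSeats = purchasedSeats;
        }

        public int PurchasedSeats { get; }

        public int AssignedSeats => _assignedUsers.Count;

        public void AssignSeat(string userId)
        {
            // Assigning the same user twice is a no-op.
            if (_assignedUsers.Contains(userId)) return;

            // Reject the change instead of letting the aggregate enter an invalid state.
            if (_assignedUsers.Count >= PurchasedSeats)
                throw new InvalidOperationException("No free seats left on this subscription.");

            _assignedUsers.Add(userId);
        }
    }
}
```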

In other words, DDD becomes the grammar of AI-native development.

This is especially powerful in startups moving from prototype to product. Early-stage systems often start as thin CRUD applications or workflow automations. As customers, pricing, permissions, compliance needs, integrations, and operational edge cases accumulate, the domain becomes more complex. Without explicit modeling, the codebase becomes a patchwork of AI-generated behavior. With DDD, the team has a structure for deciding where complexity belongs.

A specialized AI-native team should not merely prompt agents to “build features.” It should teach agents the domain language and constrain them to work inside the domain model.

Spec-driven development turns intent into a contract

If DDD gives the team a shared language, spec-driven development gives the team a contract.

In AI-native engineering, a good spec is not a forty-page requirements document. It is a structured, reviewable artifact that clearly describes a slice of system behavior so that two competent engineers, or two different agents, would build roughly the same thing.

A useful feature spec usually covers the business intent, relevant domain context, user or system behavior, edge cases, error-handling rules, security and privacy constraints, acceptance criteria, required tests, and any rollout or migration considerations. The purpose is not to create documentation for its own sake. The purpose is to prevent the model from inventing behavior.

This changes how teams review work. Instead of asking only, “Does this code look right?” the reviewer asks, “Does this implementation satisfy the spec?” If not, there are two possibilities: the implementation is wrong, or the spec was incomplete. Either way, the durable artifact improves.

For high-velocity startup teams, a spec-anchored model is usually more practical than an extreme spec-as-source model. In a spec-anchored workflow, the spec lives alongside the code and evolves as the behavior changes. Humans can still edit code, but behavioral changes require spec updates. This gives the team enough discipline without turning the entire system into a code generation experiment.

The definition of done should include a simple rule:

If behavior changed, the spec changed.
If the spec changed, the tests changed.
If the tests changed, CI proves the system still works.

This is how teams move fast without losing the plot.

Test-driven development becomes non-negotiable

Test-driven development becomes more important in AI-native engineering, not less.

When humans write code manually, tests verify human implementation. When agents generate code, tests also become steering constraints. They tell the agent what correct behavior means. Without tests, the agent is optimizing for plausibility. With tests, the agent is optimizing against executable expectations.

This is why a strong AI-native workflow should often start with failing tests before production code. The process starts with a clear specification. The key scenarios are then translated into failing tests. The agent implements until those tests pass. A human reviewer, and often a second model, challenges both the implementation and the tests. Only after that should the full pipeline decide whether the change is ready to merge.

The tests should be human-readable. Generated implementation may be verbose or mechanically structured, but tests must remain understandable because they are the executable expression of intent. A founder, CTO, product engineer, or senior developer should be able to read the test names and scenarios and understand what the system promises to do.
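As an illustration of what readable tests can look like, here is a small C# sketch built around the late-fee rule mentioned earlier. The Invoice and LateFeePolicy types, the amounts, and the dates are all hypothetical. To keep the sketch self-contained it uses a tiny Expect helper instead of a test framework; in a real project these methods would be ordinary test cases in whatever framework the team uses.

```csharp
using System;

// Hypothetical domain types, invented for illustration only.
public sealed class LateFeePolicy
{
    public LateFeePolicy(int gracePeriodDays, decimal feeAmount)
    {
        GracePeriodDays = gracePeriodDays;
        FeeAmount = feeAmount;
    }

    public int GracePeriodDays { get; }
    public decimal FeeAmount { get; }
}

public sealed class Invoice
{
    public Invoice(decimal amount, DateTime dueDate)
    {
        Amount = amount;
        DueDate = dueDate;
    }

    public decimal Amount { get; }
    public DateTime DueDate { get; }

    // The late fee applies only after the grace period has expired.
    public decimal TotalDueOn(DateTime today, LateFeePolicy policy)
    {
        var graceEnds = DueDate.AddDays(policy.GracePeriodDays);
        return today.Date > graceEnds.Date ? Amount + policy.FeeAmount : Amount;
    }
}

public static class InvoiceLateFeeTests
{
    // Due on January 10 with a 5-day grace period, so the grace period ends on January 15.
    private static readonly LateFeePolicy StandardPolicy = new LateFeePolicy(5, 10m);
    private static readonly Invoice JanuaryInvoice = new Invoice(100m, new DateTime(2026, 1, 10));

    public static void Invoice_paid_on_the_last_day_of_the_grace_period_has_no_late_fee()
    {
        Expect(JanuaryInvoice.TotalDueOn(new DateTime(2026, 1, 15), StandardPolicy) == 100m);
    }

    public static void Invoice_paid_after_the_grace_period_includes_the_late_fee()
    {
        Expect(JanuaryInvoice.TotalDueOn(new DateTime(2026, 1, 16), StandardPolicy) == 110m);
    }

    public static void RunAll()
    {
        Invoice_paid_on_the_last_day_of_the_grace_period_has_no_late_fee();
        Invoice_paid_after_the_grace_period_includes_the_late_fee();
    }

    private static void Expect(bool condition)
    {
        if (!condition) throw new InvalidOperationException("Expectation failed.");
    }
}
```

The method names carry the business rule, so a reviewer can check the intent without reading the implementation.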

The test pyramid may also shift. For AI-native product teams, end-to-end tests and integration tests often become more important because agents can easily produce code that passes isolated unit tests while failing across real workflows. Unit tests still matter, especially around domain logic and aggregates, but the system needs strong verification at the user journey, API contract, integration, and security boundary levels.

A practical AI-native testing strategy should include:

domain-level unit tests for aggregates, policies, and rules
integration tests for service interactions and persistence behavior
contract tests for APIs and external dependencies
end-to-end tests for critical user journeys
security tests for authentication, authorization, injection, secrets, and data exposure

The team should also convert production defects into regression tests and consider property-based or fuzz testing for complex input spaces.

This is not overhead. This is the control system that lets the team move quickly.

The faster the code is produced, the more automated verification matters.

The AI-native development lifecycle

AI-native delivery still has familiar stages: problem definition, design, implementation, testing, review, documentation, deployment, and maintenance. What changes is how much of the middle can be accelerated and how much structure is needed at the boundaries.

A strong lifecycle starts with problem framing. The team defines the customer problem, business outcome, constraints, risks, and success measures. AI can help analyze customer feedback, support tickets, usage data, competitor flows, or product notes, but humans decide what matters.

The next stage is domain modeling. The team identifies the relevant bounded context, domain concepts, invariants, workflows, and language. AI can propose models, but domain experts and senior engineers validate them. This is where business understanding and technical design begin to merge.

The specification then turns the domain understanding into an implementable contract. AI can draft and critique the spec, but humans approve it before implementation. This is an important boundary. Once implementation starts, ambiguity becomes expensive.

After the spec is approved, the team designs the tests. The critical scenarios become failing tests before production code is written. AI can generate test cases, but humans must ensure those tests cover real business risk, not just happy paths.

Implementation is then delegated as much as possible, but not without constraints. Agents should work inside a defined context: relevant files, architecture rules, coding standards, dependency policies, security requirements, and test expectations. Open-ended prompts such as “build the feature” are weaker than targeted implementation tasks grounded in the spec and domain model.

Review is not a casual glance at a generated diff. Generated code must be reviewed against the spec, architecture, security requirements, and tests. A second model can be useful for critique, especially for edge cases and security concerns, but human review remains essential for judgment.

Deployment must also be disciplined. CI/CD should validate formatting, types, tests, security scanning, dependency checks, infrastructure changes, and deployment safety. Feature flags, staged rollouts, preview environments, and rollback procedures reduce blast radius when teams move quickly.

Finally, production learning should feed the system. Telemetry, defects, user behavior, support tickets, and operational incidents should be used to improve specs, tests, documentation, and runbooks. In a mature AI-native team, every release leaves the system easier to understand and safer to change.

This lifecycle is not waterfall. It is iterative. The difference is that each loop produces durable artifacts: better specs, better tests, better domain models, better context, and better operational knowledge.

Context engineering is the infrastructure most teams are missing

AI-native teams do not only need coding tools. They need context infrastructure.

Agents are only as useful as the context they receive. In a real product, context lives across source code, specs, architecture decisions, tickets, documentation, design files, incidents, logs, deployment history, and team memory. Simply connecting an agent to everything does not solve the problem. It often makes the problem worse because the agent drowns in outdated, irrelevant, or contradictory information.

Productive teams need curated knowledge: architecture components, dependencies, naming conventions, code style guides, implementation patterns, security protocols, and the project knowledge a capable engineer needs to be productive. This is one of the most underappreciated parts of AI-native engineering.

A serious team should maintain a project knowledge layer. At minimum, this can be a well-structured documentation directory that explains the product, domain glossary, bounded contexts, main architecture decisions, coding standards, testing strategy, security rules, infrastructure model, API contracts, and operational runbooks.

The team should also maintain agent instruction files, such as AGENTS.md, CLAUDE.md, or equivalent tool-specific context files. These should not be generic motivational notes. They should tell the agent how the system is structured, what patterns to follow, what not to do, what commands to run, how to test, how to handle migrations, and which security rules are mandatory.

Context engineering is not “prompt engineering” in the narrow sense. It is the design of the knowledge environment in which agents operate.

For advanced teams, this evolves further into internal retrieval systems, repository-aware agents, code indexing, documentation generation, architectural rule checking, and knowledge graphs. This is where graph databases can become valuable. They can model relationships between domain concepts, services, APIs, data entities, owners, dependencies, incidents, and requirements. For complex products, this can help both humans and agents navigate the system more reliably.
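To make the knowledge-graph idea concrete, here is a minimal sketch in plain Python rather than a real graph database. The service and database names are hypothetical, and the only relationship modeled is "depends on". It answers one question a human or an agent might ask before a change: what could be affected if this component changes?

```python
from collections import deque

# Hypothetical "X depends on Y" edges; a real team would load these
# from its own service catalog or graph database.
DEPENDS_ON = {
    "checkout-api": ["payments-service", "orders-db"],
    "payments-service": ["billing-db"],
    "reporting-job": ["orders-db"],
}


def impacted_by(target, depends_on):
    """Return every node that directly or transitively depends on `target`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(node)

    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    # Changing billing-db touches payments-service and, through it, checkout-api.
    print(sorted(impacted_by("billing-db", DEPENDS_ON)))
```

A dedicated graph database adds richer queries, ownership data, and persistence, but the traversal has the same shape.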

The team that invests in context gets compounding returns. Each new feature improves the knowledge base. Each incident creates new tests and runbooks. Each architectural decision imposes stronger constraints on future agents. Over time, the team builds a delivery system that becomes easier to work with, not harder.

Team design: small, senior, specialized, and highly accountable

AI-native teams should be small, but not junior.

The ideal core team is often three to five people with overlapping capabilities: a product-minded technical lead, one or two senior full-stack or product engineers, a platform or security-minded engineer, and a designer or product person, depending on the product stage. For more specialized products, the team may also need domain experts, data engineers, ML engineers, or compliance specialists. But the organization should avoid recreating a large traditional delivery model with separate queues for product, design, backend, frontend, QA, DevOps, security, and architecture.

AI-native teams work best when they own full product capabilities and can make local decisions quickly.

This does not mean everyone does everything poorly. It means the team is accountable for the whole outcome. Specialists still matter, but handoffs must be reduced.

The most important human skills are not disappearing. They are becoming more valuable. Product judgment helps the team choose the right problems to solve. Domain modeling helps the team clearly express business reality. Architectural thinking keeps the system coherent. Security awareness prevents avoidable risk. Testing discipline turns intent into verification. Code review skill protects maintainability. AI tool fluency accelerates execution. Context design improves agent performance. Operational maturity keeps the product reliable after release.

Taste also matters. AI can generate many possible implementations. The team needs people who can choose the right one: simpler, safer, more maintainable, more aligned with the product, and more consistent with the domain.

The role of leadership changes as well. The CTO or engineering leader must not become the bottleneck for every decision. Instead, leadership defines principles, constraints, standards, and review mechanisms. Teams should be empowered to move quickly inside clear guardrails.

Meetings should be kept to a minimum, but alignment must be strong. The team needs fewer status meetings and more design reviews, spec reviews, architecture reviews, and quality retrospectives when the system shows drift.

This is not less management. It is a different management shape: more clarity upfront, fewer interruptions during execution, and stronger verification at boundaries.

Guardrails allow autonomy

Guardrails are what allow autonomy.

An AI-native team should not rely solely on individual discipline. It needs automated controls that make the right behavior easy and risky behavior visible. Strong typing, strict linting, formatting, pre-commit checks, CI test gates, dependency scanning, secret scanning, static analysis, infrastructure policy checks, code ownership rules, required review for sensitive areas, feature flags, observability standards, and rollback procedures all become more important when the volume of generated change increases.

These practices matter because AI can produce a large volume of change quickly. A human may hesitate before adding a new dependency. An agent may add one because it solves the immediate problem. A human may remember a security rule from a past incident. An agent may not unless that rule is in context and enforced by tooling.

For startups, the right question is not “How much governance do we need?” The right question is “Which controls let us ship faster without creating avoidable risk?”

Good guardrails reduce review burden. They also make AI more effective by allowing agents to receive fast feedback. If a generated implementation violates typing, linting, tests, or policy checks, the agent can quickly correct it.
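The fast-feedback loop can be sketched as a small check runner. This is an illustration, not a prescribed tool: the function name `run_checks` and the report shape are assumptions, and each team would plug in its own lint, type-check, and test commands. It runs each check and returns only the failures, trimmed to their last lines, so that an agent or a reviewer sees a short, actionable report.

```python
import subprocess
import sys


def run_checks(checks, max_lines=20):
    """Run each (name, command) pair and return a short report of failures only.

    `checks` is a list of (name, argv_list) tuples. Passing checks produce
    no output at all, which keeps the feedback small.
    """
    failures = []
    for name, command in checks:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            lines = (result.stdout + result.stderr).strip().splitlines()
            # Keep only the tail: the end of the output usually names the error.
            failures.append((name, lines[-max_lines:]))
    return failures


if __name__ == "__main__":
    # Hypothetical checks; replace them with your project's real commands.
    demo_checks = [
        ("passes", [sys.executable, "-c", "pass"]),
        ("fails", [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]),
    ]
    for name, tail in run_checks(demo_checks):
        print(f"{name} failed:")
        print("\n".join(tail))
```

Returning nothing for passing checks is a deliberate choice: successful output is noise for both the reviewer and the agent.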

This is why AI-native teams should invest early in CI/CD, test environments, preview deployments, containerized development, and automated quality checks. These practices were always valuable. AI makes them urgent.

Shipping in weeks, not months

To ship products in weeks, not months, the team needs to compress the right parts of the lifecycle while protecting the parts that require judgment.

Implementation can be compressed. Boilerplate can be compressed. Test generation can be compressed. Documentation drafts can be compressed. Refactoring can be compressed. Environment setup and operational analysis can often be compressed.

Product judgment should not be compressed. Domain understanding should not be compressed. Security thinking should not be compressed. Architecture decisions should not be compressed. Acceptance criteria and verification should not be compressed.

A practical two-week AI-native feature cycle might start with one or two days of product framing, domain discussion, and risk identification. The next step is a spec draft, AI critique, and human refinement. Once the spec is stable, the team creates an architecture sketch, test plan, and acceptance criteria. Implementation then proceeds through agent-assisted TDD, with humans reviewing at important boundaries. The final days are used for integration, security review, UX polish, staging deployment, observability checks, customer or internal validation, and production release behind feature flags.

This is achievable for well-scoped product slices. It is not achievable if every feature starts with vague requirements, unclear ownership, missing test infrastructure, and a codebase that agents cannot understand.

Speed is a system property.

The teams that ship in weeks don't improvise every time. They have reusable patterns, templates, domain language, test frameworks, deployment pipelines, agent instructions, and product decision mechanisms. AI accelerates the work because the work is already structured.

Metrics for AI-native engineering

Founders and executives should be careful with productivity metrics. Lines of code, number of commits, number of pull requests, or story points can become even more misleading in the AI era. Generated output is cheap. Impact is not.

A better measurement system should combine delivery speed, product impact, and engineering health. On the delivery side, leaders should track the lead time from approved specification to production release, the percentage of work shipped with updated specs, and the time it takes to move from customer signal to validated product change. These metrics show whether AI is actually shortening the path from intent to working software, rather than simply increasing development activity.

Quality metrics should focus on whether the team is preserving reliability as speed increases. Escaped defects, change failure rate, production incidents related to recent changes, and mean time to recovery are more useful than raw output measures. In an AI-native team, these indicators reveal whether generated code is being properly constrained, tested, and reviewed before it reaches users.
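Two of these quality metrics are simple enough to compute from deployment records. The sketch below assumes a hypothetical `Deployment` record with a deployment time, a failure flag, and a recovery time. It also measures recovery from the deployment time, which is a simplification; many teams measure from the moment the incident is detected.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    # Hypothetical record shape; adapt it to your own deployment log.
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def change_failure_rate(deployments):
    """Share of deployments that caused a failure in production."""
    if not deployments:
        return 0.0
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_recovery(deployments):
    """Average time from a failed deployment to recovery, or None if there is no data."""
    durations = [
        d.recovered_at - d.deployed_at
        for d in deployments
        if d.failed and d.recovered_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

Tracking both numbers per week or per release shows whether reliability holds as delivery speed increases.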

The team should also measure the health of its AI-native operating model. Useful indicators include the percentage of behavioral changes accompanied by updated specs, test coverage for critical workflows, review findings by category, and the amount of rework caused by unclear requirements. Over time, the team should expect fewer repeated review comments, fewer ambiguous implementation debates, and a higher success rate for well-scoped agent tasks.

The goal is not to prove that AI makes every engineer “10x.” The goal is to understand whether the organization is delivering more customer value with equal or better quality.

AI-native engineering should be measured by product outcomes and system health, not by activity.

AI-native does not mean undisciplined

AI-native engineering is not about replacing professional engineering with prompts. It is about redesigning engineering so that humans and AI work together at the right level.

AI is very good at generating possibilities. It is increasingly good at implementation, refactoring, testing, documentation, and analysis. But product companies do not win by generating the most possibilities. They win by choosing the right problems, clearly expressing intent, building coherent systems, and learning faster than competitors.

That requires discipline.

The AI-native team of the near future will look less like a large ticket-processing machine and more like a small, senior product engineering cell with strong domain language, explicit specs, rigorous tests, curated context, automated guardrails, and high ownership. It will use AI heavily, but it will not outsource judgment. It will ship quickly because the system around the agents is designed for speed and verification.

The companies that understand this will move faster without becoming fragile. They will turn AI from a productivity toy into an engineering capability. They will ship in weeks, not months, because they will stop treating AI as an autocomplete layer and start treating it as part of a disciplined delivery system.

The future of software delivery is not vibe coding. It is not a heavyweight process either.

It is disciplined AI-native engineering.

Stop Burning Tokens: A Practical Guide to Using Claude and Claude Code Efficiently

Ilya Verbitskiy — Fri, 12 Jun 2026 00:00:00 GMT

Claude and Claude Code can dramatically improve the way business users, analysts, architects, product managers, and software developers work. They can write documents, analyze requirements, review code, design systems, automate repetitive tasks, and help teams move faster. But there is a hidden operational discipline behind effective usage: token management.

Most people do not fail with Claude because they write “bad prompts.” They fail because they unintentionally create expensive conversations. They paste too much context. They keep long sessions alive after the useful work is finished. They attach large documents when a brief excerpt would suffice. They ask Claude Code to “look around the repo” instead of pointing it to the right files. They let MCP servers, plugins, skills, subagents, and memory files accumulate until every message carries unnecessary baggage. Then they are surprised when they hit usage limits, consume too much quota, or see API costs grow faster than expected.

This article explains how to use Claude and Claude Code effectively from a token-usage perspective. It is written for both business users and software developers. Business users need to understand how conversations, files, documents, and repetitive work consume tokens. Developers need to understand how Claude Code uses context, how codebase exploration becomes expensive, how model choice affects usage, and how to structure sessions so the model spends its budget on reasoning and implementation rather than on re-reading irrelevant history.

1. The core idea: tokens are not just what you type

The first mental model to understand is that token usage is not limited to the words in your latest prompt.

A token is a small unit of text processed by the model. In practice, a token can be part of a word, a whole word, punctuation, whitespace, code syntax, or structured data. For everyday understanding, it is enough to think of tokens as the “text units” Claude reads and writes.

The mistake many users make is assuming that a short prompt is always cheap. It is not. In a long conversation, Claude does not process only your latest message. It also needs relevant conversation history, system instructions, tool definitions, memory files, attached content, previous outputs, and sometimes other context loaded by the application or development environment. This is why a simple follow-up question in a long session can cost much more than the same question in a fresh session.

For business users, this means a conversation about a strategy document, proposal, contract, or report can become expensive if it contains many previous drafts, attachments, rewrites, comments, and side discussions. For developers, this means a Claude Code session can become expensive because the model may carry previous file reads, command outputs, logs, diffs, test failures, architectural discussion, and implementation attempts into later turns.

The practical lesson is simple: token usage compounds with context. The longer and noisier the context, the more every future message costs.
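A rough arithmetic sketch shows the compounding effect. It deliberately uses a simplified model: every turn re-sends the full history, and prompt caching and compaction are ignored. The function name and the token counts are illustrative assumptions, not measurements.

```python
def cumulative_input_tokens(message_tokens, reply_tokens, turns, base_context=0):
    """Total input tokens processed over `turns` exchanges.

    Simplified model: each turn, the model reads the whole history so far
    plus the new message. Caching and compaction are not modeled.
    """
    total = 0
    history = base_context
    for _ in range(turns):
        history += message_tokens  # the new user message joins the context
        total += history           # the model reads the whole context this turn
        history += reply_tokens    # the reply becomes part of later context
    return total


if __name__ == "__main__":
    # Ten 100-token messages with 400-token replies:
    # the user typed 1,000 tokens, but the model read 23,500.
    print(cumulative_input_tokens(100, 400, 10))  # 23500
```

Under this model the total grows roughly with the square of the number of turns, which is why trimming history pays off more than shortening individual prompts.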

Claude Code already includes cost-management features such as prompt caching, auto-compaction, usage reporting, model selection, context inspection, and background summarisation. These features help, but they do not remove the need for disciplined workflow design. The best users treat context as a working set, not as an infinite notebook.

2. Why token discipline matters for business users

Business users often experience token waste differently from developers. They may not see a terminal or token counter. They simply notice that the assistant becomes slower, less focused, or hits usage limits. The root causes are usually predictable.

The first cause is large attachments. A PDF, Word document, spreadsheet, slide deck, screenshot, or exported web page may contain far more hidden content than the user expects. A document may include metadata, formatting, tables, repeated headers, footers, comments, images, and irrelevant sections. When users upload the whole file and ask a narrow question, Claude may have to process far more information than the task requires.

The second cause is repeated rewriting. Business users often ask for “make it better,” “make it more executive,” “make it shorter,” “now make it more formal,” “now add more detail,” and so on. Each iteration may carry the full previous conversation and previous drafts. If the user asks Claude to rewrite the entire document every time, output tokens grow quickly. A better approach is to work section by section and ask for targeted changes.

The third cause is unclear scope. A vague request such as “analyze this business case” or “review this strategy” encourages the model to read widely, infer context, and produce broad commentary. A precise request, such as “review only the executive summary for clarity, decision logic, and missing financial assumptions,” is usually cheaper and better.

The fourth cause is unnecessary politeness and filler. This does not mean users should be rude. It means they should avoid long ritual prompts full of non-functional text. Claude does not need two paragraphs of ceremony before every instruction. In a long session, repeated filler adds up.

For business users, token-efficient prompting usually means:

Provide the minimum context required for the decision.
Identify the exact output format.
Specify what not to rewrite.
Ask for changes to a section rather than regenerating the whole document.
Start a new conversation when the topic changes.
Summarise or extract only the relevant parts of large files before requesting analysis.
Avoid keeping old drafts, side discussions, and unrelated decisions in the same chat.

This is not only about cost. It improves quality. Claude performs better when the important signal is not buried in irrelevant context.

3. Why token discipline matters even more in Claude Code

Claude Code is more powerful than a normal chat session because it can work with your repository, read files, run commands, edit code, analyze errors, and iterate. That power also creates more ways to spend tokens.

A software development session may include:

project instructions from CLAUDE.md
conversation history
files Claude has read
search results
shell command outputs
compiler errors
test output
logs
diffs
tool definitions
MCP server metadata
plugin context
subagent summaries
planning notes
implementation attempts
user corrections

If you ask Claude Code to make a change without giving a clear scope, it may inspect many files, run broad searches, read irrelevant modules, execute tests with verbose output, and carry all of that context forward. The result can be high token usage before any useful code is written.

This does not mean Claude Code is inefficient. It means agentic coding needs a workflow. A human developer does not open every file in the repository before changing one function. A good developer narrows the problem. Claude Code needs the same guidance.

A poor prompt is:

Fix the authentication system.

A better prompt is:

The refresh token flow returns 401 after the access token's expiry. Start with src/auth/refresh.ts, src/auth/session.ts, and the tests under tests/auth. Do not refactor unrelated login code. First, explain the likely cause, then propose a minimal change and test plan.

The second prompt saves tokens by narrowing the search space. It also reduces the chance of expensive rework.

4. Track usage before optimizing blindly

The first step is measurement. Without measurement, token optimization becomes superstition.

Claude Code provides the /usage command. It shows token usage statistics for the current session. For API users, it can estimate costs based on local token counts, though actual billing should be verified in the Claude Console. For Pro, Max, Team, or Enterprise plans, /usage also shows plan usage information, activity statistics, and usage breakdowns. It can attribute recent usage to skills, subagents, plugins, and individual MCP servers. The numbers are approximate and local to the machine, but they are useful for understanding what is consuming context.

Developers should use /usage regularly, especially after:

opening a large repository;
adding MCP servers;
enabling plugins or skills;
running large test suites;
reading logs;
spawning subagents;
working in a long session;
using plan mode for a complex task;
attaching documents or screenshots.

Claude Code also supports /context, which helps identify what is consuming the context window. This is important because token waste is often hidden. The user may think the prompt is small, but the session may already contain a large CLAUDE.md, active MCP definitions, plugin context, previous command outputs, and a long conversation history.

For teams, measurement should be part of the rollout. Anthropic’s official cost guidance recommends establishing a baseline with a small pilot group before wider adoption. Per-developer cost varies by model choice, codebase size, usage patterns, the number of instances, automation, and agent teams. A small pilot reveals whether the team’s usage pattern is lightweight, moderate, or heavy.

For organizations using API billing, workspace spend limits and usage reporting should be configured in the Claude Console. For subscription users, the relevant experience is plan usage and usage credits. On some plans, /usage-credits can be used to set monthly spend limits for usage credits. For enterprise environments using Bedrock, Vertex, or other gateways, organizations may need external tracking because usage metrics may not be returned from the cloud provider.

The key principle is: do not optimize in the abstract. Measure first, then reduce the biggest sources of waste.

5. Model choice: use the right model for the job

Model selection is one of the most important cost decisions.

Opus is the premium reasoning model and should be treated as a scarce resource. It is excellent for difficult architectural reasoning, complex planning, ambiguous debugging, deep design trade-offs, and high-stakes decisions. But not every action in a coding session needs Opus-level reasoning.

Sonnet handles most coding tasks well and is more cost-effective. For many implementation, refactoring, test-writing, documentation, and routine analysis tasks, Sonnet is the right default. Haiku can be useful for simple subagent tasks where speed and low cost matter more than deep reasoning.

A practical Claude Code model strategy is:

Use Sonnet as the default model for normal development.
Reserve Opus for complex reasoning, architecture, design, and planning.
Use Haiku for simple, isolated subagent tasks where appropriate.
Switch models intentionally with /model.

The most important workflow pattern is using Opus only where it creates the most value: planning.

In Claude Code, you can use:

/model opusplan

This sets the model behavior so that Opus is used in plan mode and Sonnet otherwise, and it can be saved as your default for new sessions. This is a powerful token-efficiency pattern because it lets you use Opus for the thinking-heavy part of the workflow while using Sonnet for the execution-heavy part.

The idea is simple:

Planning is where mistakes are expensive.
Implementation involves many tokens, file edits, test runs, and follow-ups.
Opus is valuable for deciding the right approach.
Sonnet is usually sufficient for carrying out the approach.

This is similar to using a senior architect for the design review and a strong engineering team for implementation. You do not need the most expensive reasoning model for every file edit, every diff response, or every routine test update.
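A back-of-the-envelope calculation illustrates why the split matters. The prices and token counts below are placeholder numbers chosen for easy arithmetic, not real pricing; check current pricing before drawing conclusions about your own usage.

```python
def session_cost(plan_tokens, exec_tokens, plan_price, exec_price):
    """Cost of one session, with prices given per million tokens."""
    return (plan_tokens * plan_price + exec_tokens * exec_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical prices per million tokens (assumptions, not real rates).
    PREMIUM = 15.0
    STANDARD = 3.0

    # Assume a session spends 50k tokens on planning and 950k on execution.
    all_premium = session_cost(50_000, 950_000, PREMIUM, PREMIUM)
    split = session_cost(50_000, 950_000, PREMIUM, STANDARD)
    print(all_premium, split)  # 15.0 3.6
```

With these assumed numbers, most of the spend sits in execution, so moving only that part to the cheaper model removes most of the cost while planning still gets the stronger model.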

6. Use plan mode before expensive implementation

Plan mode is one of the best ways to prevent token waste in Claude Code.

Complex coding tasks often become expensive because Claude starts implementing too early, discovers missing information, changes direction, makes errors, modifies the wrong abstraction, or refactors too broadly. Every mistaken step consumes tokens: file reads, code edits, command outputs, test failures, correction prompts, and follow-up diffs.

Plan mode reduces this by forcing an analysis-first workflow. Before editing code, Claude explores the relevant parts of the project and proposes an approach. The user can approve, reject, or adjust the plan. This is especially valuable for tasks involving architecture, migrations, security changes, data models, cross-cutting refactoring, performance issues, or unfamiliar codebases.

With /model opusplan, the workflow becomes even stronger:

Enter plan mode.
Let Opus reason about the problem and propose the plan.
Review and correct the plan.
Exit plan mode and let Sonnet implement.
Test incrementally.
Stop early if the implementation drifts.

This avoids paying the premium reasoning cost for every execution step while still benefiting from strong reasoning where it matters most.

A good planning prompt looks like this:

Use plan mode. I need to add password-reset support to the FastAPI backend.

Scope:
- auth routes only
- email token generation
- token expiry validation
- tests for success, expired token, invalid token
- no frontend changes yet

First, inspect the relevant files and propose a minimal implementation plan.
Do not edit files until I approve the plan.

This prompt saves tokens by narrowing the task, preventing premature edits, and giving Claude a clear boundary.

7. Manage context proactively: `/clear`, `/compact`, `/resume`, and `/rename`

Context management is the foundation of token efficiency.

Claude Code provides several commands that help manage session context:

/clear starts fresh.
/compact summarises the current conversation.
/resume returns to a previous session.
/rename gives a session a meaningful name before clearing or switching.

Use /clear when switching to an unrelated task. If you finished working on authentication and now want to redesign a reporting engine, do not drag the old authentication context into the new task. Stale context wastes tokens on every future message and may confuse the model.

Use /compact when you are continuing the same broader task but want to reduce accumulated noise. Compaction summarises the session so you can continue with a smaller context. It is useful after completing a phase: investigation, design, implementation, test repair, or documentation. The best time to compact is before the session becomes overloaded, not after Claude starts losing track.

A practical pattern is:

/compact Focus on code changes, test results, open decisions, and remaining TODOs.

Or:

/compact Focus on API usage, files modified, architectural decisions, and failing tests.

Custom compaction instructions matter. A generic compact may preserve too much irrelevant detail or lose important task state. If you tell Claude what to preserve, you get a more useful summary.

You can also place compact instructions in CLAUDE.md, for example:

# Compact instructions

When compacting, preserve:

- files changed
- tests added or modified
- current failing tests
- architectural decisions
- unresolved questions

Discard:

- command noise
- successful test logs
- repeated explanations
- obsolete implementation attempts

However, this creates a trade-off. Anything in CLAUDE.md is loaded into context. Keep it concise.

The decision between /clear and /compact is simple:

Use /clear when the next task is unrelated.
Use /compact when the next task continues the same line of work.
Use /rename before clearing if you may need to find the session later.
Use /resume when returning to prior work.

Do not use /compact as a substitute for discipline. If the session contains too much irrelevant material, sometimes the best optimization is to clear and start with a clean, explicit prompt.

8. Keep `CLAUDE.md` small, stable, and useful

CLAUDE.md is one of the most useful Claude Code features, but it is also one of the easiest ways to waste tokens.

Claude Code loads CLAUDE.md automatically as project memory. This is ideal for stable project instructions: architecture overview, coding conventions, test commands, repository structure, style rules, domain terminology, and non-negotiable constraints. It saves you from retyping the same context in every session.

But because CLAUDE.md is loaded into context, it becomes a token tax. A large file is paid for repeatedly. If it contains long task notes, obsolete decisions, huge implementation guides, or detailed documentation for workflows you rarely use, it bloats every session.

A good CLAUDE.md should be concise. Anthropic’s official guidance recommends keeping it under 200 lines and moving specialized instructions into skills. Some practitioner materials are more permissive and mention larger thresholds, but the stronger discipline is to keep the file small enough that every line earns its place.
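One way to keep that discipline is a small check that fails when the file grows past a budget. The sketch below is an assumption about how a team might enforce it, for example in a pre-commit hook or CI step. The function name is hypothetical, and the 200-line default simply mirrors the guidance mentioned above.

```python
import sys
from pathlib import Path


def check_line_budget(path, max_lines=200):
    """Return (line_count, within_budget) for the file at `path`."""
    text = Path(path).read_text(encoding="utf-8")
    count = len(text.splitlines())
    return count, count <= max_lines


if __name__ == "__main__":
    # Assumes the script runs from the repository root, next to CLAUDE.md.
    count, ok = check_line_budget("CLAUDE.md")
    print(f"CLAUDE.md has {count} lines")
    sys.exit(0 if ok else 1)
```

A line count is a crude proxy for token cost, but it is cheap to check and easy to explain in review.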

A good CLAUDE.md contains:

project purpose
key directories
architecture summary
build and test commands
coding standards
security rules
important domain terms
compact instructions
“do not” rules that prevent expensive mistakes

A poor CLAUDE.md contains:

long historical notes
old task state
full API documentation
large examples
detailed migration guides
many alternative workflows
verbose onboarding content
content only needed once a month

For example:

# Project overview

This is a FastAPI + React application for Roman coin identification.
The backend is in `backend/`.
The frontend is in `frontend/`.
MongoDB is used for catalog and vector search.
OpenAI Vision is used for image-based coin identification.

# Commands

Backend tests:
`cd backend && pytest`

Backend lint:
`cd backend && ruff check . && mypy .`

Frontend checks:
`cd frontend && npm run lint && npm run typecheck`

# Rules

- Do not refactor unrelated modules.
- Prefer minimal, testable changes.
- Add or update tests for backend behaviour changes.
- Before large changes, use plan mode.
- Preserve public API compatibility unless explicitly asked.

# Compact instructions

Preserve files changed, tests run, failing tests, decisions, and remaining TODOs.
Discard successful command noise and obsolete attempts.

This kind of file helps Claude work efficiently without becoming a dumping ground.

For specialized workflows, use separate files or skills. For example, instead of placing a full database migration manual inside CLAUDE.md, create a skill or a separate document and reference it only when needed. This keeps the base context small.

9. Move specialised instructions into skills

Skills are useful because they can provide domain-specific or workflow-specific guidance on demand. Unlike CLAUDE.md, which is loaded at the start of the session, skills can be invoked when relevant.

This matters for token usage. If your project has detailed instructions for PR reviews, database migrations, release notes, AWS IAM policy generation, security threat modeling, performance testing, or documentation generation, those instructions do not need to be present in every coding session.

A good division is:

CLAUDE.md contains stable, universal project context.
Skills contain specialized, situational instructions.
Separate documentation files contain long reference material.
Prompts pull in only what is needed for the current task.

For example, a “database-migration” skill might include the full migration checklist, naming conventions, rollback requirements, and test strategy. Claude only needs that when working on a migration. It should not be loaded when fixing a frontend button.

A “codebase-overview” skill can also reduce exploration costs. Instead of forcing Claude to rediscover the project structure by reading many files, the skill can provide a curated map of the architecture, key directories, conventions, and common workflows. This turns repeated expensive exploration into a smaller, reusable context asset.

The principle is: do not make every session pay for every possible workflow.

10. Reduce MCP server overhead

MCP servers can be extremely useful. They connect Claude Code to external tools, systems, data sources, and workflows. But every integration has a context cost.

Anthropic’s documentation notes that MCP tool definitions are deferred by default, so only tool names enter context until Claude uses a specific tool. This helps. But MCP servers can still add overhead, especially when many are configured and available. Developers often add MCP servers as they discover them: GitHub, Supabase, browser tools, Figma, cloud tools, databases, observability systems, and internal services. Over time, the environment becomes heavy.

The practical rule is not “avoid MCP.” The rule is “use MCP intentionally.”

Run /context to see what is consuming space. Run /mcp to inspect configured servers. Disable servers that are not actively needed for the current project or task.

Prefer CLI tools when available and appropriate. Tools such as gh, aws, gcloud, kubectl, sentry-cli, psql, or local scripts can be more context-efficient because they do not require loading large tool schemas into the model context. Claude can run CLI commands directly and return concise outputs.

A good workflow is:

Keep project-specific MCP configuration separate.
Enable only the servers needed for that project.
Disable experimental or rarely used servers.
Prefer CLI commands for simple retrieval.
Use MCP when it provides high-value structured access.
Inspect context regularly.

MCP is powerful, but “connect everything” is not a cost strategy.

11. Use code intelligence plugins for typed languages

When Claude Code explores an unfamiliar codebase, it may use text search and file reads to understand symbols, references, and dependencies. This can be expensive. In typed languages, code intelligence plugins can reduce this overhead by giving Claude more precise navigation.

A “go to definition” operation can replace a broad grep followed by reading several candidate files. Type information can help Claude understand interfaces, function signatures, errors, and dependencies without scanning as much text. Language servers can also report type errors after edits, allowing Claude to catch mistakes without repeatedly running full builds or reading long compiler output.

This is especially useful in TypeScript, Java, C#, Go, Rust, Python with type hints, and other typed or partially typed codebases. It turns code navigation from “search and inspect” into “ask the language server.”

The result is not only lower token usage. It also improves accuracy. Claude is less likely to modify the wrong symbol, miss an overload, or misunderstand a type relationship.

12. Delegate verbose work to subagents carefully

Subagents can be useful because they operate in their own context window. This means verbose operations can be isolated from the main conversation. For example, a subagent can inspect logs, run tests, search documentation, or explore part of the repository, then return only a concise summary to the main session.

This can save the main context from being polluted with thousands of lines of output.

Good subagent tasks include:

“Run the test suite and summarise only failing tests.”
“Inspect this module and return the public API surface.”
“Search for usages of this function and summarise the call sites.”
“Read the migration files and identify patterns.”
“Compare these logs and return the top three error signatures.”

Bad subagent tasks include:

tiny shell commands
simple git status checks
trivial one-file edits
anything where the subagent overhead is larger than the task
broad, vague exploration with no summary format

Subagents are not automatically cheaper. They are separate Claude instances with their own context. If used casually, they can increase usage. They are cost-effective when they prevent large, noisy outputs from entering the main conversation.

When spawning a subagent, keep the prompt focused. Do not pass the entire project history. Give it the specific task, the files or commands it needs, and the required summary format.

Example:

Use a subagent to run backend tests.

Scope:

- run `cd backend && pytest`
- do not attempt fixes
- return only:
  1. number of tests run
  2. failing test names
  3. top error message for each failure
  4. likely affected files

This keeps the main conversation clean.

13. Be careful with agent teams

Agent teams can multiply token usage because each teammate runs as a separate Claude Code instance with its own context window. Anthropic’s documentation notes that token usage scales with the number of active teammates and how long each one runs. Agent teams may use approximately 7 times as many tokens as standard sessions when teammates run in plan mode.

This does not mean agent teams are bad. They are useful for parallel work, larger tasks, and role-based collaboration. But they require cost discipline.

To manage agent team costs:

Keep teams small.
Use Sonnet for teammates where possible.
Keep spawn prompts focused.
Avoid giving every teammate the full project history.
Clean up teams when work is done.
Do not leave idle teammates running.
Use agent teams for tasks that genuinely benefit from parallelism.

A business analogy is hiring a team of consultants. If every consultant attends every meeting, reads every document, and writes a separate report, costs rise quickly. The same applies to agent teams.

Use them when parallel work saves meaningful time or improves quality, not for routine edits.

14. Offload preprocessing to hooks and scripts

One of the best ways to save tokens is to prevent noisy data from reaching Claude in the first place.

Logs, test outputs, build outputs, stack traces, JSON dumps, CSV files, and generated files can be huge. Claude does not need to read everything. It usually needs the failing lines, error messages, relevant stack frames, changed files, or summary statistics.

Anthropic’s guidance recommends using hooks to preprocess data before Claude sees it. For example, instead of allowing Claude to read a 10,000-line test output, a hook or shell script can filter only failures. Instead of pasting the full log file, run a command to extract error lines and their surrounding context.

Examples:

pytest 2>&1 | grep -A 5 -E "(FAIL|ERROR|AssertionError)" | head -100

grep -i "error" application.log | tail -50

jq '.errors[] | {code, message, path}' response.json

git diff --stat

git diff -- src/auth/refresh.ts tests/auth/test_refresh.py
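If one of these filters gets reused often, it can be wrapped in a small shell function. The sketch below is only an illustration: the name filter_failures, the default of 5 context lines, and the 100-line cap are assumptions that mirror the pytest example above, not something Claude Code provides.

```shell
# Hypothetical helper: keep only failure lines (plus a little context) from noisy output.
# Reads from standard input so it can sit at the end of any pipeline.
# Usage: some_command 2>&1 | filter_failures [context_lines] [max_lines]
filter_failures() {
    context="${1:-5}"      # lines of context to keep after each match
    max_lines="${2:-100}"  # hard cap on the total number of lines returned
    # grep exits with status 1 when nothing matches; "|| true" treats that as
    # "no failures found" instead of an error.
    grep -A "$context" -E '(FAIL|ERROR|AssertionError)' | head -n "$max_lines" || true
}
```

For example, `pytest 2>&1 | filter_failures 3 60` would return at most 60 lines, with each match followed by up to 3 lines of context.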

The principle is simple: use deterministic tools to reduce raw data before involving the model.

Claude is excellent at reasoning over meaningful context. It should not be used as an expensive grep, tail, or jq replacement when simple tools can reduce the input first.

This applies to business users too. Before uploading a 100-page document, extract the relevant section. Before asking Claude to analyze a whole spreadsheet, provide the relevant rows, columns, or summary. Before uploading screenshots, describe the issue or crop the image to the relevant area.

15. Write specific prompts

Prompt specificity is one of the cheapest forms of token optimization.

A vague prompt forces Claude to infer scope. In Claude Code, this may trigger broad repository exploration. In business writing, it may trigger broad analysis and long outputs. In both cases, ambiguity turns directly into token usage.

Bad:

Improve this.

Better:

Rewrite only the executive summary.
Keep the meaning unchanged.
Make it clearer for project and program managers.
Limit it to 250 words.
Do not change the recommendations section.

Bad:

Fix the tests.

Better:

Fix only the failing tests in `tests/auth/test_refresh.py`.
Do not change production code unless the test failure reveals a real bug.
Run only the auth test file first.
Return a short summary of the change.

Bad:

Review this codebase.

Better:

Review the authentication module for security issues.
Focus on token expiry, refresh-token storage, password reset, and error handling.
Do not review frontend styling or unrelated API routes.
Return findings ranked by severity.

The more precise the prompt, the less Claude needs to explore, guess, and generate.

A useful structure is:

Task
Scope
Files or sections
Constraints
Output format
Verification target
What not to do

Example:

Task: Add validation to the invoice creation endpoint.

Scope:
- backend only
- files: `src/invoices/routes.ts`, `src/invoices/schema.ts`, tests under `tests/invoices`
- validate customer ID, line-item quantity, unit price, and currency

Constraints:
- do not refactor the invoice service
- preserve the existing API response shape
- use the current validation library

Verification:
- add tests for invalid quantity, missing customer ID, and unsupported currency
- run only invoice tests first

Output:
- brief plan
- then implementation
- then test result summary

This kind of prompt often saves more tokens than any clever trick.

16. Control output length

Output tokens are expensive and can be wasteful. Claude often tries to be helpful by explaining what it did, restating the problem, providing alternatives, and adding next steps. Sometimes this is useful. Sometimes it is just extra text.

When you do not need a long answer, say so.

Examples:

Answer in no more than 10 bullet points.

Return only the changed code block.

Return only a unified diff.

Do not explain unless there is a risk or trade-off.

Summarise the result in 5 lines.

Only rewrite section 3. Keep all other sections unchanged.

For business writing, avoid regenerating entire documents unnecessarily. If section 3 is weak, ask Claude to rewrite section 3. If the introduction needs a stronger hook, ask for three alternative introductions. If the conclusion is too long, ask only for a shorter conclusion.

For developers, avoid asking Claude to print entire files after edits unless necessary. Diffs are usually better. Summaries are often enough. Fully regenerated files burn output tokens and make review harder.

17. Work incrementally and test early

Large tasks become expensive when errors are discovered late. The model may implement many changes, run tests, discover failures, inspect logs, revise the approach, and rewrite code. Each loop costs tokens.

A better pattern is incremental development:

Plan.
Make a small change.
Run a focused test.
Fix immediately.
Expand scope.
Run broader tests.
Compact after a completed phase.

This reduces token waste because failures are caught while the relevant context is still small.

For example, instead of asking Claude Code to “implement the full reporting engine,” break it into phases:

Define template schema.
Implement parser.
Render simple text blocks.
Add tables.
Add pagination.
Add headers and footers.
Add charts.
Add tests.
Add documentation.

Each phase should have acceptance criteria. After each phase, either compact or clear, depending on whether the next phase needs the same context.

This is especially important for LLM-assisted development because the cost of ambiguity compounds. A wrong architecture implemented across ten files is expensive to unwind. A wrong plan corrected before implementation is cheap.

18. Course-correct early

When Claude starts moving in the wrong direction, stop it early.

In Claude Code, pressing Escape can interrupt a response. Running /rewind or double-tapping Escape can restore conversation and code to a previous checkpoint. This is not only a quality feature but also a cost-control feature. Letting Claude finish a long, wrong implementation wastes tokens and creates more context that later has to be corrected or ignored.

Users often wait too long because they think, “Let’s see where it goes.” That may be fine for brainstorming, but in coding, it can be expensive. If you see Claude reading irrelevant files, refactoring too broadly, changing public APIs without permission, or running the wrong command, stop it.

Then give a correction:

Stop. This is going too broad.

Only modify `src/auth/refresh.ts`.
Do not change login, registration, or middleware.
The issue is specifically refresh-token expiry handling.
Propose a narrower plan before editing.

Early correction is one of the highest-value behaviors in Claude Code.

19. Use documents and attachments carefully

Large documents are one of the most common token traps for business users.

Before uploading or pasting a document, ask:

Does Claude need the whole document?
Can I paste only the relevant section?
Can I provide a summary first?
Can I extract the table or paragraph that matters?
Can I ask Claude to analyze one chapter at a time?
Can I remove boilerplate, headers, footers, and appendices?
Can I convert a screenshot into text?

PDFs, slide decks, screenshots, and Word documents can contain hidden token overhead. Screenshots can be especially expensive compared with text, and they may include irrelevant visual information. If the task is textual, text is usually better than an image.

For repeated reference material, avoid uploading the same document into many separate chats. In Claude Projects or similar environments, persistent project knowledge may be more efficient when the same material is reused frequently. For Claude Code, stable project context belongs in CLAUDE.md, skills, or referenced files, but only when it is genuinely useful.

A good business workflow is:

Extract the relevant section.
Ask Claude to summarise or analyze it.
Store the summary as a working context.
Continue with the summary instead of the original large document.
Only return to the full document when needed.

For developers, the same principle applies to logs, generated files, minified files, lock files, and large JSON payloads. Do not feed raw noise to the model.
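One way to make this a habit is a small guard that recognises obviously noisy files by name before they are pasted or read. This is a sketch under assumptions: the function name is_noisy_file and the pattern list are illustrative, and a real project would tune them.

```shell
# Hypothetical guard: succeed (exit status 0) when a file name looks like
# generated noise such as a lock file, a minified bundle, or a source map.
# Only the file name is inspected; the file itself is never opened.
# Usage: is_noisy_file <path>
is_noisy_file() {
    case "$(basename "$1")" in
        *.lock|package-lock.json|*.min.js|*.min.css|*.map) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical use would be `is_noisy_file "$f" || cat "$f"`, which prints a file only when it does not match one of the noisy patterns.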

20. Understand extended thinking

Extended thinking improves performance on complex reasoning tasks but consumes additional output tokens. Anthropic’s documentation explains that thinking tokens are billed as output tokens, and the default budget can be large depending on the model. For simpler tasks, reducing the effort level or disabling thinking, where available, can reduce costs.

The practical guidance is:

Use higher thinking effort for architecture, planning, debugging, and complex trade-offs.
Use lower effort for routine edits, simple rewrites, formatting, and small code changes.
Avoid deep reasoning settings when the task is mechanical.
Use /effort or model configuration where supported.
Understand that some models may use adaptive reasoning and ignore nonzero fixed budgets.
Know that not all models allow thinking to be disabled.

Business users should also understand this pattern. Asking for “deep analysis” of a large document invites longer reasoning and output. That is useful when making an important decision, but unnecessary for simple summarisation.

For example:

Summarise this in 5 bullets.

should not require the same reasoning effort as:

Assess this acquisition strategy, identify hidden risks, challenge the assumptions, and recommend whether the board should approve it.

Use reasoning depth intentionally.

21. Team-level governance for Claude Code usage

For organizations, token efficiency should not be left entirely to individual behavior. Teams need lightweight governance.

A good team rollout should include:

Baseline usage measurement with a pilot group.
Recommended default model configuration.
Guidance on when to use Opus.
Default /model opusplan recommendation for complex development.
Project-level CLAUDE.md standards.
MCP server review process.
Approved skills and plugins.
Examples of good prompts.
Guidance for /clear, /compact, and /usage.
Rules for handling sensitive data.
Rate limits and spend limits where applicable.

Anthropic’s official documentation provides rate-limit recommendations by team size, with token-per-minute and request-per-minute guidance decreasing per user as the organization size grows. The reason is that not all users are active simultaneously in large organizations. This means capacity planning should consider concurrency, not just headcount.

Organizations should also pay special attention to training sessions. A live workshop where many developers use Claude Code simultaneously can create unusually high concurrent usage. This may require higher temporary limits or careful scheduling.

For API-based usage, workspace limits and cost reporting are important. For subscription usage, usage bars and plan limits matter. For enterprise cloud environments, external tracking may be required if usage metrics are not automatically sent back.

The goal is not to restrict Claude Code to the point that developers stop using it. The goal is to prevent avoidable waste while preserving productivity.

22. Recommended personal workflow for developers

Here is a practical Claude Code workflow optimized for token efficiency.

At project setup

Create a lean CLAUDE.md.

Include:

Project overview
Key directories
Commands
Coding rules
Testing rules
Compact instructions

Do not include:

Long documentation
Temporary task notes
Old decisions
Detailed manuals
Large examples

Configure model defaults. A strong default is:

/model opusplan

This allows Opus to be used for plan mode and Sonnet otherwise, and it can be saved as your default for new sessions.

Review MCP servers. Keep only what you need for the project.

Install relevant code intelligence plugins for typed languages.

At the start of a task

Use a scoped prompt.

Include:

Task
Relevant files
Boundaries
Output format
Test target
Whether to use plan mode

For complex tasks, enter plan mode first.

During implementation

Do not let Claude explore broadly without reason.

Stop early if it goes off track.

Run focused tests before broad tests.

Ask for diffs or summaries instead of full files.

Filter logs and test output.

Use subagents only for verbose isolated work.

Between phases

Use /compact with specific instructions.

Example:

/compact Preserve files changed, tests added, current failures, decisions, and TODOs.

Use /clear when switching to an unrelated task.

Use /rename before clearing important sessions.

Use /usage and /context regularly.

At the end

Ask Claude to produce a concise handoff summary:

Summarise:
- files changed
- behavior changed
- tests added
- commands run
- remaining risks
- suggested next task

Save this summary in a project progress file only if it is genuinely useful. Do not paste every session summary into CLAUDE.md.

23. Recommended workflow for business users

Business users can apply a similar discipline.

Start with the outcome

Instead of:

Help me with this document.

Use:

Review this proposal for executive clarity.
Focus only on:
- decision logic
- financial assumptions
- risks
- missing next steps

Return:
- top 5 issues
- suggested rewrite of the executive summary
- questions I should answer before sending

Work in sections

Do not ask Claude to regenerate a whole paper or proposal after every small change. Work section by section.

Rewrite only the “Commercial Rationale” section.
Keep the argument unchanged.
Make it more concise and board-level.
Limit to 300 words.

Reduce documents before uploading

If the document is large, provide the relevant extract first. If Claude needs more, it can ask for the specific missing section.

Avoid endless chat drift

When the topic changes, start a new chat. Long conversations are useful for continuity, but expensive when the old context is no longer relevant.

Ask for concise outputs

Give me only the final version, no explanation.

or:

Give me three options, each under 100 words.

Preserve reusable context intentionally

If you repeatedly work on the same business area, maintain a short reusable brief:

Company context
Audience
Tone
Product description
Key constraints
Preferred writing style

Keep it short. Do not paste a full company handbook into every prompt.

24. Common mistakes and better alternatives

Mistake 1: Keeping one endless conversation

Long conversations feel convenient, but they become token furnaces. Every turn may carry old context.

Better: clear or start fresh when the task changes. Compact when continuing the same task.

Mistake 2: Bloated `CLAUDE.md`

A huge CLAUDE.md feels helpful, but taxes every session.

Better: keep only stable essentials in CLAUDE.md and move specialized instructions into skills or separate files.

Mistake 3: Using Opus for everything

Opus is powerful, but using it for every routine edit is inefficient.

Better: use /model opusplan, reserve Opus for planning and complex reasoning, and use Sonnet for execution.

Mistake 4: Asking Claude Code to “look around”

Broad exploration consumes tokens quickly.

Better: point Claude to likely files and define the scope.

Mistake 5: Pasting full logs

Raw logs are noisy.

Better: filter logs with grep, tail, jq, or scripts before giving them to Claude.

Mistake 6: Regenerating whole documents

Full rewrites burn output tokens and make review difficult.

Better: revise specific sections.

Mistake 7: Too many MCP servers

Every integration can add overhead.

Better: enable only relevant MCP servers and inspect context with /context.

Mistake 8: Using subagents for tiny tasks

Subagents have overhead.

Better: use them for noisy, isolated work, not trivial commands.

Mistake 9: Waiting too long to correct Claude

A wrong implementation path grows expensive.

Better: interrupt early and redirect.

Mistake 10: Not measuring usage

Without /usage, optimization is guesswork.

Better: check usage and context regularly.

25. The operating model: spend tokens where they create value

The best token strategy is not “use as few tokens as possible.” That would be the wrong goal. The goal is to spend tokens where they create value.

Good token spending:

Thinking through a hard architecture decision with Opus.
Reading the right files to fix a serious bug.
Generating tests that prevent regressions.
Reviewing a high-stakes proposal.
Summarising a complex but relevant document.
Comparing design alternatives.
Producing a useful implementation plan.

Bad token spending:

Re-reading stale conversation history.
Carrying obsolete logs.
Loading unused MCP servers.
Keeping bloated CLAUDE.md content.
Rewriting whole documents unnecessarily.
Exploring unrelated repository areas.
Printing full files when a diff is enough.
Using Opus for routine edits.
Letting agent teams run without a clear scope.
Asking vague questions that force broad inference.

This distinction matters because token optimization should not reduce quality. In fact, the best practices usually improve quality. Clear scope, smaller context, better model selection, early planning, focused tests, and concise outputs make Claude more effective.

26. Practical checklist

Use this checklist before and during Claude Code work.

Before starting

Is this task related to the current session?
Should I /clear first?
Is the task complex enough for plan mode?
Is /model opusplan enabled?
Is CLAUDE.md lean and relevant?
Are unnecessary MCP servers disabled?
Do I know the relevant files?
Can I provide acceptance criteria?

During work

Is Claude reading relevant files only?
Is output too verbose?
Should I ask for a diff instead of a full file?
Are logs filtered?
Are tests focused?
Should a verbose task be delegated to a subagent?
Should I stop Claude because it is drifting?

After a phase

Should I /compact?
What should compaction preserve?
Should I save a short progress note?
Should I /clear before the next task?
What did /usage show?
What did /context show?

For business writing

Am I asking for the whole document or only one section?
Did I provide the target audience?
Did I define the output length?
Did I specify what should not change?
Can I extract relevant text instead of uploading a large file?
Is this conversation still focused?

Conclusion

Claude and Claude Code are most effective when treated not as magical chat boxes, but as context-driven reasoning systems. Tokens are the fuel for that reasoning. If the context is clean, scoped, and relevant, tokens are spent on useful work. If the context is bloated, stale, and vague, tokens are wasted before the model even begins solving the problem.

For business users, the discipline is to provide focused context, work in sections, avoid unnecessary attachments, control output length, and start fresh when the topic changes.

For developers, the discipline is to keep CLAUDE.md lean, use /model opusplan, reserve Opus for planning and hard reasoning, use Sonnet for most implementation, inspect usage with /usage, inspect context with /context, compact proactively, clear between unrelated tasks, filter noisy outputs, manage MCP servers, and use subagents only when their isolation saves the main context from noise.

The practical philosophy is simple:

Use the strongest model for the thinking that matters.
Use the cheaper model for routine execution.
Keep context small.
Be specific.
Measure usage.
Stop wrong work early.
Do not make every future prompt pay for every past detail.

Claude Code rewards users who work like good engineers and good managers: clear scope, clean context, deliberate tools, early feedback, and disciplined execution. That is how you save tokens without sacrificing quality.

Debugging DllNotFoundException on Linux and Containers or DLL Hell in 2023

Ilya Verbitskiy — Sun, 05 Nov 2023 00:00:00 GMT

Today, I want to share my experience finding a bug you rarely see in .NET applications.

As you know, .NET runs our applications as managed code. Managed code is any code written to run under the supervision of the Common Language Runtime (CLR), which is the heart of .NET. When you build a C# program, it is compiled into Intermediate Language (IL), not into machine-specific code. The Just-In-Time (JIT) compiler of the CLR then compiles the IL into native code at runtime. This approach provides safety, cross-platform support, and cross-language integration. For example, because all .NET languages compile to the same IL and use the same runtime, it's relatively easy to mix and match languages, calling code written in one language from another.

But living in this sandbox, you may forget that under the hood, .NET still runs machine code and talks to native libraries. Those libraries are platform-specific and may behave differently from one platform to another. Surprisingly, this can bring back the almost forgotten DLL Hell issues you would rarely expect in a .NET application.

DLL Hell refers to a common issue in older versions of the Windows operating system where applications could interfere with each other by overwriting or updating shared Dynamic Link Libraries (DLLs). It could lead to various problems, including application failures, system instability, and version conflicts, as different programs might require different versions of the same DLL. The term also encompasses difficulties arising from the Windows registry's management of DLL information and the potential for installation programs to inadvertently disrupt the system by installing incorrect DLL versions.

.NET was meant to put an end to the DLL Hell problem on Windows, and it mostly succeeded (at least in my experience). But what would you say if you saw it on Linux in 2023? Let's take a look at the sample project.

The project is a simple console application that reads an image and writes its dimensions to standard output. I used the SkiaSharp library since the project should support Windows, Linux, and macOS. SkiaSharp is a cross-platform 2D graphics API for .NET platforms based on the Skia Graphics Library, an open-source graphics engine used by Chrome, Android, and other products. It provides a comprehensive set of drawing features ranging from shapes to complex path operations, text rendering, and image manipulation.

using SkiaSharp;
using static System.Console;

var file = new FileInfo("cover.jpg");
WriteLine($"INPUT: {file.Name}");

using var stream = file.OpenRead();
var coverImage = SKBitmap.Decode(stream);
WriteLine($"Got image: {coverImage.Width} x {coverImage.Height} {coverImage.Info.BitsPerPixel} ppi - from file {file.Name}");

Currently, the project references only the SkiaSharp 2.88.6 package and works well on Windows.

INPUT: cover.jpg
Got image: 617 x 800 32 ppi - from file cover.jpg

Let's run it in a Docker container based on Alpine Linux. I use Alpine Linux for its simplicity, security, and efficiency. Its small footprint makes it a good fit for resource-constrained environments and minimal Docker images.

FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["SkiaSharpTest.csproj", "."]
RUN dotnet restore "SkiaSharpTest.csproj"
COPY . .
WORKDIR "/src"
RUN dotnet build "SkiaSharpTest.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "SkiaSharpTest.csproj" -c Release -o /app/publish /p:UseAppHost=false

FROM base AS final
RUN apk add --no-cache icu-libs
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "SkiaSharpTest.dll"]

docker build -t skia .
docker run --rm skia

INPUT: cover.jpg
Unhandled exception. System.TypeInitializationException: The type initializer for 'SkiaSharp.SKAbstractManagedStream' threw an exception.
 ---> System.DllNotFoundException: Unable to load shared library 'libSkiaSharp' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: Error loading shared library liblibSkiaSharp: No such file or directory
   at SkiaSharp.SkiaApi.sk_managedstream_set_procs(SKManagedStreamDelegates procs)
   at SkiaSharp.SKAbstractManagedStream..cctor()
   --- End of inner exception stack trace ---
   at SkiaSharp.SKAbstractManagedStream..ctor(Boolean owns)
   at SkiaSharp.SKManagedStream..ctor(Stream managedStream, Boolean disposeManagedStream)
   at SkiaSharp.SKCodec.WrapManagedStream(Stream stream)
   at SkiaSharp.SKCodec.Create(Stream stream, SKCodecResult& result)
   at SkiaSharp.SKCodec.Create(Stream stream)
   at SkiaSharp.SKBitmap.Decode(Stream stream)
   at Program.<Main>$(String[] args) in /src/Program.cs:line 9

It is a common problem. SkiaSharp depends on a native library, libSkiaSharp, that must be shipped with your .NET application. By default, the SkiaSharp 2.88.6 package includes only Windows and macOS binaries. Since I need Linux support, I have two options: install SkiaSharp.NativeAssets.Linux 2.88.6 or SkiaSharp.NativeAssets.Linux.NoDependencies 2.88.6. I recommend the second option if you do not need fancy font support because it does not require installing additional Linux packages in your image.
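Adding the package from the command line can be sketched as follows. The wrapper name add_skia_linux_assets is hypothetical and exists only to make the example reusable; the command inside it is the standard `dotnet add package` call, and the default project file name is taken from the Dockerfile above.

```shell
# Hypothetical wrapper around the real `dotnet add package` command.
# Usage: add_skia_linux_assets [project-file]
add_skia_linux_assets() {
    # Default to the sample project's file name when no argument is given.
    project="${1:-SkiaSharpTest.csproj}"
    # Pin the native assets to the same version as the SkiaSharp package.
    dotnet add "$project" package SkiaSharp.NativeAssets.Linux.NoDependencies --version 2.88.6
}
```

Keeping the native assets package on the same version as SkiaSharp itself (2.88.6 here) avoids mixing managed and native binaries from different releases.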

Everything worked after I added the SkiaSharp.NativeAssets.Linux.NoDependencies 2.88.6 package to my project and rebuilt the image.

INPUT: cover.jpg
Got image: 617 x 800 32 ppi - from file cover.jpg

Everything I have shown up to this point is the standard use case of SkiaSharp. But now, imagine you are working on a large project with tens or hundreds of dependencies. You followed the steps above, but your app still throws the exception at runtime.

Let's dive deeper into "System.DllNotFoundException: Unable to load shared library 'libSkiaSharp' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: Error loading shared library liblibSkiaSharp: No such file or directory" error message.

The System.DllNotFoundException in .NET is thrown when a program tries to load a dynamic link library (DLL) that cannot be found due to reasons like the DLL being absent, located in the wrong directory, the program running on an incompatible architecture, the DLL's dependencies being missing, permission restrictions, or the DLL file being corrupted. Resolving this error requires verifying that the DLL exists, is correctly placed, has the necessary permissions, is not corrupted, and is compatible with the system's architecture.
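The first few checks in that list (the file exists, it sits where you expect, and it is readable) are easy to script. The following is a minimal sketch, assuming a POSIX shell with find available; the function name find_native_lib is hypothetical and not part of .NET or SkiaSharp. It does not cover missing dependencies, architecture mismatches, or corruption.

```shell
# Hypothetical helper: list every copy of a native library under a directory
# and flag copies that the current user cannot read.
# Usage: find_native_lib <directory> <file-name>
find_native_lib() {
    dir="$1"
    name="$2"
    matches=$(find "$dir" -type f -name "$name" 2>/dev/null)
    if [ -z "$matches" ]; then
        echo "MISSING: $name not found under $dir"
        return 1
    fi
    status=0
    # Read one path per line (paths containing newlines are not handled).
    while IFS= read -r path; do
        if [ -r "$path" ]; then
            echo "OK: $path"
        else
            echo "UNREADABLE: $path"
            status=1
        fi
    done <<EOF
$matches
EOF
    return "$status"
}
```

For the sample project, a call such as `find_native_lib /app libSkiaSharp.so` would print one line per copy found, which also shows which runtime identifier folders contain the library.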

The easiest way to investigate the issue is to get access to the container's or pod's terminal through Docker or Kubernetes. I am debugging a standalone container, so let's run it and get access to the shell. I have to access the shell on the container startup because the container only runs a standalone app. Keep in mind that Alpine Linux uses sh instead of Bash.

docker run --rm -it --entrypoint /bin/sh skia

If you are debugging a microservice or web application running within a container, you can access the container's shell using the docker exec command. For example, this is the command to get access to the Redis shell when it is running in a container.

$ docker run --name redis -d -p 6379:6379 redis
$ docker ps

CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS          PORTS                    NAMES
73546b9cd887   redis     "docker-entrypoint.s…"   4 minutes ago    Up 4 minutes    0.0.0.0:6379->6379/tcp   redis
613199b8912a   skia      "/bin/sh"                16 minutes ago   Up 16 minutes                            wizardly_saha

$ docker exec -it redis redis-cli
127.0.0.1:6379>

Let's move back to the SkiaSharp issue. First, you must verify that SkiaSharp's native libraries are on the container. Remember that Alpine Linux uses the musl C runtime, so you must verify the linux-musl-x64 runtime identifier.

/app # ls
HarfBuzzSharp.dll                 SkiaSharp.dll                     SkiaSharpTest.dll                 SkiaSharpTest.runtimeconfig.json  cover.jpg
SkiaSharp.HarfBuzz.dll            SkiaSharpTest.deps.json           SkiaSharpTest.pdb                 Topten.RichTextKit.dll            runtimes
/app # ls runtimes/
linux-arm       linux-arm64     linux-musl-x64  linux-x64       osx             win-arm64       win-x64         win-x86
/app # ls runtimes/linux-musl-x64/native/
libHarfBuzzSharp.so  libSkiaSharp.so

As you can see, the SkiaSharp native library exists. In this case, you must verify that the Operating System can load it. Use the ldd command to do it on Linux.

/app # cd runtimes/linux-musl-x64/native/
/app/runtimes/linux-musl-x64/native # ldd libSkiaSharp.so
        /lib/ld-musl-x86_64.so.1 (0x7fabcfc23000)
Error loading shared library libfontconfig.so.1: No such file or directory (needed by libSkiaSharp.so)
        libc.musl-x86_64.so.1 => /lib/ld-musl-x86_64.so.1 (0x7fabcfc23000)
Error relocating libSkiaSharp.so: FcFontSetDestroy: symbol not found
Error relocating libSkiaSharp.so: FcPatternAddString: symbol not found
Error relocating libSkiaSharp.so: FcInitLoadConfigAndFonts: symbol not found
Error relocating libSkiaSharp.so: FcPatternFilter: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetLangSet: symbol not found
Error relocating libSkiaSharp.so: FcConfigCreate: symbol not found
Error relocating libSkiaSharp.so: FcCharSetDestroy: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetCharSet: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetBool: symbol not found
Error relocating libSkiaSharp.so: FcDefaultSubstitute: symbol not found
Error relocating libSkiaSharp.so: FcPatternAddCharSet: symbol not found
Error relocating libSkiaSharp.so: FcPatternRemove: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetInteger: symbol not found
Error relocating libSkiaSharp.so: FcCharSetHasChar: symbol not found
Error relocating libSkiaSharp.so: FcCharSetAddChar: symbol not found
Error relocating libSkiaSharp.so: FcConfigGetFonts: symbol not found
Error relocating libSkiaSharp.so: FcCharSetCreate: symbol not found
Error relocating libSkiaSharp.so: FcGetVersion: symbol not found
Error relocating libSkiaSharp.so: FcPatternAddWeak: symbol not found
Error relocating libSkiaSharp.so: FcConfigDestroy: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetString: symbol not found
Error relocating libSkiaSharp.so: FcPatternCreate: symbol not found
Error relocating libSkiaSharp.so: FcFontSetAdd: symbol not found
Error relocating libSkiaSharp.so: FcPatternReference: symbol not found
Error relocating libSkiaSharp.so: FcPatternEqual: symbol not found
Error relocating libSkiaSharp.so: FcFontSetCreate: symbol not found
Error relocating libSkiaSharp.so: FcConfigSubstitute: symbol not found
Error relocating libSkiaSharp.so: FcPatternAddLangSet: symbol not found
Error relocating libSkiaSharp.so: FcObjectSetBuild: symbol not found
Error relocating libSkiaSharp.so: FcLangSetHasLang: symbol not found
Error relocating libSkiaSharp.so: FcPatternAddInteger: symbol not found
Error relocating libSkiaSharp.so: FcObjectSetDestroy: symbol not found
Error relocating libSkiaSharp.so: FcStrCmpIgnoreCase: symbol not found
Error relocating libSkiaSharp.so: FcPatternGet: symbol not found
Error relocating libSkiaSharp.so: FcPatternDestroy: symbol not found
Error relocating libSkiaSharp.so: FcFontRenderPrepare: symbol not found
Error relocating libSkiaSharp.so: FcPatternDuplicate: symbol not found
Error relocating libSkiaSharp.so: FcFontMatch: symbol not found
Error relocating libSkiaSharp.so: FcPatternGetMatrix: symbol not found
Error relocating libSkiaSharp.so: FcLangSetDestroy: symbol not found
Error relocating libSkiaSharp.so: FcConfigGetSysRoot: symbol not found
Error relocating libSkiaSharp.so: FcLangSetAdd: symbol not found
Error relocating libSkiaSharp.so: FcLangSetCreate: symbol not found
Error relocating libSkiaSharp.so: FcFontSetMatch: symbol not found
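With this many lines, it is easy to miss the one that matters: the missing library itself. The following is a minimal Python sketch that condenses such output into a short summary. It assumes the musl flavor of ldd messages shown above (glibc prints a different format), and the function name summarize_ldd is made up for this illustration.

```python
import re
from typing import Dict, List

# Patterns follow the musl ldd messages shown above (an assumption; glibc differs).
MISSING_LIB_RE = re.compile(
    r"Error loading shared library (?P<lib>\S+): No such file or directory"
)
MISSING_SYMBOL_RE = re.compile(
    r"Error relocating \S+: (?P<symbol>\w+): symbol not found"
)


def summarize_ldd(output: str) -> Dict[str, List[str]]:
    """Collect missing libraries and unresolved symbols from ldd output."""
    libraries: List[str] = []
    symbols: List[str] = []
    for line in output.splitlines():
        lib_match = MISSING_LIB_RE.search(line)
        if lib_match:
            libraries.append(lib_match.group("lib"))
            continue
        symbol_match = MISSING_SYMBOL_RE.search(line)
        if symbol_match:
            symbols.append(symbol_match.group("symbol"))
    return {
        "missing_libraries": sorted(set(libraries)),
        "missing_symbols": symbols,
    }
```

Feeding it the output above would report libfontconfig.so.1 as the only missing library, which points directly at the package to install.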

The output shows that many functions cannot be found! I did not expect to need additional Alpine packages, since I used the SkiaSharp.NativeAssets.Linux.NoDependencies 2.88.6 NuGet package to build my project. It's time to move back to Visual Studio and check the project dependencies.

As you can see, SkiaSharp's native binaries are referenced twice, and the SkiaSharp.NativeAssets.Linux ones overwrite the SkiaSharp.NativeAssets.Linux.NoDependencies ones that I expected to use. In this case, I have no choice but to install additional Alpine Linux packages during the image build process.

I fixed the problem as follows:

I used the SkiaSharp.NativeAssets.Linux NuGet package since it is required by other libraries anyway.
I updated my Dockerfile to install the fontconfig Alpine Linux package, which provides the missing functions discovered above.

RUN apk add --no-cache icu-libs fontconfig

FROM mcr.microsoft.com/dotnet/aspnet:6.0-alpine AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:6.0 AS build
WORKDIR /src
COPY ["SkiaSharpTest.csproj", "."]
RUN dotnet restore "SkiaSharpTest.csproj"
COPY . .
WORKDIR "/src"
RUN dotnet build "SkiaSharpTest.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "SkiaSharpTest.csproj" -c Release -o /app/publish /p:UseAppHost=false

FROM base AS final
RUN apk add --no-cache icu-libs fontconfig
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "SkiaSharpTest.dll"]

After the fix, everything worked well.

$ docker build -t skia .
[+] Building 7.6s (19/19)
...

$ docker run --rm skia
INPUT: cover.jpg
Got image: 617 x 800 32 ppi - from file cover.jpg

P. S. I experienced the above problem with a microservice running on Kubernetes. So, ensure you can access your pod's terminal when needed, because you can follow the same steps from this article to troubleshoot the runtime error.

Extract text from images using Amazon Textract

Ilya Verbitskiy — Mon, 20 Feb 2023 00:00:00 GMT

No one can deny the digitalization of our world: most of us use smartphones for daily communications, reading news, and taking photos and notes. Many people thought we would go 100% digital, and I was one of them when I saw one of the first Palm PDAs. But that expectation has yet to become real: we still use pen and paper, and I still take notes with my pen and notebook.

Handwritten notes are excellent when taking them during a meeting or lecture, but they are not searchable, hard to edit, etc. That's why I always try to convert the important ones to a digital form by scanning them to PDF and converting them into text using the Amazon Textract service.

Amazon Textract is a machine learning service in Amazon Web Services (AWS) that automatically extracts text and data from scanned documents, PDFs, and images. It uses advanced optical character recognition (OCR) technology and machine learning algorithms to identify and extract text, tables, forms, and other document data.

Amazon Textract can work with various file formats, including PDF, PNG, and JPG, and it can extract data from structured and unstructured documents. The service also includes table and form identification features, which can help automate data entry and reduce errors.

Amazon Textract has two processing modes: synchronous and asynchronous. In synchronous mode, Textract processes the document and returns the results immediately. The application requesting the document analysis will wait for Textract to complete the processing and return the results before continuing. The main limitation of this mode is you can parse only one page per request. It does not work for me since I usually scan all pages to one PDF document and convert it to text.
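For comparison, here is a minimal sketch of what the synchronous mode could look like for a single-page image. It assumes Boto3 is installed and credentials are configured; the helper names extract_lines and detect_text_sync are invented for this illustration and are not part of the sample code used later in the article.

```python
from typing import Any, Dict, List


def extract_lines(response: Dict[str, Any]) -> List[str]:
    """Collect the text of every LINE block from a Textract response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_text_sync(image_path: str) -> List[str]:
    """Send one single-page image to Textract and wait for the result."""
    # Imported lazily so extract_lines() can be used without AWS access.
    import boto3

    with open(image_path, "rb") as image_file:
        image_bytes = image_file.read()

    textract = boto3.client("textract")
    # The call blocks until Textract has processed the page.
    response = textract.detect_document_text(Document={"Bytes": image_bytes})
    return extract_lines(response)
```

Calling detect_text_sync("page1.png") (a hypothetical file name) would return the recognized lines of that one page, which is why this mode does not fit multi-page PDFs.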

In asynchronous mode, Textract processes the document in the background and returns the results later, either by writing the results to an S3 bucket or by sending a notification to an Amazon Simple Notification Service (SNS) topic.

Usually, I use Python for my automated scripts. The sample code also uses the Boto3 library (AWS SDK for Python). Make sure you have it installed.

pip install boto3

First, you need to create your input and output S3 buckets. Make sure that your user or role has permission to access them. At a minimum, you will need the following policies attached to your IAM user or role:

AmazonTextractFullAccess
AmazonS3ReadOnlyAccess
AmazonSNSFullAccess
AmazonSQSFullAccess

Of course, if you use the same user to upload the PDF, it must have write access to S3 (s3:PutObject action).

Second, you need to create a service role for Amazon Textract to be able to send SNS notifications.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "textract.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "ArnLike": {
          "aws:SourceArn": "arn:aws:textract:*:000000000000:*"
        },
        "StringEquals": {
          "aws:SourceAccount": "000000000000"
        }
      }
    }
  ]
}

You will need to replace 000000000000 with your AWS Account ID.
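If you prefer to script this step, the sketch below builds the same trust policy for a given account ID and passes it to IAM through Boto3. The function names are made up for this illustration, and the role name TextractSampleRole is the one used later in the article. The sketch covers only the trust policy; the role would still need a permissions policy that allows publishing to the SNS topic, which is not shown here.

```python
import json
from typing import Any, Dict


def build_textract_trust_policy(account_id: str) -> Dict[str, Any]:
    """Return the trust policy above with the account ID filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": "textract.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "ArnLike": {
                        "aws:SourceArn": f"arn:aws:textract:*:{account_id}:*"
                    },
                    "StringEquals": {"aws:SourceAccount": account_id},
                },
            }
        ],
    }


def create_textract_role(account_id: str, role_name: str = "TextractSampleRole") -> str:
    """Create the service role and return its ARN (requires IAM permissions)."""
    # Imported lazily so the policy builder works without Boto3 installed.
    import boto3

    iam = boto3.client("iam")
    response = iam.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(build_textract_trust_policy(account_id)),
    )
    return response["Role"]["Arn"]
```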

It is unusual to see the Condition section in service roles, but it is a recommended practice for cross-service confused deputy prevention.

Cross-service confused deputy prevention is a security feature in Amazon Web Services (AWS) that helps prevent a "confused deputy" attack. In this attack, a trusted service or resource is tricked into acting on behalf of an attacker, who can exploit a vulnerability in the trusted service or resource to gain unauthorized access.

Third, you should grant iam:PassRole permission to your IAM user or role that you will use to start the image recognition job. For example, you can create it as an inline policy associated with the IAM User.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "iam:PassRole",
      "Resource": "arn:aws:iam::000000000000:role/TextractSampleRole"
    }
  ]
}

Finally, create and subscribe to the Amazon SNS topic to ensure you get job completion notifications. When you create the Amazon SNS topic, you must use the prefix AmazonTextract, for example, AmazonTextractSampleCompletionNotifications.
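Once you are subscribed, each completion notification arrives as a JSON message. The sketch below shows one way to pull the Job ID and status out of it. It assumes the message carries JobId and Status fields, and that a subscription through an SQS queue wraps the message as a JSON string under a Message key; check a real notification from your topic before relying on it. The function name is invented for this illustration.

```python
import json
from typing import Any, Dict, Tuple


def parse_textract_notification(body: str) -> Tuple[str, str]:
    """Return (job_id, status) from a Textract completion notification."""
    payload: Dict[str, Any] = json.loads(body)
    # When SNS forwards to SQS, the notification is assumed to be nested
    # as a JSON string under the "Message" key of the SNS envelope.
    if isinstance(payload.get("Message"), str):
        payload = json.loads(payload["Message"])
    return payload["JobId"], payload["Status"]
```

The returned status can then be compared with "SUCCEEDED" before fetching the results.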

We are ready to run the job using Python. Please make sure you use your own parameters in the script.

import boto3
from pprint import pprint

FILENAME = "sample.pdf"
S3_BUCKET = "textract-sample-000000000000"
INPUT_PREFIX = f"input/{FILENAME}"
ROLE_ARN = "arn:aws:iam::000000000000:role/TextractSampleRole"
SNS_TOPIC = "arn:aws:sns:us-east-1:000000000000:AmazonTextractSampleCompletionNotifications"


def main():
    # Upload sample PDF file to S3
    s3 = boto3.client("s3")
    s3.upload_file(FILENAME, S3_BUCKET, INPUT_PREFIX)
    print(f"Uploaded sample.pdf to s3://{S3_BUCKET}/{INPUT_PREFIX}")

    # Use Amazon Textract to detect text in the sample PDF file
    textract = boto3.client("textract")
    response = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {
            "Bucket": S3_BUCKET, "Name": INPUT_PREFIX}},
        OutputConfig={"S3Bucket": S3_BUCKET, "S3Prefix": "output"},
        NotificationChannel={"RoleArn": ROLE_ARN, "SNSTopicArn": SNS_TOPIC},
    )
    pprint(response)
    print("DONE.")


if __name__ == "__main__":
    main()

All source code from the sample is available on GitHub.

If everything is OK, the app will return the Job ID you may use to get the extracted text.

$ python3 ./recognize.py
Uploaded sample.pdf to s3://textract-sample-234234234/input/sample.pdf
{'JobId': '5e3147084930cc653ec657b7a653e619566e2ef3d76cc7bf4ea8382a8c0f4c5d',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '76',
                                      'content-type': 'application/x-amz-json-1.1',
                                      'date': 'Sun, 22 Jan 2023 07:10:48 GMT',
                                      'x-amzn-requestid': '81312252-d875-4797-924f-1606798c8cea'},
                      'HTTPStatusCode': 200,
                      'RequestId': '81312252-d875-4797-924f-1606798c8cea',
                      'RetryAttempts': 0}}
DONE.

Once you get a notification through SNS, you can run another Python script to create a text document from Textract's output.

import time
from pprint import pprint
import boto3

JOB_ID = "5e3147084930cc653ec657b7a653e619566e2ef3d76cc7bf4ea8382a8c0f4c5d"


def main():
    textract = boto3.client("textract")

    next_token = None
    while True:
        # You cannot pass NextToken = "" or NextToken = None to the API since it will throw errors:
        # "Invalid type for parameter NextToken, value: None, type: <class 'NoneType'>, valid types: <class 'str'>"
        # "Invalid length for parameter NextToken, value: 0, valid min length: 1"
        if next_token is None:
            response = textract.get_document_text_detection(JobId=JOB_ID)
        else:
            response = textract.get_document_text_detection(
                JobId=JOB_ID, NextToken=next_token)
        if response["JobStatus"] == "SUCCEEDED":
            # Extract text from the response
            for block in response["Blocks"]:
                if block["BlockType"] == "LINE":
                    print(block["Text"])

            # Check if there are more pages to process
            if "NextToken" in response:
                next_token = response["NextToken"]
            else:
                break
        elif response["JobStatus"] == "FAILED":
            raise Exception(f"Job {JOB_ID} failed.")
        else:
            print("Waiting for job to complete...")
            time.sleep(5)

    print("DONE.")


if __name__ == "__main__":
    main()

Make sure you pass the correct Job ID to the script. The Job ID will be available only for 7 days!

My use case is simple, but if you require more advanced operations with the response, you may check some open source libraries provided by AWS. Otherwise, consider reading the Amazon Textract Response Objects specification.
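As a small step beyond printing every line, the sketch below groups the recognized lines by page. It relies on the Page field that asynchronous operations attach to each block and falls back to page 1 when the field is missing; the function name lines_by_page is made up for this illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def lines_by_page(blocks: Iterable[Dict[str, Any]]) -> Dict[int, List[str]]:
    """Group the text of LINE blocks by the page they were detected on."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in blocks:
        if block.get("BlockType") == "LINE":
            # Blocks without a Page field are treated as page 1 (an assumption).
            pages[block.get("Page", 1)].append(block["Text"])
    return dict(pages)
```

You could call it with the Blocks list accumulated across all paginated responses and then write one text file per page.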

Summary

We discussed the benefits of using Amazon Textract to convert handwritten notes into digital form. Textract is an Amazon Web Services (AWS) machine learning service that automatically extracts text and data from scanned documents, PDFs, and images. It uses advanced optical character recognition (OCR) technology and machine learning algorithms to identify and extract text, tables, forms, and other document data. In the article, I explained the two processing modes available in Textract, synchronous and asynchronous, and provided sample code for setting up a Textract job. I also highlighted the security features available in AWS, including cross-service confused deputy prevention, which help prevent attacks on trusted services or resources.

How to pass the AWS Certified Solutions Architect - Associate (SAA-C02) exam

Ilya Verbitskiy — Thu, 02 Sep 2021 00:00:00 GMT

In the past article, I shared some tips for preparing for the AWS Cloud Practitioner exam. This time I would like to share my experience passing the AWS Certified Solutions Architect - Associate (SAA-C02) exam. In my opinion, it is a "must-have" certification for anyone working with AWS cloud: system administrators, developers, solution architects.

Moreover, I think it should be the first exam to take after you've cleared the Cloud Practitioner one because the exam is comprehensive and covers most of the AWS services you will use daily:

S3
EC2
Load balancers
Autoscaling
RDS
DynamoDB
Serverless computing
Amazon CloudWatch
Security services and lots more

The exam preparation is worth it because it will help you build a mental model about different AWS services and how you can combine them to solve your day-to-day problems. As a developer, I especially enjoyed answering questions to make the existing application scalable and resilient without code changes. It is hard to imagine in real-world applications, but it is "easily doable" in the test, so keep it in mind.

First, let's look at the exam guide. You need to answer 65 questions in 2 hours 10 minutes. That is only 2 minutes per question, which isn't much. The exam will cover four major AWS topics:

Design Resilient Architectures
Design High-Performing Architectures
Design Secure Applications and Architectures
Design Cost-Optimized Architectures

Each question will be a single or multi-choice use case where you need to pick the best solution for the problem. Honestly, I like how AWS structures questions because they are close to the real-life scenarios you may experience at work. If you read the Appendix of the exam guide, you will see a few dozen AWS services you have to know for the exam. The good news is that you must know them, but not all in detail, to pass the certification. From my test experience, you will need in-depth knowledge of the following services: IAM, EC2, ECS, S3, RDS, EBS, EFS, DynamoDB, VPC (and all about networking, including VPC endpoints, ENI, Elastic IP), Route 53, CloudFront, Load Balancing, Autoscaling, SNS, SQS, Lambda, and Amazon CloudWatch. For the other services, a general understanding is enough, e.g., their use cases, benefits, and overall cost.

Amazon suggests having at least one year of hands-on experience with AWS before you try to pass the exam. The experience is optional, but you will benefit a lot from it. Since the question use cases are close to real-life scenarios, you will probably have seen them at work before the exam. Project experience helps you choose the best option for a problem. You could probably pass the test with only theoretical knowledge and lab experience, but I am afraid it will be more difficult.

Let's move on to the preparation part. I cannot imagine that someone can pass the exam only by reading all the AWS recommended materials. I would recommend a three-step preparation plan:

Learn theory (AWS whitepapers, video courses)
Practice (either work experience or labs)
Do as many practice tests as you can

The first step is to learn theory. Depending on your preferred learning style, it could be AWS documentation, in-class training, or online training. I like to have a mixture of text and video materials. I decided to get the A Cloud Guru course. The content was well structured and easy to follow. Unfortunately, it wasn't deep enough, so I had to read the AWS documentation for the main exam topics I mentioned above. I want to share a few links that I found particularly useful for the preparation.

AWS Well-Architected Framework. It would help to read at least the framework overview, but it is beneficial to read about each pillar in detail.
Overview of Amazon Web Services. It is a long and somewhat boring whitepaper that describes each service available on AWS. It won't help you as-is, but you may recall some high-level details about a particular service once you are in the exam.
AWS Storage Services Overview. This is a good whitepaper because the exam will have a lot of questions about different ways of storing data in the cloud.
Plan for Disaster Recovery (DR)
All FAQs recommended by AWS: EC2, S3, RDS, SQS, Route53
Migrating Your Databases to Amazon Aurora
Manual scaling for Amazon EC2 Auto Scaling
Dynamic scaling for Amazon EC2 Auto Scaling
Predictive scaling for Amazon EC2 Auto Scaling
Controlling which Auto Scaling instances terminate during scale in
Amazon EBS volume performance on Linux instances
Tutorials Dojo AWS Cheat Sheets. They have a lot more information than you need to pass the Solutions Architect Associate exam. So, focus only on the services you need to know. Also, I found those cheat sheets to be a good refresher, for example, when you prepare for a job interview.

The second preparation step is hands-on experience at work or in a lab. Seriously, the best way to learn AWS is to get your hands dirty and do something. AWS provides a Free Tier for many services, but it is easy to go beyond the Free Tier when preparing for the exam. That's the reason why I use the A Cloud Guru service. First, they offer labs as part of the course you are taking. Second, they also provide you with a temporary AWS sandbox for your experimentation. If you experiment with AWS a lot, the yearly subscription price is worth it.

And the final step of the preparation is problem-solving. If you want to learn how to solve quizzes, you should solve quizzes. That's why solving practice tests will probably take up 40% - 50% of the preparation. The A Cloud Guru course comes with a practice exam, but I found it to be easier than the actual exam questions, so it probably won't be enough. After researching the community recommendations, I decided to get AWS Certified Solutions Architect Associate Practice Exams from Tutorials Dojo. Believe it or not, it helped a lot. The questions were more complicated than in A Cloud Guru's course and even more challenging than the AWS exam questions. Also, each question comes with an in-depth explanation of why you have to choose the particular answer.

In the end, let me share a few exam tips.

Read the question and each answer twice. Sometimes questions have small nuances that are crucial to choosing the correct answer.
Don't spend too much time on a particular question. If you don't know the answer, use the "Mark for Review" feature and answer the question in the second round. An additional benefit of this approach is that you may be able to answer the marked question based on other questions you see later. It may save you a few points.
Usually, you can quickly identify wrong answers to the question. Try to do it as soon as possible, and then focus on the potential solutions.

I know that the learning process looks overwhelming, and you will have to learn many AWS nuances. But at the end of the day, you will benefit from having a solid understanding of the AWS platform. Good luck with your studies!

How to Pass the AWS Cloud Practitioner Exam

Ilya Verbitskiy — Mon, 31 May 2021 00:00:00 GMT

The Cloud is not the future anymore. It is the present. As a developer or a system engineer, you have no choice but to adopt it.

First, you have to decide which Cloud provider suits your needs. There are three major players on the market: Amazon AWS, Microsoft Azure, and Google Cloud. All of them have similar capabilities and prices. In my opinion, once you have learned one provider, learning another one will be easy.

So, which one should you choose? I would learn the Cloud provider that my clients or employer use or plan to use. If there aren't any plans yet, I would learn Amazon AWS for the following reasons:

The oldest Cloud provider on the market with excellent documentation and technical support
Probably the largest community on the market, so you will have better chances to find a job or a gig

When I need to learn new things, I have to have a goal and try to make the process as hands-on as possible. I did it with my AWS learning as well. I decided to pass the certification by the end of the study and formulated my AWS study journey as a SMART goal. Well, I think that certification is a good thing in general as a learning tool. I wouldn't rely too much on ads claiming that certified engineers are better paid. I don't know anyone who got paid more because of certifications or was hired because he or she is certified. But it is a fantastic tool to either learn new things or structure your existing knowledge because each exam comes with learning objectives and a preparation plan.

Amazon has four different certification levels: foundational, associate, professional, and specialty. Each level contains multiple exams, but if you have no AWS experience, I'd recommend starting with the AWS Cloud Practitioner track. Don't worry about the requirement that you need at least six months of experience in AWS to pass the exam. I believe that if you study well and do many hands-on exercises, you will pass it. But for higher certification levels, I'd recommend getting at least some real-world experience before starting the preparation.

Let's move on to the exam structure. The exam's goal is to challenge you on the cloud basics, high-level AWS architecture, security best practices, and core AWS services like VPC, EC2, RDS, and S3. You should also demonstrate a broad knowledge of AWS services (e.g., which service does what) and a solid understanding of the pricing model.

If you are new to AWS, I suggest starting your journey with the AWS official free training AWS Cloud Practitioner Essentials. It is a free digital course that does not have hands-on exercises or practice exams. You will have to register an AWS account that gives you access to some free services that are enough to prepare for the Cloud Practitioner certification. I think the course is good to start with but is not enough to pass the exam.

After the free course, I didn't feel ready for the test and decided to sign up for the A Cloud Guru website. It ended up being a good decision since the training materials are excellent and very easy to follow. Each chapter has hands-on labs you need to do in the real cloud environment and practice tests with questions similar to what you will get on the actual exam. The course takes some time to finish, and I would advise you to keep focus and not spend too much time on it. Otherwise, you may forget what you've learned in the first chapters. At the end of the course, you will have a practice exam. Take it as many times as you need until you consistently score 95%-100%, and only then go for the real one.

I would also recommend reading two AWS whitepapers:

Overview of Amazon Web Services. It is the list of all services available at AWS with a brief description. You may find it very dry and dull, but I HIGHLY recommend reading it since the exam has many questions about what service you can use to solve a problem. Let's talk if you need advice on cloud architecture for your project and what services may help you solve your problem in a fast and cost-effective way.
How AWS Pricing Works. The whitepaper is VERY IMPORTANT not only because there are many questions on pricing in the exam, but also because you must understand it for real-life projects. Cloud solutions may cost you a lot when you do not pay attention to the usage. Let's talk if you need any help reducing your cloud workloads' cost.

By this time, you should be ready for your first AWS certification exam. Good luck!

The fastest way to expose AWS Lambda to Internet via Function URL

Ilya Verbitskiy — Fri, 21 Oct 2022 00:00:00 GMT

From what I can see, Serverless applications, especially ones heavily utilizing AWS Lambda, are getting increasingly popular nowadays. Lambda is an excellent tool for almost any project size when developers do not want to do any computing planning and want to focus on writing code. It is handy: you create your function, and it "magically" appears in the Cloud.

One of the "limitations" lambdas have is that they were not intended to be used for REST API development. The usual solution to this problem is putting Amazon API Gateway in front of your functions to handle REST. This approach works very well for any project size: from a startup with one API to an enterprise with thousands of APIs. But sometimes it feels like overkill. For example, you develop a "micro" project with one or two lambdas that you want to expose. For this use case, having an API Gateway feels like too much from an operations and cost perspective.

Fortunately, AWS Lambdas have a feature known as Function URLs. When you enable this feature, Amazon will create a dedicated HTTPS endpoint for your function. Function URL format follows the same pattern:

https://[url-id].lambda-url.[region].on.aws/
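Because the pattern is fixed, a script can recover the URL ID and the region from a Function URL. The sketch below is one way to do it; the function name parse_function_url is made up for this illustration, and the character classes in the regular expression are an assumption based on the sample URLs in this article.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Host part of the pattern: [url-id].lambda-url.[region].on.aws
HOST_RE = re.compile(
    r"^(?P<url_id>[a-z0-9]+)\.lambda-url\.(?P<region>[a-z0-9-]+)\.on\.aws$"
)


def parse_function_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (url_id, region) for a Function URL, or None if it does not match."""
    parts = urlsplit(url)
    if parts.scheme != "https" or parts.hostname is None:
        return None
    match = HOST_RE.match(parts.hostname)
    if match is None:
        return None
    return match.group("url_id"), match.group("region")
```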

I will create a lambda function and expose it to the public endpoint in the following example. The lambda function accepts one parameter name and returns a JSON object with the message "Hello, [name]!"

from typing import Dict, Any
import json


def error(message: str) -> Dict[str, Any]:
    body = json.dumps({"error": message})
    return {"statusCode": 400, "body": body}


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    req = event.get("queryStringParameters", None)
    if req is None:
        # error() already builds the complete response, so return it as-is
        return error("Bad Request")

    name = req.get("name", "").strip()
    if name == "":
        return error("Name is required")

    res = {"message": f"Hello, {name}!"}
    return {"statusCode": 200, "body": json.dumps(res)}

The function's name is my_public_hello_world. I will use this name in the scripts below.

As you may note, the function expects a particular event structure. The event format is well defined in AWS Lambda's Function URLs documentation. I recommend taking a look if you haven't done it yet.

The lambda function is ready, and it is time to make it publicly available. As with everything in AWS, there are multiple ways of doing it (AWS Console, AWS CLI, CloudFormation, etc.). I prefer command line configuration, so let's use the AWS CLI.
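To try a handler locally before deploying, you can feed it a hand-made event. The sketch below builds a trimmed-down event with only a few of the fields a Function URL request carries; real events contain many more, and the exact set shown here is an assumption. Both make_event and hello_handler are stand-ins written for this illustration, with hello_handler being a simplified version of the function above.

```python
import json
from typing import Any, Dict, Optional


def make_event(query: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Build a minimal, hand-made event resembling a Function URL GET request."""
    event: Dict[str, Any] = {
        "version": "2.0",
        "rawPath": "/",
        "rawQueryString": "&".join(f"{k}={v}" for k, v in (query or {}).items()),
        "headers": {},
        "requestContext": {"http": {"method": "GET", "path": "/"}},
        "isBase64Encoded": False,
    }
    # The key is left out when there is no query string (an assumption).
    if query:
        event["queryStringParameters"] = dict(query)
    return event


def hello_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    """Simplified stand-in for the article's lambda_handler."""
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "").strip()
    if not name:
        return {"statusCode": 400, "body": json.dumps({"error": "Name is required"})}
    return {"statusCode": 200, "body": json.dumps({"message": f"Hello, {name}!"})}
```

Calling hello_handler(make_event({"name": "John"}), None) exercises the happy path, and hello_handler(make_event(), None) exercises the error path, all without touching AWS.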

aws lambda create-function-url-config --function-name my_public_hello_world --auth-type NONE

You might be curious what --auth-type NONE means. Function URLs support two security models:

NONE. This authentication model means the lambda does not provide any security check, but it will require a lambda's resource policy granting lambda:InvokeFunctionUrl permission to "*" (all users).
AWS_IAM. This model tells lambda to use IAM to authenticate users. So, as a developer, you will create users or roles in IAM and grant them access to call the lambda over the Internet. I will show an example below after we sort out the NONE authentication model example.

The command's output should be similar to the following JSON.

{
  "FunctionUrl": "https://b65ugjirvcpyl6rfyyhru63x7q0ztjha.lambda-url.us-east-1.on.aws/",
  "FunctionArn": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:my_public_hello_world",
  "AuthType": "NONE",
  "CreationTime": "2022-10-20T15:53:23.180321Z"
}

The most important information here is the FunctionUrl attribute that tells us the endpoint URL to call the lambda function. Let's give it a try using curl.

$ curl https://b65ugjirvcpyl6rfyyhru63x7q0ztjha.lambda-url.us-east-1.on.aws
{"Message":"Forbidden"}

The problem here is my sample lambda function does not have a resource policy that allows anyone to execute it. Let's add it.

aws lambda add-permission --function-name my_public_hello_world --action lambda:InvokeFunctionUrl --statement-id https --principal "*" --function-url-auth-type NONE --output text

Now our function is open to the whole Internet. Please be very careful when opening up lambdas like this. Anyone can access the function as long as they know the URL. It is your responsibility to develop custom authentication and authorization, e.g., validate a captcha if you develop an open API, or secure your endpoint with JWT tokens, etc.

$ curl "https://b65ugjirvcpyl6rfyyhru63x7q0ztjha.lambda-url.us-east-1.on.aws/?name=John"
{"message": "Hello, John!"}
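For reference, a handler that produces this response could look like the sketch below. It is written in Python and is my assumption for illustration, not necessarily the function deployed earlier; it relies only on the queryStringParameters field that Function URLs pass in the event, and the "World" default is made up.

```python
import json


def lambda_handler(event, context):
    """Hypothetical handler: greets the caller named in the ?name= query parameter."""
    # Function URL events carry the parsed query string here; the key may be
    # absent (or None) when the request has no query string at all.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "World")  # "World" is an assumed default
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```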

Developing custom security mechanisms is not an easy task, and developers want to avoid it if possible. AWS Lambda's Function URL supports IAM-based authentication, which may simplify your life. If you decide to use this mode, your app's users or microservices will be authenticated via IAM to call the lambda. You will be able to use the full power of AWS IAM to allow or deny access to your endpoints.

First, let's remove the old endpoint.

aws lambda delete-function-url-config --function-name my_public_hello_world

And create a new one based on IAM.

aws lambda create-function-url-config --function-name my_public_hello_world --auth-type AWS_IAM

If everything works well, you should see a similar output:

{
  "FunctionUrl": "https://gpjrjlfev5cwef6ydaetd2gx2m0stsln.lambda-url.us-east-1.on.aws/",
  "FunctionArn": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:my_public_hello_world",
  "AuthType": "AWS_IAM",
  "CreationTime": "2022-10-20T16:58:39.807565Z"
}

As you may expect, the CURL request to the endpoint fails.

$ curl "https://gpjrjlfev5cwef6ydaetd2gx2m0stsln.lambda-url.us-east-1.on.aws/?name=John"
{"Message":"Forbidden"}

The first step to solving this problem is making the HTTP request via a tool that supports AWS Signature Version 4 (SigV4). I use awscurl, which is available via Homebrew if you use macOS.

Second, you need to define either an identity-based policy or a resource-based policy for the lambda. In my example, I will attach an identity-based policy to my test user.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "https",
      "Effect": "Allow",
      "Action": "lambda:InvokeFunctionUrl",
      "Resource": "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:my_public_hello_world",
      "Condition": {
        "StringEquals": {
          "lambda:FunctionUrlAuthType": "AWS_IAM"
        }
      }
    }
  ]
}

The Condition attribute IS VERY IMPORTANT. Your user won't be able to call the function if it has not been provided.
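If you manage several functions, it may be easier to generate this document than to edit it by hand. The sketch below builds the same policy in Python; the helper name build_invoke_url_policy is mine, and the account ID stays a placeholder.

```python
import json


def build_invoke_url_policy(function_arn, auth_type="AWS_IAM"):
    """Return an identity-based policy allowing lambda:InvokeFunctionUrl on one function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "https",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunctionUrl",
                "Resource": function_arn,
                # The condition restricts the grant to URLs using this auth type.
                "Condition": {
                    "StringEquals": {"lambda:FunctionUrlAuthType": auth_type}
                },
            }
        ],
    }


# The account ID below is a placeholder, as in the rest of the article.
policy = build_invoke_url_policy(
    "arn:aws:lambda:us-east-1:XXXXXXXXXXXX:function:my_public_hello_world"
)
print(json.dumps(policy, indent=2))
```

You can save the printed JSON to a file and attach it with aws iam put-user-policy, passing your own user and policy names.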

Once the correct policy is in place, my test user is capable of calling the URL using the awscurl utility.

$ awscurl --service lambda "https://gpjrjlfev5cwef6ydaetd2gx2m0stsln.lambda-url.us-east-1.on.aws/?name=John"
{"message": "Hello, John!"}
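awscurl hides the SigV4 details, but it helps to know roughly what happens under the hood. The sketch below shows only one step of the protocol, the derivation of the signing key from your secret key; building the canonical request and the Authorization header is left out. The secret is a made-up placeholder and the function names are mine.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the SigV4 signing key; date_stamp is formatted as YYYYMMDD."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


# Placeholder secret for illustration only; never hard-code real credentials.
key = derive_signing_key("EXAMPLE-SECRET", "20221020", "us-east-1", "lambda")
print(key.hex())
```

The key is scoped to a day, a region, and a service, which is why awscurl needs the --service lambda argument.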

Last but not least, Function URLs support CORS, which is a crucial feature when you develop web applications. Please see the Function URLs CORS article in the AWS documentation.

Finally, I want to share my opinion on when you will need this feature. Each function has its unique endpoint, so the more lambdas you have, the more endpoints your client application will have to consume. That's why I wrote at the beginning that this approach works well for "micro" applications, e.g., you need a backend for the contact form for your blog, as I have on mine. Another use case is a microservices architecture where each function is a microservice. In this case, a Function URL offers a cheaper solution (otherwise, you need an API gateway per function). For more sophisticated use cases, like when you have to expose multiple lambdas within the same project, consider Amazon API Gateway.

All sample code is available on my GitHub repository

How to extend an EBS volume on Amazon Linux 2

Ilya Verbitskiy — Sat, 22 Oct 2022 00:00:00 GMT

AWS Elastic Block Store (or EBS) is Amazon's highly available, low-latency block storage solution. The service is used together with EC2 to persist data on the instances. EBS provides multiple settings that allow fine-tuning your application to meet your performance, storage size, and cost requirements.

One of the biggest advantages of EBS is the fact that it is elastic. It means you can always add more gigabytes to your existing volumes if the current ones run out of space.

In this article, I will provide the exact steps to achieve it on Amazon Linux 2, which I usually use to run my workloads. Also, please remember that my servers run on the Amazon Nitro platform. You can verify whether you run on Nitro or an older Xen-based virtualization platform by running the following command.

$ aws ec2 describe-instance-types --instance-type t3.small --query "InstanceTypes[].Hypervisor"
[
    "nitro"
]

My demo instance uses the t3.small instance type in the us-east-1 region. As you can see from the output, it runs on the Nitro virtualization platform.

Let's check how much disk space I use.

$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  960M     0  960M   0% /dev
tmpfs          tmpfs     969M     0  969M   0% /dev/shm
tmpfs          tmpfs     969M  348K  968M   1% /run
tmpfs          tmpfs     969M     0  969M   0% /sys/fs/cgroup
/dev/nvme0n1p1 xfs        30G  1.7G   29G   6% /

My attached EBS volume is 30 GB. Let's extend it to 40 GB. First, you need to modify the EBS volume itself. I find it handy to use the AWS CLI for this operation, but it is also doable from the AWS Management Console. DO NOT FORGET TO TAKE AN EBS SNAPSHOT BEFORE YOU CONTINUE WITH THE NEXT STEPS!

$ aws ec2 modify-volume --size 40 --volume-id vol-0c87855625e650f4d
{
    "VolumeModification": {
        "VolumeId": "vol-0c87855625e650f4d",
        "ModificationState": "modifying",
        "TargetSize": 40,
        "TargetIops": 120,
        "TargetVolumeType": "gp2",
        "TargetMultiAttachEnabled": false,
        "OriginalSize": 30,
        "OriginalIops": 100,
        "OriginalVolumeType": "gp2",
        "OriginalMultiAttachEnabled": false,
        "Progress": 0,
        "StartTime": "2022-10-21T15:30:44+00:00"
    }
}
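The IOPS values in this output follow from the volume type. A gp2 volume gets a baseline of 3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000, so growing the volume also raises its baseline performance. A small sketch of that arithmetic (the function name is mine):

```python
def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS of a gp2 volume: 3 per GiB, clamped to the 100..16000 range."""
    return min(max(100, 3 * size_gib), 16000)


print(gp2_baseline_iops(30))  # compare with OriginalIops in the output above
print(gp2_baseline_iops(40))  # compare with TargetIops in the output above
```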

vol-0c87855625e650f4d is my EBS volume ID. You can find yours on the EC2 instance's details page in the AWS Management Console. Now you need to wait until the volume is ready: on the EBS Volumes page in the Management Console, the Volume status should be Okay and the size should be your new size (40 GB in my example).
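If you would rather check from a script than from the console, aws ec2 describe-volumes-modifications --volume-ids vol-0c87855625e650f4d reports the modification state. The sketch below only interprets that command's JSON output. The helper name is mine, and it assumes the documented behavior that the new size is usable once the state is optimizing or completed.

```python
import json


def can_extend_filesystem(cli_output: str) -> bool:
    """Return True when every reported modification has reached a usable state."""
    modifications = json.loads(cli_output)["VolumesModifications"]
    # "modifying" means the resize is still in progress; "failed" means it did not go through.
    return bool(modifications) and all(
        m["ModificationState"] in ("optimizing", "completed") for m in modifications
    )


# Shortened sample output; the real command returns more fields per modification.
sample = (
    '{"VolumesModifications": [{"VolumeId": "vol-0c87855625e650f4d", '
    '"ModificationState": "optimizing", "TargetSize": 40}]}'
)
print(can_extend_filesystem(sample))
```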

Now, log in to your EC2 instance and list partitions.

$ sudo lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0  40G  0 disk
├─nvme0n1p1   259:1    0  30G  0 part /
└─nvme0n1p128 259:2    0   1M  0 part

My nvme0n1p1 is only 30 GB, but I want to extend it to 40 GB. Use the growpart command to extend the nvme0n1p1 partition in the partition table to fill the available space.

$ sudo growpart /dev/nvme0n1 1
CHANGED: partition=1 start=4096 old: size=62910431 end=62914527 new: size=83881951 end=83886047
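The sizes in this line are sector counts, not bytes. Assuming 512-byte sectors (which matches the numbers above), you can convert them to confirm that the partition really grew from about 30 to about 40 GiB. The sketch below is only a convenience check; the function names are mine.

```python
import re

SECTOR_BYTES = 512  # assumption: growpart reports sizes in 512-byte sectors


def _sectors_to_gib(sectors: str) -> float:
    return int(sectors) * SECTOR_BYTES / 2**30


def partition_sizes_gib(growpart_line: str) -> tuple[float, float]:
    """Extract the old and new partition sizes from a growpart CHANGED line, in GiB."""
    old = re.search(r"old: size=(\d+)", growpart_line)
    new = re.search(r"new: size=(\d+)", growpart_line)
    if old is None or new is None:
        raise ValueError("not a growpart CHANGED line")
    return _sectors_to_gib(old.group(1)), _sectors_to_gib(new.group(1))


line = "CHANGED: partition=1 start=4096 old: size=62910431 end=62914527 new: size=83881951 end=83886047"
old_gib, new_gib = partition_sizes_gib(line)
print(f"{old_gib:.1f} GiB -> {new_gib:.1f} GiB")
```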

Verify the successful completion of the operation.

$ sudo lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1       259:0    0  40G  0 disk
├─nvme0n1p1   259:1    0  40G  0 part /
└─nvme0n1p128 259:2    0   1M  0 part

But the fact that you extended the partition does not mean that you increased the size of the existing file system. For example, my root ("/") is still 30 GB.

$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  960M     0  960M   0% /dev
tmpfs          tmpfs     969M     0  969M   0% /dev/shm
tmpfs          tmpfs     969M  348K  968M   1% /run
tmpfs          tmpfs     969M     0  969M   0% /sys/fs/cgroup
/dev/nvme0n1p1 xfs        30G  1.7G   29G   6% /

Since my EC2 instance uses XFS, I can use the xfs_growfs command to increase the size of the file system.

sudo xfs_growfs -d /

Finally, verify that the operation completed successfully and that my root is now 40 GB.

$ df -hT
Filesystem     Type      Size  Used Avail Use% Mounted on
devtmpfs       devtmpfs  960M     0  960M   0% /dev
tmpfs          tmpfs     969M     0  969M   0% /dev/shm
tmpfs          tmpfs     969M  348K  968M   1% /run
tmpfs          tmpfs     969M     0  969M   0% /sys/fs/cgroup
/dev/nvme0n1p1 xfs        40G  1.7G   39G   5% /

If you use a Xen-based instance, follow a deeper tutorial from AWS documentation.

Lightweight search for .NET Core

Ilya Verbitskiy — Sat, 19 Sep 2020 00:00:00 GMT

Effective search is an essential component of a successful e-Commerce website. The latest research shows that people who use search are more likely to purchase products than those who are just browsing the product catalog. That's why I have been dealing with full-text search for years. If you live in the .NET ecosystem, then the obvious choice is to use Solr or Elasticsearch. Both of them are excellent scalable open-source options, but sometimes using them is overkill.

A few times a year, I need to build full-text search functionality. While Solr or Elasticsearch are the right choices, bringing in a standalone server is too much hassle. Another option is to use Amazon Elasticsearch Service, but sometimes I need a free solution. If the project uses SQL Server or MongoDB, you can use built-in full-text search indexes. That's an excellent option for small or medium-size projects, and I use it quite often.

Sometimes you can face a situation where the above options are not available. In such cases, I ended up building a full-text search microservice using Node.js and Lunr. Designed to be small, yet full-featured, Lunr enables you to provide a great search experience without external, server-side search services. I had been using it until last month, when I found the LunrCore project.

It is a port of Lunr to .NET Core. LunrCore is a small, full-text search library for use in small and medium-size applications. It indexes documents and provides a simple search interface for retrieving documents that best match text queries. Let's create a simple console application that filters out product names from the nopCommerce database. NopCommerce is one of the most popular shopping carts for ASP.NET that I am using quite a lot for e-Commerce clients. I will use Dapper to retrieve the data. I think this is still the fastest way if you like writing SQL.

First, let's add LunrCore and Dapper to our project.

dotnet add package LunrCore --version 2.3.8.5
dotnet add package Dapper --version 2.0.35

Now, let's create a Product model that we will get from the database and index.

class Product
{
    public int Id { get; set; }

    public string Name { get; set; }
}

The next step is loading all products from the database and indexing them. You can do it by using the Index class from LunrCore.

private static async Task<Index> IndexProducts()
{
    using var connection = new SqlConnection(ConnectionString);
    var products = await connection.QueryAsync<Product>("select Id, Name from dbo.Product");
    var index = await Index.Build(async builder =>
    {
        builder.AddField(new Field<string>("name"));

        foreach (var product in products)
        {
            await builder.Add(new Document
            {
                ["id"] = product.Id,
                ["name"] = product.Name
            });
        }
    });

    return index;
}

I want to add a few comments about this code. First, use the AddField method to create the index structure. My index has only one field called name. Second, the Add method accepts a document to index. A Document is an object implementing the IDictionary interface. The document must have a field called id. This field contains the entity identifier that the index returns when the document matches the search criteria.

Let's move on and implement search functionality. The index object has the Search method that returns a collection of matched documents.

var results = index.Search(query);

Each match object has DocumentReference, Score, and MatchData properties. MatchData contains information about which term was found and where in the document it was found. The Score property contains the document's relevance, and the DocumentReference property holds the document's identifier. Keep in mind that DocumentReference is a string. In my example, it should be converted to an integer to be used in the SQL query.

var ids = new List<int>();
await foreach(var item in results)
{
    ids.Add(int.Parse(item.DocumentReference));
}

var products = await FindProducts(ids);

private static async Task<IEnumerable<Product>> FindProducts(IEnumerable<int> ids)
{
    using var connection = new SqlConnection(ConnectionString);
    var products = await connection.QueryAsync<Product>(
        "select Id, Name from dbo.Product where Id in @ids",
        new { ids });
    return products;
}

LunrCore Query Syntax

'red' - find all documents that have a 'red' word in it.
'red apple' - find all documents with either a 'red' or an 'apple' word in it. The search terms are combined with OR.
'name:red' - find all documents with a 'red' word in its name field. The field names are defined in the AddField method's parameter.
'bla*' - a wildcard search. A wildcard is represented as an asterisk (*) and can appear anywhere in a search term. In this example, the term might be blank, blanket, or black.
'+red +apple -green' - To indicate that a term must be present in matching documents, the term should be prefixed with a plus (+), and to indicate that a term must be absent, the term should be prefixed with a minus (-). In our example, the search algorithm will return all red apples, but won't give you green apples.
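To make the semantics of these operators concrete, here is a toy model of them in Python. It is not how LunrCore is implemented: there is no stemming, scoring, or field search, the wildcard is supported only at the end of a term, and a query made only of minus terms returns nothing. All names and sample documents are made up.

```python
def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased word to the ids of the documents containing it."""
    index: dict[str, set[int]] = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


def lookup(index, term):
    """Resolve one term; a trailing * matches any word with that prefix."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = [ids for word, ids in index.items() if word.startswith(prefix)]
        return set().union(*hits)
    return set(index.get(term, set()))


def search(index, query):
    """Plain terms are ORed, +terms must all match, -terms must not match."""
    required, optional, prohibited = [], [], []
    for token in query.lower().split():
        if token.startswith("+"):
            required.append(lookup(index, token[1:]))
        elif token.startswith("-"):
            prohibited.append(lookup(index, token[1:]))
        else:
            optional.append(lookup(index, token))
    if required:
        matches = set.intersection(*required)
    else:
        matches = set().union(*optional)
    return matches - set().union(*prohibited)


index = build_index({1: "red apple", 2: "green apple", 3: "red blanket", 4: "black car"})
for query in ["red", "red apple", "bla*", "+red +apple -green"]:
    print(query, "->", sorted(search(index, query)))
```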

Index Persistence

The last but not least feature is index persistence. LunrCore can store the index in JSON format to a stream and load it back from the stream.

// File.Create truncates an existing file, so a shorter index never leaves stale bytes behind.
using var productsStream = File.Create(IDX);
await index.SaveToJsonStream(productsStream);
productsStream.Close();

using var stream = File.OpenRead(IDX);
index = await Index.LoadFromJsonStream(stream);
stream.Close();

Unfortunately, the only supported format is JSON, but there is an issue on GitHub to add additional serialization mechanisms later.

That's all I wanted to say about my experience using LunrCore. Give it a try!

Migrating Amazon ECR Images Between AWS Accounts with a Simple Bash Script

Ilya Verbitskiy — Fri, 05 Jun 2026 00:00:00 GMT

Moving container images between AWS accounts is a common task during platform migrations, account restructuring, environment promotion, mergers, acquisitions, or when separating production and non-production workloads.

Amazon Elastic Container Registry, or Amazon ECR, makes it easy to store and distribute container images inside AWS. But when you need to copy images from one AWS account to another, especially across multiple repositories and tags, the process can become repetitive.

To make this easier, I created a Bash script that migrates ECR images from a source AWS account to a destination AWS account.

GitHub repository: https://github.com/ilich/aws-samples/tree/main/ecr/migrate-repositories

The script lets you copy one or more repository:tag images from one ECR registry to another while preserving the repository name and tag. It also supports multi-platform images, such as images built for both linux/amd64 and linux/arm64.

What the Script Does

The script migrates container images between two Amazon ECR registries.

At a high level, it:

Authenticates Docker to the source ECR registry.
Authenticates Docker to the destination ECR registry.
Iterates through one or more repository:tag values.
Copies each image from the source registry to the destination registry.
Preserves the image tag.
Preserves multi-architecture image manifests.
Reports any failed image copies at the end.

This is useful when you want to move images between AWS accounts without manually pulling, tagging, and pushing each image one by one.
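The core of these steps can be sketched in a few lines. The actual script is written in Bash; the Python below is only a dry-run illustration with names of my own choosing. It builds the source and destination image references and the matching docker buildx imagetools create commands without authenticating or running anything.

```python
def ecr_image(account_id: str, region: str, repo_tag: str) -> str:
    """Build a full ECR image reference for a repo:tag value."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_tag}"


def plan_copies(src_account, dst_account, region, images):
    """Return one docker command (as an argv list) per repo:tag, without running anything."""
    commands = []
    for repo_tag in images:
        src = ecr_image(src_account, region, repo_tag)
        dst = ecr_image(dst_account, region, repo_tag)
        # Same repository name and tag on both sides; only the registry host differs.
        commands.append(["docker", "buildx", "imagetools", "create", "--tag", dst, src])
    return commands


for argv in plan_copies("111122223333", "444455556666", "us-east-1", ["myapp:latest", "worker:v3.1"]):
    print(" ".join(argv))
```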

Why This Is Useful

A traditional image migration workflow often looks like this:

docker pull source-account.dkr.ecr.region.amazonaws.com/myapp:latest
docker tag source-account.dkr.ecr.region.amazonaws.com/myapp:latest destination-account.dkr.ecr.region.amazonaws.com/myapp:latest
docker push destination-account.dkr.ecr.region.amazonaws.com/myapp:latest

That works, but it has several drawbacks.

First, it requires local disk space because the image layers are pulled to your machine. Second, it can be slower for large images. Third, it may not preserve multi-platform manifests correctly if you only pull a single platform image locally. Finally, it becomes tedious when you need to migrate many images.

The script avoids this by using Docker Buildx imagetools create, which copies the image manifest directly between registries. This means the image does not need to be pulled to local disk first.

Prerequisites

Before using the script, you need the following installed and configured:

AWS CLI v2
Docker
Docker Buildx

You also need AWS CLI profiles configured for both the source and destination accounts.

The source AWS account needs permissions to read from ECR, including:

ecr:GetAuthorizationToken
ecr:BatchGetImage
ecr:GetDownloadUrlForLayer

The destination AWS account needs permissions to write to ECR, including:

ecr:GetAuthorizationToken
ecr:InitiateLayerUpload
ecr:UploadLayerPart
ecr:CompleteLayerUpload
ecr:PutImage

The target repositories must already exist in the destination account. The script does not create ECR repositories automatically.

For example, if you are migrating:

myapp:latest

from the source account, then the destination account must already have an ECR repository named:

myapp
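When you migrate many tags, it helps to work out the distinct repository names first so you can create them up front. The sketch below does that and prints one aws ecr create-repository command per repository. The helper name, the region, and the staging-admin profile are placeholders of mine; in practice you may prefer to create the repositories with your infrastructure-as-code tool.

```python
def repositories_needed(images):
    """Return the distinct repository names behind a list of repo:tag values, in first-seen order."""
    seen = []
    for repo_tag in images:
        repo = repo_tag.rsplit(":", 1)[0]  # drop the tag, keep any path prefix such as team/myapp
        if repo not in seen:
            seen.append(repo)
    return seen


for repo in repositories_needed(["myapp:latest", "myapp:v1.2.3", "worker:v3.1"]):
    # us-east-1 and staging-admin are placeholders for your destination region and profile.
    print(f"aws ecr create-repository --repository-name {repo} --region us-east-1 --profile staging-admin")
```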

Basic Usage

The script uses the following syntax:

./migrate-ecr-images.sh [OPTIONS] -s SOURCE_ACCOUNT -d DEST_ACCOUNT -r REGION repo:tag [repo:tag ...]

Required arguments:

-s SOURCE_ACCOUNT   AWS account ID of the source account
-d DEST_ACCOUNT     AWS account ID of the destination account
-r REGION           AWS region, for example us-east-1
repo:tag            One or more repository and tag pairs to migrate

Optional arguments:

--src-profile PROFILE   AWS CLI profile for the source account
--dst-profile PROFILE   AWS CLI profile for the destination account
-h, --help              Show help

If no profiles are provided, the script uses the default AWS CLI profile for both accounts.

Example: Migrating a Single Image

To migrate a single image using the default AWS CLI profile, run:

./migrate-ecr-images.sh \
  -s 111122223333 \
  -d 444455556666 \
  -r us-east-1 \
  myapp:latest

This copies:

111122223333.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

to:

444455556666.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

Example: Migrating Multiple Images

You can also migrate multiple images in one command:

./migrate-ecr-images.sh \
  -s 111122223333 \
  -d 444455556666 \
  -r us-east-1 \
  myapp:latest myapp:v1.2.3 worker:v3.1

The script processes each image independently. If one image fails, the script records the failure and continues processing the remaining images.
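The record-and-continue behavior follows a simple pattern. Here is a minimal Python sketch of it, not the Bash script itself: the copy step is passed in as a function so the example runs without Docker, and fake_copy is a stand-in that fails for one tag on purpose.

```python
def migrate_all(images, copy_image):
    """Copy each image with copy_image(repo_tag); return the repo:tag values that failed."""
    failed = []
    for repo_tag in images:
        try:
            copy_image(repo_tag)
        except Exception as error:  # keep going so one bad image does not stop the batch
            print(f"FAILED {repo_tag}: {error}")
            failed.append(repo_tag)
    return failed


def fake_copy(repo_tag):
    """Stand-in for the real copy step; pretends that one tag is missing in the source."""
    if repo_tag == "myapp:v1.2.3":
        raise RuntimeError("tag not found")


failed = migrate_all(["myapp:latest", "myapp:v1.2.3", "worker:v3.1"], fake_copy)
print("failed:", failed)
```

In a real run, copy_image could wrap subprocess.run with check=True around the docker buildx command.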

Example: Using Named AWS Profiles

In many real-world environments, you will use different AWS CLI profiles for each account.

For example:

./migrate-ecr-images.sh \
  -s 111122223333 --src-profile prod-readonly \
  -d 444455556666 --dst-profile staging-admin \
  -r us-east-1 \
  myapp:latest myapp:v1.2.3 worker:v3.1

In this example, prod-readonly is used to authenticate to the source account, and staging-admin is used to authenticate to the destination account.

This is especially helpful when migrating from production to staging, from a legacy AWS account to a new landing zone account, or from one organizational unit to another.

Why Docker Buildx?

The key part of the script is Docker Buildx, specifically:

docker buildx imagetools create --tag "$dst_image" "$src_image"

I used Docker Buildx because it allows the script to copy the image reference from the source registry to the destination registry without doing a traditional local pull, tag, and push workflow.

This has a few important benefits.

First, it avoids unnecessary local disk usage. The image does not need to be downloaded to the machine running the script before it is pushed to the destination registry.

Second, it is better suited for multi-platform images. Many modern container images support more than one CPU architecture, for example linux/amd64 and linux/arm64. A regular docker pull may pull only the platform-specific image for the local machine. With docker buildx imagetools create, the manifest list can be copied as-is, preserving the multi-architecture image in the destination registry.

Third, it keeps the migration workflow simple. The script only needs to authenticate to both ECR registries and ask Docker Buildx to create the destination image reference from the source image reference.

This makes the script lightweight, repeatable, and practical for moving multiple ECR images between AWS accounts.
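To check that a multi-platform image survived the copy, you can look at its manifest, for example the JSON printed by docker buildx imagetools inspect --raw. The sketch below lists the platforms found in such a manifest list. The function name is mine, and the sample manifest is shortened (real entries also carry digests and sizes).

```python
import json

INDEX_TYPES = {
    "application/vnd.oci.image.index.v1+json",
    "application/vnd.docker.distribution.manifest.list.v2+json",
}


def platforms(raw_manifest: str) -> list[str]:
    """List os/architecture pairs from a manifest list; return [] for a single-platform manifest."""
    manifest = json.loads(raw_manifest)
    if manifest.get("mediaType") not in INDEX_TYPES:
        return []
    result = []
    for entry in manifest.get("manifests", []):
        platform = entry.get("platform", {})
        result.append(f"{platform.get('os')}/{platform.get('architecture')}")
    return result


# Shortened sample of a two-platform image index.
sample = json.dumps({
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {"platform": {"os": "linux", "architecture": "amd64"}},
        {"platform": {"os": "linux", "architecture": "arm64"}},
    ],
})
print(platforms(sample))
```

Running it against both the source and the destination image and comparing the lists is a quick sanity check. Images built with attestations may show extra entries with an unknown platform.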

Repository Naming Requirements

The script preserves repository names.

That means this source image:

source-account.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

is copied to:

destination-account.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

The repository name remains:

myapp

The tag remains:

latest

Because of this, the destination ECR repository must already exist with the same name. If the destination repository does not exist, the image copy will fail.

This design keeps the script focused on image migration only. Repository creation, lifecycle policies, scanning configuration, encryption settings, and access policies can be managed separately using Terraform, AWS CDK, CloudFormation, or another infrastructure-as-code tool.

When to Use This Script

This script is useful for scenarios such as:

Migrating images from one AWS account to another
Promoting images between isolated AWS accounts
Moving workloads into a new AWS organization or landing zone
Copying production images into a staging or disaster recovery account
Preserving multi-architecture images during migration
Batch-copying multiple tagged images

It is especially helpful when you want a simple, repeatable command-line workflow without introducing a larger migration tool.

Things to Keep in Mind

There are a few important considerations before running the script.

First, the destination repositories must already exist. The script does not create missing repositories.

Second, the repository names are preserved. If you need to rename repositories during migration, you would need to modify the script.

Third, the script works within a single AWS region per execution. If you need to migrate images across regions, run the script with the appropriate region or extend it to support separate source and destination regions.

Fourth, make sure both AWS profiles have the required ECR permissions. Source access must be able to read image manifests and layers. Destination access must be able to write image manifests and upload layers.

Finally, validate migrated images before updating production workloads to consume them from the new account.

Conclusion

Migrating ECR images between AWS accounts does not need to be complicated.

This Bash script provides a simple way to copy one or more tagged images from a source AWS account to a destination AWS account. It authenticates to both registries, preserves repository names and tags, supports multi-platform images, and avoids local image pull and push operations by using Docker Buildx imagetools create.

For teams working with multi-account AWS environments, this provides a lightweight and repeatable approach for ECR image migration.

Neo4j Cypher Cheat Sheet

Ilya Verbitskiy — Tue, 20 May 2025 00:00:00 GMT

Graph databases like Neo4j are gaining traction as powerful alternatives to traditional relational databases—especially when navigating complex relationships in data. Unlike SQL databases, which rely on foreign keys and multi-table joins to express connections, graph databases store data as nodes and relationships, enabling intuitive and high-performance traversal of deeply interconnected structures. It makes them ideal for use cases like recommendation engines, fraud detection, knowledge graphs, and social networks—where relationships matter as much as the data.

Neo4j, the most popular graph database, uses a declarative query language called Cypher, designed to make relationship-focused queries both expressive and readable. But how does this graph-based approach compare to traditional SQL in practice?

To answer that, this post will take the classic Northwind database, a well-known sample dataset originally built for relational systems, and reimagine it as a graph. I’ll describe how common queries are expressed in SQL and Cypher, highlighting the differences in modeling, query complexity, and real-world usability. Whether you’re a SQL veteran curious about graph databases or looking to understand where Neo4j fits in the broader database ecosystem, this comparison offers a practical side-by-side view.

The Northwind database models the internal operations of a fictional company that imports and exports specialty foods worldwide. It includes data about customers, orders, products, employees, suppliers, and shipping details, offering a realistic yet manageable snapshot of a typical business’s sales and supply chain workflow. For example, customers place orders that contain multiple products, each supplied by different vendors. Employees are organized hierarchically and are assigned to manage specific orders, while shipments are routed through carriers to fulfill customer requests.

Despite its relatively small size, Northwind’s schema reflects a rich network of relationships. Products belong to categories and are supplied by vendors. Orders are linked to customers, employees, and shipping methods. It creates a network of interconnected entities that mimics the real-world complexity of business data. In a traditional SQL database, these relationships are enforced through foreign keys and JOIN operations. In a graph database like Neo4j, they become direct relationships between nodes—making Northwind an ideal candidate for comparing the relational and graph data models side-by-side.

In Neo4j, the Northwind database is modeled as a graph, where each entity, such as a customer, order, product, or employee, is represented as a node, and the connections between them become relationships (edges). Instead of using foreign keys, relationships in Neo4j are explicit and stored directly between nodes, allowing for fast and intuitive traversal. Each node is assigned one or more labels to indicate its type, such as :Customer, :Order, :Product, or :Supplier. Relationships between these nodes use relationship types like :PURCHASED (between a customer and an order), :ORDERS (between an order and its products), or :SUPPLIES (between a supplier and a product).

This approach mirrors the business logic naturally, making queries about connected data more expressive and often more efficient than traditional SQL joins.
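As a mental model only (Neo4j stores and indexes data very differently), you can picture the graph as a set of labeled nodes plus explicit relationship triples. The Python sketch below uses made-up sample data and walks the same path a Cypher pattern would describe.

```python
# Nodes keyed by an id, each with a label and properties (all values are made-up samples).
nodes = {
    "c1": {"label": "Customer", "contactName": "Maria Anders"},
    "o1": {"label": "Order", "orderID": "10643"},
    "p1": {"label": "Product", "productName": "Tofu"},
    "p2": {"label": "Product", "productName": "Chai"},
}

# Relationships are stored explicitly as (start, type, end) triples.
relationships = [
    ("c1", "PURCHASED", "o1"),
    ("o1", "ORDERS", "p1"),
    ("o1", "ORDERS", "p2"),
]


def neighbours(start, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for (s, t, end) in relationships if s == start and t == rel_type]


def products_bought_by(customer_id):
    """Traverse (:Customer)-[:PURCHASED]->(:Order)-[:ORDERS]->(:Product)."""
    names = []
    for order in neighbours(customer_id, "PURCHASED"):
        for product in neighbours(order, "ORDERS"):
            names.append(nodes[product]["productName"])
    return names


print(products_bought_by("c1"))
```

No join table or foreign key lookup is involved: the traversal simply follows the stored relationships from one node to the next.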

To demonstrate the practical differences between SQL and Cypher, the next section of this post will serve as a Cypher cheat sheet, translating common SQL queries into their graph-based equivalents. These examples will be drawn directly from typical use cases in the Northwind database—such as finding a customer’s order history, retrieving products from a specific supplier, or analyzing co-purchased items. By comparing each SQL query side-by-side with its Cypher counterpart, you’ll see how graph queries eliminate complex joins and bring clarity to relationship-driven logic. Whether you’re just starting with Neo4j or looking to deepen your understanding, this cheat sheet offers a hands-on look at how Cypher approaches problems differently from traditional relational databases.

Get all columns from the Customers table.

SELECT * FROM Customers

MATCH (c:Customer) RETURN c

Get the top 25 Customers alphabetically by Country and name.

SELECT TOP 25 *
FROM Customers
ORDER BY Country, ContactName

MATCH (c:Customer)
RETURN c
ORDER BY c.country, c.contactName
LIMIT 25;

Get the count of all Orders made during 1997.

SELECT COUNT(*)
FROM Orders
WHERE YEAR(OrderDate) = 1997

MATCH (o:Order)
WHERE o.orderDate.year = 1997
RETURN COUNT(o);

Get all orders placed on the 19th of March, 1997.

SELECT *
FROM Orders
WHERE OrderDate = '19970319'

MATCH (o:Order)
WHERE o.orderDate = date('1997-03-19')
RETURN o;

Create a report for all the orders of 1996 and their Customers.

SELECT *
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE YEAR(o.OrderDate) = 1996

MATCH p=(c:Customer)-[:PURCHASED]->(o:Order)
WHERE o.orderDate.year = 1996
RETURN p;

Create a report for all 1996 orders and their Customers. Return only the Order ID, Order Date, Customer ID, Name, and Country.

SELECT o.OrderID, o.OrderDate, c.CustomerID, c.ContactName, c.Country
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE YEAR(o.OrderDate) = 1996

MATCH (c:Customer)-[:PURCHASED]->(o:Order)
WHERE o.orderDate.year = 1996
RETURN o.orderID, o.orderDate, c.customerID, c.contactName, c.country;

Create a report that shows the number of orders placed by customers from each city.

SELECT c.City, COUNT(*)
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
GROUP BY c.City

MATCH (c:Customer)-[:PURCHASED]->(o:Order)
RETURN c.city, COUNT(*);
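Note that the Cypher version has no GROUP BY. In Cypher, any non-aggregated expression in RETURN (here c.city) becomes a grouping key, and COUNT(*) counts one row per matched customer and order pair. The same idea in a few lines of Python, with made-up sample rows:

```python
from collections import Counter

# Each row stands for one (c)-[:PURCHASED]->(o) match; the values are made-up samples.
matches = [
    {"city": "London", "orderID": "1"},
    {"city": "London", "orderID": "2"},
    {"city": "Berlin", "orderID": "3"},
]

# RETURN c.city, COUNT(*): c.city is not aggregated, so it acts as the grouping key.
orders_per_city = Counter(row["city"] for row in matches)
print(dict(orders_per_city))
```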

Create a report that shows the total quantity of products ordered. Only show records for products for which the quantity ordered is fewer than 200.

SELECT p.ProductName, SUM(od.Quantity) as Quantity
FROM OrderDetails od
INNER JOIN Products p ON od.ProductID = p.ProductID
GROUP BY p.ProductName
HAVING SUM(od.Quantity) < 200
ORDER BY Quantity

MATCH (o:Order)-[od:ORDERS]->(p:Product)
WITH p.productName as productName, SUM(od.quantity) as quantity
WHERE quantity < 200
RETURN productName, quantity
ORDER BY quantity;

Create a report that shows the total number of orders by Customer since December 31, 1996. The report should only return rows for which the total number of orders is greater than 15.

SELECT c.ContactName, COUNT(o.OrderID) as TotalOrders
FROM Orders o
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE OrderDate > '1996-12-31'
GROUP BY c.ContactName
HAVING COUNT(o.OrderID) > 15

MATCH (c:Customer)-[:PURCHASED]->(o:Order)
WHERE o.orderDate > date("1996-12-31")
WITH c.contactName as contactName, COUNT(o.orderID) as totalOrders
WHERE totalOrders > 15
RETURN contactName, totalOrders;

Which UK Customers have paid more than 1000 dollars?

SELECT c.ContactName, SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) as Paid
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
INNER JOIN OrderDetails od ON o.OrderID = od.OrderID
WHERE c.Country = 'UK'
GROUP BY c.ContactName
HAVING SUM(od.UnitPrice * od.Quantity * (1 - od.Discount)) > 1000

MATCH (c:Customer)-[:PURCHASED]->(o:Order)-[od:ORDERS]->(p:Product)
WHERE c.country = 'UK'
WITH c.contactName as contactName, SUM(toFloat(od.unitPrice) * od.quantity * (1 - toFloat(od.discount))) as paid
WHERE paid > 1000
RETURN contactName, paid;

Insert yourself into the Customers table. Include the following fields: CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax.

INSERT INTO Customers (CustomerID, CompanyName, ContactName, ContactTitle, Address, City, Region, PostalCode, Country, Phone, Fax)
VALUES ('ILYA1', 'Acme Corp', 'Ilya Verbitskiy', 'Manager', '123 Main St', 'New York', 'NY', '10001', 'USA', '555-1234', '555-5678')

Insert node

CREATE (c:Customer {
  customerID: 'ILYA1',
  companyName: 'Acme Corp',
  contactName: 'Ilya Verbitskiy',
  contactTitle: 'Manager',
  address: '123 Main St',
  city: 'New York',
  region: 'NY',
  postalCode: '10001',
  country: 'USA',
  phone: '555-1234',
  fax: '555-5678'
})

Upsert node

MERGE (c:Customer {customerID: 'ILYA1'})
SET c.companyName = 'Acme Corp',
  c.contactName = 'Ilya Verbitskiy',
  c.contactTitle = 'Manager',
  c.address = '123 Main St',
  c.city = 'New York',
  c.region = 'NY',
  c.postalCode = '10001',
  c.country = 'USA',
  c.phone = '555-1234',
  c.fax = '555-5678'

Insert multiple entities within a transaction

BEGIN TRANSACTION

INSERT Orders(CustomerID, EmployeeID, OrderDate)
VALUES ('ILYA1', 5, GETDATE())

DECLARE @LastOrderID INT = SCOPE_IDENTITY()
DECLARE @ProductId INT
SELECT @ProductId = ProductID FROM Products WHERE ProductName = 'Tofu'

INSERT OrderDetails(OrderID, ProductID, UnitPrice, Quantity, Discount)
VALUES (@LastOrderID, @ProductId, 10, 8, 0)

COMMIT

MATCH (c:Customer {customerID: 'ILYA1'})
MATCH (p:Product {productName: 'Tofu'})
MERGE (o:Order {orderID: "9090"}) SET o.orderDate = date()
MERGE u=(c)-[:PURCHASED]->(o)
MERGE v=(o)-[po:ORDERS {unitPrice: "10", quantity: 8, discount: "0", productID: p.productID, orderID: o.orderID}]->(p)
RETURN u, v

Change Order.orderDate data type from String to Date

MATCH (o:Order)
SET o.orderDate = date(substring(o.orderDate, 0, 10))
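The substring(o.orderDate, 0, 10) call keeps the first ten characters of the string, which is the YYYY-MM-DD part, before handing it to date(). The same conversion in Python, as a minimal sketch; the sample value is an assumed format for the imported strings, not a value taken from the dataset:

```python
from datetime import date


def to_date(order_date: str) -> date:
    """Keep the leading YYYY-MM-DD part and parse it, like date(substring(s, 0, 10))."""
    return date.fromisoformat(order_date[:10])


# Assumed sample value: an ISO-like timestamp stored as a string.
parsed = to_date("1996-07-04 00:00:00.000")
print(parsed.isoformat())
```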

Update the phone number.

UPDATE Customers SET Phone = '000-4321' WHERE CustomerID = 'ILYA1'

MATCH (c:Customer {customerID: 'ILYA1'})
SET c.phone = '000-4321';

Double the quantity of the order details record you inserted before

UPDATE od
SET Quantity = od.Quantity * 2
FROM OrderDetails od
INNER JOIN Orders o ON od.OrderID = o.OrderID
WHERE o.OrderID = 11084

MATCH (o:Order {orderID: "9090"})-[od:ORDERS]->(p:Product {productID: "14"})
SET od.quantity = od.quantity * 2
RETURN o, od, p;

Delete the records you inserted before. Don't delete any other records.

BEGIN TRANSACTION
DELETE od
FROM OrderDetails od
INNER JOIN dbo.Orders O on O.OrderID = od.OrderID
INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 'ILYA1'

DELETE o
FROM Orders o INNER JOIN Customers c ON o.CustomerID = c.CustomerID
WHERE c.CustomerID = 'ILYA1'

DELETE Customers WHERE CustomerID = 'ILYA1'
COMMIT

MATCH (c:Customer {customerID: "ILYA1"})-[:PURCHASED]->(o:Order)-[:ORDERS]->(p:Product)
DETACH DELETE c, o;

This cheat sheet highlights how graph databases simplify working with connected data by reimagining the Northwind database in Neo4j and translating common SQL queries into Cypher. Neo4j’s intuitive node-and-relationship model eliminates complex joins and makes relationship-driven queries more expressive, maintainable, and performant. Whether you’re exploring Neo4j for the first time or looking to deepen your understanding of Cypher, this post offers a practical, side-by-side comparison showing how graph thinking can transform how we query and structure data.

If you’re considering adopting Neo4j or need help modernizing your data architecture, feel free to contact me. I offer consulting services to help teams design and implement graph-based solutions that scale.


No Backend Needed: Running Python in React with Pyodide

Ilya Verbitskiy — Sun, 01 Feb 2026 00:00:00 GMT

Introduction

Running Python in the browser used to sound like a gimmick — interesting, but not especially practical. Today, it’s becoming a genuinely useful tool for building data-driven applications, interactive demos, and even lightweight analysis tools without needing a backend.

That’s exactly what Pyodide enables: a full Python runtime compiled to WebAssembly (WASM), running entirely in the browser. With Pyodide, you can execute Python code from JavaScript/TypeScript, load Python packages such as numpy, and even generate charts using matplotlib — all client-side.

In this post, we’ll walk through how to use Pyodide in a modern frontend stack: React + TypeScript + Vite. We’ll cover:

how to install and initialize Pyodide inside a React app
how to load Python packages dynamically
how to run numpy + matplotlib to visualize revenue data
how to bridge Python outputs back into React UI

By the end, you’ll have a working React component that executes Python code in the browser and renders a plot generated by Python — no server required.

Setting up Pyodide in Vite

Pyodide can be loaded either directly from a CDN or bundled into your application. For quick prototypes, the CDN option is often the easiest. But for real production apps — especially if you want your app to work offline or avoid depending on an external CDN — bundling Pyodide into your Vite build is a great choice.

In this section, we’ll configure Vite + React + TypeScript so Pyodide works in:

Vite dev mode (npx vite)
production build (npx vite build)
production preview (npx vite preview)

Option 1 — Loading Pyodide from a CDN (quickest setup)

For many applications the simplest approach is to load Pyodide using a CDN by providing the indexURL parameter:

import { loadPyodide, version as pyodideVersion } from "pyodide";

async function initPyodide() {
  const pyodide = await loadPyodide({
    indexURL: `https://cdn.jsdelivr.net/pyodide/v${pyodideVersion}/full/`,
  });

  return pyodide;
}

This approach works with most bundlers without additional configuration and is recommended for many users. It’s also great if you want to avoid adding Pyodide assets to your own build output.

Option 2 — Bundling Pyodide in Vite (recommended for production apps)

When using Vite, Pyodide requires a small amount of additional configuration:

Pyodide must be excluded from dependency pre-bundling
Pyodide runtime files must be copied into the final build output (dist/assets)

To do that, install the required packages:

npm install pyodide vite-plugin-static-copy

Configure Vite to copy Pyodide assets

In your project, update vite.config.ts to:

exclude pyodide from optimizeDeps, and
copy Pyodide distribution files to the build output via vite-plugin-static-copy

Here is the exact TypeScript config (React included) that works well:

import { defineConfig } from "vite";
import { viteStaticCopy } from "vite-plugin-static-copy";
import { dirname, join } from "path";
import { fileURLToPath } from "url";
import react from "@vitejs/plugin-react";

const PYODIDE_EXCLUDE = [
  "!**/*.{md,html}",
  "!**/*.d.ts",
  "!**/*.whl",
  "!**/node_modules",
];

export function viteStaticCopyPyodide() {
  const pyodideDir = dirname(fileURLToPath(import.meta.resolve("pyodide")));
  return viteStaticCopy({
    targets: [
      {
        src: [join(pyodideDir, "*")].concat(PYODIDE_EXCLUDE),
        dest: "assets",
      },
    ],
  });
}

// https://vite.dev/config/
export default defineConfig({
  optimizeDeps: { exclude: ["pyodide"] },
  plugins: [react(), viteStaticCopyPyodide()],
});

With this setup, Vite will ensure Pyodide files such as pyodide.js, .wasm, and supporting runtime files are copied into dist/assets/ for production builds.

Setting indexURL when using bundled assets

Once Pyodide is copied into dist/assets, you may want to explicitly point Pyodide to the correct path using indexURL:

const pyodide = await loadPyodide({
  indexURL: "/assets",
});

This tells Pyodide where to find pyodide.js, .wasm, and related files.

Loading packages

Once Pyodide is initialized, the next step is getting access to the Python ecosystem you care about. Pyodide ships with a large set of prebuilt packages (including numpy and matplotlib) that you can load on demand using pyodide.loadPackage().

The core API: pyodide.loadPackage()

Packages included in the official Pyodide repository can be loaded like this:

await pyodide.loadPackage("numpy");

A few important behaviors to know:

Dependencies are handled automatically when you load from the official Pyodide repository. If a package depends on other packages, Pyodide will load them too.
You can load multiple packages at once by passing a list:

await pyodide.loadPackage(["numpy", "matplotlib"]);

loadPackage() returns a Promise that resolves once everything is loaded, so you typically do:

const pyodide = await loadPyodide();
await pyodide.loadPackage("matplotlib");
// matplotlib is now available

In general, loading a package twice is not permitted. (So it’s best to keep Pyodide as a singleton and track what you’ve loaded.)

Loading packages from custom URLs (advanced)

You can also load a wheel directly from a URL:

await pyodide.loadPackage(
  "https://foo/bar/numpy-1.22.3-cp310-cp310-emscripten_3_1_13_wasm32.whl",
);

Two gotchas:

The filename must be a valid wheel name.
No dependency resolution happens for custom URLs. If you want dependency resolution for wheels from arbitrary URLs (or from PyPI), you’ll typically use micropip instead. It is a lightweight package installer for Pyodide that lets you install pure-Python packages (typically from PyPI) in the browser, and it can also handle dependency resolution when installing packages from custom wheels or URLs.
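As a rough illustration of what a valid wheel name looks like, the sketch below splits a filename into the components of the standard wheel naming convention (distribution, version, optional build tag, Python tag, ABI tag, platform tag). The parse_wheel_filename helper is hypothetical and is not part of Pyodide. It only checks the shape of the name, not whether the wheel is compatible with your Pyodide build.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its dash-separated components.

    Raises ValueError if the name does not end with .whl or does not
    have five or six components.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return WheelName(dist, version, build, py, abi, plat)


# Filename taken from the URL example above.
wheel = parse_wheel_filename("numpy-1.22.3-cp310-cp310-emscripten_3_1_13_wasm32.whl")
print(wheel)
```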

Where packages are stored (and why subsequent loads are faster)

When you call pyodide.loadPackage(...), Pyodide fetches the required package artifacts (and dependencies) from the configured indexURL (CDN or your local /assets folder if you bundled it with Vite).

From there, two kinds of caching typically help:

Browser HTTP cache: the downloaded assets (e.g., .js, .data, .wasm, package files) are cached by the browser according to normal HTTP caching rules. This usually means the second page load is significantly faster than the first.
In-memory runtime state (per page load): within a single session (while the tab is open), once packages are loaded into the Pyodide runtime, they’re immediately available for subsequent Python calls, with no additional downloads needed.

If you want “offline-ish” behavior and long-lived caching guarantees, you can pair this with a Service Worker (e.g., via a PWA setup) to precache /assets or CDN resources. But even without that, normal browser caching already provides a nice speed-up after the first run.

Using numpy and matplotlib to visualize revenue data

Full source code for this demo is available at https://github.com/ilich/demo-pyodide. If you want to follow along with a working version, the repo includes the full Vite + React + TypeScript setup plus the Python script and hooks.

Now for the fun part: running real Python data processing in the browser and rendering a Matplotlib chart in a React component.

The goal of this section is:

accept raw CSV text from the UI
process it in Python using csv + numpy
generate a chart using matplotlib
return the chart as an SVG string
render the SVG inside React

Python: parse CSV, aggregate revenue, generate an SVG plot

We’ll keep the Python logic simple and self-contained. It reads CSV text, sums revenue per industry using numpy, and uses Matplotlib to generate a bar chart. Instead of writing a file, it returns the rendered chart as an SVG string:

import csv
import io
import numpy as np
import matplotlib.pyplot as plt


def plot_revenue_by_industry(csv_text: str) -> str:
    # Read data from CSV using csv reader
    data = []
    reader = csv.reader(io.StringIO(csv_text))
    _ = next(reader)  # Skip header

    for row in reader:
        company_name, industry, revenue = row
        data.append((company_name, industry, float(revenue)))

    # Load data to numpy
    np_data = np.array(data, dtype=object)

    industries = np_data[:, 1]
    revenues = np_data[:, 2].astype(float)

    # Sum revenue by industry
    unique_industries = np.unique(industries)

    industry_revenue_sum = []
    for ind in unique_industries:
        total = revenues[industries == ind].sum()
        industry_revenue_sum.append((ind, total))

    industry_revenue_sum = np.array(industry_revenue_sum, dtype=object)

    # Plot with bar chart using matplotlib
    plt.figure(figsize=(8, 5))
    plt.bar(industry_revenue_sum[:, 0], industry_revenue_sum[:, 1].astype(float))
    plt.xlabel("Industry")
    plt.ylabel("Total Revenue")
    plt.title("Total Revenue by Industry")
    plt.xticks(rotation=30)
    plt.tight_layout()

    # Save to SVG and return
    svg_io = io.StringIO()
    plt.savefig(svg_io, format='svg', bbox_inches='tight')
    svg_string = svg_io.getvalue()
    plt.close()

    return svg_string

Returning SVG makes the browser integration painless: the output is just a string that React can render directly.
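To make the expected input concrete, here is a small stdlib-only sketch of the aggregation step that runs in regular CPython, without Pyodide, numpy, or matplotlib. The column order (company name, industry, revenue) follows the function above; the header names and sample rows are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input; only the column order comes from the function above.
SAMPLE_CSV = """company_name,industry,revenue
Acme,Retail,100.5
Globex,Tech,200
Initech,Tech,50.25
"""


def revenue_by_industry(csv_text: str) -> dict[str, float]:
    """Sum the revenue column per industry, skipping the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # Skip header
    totals: dict[str, float] = defaultdict(float)
    for _company, industry, revenue in reader:
        totals[industry] += float(revenue)
    return dict(totals)


totals = revenue_by_industry(SAMPLE_CSV)
print(totals)
```

A check like this is handy for validating the parsing logic in a normal test runner before wiring the numpy and matplotlib version into the browser.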

React Hook: initialize Pyodide once and reuse it everywhere

Here’s the key: a usePyodide() hook that wraps initialization, package loading, and shared caching.

What it does

Uses a module-level singleton (pyodideInstance) so the runtime is created only once.
Uses a shared promise (pyodideLoadingPromise) so if multiple components mount at the same time, they don’t trigger multiple downloads.
Loads numpy and matplotlib exactly once.

import { useEffect, useState } from "react";
import { loadPyodide } from "pyodide";
import type { PyodideInterface } from "pyodide";

let pyodideInstance: PyodideInterface | null = null;
let pyodideLoadingPromise: Promise<PyodideInterface> | null = null;

async function getPyodide(): Promise<PyodideInterface> {
  if (pyodideInstance) {
    return pyodideInstance;
  }

  if (pyodideLoadingPromise) {
    return pyodideLoadingPromise;
  }

  pyodideLoadingPromise = loadPyodide({
    indexURL: "https://cdn.jsdelivr.net/pyodide/v0.29.3/full/",
  }).then(async (pyodide) => {
    await pyodide.loadPackage(["numpy", "matplotlib"]);
    pyodideInstance = pyodide;
    return pyodide;
  });

  return pyodideLoadingPromise;
}

interface UsePyodideResult {
  pyodide: PyodideInterface | null;
  loading: boolean;
  error: string | null;
}

export function usePyodide(): UsePyodideResult {
  const [pyodide, setPyodide] = useState<PyodideInterface | null>(
    pyodideInstance,
  );
  const [loading, setLoading] = useState(!pyodideInstance);
  const [error, setError] = useState<string | null>(null);

  useEffect(() => {
    if (pyodideInstance) {
      return;
    }

    let cancelled = false;

    getPyodide()
      .then((instance) => {
        if (!cancelled) {
          setPyodide(instance);
          setLoading(false);
        }
      })
      .catch((err) => {
        if (!cancelled) {
          setError(err.message || "Failed to load Pyodide");
          setLoading(false);
        }
      });

    return () => {
      cancelled = true;
    };
  }, []);

  return { pyodide, loading, error };
}

When pyodide.loadPackage(["numpy", "matplotlib"]) runs:

Pyodide downloads the package artifacts from the indexURL (in this case, the jsDelivr CDN).
Those files are cached client-side using normal browser HTTP caching rules, so repeat visits are typically much faster.
Within a single page session, once the packages are loaded into the Pyodide runtime, they’re available immediately for all later computations (no re-download).

This is why the singleton hook pattern matters: you get both runtime reuse (in-memory) and download reuse (browser cache).

React Component: run Python and render the returned SVG

The React component calls the Python function and renders the SVG returned by Matplotlib:

It waits for usePyodide() to finish loading
It runs your Python script (loaded as a raw string via Vite ?raw)
It calls plot_revenue_by_industry(csvData)
It injects the returned SVG into the DOM

import { useEffect, useState } from "react";
import { Loader } from "./Loader";
import { Alert } from "react-bootstrap";
import { usePyodide } from "../hooks/usePyodide";
import revenueScript from "../assets/revenue.py?raw";

interface RevenueChartProps {
  csvData: string | null;
}

export const RevenueChart = ({ csvData }: RevenueChartProps) => {
  const {
    pyodide,
    loading: pyodideLoading,
    error: pyodideError,
  } = usePyodide();

  const [calculating, setCalculating] = useState(false);
  const [calcError, setCalcError] = useState<string | null>(null);
  const [chartData, setChartData] = useState<string | null>(null);

  useEffect(() => {
    const calculateRevenue = async () => {
      if (!csvData || !pyodide) return;

      setCalculating(true);
      try {
        await pyodide.runPythonAsync(revenueScript);

        const plotRevenueByIndustry = pyodide.globals.get(
          "plot_revenue_by_industry",
        );

        const svg: string = plotRevenueByIndustry(csvData);
        setChartData(svg);
        setCalcError(null);
      } catch (error) {
        console.error("Error running Python code:", error);
        setChartData(null);
        setCalcError("Failed to run Python code");
      } finally {
        setCalculating(false);
      }
    };

    calculateRevenue();
  }, [csvData, pyodide]);

  if (pyodideLoading || calculating) {
    return <Loader visible />;
  }

  if (!csvData) {
    return <div>Please provide CSV data to analyze.</div>;
  }

  const error = pyodideError || calcError;
  if (error) {
    return <Alert variant="danger">Error: {error}</Alert>;
  }

  return (
    <div>
      <div dangerouslySetInnerHTML={{ __html: chartData || "" }} />
    </div>
  );
};

A note about dangerouslySetInnerHTML

Because Matplotlib returns raw SVG markup, the simplest rendering approach is injecting it as HTML. This is fine here because:

the SVG is generated by your own Python code
the input is structured CSV (still: treat user-provided CSV as untrusted in real apps)

If you ever render SVG/HTML generated from untrusted input, you should sanitize it first!

Conclusion

Pyodide makes it surprisingly practical to run Python directly inside a modern frontend app. With React + TypeScript + Vite, we can load a full Python runtime in the browser, install scientific packages like numpy and matplotlib, and use them to perform real computations and generate visualizations — all without a backend.

This pattern is not just a demo — it unlocks a lot of useful browser-native workflows:

Interactive data visualization dashboards: let users upload CSV files and instantly explore insights, plots, and analytics without sending data to a server.
Privacy-first / offline-first analytics tools: great for sensitive datasets (finance, healthcare, internal company reports) where you want computations to run entirely client-side.
Education and tutorials: build Python-powered playgrounds directly into web pages, such as data science lessons, statistics examples, and “try it” notebooks.
Scientific or engineering calculators: some organizations already have reliable Python implementations, and Pyodide lets you reuse those models directly in the browser.
Replacing backend microservices for lightweight compute: for certain tasks (format conversion, data cleaning, small statistical models), running Python in the client can remove infrastructure complexity.

If you want to explore further, the next natural improvements could be:

building a more generic “Python runner” abstraction (not tied to revenue charts)
adding progress events while packages load
caching computed results
using micropip to install additional pure-Python packages dynamically

And if you want to see the full working code, you can find it here: https://github.com/ilich/demo-pyodide


Why Your ECS Service isn't Routing Traffic: Lessons from a Target Group Registration Issue

Ilya Verbitskiy — Sat, 07 Jun 2025 00:00:00 GMT

Introduction

Recently, I ran into an interesting issue while deploying a new container to an Amazon ECS service using our CI/CD pipeline. Everything looked great on the surface—the pipeline executed successfully, the ECS service picked up the new task definition, and the container was reported as healthy.

But despite all signs pointing to a successful deployment, something was off: our application wasn't receiving any traffic.

This blog post walks through the symptoms, root cause, and resolution—focusing on the crucial step of registering a target group with an ECS service.

Problem Recap

Here's what happened step-by-step:

Our CI/CD pipeline deployed a new container image to ECS using a Fargate service.
ECS successfully launched a new task using the updated task definition.
The task entered a RUNNING state and passed its health checks.
However, the application behind the ALB (Application Load Balancer) wasn't responding.

There were no errors in the logs and no failed tasks. But when I checked the ALB Target Group, I noticed something critical: the target group health checks were failing.

Detective Work: Identifying the Root Cause

When I realized our ECS deployment wasn't routing traffic, my first thought was to confirm the service and task statuses. Using the AWS CLI, I began a step-by-step investigation.

Step 1: Verify ECS Service Health

I listed all ECS services in the cluster to confirm that my target service, Report-Service, was active:

aws ecs list-services \
  --cluster EcsClusterAAAAAAAA-000000000000

{
  "serviceArns": [
    "arn:aws:ecs:us-east-1:000000000000:service/EcsClusterAAAAAAAA-000000000000/Report-Service"
  ]
}

✅ Result: The service Report-Service was listed, confirming it exists and is managed by the cluster.

Step 2: Check Task Status

Next, I checked if the service had launched any tasks:

aws ecs list-tasks \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --service-name Report-Service

✅ Result: A task was running as expected:

{
  "taskArns": [
    "arn:aws:ecs:us-east-1:000000000000:task/EcsClusterAAAAAAAA-000000000000/cdb13d4a4fa2439584114ad90d5aab9a"
  ]
}

To be thorough, I described the task to confirm its runtime status:

aws ecs describe-tasks \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --tasks "arn:aws:ecs:us-east-1:000000000000:task/EcsClusterAAAAAAAA-000000000000/cdb13d4a4fa2439584114ad90d5aab9a" \
  --query "tasks[*].[taskArn, lastStatus]"

✅ Result: Status was RUNNING — so far, so good:

[
  [
    "arn:aws:ecs:us-east-1:000000000000:task/EcsClusterAAAAAAAA-000000000000/cdb13d4a4fa2439584114ad90d5aab9a",
    "RUNNING"
  ]
]

Step 3: Check ALB Target Group Health

Now it was time to look at the target group attached to the Application Load Balancer (ALB). First, I listed all target groups:

aws elbv2 describe-target-groups \
  --query 'TargetGroups[*].[TargetGroupArn, TargetGroupName]'

I located the target group named reports-service, which was expected to be associated with the Report-Service:

[
  [
    "arn:aws:elasticloadbalancing:us-east-1:000000000000:targetgroup/reports-service/e4869714669ce12f",
    "reports-service"
  ]
]

Then, I checked the target health status:

aws elbv2 describe-target-health \
  --target-group-arn "arn:aws:elasticloadbalancing:us-east-1:000000000000:targetgroup/reports-service/e4869714669ce12f"

❌ Result: The target was reported as unhealthy with reason Target.Timeout. The task was not responding to health checks, which usually means the ALB can't reach the container on the expected port:

{
  "TargetHealthDescriptions": [
    {
      "Target": {
        "Id": "10.1.28.89",
        "Port": 8080,
        "AvailabilityZone": "us-east-1a"
      },
      "HealthCheckPort": "8080",
      "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.Timeout",
        "Description": "Request timed out"
      },
      "AdministrativeOverride": {
        "State": "no_override",
        "Reason": "AdministrativeOverride.NoOverride",
        "Description": "No override is currently active on target"
      }
    }
  ]
}
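When a target group has many targets, reading this JSON by eye gets tedious. The following is a minimal Python sketch that pulls out every target that is not healthy. The unhealthy_targets helper is hypothetical, and the embedded response is a trimmed copy of the output above. Note that it flags every state other than healthy, not only unhealthy.

```python
import json

# Trimmed copy of the describe-target-health output shown above.
RESPONSE = json.loads("""
{
  "TargetHealthDescriptions": [
    {
      "Target": {"Id": "10.1.28.89", "Port": 8080, "AvailabilityZone": "us-east-1a"},
      "HealthCheckPort": "8080",
      "TargetHealth": {
        "State": "unhealthy",
        "Reason": "Target.Timeout",
        "Description": "Request timed out"
      }
    }
  ]
}
""")


def unhealthy_targets(response: dict) -> list[tuple[str, int, str]]:
    """Return (id, port, reason) for every target whose state is not 'healthy'."""
    found = []
    for item in response.get("TargetHealthDescriptions", []):
        health = item["TargetHealth"]
        if health["State"] != "healthy":
            target = item["Target"]
            found.append((target["Id"], target["Port"], health.get("Reason", "")))
    return found


problems = unhealthy_targets(RESPONSE)
print(problems)
```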

Step 4: Is the Target Group Even Registered with ECS?

At this point, I suspected a misconfiguration—maybe the service wasn't even hooked up to the target group. To confirm, I ran:

aws ecs describe-services \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --services Report-Service \
  --query "services[0].loadBalancers"

❌ Result: [] (an empty array)

That was the smoking gun.

Despite having a healthy ECS task and a defined target group, the ECS service wasn't actually registered with the target group at all. As a result, the task wasn't formally added as a target behind the load balancer.
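A check like this is easy to automate as a post-deployment guard. Below is a minimal Python sketch that inspects the JSON returned by aws ecs describe-services (run without the --query filter) and reports whether the service has any load balancer attached. The has_load_balancer helper and the sample payload are hypothetical; only the serviceName and loadBalancers fields are relied on.

```python
import json


def has_load_balancer(describe_services_output: str, service_name: str) -> bool:
    """Return True if the named service has at least one loadBalancers entry."""
    data = json.loads(describe_services_output)
    for service in data.get("services", []):
        if service.get("serviceName") == service_name:
            return len(service.get("loadBalancers", [])) > 0
    raise LookupError(f"service {service_name!r} not found in output")


# Hypothetical, heavily trimmed describe-services output for illustration.
sample = '{"services": [{"serviceName": "Report-Service", "loadBalancers": []}]}'
attached = has_load_balancer(sample, "Report-Service")
print(attached)
```

In a pipeline, a False result here could fail the deployment step instead of letting it report success.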

Fixing the Issue: Registering the ECS Service with a Target Group

Once I realized that the ECS service wasn't associated with any load balancer target group, I knew exactly what had to be done: manually register the service with the correct target group. Here's how I did it, step by step.

Step 1: Identify the Container Name

aws ecs describe-tasks \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --tasks "arn:aws:ecs:us-east-1:000000000000:task/EcsClusterAAAAAAAA-000000000000/cdb13d4a4fa2439584114ad90d5aab9a" \
  --query "tasks[*].containers[*].[name]"

✅ Result:

[[["report-service"]]]

The container name is report-service.

Step 2: Determine the Container Port

Since the ECS service is running on Fargate, the container port (used by the ALB) must be retrieved from the task definition.

First, get the task definition ARN:

aws ecs describe-tasks \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --tasks "arn:aws:ecs:us-east-1:000000000000:task/EcsClusterAAAAAAAA-000000000000/cdb13d4a4fa2439584114ad90d5aab9a" \
  --query "tasks[*].taskDefinitionArn"

✅ Result:

["arn:aws:ecs:us-east-1:000000000000:task-definition/Report-Service-Task:5"]

Now, describe the task definition:

aws ecs describe-task-definition \
  --task-definition "arn:aws:ecs:us-east-1:000000000000:task-definition/Report-Service-Task:5" \
  --query "taskDefinition.containerDefinitions[*].[name, portMappings[*].containerPort]"

✅ Result:

[["report-service", [8080]]]

So the container listens on port 8080.

Step 3: Register the Target Group with the ECS Service

With the targetGroupArn, containerName, and containerPort in hand, I was ready to associate the target group with the ECS service:

aws ecs update-service \
  --cluster EcsClusterAAAAAAAA-000000000000 \
  --service Report-Service \
  --load-balancers "[
    {\"targetGroupArn\": \"arn:aws:elasticloadbalancing:us-east-1:000000000000:targetgroup/reports-service/e4869714669ce12f\", \"containerName\": \"report-service\", \"containerPort\": 8080}
  ]"

⚠️ Tip: Be very careful with JSON formatting in the --load-balancers argument. Even minor formatting issues—like a stray space before the ARN—can result in cryptic errors such as:

An error occurred (InvalidParameterException) when calling the UpdateService operation:
Unable to assume role and validate the specified targetGroupArn.
Please verify that the ECS service role being passed has the proper permissions.

This error isn't always clear, so make sure:

The ARN is exact (no extra spaces).
The IAM service role has permissions to register targets with the load balancer.
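One way to avoid hand-escaping mistakes is to generate the --load-balancers value instead of typing it. The sketch below is a hypothetical Python helper, not part of the AWS CLI: it strips stray whitespace from the inputs, serializes the entry with json.dumps, and quotes the result for a POSIX shell.

```python
import json
import shlex


def load_balancers_arg(target_group_arn: str, container_name: str, container_port: int) -> str:
    """Serialize the --load-balancers value as compact JSON, stripping stray whitespace."""
    entry = {
        "targetGroupArn": target_group_arn.strip(),
        "containerName": container_name.strip(),
        "containerPort": int(container_port),
    }
    return json.dumps([entry], separators=(",", ":"))


# The leading space in the ARN is deliberate: it is removed by strip().
arg = load_balancers_arg(
    " arn:aws:elasticloadbalancing:us-east-1:000000000000:targetgroup/reports-service/e4869714669ce12f",
    "report-service",
    8080,
)

# shlex.quote makes the JSON safe to paste into a POSIX shell command line.
print(
    "aws ecs update-service --cluster EcsClusterAAAAAAAA-000000000000 "
    "--service Report-Service --load-balancers " + shlex.quote(arg)
)
```

This only addresses the formatting half of the problem. IAM permission issues still have to be checked separately.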

Step 4: Confirm the Fix

Once the service was updated, I re-checked the target group health:

aws elbv2 describe-target-health \
  --target-group-arn "arn:aws:elasticloadbalancing:us-east-1:000000000000:targetgroup/reports-service/e4869714669ce12f"

This time, the target transitioned from unhealthy to healthy, and traffic started flowing as expected.

This debugging experience was a great reminder that a successful ECS deployment doesn't always mean a functional application. Even when tasks are running and marked healthy, traffic won't flow unless the ECS service is correctly registered with a target group.

The key takeaways are:

Always verify that your ECS service includes a loadBalancers configuration when using an ALB.
Ensure the correct containerName and containerPort are specified based on the task definition.
Check target group health status directly—ECS won't warn you if your service isn't properly attached.
Be meticulous with AWS CLI JSON formatting—small errors can trigger misleading messages.

Being thorough with these checks can save hours of head-scratching and get your applications routing traffic as expected.
