# Sovereign AI and Indian legal tech: why data residency matters

**TL;DR:** Sovereign AI means keeping the data, models, and compute behind an AI system under domestic control. For law, that question is sharper than for most industries, because the inputs are privileged client communications, confidential case files, and court records. India now has the pieces to make a domestic legal-AI stack credible: the [DPDP Act 2023](https://www.dpdpa.com/dpdpa2023/chapter-4/section16.html) with its 2025 Rules notified in November 2025, the IndiaAI Mission with more than 34,000 GPUs of common compute and indigenous foundation models from [Sarvam AI](https://www.sarvam.ai/blogs/indias-sovereign-llm), BharatGen and Gnani.ai unveiled in February 2026, and a string of court orders warning lawyers off general-purpose chatbots. This piece walks through what sovereign AI means for legal work, why legal data is uniquely sensitive, where Indian law on cross-border transfer actually stands, and why an India-hosted, India-law-trained stack is becoming the default expectation rather than a nice-to-have.

---

## On this page

- [What sovereign AI actually means](#what-sovereign-ai-actually-means)
- [Why legal data is uniquely sensitive](#why-legal-data-is-uniquely-sensitive)
- [The lawyer's real worry: feeding privileged data to a foreign model](#the-lawyers-real-worry)
- [Where Indian law on cross-border transfer stands](#where-indian-law-stands)
- [The IndiaAI Mission and indigenous foundation models](#the-indiaai-mission)
- [What data residency means for a legal SaaS product](#data-residency-for-legal-saas)
- [Why an India-law-trained stack matters, not just India-hosted](#india-law-trained-stack)
- [The courts have already drawn lines](#courts-have-drawn-lines)
- [The global picture: EU, US, and the localisation wave](#the-global-picture)
- [What a privileged file leaking abroad actually costs](#what-a-leak-costs)
- [Indian languages, Indian context, and why grounding wins](#indian-languages-and-grounding)
- [Adopting sovereign AI without breaking your practice](#adopting-without-breaking)
- [What a sovereign legal-AI stack looks like in practice](#sovereign-stack-in-practice)
- [A buyer's checklist for law firms](#buyers-checklist)
- [Where Niyam fits](#where-niyam-fits)
- [Frequently asked questions](#frequently-asked-questions)
- [Start for ₹100](#start-for-100)

---

## What sovereign AI actually means

Sovereign AI is a strategy, not a product feature. The idea is that the data an AI system reads, the model that processes it, and the compute it runs on all stay under the control of a single jurisdiction rather than scattered across servers and companies in other countries. As one industry overview puts it, sovereign AI keeps "AI data, models, and computing resources under national or regional control," which often means running separate infrastructure for each jurisdiction you operate in ([Mind Foundry](https://www.mindfoundry.ai/blog/ai-regulations-around-the-world)).

For a long time the phrase sounded like a slogan for government procurement decks. That has changed. India has spent 2024 and 2025 building the actual machinery: a national compute pool, a funded foundation-model programme, and a data-protection law with cross-border controls. The pieces now exist, which is why the conversation has moved from "should we care about sovereignty" to "what does a sovereign stack need to include."

It helps to separate three layers, because a vendor can be sovereign at one and not the others.

| Layer | What sovereignty means here | Common gap |
|-------|------------------------------|------------|
| Data | Where the inputs, outputs, and logs are stored and who can access them | Vendor hosts in India but routes inference to a US API |
| Model | Who built and controls the weights, and where they run | App is "Indian" but wraps a foreign closed model |
| Compute | Whose hardware runs the maths, in which country | GPU calls leave the country even if the front end does not |

A product can claim to be Indian and still send every legal query to a model hosted abroad. So sovereignty is best read as a question you ask layer by layer, not a badge a vendor wears.

## Why legal data is uniquely sensitive

Most data-protection debate is about personal data in the abstract: names, emails, location, spending. Legal data carries all of that plus three properties that raise the stakes.

First, **privilege**. Communications between a lawyer and a client for the purpose of legal advice are protected from disclosure. In India that protection sits in Sections 126 to 129 of the old Evidence Act and now in the corresponding provisions of the Bharatiya Sakshya Adhiniyam, 2023. Privilege is fragile. It can be waived by disclosure to a third party. The open question that lawyers keep raising is whether pasting a privileged document into a third-party AI service counts as that kind of disclosure. As [Artificial Lawyer](https://www.artificiallawyer.com/2025/07/28/chatgpt-has-no-legal-privilege-is-this-a-problem/) noted in mid-2025, a general chatbot "does not have legal privilege or confidentiality protections" the way a law firm does.

Second, **confidentiality as a professional duty**. Even where privilege does not strictly apply, an advocate owes a duty of confidence to the client. That duty does not pause because a tool is convenient.

Third, **court records and personal data of third parties**. A single case file can contain medical records, financial statements, identities of minors, witness details, and more. Each of those is personal data of someone who never agreed to be processed by an AI vendor.

Put together, a law firm's working set is one of the most sensitive data categories in the economy, and it is exactly the data a legal-AI tool needs to read to be useful. That tension is the whole reason data residency has become a board-level question for legal technology.

## The lawyer's real worry: feeding privileged data to a foreign model

Ask practising lawyers what stops them adopting AI and the answer is rarely accuracy alone. It is the discomfort of typing a client's confidential facts into a box and not knowing where those words travel.

The concern is well founded. When you paste a confidential document into a public chatbot, you are potentially handing it to the operator and, depending on settings, contributing to model training. The [San Francisco Bar Association](https://www.sfbar.org/blog/heads-up-new-chatgpt-privacy-concerns-for-lawyers-and-legal-staff/) and [LeanLaw](https://www.leanlaw.co/blog/what-are-the-data-privacy-implications-of-using-ai-tools-with-confidential-client-information/) have both flagged that inputting client material into public AI tools is a serious data-security risk. The American Bar Association has published guidance on the intersection of attorney-client privilege and generative AI for the same reason.

In India the worry sharpened after the DPDP Act passed in 2023, because for the first time there was a statutory framework around how personal data, including the personal data buried in case files, can be processed and transferred. The [iPleaders 2026 guide](https://blog.ipleaders.in/ai-tools-for-lawyers-in-india-the-2026-definitive-guide/) records that questions about feeding privileged client material into a foreign chatbot "had begun to surface following the enactment of India's Digital Personal Data Protection Act, 2023."

There is also a practical accuracy problem layered on top. General models are unreliable on Indian case law and statutory section numbers, which they frequently invent. So the lawyer faces two risks at once: the confidentiality risk of where the data goes, and the integrity risk of what comes back. A sovereign, India-law-trained stack is an attempt to reduce both at the same time.

## Where Indian law on cross-border transfer stands

This is where a lot of marketing copy gets the law wrong, so it is worth being precise.

The DPDP Act, 2023 governs cross-border transfer mainly through **Section 16**, which says personal data may be transferred to any country except those the Central Government restricts. The [DPDP Rules, 2025](https://en.wikipedia.org/wiki/Digital_Personal_Data_Protection_Rules,_2025) were notified on 13 November 2025 via Gazette G.S.R. 846(E), and **Rule 15** operationalises this.

The model India chose is a **negative list**, not a whitelist. As [MediaNama](https://www.medianama.com/2025/11/223-dpdp-rules-cross-border-data-transfers/) and the [K&S analysis](https://ksandk.com/data-protection-and-data-privacy/indias-new-cross-border-data-transfer-framework/) explain, transfers are permitted to any jurisdiction by default unless the government expressly prohibits that destination. This is the opposite of the EU's adequacy approach, where transfers are blocked until a country is approved.

So at the level of the DPDP Act, India does **not** impose blanket data localisation on legal data. That is the honest position, and any vendor claiming the DPDP Act forces all legal data to stay in India is overstating it.

But two things complicate the picture, and both push towards in-country hosting in practice.

| Factor | Effect |
|--------|--------|
| Sectoral rules survive the DPDP Act | Section 16(2) preserves stricter transfer restrictions under other laws. The RBI's payment data localisation, for example, prevails over the DPDP Act, per [BusinessToday](https://www.businesstoday.in/opinion/columns/story/indias-new-data-protection-law-heres-how-the-landmark-law-can-impact-the-countrys-financial-services-sector-395368-2023-08-24) and [Cyril Amarchand](https://corporate.cyrilamarchandblogs.com/2024/05/need-for-syncing-sectoral-regulations-with-data-protection-law/) |
| Significant Data Fiduciaries face higher duties | Data fiduciaries that the government notifies as significant carry extra obligations, so a firm handling large volumes of sensitive client data may prefer in-country processing to simplify compliance |

The net effect is this. The law does not strictly mandate that every legal query stay in India. But the combination of professional-conduct duties, surviving sectoral rules, client expectation, and the simple desire to never have to argue about where a privileged document went makes in-country hosting the path of least resistance for serious legal-AI deployment. Sovereignty here is driven as much by risk posture as by black-letter law.
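
To make the difference between a negative list and an approval-first model concrete, here is a minimal sketch. The function names, the placeholder country codes, and the `sectoral_rule_blocks` flag are all illustrative assumptions; the sketch does not reflect any actual notified list of restricted countries.

```python
def transfer_allowed(destination: str,
                     restricted_countries: frozenset,
                     sectoral_rule_blocks: bool = False) -> bool:
    """Negative-list model: a transfer is allowed unless the destination is
    on the restricted list, or a stricter sectoral rule bars it."""
    if sectoral_rule_blocks:
        # A stricter rule under another law prevails over the default.
        return False
    return destination not in restricted_countries


def transfer_allowed_approval_first(destination: str,
                                    approved_countries: frozenset) -> bool:
    """Approval-first model, for contrast: blocked unless the destination
    has been approved."""
    return destination in approved_countries


# Placeholder codes only; "XX", "YY" and "ZZ" are not real jurisdictions.
restricted = frozenset({"YY"})
approved = frozenset({"ZZ"})

print(transfer_allowed("XX", restricted))                             # default is open
print(transfer_allowed("YY", restricted))                             # listed, so blocked
print(transfer_allowed("XX", restricted, sectoral_rule_blocks=True))  # sectoral rule wins
print(transfer_allowed_approval_first("XX", approved))                # not approved, so blocked
```

The same unlisted destination is open under the first function and closed under the second, which is the whole practical difference between the two regimes.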

For more on the rules themselves, see our explainers on [the DPDP Rules 2025](/blog/dpdp-rules-2025) and [the consent manager framework](/blog/dpdp-consent-manager-framework).

## The IndiaAI Mission and indigenous foundation models

A sovereign legal stack needs more than a hosting choice. It needs Indian models and Indian compute to run them on. For years that was the missing piece. It is no longer.

The **IndiaAI Mission**, a roughly ₹10,300 crore national initiative led by MeitY and approved in March 2024, was built to lay down exactly this base layer ([RICE IAS](https://riceias.com/indiaai-mission-building-inclusive-and-sovereign-artificial-intelligence-in-india/)). Two outputs matter most for legal tech.

**Compute.** India's common compute capacity crossed 34,000 GPUs by mid-2025, according to a [Press Information Bureau release](https://www.pib.gov.in/PressReleasePage.aspx?PRID=2132817&reg=3&lang=2), with subsidised access reported around ₹65 per GPU hour. That brings the cost of training and serving Indian models within reach of startups rather than only the largest firms.

**Foundation models.** This is the headline shift. Under the Mission, the government selected startups to build indigenous foundation models trained on India-specific data. At the **India AI Impact Summit in February 2026**, three were unveiled, built by **Sarvam AI, BharatGen and Gnani.ai** ([Free Press Journal](https://www.freepressjournal.in/tech/ai-summit-2026-meet-the-3-sovereign-ai-llm-models-that-were-unveiled-in-delhi-to-rival-global-tech-giants), [Organiser](https://organiser.org/2026/02/19/340755/bharat/bharat-unveils-indigenous-ai-powerhouse-sarvam-gnani-ai-and-bharatgen-lead-the-sovereign-ai-revolution/)).

Sarvam's own announcement and reporting around it describe two models launched on 18 February 2026: a 30-billion-parameter model named Vikram with a 32,000-token context window for real-time conversation, and a larger 105-billion-parameter model with a 128,000-token window for heavier reasoning. Both are reported as trained from scratch in India on domestic compute, not fine-tuned wrappers over foreign weights, with support across all 22 scheduled Indian languages ([Sarvam AI](https://www.sarvam.ai/blogs/indias-sovereign-llm), [Sify](https://www.sify.com/ai-analytics/all-about-indias-indigenous-ai-llm-models-can-they-help-tackle-bias/)).

Note the naming. There is a lot of loose talk about "BharatGPT," but the verified mission-backed effort is **BharatGen**, not a product called BharatGPT. Where you see BharatGPT in marketing, treat it with care; the documented sovereign initiatives are Sarvam, BharatGen and Gnani.ai.

Why does this matter for law? Because a foundation model trained on Indian-language data and Indian context, and run on Indian compute, removes the single biggest objection to sovereign legal AI, which was that there were no credible domestic models to build on. That objection is now weaker than it has ever been.

## What data residency means for a legal SaaS product

"Data residency" gets used loosely. For a legal SaaS tool it has a specific, checkable meaning across the lifecycle of a single query.

| Stage | The residency question |
|-------|------------------------|
| Upload | When a lawyer uploads a brief, where does the file land? |
| Storage | Where do the document, the vector index, and metadata live? |
| Inference | When the model reads the query, on whose servers and in which country does that run? |
| Logging | Are prompts and outputs logged, where, and for how long? |
| Training | Is the firm's data ever used to improve a shared model? |
| Sub-processors | Which third parties touch the data, and are they in India? |

A product can pass the first two and fail the third. Plenty of "Indian" legal tools store documents in an Indian bucket but call a foreign model API for the actual answer, which means the privileged text crosses the border at the exact moment it matters most. Inference residency is the stage that is easiest to gloss over and most important to verify.

The cleanest residency posture for legal work is: upload, storage, index, inference, and logging all in India; no training on customer data; and a published list of sub-processors with their locations. Anything short of that leaves a gap a careful general counsel will find.
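
The stage-by-stage questions above can be turned into a simple audit. This is a rough sketch rather than a compliance tool: `VendorProfile`, its field names, and the example vendor are hypothetical, and the answers would in practice come from a vendor's contract and sub-processor disclosures.

```python
from dataclasses import dataclass

# Lifecycle stages that have a physical location, taken from the table above.
STAGES = ("upload", "storage", "inference", "logging")


@dataclass(frozen=True)
class VendorProfile:
    """Hypothetical self-reported vendor facts; field names are illustrative."""
    stage_country: dict            # stage name -> country code, e.g. {"upload": "IN"}
    trains_on_customer_data: bool
    subprocessors_published: bool
    subprocessor_countries: dict   # sub-processor name -> country code


def residency_gaps(vendor: VendorProfile, home: str = "IN") -> list:
    """Return human-readable gaps; an empty list means none were found."""
    gaps = []
    for stage in STAGES:
        country = vendor.stage_country.get(stage)
        if country is None:
            gaps.append(f"{stage}: location not disclosed")
        elif country != home:
            gaps.append(f"{stage}: runs in {country}, not {home}")
    if vendor.trains_on_customer_data:
        gaps.append("training: customer data is used to improve a shared model")
    if not vendor.subprocessors_published:
        gaps.append("sub-processors: no published list")
    for name, country in vendor.subprocessor_countries.items():
        if country != home:
            gaps.append(f"sub-processor {name}: located in {country}")
    return gaps


# Example: documents stored in India, but inference routed to a foreign API.
vendor = VendorProfile(
    stage_country={"upload": "IN", "storage": "IN", "inference": "US", "logging": "IN"},
    trains_on_customer_data=False,
    subprocessors_published=True,
    subprocessor_countries={"ExampleModelAPI": "US"},
)
for gap in residency_gaps(vendor):
    print(gap)
```

The example vendor passes upload and storage and still fails at inference, which is the pattern described above.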

## Why an India-law-trained stack matters, not just India-hosted

Hosting in India answers the confidentiality question. It does not, on its own, answer the accuracy question. And in legal work, accuracy is not a nice-to-have. A wrong section number or an invented case can cost a client and embarrass an advocate.

This is where training data does the heavy lifting. A model trained mostly on foreign legal text and general web data will be fluent and confident about Indian law while being wrong about it. It will guess at section numbers, misattribute ratios, and, worst of all, fabricate citations that look perfectly formatted. The cure is not a cleverer general model. It is grounding answers in the actual corpus of Indian judgments and statutes, so the tool retrieves real text rather than predicting plausible text.

So a genuinely sovereign legal stack has two halves that have to work together.

- **Residency** keeps the privileged input in India and out of foreign training pipelines.
- **Grounding in Indian law** keeps the output tied to real judgments and real statutory text, which is the only durable defence against the hallucinated-citation problem the courts are now sanctioning.

You need both. India-hosted but ungrounded gives you a confidential way to get a wrong citation. Grounded but foreign-hosted gives you an accurate answer that leaked a privileged file. The point of sovereignty for law is to refuse that trade-off.

If you want the practical side of reading and trusting a judgment, our guides on [how to read a judgment](/blog/how-to-read-a-judgment) and [checking whether a case is still good law](/blog/good-law-checking) cover the verification habits that pair with grounded tools.

## The courts have already drawn lines

Indian courts have not waited for a perfect regulatory framework. They have started ruling, and the direction is unmistakable.

**Kerala High Court, July 2025.** The Kerala High Court issued the first formally binding AI policy from any Indian court, the *Policy Regarding Use of Artificial Intelligence Tools in District Judiciary*. It prohibits judges and staff from using general generative tools like ChatGPT and DeepSeek, warning that uploading facts, personal identifiers, or privileged communications to cloud-based AI "may result in a serious flouting of confidentiality." It also bars using AI to arrive at findings or judgments and requires an audit trail of any AI use ([MediaNama](https://www.medianama.com/2025/07/223-kerala-hc-bars-ai-tools-chatgpt-deepseek/), [policy PDF](https://images.assettype.com/theleaflet/2025-07-22/mt4bw6n7/Kerala_HC_AI_Guidelines.pdf)).

**Fabricated citations reach the highest court.** In December 2025, the Supreme Court was presented with a set of fabricated precedents in a corporate dispute; on examination, none of the "hundreds" of cited cases existed ([Bharatlaw](https://www.bharatlaw.ai/post/ai-generated-fake-case-law-supreme-court-india)). The Bombay High Court imposed costs of ₹50,000 on a party for inserting fake case law into submissions in January 2026, and the Delhi High Court saw a petition withdrawn in September 2025 after fabricated citations were exposed ([LiveLaw](https://www.livelaw.in/articles/phantom-precedents-ai-generated-case-law-indian-courts-526665)).

**The Supreme Court calls it misconduct.** A Bench of Justice P.S. Narasimha and Justice Alok Aradhe held that relying on non-existent, fake judgments would amount to "misconduct" with consequences, and the Court asked the Bar Council of India to form an expert committee on AI in litigation ([MediaNama](https://www.medianama.com/2026/03/223-supreme-court-ai-fake-case-laws-misconduct/), [Indian Masterminds](https://indianmasterminds.com/news/judiciary/ai-misuse-in-indian-courts-supreme-court-bci-fake-judgments-202430/)).

**The judiciary's own tools are deliberately assistive.** The Supreme Court's eCommittee runs SUPACE for research assistance and SUVAS for translating judgments into Indian languages. Both are framed as assistive only; SUPACE "does not make decisions," and SUVAS translations are reviewed within the judicial framework ([PIB](https://www.pib.gov.in/PressReleasePage.aspx?PRID=2226283&reg=48&lang=2)). The contrast is instructive. Even the courts that build AI treat it as a helper kept on a short leash, with humans owning every finding.

Read together, these orders say two things to the profession. Confidentiality is non-negotiable, so do not pour privileged material into tools that offer no protection. And accuracy is the advocate's responsibility, so a tool that grounds its answers in real, verifiable judgments is not a luxury but a way of meeting an existing duty.

## The global picture: EU, US, and the localisation wave

India's choices make more sense against the wider 2025-26 backdrop, where sovereignty has moved to the centre of AI policy almost everywhere.

The **EU AI Act** entered into force on 1 August 2024 and applies in phases: obligations for general-purpose AI models began on 2 August 2025, and most high-risk obligations apply from 2 August 2026. Several legal-adjacent uses fall into its high-risk tier, which brings documentation, oversight, and quality duties ([Mind Foundry](https://www.mindfoundry.ai/blog/ai-regulations-around-the-world), [Morgan Lewis](https://www.morganlewis.com/pubs/2025/12/the-new-rules-of-ai-a-global-legal-overview)).

In the **United States**, Executive Order 14179 in January 2025 revoked the earlier 2023 order and shifted towards removing regulatory barriers, leaning on voluntary commitments and sector guidance rather than a single federal statute ([Anecdotes](https://www.anecdotes.ai/learn/ai-regulations-in-2025-us-eu-uk-japan-china-and-more)).

The bigger structural signal is **data localisation as a worldwide trend**. By late 2025, at least 34 countries had enacted or strengthened requirements restricting where AI processing can occur, and the collapse of the EU-US Data Privacy Framework left a gap in lawful EU-to-US transfer ([AI Magicx](https://www.aimagicx.com/blog/ai-data-sovereignty-cloud-strategy-legal-risks-2026)).

India sits between the EU's strict, approval-first model and the US's lighter touch. Its negative-list approach keeps cross-border flows open by default while reserving the right to clamp down, and it pairs that with a serious domestic compute and model push. For legal buyers, the practical takeaway travels across all three regimes: the safest posture is to keep the most sensitive data, privileged legal material, processed close to home.

The collapse of the EU-US Data Privacy Framework is worth dwelling on, because it is a cautionary tale rather than a foreign curiosity. Firms that had built their workflows on the assumption that data could move freely between Europe and US-based cloud AI suddenly found that the legal mechanism underneath them had gone, through no fault of their own. The lesson generalises. A cross-border transfer route that exists today can be invalidated by a court or restricted by a government tomorrow, and any practice that depends on data leaving the country is exposed to that risk without controlling it. In-country processing is, among other things, a hedge against the instability of international transfer law. The data that never crosses a border cannot be caught by the next ruling that closes one. For a profession where a single sensitive matter can run for years, that durability is not a small consideration; it is the difference between a stack you can rely on for the life of a case and one whose legal foundation can shift under you mid-matter.

| Jurisdiction | Cross-border stance for sensitive data |
|--------------|----------------------------------------|
| India (DPDP) | Open by default, negative list; sectoral rules can be stricter |
| EU (GDPR + AI Act) | Approval-first adequacy; legal uses may be high-risk |
| US | Lighter federal touch; state and sector rules vary |

## What a privileged file leaking abroad actually costs

It is easy to treat "data residency" as a compliance abstraction. For a lawyer it is not abstract at all. Walk through what actually happens when a privileged file ends up on a foreign server it was never meant to reach, and the stakes become concrete.

Start with the matter itself. Suppose a partner pastes a draft settlement note into a public chatbot to tidy the language. The note contains the client's true walk-away number and the weakness in its own case that justifies settling. Two things can go wrong at once. The text may be retained by the operator and, depending on the account settings, used to improve the model, which means the firm has lost control of where that sentence travels. And the act of disclosing privileged material to a third party can put the privilege itself at risk, because privilege can be waived by voluntary disclosure. The firm has not just taken a data-security risk; it may have handed an opponent an argument that the protection over that document is gone.

Now widen the lens to the third parties whose data sits inside the file. A family-law brief carries the private facts of a marriage and often a child. A criminal matter carries the identity of a complainant or a witness. A corporate dispute carries unpublished price-sensitive information. None of those people consented to be processed by an AI vendor in another country, and several categories of that data are exactly what sectoral regulators care most about. A leak here is not one breach but many, stacked inside a single document.

Then there is the question of proof and discovery. If a dispute later turns on whether confidential information was mishandled, a firm that can show every byte stayed on Indian infrastructure, with logs and access controls it can produce, is in a very different position from a firm that has to explain why a privileged file appears in a foreign vendor's retention system. Residency is not only a control that prevents the bad outcome; it is the evidence that lets you demonstrate you behaved carefully if anyone ever asks.

| Failure mode | Who is harmed | Why residency reduces it |
|--------------|---------------|--------------------------|
| Privilege waiver by disclosure | The client, directly | Privileged text never leaves Indian jurisdiction or the firm's control |
| Third-party personal data exposure | Witnesses, family members, counterparties | Personal data in the file is not shipped to an unknown processor abroad |
| Sectoral data sent offshore | The firm and its regulated client | In-country processing keeps RBI or SEBI-bound data home by default |
| Inability to prove careful handling | The firm's own defence | In-country logs and access records are producible evidence of care |

The pattern across all four rows is the same. The cost of a leak in legal work is rarely a fine alone. It is the loss of the very thing the lawyer was hired to protect, and it lands at the worst possible moment, in the middle of a live matter. That asymmetry (a large downside, and no upside to the data leaving) is why so many cautious firms have concluded that the residency question is not worth getting wrong even once.

## Indian languages, Indian context, and why grounding wins

There is a quieter argument for a sovereign, India-trained stack that has nothing to do with confidentiality and everything to do with whether the tool is actually any good at Indian law.

Indian legal work does not happen only in English. Pleadings, FIRs, lower-court orders, land records, and client instructions arrive in a mix of languages and scripts, and often in code-mixed forms like Hinglish that general models handle poorly. The indigenous foundation models now coming out of the IndiaAI Mission were built precisely for this. Sarvam's models are reported to support all 22 scheduled Indian languages and to have been trained on diverse domestic material including literature, financial records, newspapers, archival content, and mixed-language text ([Sarvam AI](https://www.sarvam.ai/blogs/indias-sovereign-llm)). The judiciary's own translation tool, SUVAS, exists because the demand for moving legal text across Indian languages is real and large; it had translated tens of thousands of Supreme Court judgments into Hindi and other regional languages well before the current model wave ([PIB](https://www.pib.gov.in/PressReleasePage.aspx?PRID=2226283&reg=48&lang=2)).

But language coverage alone does not make a tool accurate on Indian law. The deeper requirement is grounding. A model can speak fluent Hindi and still invent a Section number, because fluency and factual recall are different things. The reliable pattern is retrieval first, generation second: the system searches a real corpus of Indian judgments and statutes, pulls the actual passages, and constrains the answer to what those passages say, with links the user can open. That is the architectural opposite of the free-text prediction that produces phantom citations.

Consider the contrast in concrete terms. Ask a general foreign model for the leading authority on a point of Indian constitutional law and it may return a beautifully formatted citation with a plausible neutral cite and a confident one-line ratio, all of it fabricated. Ask a grounded, India-trained system the same question and the honest version retrieves the genuine judgment, shows you the paragraph it relied on, and lets you read the ratio yourself. The first feels faster. The second is the only one a lawyer can sign their name under.
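
The "retrieval first, generation second" pattern can be shown with a toy example. Everything here is an assumption made for illustration: the corpus entries are placeholders rather than real judgments, the `DOC-` identifiers are not real citations, and word overlap stands in for whatever search a production system would use. The generation step is left out; the point is only that the system declines when nothing is retrieved.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # placeholder identifier, not a real citation
    text: str


# Placeholder corpus. A real system would index actual judgments and statutes.
CORPUS = [
    Passage("DOC-001", "Placeholder passage about limitation periods for filing a civil suit."),
    Passage("DOC-002", "Placeholder passage about bail conditions in criminal proceedings."),
]


def tokens(text: str) -> set:
    """Lowercase word set; a crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: list, min_overlap: int = 2) -> list:
    """Rank passages by shared-word count and drop weak matches."""
    query_words = tokens(query)
    scored = [(len(query_words & tokens(p.text)), p) for p in corpus]
    hits = [(score, p) for score, p in scored if score >= min_overlap]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits]


def grounded_answer(query: str, corpus: list) -> dict:
    """Retrieval first: with no supporting passage, decline rather than guess."""
    passages = retrieve(query, corpus)
    if not passages:
        return {"answer": None, "sources": []}
    # A real system would hand only these passages to the model at this point.
    return {"answer": passages[0].text, "sources": [p.source_id for p in passages]}


print(grounded_answer("What are the bail conditions in criminal cases?", CORPUS))
print(grounded_answer("Explain the doctrine of frustration", CORPUS))
```

The second query has no support in the corpus, so the sketch returns no answer and no sources instead of a plausible-looking citation.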

This is why "India-hosted" and "India-trained" are not interchangeable, and why the strongest sovereign stacks insist on both. Residency keeps the input safe. Indian-language capability lets the tool read the real material a practice actually deals with. Grounding in the Indian corpus keeps the output truthful. Take any one of the three away and you have a tool that is either unsafe, illiterate in the languages of Indian practice, or untrustworthy on the law. The point of building India-first is to refuse all three failure modes together rather than trading one for another.

## Adopting sovereign AI without breaking your practice

None of this means a firm has to rip out its workflow overnight. Sovereign AI is best adopted the way careful firms adopt anything that touches client confidentiality: deliberately, in stages, with the riskiest uses gated first.

A sensible sequence looks less like a technology rollout and more like a policy decision followed by a controlled trial.

1. **Set the data rule before the tool.** Decide, in writing, that privileged and client-identifying material does not go into any tool that cannot keep it in India and out of training. This single rule, written down, resolves most of the hard cases in advance and gives juniors a clear line they cannot accidentally cross.
2. **Separate research from drafting in your head.** Using AI to find and read real authority is a different risk profile from pasting a live client document in for editing. In-country, grounded research can be adopted earlier and more widely; anything that involves uploading privileged content should wait for a tool that satisfies the data rule.
3. **Pilot on non-sensitive questions.** Test a candidate tool on pure questions of law with no client facts attached. This lets you judge accuracy, citation quality, and whether the citator actually catches overruled cases, without putting anything confidential at stake.
4. **Verify the citations every time, early on.** Treat the tool as a research assistant whose work you check, not an oracle. The recent court orders make the advocate responsible for accuracy regardless of how the citation was produced, so the verification habit is not optional; it is the duty restated.
5. **Keep a human owning every finding.** The judiciary's own posture (AI assists, the human decides) is the right default for practice too. The tool drafts, surfaces, and translates; the lawyer reasons, decides, and signs.

The reason a staged approach works is that it maps onto duties you already have. The data rule is the confidentiality duty made operational. The verification habit is the competence duty made operational. The human-owns-findings principle is the rule against delegating judgment made operational. Adopting sovereign AI well is mostly a matter of letting the tools you choose make those existing duties easier to keep rather than harder.
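
The data rule in step 1 is simple enough to write down as a check. This is a sketch of one possible encoding, not a prescribed policy: the `Tool` and `Document` types, their fields, and the example names are hypothetical, and a firm's written rule may draw the line differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    processes_in_india: bool  # upload, storage, inference and logs all in India
    trains_on_inputs: bool    # whether submitted text can feed model training


@dataclass(frozen=True)
class Document:
    title: str
    privileged: bool
    client_identifying: bool


def may_submit(doc: Document, tool: Tool) -> bool:
    """Sensitive material only goes to tools that keep it in India
    and out of training; everything else is unrestricted."""
    sensitive = doc.privileged or doc.client_identifying
    if not sensitive:
        return True
    return tool.processes_in_india and not tool.trains_on_inputs


# Hypothetical tools and documents for illustration.
public_chatbot = Tool("public-chatbot", processes_in_india=False, trains_on_inputs=True)
in_country_tool = Tool("in-country-tool", processes_in_india=True, trains_on_inputs=False)

settlement_note = Document("Draft settlement note", privileged=True, client_identifying=True)
pure_law_question = Document("Question of law, no client facts", privileged=False, client_identifying=False)

print(may_submit(settlement_note, public_chatbot))    # blocked by the rule
print(may_submit(settlement_note, in_country_tool))   # permitted
print(may_submit(pure_law_question, public_chatbot))  # no sensitive content
```

Writing the rule this way also shows why step 3 is safe: a pure question of law carries no client facts, so it passes regardless of the tool.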

## What a sovereign legal-AI stack looks like in practice

Pulling the threads together, a stack that earns the word "sovereign" for legal work has a recognisable shape.

1. **In-country data plane.** Uploads, storage, vector indexes, and logs all live in India.
2. **In-country inference.** The model that reads a privileged query runs on Indian compute, so the text does not cross a border at answer time.
3. **Grounded answers.** Every response is tied to retrieved Indian judgments and statutes, with citations the user can open and check, rather than free-text prediction.
4. **A citator habit.** Tools flag whether a cited case is still good law, because retrieving a real case is only half the job.
5. **No training on client data.** Customer documents are not used to improve a shared model.
6. **Transparent sub-processors.** A published list of who touches the data and where they sit.
7. **Human ownership of findings.** The tool drafts and surfaces; the lawyer decides and signs.

None of this is exotic. It is what the court orders, the DPDP framework, and basic professional duty already point towards. Sovereignty is less a moonshot and more the disciplined version of what a careful firm would ask for anyway.
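
The residency half of that list (the data plane and inference) lends itself to a mechanical check. Here is a minimal sketch, assuming a firm or vendor can declare where each component runs. The component names, the `Component` type, and `residency_gaps` are hypothetical, chosen for this example, and country codes are used instead of any particular cloud provider's region names.

```python
from dataclasses import dataclass

# Hypothetical component kinds, taken from the data-plane and inference items
# in the list above. Every one of them must be declared and must sit in-country.
REQUIRED_IN_COUNTRY = {"uploads", "storage", "vector_index", "logs", "inference"}


@dataclass(frozen=True)
class Component:
    kind: str      # e.g. "inference"
    country: str   # ISO 3166-1 alpha-2 code, e.g. "IN"


def residency_gaps(components, home="IN"):
    """Return a sorted list of residency problems; an empty list means none found."""
    declared = {c.kind for c in components}
    gaps = [f"{kind}: not declared" for kind in REQUIRED_IN_COUNTRY - declared]
    gaps += [
        f"{c.kind}: runs in {c.country}"
        for c in components
        if c.kind in REQUIRED_IN_COUNTRY and c.country != home
    ]
    return sorted(gaps)


# A stack that stores everything in India but calls a model hosted abroad.
stack = [
    Component("uploads", "IN"),
    Component("storage", "IN"),
    Component("vector_index", "IN"),
    Component("logs", "IN"),
    Component("inference", "US"),
]
print(residency_gaps(stack))  # ['inference: runs in US']
```

An undeclared component is reported as a gap rather than assumed to be fine, which mirrors the article's point: a missing answer about where something runs is itself a warning sign.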

## A buyer's checklist for law firms

If your firm is evaluating a legal-AI tool, the residency and grounding questions can be reduced to a short interrogation.

| Question to ask the vendor | What a good answer sounds like |
|----------------------------|--------------------------------|
| Where is my data stored? | Named Indian region, for documents, indexes, and logs |
| Where does inference run? | On Indian infrastructure; the model is not a foreign API call |
| Is my data used for training? | No, customer data is excluded from model training |
| How do you prevent fake citations? | Answers are retrieved from a real Indian corpus and link to source |
| Do you have a citator? | Yes, cases are checked for current validity |
| Who are your sub-processors? | A published list with locations |
| What happens to logs? | Stated retention period and access controls in India |

If a vendor cannot answer the inference question crisply, treat the "Indian" label with caution. That single question separates real residency from a hosting badge.
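
A firm comparing several vendors may find it useful to record the answers in a consistent form. The sketch below is one possible way to do that, not a scoring standard: the question keys, the verdict strings, and `evaluate_vendor` are assumptions made for this example. It treats the inference question as the gating one, as the paragraph above suggests.

```python
# Hypothetical keys, one per row of the checklist table above.
CHECKLIST = (
    "storage_in_india",
    "inference_in_india",
    "no_training_on_client_data",
    "grounded_citations",
    "has_citator",
    "subprocessors_published",
    "log_retention_stated",
)


def evaluate_vendor(answers):
    """Return (verdict, failed_questions) for one vendor.

    `answers` maps a checklist key to True when the vendor gave a good answer.
    A missing key counts as a failure: no answer is not a good answer.
    """
    failed = [q for q in CHECKLIST if not answers.get(q, False)]
    if "inference_in_india" in failed:
        verdict = "treat the Indian label with caution"
    elif failed:
        verdict = "follow up"
    else:
        verdict = "meets checklist"
    return verdict, failed


# Example: a vendor that stores data in India but was vague about inference.
vendor = {q: True for q in CHECKLIST}
vendor["inference_in_india"] = False
print(evaluate_vendor(vendor))
```

Keeping the failed questions alongside the verdict gives the firm a ready list of follow-ups to send back to the vendor.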

## Where Niyam fits

Niyam is built India-first, which for us is a design constraint rather than a tagline. The product is meant to read Indian judgments and statutes, return answers grounded in that real corpus, and keep citations checkable, so the structural cause of hallucinated case law is reduced at source rather than papered over with a disclaimer. The aim is to let an advocate use AI without surrendering either confidentiality or accuracy, which, as the court orders this year make plain, are the two duties that do not bend.

We will not pretend that any single tool resolves every legal or ethical question raised by sovereign AI; the law itself is still settling, and the Bar Council's expert committee has yet to report. What we can say is that the direction of travel, towards in-country processing and grounded, verifiable answers, is the direction a careful Indian legal stack should be building in anyway.

## Frequently asked questions

**What does sovereign AI mean for a law firm?**
It means the data your AI tool reads, the model that processes it, and the compute it runs on stay under Indian control. For a firm, the practical version is simple: privileged client material does not leave the country at any stage of a query, and answers come from a real Indian legal corpus rather than a foreign general model.

**Does the DPDP Act require legal data to stay in India?**
Not as a blanket rule. The DPDP Act, 2023, through Section 16 and Rule 15 of the 2025 Rules, uses a negative-list model: transfers are allowed to any country unless the government restricts transfers to that country. But sectoral rules can be stricter, Significant Data Fiduciaries carry extra duties, and professional-conduct obligations make in-country processing the safer default for privileged material.

**Is it unethical to put client information into ChatGPT?**
There is no Bar Council rule naming ChatGPT, but the duties of confidentiality and competence already apply. The Kerala High Court has warned that uploading privileged communications to general cloud AI risks flouting confidentiality, and bar bodies abroad have raised the same concern. The conservative reading, which most cautious practitioners now follow, is to avoid putting privileged material into tools that offer no confidentiality protection.

**Are there real Indian foundation models, or is it marketing?**
There are real ones. Under the IndiaAI Mission, Sarvam AI, BharatGen and Gnani.ai unveiled indigenous models at the India AI Impact Summit in February 2026. Sarvam reported two models trained from scratch in India with support across the 22 scheduled languages. Be careful with the name "BharatGPT," which is often used loosely; the documented mission-backed effort is BharatGen.

**Why do general AI tools invent Indian case citations?**
Because they predict plausible-looking text rather than retrieve real documents. Trained largely on foreign and general data, they are confident but unreliable on Indian section numbers and case names, so they generate citations that look correctly formatted but do not exist. Indian courts have started treating reliance on such fabricated cases as misconduct.

**Does hosting in India guarantee accuracy?**
No. Hosting answers the confidentiality question, not the accuracy question. Accuracy comes from grounding answers in real Indian judgments and statutes and from a citator that checks whether a case is still good law. A genuinely sovereign legal stack needs both in-country residency and grounded retrieval, not one or the other.

**What single question best tests a vendor's residency claim?**
"Where does inference run?" Many tools store documents in India but send the actual query to a foreign model API, which means the privileged text crosses the border at the moment it matters most. A vendor that runs inference on Indian infrastructure can answer this in one sentence.

## Start for ₹100

If you want an India-first legal research tool that keeps your work grounded in real Indian judgments and gives you citations you can open and verify, you can try Niyam without a large commitment. [Start for ₹100](https://app.niyam.ai/register) and see how grounded, in-country legal AI handles your own research questions. Bring a real query, check the citations, and judge it against the duties that already govern your practice.
