We Audited 7 AI Agent Frameworks for Identity Security. Here's What We Found.

By NullBridge · June 2026

Short answer: none of them handle it. Every major AI agent framework in 2026 — LangChain, CrewAI, AutoGen, LlamaIndex, Semantic Kernel, OpenAI Agents SDK, Pydantic AI — is excellent at helping developers build agents that reason, plan, and use tools. None of them governs the identity of the agents themselves.

That's not a criticism of the frameworks. They were built to solve orchestration problems, not IAM problems. But as organizations move agent deployments from prototype to production, the identity gap becomes a real security problem, and most teams don't notice it until ungoverned agents are already running in production.

We evaluated each framework against five identity security criteria drawn from enterprise IAM practice:

  1. Per-agent credential management — does the framework issue, track, and manage credentials per agent?
  2. Credential rotation — is there a native mechanism to rotate agent credentials without redeployment?
  3. Behavioral anomaly detection — does the framework detect and alert on unusual agent behavior?
  4. Cross-system kill switch — can a deployed agent's access be revoked instantly across all integrations?
  5. Agent-to-agent delegation — when agents call other agents, is the delegation chain tracked and scoped?

Here's what we found for each framework.


1. LangChain / LangGraph

Identity security score: 1 / 5

LangChain is the most widely adopted agent framework and has the broadest integration catalog. LangGraph, its library for agent orchestration, has become a default choice for stateful, complex workflows where explicit control over branching and retries matters. LangGraph's graph-based execution model is the most auditable of any framework reviewed: you can trace exactly which node ran, in what order, with what state. That's valuable for debugging, but it's observability at the application layer, not identity governance.

Credential handling: LangChain expects credentials to be supplied by the developer via environment variables, configuration files, or secrets managers. There is no per-agent credential concept — credentials belong to the tool or integration, not the agent using it. If the same tool is used by ten different agents, all ten share the same credential with no individual attribution.
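To make the attribution gap concrete, the framework-agnostic sketch below (plain Python, not a LangChain API) shows the alternative: a hypothetical `AgentCredentialRegistry` that issues a distinct token to each agent and resolves every tool call back to the agent that made it. The class, its methods, and the token format are illustrative assumptions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class AuditEvent:
    agent_id: str
    tool: str


@dataclass
class AgentCredentialRegistry:
    """Hypothetical registry that issues one credential per agent, not per tool."""

    tokens: dict[str, str] = field(default_factory=dict)  # token -> agent_id
    audit_log: list[AuditEvent] = field(default_factory=list)

    def issue(self, agent_id: str) -> str:
        # A distinct random token per agent makes every call attributable.
        token = secrets.token_urlsafe(32)
        self.tokens[token] = agent_id
        return token

    def record_call(self, token: str, tool: str) -> str:
        # Resolve the presented token to an agent identity before logging.
        agent_id = self.tokens.get(token)
        if agent_id is None:
            raise PermissionError("unknown credential")
        self.audit_log.append(AuditEvent(agent_id, tool))
        return agent_id


registry = AgentCredentialRegistry()
billing_token = registry.issue("billing-agent")
support_token = registry.issue("support-agent")

# Same tool, two agents: the audit log still tells them apart.
registry.record_call(billing_token, "crm.read")
registry.record_call(support_token, "crm.read")
print([(e.agent_id, e.tool) for e in registry.audit_log])
# [('billing-agent', 'crm.read'), ('support-agent', 'crm.read')]
```

With a single shared environment-variable credential, both log entries would resolve to the same identity, which is the attribution problem described above.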

Kill switch: None. Stopping an agent requires terminating the process or removing the credential from the environment. There is no mechanism to instantly revoke an agent's access across all its active integrations simultaneously.
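A kill switch does not have to mean killing the process. One approach, sketched here under our own assumptions, is to route every tool call through a gate that checks a revocation list at call time. `RevocationList`, `guarded_call`, and `lookup_customer` are hypothetical names, and the in-memory set stands in for what would need to be a shared store in a real deployment.

```python
import threading
from typing import Any, Callable


class AgentRevokedError(PermissionError):
    """Raised when a revoked agent attempts a tool call."""


class RevocationList:
    """Hypothetical revocation list; in production this would be a shared store."""

    def __init__(self) -> None:
        self._revoked: set[str] = set()
        self._lock = threading.Lock()

    def revoke(self, agent_id: str) -> None:
        with self._lock:
            self._revoked.add(agent_id)

    def is_revoked(self, agent_id: str) -> bool:
        with self._lock:
            return agent_id in self._revoked


def guarded_call(revocations: RevocationList, agent_id: str,
                 tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
    # Checked on every call, so revocation takes effect on the agent's next
    # tool call without restarting the process.
    if revocations.is_revoked(agent_id):
        raise AgentRevokedError(f"{agent_id} has been revoked")
    return tool(*args, **kwargs)


def lookup_customer(customer_id: int) -> dict[str, Any]:
    # Stand-in for a real integration.
    return {"id": customer_id, "tier": "gold"}


revocations = RevocationList()
print(guarded_call(revocations, "support-agent", lookup_customer, 42))
revocations.revoke("support-agent")
try:
    guarded_call(revocations, "support-agent", lookup_customer, 42)
except AgentRevokedError as exc:
    print("blocked:", exc)
```

The gate only stops calls that pass through it. A credential the agent has already presented to a downstream system stays valid there until that system revokes it as well.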

Anomaly detection: LangSmith (the observability companion) provides token usage, latency, and trace visibility. It does not provide identity-level behavioral monitoring — flagging when an agent accesses something outside its normal scope, or makes an unusually high volume of calls to a sensitive system.
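As a rough illustration of identity-level behavioral monitoring, the sketch below keeps a per-agent baseline (the resources an agent normally touches and a maximum call rate) and flags the two cases named above: access outside normal scope and an unusually high volume of calls. The baseline values, the sliding-window approach, and the class names are assumptions; a real detector would learn baselines from history rather than hard-code them.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    """Assumed 'normal' behavior for one agent (hard-coded here for illustration)."""

    allowed_resources: frozenset[str]
    max_calls_per_window: int
    window_seconds: float


class AgentBehaviorMonitor:
    """Hypothetical monitor: per-agent scope and call-rate checks."""

    def __init__(self, baselines: dict[str, Baseline]) -> None:
        self._baselines = baselines
        # Recent call timestamps, keyed by (agent_id, resource).
        self._recent: defaultdict[tuple[str, str], deque[float]] = defaultdict(deque)

    def observe(self, agent_id: str, resource: str, ts: float) -> list[str]:
        alerts: list[str] = []
        baseline = self._baselines[agent_id]
        if resource not in baseline.allowed_resources:
            alerts.append(f"{agent_id}: out-of-scope access to {resource}")
        window = self._recent[(agent_id, resource)]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - window[0] > baseline.window_seconds:
            window.popleft()
        if len(window) > baseline.max_calls_per_window:
            alerts.append(f"{agent_id}: {len(window)} calls to {resource} "
                          f"within {baseline.window_seconds:.0f}s")
        return alerts


monitor = AgentBehaviorMonitor({
    "support-agent": Baseline(frozenset({"crm"}), max_calls_per_window=3,
                              window_seconds=60.0),
})
for t in range(4):
    alerts = monitor.observe("support-agent", "crm", ts=float(t))
print(alerts)  # ['support-agent: 4 calls to crm within 60s']
print(monitor.observe("support-agent", "payroll", ts=5.0))
# ['support-agent: out-of-scope access to payroll']
```

Token counts and latency would not surface either alert; both depend on knowing which agent made the call and what that agent normally does.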

Agent-to-agent delegation: LangGraph supports multi-agent graphs where agents call other agents. There is no standard delegation model — a sub-agent called by an orchestrator inherits the same environmental credentials, not a scoped delegated identity.
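For contrast, a scoped delegation model fits in a few lines: each sub-agent receives a delegated identity whose scopes are the intersection of what its parent holds and what the task requests, and the full chain travels with it for audit. The `DelegatedIdentity` type and its `delegate` method are hypothetical, loosely inspired by the actor-chain idea in OAuth 2.0 Token Exchange (RFC 8693) rather than by any LangGraph API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelegatedIdentity:
    """Hypothetical delegated identity whose scopes can only narrow down the chain."""

    agent_id: str
    scopes: frozenset[str]
    chain: tuple[str, ...]  # first entry is the root agent, last is this agent

    def delegate(self, child_id: str, requested: set[str]) -> "DelegatedIdentity":
        # Grant the intersection: a child never receives a scope its parent lacks.
        granted = self.scopes & frozenset(requested)
        return DelegatedIdentity(child_id, granted, self.chain + (child_id,))

    def allows(self, scope: str) -> bool:
        return scope in self.scopes


orchestrator = DelegatedIdentity(
    "orchestrator", frozenset({"crm:read", "email:send"}), ("orchestrator",))
researcher = orchestrator.delegate("researcher", {"crm:read", "crm:write"})

print(sorted(researcher.scopes))        # ['crm:read']
print(" -> ".join(researcher.chain))    # orchestrator -> researcher
print(researcher.allows("email:send"))  # False
```

Because `delegate` can only intersect, a sub-agent called several levels deep still cannot exceed what the orchestrator held, and each tool call can be logged with the full chain instead of a single shared credential.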


2. CrewAI

Identity security score: 1 / 5

CrewAI's role-based model, in which agents defined with a role, goal, and backstory collaborate on tasks, is the fastest path to a working multi-agent prototype. Its intuitive structure makes it easy for non-specialist developers to build agents quickly. That accessibility is also why it shows up most often in environments where security was an afterthought: agents get stood up fast, and production arrives before governance does.

Credential handling: Same pattern as LangChain — environment variables and configuration. No per-agent credentials. Credentials belong to tools, not the agents using them, so there is no way to attribute a specific tool call to a specific agent identity in an audit log.

Kill switch: None native. Stopping a CrewAI agent requires terminating the crew process. Credentials remain valid in whatever systems they authenticate to — there is no revocation propagation.

Anomaly detection: Not provided. CrewAI's observability covers task execution and output. Behavioral anomalies at the identity level are not monitored.

Agent-to-agent delegation: CrewAI's hierarchical mode uses a manager agent to delegate tasks to specialist agents. No delegation chain is tracked from an identity perspective — all agents operate under the same credential environment.


3. AutoGen / AG2

Identity security score: 1 / 5

Microsoft's AutoGen (alongside AG2, the community fork led by several of its original creators) is strongest for conversational multi-agent systems and research-heavy prototyping; Microsoft is folding AutoGen together with Semantic Kernel into Microsoft Agent Framework. Its code execution capabilities (agents that write and run code autonomously) make identity governance particularly important: a compromised AutoGen agent with code execution access can do significantly more damage than one limited to API calls.

Credential handling: Developer-supplied via environment or configuration. No per-agent credentials. Code execution can be sandboxed at the infrastructure level (for example, in a Docker container), but the agent's identity within that environment is not separately managed.

Kill switch: None native. Given AutoGen's code execution capabilities, this is the most significant gap in the reviewed frameworks — an active AutoGen agent with broad tool access and no revocation mechanism represents a meaningful attack surface if compromised.

Anomaly detection: Not provided by the framework. Microsoft's Azure AI Foundry integration provides some observability at the platform level for agents deployed there, but this is infrastructure-level monitoring, not agent identity governance.

Agent-to-agent delegation: AutoGen's conversational agent model allows agents to message and task each other. No identity-level delegation chain is maintained.


4. LlamaIndex

Identity security score: 1 / 5

LlamaIndex is the strongest framework for RAG-first agents — systems where the agent's primary value is querying and reasoning over private indexed data. Its event-driven Workflows layer handles multi-agent orchestration well for data-heavy pipelines. The data access patterns that make LlamaIndex powerful also make identity governance particularly important: agents with broad read access to indexed private data are a high-value target if compromised.

Credential handling: Developer-supplied. LlamaIndex does support local model deployment to keep data on-premises, which reduces some external credential surface, but agent identity management is not addressed.

Kill switch: None native. A LlamaIndex agent with access to a sensitive document index has no platform-level revocation mechanism.

Anomaly detection: Not provided. LlamaCloud (the managed offering) provides some observability, but not identity-level behavioral monitoring.

Agent-to-agent delegation: LlamaIndex's TrustedAgentWorker (added via Microsoft's Agent Governance Toolkit integration) is a notable exception — the first framework-level integration we found that approaches a delegation trust model. It remains an add-on, not native behavior.


5. Semantic Kernel

Identity security score: 2 / 5

Semantic Kernel is the strongest open-source framework for enterprise security alignment, primarily because of its Microsoft/.NET lineage and Azure integration. Agents deployed via Azure AI Foundry inherit Azure's identity controls — managed identities, role-based access control, and Azure Monitor integration. This is the closest any reviewed framework comes to meaningful identity governance, but it's inherited from the deployment platform rather than built into the framework itself.

Credential handling: Azure-deployed agents can use managed identities, which eliminates static credential management for Azure-native integrations. For non-Azure integrations, developer-supplied credentials apply. Notably, this is platform-level, not agent-level — identity is scoped to the deployment environment, not the individual agent.

Kill switch: Partial. Revoking an Azure managed identity cuts off Azure-authenticated integrations. Non-Azure integrations are not covered. True cross-system agent revocation requires external governance.
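The difference between partial and cross-system revocation is easier to see in code. The sketch below assumes a hypothetical governance layer that holds one revocation hook per integration an agent uses and fans a single kill-switch request out to all of them, reporting any system it could not reach. The integration names and hook functions are placeholders; none of them are Azure or Semantic Kernel APIs.

```python
from typing import Callable

# A revocation hook revokes one agent's access in one downstream system.
RevokeHook = Callable[[str], None]


def kill_switch(agent_id: str, hooks: dict[str, RevokeHook]) -> dict[str, str]:
    """Hypothetical fan-out: try every system; never stop at the first failure."""
    results: dict[str, str] = {}
    for system, revoke in hooks.items():
        try:
            revoke(agent_id)
            results[system] = "revoked"
        except Exception as exc:  # report failures instead of hiding them
            results[system] = f"FAILED: {exc}"
    return results


# Placeholder integrations that record revocations in memory.
revoked: dict[str, set[str]] = {"azure": set(), "crm": set()}


def revoke_azure(agent_id: str) -> None:
    revoked["azure"].add(agent_id)


def revoke_crm(agent_id: str) -> None:
    revoked["crm"].add(agent_id)


def revoke_file_share(agent_id: str) -> None:
    raise ConnectionError("revocation endpoint unreachable")


print(kill_switch("report-agent", {
    "azure": revoke_azure,
    "crm": revoke_crm,
    "file-share": revoke_file_share,
}))
# {'azure': 'revoked', 'crm': 'revoked', 'file-share': 'FAILED: revocation endpoint unreachable'}
```

Returning a per-system result matters: a kill switch that silently skips one integration leaves exactly the residual access that revoking only the Azure managed identity would.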

Anomaly detection: Azure Monitor and Microsoft Sentinel can be configured to monitor agent activity for Azure-native deployments. This requires custom configuration and is not agent-aware out of the box — it monitors infrastructure events, not agent behavioral patterns.

Agent-to-agent delegation: Not addressed natively. Semantic Kernel's plugin model handles agent composition, but delegation identity is not tracked.


6. OpenAI Agents SDK

Identity security score: 1 / 5

The OpenAI Agents SDK provides a clean, straightforward path to building agents against OpenAI models, with native handoff between agents and built-in tool use. Its simplicity is its main appeal — less ceremony than LangChain, more structure than raw API calls. From an identity security perspective, it shares the same fundamental gap as every other framework: no per-agent credential management, no kill switch, no behavioral monitoring.

Credential handling: OpenAI API keys are developer-supplied. The SDK has no mechanism for per-agent credential issuance or lifecycle management, and by default all agents in a deployment share the same API key.

Kill switch: None native. Revoking an OpenAI API key terminates all agents using it simultaneously — a blunt instrument that doesn't allow selective revocation of a single compromised agent.

Anomaly detection: OpenAI's usage dashboard provides token consumption visibility. Behavioral anomaly detection at the agent identity level is not provided.

Agent-to-agent delegation: The SDK's handoff model transfers conversation context between agents. No identity delegation chain is tracked.


7. Pydantic AI

Identity security score: 1 / 5

Pydantic AI brings FastAPI-style type safety and dependency injection to agent development — the strictest typing model of any reviewed framework. Its structured, verifiable outputs and clean Python ergonomics make it increasingly popular for production Python stacks. Type safety helps catch a class of runtime errors, but does not address identity governance.

Credential handling: Developer-supplied via Pydantic AI's dependency injection model. Clean DI patterns make credential management more structured than environment-variable soup, but there is no per-agent credential lifecycle: credentials are injected at the dependency level, not issued per agent.

Kill switch: None native.

Anomaly detection: Not provided.

Agent-to-agent delegation: Not addressed.


Summary scorecard

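Framework | Per-agent credentials | Credential rotation | Anomaly detection | Kill switch | Delegation tracking
--- | --- | --- | --- | --- | ---
LangChain / LangGraph | No | No | No | No | No
CrewAI | No | No | No | No | No
AutoGen / AG2 | No | No | No | No | No
LlamaIndex | No | No | No | No | ⚠️ Partial (add-on)
Semantic Kernel | ⚠️ Azure only | ⚠️ Azure only | ⚠️ Azure only | ⚠️ Partial | No
OpenAI Agents SDK | No | No | No | No | No
Pydantic AI | No | No | No | No | No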

What this means for production deployments

The pattern is consistent across all seven frameworks: identity governance is explicitly the developer's responsibility. This isn't a design failure — it's a scope decision. Agent frameworks are libraries for building agent behavior. They hand off deployment, observability, and security to the team shipping the agent.

The problem is that most teams treat that handoff as a later problem. Agents get deployed with environment variable credentials, no rotation schedule, no behavioral monitoring, and no documented process for what happens when an agent is suspected to be compromised. "Later" often arrives as an incident.

The gap frameworks leave is consistent: per-agent credential issuance and lifecycle management, behavioral monitoring tuned to machine-speed agent activity, cross-system kill switches for instant revocation, and tracked delegation chains for multi-agent systems. These aren't framework problems to solve — they're governance layer problems. The framework builds the agent. The governance layer manages its identity.
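To make per-agent credential issuance and lifecycle management concrete, here is a minimal sketch under two assumptions: agents fetch their credential from a governance layer on each request rather than reading it once from an environment variable, and credentials are short-lived. The `CredentialIssuer` class, the 15-minute TTL, and the injectable clock are hypothetical choices for illustration. Under those assumptions, rotation happens by expiry or on demand, with no redeployment.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    agent_id: str
    token: str
    expires_at: float


class CredentialIssuer:
    """Hypothetical issuer of short-lived, per-agent credentials."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._current: dict[str, Credential] = {}

    def get(self, agent_id: str) -> Credential:
        # Agents fetch per request, so a rotated credential needs no redeploy.
        cred = self._current.get(agent_id)
        if cred is None or self._clock() >= cred.expires_at:
            cred = self._issue(agent_id)
        return cred

    def rotate(self, agent_id: str) -> Credential:
        # Forced rotation, e.g. after a suspected leak; the old token stops validating.
        return self._issue(agent_id)

    def is_valid(self, cred: Credential) -> bool:
        return (self._current.get(cred.agent_id) == cred
                and self._clock() < cred.expires_at)

    def _issue(self, agent_id: str) -> Credential:
        cred = Credential(agent_id, secrets.token_urlsafe(24),
                          self._clock() + self._ttl)
        self._current[agent_id] = cred
        return cred


now = [0.0]  # fake clock so the example is deterministic
issuer = CredentialIssuer(ttl_seconds=900, clock=lambda: now[0])
first = issuer.get("billing-agent")
now[0] = 901.0                        # TTL has elapsed
second = issuer.get("billing-agent")  # rotated automatically on the next fetch
print(first.token != second.token, issuer.is_valid(first), issuer.is_valid(second))
# True False True
```

Each agent gets its own credential with its own expiry, so rotating or revoking one agent leaves the others untouched.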

Frequently asked questions

Do AI agent frameworks handle identity security?

No major framework provides built-in agent identity governance. They handle tool authentication (connecting agents to APIs and services) but not agent identity management: per-agent credentials, behavioral monitoring, cross-system kill switches, or agent audit logging. Security is explicitly the developer's responsibility in every framework reviewed.

Which AI agent framework has the best built-in security?

Semantic Kernel has the strongest enterprise security alignment among open-source frameworks, due to Azure managed identity integration for Azure-deployed agents. LangGraph offers the most auditable execution model. Neither provides native agent identity governance out of the box.

What are the biggest identity security gaps in AI agent frameworks?

The four most consistent gaps: no per-agent credential management, no behavioral monitoring tuned to agent activity, no cross-system kill switch, and no tracked agent-to-agent delegation model.


NullBridge provides the identity governance layer that agent frameworks don't →