# A Guide to Implementing Secure AI Command-Line Interfaces with OAuth 2.0 Authentication

## I. Introduction: The Architectural Imperative for User-Centric Authentication in AI CLI Tools

The proliferation of powerful Large Language Models (LLMs) from providers such as Google, OpenAI, and Anthropic has catalyzed a new wave of developer tooling, particularly within the command-line interface (CLI) ecosystem. However, the prevailing authentication model for these tools, reliance on developer-provisioned API keys, presents significant security, scalability, and usability challenges. This report details the architectural shift required to move from this brittle, developer-centric model to a robust, user-centric paradigm using the OAuth 2.0 authorization framework. This transition is not merely an enhancement; for any publicly distributed CLI tool, it is a fundamental prerequisite for enterprise-grade security and a viable operational model.

### The Paradigm Shift from Developer Keys to User Tokens

The traditional approach involves the developer obtaining a single, high-privilege API key from an AI service provider and embedding it within the distributed CLI application or a backend service. This model is fraught with peril. A compromised key, whether leaked from a client application or a central server, grants an attacker unfettered access to the developer's account, leading to potential data breaches and catastrophic billing fraud. Furthermore, the developer is forced into the role of a billing and quota intermediary, a complex and costly position that stifles scalability.

The modern, delegated authorization model offered by OAuth 2.0 fundamentally inverts this relationship. Instead of the developer authenticating on behalf of all users, each user authenticates directly with the AI service provider and explicitly grants the CLI application limited, revocable permissions to act on their behalf.
The CLI receives a short-lived access token specific to that user, which it uses for API calls. This approach offers benefits that are essential for a production-grade application.

- **Enhanced Security:** The core principle of OAuth 2.0 is that the user never shares their primary credentials (username, password, or master API key) with the client application.1 The CLI is granted only a limited-scope access token, adhering to the principle of least privilege. If a token is compromised, the blast radius is confined to a single user's permissions, and the token can be revoked without affecting the entire user base.2
- **User Control and Trust:** During the OAuth 2.0 flow, the user is presented with a consent screen managed by the trusted AI provider (e.g., Google). This screen clearly enumerates the permissions (known as "scopes") the CLI is requesting.3 The user can grant or deny the request and can revoke the application's access at any time from the provider's security dashboard. This transparency is critical for building user trust.
- **Scalable Billing and Quotas:** Each user's API calls are billed directly against their own account with the AI provider and are subject to their individual rate limits and quotas. This removes the developer from the billing loop entirely, enabling the CLI to scale to thousands or millions of users without incurring API usage costs on their behalf.
- **Simplified Distribution:** The developer can distribute the CLI application publicly without the operational burden of provisioning, securely storing, rotating, and protecting a centralized, high-privilege API key.2

### The "OAuth Chasm" and its Strategic Implications

While the benefits of OAuth 2.0 are clear, a critical examination of the major AI providers reveals a fundamental divergence in authentication philosophies. This "OAuth Chasm" has profound implications for any developer seeking to build a multi-provider CLI.
On one side, Google has deeply integrated OAuth 2.0 into its identity platform and provides robust, well-documented support for third-party applications to authenticate users for the Gemini API.5 This aligns with Google's broader ecosystem strategy, leveraging its mature identity infrastructure to empower third-party developers.

On the other side, OpenAI and Anthropic have adopted a more direct, developer-pays business model for their public APIs. OpenAI's official policy explicitly forbids user-delegated authentication: the API is accessible only via a developer-owned API key, and the terms of service prohibit "bring-your-own-key" models in which a user provides their key to a third-party application.7 This is not a temporary feature gap but a deliberate policy choice. Similarly, Anthropic's public API documentation exclusively details API key-based authentication, with no mention of a public OAuth 2.0 flow for third-party developers.8

This divergence forces a major architectural bifurcation. The implementation path for Google Gemini is clean, secure, and follows established industry standards. The path for OpenAI's ChatGPT and Anthropic's Claude, under their current public terms, is fundamentally incompatible with a direct, client-side OAuth 2.0 flow. Supporting these providers transforms the project from a simple client-side tool into a full-fledged Software-as-a-Service (SaaS) platform: the developer must build, host, and secure a backend proxy, manage user accounts, handle the secure storage of user-provided API keys, and assume significant legal, operational, and financial responsibilities. This report provides a detailed blueprint for both architectural paths, enabling developers to make informed strategic decisions.

## II. The Gold Standard for Public Clients: The OAuth 2.0 Authorization Code Flow with PKCE

For a CLI application, which is classified as a "public client" because it cannot securely store a long-lived client secret, the industry-standard protocol for user authentication is the OAuth 2.0 Authorization Code Flow with Proof Key for Code Exchange (PKCE). This extension, defined in RFC 7636, is designed to mitigate authorization code interception attacks and is the recommended best practice of major identity providers and security bodies.9 The AWS CLI, for instance, defaults to this flow for its SSO login commands.9

### Deconstructing the Flow

The PKCE flow adds a dynamic, request-specific secret to the standard Authorization Code flow, ensuring that even if an authorization code is stolen, it is useless to an attacker. The process is a carefully choreographed exchange between the CLI application, the user's browser, and the AI provider's authorization server.

1. **Code Verifier and Challenge Generation:** Before initiating the flow, the CLI generates a high-entropy cryptographic random string known as the `code_verifier`. This string must be between 43 and 128 characters long. The CLI then transforms this verifier into a `code_challenge`. The recommended transformation method is `S256`, where the `code_challenge` is the Base64URL-encoded SHA-256 hash of the `code_verifier`.12
2. **The Authorization Request:** The CLI launches the system's default web browser, directing the user to the provider's authorization endpoint. This request is a URL containing several key query parameters:
   - `client_id`: The public identifier for the CLI application, obtained during registration with the provider.
   - `redirect_uri`: The URI where the provider will redirect the user after authentication.
For a CLI, this is a loopback address pointing to a temporary local web server (e.g., `http://localhost:7777`).13
   - `response_type=code`: Specifies that the application is requesting an authorization code.
   - `scope`: A space-delimited list of permissions the application is requesting (e.g., `https://www.googleapis.com/auth/generative-language.retriever`).3
   - `code_challenge`: The transformed verifier from Step 1.
   - `code_challenge_method=S256`: Informs the server that the challenge was created using the SHA-256 algorithm.12
3. **User Authentication and Consent:** The user interacts directly and securely with the provider's website. They enter their credentials (e.g., Google username and password) and are presented with a consent screen detailing the permissions requested by the CLI. If the user approves, the authorization server records the `code_challenge` associated with this session.1
4. **Authorization Code Redirect:** Upon successful consent, the provider's server redirects the user's browser back to the `redirect_uri` specified in Step 2, with a short-lived, single-use `authorization_code` appended to the URL.13 For example: `http://localhost:7777/callback?code=AUTHORIZATION_CODE_HERE`.
5. **The Token Exchange:** The temporary local web server running as part of the CLI process receives this incoming request and extracts the `authorization_code`. The CLI then makes a direct, server-to-server POST request to the provider's token endpoint. This request is not performed in the browser. It contains:
   - `grant_type=authorization_code`: Specifies the grant type.
   - `code`: The authorization code received in the previous step.
   - `redirect_uri`: The same redirect URI used in the initial request.
   - `client_id`: The application's public client ID.
   - `code_verifier`: The critical element. The CLI sends the original, unhashed `code_verifier` from Step 1, proving it is the same client that initiated the flow.12
6. **Token Issuance and Verification:** The authorization server receives the token request. It retrieves the `code_challenge` it stored in Step 3, applies the same S256 transformation to the received `code_verifier`, and compares the results. If they match, the server authenticates the client and issues an `access_token` (used to make API calls) and a `refresh_token` (used to obtain new access tokens without user interaction).10

### Security Analysis: Why PKCE is Non-Negotiable

The strength of PKCE lies in its defense against authorization code interception. In a non-PKCE flow, if a malicious application on the user's machine could intercept the redirect containing the `authorization_code`, it could potentially exchange that code for an access token. With PKCE, the intercepted code is worthless: the attacker does not possess the original `code_verifier`, which was generated and stored securely within the legitimate CLI application and never transmitted during the browser-based part of the flow. Without the `code_verifier`, the token exchange in Step 5 will fail, rendering the attack inert.10

The following table summarizes the PKCE flow for quick reference.

| Step | Actor (CLI/User/Provider) | Action | Key Parameters Exchanged | Security Purpose |
| --- | --- | --- | --- | --- |
| 1 | CLI | Generates a secret (`code_verifier`) and a public challenge (`code_challenge`). | `code_verifier`, `code_challenge` | Creates a one-time secret for the transaction. |
| 2 | CLI | Redirects the user's browser to the provider's authorization endpoint. | `client_id`, `redirect_uri`, `scope`, `code_challenge` | Initiates the authorization request with the public part of the proof key. |
| 3 | User / Provider | User authenticates with the provider and grants consent. Provider stores the `code_challenge`. | User credentials, consent grant | Authenticates the user and authorizes the requested permissions. |
| 4 | Provider | Redirects the user's browser back to the CLI's local server with a temporary code. | `authorization_code` | Delivers a single-use code to the client via the user-agent. |
| 5 | CLI | Captures the code and makes a direct POST request to the provider's token endpoint. | `authorization_code`, `client_id`, `code_verifier` | Exchanges the code for tokens, proving its identity with the original secret. |
| 6 | Provider | Verifies the `code_verifier` against the stored `code_challenge` and issues tokens. | `access_token`, `refresh_token` | Validates the client's proof and grants API access. |

## III. Architectural Blueprint for a Multi-Provider AI CLI

A well-architected CLI that supports OAuth 2.0 and multiple AI providers requires a modular and stateful design. The architecture must cleanly separate concerns related to command parsing, authentication, secure storage, and API interaction.

### Core Components

The following components form the logical backbone of the application:

- **Command Dispatcher:** The application's entry point, responsible for parsing command-line arguments, flags, and subcommands, then routing the user's request to the appropriate controller or manager. Libraries like `yargs` or `commander` in the Node.js ecosystem are ideal for building a sophisticated and user-friendly command structure.18
- **Authentication Manager:** A stateful, central module that orchestrates all authentication and session-related logic. It should be designed as a singleton or a globally accessible service within the application. Its responsibilities include:
  - Triggering the provider-specific login flow (i.e., the PKCE flow).
  - Interacting with the Secure Token Store to persist and retrieve tokens.
  - Handling silent token refresh operations when an access token expires.
  - Managing the logout process by clearing tokens from storage.
- **Secure Token Store:** An abstraction layer that provides a simple interface (`saveTokens`, `getTokens`, `deleteTokens`) over the underlying secure storage mechanism. Crucially, this component must not store tokens in plaintext files. It should integrate with the native operating system's credential management system: Keychain Services on macOS, Credential Manager on Windows, and the Secret Service API/keyring on Linux. This leverages OS-level security features for encryption at rest.2
- **Provider-Specific API Clients:** For each supported AI service, a dedicated client module (e.g., `GeminiClient`, `ProxiedOpenAIClient`) should be implemented. Each client is responsible for:
  - Encapsulating the logic for making requests to its specific API endpoints.
  - Correctly formatting requests and parsing responses.
  - Attaching the appropriate `Authorization: Bearer` header to every outgoing request.
  - Implementing retry logic that gracefully handles `401 Unauthorized` errors by signaling the Authentication Manager to perform a token refresh.
- **State Management:** The CLI needs a simple mechanism to manage its current state, primarily tracking which user is logged in and which AI provider is currently active. This can be managed through a configuration file (e.g., `~/.my-ai-cli/config.json`) that stores non-sensitive information such as the default provider or the user's email.

### User Flow and State Diagram

The lifecycle of a user's interaction with the CLI can be modeled with the following state transitions:

1. **Initial State (Unauthenticated):** On the very first run, the CLI checks the Secure Token Store and finds no tokens. The application is in an unauthenticated state. Most commands will fail, prompting the user with a message like, "Please log in first using `cli login`."
2. **Login Flow:** The user executes `cli login gemini`.
   - The Command Dispatcher invokes the Authentication Manager's `login` method for the 'gemini' provider.
   - The Authentication Manager initiates the PKCE flow as described in Section II.
   - The browser opens; the user authenticates and consents.
   - The local server captures the authorization code.
   - The Authentication Manager exchanges the code for tokens.
3. **Transition to Authenticated State:** Upon successful token acquisition, the Authentication Manager calls the Secure Token Store's `saveTokens` method, persisting the access and refresh tokens in the OS keychain. The CLI is now in an authenticated state for the 'gemini' provider.
4. **Authenticated Interaction:** The user executes `cli prompt "Summarize this document."`.
   - The Command Dispatcher invokes the main application logic.
   - The logic retrieves the active provider from the state configuration.
   - It requests the tokens for that provider from the Secure Token Store via the Authentication Manager.
   - It instantiates the corresponding API Client (e.g., `GeminiClient`) with the retrieved access token.
   - The API Client makes the request to the AI service.
5. **Token Expiration and Refresh Flow (Silent):** The API Client makes a request but receives a `401 Unauthorized` response, indicating the access token has expired.
   - The API Client, instead of failing, calls the Authentication Manager's `refreshToken` method.
   - The Authentication Manager retrieves the `refresh_token` from the Secure Token Store.
   - It makes a request to the provider's token endpoint with `grant_type=refresh_token`.
   - It receives a new `access_token` (and potentially a new `refresh_token`).
   - It updates the tokens in the Secure Token Store.
   - It returns the new `access_token` to the waiting API Client.
   - The API Client automatically retries the original failed request with the new token. This entire process is transparent to the end user.
6. **Refresh Failure and Re-authentication:** If the refresh token is also invalid (e.g., it has expired or been revoked by the user), the `refreshToken` call will fail.
The Authentication Manager will then clear the invalid tokens from storage and transition the CLI back to the Unauthenticated state, informing the user that they need to log in again.

## IV. Implementation Guide: Building a Secure CLI with Node.js

This section provides a practical, step-by-step guide to building the core of the secure AI CLI using Node.js. The implementation follows the architectural blueprint from Section III, focusing on the "gold standard" integration with Google Gemini.

### 1. Project Setup and Dependencies

First, initialize a new Node.js project and install the necessary libraries.

```bash
mkdir secure-ai-cli
cd secure-ai-cli
npm init -y

# Install dependencies
npm install openid-client express open keytar yargs
```

- **`openid-client`**: A certified OpenID Connect and OAuth 2.0 client library that robustly handles the complexities of the PKCE flow, including discovery of provider metadata, generation of PKCE parameters, and the token exchange process.13
- **`express`**: A minimal web framework that can host the temporary local HTTP server listening for the OAuth 2.0 redirect callback.13 (The sample in this guide uses Node's built-in `http` module for this, so `express` is optional.)
- **`open`**: A simple, cross-platform utility for opening the authorization URL in the user's default web browser.13
- **`keytar`**: A library for interacting with the native OS keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service/libsecret). This is essential for securely storing sensitive tokens.20
- **`yargs`**: A powerful library for parsing command-line arguments and building a rich, user-friendly command interface.

### 2. The Authentication Module (`auth.js`)

This module orchestrates the entire PKCE login flow. It dynamically starts a local server, opens the browser for user consent, captures the callback, exchanges the code for tokens, and then shuts down the server.
```javascript
// auth.js
const { Issuer, generators } = require('openid-client');
const http = require('http');
const open = require('open');
const { saveTokens } = require('./token-manager');

const GOOGLE_ISSUER_URL = 'https://accounts.google.com';
const PORT = 7777;
const REDIRECT_URI = `http://localhost:${PORT}/callback`;
// Replace with your actual Client ID from Google Cloud Console
const GOOGLE_CLIENT_ID = 'YOUR_GOOGLE_CLIENT_ID.apps.googleusercontent.com';

async function login() {
  const googleIssuer = await Issuer.discover(GOOGLE_ISSUER_URL);
  const client = new googleIssuer.Client({
    client_id: GOOGLE_CLIENT_ID,
    redirect_uris: [REDIRECT_URI],
    response_types: ['code'],
    token_endpoint_auth_method: 'none', // Required for public clients
  });

  // Generate the PKCE parameters before starting the flow
  const code_verifier = generators.codeVerifier();
  const code_challenge = generators.codeChallenge(code_verifier);

  return new Promise((resolve, reject) => {
    const server = http
      .createServer(async (req, res) => {
        try {
          const params = client.callbackParams(req);
          const tokenSet = await client.callback(REDIRECT_URI, params, { code_verifier });
          await saveTokens('google-gemini', tokenSet);
          res.end('Authentication successful! You can now close this browser window and return to your terminal.');
          server.close();
          console.log('Successfully authenticated and tokens stored securely.');
          resolve(tokenSet);
        } catch (err) {
          res.end('Authentication failed. Please try again.');
          server.close();
          reject(new Error(`Authentication failed: ${err.message}`));
        }
      })
      .listen(PORT);

    const authUrl = client.authorizationUrl({
      scope:
        'openid email profile https://www.googleapis.com/auth/cloud-platform https://www.googleapis.com/auth/generative-language.retriever',
      code_challenge,
      code_challenge_method: 'S256',
    });

    console.log('Please complete the authentication process in your browser.');
    open(authUrl);
  });
}

module.exports = { login };
```

### 3. Secure Token Storage (`token-manager.js`)

This module provides a clean abstraction over `keytar` for securely storing and retrieving tokens from the OS keychain. This prevents sensitive credentials such as refresh tokens from ever being written to a plaintext file on disk.

```javascript
// token-manager.js
const keytar = require('keytar');

const SERVICE_NAME = 'SecureAI-CLI';

async function saveTokens(provider, tokenSet) {
  // keytar can only store strings, so we serialize the tokenSet object
  const serializedTokenSet = JSON.stringify(tokenSet);
  await keytar.setPassword(SERVICE_NAME, provider, serializedTokenSet);
}

async function getTokens(provider) {
  const serializedTokenSet = await keytar.getPassword(SERVICE_NAME, provider);
  if (!serializedTokenSet) {
    return null;
  }
  // Deserialize the string back into an object
  return JSON.parse(serializedTokenSet);
}

async function deleteTokens(provider) {
  await keytar.deletePassword(SERVICE_NAME, provider);
}

module.exports = { saveTokens, getTokens, deleteTokens };
```

### 4. The API Client (`gemini-client.js`)

This class encapsulates the logic for making API calls to Google Gemini. It includes built-in logic to handle token refreshes automatically, providing a seamless experience for the user.

```javascript
// gemini-client.js
const { Issuer } = require('openid-client');
const { getTokens, saveTokens, deleteTokens } = require('./token-manager');

const GOOGLE_ISSUER_URL = 'https://accounts.google.com';
// Must be the same as in auth.js
const GOOGLE_CLIENT_ID = 'YOUR_GOOGLE_CLIENT_ID.apps.googleusercontent.com';

class GeminiClient {
  constructor(tokenSet) {
    this.tokenSet = tokenSet;
  }

  static async create() {
    const tokenSet = await getTokens('google-gemini');
    if (!tokenSet) {
      throw new Error('User not authenticated. Please run "login" command.');
    }
    return new GeminiClient(tokenSet);
  }

  async refreshToken() {
    console.log('Access token expired. Refreshing...');
    const googleIssuer = await Issuer.discover(GOOGLE_ISSUER_URL);
    const client = new googleIssuer.Client({
      client_id: GOOGLE_CLIENT_ID,
      token_endpoint_auth_method: 'none', // Public client, no client secret
    });
    try {
      const newTokenSet = await client.refresh(this.tokenSet.refresh_token);
      this.tokenSet = newTokenSet;
      await saveTokens('google-gemini', this.tokenSet);
      console.log('Token refreshed successfully.');
      return this.tokenSet.access_token;
    } catch (err) {
      console.error('Failed to refresh token. Please log in again.');
      // Clear the invalid tokens so the next run prompts for a fresh login
      await deleteTokens('google-gemini');
      throw err;
    }
  }

  async makeRequest(url, body, retry = true) {
    if (!this.tokenSet || !this.tokenSet.access_token) {
      throw new Error('No access token available.');
    }
    const response = await fetch(url, {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.tokenSet.access_token}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(body),
    });

    if (response.status === 401 && retry) {
      // Token likely expired; attempt to refresh and retry once
      await this.refreshToken();
      return this.makeRequest(url, body, false); // retry=false prevents an infinite loop
    }

    if (!response.ok) {
      const errorBody = await response.text();
      throw new Error(`API request failed with status ${response.status}: ${errorBody}`);
    }

    return response.json();
  }

  async generateContent(prompt) {
    // Replace with the correct Gemini API endpoint for your project
    const API_URL = 'https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent';
    const body = { contents: [{ parts: [{ text: prompt }] }] };
    return this.makeRequest(API_URL, body);
  }
}

module.exports = { GeminiClient };
```

### 5. The Main CLI File (`index.js`)

This file ties everything together using `yargs`. It defines the command structure (`login`, `logout`, `prompt`) and invokes the appropriate modules.
```javascript
#!/usr/bin/env node
// index.js
const yargs = require('yargs/yargs');
const { hideBin } = require('yargs/helpers');
const { login } = require('./auth');
const { deleteTokens } = require('./token-manager');
const { GeminiClient } = require('./gemini-client');

yargs(hideBin(process.argv))
  .command('login', 'Authenticate with your Google account', {}, async () => {
    try {
      await login();
    } catch (error) {
      console.error(error.message);
      process.exit(1);
    }
  })
  .command('logout', 'Log out and remove stored credentials', {}, async () => {
    await deleteTokens('google-gemini');
    console.log('Successfully logged out.');
  })
  .command('prompt <text>', 'Send a prompt to the Gemini API', (yargs) => {
    yargs.positional('text', {
      describe: 'The prompt text to send',
      type: 'string',
    });
  }, async (argv) => {
    try {
      const client = await GeminiClient.create();
      const response = await client.generateContent(argv.text);
      console.log(JSON.stringify(response, null, 2));
    } catch (error) {
      console.error(`Error: ${error.message}`);
      process.exit(1);
    }
  })
  .demandCommand(1, 'You need to provide a command.')
  .help()
  .argv;
```

This complete implementation provides a secure, robust foundation for a CLI that uses the industry-standard OAuth 2.0 PKCE flow for user authentication.

## V. AI Service Integration: A Comparative Analysis and Walkthrough

Successfully integrating OAuth 2.0 into a multi-provider AI CLI requires a deep understanding of each provider's specific implementation, capabilities, and, most importantly, their policies. As established, support for third-party, user-delegated OAuth varies dramatically across the landscape. The following matrix provides a definitive, at-a-glance summary of the authentication landscape, clarifying the feasibility and complexity for each major provider.
| AI Provider | Public OAuth 2.0 Support | Primary Auth Method | Recommended CLI Flow | Key Scopes | Documentation Links |
| --- | --- | --- | --- | --- | --- |
| **Google Gemini** | Yes, fully supported | OAuth 2.0 | Authorization Code with PKCE | `generative-language.retriever`, `cloud-platform` | 5 |
| **OpenAI ChatGPT** | No | API Key | Secure Proxy Backend | N/A (permissions managed by key) | 7 |
| **Anthropic Claude** | No (publicly documented) | API Key | Secure Proxy Backend | N/A (permissions managed by key) | 8 |

### A. Google Gemini: The Gold Standard Implementation

Google's Gemini API is built upon the mature Google Cloud Platform and its robust identity services, making it the ideal candidate for a native OAuth 2.0 integration. The process is well documented and aligns with security best practices.

#### Step 1: Google Cloud Project Setup

Before writing any code, the CLI application must be registered with Google.

1. **Enable the API:** In the Google Cloud Console, navigate to the API Library and enable the "Google Generative Language API" for a new or existing project.1
2. **Configure the OAuth Consent Screen:** This is the screen users will see when they grant your CLI permission. Navigate to "APIs & Services" > "OAuth consent screen."
   - Select "External" as the user type. This allows any Google account user to authenticate, not just users within your own organization.5
   - Fill in the required application details, such as app name and support email.
   - For development, you can skip adding scopes here and proceed. However, a production application that uses sensitive scopes must complete Google's verification process.4
3. **Add Test Users:** While the app is in "testing" mode, only explicitly listed test users can authenticate. Add the Google accounts you will use for development under the "Test users" section.5

#### Step 2: Creating OAuth 2.0 Credentials

The CLI needs its own identity to participate in the OAuth flow.

1. Navigate to "APIs & Services" > "Credentials."
2. Click "Create Credentials" > "OAuth client ID."
3. Select "Desktop app" as the application type. This is the correct choice for a CLI application running on a user's local machine.5
4. Give the credential a name.
5. Upon creation, Google will provide a **Client ID** and a **Client Secret**. Download the JSON file containing these credentials. This file, typically named `client_secret.json`, contains the `client_id` needed for the Node.js implementation. While a `client_secret` is provided, it is not used by a public client like our CLI during the PKCE flow.5

#### Step 3: Scopes for Gemini

Scopes define the specific permissions your CLI is requesting. It is a security best practice to request only the scopes that are absolutely necessary ("incremental authorization").2 For basic Gemini functionality, the following scopes are required:

- `https://www.googleapis.com/auth/generative-language.retriever`: Grants permission to access the generative language models.
- `https://www.googleapis.com/auth/cloud-platform`: A broader scope often required for services running on GCP.
- `openid`, `email`, `profile`: Standard OpenID Connect scopes to obtain basic user identity information.

These scopes are included in the `authorizationUrl` call in the `auth.js` example.5

#### Step 4: Code Integration

With the credentials and scopes identified, the Node.js implementation in Section IV can be configured. The `GOOGLE_CLIENT_ID` constant in `auth.js` and `gemini-client.js` must be set to the value obtained from the Google Cloud Console. The `openid-client` library handles the rest, including discovering the correct authorization and token endpoints from Google's well-known configuration URL.

### B. OpenAI ChatGPT: Navigating the Lack of OAuth Support

The situation with OpenAI is starkly different.
As confirmed by community discussions and the absence of official documentation, the public OpenAI API does not support any form of user-delegated OAuth 2.0 authentication.7 Furthermore, the terms of service actively prohibit the "bring-your-own-key" model, which OpenAI considers a form of key sharing and a security risk.7 This policy decision forces a complete architectural pivot: a developer cannot build a simple, client-side CLI that allows users to log in to their own OpenAI accounts.

#### Architectural Workaround: The Secure Proxy Model

The only viable and policy-compliant method for integrating ChatGPT into a multi-user application is to build a secure backend service that acts as a proxy.

1. **Backend Service:** The developer must create, deploy, and maintain a backend application (e.g., using Node.js/Express, Spring Boot, or another framework).
2. **User Authentication (to the Backend):** The CLI application does not authenticate users with OpenAI. Instead, it authenticates users with the developer's own backend service. This can be done via traditional email/password, or by using a social login provider (such as "Sign in with Google") for the backend itself.
3. **Secure Key Ingestion:** The user must log into a web portal for the developer's service and manually provide their OpenAI API key. This key must be immediately encrypted using a strong algorithm (e.g., AES-256) and stored in the backend's database, securely associated with the user's account. The unencrypted key should never be logged or stored.
4. **Proxied API Calls:**
   - The CLI sends its prompts to an endpoint on the developer's backend service, authenticating with a token issued by that backend (e.g., a JWT).
   - The backend receives the request, identifies the authenticated user, and retrieves their encrypted OpenAI API key from the database.
   - It decrypts the key in memory.
   - It then makes a new request to the actual OpenAI API, injecting the user's decrypted API key into the `Authorization` header.
- It receives the response from OpenAI and relays it back to the CLI client. #### Analysis of Overheads This model fundamentally changes the nature of the project, introducing massive complexity and responsibility: - **Infrastructure Costs:** The developer is now responsible for hosting, scaling, and maintaining a 24/7 backend service. - **Security Burden:** The developer's service is now a high-value target for attackers, as it stores sensitive user credentials (their OpenAI API keys). A breach would be catastrophic. This requires rigorous security practices, including database encryption, secrets management, and regular security audits. - **Development Complexity:** The project scope expands to include backend development, database management, user account systems, and a secure web front-end for key management. - **Billing Model:** This architecture does not solve the billing problem. The developer must decide whether to pass on the costs of their own API key usage to users (requiring a full payment and subscription system) or rely on the user's key, which still carries the security risks of storing third-party secrets. ### C. Anthropic Claude: An Ambiguous Landscape Anthropic's public API documentation mirrors OpenAI's approach. The only documented authentication method is via an `x-api-key` header, where the key is provisioned by the developer.8 There is no public documentation for a third-party OAuth 2.0 flow. However, the existence of first-party tools like "Claude Code" (mentioned in the user query) and the "Claude Code" documentation on their site 21 implies that an OAuth 2.0 mechanism likely exists within Anthropic's ecosystem. This is common for services that need to authenticate users into their own web applications or for select enterprise partners. At present, this mechanism does not appear to be open for public, third-party developer use. 
#### Strategic Recommendation

For integrating Claude into the CLI, the immediate and only documented path is the same **Secure Proxy Model** described for OpenAI. The developer must assume that a native OAuth flow is not available. The recommended course of action is two-fold:

1. **Architect for the Proxy Model:** Design the application with the assumption that Claude integration will require a secure backend.
2. **Engage with Anthropic:** Proactively contact Anthropic's developer relations or partnership teams and inquire about a private or beta OAuth 2.0 program for third-party tool developers. While there is no guarantee of access, this is the only potential path to a more direct integration in the future.

## VI. Advanced Topics and Alternative Implementations

A production-grade CLI must be versatile, catering to different user environments and potentially being built with different technology stacks. This section explores two critical advanced topics: handling headless environments and an alternative implementation sketch using Java and Spring Boot.

### The Duality of CLI Authentication Flows

The Authorization Code Flow with PKCE, while the most secure interactive standard, has a significant operational dependency: it requires access to a local web browser and the ability to receive a callback on a localhost port.9 This makes it fundamentally incompatible with the non-interactive, "headless" environments where developers often work, such as:

- Remote servers accessed via SSH.
- Docker containers.
- CI/CD pipelines (e.g., GitHub Actions).

This limitation creates a duality in CLI authentication: a tool that only supports PKCE will fail for a significant portion of its target developer audience.
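A CLI can detect many of these headless situations at runtime before choosing a flow. The sketch below is a minimal Node.js heuristic; the environment signals it checks (`CI`, the SSH variables, a missing X/Wayland display) are common conventions rather than a guaranteed detection method, and the function name is an assumption for illustration:

```javascript
// Heuristic check for a headless environment, used to choose between the
// browser-based PKCE flow and a non-interactive fallback. The signals below
// are illustrative conventions, not an exhaustive or guaranteed test.
function isLikelyHeadless(env = process.env, platform = process.platform) {
  if (env.CI) return true;                            // most CI systems set CI=true
  if (env.SSH_CONNECTION || env.SSH_TTY) return true; // remote SSH session
  if (platform === 'linux' && !env.DISPLAY && !env.WAYLAND_DISPLAY) {
    return true;                                      // no graphical session on Linux
  }
  return false;
}
```

A production CLI would pair such a check with a guarded attempt to actually launch the browser, falling back to a device-style flow if that attempt fails.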
Major CLI tools recognize this; the AWS CLI, for example, can fall back from the PKCE-based flow to the OAuth 2.0 Device Authorization Grant in environments where it cannot launch a browser.9 Conversely, some platforms, such as Salesforce, are moving away from the Device Flow due to perceived security trade-offs, highlighting the nuanced landscape.23

For maximum utility, a modern CLI should implement a **conditional authentication strategy**: detect the environment and default to the more secure PKCE flow whenever possible, but gracefully fall back to the Device Authorization Flow in headless scenarios.

### The Device Authorization Flow

The Device Authorization Flow (RFC 8628) is specifically designed for input-constrained devices and headless clients. It decouples the device where the CLI is running from the device where the user authenticates.

1. **Device Authorization Request:** The CLI makes a POST request to the provider's device authorization endpoint, sending its `client_id` and requested `scope`.
2. **User Interaction Codes:** The authorization server responds with several pieces of information 24:
   - `device_code`: An opaque secret, valid for the duration of the flow, that the CLI will use for polling.
   - `user_code`: A short, user-friendly code (e.g., `RRGQ-BJVS`).
   - `verification_uri`: A URL the user must visit on another device (e.g., their laptop or smartphone).
   - `interval`: The recommended polling interval in seconds.
3. **User Prompt:** The CLI displays the `verification_uri` and `user_code` in the terminal, instructing the user to complete the process on a different device.
4. **User Authentication:** The user opens the `verification_uri` in a browser on their phone or laptop, authenticates with the provider, and enters the `user_code` to link the session to the waiting CLI.
5. **Polling for Tokens:** While the user is authenticating, the CLI begins polling the provider's token endpoint at the specified `interval`.
Each poll is a POST request containing `grant_type=urn:ietf:params:oauth:grant-type:device_code` and the `device_code`.
6. **Token Issuance:** Once the user successfully completes authentication and consent in their browser, the next poll from the CLI succeeds: the token endpoint responds with the `access_token` and `refresh_token`, completing the flow.

An implementation would add logic to the `auth.js` module to detect a headless environment (e.g., by checking for a `DISPLAY` environment variable on Linux, or by attempting to launch the `open` command inside a try-catch block) and invoke the Device Flow instead of the PKCE flow.

### Java/Spring Boot Implementation Sketch

While this guide's primary implementation is in Node.js, the same architectural principles apply to a Java-based CLI. Note, however, that most examples for Spring Security's OAuth2 client are tailored for web applications, where much of the flow is automated by the framework.25 A CLI requires more manual orchestration.

- **Dependencies (Maven):**

  ```xml
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.security</groupId>
    <artifactId>spring-security-oauth2-client</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-webflux</artifactId>
  </dependency>
  <dependency>
    <groupId>io.projectreactor.netty</groupId>
    <artifactId>reactor-netty</artifactId>
  </dependency>
  ```

- **Configuration (`application.yml`):** Provider details would be configured similarly to a web application.25

  ```yaml
  spring:
    security:
      oauth2:
        client:
          registration:
            google:
              client-id: YOUR_GOOGLE_CLIENT_ID
              client-name: Google
              provider: google
              scope: openid, email, profile, https://www.googleapis.com/auth/cloud-platform
          provider:
            google:
              issuer-uri: https://accounts.google.com
  ```

- **Orchestration Logic:** A main application class would need to:
  1. **Generate PKCE Parameters:** Use Java's `SecureRandom` and `MessageDigest` to create the `code_verifier` and `code_challenge`.
  2. **Construct Authorization URL:** Manually build the authorization URL using properties from Spring's `ClientRegistrationRepository`.
  3. **Start Embedded Server:** Programmatically start an embedded web server such as Reactor Netty to listen on the loopback address for the redirect.
  4. **Launch Browser:** Use `java.awt.Desktop.getDesktop().browse(uri)` to open the system browser.
  5. **Handle Callback:** The embedded server's handler captures the `authorization_code` from the incoming request.
  6. **Perform Token Exchange:** Use Spring's `WebClient` to make the POST request to the token endpoint, including the `code_verifier` and other required parameters.
  7. **Secure Token Storage:** Use a library that interfaces with the native OS keychain, such as `keytar-java`, or execute platform-specific command-line tools.

This approach leverages Spring's configuration and OAuth2 client objects but requires manual implementation of the flow orchestration that is typically handled automatically in a web context.

## VII. Conclusion and Strategic Recommendations

The endeavor to build a secure, multi-provider AI CLI using user-delegated authentication is both feasible and architecturally necessary for long-term success. However, the path is fraught with nuance, driven primarily by the divergent authentication strategies of the major AI service providers. A successful implementation hinges on embracing modern security standards, understanding the operational realities of developer workflows, and making informed strategic decisions based on the current provider landscape.

### Summary of Findings

This analysis has yielded several critical conclusions:

- **PKCE is the Standard:** The OAuth 2.0 Authorization Code Flow with PKCE is the non-negotiable gold standard for authenticating users in a public CLI application. Its design provides robust security against interception attacks in interactive environments.
- **Headless Environments Require a Fallback:** A sole reliance on the browser-dependent PKCE flow will alienate a significant segment of the developer user base.
A production-grade CLI must implement a conditional authentication strategy, falling back to the Device Authorization Flow in headless contexts like SSH and CI/CD pipelines.
- **Secure Storage is Paramount:** Sensitive credentials, particularly long-lived refresh tokens, must never be stored in plaintext. The only acceptable practice is to use the native operating system's secure keychain (macOS Keychain, Windows Credential Manager, Linux Secret Service), which provides OS-enforced encryption and access control.
- **The "OAuth Chasm" Dictates Architecture:** A profound split exists between Google's open, standards-based approach to third-party authentication for Gemini and the closed, API-key-centric models of OpenAI and Anthropic. This is not a technical detail but the single most important strategic factor, forcing fundamentally different architectures for supporting these providers.

### Final Architectural Recommendation

Based on these findings, the following phased, modular architectural approach is recommended:

1. **Prioritize a Native Gemini Integration:** Begin by building the CLI with a first-class, native OAuth 2.0 implementation for Google Gemini. This provides a secure, scalable foundation that aligns perfectly with the project's original vision. The Node.js implementation detailed in Section IV serves as a production-ready blueprint for this phase.
2. **Isolate Other Providers via a Secure Proxy:** For OpenAI and Anthropic, do not attempt to force a non-existent OAuth flow. Instead, architect and build the secure backend proxy model described in Section V. The CLI's interaction with these providers should be entirely through this intermediary service. The CLI will authenticate users to *your backend*, not to OpenAI or Anthropic directly. This cleanly separates the two disparate authentication models and contains the significant complexity and security burden of the proxy model.
3. **Design for Future-Proofing:** Implement the provider-specific API clients as modular plugins. This will allow the application to adapt easily if OpenAI or Anthropic eventually release a public, third-party OAuth 2.0 flow. The proxy-based client could simply be swapped out for a new, native OAuth client without requiring a full re-architecture of the CLI's core.

### A Strategic Pivot: Embracing the Model Context Protocol (MCP) Ecosystem

Beyond building a new CLI from the ground up, the research into existing tools presents a powerful and highly efficient alternative strategy. The official, open-source Gemini CLI is a mature, feature-rich application that has already solved many of the core challenges: a polished terminal UI, command history, session management, and, most importantly, a built-in, secure OAuth 2.0 login flow for Google accounts.29

Crucially, the Gemini CLI is designed for extensibility via the Model Context Protocol (MCP), a standard that allows the core AI agent to discover and use external tools and services exposed by MCP servers.22 The community is already building upon this; open-source MCP servers exist that bridge the Gemini CLI to other models, including Claude and the entire catalog of over 400 models available through OpenRouter.31

This opens a new strategic path: **instead of building a competing CLI, build a value-adding MCP server.** The developer could create a custom MCP server that implements the "Secure Proxy Model" for OpenAI and Anthropic. The end-user experience would be:

1. The user installs and uses the official Google Gemini CLI.
2. They authenticate securely using the built-in "Login with Google" OAuth flow.
3. In their Gemini CLI configuration, they add the developer's custom MCP server endpoint.
4. The Gemini CLI then automatically discovers the "tools" exposed by that server (for example, a `chatWithOpenAI` tool and a `chatWithClaude` tool).
5. The user can now interact with all three AI providers seamlessly within the unified, robust environment of the official Gemini CLI.

This approach offers immense advantages:

- **Reduced Development Effort:** Leverages a mature, existing client application, allowing the developer to focus solely on the backend integration logic.
- **Enhanced User Experience:** Provides users with a single, powerful interface they may already be familiar with.
- **Future Alignment:** Aligns the project with an emerging ecosystem standard (MCP), increasing its potential for interoperability and adoption.

By pivoting from building a standalone CLI to contributing a specialized server to the Gemini CLI ecosystem, a developer can deliver the desired multi-provider functionality more efficiently, securely, and strategically.