Rethinking Research: Private GPTs for Investment Analysis

by Tradinghow
July 15, 2025
in Investing


In an era where data privacy and efficiency are paramount, investment analysts and institutional researchers may increasingly be asking: Can we harness the power of generative AI without compromising sensitive data? The answer is a resounding yes.

This post describes a customizable, open-source framework that analysts can adapt for secure, local deployment. It walks through a hands-on implementation of a privately hosted large language model (LLM) application built to assist with reviewing and querying investment research documents. The result is a secure, cost-effective AI research assistant that can parse thousands of pages in seconds and never sends your data to the cloud or the internet. The approach uses AI to partially automate investment analysis, a theme also discussed in an earlier Enterprising Investor post on augmenting investment analysis with AI.

This chatbot-style tool allows analysts to query complex research materials in plain language without ever exposing sensitive data to the cloud.

The Case for “Private GPT”

For professionals working in buy-side investment research — whether in equities, fixed income, or multi-asset strategies — the use of ChatGPT and similar tools raises a major concern: confidentiality. Uploading research reports, investment memos, or draft offering documents to a cloud-based AI tool is usually not an option.

That’s where “Private GPT” comes in: a framework built entirely on open-source components, running locally on your own machine. There’s no reliance on application programming interface (API) keys, no need for an internet connection, and no risk of data leakage.

This toolkit leverages:

  • Python scripts for ingestion and embedding of text documents
  • Ollama, an open-source platform for hosting LLMs locally on your own machine
  • Streamlit for building a user-friendly interface
  • Mistral, DeepSeek, and other open-source models for answering questions in natural language

The underlying Python code for this example is publicly available in the GitHub repository here. Step-by-step guidance on the technical implementation of this project is provided in this supporting document.

Querying Research Like a Chatbot Without the Cloud

The first step in this implementation is creating a Python virtual environment on a personal computer. This keeps the packages and utilities used by this application isolated, so the Python configuration for other applications and programs remains undisturbed. Once the environment is set up, a script reads the investment documents and embeds them using an embedding model. These embeddings capture semantic meaning at a granular level, allowing the LLM to understand and retrieve the document's content.
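To make the ingestion step concrete, the following is a minimal sketch of what such a script could look like, assuming the ollama Python client for local embeddings and chromadb as an on-disk vector store. The file name, the nomic-embed-text embedding model, and the folder layout are illustrative choices, not necessarily those used in the accompanying repository.

```python
# ingest.py -- illustrative sketch of the ingestion/embedding step
# (assumes the `ollama` and `chromadb` Python packages; names and models
# are not taken from the repository and may differ from the actual project)
from pathlib import Path

import chromadb
import ollama

CHUNK_SIZE = 1000     # characters per chunk
CHUNK_OVERLAP = 200   # characters shared between adjacent chunks

def chunk_text(text: str) -> list[str]:
    """Split a document into overlapping chunks for embedding."""
    step = CHUNK_SIZE - CHUNK_OVERLAP
    return [text[i:i + CHUNK_SIZE] for i in range(0, len(text), step)]

# A persistent, on-disk vector store kept entirely on the local machine.
client = chromadb.PersistentClient(path="./vector_store")
collection = client.get_or_create_collection(name="research_docs")

for doc_path in Path("./documents").glob("*.txt"):
    text = doc_path.read_text(encoding="utf-8", errors="ignore")
    for i, chunk in enumerate(chunk_text(text)):
        # Embed each chunk locally via Ollama; no data leaves the machine.
        emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        collection.add(
            ids=[f"{doc_path.stem}-{i}"],
            documents=[chunk],
            embeddings=[emb],
            metadatas=[{"source": doc_path.name, "chunk": i}],
        )
```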

Because the model is hosted via Ollama on a local machine, the documents remain secure and do not leave the analyst’s computer. This is particularly important when dealing with proprietary research, non-public financials (such as those in private equity transactions), or internal investment notes.

A Practical Demonstration: Analyzing Investment Documents

The prototype focuses on digesting long-form investment documents such as earnings call transcripts, analyst reports, and offering statements. Once a TXT document is placed in the designated folder on the personal computer, the model processes it and is ready to interact. The implementation also supports a variety of other document types, from Microsoft Word (.docx) and web pages (.html) to PowerPoint presentations (.pptx). The analyst can then query the document through the chosen model in a simple chatbot-style interface rendered in a local web browser.
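For illustration, reducing those formats to plain text before embedding could be handled along these lines. The sketch assumes the python-docx, python-pptx, and beautifulsoup4 packages; the project's own loaders may be implemented differently.

```python
# loaders.py -- illustrative text extraction for the supported formats
# (assumes python-docx, python-pptx and beautifulsoup4 are installed;
# a sketch only, not the repository's actual loader code)
from pathlib import Path

from bs4 import BeautifulSoup
from docx import Document
from pptx import Presentation

def load_text(path: Path) -> str:
    """Return the plain text of a .txt, .docx, .html, or .pptx file."""
    suffix = path.suffix.lower()
    if suffix == ".docx":
        return "\n".join(p.text for p in Document(str(path)).paragraphs)
    if suffix in {".html", ".htm"}:
        html = path.read_text(encoding="utf-8", errors="ignore")
        return BeautifulSoup(html, "html.parser").get_text(" ")
    if suffix == ".pptx":
        return "\n".join(
            shape.text_frame.text
            for slide in Presentation(str(path)).slides
            for shape in slide.shapes
            if shape.has_text_frame
        )
    return path.read_text(encoding="utf-8", errors="ignore")  # .txt fallback
```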

The interface itself is powered by Streamlit. Although it opens in a web browser, the application does not interact with the internet; the browser is simply a convenient way to render the user interface, and the design could be adapted to a command-line interface or other front ends. For example, after ingesting an AAPL earnings call transcript, one may simply ask:

“What does Tim Cook do at AAPL?”

Within seconds, the LLM parses the content from the transcript and returns:

“…Timothy Donald Cook is the Chief Executive Officer (CEO) of Apple Inc…”

The result can be cross-verified within the tool, which shows exactly which pages the information was pulled from. With a mouse click, the user can expand the “Source” items listed below each response in the browser-based interface; the sources feeding into the answer are rank-ordered by relevance, and the program can be configured to list a different number of source references. This feature enhances transparency and trust in the model’s outputs.
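Behind such an interface, the retrieve-then-answer loop with source attribution could be sketched roughly as follows, assuming Streamlit's chat elements, the vector store from the earlier ingestion sketch, and a locally served model called via ollama.chat. The model name, prompts, and variable names are illustrative.

```python
# app.py -- illustrative Streamlit chat loop with source attribution
# (a sketch, not the repository's actual interface; assumes the vector
# store created in the ingestion sketch above)
import chromadb
import ollama
import streamlit as st

TOP_K = 4  # number of retrieved chunks fed into each answer

collection = chromadb.PersistentClient(path="./vector_store").get_collection("research_docs")

question = st.chat_input("Ask a question about the loaded documents")
if question:
    # Embed the question with the same local model used at ingestion time.
    q_emb = ollama.embeddings(model="nomic-embed-text", prompt=question)["embedding"]
    hits = collection.query(query_embeddings=[q_emb], n_results=TOP_K)
    context = "\n\n".join(hits["documents"][0])

    # Ask the locally hosted LLM to answer using only the retrieved context.
    reply = ollama.chat(
        model="mistral",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )

    with st.chat_message("assistant"):
        st.markdown(reply["message"]["content"])
        # Rank-ordered sources behind the answer, expandable with a click.
        with st.expander("Source"):
            for doc, meta in zip(hits["documents"][0], hits["metadatas"][0]):
                st.markdown(f"**{meta['source']}** (chunk {meta['chunk']}): {doc[:200]}...")
```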

Model Switching and Configuration for Enhanced Performance

One standout feature is the ability to switch between LLMs with a single click. The demonstration cycles among open-source models such as Mistral, Mixtral, Llama, and DeepSeek, showing that different models can be plugged into the same architecture to compare performance or improve results. Ollama, installed locally, provides this flexibility: as more open-source models become available (or existing ones are updated), they can be downloaded or refreshed accordingly.
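Exposed in the interface, model switching might look like the short sketch below, which assumes ollama.list() for enumerating locally installed models and a Streamlit selectbox; the actual implementation in the repository may differ.

```python
# Illustrative model selector: switch among locally installed Ollama models
# (e.g., mistral, mixtral, llama3, deepseek-r1) with a single click.
import ollama
import streamlit as st

# ollama.list() reports the models currently downloaded to this machine;
# in older client versions the key may be "name" rather than "model".
installed = [m["model"] for m in ollama.list()["models"]]
chosen_model = st.sidebar.selectbox("Model", installed)

# The chosen name is then passed to ollama.chat(model=chosen_model, ...),
# so the rest of the pipeline is unchanged when a model is swapped in.
```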

This flexibility is crucial. It allows analysts to test which models best suit the nuances of a particular task, e.g., legal language, financial disclosures, or research summaries, all without needing paid APIs or enterprise-wide licenses.

Other dimensions of the setup can be adjusted to improve performance for a given task. These settings are typically controlled by a standalone file, named “config.py” in this project. For example, the similarity threshold used when matching chunks of text can be raised (say, above 0.9) to keep only very close matches. This reduces noise but may miss semantically related results if the threshold is too tight for the chosen context.

Likewise, a minimum chunk length can be set to weed out very short chunks of text that are unhelpful or misleading. Chunk size and chunk overlap also matter: together they determine how the document is split into pieces for analysis. Larger chunks provide more context per answer but may dilute the focus of the final response, while overlap maintains continuity between adjacent chunks so the model can interpret information that spans multiple parts of the document.

Finally, the user must decide how many of the top retrieved chunks to feed into the final answer. This is a balance between speed and relevance: too many chunks can slow the tool and introduce distractions, while too few risk missing important context that is not discussed in close proximity within the document. In conjunction with the different models served via Ollama, the user can tune these configuration parameters to suit the task.
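Collected in one place, these knobs might look like the following illustrative config.py. The parameter names and default values are hypothetical and may not match those in the project's actual configuration file.

```python
# config.py -- illustrative configuration knobs (names and values are
# hypothetical, not necessarily those used in the project's config.py)

# Retrieval quality: similarity cut-off for accepting a matched chunk.
# Higher values (e.g., > 0.9) keep only very close matches and reduce noise,
# at the risk of dropping semantically related but less literal passages.
SIMILARITY_THRESHOLD = 0.75

# Very short chunks tend to be headers or fragments; discard them.
MIN_CHUNK_LENGTH = 100        # characters

# How the document is split: larger chunks carry more context per answer
# but can dilute focus; overlap preserves continuity across chunk boundaries.
CHUNK_SIZE = 1000             # characters per chunk
CHUNK_OVERLAP = 200           # characters shared by adjacent chunks

# How many top-ranked chunks feed each answer: more chunks add context
# (and latency and potential distraction); fewer risk missing material
# that is spread across the document.
TOP_K_RESULTS = 4

# Default local model served by Ollama; can be swapped per task.
DEFAULT_MODEL = "mistral"
```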

Scaling for Research Teams

While the demonstration originated in the equity research space, the implications are broader. Fixed income analysts can load offering statements and contractual documents related to Treasury, corporate, or municipal bonds. Macro researchers can ingest Federal Reserve speeches or economic outlook documents from central banks and third-party researchers. Portfolio teams can pre-load investment committee memos or internal reports. Buy-side analysts in particular work with large volumes of research. For example, the hedge fund Marshall Wace processes over 30 petabytes of data each day, equivalent to nearly 400 billion emails.

Accordingly, the overall process in this framework is scalable:

  • Add more documents to the folder
  • Rerun the embedding script that ingests these documents
  • Start interacting/querying

All these steps can be executed in a secure, internal environment that costs nothing to operate beyond local computing resources.
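For illustration, the three steps could be wired into a single batch pass over the documents folder, skipping files embedded on earlier runs. This sketch reuses the hypothetical helpers from the earlier snippets and is not the repository's actual script.

```python
# reingest.py -- illustrative batch pass over the documents folder
# (assumes the chunk_text and load_text helpers from the earlier sketches,
# imported here from the hypothetical ingest.py / loaders.py modules)
from pathlib import Path

import chromadb
import ollama

from ingest import chunk_text      # hypothetical helper from the ingestion sketch
from loaders import load_text      # hypothetical helper from the loader sketch

collection = chromadb.PersistentClient(path="./vector_store").get_or_create_collection("research_docs")
already_indexed = set(collection.get()["ids"])   # chunk ids stored on prior runs

for doc_path in sorted(Path("./documents").iterdir()):
    if doc_path.suffix.lower() not in {".txt", ".docx", ".html", ".pptx"}:
        continue
    if f"{doc_path.stem}-0" in already_indexed:  # already embedded previously
        continue
    for i, chunk in enumerate(chunk_text(load_text(doc_path))):
        emb = ollama.embeddings(model="nomic-embed-text", prompt=chunk)["embedding"]
        collection.add(
            ids=[f"{doc_path.stem}-{i}"],
            documents=[chunk],
            embeddings=[emb],
            metadatas=[{"source": doc_path.name, "chunk": i}],
        )
```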

Putting AI in Analysts’ Hands — Securely

The rise of generative AI need not mean surrendering data control. By configuring open-source LLMs for private, offline use, analysts can build in-house applications like the chatbot discussed here that are just as capable as some commercial alternatives, and considerably more secure.

This “Private GPT” concept empowers investment professionals to:

  • Use AI for document analysis without exposing sensitive data
  • Reduce reliance on third-party tools
  • Tailor the system to specific research workflows

The full codebase for this application is available on GitHub and can be extended or tailored for use in any institutional investment setting. The architecture offers several points of flexibility that let the end user adapt it to a specific use case. Built-in features for examining the sources behind each response help verify the tool’s accuracy and guard against the common LLM pitfall of hallucination. The repository is meant to serve as a guide and starting point for building downstream, local applications that are ‘fine-tuned’ to enterprise-wide or individual needs.

Generative AI doesn’t have to compromise privacy and data security. When used cautiously, it can augment the capabilities of professionals and help them analyze information faster and better. Tools like this put generative AI directly into the hands of analysts — no third-party licenses, no data compromise, and no trade-offs between insight and security.


