DIVITIAE.AI's architecture integrates modern artificial intelligence with diverse data sources to deliver high-quality market intelligence to retail investors.
Technology
Overall Technology Architecture
DIVITIAE.AI's Data Processing Platform handles both quantitative and qualitative data. Within the platform, the orchestrator artifact defines the underlying infrastructure and automates job execution. All jobs leverage our custom software package, DIVITIAE ETL, which streamlines common functions. Once processed, the data is stored in our data lake, ready for use by the DIVITIAE application.
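To make the job structure concrete, the sketch below shows what a job built on a shared extract-transform-load helper might look like. The function name `run_etl_job` and its signature are illustrative only, not the actual DIVITIAE ETL interface.

```python
from typing import Any, Callable, Iterable


def run_etl_job(
    extract: Callable[[], Iterable[Any]],
    transform: Callable[[Any], Any],
    load: Callable[[list[Any]], None],
) -> int:
    """Run one extract-transform-load pass; return the number of records loaded."""
    records = [transform(record) for record in extract()]
    load(records)  # e.g. write the batch to the data lake
    return len(records)


# Toy usage: extract three numbers, scale them, "load" them into a list.
sink: list[Any] = []
count = run_etl_job(lambda: [1, 2, 3], lambda x: x * 10, sink.extend)
```

Centralizing the extract/transform/load plumbing in one helper is what lets every job share common functions such as logging, retries, and storage writes.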
Learn more about each component
Quantitative Data
Polygon.io is a financial data provider specializing in real-time and historical market data for stocks and cryptocurrencies. DIVITIAE.AI uses Polygon.io to pull market prices of stocks.
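As a rough sketch, pulling daily price bars follows the pattern of Polygon.io's public aggregates REST endpoint. Only the URL construction is shown; the actual request additionally requires an `apiKey` query parameter, which is omitted here.

```python
POLYGON_BASE = "https://api.polygon.io"


def polygon_aggs_url(ticker: str, start: str, end: str,
                     multiplier: int = 1, timespan: str = "day") -> str:
    """Build the URL for Polygon.io's aggregate-bars (OHLC) endpoint.

    start and end are YYYY-MM-DD date strings; the authenticated request
    itself (apiKey parameter) is omitted in this sketch.
    """
    return (f"{POLYGON_BASE}/v2/aggs/ticker/{ticker}"
            f"/range/{multiplier}/{timespan}/{start}/{end}")


url = polygon_aggs_url("AAPL", "2024-01-02", "2024-01-31")
```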
SEC API provides access to financial and regulatory data on publicly traded companies. DIVITIAE.AI extracts information from annual and quarterly reports (10-K and 10-Q filings) to compute intrinsic values.
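One common way to turn filing fundamentals into an intrinsic value is a discounted cash flow (DCF) model; the exact valuation method DIVITIAE.AI applies is not specified here, so the sketch below is an assumption meant only to illustrate the computation.

```python
def dcf_intrinsic_value(free_cash_flow: float, growth: float,
                        discount: float, years: int = 10,
                        terminal_growth: float = 0.02) -> float:
    """Intrinsic value via DCF: discount projected free cash flows,
    plus a Gordon-growth terminal value after the projection horizon."""
    value, cash = 0.0, free_cash_flow
    for t in range(1, years + 1):
        cash *= 1 + growth                    # project next year's cash flow
        value += cash / (1 + discount) ** t   # discount it back to today
    # Terminal value assumes cash flows grow at terminal_growth forever.
    terminal = cash * (1 + terminal_growth) / (discount - terminal_growth)
    return value + terminal / (1 + discount) ** years
```

For example, a flat 100-per-year cash flow discounted at 10% with no terminal growth over a one-year horizon values to exactly 1000 (100/1.1 for the year plus 1000/1.1 terminal).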
Qualitative Data
NewsCatcher API aggregates news articles from sources across the web, providing a streamlined way to access up-to-date news content. DIVITIAE.AI uses the API to source articles for its news summaries.
Finnhub is an API provider that offers financial data and market information to developers, investors, and businesses. DIVITIAE.AI uses the API to collect earnings call data, including transcripts.
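As a sketch of how these two qualitative feeds might be queried, the helpers below build request specifications (URL, headers, parameters) without performing the calls. The endpoint paths are assumptions based on the providers' public documentation and may differ by API version or subscription tier.

```python
def newscatcher_request(query: str, api_key: str) -> tuple[str, dict, dict]:
    """Request spec for a NewsCatcher article search (v2-style path is
    an assumption; authentication uses an x-api-key header)."""
    url = "https://api.newscatcherapi.com/v2/search"
    headers = {"x-api-key": api_key}
    params = {"q": query, "lang": "en"}
    return url, headers, params


def finnhub_transcript_request(transcript_id: str, api_key: str) -> tuple[str, dict]:
    """Request spec for a Finnhub earnings-call transcript (path per
    Finnhub's public docs; transcripts require a premium plan)."""
    url = "https://finnhub.io/api/v1/stock/transcripts"
    params = {"id": transcript_id, "token": api_key}
    return url, params
```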
RAG Pipeline
For summary generation, DIVITIAE uses a Retrieval-Augmented Generation (RAG) pipeline, which enhances large language models (LLMs) by integrating external knowledge sources. RAG lets us access up-to-date news and earnings call data and condense it into concise summaries.
The user's query is transformed into a prompt for retrieving proprietary data. This prompt is then combined with the retrieved context and LLM instructions to form a template, which serves as the input to the final LLM that generates the summary.
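The query-to-template flow above can be sketched as follows. The keyword-overlap retriever is a deliberately toy stand-in for the real retrieval step (typically a vector store), and the template wording is illustrative, not DIVITIAE's actual prompt.

```python
def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy keyword-overlap retriever standing in for a vector store."""
    terms = set(query.lower().split())
    return sorted(corpus,
                  key=lambda doc: -len(terms & set(doc.lower().split())))[:k]


def build_rag_prompt(query: str, context_docs: list[str],
                     instructions: str) -> str:
    """Combine instructions, retrieved context, and the query into one template."""
    context = "\n\n".join(context_docs)
    return f"{instructions}\n\nContext:\n{context}\n\nQuery: {query}\nSummary:"


docs = ["Apple earnings call highlights services growth.",
        "Weather tomorrow is sunny."]
prompt = build_rag_prompt("Apple earnings summary",
                          retrieve("Apple earnings", docs, k=1),
                          "Summarize the context for a retail investor.")
```

The key property is that only retrieved context enters the template, so the LLM summarizes current proprietary data rather than relying on its training knowledge.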
RAG Evaluation
Given the diverse elements of the RAG pipeline and potential future developments, we adopted a holistic evaluation approach across modeling combinations, testing 44 combinations in total.
The evaluation pipeline employs 47 metrics, grouped into five categories: count metrics, semantic similarity metrics, n-gram metrics, RAGAS metrics, and inference time.
The heatmap shows the indexed results. The 21st modeling combination was selected as the best model for both news and earnings call summaries, as it offered the best balance of semantic similarity and RAGAS metrics.
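Putting 47 heterogeneous metrics on one heatmap requires indexing them onto a common scale. A simple way to do this is the min-max normalization below; whether this is the exact indexing used for the heatmap is an assumption. Lower-is-better metrics such as inference time are inverted so that higher always means better.

```python
def normalize(values: list[float], higher_is_better: bool = True) -> list[float]:
    """Min-max normalize one metric across modeling combinations to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # a constant metric carries no ranking signal
        return [0.5] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    # Invert lower-is-better metrics (e.g. inference time).
    return scaled if higher_is_better else [1.0 - s for s in scaled]
```

Applied per metric across all 44 combinations, this yields the comparable rows of the heatmap regardless of each metric's native units.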
Normalized Evaluation Metrics Across modeling combinations (News)
Normalized Evaluation Metrics Across modeling combinations (Earnings Calls)
RAG Components
The RAG pipeline comprises several key components, each validated by the evaluation pipeline above.
The chosen models and tools enable a robust yet flexible architecture, minimizing the need for extensive pipeline modifications as the MVP and the functional requirements of the end product evolve.