Leveraging Large Language Models in the Solar PV Industry to Unlock the Power of Historical Data

Maksim Markevich
Jun 4, 2023

Historical project data: WHY do we need it?

Historical project data in the construction industry refers to the collection and analysis of data from previous construction projects. This data includes information from different project stages, such as design, pre-construction, procurement, construction, and maintenance. It can include construction drawings, schedules, specifications, performance metrics, materials used, changes, issues, and more.

In practice, several technical steps sit between data collection and data analysis, such as data cleaning/preprocessing and data normalization. But before diving deeper into the details, let's figure out why we need historical project data:

Design Optimization. Historical project data can inform the design process by indicating which design choices led to the best performance in previous projects.
Cost Estimation and Budgeting. By analyzing the costs associated with past projects, companies can more accurately estimate the cost of future projects and allocate their budgets more effectively.
Construction Planning and Management. Data on construction timelines and challenges from past projects can help project managers plan construction activities more effectively, anticipate potential issues, and keep construction on schedule and within budget.
Risk Assessment. Historical project data can be used to identify risks associated with certain design choices or construction activities. This can inform risk management strategies and contingency planning.
Efficiency Improvements. Analysis of historical project data can identify trends and patterns that indicate opportunities for improving the efficiency and effectiveness of construction projects.
Training and Education. Historical project data can be used to train new employees or educate stakeholders about design and construction.

Much of the collective knowledge in historical project data lives in the experience and memories of employees. It is passed on informally through conversations between colleagues and handed down from one generation of staff to the next. While this kind of knowledge transfer happens naturally, there is ample room to make it more systematic and efficient.

WHY defines WHAT, WHAT defines HOW

The motivation or goal (the WHY) varies greatly, as we saw above, and it determines WHAT data we need and HOW we should gather it. Let's dive deeper into a few examples of this WHY/WHAT/HOW relationship for historical project data.

Example 1: Improve Feasibility Analysis

WHY: The goal is to enhance the accuracy and reliability of feasibility analyses to minimize risks and ensure resources are allocated to viable projects.

WHAT: To accomplish this goal, we need data on projected vs actual costs, design economics, value metrics, timelines and any challenges or successes encountered along the way.

HOW: Collecting this data can involve a multi-faceted approach. Detailed project reports, financial records, and post-project reviews can all provide valuable insights. Once gathered, the data can be analyzed to identify patterns, correlations, or trends that could inform future feasibility analyses.
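To make the analysis step concrete, here is a minimal Python sketch that compares projected and actual costs across past projects. The project names and figures are hypothetical placeholders, and four projects are far too few for a meaningful correlation; the point is only to show the shape of the calculation (an overrun percentage per project, then a simple Pearson correlation against project size).

```python
from statistics import mean

# Hypothetical past projects: (name, DC capacity in MW, projected cost, actual cost).
# Illustrative placeholders only, not real project data.
projects = [
    ("P1", 5.0, 4_000_000, 4_400_000),
    ("P2", 20.0, 15_000_000, 15_300_000),
    ("P3", 50.0, 36_000_000, 41_400_000),
    ("P4", 10.0, 8_000_000, 8_000_000),
]

def overrun_pct(projected, actual):
    """Cost overrun as a percentage of the projected (budgeted) cost."""
    return (actual - projected) * 100 / projected

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

overruns = [overrun_pct(p, a) for _, _, p, a in projects]
sizes = [mw for _, mw, _, _ in projects]

print(f"Mean overrun: {mean(overruns):.1f}%")
print(f"Size vs overrun correlation: {pearson(sizes, overruns):.2f}")
```

With a real portfolio, the same loop could be grouped by contractor, region, or technology to look for the patterns that feed future feasibility studies.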

Example 2: Improve Decision-Making Process

WHY: The objective is to strengthen the decision-making process, allowing for more informed, data-driven decisions that lead to better outcomes.

WHAT: To support this objective, we need data that provides insights into the impacts and outcomes of past decisions. In the construction context, this might include data on chosen construction methods, hired contractors, design changes, and their impacts on cost, time, and the overall quality of the finished buildings.

HOW: Getting this data isn't just about looking at records and reports. The real treasure trove of decision-making info often comes from daily chats, meetings where important changes get hashed out, and the ongoing updates that shape the project over time. These informal but vital data sources show us how decisions were made, what other options were on the table, and how changes rolled out. Reviewing this data regularly helps decision-makers understand the results of their choices and adjust their plans as projects evolve, which leads to smoother execution and better outcomes for design and construction projects.
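One way to capture these informal sources is to record each decision in a consistent structure. The sketch below uses a hypothetical schema (the `DecisionRecord` class and its field names are assumptions, not an industry standard) that stores what was decided, which alternatives were considered, why, and where the information came from.

```python
from dataclasses import asdict, dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class DecisionRecord:
    """One project decision captured from a meeting, chat, or change log (hypothetical schema)."""
    project: str
    decided_on: date
    decision: str
    alternatives: List[str] = field(default_factory=list)  # options that were on the table
    rationale: str = ""
    source: str = ""                             # e.g. "weekly site meeting minutes"
    cost_impact: Optional[float] = None          # None until the impact is known
    schedule_impact_days: Optional[int] = None   # None until the impact is known

# Hypothetical example entry.
record = DecisionRecord(
    project="Site A",
    decided_on=date(2022, 3, 14),
    decision="Switch from fixed-tilt racking to single-axis trackers",
    alternatives=["Keep fixed-tilt racking"],
    rationale="Higher expected energy yield",
    source="Design review meeting notes",
)
print(asdict(record))
```

Once decisions live in records like this, a question such as "which design changes had a known cost impact?" becomes a simple filter rather than a search through old chat threads.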

HOW is linked back to WHY

Undeniably, the "HOW" in both examples goes well beyond merely collating data from concrete sources such as drawing packages or purchase orders. The process of procuring relevant data is frequently a complex and nuanced endeavour, encompassing many aspects related to project execution. By keeping our purpose front and centre, we ensure that we're not just collecting data for the sake of it.

Solar PV industry features

The solar photovoltaic (PV) industry is a relative newcomer to the energy scene, particularly when benchmarked against traditional stalwarts like coal, oil, and natural gas. Furthermore, even within the realm of renewable technologies, commercial-scale electricity generation from solar PV is a recent phenomenon, with widespread adoption taking root in the early years of the 21st century.

Yet, despite its youth, the solar PV industry has charted a stellar trajectory in recent years. The United States alone installed 20.2 GW of solar PV capacity in 2022. This rapid growth has created strong demand for EPC (Engineering, Procurement, and Construction) contractors, the technical experts who spearhead solar PV projects from concept to completion.

Certain EPC contractors have ridden this wave of opportunity, rapidly expanding their operations to meet the burgeoning demand. Each one of the top 5 EPC contractors in the US, for instance, catapulted from zero revenue in 2018 to a staggering ~$1 billion in revenue in 2022.

This trend offers two pivotal insights from a historical project data standpoint:

The industry is not just growing, but it is also evolving and innovating at breakneck speed. Improvements in solar cell efficiency, advances in panel design, and the integration of energy storage solutions, among others, underpin the technological progress in solar PV.
However, despite this rapid progress, a reliable corpus of historical project data is yet to emerge. Given that the widespread adoption of solar PV started only in the early 2000s, there aren't many projects that have completed full operational lifecycles. Consequently, we lack comprehensive data sets that could potentially shed light on the long-term reliability of specific systems or technologies.

In a fast-paced, innovative industry like solar PV, experimentation is part of the game. Trying out new technologies, implementing novel design practices, tinkering with financial models – it's all part of pushing the boundaries. But not every experiment is a resounding success, and this process can generate a considerable amount of data that may seem, at first glance, less than useful.

However, labelling certain data as ‘useless’ might be a bit harsh. Instead, we can view data as existing on a spectrum from ‘high-value’ to ‘low-value’. Remember, what may appear as a failure or a dead-end at one point might turn into a valuable lesson or a stepping-stone to a new idea later on.

Take, for example, data related to obsolete technologies or systems. At first glance, this might seem to be on the ‘low-value’ end of the spectrum. However, this data can become ‘high-value’ when used for system maintenance or upgrades.

Similarly, data on outdated design practices might initially appear to be ‘low-value’. Yet, when we translate these old practices into robust principles or ‘lessons learned’, they transition into the ‘high-value’ category.

Even financial data, such as information on initial installation costs, ongoing operational expenses, and cost of capital for different projects and technologies, may seem ‘low-value’ when collected over long periods due to inflation and currency fluctuations. But when we normalise these figures to ensure relevance and comparability across different contexts, they become ‘high-value’.

Normalising cost data means adjusting it so that fair comparisons can be made across different contexts, time periods, or other variables. It removes the influence of certain external factors, such as inflation or project size, so that underlying trends or patterns can be seen more clearly. This is particularly important in the solar PV industry, given rapid technological change, growing economies of scale, and shifting regulatory frameworks.
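As a minimal illustration, the sketch below applies two common normalisation steps: converting nominal costs into base-year money with a price index, then dividing by installed DC capacity to get cost per watt. The index values and project figures are made-up placeholders; a real analysis would use a published price or construction cost index.

```python
# Illustrative price-index values (base year 2022). These are placeholders;
# substitute a published index (e.g. a national CPI or construction cost index).
PRICE_INDEX = {2018: 100.0, 2020: 104.0, 2022: 117.0}

def to_base_year(cost, year, base_year=2022, index=PRICE_INDEX):
    """Convert a nominal cost incurred in `year` into `base_year` money."""
    return cost * index[base_year] / index[year]

def cost_per_watt(cost, capacity_mw_dc):
    """Normalise a total cost by installed DC capacity (currency per Wdc)."""
    return cost / (capacity_mw_dc * 1_000_000)

# (name, completion year, nominal total cost, DC capacity in MW): hypothetical
projects = [
    ("P1", 2018, 6_000_000, 5.0),
    ("P2", 2022, 52_650_000, 50.0),
]

for name, year, cost, mw in projects:
    real_cost = to_base_year(cost, year)
    print(f"{name}: {cost_per_watt(real_cost, mw):.2f} per Wdc (2022 money)")
```

In this toy example, P1's nominal figure (1.20 per Wdc) understates its cost in 2022 money (about 1.40 per Wdc), which is exactly the distortion normalisation is meant to remove.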

Normalising designs in the same way as cost data is more challenging because design is largely qualitative. However, it's still possible to create standards or reference points that serve as a form of normalisation, such as efficiency indexes, complexity metrics, and performance indicators.
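One widely used performance indicator for PV plants is the performance ratio (PR), defined in IEC 61724 as the final yield (AC energy per kWp of installed capacity) divided by the reference yield (plane-of-array irradiation divided by the 1 kW/m² reference irradiance). Because it factors out plant size and the amount of sunlight received, it lets designs at different sites be compared on a similar footing. A minimal sketch that ignores temperature correction and other refinements, using hypothetical plant figures:

```python
G_STC = 1.0  # reference irradiance at standard test conditions, kW/m²

def performance_ratio(energy_ac_kwh, capacity_kwp, poa_irradiation_kwh_m2):
    """Performance ratio = final yield / reference yield (dimensionless)."""
    final_yield = energy_ac_kwh / capacity_kwp        # kWh per kWp
    reference_yield = poa_irradiation_kwh_m2 / G_STC  # equivalent full-sun hours
    return final_yield / reference_yield

# Hypothetical annual figures for two plants of different size and location:
# (AC energy in kWh, capacity in kWp, plane-of-array irradiation in kWh/m²)
plants = {
    "Site A": (1_600_000, 1_000, 2_000),
    "Site B": (11_700_000, 7_500, 1_950),
}
for name, args in plants.items():
    print(f"{name}: PR = {performance_ratio(*args):.2f}")
```

Here the two plants differ in size and sunshine yet end up with the same PR, which is the kind of like-for-like comparison a design reference point should enable.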

With the pace of advancement in our field, what seems like low-value data today might be a treasure trove tomorrow. That's why, storage permitting, it's smart to keep data in its most unprocessed form. This raw data can be manipulated and examined in ways we might not even envision right now, as future methodologies could unlock previously unseen value.

Sure, not all data will shine. Some might be forever confined to the dusty archives of solar PV history, and others, like incomplete or incorrect data, could lead us astray. There may even be old policy data we'd rather not remember. But as long as we can afford to store it, it's worth keeping.

Time might change our perspective, or new contexts might highlight unexpected insights in these overlooked data sets. By storing this data, we hedge against prematurely discarding something that could, with future understanding or techniques, prove to be worthwhile.
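A simple way to follow the "keep it raw" advice is to store every incoming file byte-for-byte and derive cleaned views from it later. The sketch below shows one hypothetical approach, a content-addressed folder where each file is named by its SHA-256 hash with a small metadata file alongside; the function names and folder layout are assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def store_raw(payload: bytes, raw_dir: Path, meta: dict) -> str:
    """Save bytes unmodified under their SHA-256 hash and return the hash.

    Identical content is stored once; the metadata from the first save is kept.
    """
    digest = hashlib.sha256(payload).hexdigest()
    raw_dir.mkdir(parents=True, exist_ok=True)
    blob = raw_dir / digest
    if not blob.exists():
        blob.write_bytes(payload)
        (raw_dir / f"{digest}.meta.json").write_text(json.dumps(meta))
    return digest

def load_raw(digest: str, raw_dir: Path) -> bytes:
    """Read back the original bytes, e.g. to re-process them with a newer method."""
    return (raw_dir / digest).read_bytes()

# Usage in a throwaway directory (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    raw_dir = Path(tmp) / "raw"
    digest = store_raw(b"2019 site meeting notes", raw_dir, {"source": "email archive"})
    restored = load_raw(digest, raw_dir)
    duplicate = store_raw(b"2019 site meeting notes", raw_dir, {"source": "copy"})
    first_meta = json.loads((raw_dir / f"{digest}.meta.json").read_text())
```

Because the raw bytes are never modified, any future cleaning or analysis method can be rerun against exactly what was originally collected.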

Large language models for historical project data in the solar PV industry

Large language models and historical project data in the AEC

Digging into historical data from solar PV farms, or really from any project, can be a pretty complex puzzle. You've got all sorts of tools at your disposal, from data crunchers and stats software to prediction tools and machine learning. But right now, there's a new kid on the block that's getting a lot of buzz: Large Language Models.

So, how could these Large Language Models help us dig into historical project data? First, what is one? A Large Language Model is kind of like a super-smart text generator: a computer program trained to understand and generate human-like text. You give it some input, say a question or a prompt, and it predicts what comes next based on what it learned from training on a huge amount of text data. It's like having a digital buddy that can chat, write, and even help you solve problems across many topics.
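To give a feel for "predicting what comes next", here is a deliberately tiny Python sketch: a word-level bigram model that counts which word follows which in a toy text and predicts the most frequent follower. Real LLMs use neural networks trained on subword tokens with far longer context, so this is only an analogy for the prediction step, not how an LLM is actually built.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = follows.get(word.lower())
    return counts.most_common(1)[0][0] if counts else None

corpus = (
    "the inverter tripped again . the inverter was replaced . "
    "the tracker stalled in high wind ."
)
model = train_bigrams(corpus)
print(predict_next(model, "the"))  # prints: inverter ("inverter" follows "the" twice)
```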

Let's explore the practical applications of Large Language Models (LLMs) in handling historical project data, focusing on their current capabilities rather than potential future advancements:

Data Preparation:
- Natural Language Processing: LLMs excel at processing and understanding unstructured written data such as meeting logs, conversation transcripts, emails, and reports. These models can extract key decisions, critical events, and important discussions from the data, converting unstructured information into a structured format for easier analysis and reference. This makes a potentially difficult-to-navigate archive accessible and searchable, fostering more effective learning from past projects and ensuring continuity even when team members change.
Data Analysis and Interpretation:
- Interactive Querying and Insight Extraction: By being fine-tuned on historical project data, LLMs can serve as interactive interfaces for exploring that data. Users can pose questions in natural language, and the LLM can respond with relevant information extracted directly from the data. This creates a more efficient process for finding information and extracting insights compared to manual searching, particularly with large volumes of text.
- Facilitating Advanced Data Analysis: Although LLMs are not reliable tools for advanced numerical tasks such as predictions or forecasts, they can support these processes by providing clear, understandable descriptions of the data being analyzed and by summarizing complex technical results in plain terms.
Knowledge Base for Training and Education:
- Creating an Interactive Knowledge Base: An LLM, fine-tuned on your historical project data, essentially forms an interactive knowledge base that encapsulates the information within that data. This knowledge base can be queried in natural language, making it a user-friendly tool for learning about past projects and decisions. This proves particularly valuable for onboarding new employees, allowing them to interact with the LLM to familiarize themselves with the company's history and practices quickly.
- Ongoing Learning and Improvement: The utility of this LLM-powered knowledge base is not static. As new projects are completed, and more data is generated, the LLM can be further fine-tuned on this new data, enabling it to provide more up-to-date and relevant responses. Additionally, based on user feedback, the model can be adjusted and validated, ensuring it remains a reliable source of information. This allows the knowledge base to evolve with the company, supporting continuous learning and improvement.
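A minimal sketch of the data-preparation step: ask an LLM to turn meeting notes into JSON, then validate the reply before trusting it. The `complete` function is a placeholder for whichever LLM client you use (no specific vendor API is assumed), and the prompt wording and field names are hypothetical.

```python
import json
from typing import Callable, Dict, List

# Hypothetical prompt and field names, for illustration only.
PROMPT_TEMPLATE = (
    "Extract every decision from the meeting notes below. Reply with only a JSON "
    'list of objects with the keys "decision", "alternatives" and "rationale".\n\n'
    "Meeting notes:\n{notes}"
)
REQUIRED_KEYS = {"decision", "alternatives", "rationale"}

def extract_decisions(notes: str, complete: Callable[[str], str]) -> List[Dict]:
    """Turn free-text notes into structured decision records via an LLM.

    `complete` is any function that sends a prompt to an LLM and returns its text
    reply. The reply is validated because model output can be malformed or incomplete.
    """
    reply = complete(PROMPT_TEMPLATE.format(notes=notes))
    try:
        items = json.loads(reply)
    except json.JSONDecodeError:
        return []  # unparseable reply: return nothing rather than guess
    if not isinstance(items, list):
        return []
    # Keep only well-formed records that contain every required key.
    return [
        item for item in items
        if isinstance(item, dict) and REQUIRED_KEYS.issubset(item)
    ]

def fake_llm(prompt):
    """Stand-in for a real LLM call, returning a canned reply for offline testing."""
    return ('[{"decision": "Use single-axis trackers", "alternatives": ["fixed-tilt"], '
            '"rationale": "higher yield"}, {"decision": "incomplete item"}]')

decisions = extract_decisions("(meeting notes text)", fake_llm)
```

Swapping `fake_llm` for a real client call is the only change needed to run this against actual notes; the validation step stays the same either way.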

Wrapping up, it's all about knowing what tools you've got in your toolbox and when to use them. Think of Large Language Models as the Swiss Army knives of language understanding: they're fantastic for sifting through a mountain of text, turning a messy heap of meeting notes into a neatly organized database, or serving as a handy guidebook for new team members. They're not the go-to tool for number-crunching tasks like trend analysis or forecasting; that's where your stats and math-focused tools come in.
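This division of labour can be sketched as follows: compute the numbers in ordinary code, then hand the LLM only the finished figures and ask it to explain them. The function name, prompt wording, and sample performance-ratio values below are hypothetical.

```python
from statistics import mean, stdev

def build_summary_prompt(metric_name, values):
    """Compute the statistics in code; ask the LLM only to explain them."""
    stats = {
        "n": len(values),
        "mean": round(mean(values), 3),
        "stdev": round(stdev(values), 3),
        "min": min(values),
        "max": max(values),
    }
    facts = ", ".join(f"{key}={value}" for key, value in stats.items())
    prompt = (
        f"Explain these statistics for {metric_name} in two plain-language "
        f"sentences for a project manager. Do not calculate anything new. {facts}"
    )
    return stats, prompt

# Hypothetical annual performance ratios from four past projects.
stats, prompt = build_summary_prompt("performance ratio", [0.78, 0.81, 0.80, 0.83])
print(prompt)  # this text would then be sent to an LLM of your choice
```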

The big wow factor with LLMs is their knack for speaking our language. Give them a bunch of text data, and they'll help you dig up the nuggets of gold buried in there. Even better, they're learners just like us, getting better and more tuned in to your specific needs the more data they digest.

But hey, remember - no tool can do it all. It's all about mixing and matching, playing to each tool's strengths. LLMs have got the language part nailed, but they're one part of the puzzle. Pair them up with your number-crunching tools and, most importantly, your human know-how, and you've got yourself a well-rounded, powerhouse team ready to make the most of your project data. Sounds like a win to me!