Structuring Context So the Model Actually Uses It

Across all three tiers, the single most impactful variable in context quality is not the volume of information you provide but how well that information is structured. A well-structured small document will consistently outperform a poorly structured large one. This chapter covers the structural practices that apply regardless of which tier you are working in.

The Difference Between Raw and Prepared Context

Raw context is a document in its original form: the PDF as exported from the authoring tool, the email thread as copied from the inbox, the spreadsheet as pasted from the application. Raw context contains everything, including formatting artefacts, boilerplate headers and footers, navigation elements, metadata noise, and irrelevant background material.

Prepared context is the result of deliberately selecting, cleaning, and structuring the information before it reaches the model. It contains what the model needs and omits what it does not. The effort required is modest relative to the improvement it produces in response quality, and it is one of the most underused techniques available to non-technical users.

Removing Noise

The first step in preparing context is removing content that will not contribute to the model's response. This includes:

  • Boilerplate headers, footers, and page numbers

  • Navigation elements and table of contents entries

  • Disclaimer and legal notice sections that are not relevant to the query

  • Formatting characters and encoding artefacts introduced during copy and paste

  • Redundant repetition of information already present elsewhere in the document

None of this content helps the model. All of it consumes context window capacity and dilutes attention from the content that matters.
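For readers who are comfortable running a short script, the clean-up described above can be partly automated. The following is a minimal sketch rather than a complete cleaner: the function name `remove_noise`, the page-number pattern, and the rule that a short line repeated three or more times is a running header or footer are all assumptions made for illustration, and they should be adjusted to the artefacts that actually appear in your documents.

```python
import re
from collections import Counter

# Hypothetical pattern: matches lines such as "3", "Page 3", or "Page 3 of 12".
# Note that it also removes any other line consisting only of a number.
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+(\s+of\s+\d+)?\s*$", re.IGNORECASE)

# Zero-width characters and byte-order marks often introduced by copy and paste.
INVISIBLE = re.compile("[\u200b\u200c\u200d\ufeff]")


def remove_noise(text: str, repeat_threshold: int = 3) -> str:
    """Drop page numbers, repeated header/footer lines, and invisible characters."""
    lines = [
        INVISIBLE.sub("", line).replace("\u00a0", " ").strip()
        for line in text.splitlines()
    ]
    # Assumption: a short line that recurs this often is a running header or footer.
    counts = Counter(line for line in lines if line)
    cleaned = []
    for line in lines:
        if PAGE_NUMBER.match(line):
            continue
        if line and len(line) < 80 and counts[line] >= repeat_threshold:
            continue
        cleaned.append(line)
    # Collapse the runs of blank lines left behind by the removals.
    return re.sub(r"\n{3,}", "\n\n", "\n".join(cleaned)).strip()
```

Because the rules are heuristics, it is worth reading the cleaned output once before using it: a legitimate line that happens to repeat, or a line containing only a figure, would also be removed.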

Labelling and Metadata

After removing noise, the second step is adding labels that give the model structural orientation. A document pasted without any labelling is an undifferentiated block of text. A document with clear section labels, source identifiers, and date metadata is a navigable structure the model can reference precisely.

Effective labelling patterns include:

[SOURCE: Q3 2024 Financial Report | DATE: October 2024 | SECTION: Revenue Summary]

[paste section content here]

[SOURCE: Q3 2024 Financial Report | DATE: October 2024 | SECTION: Cost Analysis]

[paste section content here]

When multiple documents are provided together, source labels allow the model to attribute its response to specific sources and allow you to verify which document a given piece of information came from.
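If you assemble context from several sections regularly, the label pattern above can be generated rather than typed by hand. This is a small sketch under stated assumptions: the helper names `label_section` and `build_context` and the dictionary keys are hypothetical, and the bracketed format simply mirrors the example shown earlier rather than any format the model requires.

```python
def label_section(source: str, date: str, section: str, content: str) -> str:
    """Prefix one section with a bracketed source label."""
    header = f"[SOURCE: {source} | DATE: {date} | SECTION: {section}]"
    return f"{header}\n\n{content.strip()}"


def build_context(sections: list[dict]) -> str:
    """Join labelled sections, separated by blank lines."""
    return "\n\n".join(
        label_section(s["source"], s["date"], s["section"], s["content"])
        for s in sections
    )


# Usage with placeholder content standing in for the real section text.
context = build_context([
    {"source": "Q3 2024 Financial Report", "date": "October 2024",
     "section": "Revenue Summary", "content": "(revenue section text)"},
    {"source": "Q3 2024 Financial Report", "date": "October 2024",
     "section": "Cost Analysis", "content": "(cost section text)"},
])
```

Keeping the label on its own line, separated from the content by a blank line, makes it easy to see at a glance where one source ends and the next begins.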

Summaries as Context Anchors

For long documents where you need the model to understand the overall structure before diving into a specific section, a brief summary at the beginning of the pasted content functions as a context anchor. It orients the model before it processes the detail, improving the coherence of responses that require understanding both the overall argument and specific passages.

Structured Data: Tables and Lists

Structured data such as tables and lists requires particular care in context preparation, because the way tabular information is represented in plain text significantly affects how reliably the model can parse it.

When pasting table data, markdown table format is processed more reliably than space-aligned plain-text columns, which can become misaligned depending on the content.

For large datasets with many columns and rows, including a brief description of what the table represents and which columns are most relevant to the query prevents the model from having to infer the table's purpose from the data alone.
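As an illustration of both points, the sketch below renders rows as a markdown table and places an optional description above it. The function name `to_markdown_table` and its parameters are assumptions made for this example, and the figures in the usage lines are made-up placeholder values rather than real data.

```python
def to_markdown_table(headers: list[str], rows: list[list], description: str = "") -> str:
    """Render rows as a markdown table, optionally preceded by a description."""

    def cell(value) -> str:
        # Escape pipes and flatten newlines so cell text cannot break the columns.
        return str(value).replace("|", "\\|").replace("\n", " ").strip()

    lines = []
    if description:
        lines += [description.strip(), ""]
    lines.append("| " + " | ".join(cell(h) for h in headers) + " |")
    lines.append("| " + " | ".join("---" for _ in headers) + " |")
    for row in rows:
        if len(row) != len(headers):
            raise ValueError(f"row has {len(row)} cells, expected {len(headers)}")
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


# Usage with placeholder values.
table = to_markdown_table(
    ["Region", "Revenue"],
    [["North", 100], ["South", 80]],
    description="Quarterly revenue by region. The Revenue column is the one relevant to the query.",
)
```

Raising an error on a row with the wrong number of cells is a deliberate choice: a silently misaligned row is exactly the kind of defect that leads the model to misread the table.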

Pre-Processing Before Injection

For repeated tasks using the same source material, pre-processing the document once and saving the prepared version is significantly more efficient than preparing it fresh each time. A pre-processed version of a document might include:

  • Noise removed and formatting cleaned

  • Section labels and source metadata added

  • A summary anchor at the top

  • Irrelevant sections replaced with a one-line placeholder noting their omission

This prepared version becomes the context template for all subsequent tasks using that source, and the incremental time cost of preparation is amortised across every use.
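The prepare-once, reuse-many-times workflow can be sketched as follows. Everything named here is hypothetical: `prepare_document`, `load_or_prepare`, the `[SUMMARY]` and `[SECTION: ...]` labels, and the wording of the omission placeholder are illustrative choices, not a required format. The sketch assumes the document has already been cleaned and split into named sections.

```python
from pathlib import Path


def prepare_document(summary: str, sections: dict[str, str], keep: set[str]) -> str:
    """Build a prepared version: summary first, kept sections labelled, others stubbed."""
    parts = [f"[SUMMARY]\n{summary.strip()}"]
    for name, content in sections.items():
        if name in keep:
            parts.append(f"[SECTION: {name}]\n{content.strip()}")
        else:
            # One-line placeholder so the model knows the section exists but was left out.
            parts.append(f"[SECTION: {name}] (omitted: not relevant to this task)")
    return "\n\n".join(parts)


def load_or_prepare(cache_path: Path, summary: str,
                    sections: dict[str, str], keep: set[str]) -> str:
    """Reuse the saved prepared version if present; otherwise build and save it."""
    if cache_path.exists():
        return cache_path.read_text(encoding="utf-8")
    prepared = prepare_document(summary, sections, keep)
    cache_path.write_text(prepared, encoding="utf-8")
    return prepared
```

Note that this sketch never refreshes the saved file. If the source document changes, delete the saved version so that it is rebuilt on the next use.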
