Financial Sentiment Analysis with Self-Training

Achieved 88.1% accuracy using only 60% labeled data through intelligent debiasing

The Challenge

Financial sentiment analysis with limited labeled data - a common industry problem where annotation costs $5-50 per document, making full supervision prohibitively expensive.

Innovation

Developed a self-training system with three debiasing strategies:

  • Confidence-Based Filtering: Only use predictions above 90% confidence
  • Ensemble Agreement: Multiple models must agree before accepting pseudo-labels
  • Distribution Matching: Prevent drift by maintaining label distributions

Results

  • 88.1% accuracy with only 60% labeled data
  • Closed 68% of the gap to fully-supervised learning
  • $20,000 savings in labeling costs
  • Applied to S&P 500 earnings calls for trading signals

Technical Approach

def generate_pseudo_labels(self, unlabeled_data):
    # Teacher creates labels for unlabeled data
    with torch.no_grad():
        logits = self.teacher(batch)
        probs = F.softmax(logits, dim=-1)

        # Only use high-confidence predictions
        max_prob, predicted = torch.max(probs, dim=-1)
        mask = max_prob > self.confidence_threshold

Self-training with debiasing achieved +5.8% improvement over baseline, demonstrating that implementation details matter more than the core technique.

Key Learning

Compound gains from multiple small improvements create large total gains. Domain pretraining (+6%), self-training (+1%), and debiasing (+5%) combined for +12% total improvement.

Read Full Technical Case Study