OCR/IDP Data Labeling & Validation Specialist (Contract)
Important Note
This is a project-based contract role with an initial 6-month duration. While contract extensions may be offered based on performance and business needs, this role does not convert to full-time employment unless explicitly stated.
Position Overview
We are seeking detail-oriented Data Labeling & Validation Specialists to support ABBYY's OCR and Intelligent Document Processing (IDP) systems.
This role combines hands-on document annotation with structured validation of automated labeling outputs. You will play a key role in the human-in-the-loop pipeline, ensuring machine learning models are trained on high-quality, accurate ground truth data.
Success in this role requires prior hands-on annotation experience and the ability to evaluate whether automated outputs meet quality expectations, identify error patterns, and provide structured feedback to improve model performance.
Key Responsibilities
Document Annotation
- Annotate semi-structured and unstructured documents across diverse formats and domains
- Perform labeling across key IDP elements, including:
  - Text recognition (including handwriting)
  - Document classification
  - Field extraction (PII, dates, amounts, signatures, etc.)
  - Table detection and structure
- Label document layout elements such as zones, reading order, and hierarchy
- Verify OCR output accuracy and correct recognition errors
- Handle complex or ambiguous document formats beyond automated capabilities
- Maintain high levels of accuracy and consistency across all annotation tasks
Auto-Label Validation & Error Analysis
- Review sampled subsets of auto-labeled outputs and validate against ground truth
- Identify, categorize, and document errors, distinguishing between:
  - Isolated issues
  - Systematic failure patterns across document types
- Provide structured, actionable feedback to ML engineering teams
- Assess confidence scores and flag outputs below quality thresholds
- Track validation metrics over time and identify quality trends
Quality Assurance & Feedback
- Review annotations completed by other team members to ensure consistency
- Identify and document edge cases (e.g., unusual layouts, ambiguous fields)
- Participate in calibration sessions to align on annotation standards
- Provide feedback to improve annotation guidelines and workflows
- Adhere strictly to data privacy and confidentiality standards
Qualifications
Education & Experience
- High school diploma or equivalent; Associate's or Bachelor's degree preferred
- 1+ year of hands-on experience in document annotation or data labeling (direct annotation required)
- Proven ability to maintain high accuracy in repetitive, detail-oriented tasks
- Experience working with and following annotation guidelines
Technical Skills
- Familiarity with annotation tools and labeling platforms
- Understanding of document structure and layout types
- Basic knowledge of data privacy and security practices
- Reliable computer and high-speed internet connection
- Strong English reading comprehension and written communication skills
Analytical Skills
- Ability to distinguish between isolated errors and systematic issues
- Strong pattern recognition across large datasets
- Critical thinking to evaluate ambiguous cases and escalate appropriately
- High attention to detail when reviewing auto-generated outputs
Preferred
- 1-2 years of experience in OCR, IDP, or document labeling workflows
- Experience with auto-labeling systems or AI-assisted annotation tools
- Background reviewing or auditing machine-generated outputs
- Familiarity with inter-annotator agreement and data quality metrics
- Domain expertise in document-heavy industries (e.g., finance, legal, healthcare)
- Proficiency in languages beyond English
- Experience with spreadsheets, data tracking, or reporting tools
Compensation & Benefits
- Competitive hourly rate (based on location and experience)
- Flexible schedule within project deadlines
- Remote work environment
What You'll Gain
- Hands-on experience with real-world AI/ML data pipelines
- Direct collaboration with machine learning engineers
- Exposure to auto-labeling systems and document AI technologies
- Development of skills in data quality, validation, and error analysis
- Experience valuable for future roles in ML data operations, QA, or annotation engineering
Training & Support
- Structured onboarding (1-2 weeks) covering tools, workflows, and guidelines
- Ongoing support from project managers and technical teams
- Access to detailed documentation and best practices
- Regular performance feedback with metrics and improvement insights
Project Details
- Duration: 6-month contract (renewal based on performance and project needs)
- Workload: Typically 20-40 hours per week depending on project phase
- Team Structure: Distributed team with established communication channels
- Performance Metrics:
  - Annotation accuracy
  - Validation throughput
  - Quality of error documentation
  - Adherence to guidelines
Application Requirements
Please submit:
- Resume highlighting relevant annotation, data labeling, or QA experience
- Cover letter describing your approach to identifying errors in automated outputs
- Work samples (if available) demonstrating document labeling or review accuracy