Data Portfolio · 2026
Full Lifecycle Data Science
Juan de Dios Alvarez
// candidate_profile · data_story.html

Juan de Dios Alvarez

Data Scientist  ·  Digital Transformation  ·  Full Lifecycle

From processing cosmic ray events at the HAWC Observatory to rebuilding agricultural credit analysis at a national financial institution — twenty years of turning complex, untamed data into decisions that move organizations forward. The source and format change every time. The discipline does not.

// profile.narrative()
"Translate data into actionable insights through data storytelling, empowering strategic decisions." — Data Scientist · Role Description

That is what I have done across seven industries, from particle physics to agricultural finance. The foundation came from the HAWC Observatory — a multinational cosmic ray experiment with distributed computing, petabyte-scale event data, and statistical analysis across institutions in the US and Mexico. That standard held: when the data is hard, the discipline matters most.

In industry, the data changed. At D3Clarity, I turned thousands of unstructured public school board meeting minutes into a competitive intelligence dashboard used daily by a sales team. The core task — classify text, surface patterns, structure the unlabeled — was familiar ground: my MSc thesis at the National Institute of Astrophysics addressed automatic text classification through prototype-based learning. That was 2009. At FIRA, I replaced manual Excel workflows with a governed pipeline integrating satellite imagery, climate models, and regulatory databases into credit decisions for agricultural producers. At CITEIM, I applied Transformer architectures to sign language recognition. The tools advanced. The method held.

Every engagement followed the same sequence: establish data quality, build outputs accessible to non-technical users, demonstrate value early, then scale. What makes work at this level engaging is the challenge of shaping how data science functions inside an organization: building the culture rather than executing within one that already exists. My technical trajectory has followed the broader evolution of AI, from classical ML through production systems to generative AI. That progression is something to build on, not to replace.

01 · Governance
02 · Literacy
03 · Early PoC
04 · Production
// career_metrics.json
20 · Years with data
14 · Scientific publications (ORCID)
7 · Industries served
CERN · International research
E2E · Full lifecycle ownership
// requirements_mapping.compare()

The role. The context.

Excerpts from the role requirements, with context from the work.

The requirement
The work

"Collaborate within Agile squads to design and implement machine learning models that solve real business problems."

D3Clarity · Agile · 4 yrs

Four years at D3Clarity in Agile/Scrum with Jira across distributed international teams. End-to-end ML delivery — NLP classification from unstructured public documents to Power BI dashboards used by sales teams. Every project: PoC → production.

"Manage the model lifecycle: from ideation and data preparation to deployment and scaling on Cloud platforms using modern MLOps practices."

FIRA · AWS · Azure ML

At FIRA: public source ingestion → geospatial validation → climate risk integration → governance protocols → Apache Superset dashboards. Stack: Python, PostgreSQL, Linux, AWS EC2, Azure ML, SageMaker. AWS Cloud Practitioner certified. Google Workspace administrator (own domain, independently configured) — platform selection by fit, not familiarity.

"Translate data into actionable insights through data storytelling, empowering strategic decisions."

FIRA · D3Clarity · Tec MTY

Dashboards built for non-technical credit analysts at FIRA — no training required for daily use. Executive KPI reporting at Punto Singular. Taught Tableau-based dashboarding to Business School students at Tec de Monterrey. Power BI · Tableau · Looker · Apache Superset.

"Contribute to building a strong data culture, ensuring quality, ethics, and compliance in every solution."

FIRA · Semarchy xDM

Defined governance frameworks and quality protocols at FIRA. Incorporated environmental compliance layers into credit assessment. Led a mentorship program building the next generation of practitioners. Semarchy xDM certified — MDM, Data Profiling, Data Lineage, Golden Records.

"Degree in a quantitative field. Fluency in English. Italian is a plus."

PhD · MSc · D3Clarity

PhD Candidate in Computer Physics · MSc Computer Science (ML/NLP) · BSc Physics & Mathematics. English: professional, with 4+ years working with US and European clients. Italian: basic-intermediate, with ongoing private tutoring; native Spanish provides a structural advantage for rapid progression.

// career.trajectory()

Twenty years. One through-line.

Scientific rigor applied across an expanding surface area of domains and industries.

Dec 2002 — CERN · University of Lausanne, Switzerland
DAQ System Member
CERN / University of Lausanne
First contact with large-scale scientific data processing. Signal conversion and data handling for a Positron Emission Tomography scanner project — at one of the world's foremost research facilities. The benchmark for data precision was set here.
2004 — 2019 · Mexico · US collaboration
Researcher, Lecturer & Multi-Industry Consultant
HAWC Collaboration · Tec de Monterrey · UMSNH · 7 private clients
PhD candidate processing cosmic ray datasets at petabyte scale (HAWC: 14 co-authored publications, multinational experiment). Taught data and programming at university level. Consulted across real estate, media, healthcare, education, and the public sector, building intuition for how different industries structure and consume data.
Feb 2019 — Mar 2023
Data Scientist
D3Clarity (formerly Viqtor Davis North America)
Hired to build the analytics practice from scratch — expanding the company's offering beyond MDM into data insights. Four years of end-to-end delivery: NLP pipelines, ETL, data lakes, MDM, forecasting models, and executive dashboards for international clients. Agile/Scrum · Jira · Azure ML · AWS · Power BI · Tableau.
Sep 2023 — Aug 2024
Data Scientist & Team Lead
BairesDev · Punto Singular
KPI reporting and quality dashboards. Prototyped streaming financial data pipelines on AWS. Graph database modeling (Neo4j) for relationship analysis. Designed a RAG and NLP pipeline for a legal services client — contract insight extraction per lawyer, making unstructured legal text queryable. Founded a mentorship group: bachelor-level students solving real problems for nonprofits — data science with social responsibility as a design constraint.
Oct 2024 — Jan 2025
Data Scientist
CITEIM Research Center
Transformer-based models for continuous sign language recognition, integrating cloud services and pre-trained language models for real-time interpretation. Demonstrated that ML techniques developed in one domain translate directly to novel, high-impact problems.
Jan 2025 — Jan 2026
Data Scientist
FIRA — Agricultural Development Trust (Mexico)
Led full digital transformation: manual Excel workflows replaced by an automated, governed data pipeline. Integrated satellite vegetation health indices, climate risk models, and regulatory databases (protected zones) into credit assessment for agricultural producers. Outputs designed for non-technical credit analysts — zero technical training required for daily use.
// signature_work.highlight()

Three proofs that started as PoCs.

Each one began with a focused hypothesis. Each scaled once value was demonstrated.

01
NLP · Sales Intelligence · D3Clarity

Public Records → Competitive Intelligence Dashboard

A client needed to sharpen their sales strategy for school district products across the US. School board meeting minutes are public record. I built the full pipeline: automated scraping → NLP processing → ML topic classification → Power BI dashboard mapping which products to pitch to each district based on what they are actively discussing at the board level. Public data transformed into a scalable commercial tool.

Key relevance: End-to-end ownership. Unstructured data → structured commercial insight. PoC → production. Business-facing output built for non-technical users.
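
The topic classification step in that pipeline can be illustrated with a nearest-prototype classifier over bag-of-words counts. This is only a minimal sketch under stated assumptions, not the production model: the topic labels, the sample sentences, and every function name below are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_prototypes(labeled_docs):
    """Sum token counts per topic, giving one prototype vector per topic."""
    prototypes = {}
    for topic, text in labeled_docs:
        prototypes.setdefault(topic, Counter()).update(tokenize(text))
    return prototypes


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text, prototypes):
    """Return the topic whose prototype is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(prototypes, key=lambda topic: cosine(vector, prototypes[topic]))


# Hypothetical labeled snippets standing in for board meeting minutes.
TRAINING = [
    ("transportation", "the board approved the purchase of new school buses and bus routes"),
    ("transportation", "bus driver shortage and route changes were discussed"),
    ("technology", "the district will purchase laptops and upgrade the wireless network"),
    ("technology", "software licenses and student laptops were approved"),
]

prototypes = build_prototypes(TRAINING)
label = classify("motion to approve additional bus routes for next year", prototypes)
print(label)
```

In a dashboard setting, the predicted topic per district would then be joined to a product catalog; that mapping is outside the scope of this sketch.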
02
Geospatial · Digital Transformation · FIRA

Agricultural Credit: Excel to Governed Automated Pipeline

Replaced ad-hoc spreadsheet processes with an automated pipeline that integrates satellite-derived vegetation health indices, multi-source climate risk models, and regulatory databases (protected environmental zones). Credit analysts receive structured, repeatable reports — no manual data aggregation, no ad-hoc interpretation. Data governance protocols established across all source feeds.

Key relevance: Digital transformation. Multi-source data integration. Data culture built from scratch. Non-technical stakeholder adoption by design.
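
The idea of a structured, repeatable report can be sketched as a small rule-based assessment that validates inputs first and only then combines the three sources. Everything here is an assumption made for illustration: the field names, value ranges, thresholds, and status labels are hypothetical and do not describe the actual FIRA rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParcelInputs:
    parcel_id: str
    vegetation_index: float   # assumed range [-1, 1], as for NDVI-like indices
    climate_risk: float       # assumed scale: 0 (low) to 1 (high)
    in_protected_zone: bool   # flag from a regulatory zone lookup


def validate(parcel):
    """Return a list of data quality issues; empty means the record is usable."""
    issues = []
    if not -1.0 <= parcel.vegetation_index <= 1.0:
        issues.append("vegetation_index out of range")
    if not 0.0 <= parcel.climate_risk <= 1.0:
        issues.append("climate_risk out of range")
    return issues


def assess(parcel, min_vegetation=0.3, max_risk=0.7):
    """Produce one report row; thresholds are placeholder values."""
    issues = validate(parcel)
    if issues:
        status = "data_quality_hold"      # governance check runs before any scoring
    elif parcel.in_protected_zone:
        status = "compliance_review"
    elif parcel.vegetation_index < min_vegetation or parcel.climate_risk > max_risk:
        status = "elevated_risk"
    else:
        status = "standard"
    return {"parcel_id": parcel.parcel_id, "status": status, "issues": issues}


# Hypothetical sample records.
parcels = [
    ParcelInputs("P-001", 0.62, 0.20, False),
    ParcelInputs("P-002", 0.15, 0.40, False),
    ParcelInputs("P-003", 0.55, 0.30, True),
    ParcelInputs("P-004", 1.80, 0.30, False),
]

reports = [assess(p) for p in parcels]
statuses = {r["parcel_id"]: r["status"] for r in reports}
print(statuses)
```

Running validation ahead of scoring means a bad source feed surfaces as an explicit hold rather than as a silently wrong assessment.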
03
Deep Learning · Cloud Services · CITEIM

Sign Language Recognition with Transformer Architectures

Designed and implemented Transformer-based models for continuous sign language recognition, integrating cloud services and pre-trained language models to enable real-time interpretation. Brought current-generation AI architectures to a problem with direct human communication impact — demonstrating that modern ML methods generalize across domains when the fundamentals are solid.

Key relevance: Current-generation AI (LLMs, Transformers). Cloud deployment. Full MLOps lifecycle. Research → applied production system.
// tech_stack.inventory()

Technical capabilities.

Tools are means, not ends — selected for fit, not for familiarity.

Machine Learning & AI

NLP / LLMs
Transformers
Classification
Forecasting
Geospatial ML

Data Engineering

Python
SQL
ETL / Pipelines
MDM / Governance
AWS · Azure · GCP

Visualization & Storytelling

Power BI
Tableau
Apache Superset
Data Literacy

Infrastructure & Methods

Linux / SysAdmin
Agile / Scrum
Team Leadership
Data Governance
// alignment.check("values")

Beyond the technical.

Values, language, and context.

Values in practice.

  • Authenticity: Proofs of concept that demonstrate real value before committing to scale. No inflated claims: every output is auditable and every conclusion is traceable to the data.
  • Passion for Excellence: Twenty years of problems that require precision: cosmic ray detection, regulatory credit compliance, sign language recognition at scale. The standard does not vary by domain or budget.
  • Responsibility: Data governance and quality protocols designed as structural constraints, not afterthoughts. Environmental compliance layers in credit assessment. Ethics in data is a design input, not a checkbox. This extends to the field: a mentorship group where practitioners solve real problems for nonprofits, building the next generation with accountability as a design requirement.
  • Inventiveness: Turned public school board minutes into a sales intelligence system. Brought ML into agricultural credit in Mexico. Applied Transformer architectures to human communication access. The discipline is constant; the applications keep expanding.

Languages & readiness.

  • Español · Native
  • English · Professional
    International work with US and European clients across 4+ years at D3Clarity.
  • Italiano · Basic–Intermediate · In progress
    Ongoing private tutoring. Native Spanish provides a structural advantage: grammar transfer accelerates progression. NLP-domain vocabulary in Italian is already navigable, a practical advantage at the intersection of data science and European markets.
  • Français · Basic

Relocation

Available to relocate. Open to international positions. Logistical transition is not a constraint.

"The source and format of the data change every time; the discipline does not."
Juan de Dios Alvarez
Data Scientist
Morelia, Mexico
Contact available upon request