The model reaches 94% accuracy, but in front of the board in London you cannot explain in English why recall matters more than precision for this particular business case. You can build the best model in the company, but if you cannot defend it to stakeholders in English, the results will end up in a drawer.

This guide is for Data Scientists, Senior Data Scientists, ML Engineers, Data Analysts, Research Scientists, and Applied Scientists: anyone who presents model results to international clients or works in an English-speaking data science environment.

Group 1: Machine Learning basics

English term	Pronunciation	Polish equivalent
Supervised learning	/ˌsuːpəvaɪzd ˈlɜːnɪŋ/	uczenie nadzorowane
Unsupervised learning	/ˌʌnˈsuːpəvaɪzd ˈlɜːnɪŋ/	uczenie nienadzorowane
Training data	/ˈtreɪnɪŋ ˈdeɪtə/	dane treningowe
Test data	/ˈtest ˈdeɪtə/	dane testowe
Overfitting	/ˌoʊvərˈfɪtɪŋ/	przetrenowanie
Underfitting	/ˌʌndərˈfɪtɪŋ/	niedotrenowanie
Cross-validation	/ˌkrɒs ˌvælɪˈdeɪʃən/	walidacja krzyżowa
Hyperparameter tuning	/ˌhaɪpəpəˈræmɪtə ˈtjuːnɪŋ/	strojenie hiperparametrów

Group 2: Model Evaluation

English term	Pronunciation	Polish equivalent
Accuracy	/ˈækjərəsi/	dokładność / trafność
Precision	/prɪˈsɪʒən/	precyzja
Recall	/rɪˈkɔːl/	czułość / kompletność
F1 score	/ˌef ˈwʌn skɔːr/	miara F1
Confusion matrix	/kənˈfjuːʒən ˈmeɪtrɪks/	macierz pomyłek
ROC curve	/ˌɑːr oʊ ˈsiː kɜːv/	krzywa ROC
AUC	/ˌeɪ juː ˈsiː/	pole pod krzywą ROC
Baseline model	/ˈbeɪslaɪn ˈmɒdəl/	model bazowy / punkt odniesienia

Group 3: Data & Features

English term	Pronunciation	Polish equivalent
Feature engineering	/ˈfiːtʃər ˌendʒɪˈnɪərɪŋ/	inżynieria cech
Data pipeline	/ˈdeɪtə ˈpaɪplaɪn/	potok danych
Data cleaning	/ˈdeɪtə ˈkliːnɪŋ/	czyszczenie danych
Missing values	/ˈmɪsɪŋ ˈvæljuːz/	wartości brakujące
Outlier	/ˈaʊtlaɪər/	wartość odstająca
Feature importance	/ˈfiːtʃər ɪmˈpɔːtəns/	ważność cechy

Group 4: Deployment and MLOps

English term	Pronunciation	Polish equivalent
Model deployment	/ˈmɒdəl dɪˈplɔɪmənt/	wdrożenie modelu
Model monitoring	/ˈmɒdəl ˈmɒnɪtərɪŋ/	monitorowanie modelu
Data drift	/ˈdeɪtə drɪft/	dryf danych
Model retraining	/ˈmɒdəl ˌriːˈtreɪnɪŋ/	ponowne trenowanie modelu
A/B test (model)	/ˌeɪ ˈbiː test/	test A/B dla modelu
Inference	/ˈɪnfərəns/	wnioskowanie / predykcja na żywo

Communicating in English: 3 scenarios

Scenario 1: Presenting model results to stakeholders (8 phrases)

The board asks about the results of the fraud detection model. The model has 94% accuracy, but that is not the right metric.

The model achieves 94% accuracy on the test set — however, accuracy alone is misleading for imbalanced datasets.
In fraud detection, we prioritize recall over precision — we want to catch as many fraud cases as possible, even at the cost of some false alarms.
Our precision is 71%, meaning 71% of flagged transactions are actual fraud.
Our recall is 89%, meaning we catch 89% of all fraudulent transactions in the dataset.
The business cost of a missed fraud case significantly outweighs the cost of a false positive.
The F1 score of 0.79 balances both precision and recall into a single metric.
Compared to our baseline model, we've improved recall by 18 percentage points.
I recommend we set the classification threshold at 0.35 rather than the default 0.5 to optimize for recall.
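
The F1 figure quoted in these phrases can be checked by hand. The snippet below is a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is chosen here for illustration and is not taken from any particular library.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the phrases above.
precision = 0.71
recall = 0.89

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.79"
```

Having the arithmetic at hand helps when a stakeholder asks where 0.79 comes from: it is neither the average of the two numbers nor the accuracy.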

Scenario 2: Escalating a data quality problem (6 phrases)

The model starts performing worse in production. You suspect data drift.

We're observing significant data drift in the input features — the production distribution has shifted from what the model was trained on.
Model performance has degraded over the past two weeks — precision dropped from 71% to 58%.
The root cause appears to be a change in how the upstream team encodes customer segment data.
We need to flag this to the data engineering team — the pipeline is producing inconsistent feature values.
I recommend triggering a model retraining cycle with data from the last 90 days.
Until the issue is resolved, I suggest rolling back to the previous model version as a fallback.
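
One simple way to back up the "data drift" claim with a number is to compare category shares between training and production data. The sketch below is illustrative only: the segment labels, the sample sizes, and the `ALERT_THRESHOLD` value are all hypothetical, and total variation distance is just one of several possible drift measures.

```python
from collections import Counter


def category_shares(values):
    """Return each category's share of the sample as a dict."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(reference, current):
    """Half the sum of absolute share differences: 0 = identical, 1 = disjoint."""
    ref = category_shares(reference)
    cur = category_shares(current)
    categories = set(ref) | set(cur)
    return 0.5 * sum(abs(ref.get(c, 0.0) - cur.get(c, 0.0)) for c in categories)


# Hypothetical data: the upstream team renamed "SMB" to "small_business".
training_segments = ["SMB"] * 60 + ["enterprise"] * 40
production_segments = ["small_business"] * 60 + ["enterprise"] * 40

ALERT_THRESHOLD = 0.2  # assumed cut-off; would need tuning per feature

distance = total_variation_distance(training_segments, production_segments)
unseen = set(production_segments) - set(training_segments)

print(f"distance = {distance:.2f}, unseen categories = {sorted(unseen)}")
if distance > ALERT_THRESHOLD:
    print("Possible data drift: flag this to the data engineering team.")
```

Listing the categories that never appeared in training is often the quickest way to point at an upstream encoding change.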

Scenario 3: Experiment debrief with the team (6 phrases)

You compared XGBoost (AUC 0.82) with a neural network (AUC 0.91).

I ran a comparative experiment between XGBoost and a neural network architecture.
The neural network achieved an AUC of 0.91 versus 0.82 for the XGBoost baseline — a meaningful improvement.
I used 5-fold cross-validation to ensure the results generalize beyond the training set.
The neural network required significantly more hyperparameter tuning, but the performance gain justifies the effort.
One concern is inference time — the neural network is 3x slower, which matters for real-time scoring.
My recommendation is to proceed with the neural network for the batch use case and keep XGBoost for the real-time endpoint.
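
If someone on the team asks what "5-fold cross-validation" actually does, a small sketch of the splitting step can help. The version below is a simplified illustration with a hypothetical helper, `k_fold_indices`; it does not shuffle or stratify, which a real experiment on imbalanced fraud data would normally do (for example with scikit-learn's `StratifiedKFold`).

```python
def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(f"fold {fold_number}: validate on {validation}, train on {len(train)} samples")
```

Every sample is used for validation exactly once, which is why the averaged score is a fairer estimate than a single train/test split.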

Dialogue: Precision vs Recall for a Product Manager

A Data Scientist explains the precision vs recall trade-off to a non-technical Product Manager.

Product Manager: The model is 94% accurate — that's great, right? Why are we still getting complaints from the fraud team?

Data Scientist: Accuracy can be misleading here. Let me explain with numbers. Out of 10,000 transactions, only 200 are fraudulent — that's 2% of the dataset. A model that flags nothing as fraud would still be 98% accurate, but completely useless.

Product Manager: Okay, so what should we be looking at?

Data Scientist: We care about recall — the percentage of actual fraud cases we catch. Right now our recall is 71%, which means we're missing 29% of fraud. For every 100 fraudulent transactions, 29 slip through undetected.

Product Manager: And what's precision?

Data Scientist: Precision tells us how often we're right when we raise an alarm. At 71% precision, about 29% of our fraud alerts are false positives — legitimate transactions that get flagged.

Product Manager: So there's a trade-off?

Data Scientist: Exactly. We can tune the model to catch more fraud — higher recall — but we'll also raise more false alarms, which frustrates legitimate customers. The business needs to decide: what's more costly, missed fraud or false alarms?

Product Manager: Missed fraud, clearly. Can we fix that?

Data Scientist: Yes — I can lower the classification threshold from 0.5 to 0.35. That will push recall up to around 89%, though precision will drop to about 65%. I'd recommend running an A/B test on 10% of traffic first to validate the impact before full rollout.
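
The two arguments in this dialogue can be reproduced in a few lines. The sketch below first recomputes the "flag nothing" case from the dialogue (10,000 transactions, 200 fraudulent), then shows the direction of the threshold trade-off on a tiny set of hypothetical scores. The scores and labels are made up for illustration and are not meant to reproduce the 89% and 65% figures quoted above.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def metrics_at_threshold(scores, labels, threshold: float) -> dict:
    """Flag every score >= threshold as fraud, then compute the metrics."""
    tp = fp = fn = tn = 0
    for score, label in zip(scores, labels):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return classification_metrics(tp, fp, fn, tn)


# Dialogue figures: a model that flags nothing misses all 200 fraud cases.
flag_nothing = classification_metrics(tp=0, fp=0, fn=200, tn=9800)
print("flag nothing:", flag_nothing)  # accuracy 0.98, recall 0.0

# Hypothetical model scores (1 = fraud, 0 = legitimate).
scores = [0.90, 0.80, 0.60, 0.45, 0.40, 0.55, 0.38, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

default = metrics_at_threshold(scores, labels, threshold=0.5)
lowered = metrics_at_threshold(scores, labels, threshold=0.35)
print("threshold 0.50:", default)
print("threshold 0.35:", lowered)
```

On this toy data, lowering the threshold raises recall and lowers precision, which is the trade-off the Product Manager is asked to weigh.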

Model Card Vocabulary: documenting a model in English

6 phrases for model documentation (Model Card)

When a model is deployed in a regulated environment, a Model Card is increasingly required. It is a standardized document that describes the model.

"Intended use: this model is designed for binary classification of financial transactions as fraudulent or legitimate."
"Limitations: the model was trained on European transaction data and may underperform on markets with different spending patterns."
"Training data: the model was trained on 2.4 million anonymised transactions from Q1 2024 – Q4 2025."
"Evaluation metrics: we report precision, recall, F1 score, and AUC across three demographic subgroups."
"Ethical considerations: the model was audited for bias across age groups and geographies before deployment."
"Out-of-scope use: this model should not be used for credit scoring or insurance risk assessment without revalidation."

5 common mistakes Data Scientists make when speaking English

❌ Mistake 1: Confusing accuracy, precision, and recall in business conversations
These are three different metrics, yet people outside ML routinely use them interchangeably.
✅ Always give concrete numbers and explain what each metric means for the business: "the model achieves high recall (it catches 89% of fraud cases), though precision is moderate at 71%."

❌ Mistake 2: "The model is 94% accurate" with no context
For the business, this sentence can be misleading or irrelevant unless you explain the class imbalance.
✅ "The model achieves 94% accuracy on the test set, but given the class imbalance (only 2% of transactions are fraudulent), accuracy is not the right metric for this use case."

❌ Mistake 3: Pronouncing "overfitting"
The stress falls on "fit", not on "over": /ˌoʊvərˈfɪtɪŋ/.
✅ Practice: "The model is overfitting to the training data."

❌ Mistake 4: "Data": singular or plural?
American English: "The data shows a clear trend." British English / formal: "The data show a clear trend."
✅ Both are correct. Pick one style and be consistent.

❌ Mistake 5: "feature" vs "variable" vs "attribute"
Feature dominates in ML, variable in statistics, attribute in databases.
✅ With the ML team, say feature. With statisticians, say variable. With data engineers, say column or attribute.

Quick cheat sheet: the most important phrases

Situation	English phrase
Explaining a metric	Accuracy alone is misleading here; let me walk you through precision and recall.
Trade-off decision	There's a trade-off between precision and recall. What's the business priority?
Data drift	We're observing data drift: the production distribution has shifted.
Retraining	I recommend triggering a retraining cycle with the last 90 days of data.
A/B testing a model	I'd suggest running an A/B test on 10% of traffic before full deployment.
Documentation	As per the model card, the intended use is limited to European transaction data.
Baseline	Compared to the baseline model, we've improved recall by 18 percentage points.

Summary

English in Data Science is not just about knowing the terminology. It is about translating model results into business language. The difference between "the model is 94% accurate" and explaining the precision-recall trade-off in terms of business costs can decide whether your model gets deployed.

If you work in a broader IT environment, round out your vocabulary with: English for AI/ML Engineers, IT vocabulary in English, and English for IT Product Managers. Practice this vocabulary with flashcards in the IT & Programming flashcards section.

English for Data Scientists: machine learning, models, and analytics in English