About AMR and Stata
Stata is a statistical software package that we will use to analyze antimicrobial resistance (AMR) data. Cross tabulation, Pearson's chi-squared test, plots (bar, pie, line, box, and histogram), and regression analysis help describe frequency distributions and the relationships between variables.
Outline
- Stata installation
- Import CSV data
- Data cleaning
- Frequency check
- Number of isolates count by year
- Cross tabulation
- Pearson's Chi-squared test
- Plotting results
- Yearly antibiotics by organism
1. Stata installation
Visit the following website and follow the installation steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-1.pdf
2. Import CSV/Excel data
Visit the following website and follow the steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-3.pdf
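As a quick orientation before working through the linked session, the import step might look like the sketch below. The file names are placeholders (they are not taken from the course material), and the option choices are assumptions about how the data file is laid out.

```stata
* Hypothetical file name: replace with the path to your own CSV export.
* varnames(1) reads variable names from the first row;
* case(preserve) keeps mixed-case names such as ageGroup1.
import delimited "amr_data.csv", varnames(1) case(preserve) clear

* For an Excel workbook, the equivalent would be:
* import excel "amr_data.xlsx", firstrow clear

* Check what was loaded: variable names, storage types, and row count
describe
count
```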
3. Data management
Visit the following website and follow the steps: https://julhas.com/jsedutech/materials/Level-1/Stata-Session-6.pdf
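A few typical cleaning checks can be sketched as follows. This is only an illustration, not the procedure from the linked session: it assumes the dataset is already in memory and that sex is stored as a string variable.

```stata
* Remove stray leading/trailing spaces and standardise case
* (assumes sex is a string variable holding values such as f and m)
replace sex = lower(strtrim(sex))

* Report how many observations are exact duplicates across all variables
duplicates report

* Count missing values in the key variables used later in the analysis
foreach v of varlist sex year sample organism {
    display "`v':"
    count if missing(`v')
}
```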
4. Frequency check
Frequency check - Gender, Year, Sample, Organism, Age group: A frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample.
Commands:
tab sex
tab year
tab sample
tab organism

Output:
. count
  23,411

. tab sex

        Sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          f |     13,568       57.96       57.96
          m |      9,843       42.04      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab year

       Year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2017 |      6,878       29.38       29.38
       2018 |      8,072       34.48       63.86
       2019 |      8,461       36.14      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab sample

     Sample |      Freq.     Percent        Cum.
------------+-----------------------------------
      Blood |      1,214        5.19        5.19
        Pus |      5,753       24.57       29.76
     Sputum |      1,079        4.61       34.37
      Stool |          2        0.01       34.38
      Urine |     12,814       54.73       89.11
 Wound Swab |      2,549       10.89      100.00
------------+-----------------------------------
      Total |     23,411      100.00

. tab organism

             Organism |      Freq.     Percent        Cum.
----------------------+-----------------------------------
     Escherichia coli |     11,887       50.78       50.78
       Klebsiella sp. |      7,709       32.93       83.70
Staphylococcus aureus |      3,815       16.30      100.00
----------------------+-----------------------------------
                Total |     23,411      100.00
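The four separate tab commands can also be issued in one line. The sketch below assumes the same dataset is in memory; the sort and missing options are additions for illustration and were not used in the output above.

```stata
* One-way frequency tables for several variables in a single command
tab1 sex year sample organism

* List categories in descending order of frequency and
* show missing values, if any, as their own row
tabulate sample, sort missing
```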
5. Number of isolates count by year
Commands:
count if year == 2019
tab amk if year == 2017

Output:
. count if year == 2019
  8,461

. tab amk if year == 2017

        AMK |      Freq.     Percent        Cum.
------------+-----------------------------------
          I |          2        0.03        0.03
          R |        999       14.91       14.94
          S |      5,697       85.06      100.00
------------+-----------------------------------
      Total |      6,698      100.00
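The AMK table for 2017 totals 6,698, while the frequency check shows 6,878 isolates for 2017. By default, tabulate leaves out observations where the tabulated variable is missing, which would explain a gap of this kind. A minimal sketch to check this, assuming amk is simply missing for isolates that were not tested against AMK:

```stata
* Show missing values as a separate row so the total matches the year count
tabulate amk if year == 2017, missing

* Count the 2017 isolates that have no AMK result
count if year == 2017 & missing(amk)
```

Percentages in the default table are therefore percentages of tested isolates, not of all isolates collected that year.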
6. Cross tabulation
Cross tabulation: A cross tabulation shows the joint frequency distribution of two categorical variables and is used to examine the relationship between them.
Commands:
tab sex year
tab organism year, col nofreq // Display only percentage
tab sample year, col // Number and percentage

Output:
. tab sex year

           |               Year
       Sex |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         f |     4,038      4,602      4,928 |    13,568
         m |     2,840      3,470      3,533 |     9,843
-----------+---------------------------------+----------
     Total |     6,878      8,072      8,461 |    23,411

. tab organism year, col nofreq // Display only percentage

                      |               Year
             Organism |      2017       2018       2019 |     Total
----------------------+---------------------------------+----------
     Escherichia coli |     54.04      49.55      49.28 |     50.78
       Klebsiella sp. |     28.58      34.64      34.83 |     32.93
Staphylococcus aureus |     17.37      15.81      15.88 |     16.30
----------------------+---------------------------------+----------
                Total |    100.00     100.00     100.00 |    100.00

. tab sample year, col // Number and percentage

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |               Year
    Sample |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
     Blood |       361        399        454 |     1,214
           |      5.25       4.94       5.37 |      5.19
-----------+---------------------------------+----------
       Pus |     1,578      1,979      2,196 |     5,753
           |     22.94      24.52      25.95 |     24.57
-----------+---------------------------------+----------
    Sputum |       336        376        367 |     1,079
           |      4.89       4.66       4.34 |      4.61
-----------+---------------------------------+----------
     Stool |         0          0          2 |         2
           |      0.00       0.00       0.02 |      0.01
-----------+---------------------------------+----------
     Urine |     3,993      4,256      4,565 |    12,814
           |     58.05      52.73      53.95 |     54.73
-----------+---------------------------------+----------
Wound Swab |       610      1,062        877 |     2,549
           |      8.87      13.16      10.37 |     10.89
-----------+---------------------------------+----------
     Total |     6,878      8,072      8,461 |    23,411
           |    100.00     100.00     100.00 |    100.00
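Column percentages answer "what share of each year's isolates came from each category?". Row percentages answer the reverse question: how each category's isolates are spread across the years. The sketch below uses tabi, the immediate form of tabulate, with the cell counts copied from the sex-by-year table above, so it runs without the dataset. With the data in memory, tab sex year, row would be the equivalent.

```stata
* Cell counts from the sex-by-year table (rows: f, m; columns: 2017, 2018, 2019)
* The row option adds row percentages beneath each frequency
tabi 4038 4602 4928 \ 2840 3470 3533, row
```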
7. Pearson's Chi-squared test
Pearson's Chi-squared test: Pearson's chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance.
Commands:
tabulate ageGroup1 sample, chi2
tabulate organism sample, chi2

Output:
. tabulate ageGroup1 sample, chi2

            |                              Sample
  ageGroup1 |     Blood        Pus     Sputum      Stool      Urine  Wound S.. |     Total
------------+------------------------------------------------------------------+----------
  <=4 Years |        47         66          0          1        241        202 |       557
 5-14 Years |        21         54          2          0        103        347 |       527
15-24 Years |        27        120         23          1        257        329 |       757
25-34 Years |        34        299         41          0        731        328 |     1,433
35-44 Years |       117        818         91          0      1,388        275 |     2,689
45-54 Years |       247      1,580        215          0      2,792        404 |     5,238
55-64 Years |       337      1,636        299          0      3,503        392 |     6,167
>= 65 Years |       384      1,180        408          0      3,799        272 |     6,043
------------+------------------------------------------------------------------+----------
      Total |     1,214      5,753      1,079          2     12,814      2,549 |    23,411

         Pearson chi2(35) =  3.9e+03   Pr = 0.000

. tabulate organism sample, chi2

                      |                              Sample
             Organism |     Blood        Pus     Sputum      Stool      Urine  Wound S.. |     Total
----------------------+------------------------------------------------------------------+----------
     Escherichia coli |       640      1,041        105          2      9,800        299 |    11,887
       Klebsiella sp. |       351      2,299        848          0      2,508      1,703 |     7,709
Staphylococcus aureus |       223      2,413        126          0        506        547 |     3,815
----------------------+------------------------------------------------------------------+----------
                Total |     1,214      5,753      1,079          2     12,814      2,549 |    23,411

         Pearson chi2(10) =  9.9e+03   Pr = 0.000
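In the output above the statistic is shown in scientific notation and Pr = 0.000 is a rounded value (p below 0.0005), not an exact zero. tabulate stores its results in r(), so they can be displayed in full. The sketch below reproduces the organism-by-sample test from the printed counts using tabi, so no dataset is needed. The expected option is an addition for illustration: it prints expected frequencies, which makes it easy to see that the Stool column has expected counts well below 5, a situation in which the chi-squared approximation is usually treated with caution.

```stata
* Counts copied from the organism-by-sample table above
* rows: E. coli, Klebsiella sp., S. aureus
* columns: Blood, Pus, Sputum, Stool, Urine, Wound Swab
tabi 640 1041 105 2 9800 299 \ 351 2299 848 0 2508 1703 \ 223 2413 126 0 506 547, chi2 expected

* Show the stored statistic, degrees of freedom, and p-value
display "chi2 = " %10.2f r(chi2)
display "df   = " (r(r) - 1) * (r(c) - 1)
display "p    = " %12.4e r(p)
```

The degrees of freedom are (rows - 1) x (columns - 1), which gives (3 - 1) x (6 - 1) = 10 here and (8 - 1) x (6 - 1) = 35 for the age group table, matching chi2(10) and chi2(35) in the output.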
8. Plotting results
// Histogram
histogram age

// Box plot
graph box age

// Bar graph
graph bar (count), over(ageGroup1) stack

// Bar graph with more than one categorical variable
graph bar (count), over(ageGroup1) over(sex) stack

// Pie chart
graph pie, over(sex)
graph pie, over(sex) title(Gender distribution)
graph pie, over(sex) title(Gender distribution) legend(on) scheme(s2color)
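To use a plot in a report it needs to be saved to a file. A minimal sketch, assuming the dataset with the age variable is in memory; the title text and file name are placeholders chosen for illustration.

```stata
* Histogram with percentages on the y-axis and a title
histogram age, percent title("Age distribution")

* Save the graph that is currently displayed; replace overwrites an existing file
graph export "age_histogram.png", replace
```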
9. Yearly antibiotics by organism
Commands:
tab amc year if organism == "Escherichia coli", col nofreq
tab amc year if organism == "Klebsiella sp.", col nofreq
tab amc year if organism == "Staphylococcus aureus", col

Output:
. tab amc year if organism == "Escherichia coli", col nofreq

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |     75.14      70.07      73.57 |     72.84
         S |     24.86      29.93      26.43 |     27.16
-----------+---------------------------------+----------
     Total |    100.00     100.00     100.00 |    100.00

. tab amc year if organism == "Klebsiella sp.", col nofreq

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |     68.89      71.05      70.59 |     70.33
         S |     31.11      28.95      29.41 |     29.67
-----------+---------------------------------+----------
     Total |    100.00     100.00     100.00 |    100.00

. tab amc year if organism == "Staphylococcus aureus", col

+-------------------+
| Key               |
|-------------------|
|     frequency     |
| column percentage |
+-------------------+

           |               Year
       AMC |      2017       2018       2019 |     Total
-----------+---------------------------------+----------
         R |       412        446        530 |     1,388
           |     38.98      36.44      40.30 |     38.60
-----------+---------------------------------+----------
         S |       645        778        785 |     2,208
           |     61.02      63.56      59.70 |     61.40
-----------+---------------------------------+----------
     Total |     1,057      1,224      1,315 |     3,596
           |    100.00     100.00     100.00 |    100.00
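The yearly percentages can be combined with the chi-squared test from section 7 to check whether the year-to-year differences are larger than chance would explain. The sketch below is an illustration only: it uses tabi with the Staphylococcus aureus counts printed above, so it runs without the dataset. With the data in memory, adding chi2 to the third tab command would be the equivalent.

```stata
* AMC results for Staphylococcus aureus, counts copied from the table above
* rows: R, S; columns: 2017, 2018, 2019
tabi 412 446 530 \ 645 778 785, chi2 col

* Show the stored statistic with its degrees of freedom and p-value
display "chi2(" (r(r) - 1) * (r(c) - 1) ") = " %6.2f r(chi2) ", p = " %6.4f r(p)
```

When the dataset is available, bysort organism: tabulate amc year, col nofreq produces the per-organism tables in a single command instead of repeating the if condition for each organism.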