Game of Thrones is one of my favorite American dramas, so I downloaded the battles dataset from Kaggle and analyzed it. The columns are interpreted as follows:
name: Name of the battle, character variable.
year: Year in which the battle took place, numeric variable.
battle_number: Unique ID in this dataset, one per independent battle, numeric variable.
attacker_king: King of the attacking side, categorical variable. A "/" marks a succession; for example, "Joffrey/Tommen Baratheon" means that Tommen Baratheon inherited the throne from Joffrey.
defender_king: King of the defending side, categorical variable.
attacker_1, attacker_2, attacker_3, attacker_4: Generals of the attacking side, character variables.
defender_1, defender_2, defender_3, defender_4: Generals of the defending side, character variables.
attacker_outcome: Outcome of the battle from the attacker's point of view (win, loss, or draw), categorical variable.
battle_type: Category of the battle, categorical variable. Pitched battle: the armies of both sides meet and fight in one place, the most basic category of warfare. Ambush: a battle in which stealth or trickery is the main means of attack. Siege: positional warfare. Razing: an attack on an undefended location.
major_death: Whether any important character died, binary variable.
major_capture: Whether any important character was captured, binary variable.
attacker_size: Size of the attacking force, with no distinction between cavalry, infantry, and other troop types, numeric variable.
defender_size: Size of the defending force, with no distinction between cavalry, infantry, and other troop types, numeric variable.
attacker_commander: Main commanders of the attacking side, character variable. Names carry no titles, and multiple commanders are separated by commas.
defender_commander: Main commanders of the defending side, character variable. Names carry no titles, and multiple commanders are separated by commas.
summer: Whether the battle took place in summer, binary variable.
location: Where the battle took place, character variable.
region: Region where the battle took place, categorical variable. The regions are: Beyond the Wall, the North, the Iron Islands, the Riverlands, the Vale of Arryn, the Westerlands, the Crownlands, the Reach, the Stormlands, and Dorne.
note: Notes, character variable.
First, let's pose the questions:
1. Which types of battle does each king use when attacking?
2. How many important characters die or are captured each year?
3. How many important characters are killed or captured in each region?
4. Is the outcome of the war related to the number of troops?
1 Importing Packages
# To do: load packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
1 Collecting data
# To do: load the dataset
df = pd.read_csv('battles.csv')
df.columns
1.1 Viewing data types
df.info()
df.describe()
Having collected and inspected the data, we now need to clean it.
2 Data cleaning
2.1 Back up the data first
# Backup
df1 = df.copy()
Inspecting the data reveals the following problems:
Quality issues:
1. defender_3 and defender_4 are entirely NaN, and their type is float.
2. Some values are missing; the missing data needs to be handled.
3. year and battle_number have the wrong type.
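The first two issues can be checked directly before anything is dropped. The following is only a minimal sketch: the `toy` frame is a hypothetical stand-in for battles.csv with made-up values, and the names `missing_per_column` and `all_nan_columns` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for battles.csv (values are illustrative only)
toy = pd.DataFrame({
    "year": [298, 299, 299],
    "attacker_size": [15000.0, np.nan, 6000.0],
    "defender_3": [np.nan, np.nan, np.nan],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()

# Columns that contain no data at all (candidates for dropping)
all_nan_columns = [col for col in toy.columns if toy[col].isna().all()]

print(missing_per_column)
print(all_nan_columns)
```

Running the same two expressions on the real `df` should show which columns are empty and which are only partly missing.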
# Remove the 'attacker_2', 'attacker_3', 'attacker_4', 'defender_2', 'defender_3', 'defender_4', and 'note' columns
# (data_game_clean is created here from the backup copy df1)
data_game_clean = df1.drop(['attacker_2', 'attacker_3', 'attacker_4', 'defender_2', 'defender_3', 'defender_4', 'note'], axis=1)
# Convert the king, battle type, and region columns to categorical
data_game_clean['attacker_king'] = data_game_clean['attacker_king'].astype('category')
data_game_clean['defender_king'] = data_game_clean['defender_king'].astype('category')
data_game_clean['battle_type'] = data_game_clean['battle_type'].astype('category')
data_game_clean['region'] = data_game_clean['region'].astype('category')
# year and battle_number have the wrong type; convert int to object
data_game_clean['year'] = data_game_clean['year'].astype('object')
data_game_clean['battle_number'] = data_game_clean['battle_number'].astype('object')
# Check whether there are duplicate rows
sum(data_game_clean.duplicated())
View the results
data_game_clean.attacker_outcome.head()
# Check whether there are duplicate rows
sum(df1.duplicated())
Data exploration: the number of attacks per king
data_game_clean['attacker_king'].value_counts().plot(kind='barh', rot=45)
plt.show()
Joffrey/Tommen Baratheon ranks first, with 14 battles as the attacking side. The main reason is that the lords did not recognize Joffrey's claim to the throne: he was the child of Cersei and her brother, so only Lannister blood ran in his veins. To force the lords to recognize that claim, his side fought a great many battles. Robb Stark ranks second; he went to war to avenge his murdered father.
How each king attacks
sns.set(style="darkgrid")
sns.countplot(y='battle_type', hue='attacker_king', data=df1)
plt.legend(bbox_to_anchor=(1.05, 1))
plt.show()
The battles fall into four types: pitched battle, ambush, siege, and razing (an attack on an undefended location). Of Joffrey/Tommen Baratheon's 14 battles, 6 were pitched battles, 3 were ambushes, and 5 were sieges. Of Robb Stark's 10 battles, 5 were ambushes, 3 were pitched battles, and 2 were sieges. So Joffrey/Tommen Baratheon prefers pitched battles, while Robb Stark prefers ambushes. The plot also shows that only Stannis Baratheon carried out a razing.
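The counts quoted above are read off the plot; a cross-tabulation gives the same numbers as a table. This is a minimal sketch on hypothetical toy rows ("King A" and "King B" are placeholders, not values from the dataset); on the real data the two columns would come from `df1`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the attacker_king and battle_type columns
toy = pd.DataFrame({
    "attacker_king": ["King A", "King A", "King A", "King B", "King B"],
    "battle_type": ["pitched battle", "pitched battle", "siege", "ambush", "ambush"],
})

# Rows: kings, columns: battle types, cells: number of battles
counts = pd.crosstab(toy["attacker_king"], toy["battle_type"])

# The most frequent battle type for each king
preferred = counts.idxmax(axis=1)

print(counts)
print(preferred)
```

Note that `idxmax` returns the first column in case of a tie, so a tied king would need a closer look at `counts`.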
Number of important persons killed or captured in each region
# Drop rows where major_death or major_capture is null
df0 = data_game_clean.dropna(subset=['major_death', 'major_capture'])
# Group by region and sum major_death and major_capture
data = df0.groupby('region')[['major_death', 'major_capture']].sum()
data
# Count the battles per region, convert the counts to a DataFrame, and merge with data
# (the count column is named 'battle_count' here so that it cannot clash with the 'region' index)
p = pd.concat([data, df0.region.value_counts().to_frame(name='battle_count')], axis=1)
# Sort by the number of battles
p = p.sort_values('battle_count', ascending=False)
# Draw the plot
p.plot.barh()
plt.xlabel('count')
plt.title('attacker_outcome_size')
plt.show()
The Riverlands saw the most battles and the most deaths and captures of important characters. The Red Wedding also took place here, and the Stark family suffered heavy casualties. The North also saw many battles, but fewer important characters died there.
Whether the outcome of the war is related to the number of troops
# Drop rows where attacker_size, defender_size, or attacker_outcome is null
df2 = data_game_clean.dropna(subset=['attacker_size', 'defender_size', 'attacker_outcome'])
# Calculate the difference in strength between attacker and defender
df3 = df2.attacker_size - df2.defender_size
# Turn it into a DataFrame
df3 = df3.to_frame(name='size')
# Merge this column into the df2 table
result = pd.concat([df2, df3], join='outer', axis=1)
result.info()
# Scatter plot of attacker size against defender size, colored by outcome
sns.lmplot(x='attacker_size', y='defender_size', hue='attacker_outcome', fit_reg=False, data=result)
The outcome of a battle is not related to the number of troops. Only 2 battles were won by the side with superior numbers; in the others the larger force did not prevail. War is highly unpredictable, and having more troops does not guarantee victory.
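The same question can be answered numerically rather than by reading the scatter plot. This is a minimal sketch on hypothetical toy rows; the names `size_diff` and `larger_side_won` are introduced here for illustration, and ties in size or "draw" outcomes are not handled.

```python
import pandas as pd

# Hypothetical toy rows; on the real data these columns would come from df2
toy = pd.DataFrame({
    "attacker_size": [15000, 4000, 6000, 20000],
    "defender_size": [4000, 10000, 12000, 5000],
    "attacker_outcome": ["win", "win", "win", "loss"],
})

# Positive when the attacker outnumbers the defender
toy["size_diff"] = toy["attacker_size"] - toy["defender_size"]

attacker_larger = toy["size_diff"] > 0
attacker_won = toy["attacker_outcome"] == "win"

# The larger side won if (attacker larger and attacker won)
# or (defender larger and attacker lost); ties and draws are ignored here
larger_side_won = attacker_larger == attacker_won

# Share of battles won by the side with more troops
share = larger_side_won.mean()
print(share)
```

A share well below one half on the real data would support the reading that numbers alone do not decide the outcome.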
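# Collect the attacker_king names into one array
attackers = data_game_clean.attacker_king.map(lambda x: str(x).split(","))
empty_array = []
for i in attackers:
    empty_array = np.append(empty_array, i)
# Build and draw a word cloud of the names
from wordcloud import WordCloud
cloud = WordCloud(width=1440, height=1080, relative_scaling=0.5, stopwords=['battle']).generate(" ".join(empty_array))
plt.figure()  # the figsize values are missing from the source, so the default size is used
plt.imshow(cloud)
plt.axis('off')
plt.show()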
The word cloud shows that Joffrey/Tommen Baratheon is mentioned most, while Euron Greyjoy is mentioned least.
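A word cloud only shows relative size, so the ranking can also be confirmed with plain counts. This is a minimal sketch using `collections.Counter` on a hypothetical list of names; on the real data the list would be the values of the attacker_king column.

```python
from collections import Counter

# Hypothetical attacker_king values (placeholders, not taken from the dataset)
attacker_kings = ["King A", "King B", "King A", "King C", "King A", "King B"]

# Count how often each king appears
mentions = Counter(attacker_kings)

# Most and least frequently mentioned kings
most_common_king, most_common_count = mentions.most_common(1)[0]
least_common_king = min(mentions, key=mentions.get)

print(most_common_king, most_common_count)
print(least_common_king)
```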
Conclusions:
1. How each king attacks: of Joffrey/Tommen Baratheon's 14 battles, 6 were pitched battles, 3 were ambushes, and 5 were sieges, while Robb Stark's 10 battles comprised 5 ambushes, 3 pitched battles, and 2 sieges. Joffrey/Tommen Baratheon prefers pitched battles, while Robb Stark prefers ambushes. Only Stannis Baratheon carried out a razing.
2. Important characters who die or are captured each year: the year 299 saw the most deaths and captures of important characters, which is probably related to the number of battles, since 299 also had the most battles.
3. Number of important characters killed or captured in each region: the Riverlands saw the most battles and the most deaths and captures of important characters. The Red Wedding also took place here, and the Stark family suffered heavy casualties. The North also saw many battles, but fewer important characters died there.
4. Whether the outcome of the war is related to the number of troops: it is not. Only 2 battles were won by the side with superior numbers; in the others the larger force did not prevail. War is highly unpredictable, and having more troops does not guarantee victory.
5. Word cloud: Joffrey/Tommen Baratheon is mentioned most, while Euron Greyjoy is mentioned least.
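The per-year claim in conclusion 2 can be checked with a grouped sum alongside a battle count. This is a minimal sketch on hypothetical toy rows; the `battles` column name is introduced here for illustration, and on the real data the frame would be the cleaned battles table with null major_death and major_capture rows dropped.

```python
import pandas as pd

# Hypothetical toy rows standing in for the year, major_death, and major_capture columns
toy = pd.DataFrame({
    "year": [298, 299, 299, 299, 300],
    "major_death": [1, 1, 0, 1, 0],
    "major_capture": [0, 1, 1, 0, 0],
})

# Deaths and captures of important characters per year
per_year = toy.groupby("year")[["major_death", "major_capture"]].sum()

# Number of battles per year, to compare against the deaths and captures
per_year["battles"] = toy.groupby("year").size()

print(per_year)
```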
Using Python to analyze the Game of Thrones five Kings battle data