1 00:00:05,686 --> 00:00:07,190 Thank you everyone for coming. 2 00:00:08,005 --> 00:00:12,345 If you were expecting the Postgres talk, that was the one before, so 3 00:00:12,345 --> 00:00:14,836 you might need to watch the video stream. 4 00:00:16,614 --> 00:00:18,120 So, Ansible best practices, 5 00:00:18,619 --> 00:00:22,249 I thought about calling it "Ansible, my best practices", 6 00:00:22,740 --> 00:00:29,806 so, just warning ahead, this is things I stumbled on using Ansible 7 00:00:29,806 --> 00:00:32,121 for the last 2-3 years and 8 00:00:32,121 --> 00:00:37,240 those are very specific things I found that worked very well for me. 9 00:00:39,075 --> 00:00:45,725 About me, I do also freelance work, do a lot of Ansible in there, 10 00:00:46,081 --> 00:00:51,901 I'm also the Debian maintainer for Ansible with Harlan Lieberman-Berg 11 00:00:54,055 --> 00:00:57,792 If there are any bugs in the package, just report them. 12 00:01:06,481 --> 00:01:10,217 The talk will be roughly divided into 4 parts. 13 00:01:14,523 --> 00:01:19,928 The first part will be about why you actually want to use config management 14 00:01:19,928 --> 00:01:23,478 and why you specifically want to use Ansible. 15 00:01:24,005 --> 00:01:30,340 So, if you're still SSHing into machines and editing config files, 16 00:01:30,340 --> 00:01:33,553 you're probably a good candidate for using Ansible. 17 00:01:35,627 --> 00:01:41,273 Then, the second part will be about good roles and playbook patterns 18 00:01:41,799 --> 00:01:43,910 that I have found that work really well for me. 19 00:01:47,130 --> 00:01:52,533 The third chapter will be about typical antipatterns I've stumbled upon, 20 00:01:52,533 --> 00:01:57,696 either in my work with other people using Ansible, 21 00:01:57,696 --> 00:02:00,740 or the IRC support channel, for example. 
22 00:02:02,648 --> 00:02:08,537 The fourth part will be like advanced tips and tricks you can use 23 00:02:08,537 --> 00:02:11,476 like fun things you can do with Ansible. 24 00:02:12,945 --> 00:02:16,275 Quick elevator pitch, what makes config management good? 25 00:02:18,263 --> 00:02:25,492 It actually also serves as a documentation of changes on your servers over time 26 00:02:25,492 --> 00:02:29,237 so if you just put the whole config management in a git repo 27 00:02:29,237 --> 00:02:30,986 and just regularly commit, 28 00:02:30,986 --> 00:02:32,577 you will actually be able to say 29 00:02:32,577 --> 00:02:35,506 "Why doesn't this work? It used to work a year ago" 30 00:02:35,506 --> 00:02:38,717 You can actually check why. 31 00:02:41,399 --> 00:02:49,761 Also, most config management tools have a lot better error reporting than 32 00:02:49,761 --> 00:02:53,295 your self-written bash scripts that do whatever. 33 00:02:56,180 --> 00:03:02,891 And usually, you have a very good reproducibility with config management 34 00:03:02,891 --> 00:03:10,813 and also idempotency, meaning that if you run, for example, a playbook several times 35 00:03:10,813 --> 00:03:12,761 you will always get the same result. 36 00:03:14,751 --> 00:03:23,655 Also, it's great if you work in small team or you admin ??? in the company 37 00:03:23,655 --> 00:03:26,830 and you have some people working on a few things too. 38 00:03:29,384 --> 00:03:33,374 It makes team work a lot easier and you will save a lot of time actually 39 00:03:33,374 --> 00:03:35,890 debugging things when things break. 40 00:03:37,840 --> 00:03:39,426 What makes Ansible good? 41 00:03:40,273 --> 00:03:45,612 Comparing it to Chef or Puppet for example it's really easy to set up, 42 00:03:45,612 --> 00:03:50,369 you start with two config files, you have it installed and you're ready to go. 
43 00:03:52,399 --> 00:03:56,419 It's also agentless, so whatever machines you actually want to control, 44 00:03:56,419 --> 00:04:05,237 the only thing they really need to have is an SSH daemon and Python 2.6+ 45 00:04:05,480 --> 00:04:10,678 so that's virtually any Debian machine you have installed and 46 00:04:10,678 --> 00:04:12,710 that is still supported in any way. 47 00:04:15,084 --> 00:04:21,668 Ansible also supports configuration of many things like 48 00:04:21,668 --> 00:04:25,896 networking equipment or even Windows machines, 49 00:04:25,896 --> 00:04:30,742 they don't need SSH but they use WinRM 50 00:04:30,742 --> 00:04:39,068 but Ansible came a bit late to the game so Ansible's still not as good 51 00:04:39,068 --> 00:04:41,435 in coverage as, for example, Puppet, 52 00:04:41,916 --> 00:04:46,555 which literally, you can configure any machine on the planet with that, 53 00:04:46,555 --> 00:04:48,389 as long as it has a CPU. 54 00:04:50,380 --> 00:04:53,918 Next step, I will talk about good role patterns. 55 00:04:57,010 --> 00:04:58,837 If you've never worked with Ansible before, 56 00:04:58,837 --> 00:05:01,835 this is the point when you watch the video stream, 57 00:05:01,835 --> 00:05:05,698 that you pause it and start working a few weeks with it 58 00:05:05,698 --> 00:05:08,380 and then unpause the actual video. 59 00:05:13,338 --> 00:05:17,529 A good role should ideally have the following layout. 60 00:05:18,790 --> 00:05:24,970 So, in the "roles" directory, you have the name of the role and tasks/main.yml 61 00:05:25,945 --> 00:05:29,202 You have the following rough layout.
62 00:05:31,561 --> 00:05:38,720 At the beginning of the role, you check for various conditions, 63 00:05:38,720 --> 00:05:44,085 for example using the "assert" module to check that 64 00:05:44,085 --> 00:05:48,190 certain variables are defined, things are set, 65 00:05:48,190 --> 00:05:53,235 that it's maybe part of a group, things like that you actually want to check. 66 00:05:54,660 --> 00:06:03,122 Then, usually, you install packages, you can use apt, or on CentOS machines, yum 67 00:06:03,504 --> 00:06:05,464 or you can do a git checkout or whatever, 68 00:06:07,092 --> 00:06:14,106 then usually you do some templating of files where you have certain abstraction 69 00:06:14,106 --> 00:06:18,579 and the variables are actually put into the template and 70 00:06:18,579 --> 00:06:21,027 make the actual config file. 71 00:06:22,480 --> 00:06:26,639 It's also good to point out that the template module actually has 72 00:06:26,639 --> 00:06:29,934 a "validate" parameter, 73 00:06:30,343 --> 00:06:35,953 that means you can actually use a command to check your config files for syntax errors 74 00:06:35,953 --> 00:06:44,193 and if that fails, your playbook will fail before actually deploying that config file 75 00:06:44,193 --> 00:06:53,179 so you can for example use Apache with the right parameters to actually do 76 00:06:53,179 --> 00:06:56,670 a check on the syntax of the file. 77 00:06:57,238 --> 00:07:01,829 That way, you never end up with a state where there's a broken config. 78 00:07:03,581 --> 00:07:05,410 In the end, you usually… 79 00:07:06,018 --> 00:07:10,038 When you change things, you trigger handlers to restart any daemons. 80 00:07:12,448 --> 00:07:23,623 If you use variables, I recommend putting sensible defaults in 81 00:07:23,623 --> 00:07:26,753 defaults/main.yml 82 00:07:28,126 --> 00:07:34,798 and then you only have to override those variables in specific cases.
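The role structure described above could look roughly like the sketch below. The role name `sshd`, the variable names, the group `ssh_servers` and the template file name are all hypothetical, and sshd stands in for the Apache example from the talk only because its syntax-check command is short.

```yaml
---
# roles/sshd/defaults/main.yml  (hypothetical role; sensible defaults live here)
sshd_port: 22
sshd_permit_root_login: "no"

---
# roles/sshd/tasks/main.yml
# 1. Check preconditions first, so the role fails early with a clear message.
- name: Check that required variables and group membership are in place
  assert:
    that:
      - sshd_port is defined
      - sshd_port | int > 0
      - "'ssh_servers' in group_names"

# 2. Install packages.
- name: Install the SSH server package
  apt:
    name: openssh-server
    state: present

# 3. Template the config file; "validate" runs the command against the
#    rendered temporary file (%s) and aborts before deploying on failure.
- name: Deploy sshd_config
  template:
    src: sshd_config.j2
    dest: /etc/ssh/sshd_config
    owner: root
    group: root
    mode: "0644"
    validate: /usr/sbin/sshd -t -f %s
  notify: Restart sshd

---
# roles/sshd/handlers/main.yml
# 4. Handlers only run when a notifying task actually reported a change.
- name: Restart sshd
  service:
    name: ssh
    state: restarted
```

Because the defaults sit in `defaults/main.yml`, a host or group only needs to set `sshd_port` when it deviates from 22.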
83 00:07:35,486 --> 00:07:41,260 Ideally, you should have sensible defaults you want to have to get whatever things 84 00:07:41,260 --> 00:07:42,806 you want to have running. 85 00:07:45,847 --> 00:07:51,949 When you start working with it and do that a bit more, 86 00:07:51,949 --> 00:07:58,499 you notice a few things and that is 87 00:07:58,499 --> 00:08:01,950 your role should ideally run in "check mode". 88 00:08:02,275 --> 00:08:07,560 "ansible-playbook" has --check that basically is just a dry run of 89 00:08:07,560 --> 00:08:11,586 your complete playbook 90 00:08:11,586 --> 00:08:17,642 and with --diff, it will actually show you for example file changes, 91 00:08:17,642 --> 00:08:20,728 or file mode changes, stuff like that 92 00:08:20,728 --> 00:08:23,860 and won't actually change anything. 93 00:08:24,184 --> 00:08:31,579 So if you end up editing a lot of stuff, you can use that as a check. 94 00:08:32,270 --> 00:08:37,229 I'll later get to some antipatterns that actually break that thing. 95 00:08:40,075 --> 00:08:47,146 And, ideally, the way you change files and configs and states, 96 00:08:47,146 --> 00:08:50,724 you should make sure that when the actual changes are deployed, 97 00:08:50,724 --> 00:08:53,164 and you run it a second time, 98 00:08:53,164 --> 00:08:57,629 that Ansible doesn't report any changes 99 00:08:57,629 --> 00:09:02,939 because if you end up writing your roles fairly sloppy, you end up having 100 00:09:02,939 --> 00:09:05,880 a lot of changes and then, 101 00:09:05,880 --> 00:09:10,718 in the end of the report, you have like 20 changes reported and 102 00:09:10,718 --> 00:09:14,790 you kind of then know those 18, they're always there 103 00:09:14,790 --> 00:09:18,414 and you kind of miss the 2 that are important, that actually broke your system 104 00:09:18,414 --> 00:09:25,168 If you want to do it really well, you make sure that it doesn't report any changes 105 00:09:25,168 --> 00:09:27,407 when you run it twice in a row. 
106 00:09:30,977 --> 00:09:38,494 Also, a thing to consider is you can define variables in the "defaults" folder 107 00:09:38,494 --> 00:09:40,485 and also in the "vars" folder, 108 00:09:41,256 --> 00:09:46,095 but if you look up how variables get inherited, you'll notice that 109 00:09:46,095 --> 00:09:49,715 the "vars" folder is really hard to actually override, 110 00:09:50,154 --> 00:09:53,495 so you want to avoid that as much as possible. 111 00:09:58,989 --> 00:10:05,859 That much larger section will be about typical anti-patterns I've noticed 112 00:10:05,859 --> 00:10:10,490 and I'll come to the first one now. 113 00:10:11,632 --> 00:10:15,171 It's the shell or command module. 114 00:10:17,290 --> 00:10:20,462 When people start using Ansible, that's the first thing they go 115 00:10:20,462 --> 00:10:26,075 "Oh well, I know how to use wget or I know 'apt-get install' " 116 00:10:26,075 --> 00:10:29,768 and then they end up using the shell module to do just that. 117 00:10:30,540 --> 00:10:35,388 If you use the shell module or the command module, you usually don't want to use that 118 00:10:35,388 --> 00:10:38,555 and that's for several reasons. 119 00:10:40,143 --> 00:10:46,520 There's currently, I think, 1300 different modules in Ansible 120 00:10:46,520 --> 00:10:50,506 so there's likely a big chance that whatever you want to do, 121 00:10:50,506 --> 00:10:53,768 there's already a module for that, that just does that thing. 122 00:10:54,664 --> 00:11:02,929 But those two modules also have several problems and that is 123 00:11:02,929 --> 00:11:09,640 the shell module, of course, gets interpreted by your actual shell, 124 00:11:09,640 --> 00:11:12,526 so if you have any special variables in there, 125 00:11:12,526 --> 00:11:21,912 you'd actually also have to take care of any variables you interpret in the shell string. 
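The "apt-get install" habit mentioned above can be contrasted with the dedicated module in a short sketch. The package name `nginx` is only an illustration.

```yaml
---
# Anti-pattern: the shell module knows nothing about package state.
# It reports "changed" on every run and is skipped in check mode.
- name: Install nginx through the shell
  shell: apt-get install -y nginx

# Preferred: the apt module compares desired and actual state,
# reports "changed" only when it really installs something,
# and supports check mode.
- name: Install nginx with the apt module
  apt:
    name: nginx
    state: present
```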
126 00:11:24,552 --> 00:11:31,459 Then, one of the biggest problems is if you run your playbook in check mode, 127 00:11:31,459 --> 00:11:34,263 the shell and the command modules won't get run. 128 00:11:34,711 --> 00:11:38,044 So if you're actually doing anything with that, they just get skipped 129 00:11:38,044 --> 00:11:47,595 and that would cause that your actual check mode and the real mode, 130 00:11:47,595 --> 00:11:51,567 they will start diverging if you use a lot of shell module. 131 00:11:55,594 --> 00:12:01,283 The worst, also, a bad part about this is that these two modules, 132 00:12:01,283 --> 00:12:03,597 they'll always ??? changed 133 00:12:03,597 --> 00:12:06,117 like, you run a command and it exits 0 134 00:12:06,117 --> 00:12:07,659 it's like "Oh, it changed" 135 00:12:10,909 --> 00:12:17,855 To get the reporting right on that module, you'd actually have to define for yourself 136 00:12:17,855 --> 00:12:21,073 when this is actually a change or not. 137 00:12:21,607 --> 00:12:29,333 So you'd have to probably get the output and then check, for example, 138 00:12:29,333 --> 00:12:35,303 if there's something on stderr or something to report an actual error or change. 139 00:12:38,395 --> 00:12:40,592 Then I'll get to the actual examples. 140 00:12:41,201 --> 00:12:46,237 The left is a bad example for using the shell module, 141 00:12:46,237 --> 00:12:48,636 I've seen that a lot, it's basically 142 00:12:48,636 --> 00:12:56,567 "Yeah, I actually want this file, so just use 'cat /path/file' and I'll use 143 00:12:56,567 --> 00:12:59,825 the register parameter to get the output". 
144 00:13:06,166 --> 00:13:10,965 The actual output goes into the "shell_cmd" and then 145 00:13:10,965 --> 00:13:16,201 we want to copy it to some other file somewhere else and 146 00:13:16,201 --> 00:13:25,657 so we use the Jinja "{{ }}" to define the actual content of the file 147 00:13:25,657 --> 00:13:30,626 and then put it into that destination file 148 00:13:31,563 --> 00:13:37,333 That is problematic because, first of all if you run it in check mode, 149 00:13:37,333 --> 00:13:40,583 this gets skipped and then this variable is undefined and 150 00:13:40,583 --> 00:13:45,499 Ansible will fail with an error, so you won't be able to actually 151 00:13:45,499 --> 00:13:47,081 run that in check mode. 152 00:13:48,219 --> 00:13:51,019 The other problem is this will always ??? 153 00:13:51,995 --> 00:13:54,962 so you'd probably have to… 154 00:13:56,995 --> 00:14:01,386 the most sensible thing would probably be to say just "changed_when: false" 155 00:14:01,712 --> 00:14:06,349 and just acknowledge that that shell command won't change anything on this system 156 00:14:07,609 --> 00:14:13,824 The good example would be to use the actual "slurp" module that will 157 00:14:13,824 --> 00:14:17,092 just slurp the whole file and base64-encode it 158 00:14:18,279 --> 00:14:28,146 and you can access the actual content with "path_file.content" and you then just 159 00:14:28,146 --> 00:14:30,709 base64-decode it and write it in there. 160 00:14:31,931 --> 00:14:39,247 The nice thing is slurp will never return any change, so it won't say it changed 161 00:14:39,247 --> 00:14:42,783 and it also works great in check mode. 162 00:14:46,482 --> 00:14:48,432 Here's another quick example. 163 00:14:49,893 --> 00:14:52,655 The example on the left, oh yeah wget.
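The cat-versus-slurp comparison described above can be reconstructed roughly as follows. The names `shell_cmd`, `path_file` and `/path/file` come from the talk; the destination path is a placeholder, and the slides themselves may have looked different.

```yaml
---
# Anti-pattern: reading a file through the shell.
# In check mode the shell task is skipped, so shell_cmd has no stdout and
# the copy task fails; in a normal run the shell task always reports "changed".
- name: Read the file with cat
  shell: cat /path/file
  register: shell_cmd

- name: Write the content somewhere else
  copy:
    content: "{{ shell_cmd.stdout }}"
    dest: /path/copy   # placeholder destination

# Preferred: slurp returns the file base64-encoded under the "content" key,
# never reports a change and also runs in check mode.
- name: Read the file with slurp
  slurp:
    src: /path/file
  register: path_file

- name: Write the decoded content somewhere else
  copy:
    content: "{{ path_file.content | b64decode }}"
    dest: /path/copy   # placeholder destination
```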
164 00:14:53,876 --> 00:14:59,602 Here's the problem, every time your playbook runs, this file will get downloaded 165 00:14:59,602 --> 00:15:07,610 and of course if the file can't be retrieved from that URL 166 00:15:07,610 --> 00:15:12,766 it will throw an error and that will happen all the time. 167 00:15:14,600 --> 00:15:19,077 The right example is a cleaner example using the uri module. 168 00:15:20,417 --> 00:15:27,569 You define a URL to retrieve a file from, you define where you want to write it to 169 00:15:27,569 --> 00:15:31,470 and you use the "creates" parameter to say 170 00:15:31,470 --> 00:15:34,889 "Just skip the whole thing if the file is already there". 171 00:15:40,048 --> 00:15:43,458 "set_fact", that's my pet peeve. 172 00:15:44,718 --> 00:15:49,553 set_fact is a module that allows you to define variables 173 00:15:49,553 --> 00:15:56,945 during your playbook run, so you can say set_fact and then 174 00:15:56,945 --> 00:16:02,922 this variable = that variable + a third variable or whatever 175 00:16:02,922 --> 00:16:04,914 you can do things with that. 176 00:16:06,375 --> 00:16:13,117 It's very problematic, though, because you end up having your variables 177 00:16:13,117 --> 00:16:15,394 changed during the playbook run 178 00:16:15,394 --> 00:16:24,781 and that is a problem when you use the "--start-at-task" parameter 179 00:16:24,781 --> 00:16:26,403 from ansible-playbook. 180 00:16:29,979 --> 00:16:36,436 Because this parameter allows you to skip forward to a certain task in a role 181 00:16:36,436 --> 00:16:40,133 so it skips everything until that point and then continues running there 182 00:16:40,133 --> 00:16:41,882 and that's really great for debugging 183 00:16:41,882 --> 00:16:48,874 but if you define a variable with set_fact and you skip over it, 184 00:16:48,874 --> 00:16:50,856 that variable would just not be defined. 185 00:16:53,587 --> 00:17:02,028 If you heavily use set_fact, that makes prototyping really horrible.
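The wget example from earlier in this part can be sketched like this. The URL and the target path are placeholders, not values from the talk.

```yaml
---
# Anti-pattern: downloads the file again on every playbook run
# and always reports "changed".
- name: Download the archive with wget
  shell: wget -O /tmp/example.tar.gz https://example.com/example.tar.gz

# Preferred: the uri module writes the response body to "dest" and,
# thanks to "creates", skips the request when the file already exists.
- name: Download the archive once
  uri:
    url: https://example.com/example.tar.gz
    dest: /tmp/example.tar.gz
    creates: /tmp/example.tar.gz
```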
186 00:17:04,914 --> 00:17:07,557 Another point is that you can use 187 00:17:07,557 --> 00:17:13,411 "ansible -m setup" and then the hostname to check what variables are actually defined 188 00:17:13,411 --> 00:17:18,521 for a specific host and everything set with set_fact is just not there. 189 00:17:22,227 --> 00:17:27,025 In summary, avoid the shell module, avoid the command module, 190 00:17:27,025 --> 00:17:29,872 avoid set_fact as much as you can, 191 00:17:29,872 --> 00:17:36,622 and don't hide changes with "changed_when" 192 00:17:36,622 --> 00:17:41,538 so the clean approach is always to use one task to check something 193 00:17:41,538 --> 00:17:46,099 and then a second task to actually execute something for example. 194 00:17:48,458 --> 00:17:52,327 Also, a bad idea in my opinion is when people say 195 00:17:52,327 --> 00:17:55,948 "Oh well, it's not important if this throws an error or not, 196 00:17:55,948 --> 00:17:58,874 I'll just say 'failed_when: false'" 197 00:18:00,177 --> 00:18:06,476 That might work sometimes, but the problem there is, if something really breaks, 198 00:18:06,476 --> 00:18:08,059 you'll never find out. 199 00:18:09,196 --> 00:18:10,697 Advanced topics. 200 00:18:13,748 --> 00:18:17,316 This is about the templating. 201 00:18:18,870 --> 00:18:21,919 The usual approach, for example for a postfix role, 202 00:18:21,919 --> 00:18:24,686 would be to do the following templating. 203 00:18:25,461 --> 00:18:36,479 You define certain variables in for example group_vars/postfix_servers 204 00:18:36,479 --> 00:18:40,868 so any host in that group would inherit these variables, 205 00:18:41,558 --> 00:18:47,902 so this is sort of a list of parameters for smtpd_recipient_restrictions 206 00:18:48,923 --> 00:18:54,247 and this is just the smtpd_helo_required.
207 00:18:55,142 --> 00:18:58,152 So the usual approach would be to define variables 208 00:18:58,152 --> 00:19:02,730 in the host_vars or group_vars, or even in the defaults 209 00:19:02,730 --> 00:19:08,074 and then you have a template where you just check every single variable 210 00:19:08,074 --> 00:19:15,226 If it exists, you actually sort of put the actual value there in place. 211 00:19:18,033 --> 00:19:23,717 Here, I check if this variable is set true and if yes, put the string there 212 00:19:23,717 --> 00:19:26,778 else, put this string there 213 00:19:27,824 --> 00:19:34,130 and for example, smtpd_recipient_restrictions I just iterate over this array 214 00:19:34,130 --> 00:19:38,435 and just output these values in order in that list. 215 00:19:41,846 --> 00:19:47,290 The problem here is that every time upstream defines a new variable 216 00:19:47,290 --> 00:19:56,684 you'll end up having to touch the actual template file and touch the actual variables 217 00:19:56,952 --> 00:20:04,102 so, I thought, "Well, you actually have keys and values and strings and arrays 218 00:20:04,102 --> 00:20:09,468 and hashes on one side, and actually, a config file is nothing else than that, 219 00:20:09,947 --> 00:20:11,664 just in a different format". 
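For the usual approach described above, the variable file might look like the sketch below. The variable names are hypothetical; the slide is not part of the transcript.

```yaml
---
# group_vars/postfix_servers  (hypothetical variable names)
# Every setting is its own variable, and the template needs a matching
# "if defined" or loop block for each one of them.
postfix_smtpd_helo_required: true
postfix_smtpd_recipient_restrictions:
  - permit_mynetworks
  - reject_unauth_destination

# Supporting one more postfix setting means adding a variable here
# and another block in the template file.
```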
220 00:20:12,428 --> 00:20:16,750 So I came up with… 221 00:20:18,092 --> 00:20:24,354 With Jinja2, you can also define functions 222 00:20:24,354 --> 00:20:29,475 I'll have to cut short a little bit on explaining it but 223 00:20:29,475 --> 00:20:36,229 basically, up here, a function is defined and it's called here in the bottom 224 00:20:36,229 --> 00:20:43,589 Basically, what it just does, it iterates over the whole dictionary defined here, 225 00:20:43,833 --> 00:20:47,003 "postfix.main", and it just goes… 226 00:20:48,672 --> 00:20:51,511 It iterates over all the keys and values and it goes… 227 00:20:53,302 --> 00:20:57,933 If the value is a string, I'll just put "key = value" and 228 00:20:57,933 --> 00:21:04,063 if it's an array, I just iterate over it and put it there in the format that 229 00:21:04,063 --> 00:21:05,702 postfix actually wants. 230 00:21:07,889 --> 00:21:11,808 Basically, you can do the same, for example, for haproxy and 231 00:21:11,808 --> 00:21:18,428 you can just serialize all the variables you actually defined. 232 00:21:20,258 --> 00:21:22,576 The advantage of this is, 233 00:21:22,576 --> 00:21:27,966 your template file just stays the same and it doesn't get messy 234 00:21:27,966 --> 00:21:29,725 if you start adding things. 235 00:21:30,703 --> 00:21:34,524 You have complete whitespace control, usually if you edit stuff, 236 00:21:34,524 --> 00:21:39,076 you kind of get an extra space, a new line in there, and that changes 237 00:21:39,076 --> 00:21:42,492 the template files for all machines. 238 00:21:43,629 --> 00:21:49,319 You have all the settings in alphabetical order, so if you actually run it and 239 00:21:49,319 --> 00:21:54,723 you see the diff, you don't end up having things going back and forth. 240 00:21:56,711 --> 00:22:00,564 If you get the syntax on the template file right, you don't have to touch it after that 241 00:22:00,564 --> 00:22:05,965 and you also don't get any syntax errors by editing them.
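The data side of this approach might look like the sketch below, assuming the dictionary is called `postfix.main` as in the talk. The values are illustrative, and the rendered lines in the comments only follow the description above ("key = value" for strings, a postfix-style list for arrays); the macro itself is not in the transcript.

```yaml
---
# group_vars/postfix_servers  (illustrative values)
# Keys are the real main.cf parameter names, so a new upstream setting
# only needs a new key here and no change to the template.
postfix:
  main:
    smtpd_helo_required: "yes"
    smtpd_recipient_restrictions:
      - permit_mynetworks
      - reject_unauth_destination

# A generic template that walks postfix.main would then emit roughly:
#   smtpd_helo_required = yes
#   smtpd_recipient_restrictions = permit_mynetworks, reject_unauth_destination
```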
242 00:22:13,889 --> 00:22:16,003 That leads to the next one. 243 00:22:17,915 --> 00:22:23,889 You can actually set "hash_behaviour = merge" in the Ansible config and 244 00:22:23,889 --> 00:22:26,974 that allows you to do the following. 245 00:22:28,241 --> 00:22:39,333 On the left here, you define for example a dictionary and this is, like, in a group 246 00:22:39,333 --> 00:22:45,350 and then in a specific machine, you define another setting in this dictionary. 247 00:22:46,325 --> 00:22:51,206 If you didn't use merge, the second setting would just override the first one 248 00:22:51,206 --> 00:22:53,684 and you'd end up with that, but if you actually do the merge, 249 00:22:53,684 --> 00:22:55,591 it does a deep merge of the hash. 250 00:22:56,608 --> 00:23:03,592 So the previous thing I showed would actually benefit from that 251 00:23:03,805 --> 00:23:06,410 so the combination of both is really good. 252 00:23:08,438 --> 00:23:09,864 I'll skip that. 253 00:23:10,312 --> 00:23:16,001 Further resources. Ansible has just really good documentation, 254 00:23:16,001 --> 00:23:22,824 there's the IRC and there's also debops which is a project that is 255 00:23:22,824 --> 00:23:27,571 specific to Debian and derivatives. 256 00:23:30,341 --> 00:23:31,475 That's it. 257 00:23:31,765 --> 00:23:37,165 [Applause] 258 00:23:39,284 --> 00:23:40,906 Thank you very much.
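As a closing sketch of the merge behaviour described in the talk: the group name, host name and values below are hypothetical, and the setting itself is `hash_behaviour = merge` in the `[defaults]` section of ansible.cfg.

```yaml
---
# group_vars/postfix_servers  (hypothetical values)
postfix:
  main:
    smtpd_helo_required: "yes"

---
# host_vars/mail1.example.com  (hypothetical host)
postfix:
  main:
    myhostname: mail1.example.com

# With the default behaviour (replace), the host-level dictionary wins
# as a whole and mail1 ends up with:
#   postfix: {main: {myhostname: mail1.example.com}}
#
# With hash_behaviour = merge, the dictionaries are merged recursively
# and mail1 ends up with:
#   postfix: {main: {smtpd_helo_required: "yes", myhostname: mail1.example.com}}
```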