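Python integers have no fixed width, whereas DNS fixes the width of every field (a protocol like this is quite direct to implement in C), so the first step is to produce fixed-length data. There are several ways to do this, for example:
bytearray, struct.pack, binascii.unhexlify, or writing \x escapes directly in a string literal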
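To make the comparison concrete, here is a minimal sketch that builds the same two bytes (0x01 0x00) with each of the four approaches. It assumes Python 3, where these calls produce bytes objects; the variable names are only illustrative.

```python
import binascii
import struct

# Four ways to produce the two bytes 0x01 0x00.
via_bytearray = bytes(bytearray([0x01, 0x00]))   # from a list of integers
via_struct = struct.pack(">H", 0x0100)           # one big-endian unsigned short
via_unhexlify = binascii.unhexlify("0100")       # from a hex string
via_escape = b"\x01\x00"                         # \x escapes in a literal

print(via_bytearray == via_struct == via_unhexlify == via_escape)
```

struct.pack is the only one of the four that takes the field width from a format string, which is why it fits a protocol with many fixed-width fields.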
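Answers on Stack Overflow tend to recommend struct.pack. It returns a string (a str in Python 2; in Python 3 it returns a bytes object) and has the following signature:
struct.pack(fmt, v1, v2, ...)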
fmt is a format string that describes the order and layout of the packed values. The struct module's help text summarizes it:
The optional first format char indicates byte order, size and alignment:
@: native order, size & alignment (default)
=: native order, std. size & alignment
<: little-endian, std. size & alignment
>: big-endian, std. size & alignment
The remaining chars indicate types of args and must match exactly;
these can be preceded by a decimal repeat count:
x: pad byte (no data); c:char; b:signed byte; B:unsigned byte;
h:short; H:unsigned short; i:int; I:unsigned int;
l:long; L:unsigned long; f:float; d:double.
Special cases (preceding decimal count indicates length):
s:string (array of char); p: pascal string (with count byte).
Whitespace between formats is ignored.
A few examples. DNS fields are big-endian (network byte order), so they all use >; the first two results are written as the hex value of the packed bytes:
pack(">BB", 1, 1) -> 0x0101
pack(">HH", 2, 2) -> 0x00020002
pack(">3s", 'abcdefgh') -> 'abc'
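The examples above can be checked directly. The sketch below assumes Python 3, so the results are bytes objects and the 's' format takes a bytes argument; it also shows calcsize and unpack, which are used later when parsing replies.

```python
from struct import pack, unpack, calcsize

# Big-endian ('>') with standard sizes: B is 1 byte, H is 2 bytes.
two_bytes = pack(">BB", 1, 1)           # b'\x01\x01'
two_shorts = pack(">HH", 2, 2)          # b'\x00\x02\x00\x02'
truncated = pack(">3s", b"abcdefgh")    # b'abc': the count limits the length

# calcsize reports how many bytes a format occupies.
size = calcsize(">HH")                  # 4

# unpack reverses pack and always returns a tuple.
values = unpack(">HH", two_shorts)      # (2, 2)

# The same value in little-endian order has its bytes swapped.
little = pack("<H", 2)                  # b'\x02\x00'
```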
Next comes the DNS protocol itself, which is specified in RFC 1035 (https://tools.ietf.org/html/rfc1035). A DNS message consists of a header followed by four sections:
+---------------------+
|        Header       |
+---------------------+
|       Question      | the question for the name server
+---------------------+
|        Answer       | RRs answering the question
+---------------------+
|      Authority      | RRs pointing toward an authority
+---------------------+
|      Additional     | RRs holding additional information
+---------------------+
The header has a fixed length of 12 bytes and is laid out as follows:
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
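ID is a randomly generated 16-bit identifier; the response carries the same ID as the request, which lets the client pair them up. The 16 bits that follow are collectively called the flags. QR distinguishes a query (0) from a response (1); Opcode selects the kind of query (standard or inverse); RD is Recursion Desired, which clients normally set to 1. A typical request therefore has flags of 0x0100.
The flags are followed by four counts: the number of entries in the Question, Answer, Authority, and Additional sections respectively. In a request these are usually 1, 0, 0, 0.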
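As an illustration of how the flag bits and counts map onto bytes, here is a small sketch assuming Python 3. build_header is a hypothetical helper, not part of the code at the end of this post; the shift amounts follow the bit positions in the diagram above.

```python
import random
from struct import pack

def build_header(tid, rd=1, qdcount=1):
    """Pack the 12-byte header of a standard query (hypothetical helper)."""
    qr, opcode, aa, tc, ra, z, rcode = 0, 0, 0, 0, 0, 0, 0
    # Bit 0 in the diagram is the most significant bit of the 16-bit word.
    flags = ((qr << 15) | (opcode << 11) | (aa << 10) | (tc << 9)
             | (rd << 8) | (ra << 7) | (z << 4) | rcode)
    # ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT: six unsigned shorts.
    return pack(">HHHHHH", tid, flags, qdcount, 0, 0, 0)

tid = random.randint(0, 0xFFFF)
header = build_header(tid)
print(len(header), header[2:4])
```

With only RD set, the flags word comes out as 0x0100, matching the value given above.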
The header is followed by the Question section:
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                     QNAME                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QTYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     QCLASS                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
QNAME is the domain name in encoded form. In the words of the RFC: a domain name represented as a sequence of labels, each label consists of a length octet followed by that number of octets. The domain name terminates with the zero length octet for the null label of the root. No padding is needed.
For example, www.baidu.com. is encoded as 0x03 w w w 0x05 b a i d u 0x03 c o m 0x00
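The label encoding can be sketched in a few lines. This assumes Python 3 and ASCII-only names; encode_name is a hypothetical helper introduced here for illustration.

```python
from struct import pack

def encode_name(hostname):
    """Encode a dotted name as length-prefixed labels (hypothetical helper)."""
    encoded = b""
    # Drop a trailing dot so the root label is not emitted twice.
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("each label must be 1 to 63 bytes long")
        encoded += pack(">B", len(raw)) + raw
    return encoded + b"\x00"  # the zero-length root label ends the name

qname = encode_name("www.baidu.com.")
print(qname)
```

The 63-byte limit on a label is what leaves the top two bits of the length octet free for the compression pointers described further down.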
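QTYPE specifies the type of record being requested, such as A, AAAA, or MX. An A record is 0x01.
QCLASS is usually 0x01 (IN), meaning the Internet class.
At this point we can build a complete DNS request.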
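Putting the pieces together, a complete request is just the header, the QNAME, and the two 16-bit question fields concatenated. The sketch below assumes Python 3; build_query is a hypothetical helper and the ID 0x1234 is an arbitrary fixed value chosen so the result is reproducible.

```python
from binascii import hexlify
from struct import pack

def build_query(tid, hostname, qtype=1, qclass=1):
    """Assemble a one-question query (hypothetical helper)."""
    # Header: ID, flags 0x0100 (RD set), QDCOUNT=1, then three zero counts.
    header = pack(">HHHHHH", tid, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels followed by the zero-length root label.
    labels = hostname.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(pack(">B", len(label)) + label for label in labels) + b"\x00"
    # QTYPE and QCLASS: 1 and 1 mean an A record in the IN class.
    return header + qname + pack(">HH", qtype, qclass)

query = build_query(0x1234, "www.baidu.com")
print(hexlify(query).decode("ascii"))
```

For this name the message is 31 bytes long: 12 for the header, 15 for QNAME, and 4 for QTYPE and QCLASS.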
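Next we handle the reply. A reply has the same header as a query, and struct.unpack can split the returned packet into fields. Note that RCODE=0 means the query completed without error, while RCODE=3 means the domain name does not exist.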
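A minimal sketch of splitting the reply header, assuming Python 3. parse_header is a hypothetical helper, and the sample bytes are hand-built for illustration rather than captured from a real server.

```python
from struct import unpack

def parse_header(message):
    """Split the 12-byte header into named fields (hypothetical helper)."""
    tid, flags, qdcount, ancount, nscount, arcount = unpack(">HHHHHH", message[:12])
    return {
        "id": tid,
        "qr": flags >> 15,              # 1 for a response
        "opcode": (flags >> 11) & 0xF,
        "rd": (flags >> 8) & 1,
        "ra": (flags >> 7) & 1,
        "rcode": flags & 0xF,           # 0 = no error, 3 = name does not exist
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }

# Hand-built header: ID 0x1234, flags 0x8183 (response, RD, RA, RCODE=3), one question.
sample = bytes.fromhex("123481830001000000000000")
header = parse_header(sample)
print(header["rcode"])
```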
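The Question section has exactly the same format as in the request. The resource records (RRs) in the other sections are laid out as follows: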
                                1  1  1  1  1  1
  0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                                               |
/                                               /
/                      NAME                     /
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TYPE                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                     CLASS                     |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      TTL                      |
|                                               |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                   RDLENGTH                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
/                     RDATA                     /
/                                               /
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
NAME has the same format as QNAME in the Question section, but the standard also defines a pointer scheme that avoids repeating data:
The pointer takes the form of a two octet sequence:
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                OFFSET                   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
The first two bits are ones. This allows a pointer to be distinguished from a label, since the label must begin with two zero bits because labels are restricted to 63 octets or less. (The 10 and 01 combinations are reserved for future use.) The OFFSET field specifies an offset from the start of the message (i.e., the first octet of the ID field in the domain header). A zero offset specifies the first byte of the ID field, etc.
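When a length octet in a NAME has both of its top two bits set, it must be treated as a pointer: the remaining 14 bits are the OFFSET, counted in bytes from the first byte of the header.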
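One way to follow pointers is sketched below, assuming Python 3 (where indexing bytes yields integers). decode_name is a hypothetical helper, the sample message is hand-built, and the jump counter is an added safeguard against pointer loops rather than something the RFC prescribes.

```python
def decode_name(message, offset):
    """Return (name, offset just past the name where it was first read)."""
    labels = []
    end = None      # where parsing resumes once the first pointer is taken
    jumps = 0
    while True:
        length = message[offset]
        if length & 0xC0 == 0xC0:                 # compression pointer
            if end is None:
                end = offset + 2                  # a pointer is two bytes long
            offset = ((length & 0x3F) << 8) | message[offset + 1]
            jumps += 1
            if jumps > len(message):
                raise ValueError("compression pointer loop")
        elif length & 0xC0:                       # 01 and 10 are reserved
            raise ValueError("reserved label type")
        elif length == 0:                         # root label ends the name
            return ".".join(labels), (offset + 1 if end is None else end)
        else:                                     # ordinary label
            start = offset + 1
            labels.append(message[start:start + length].decode("ascii"))
            offset = start + length

# A zeroed 12-byte header, a full name at offset 12, a pointer to it at
# offset 27, and a name at offset 29 that reuses "baidu.com" at offset 16.
message = (b"\x00" * 12 + b"\x03www\x05baidu\x03com\x00"
           + b"\xc0\x0c" + b"\x05tieba\xc0\x10")
print(decode_name(message, 12))
print(decode_name(message, 27))
print(decode_name(message, 29))
```

Note that the returned offset always refers to the position after the name as it appears in the record being parsed, not after the data the pointer leads to.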
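After NAME, TYPE is the type of the RR and CLASS is normally 1. TTL is how long the data remains valid, in seconds; it can be ignored here. RDLENGTH is the length of RDATA in bytes.
If TYPE=1, the RR is an A record: RDLENGTH=4 and RDATA is the IP address for NAME. If TYPE=5, it is a CNAME record: RDLENGTH varies and RDATA holds the canonical name for NAME. There are also NS, SOA, MX, and TXT records, which are not handled for now.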
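The fixed fields after NAME can be read with a single unpack call. The sketch below assumes Python 3; parse_rr_body is a hypothetical helper that expects an offset pointing just past NAME, and the sample record is hand-built with the documentation address 192.0.2.1.

```python
import socket
from struct import pack, unpack, calcsize

RR_FIXED = ">HHIH"  # TYPE, CLASS, TTL, RDLENGTH: 10 bytes in total

def parse_rr_body(message, offset):
    """Parse the fields that follow NAME (hypothetical helper)."""
    fixed = calcsize(RR_FIXED)
    rtype, rclass, ttl, rdlength = unpack(RR_FIXED, message[offset:offset + fixed])
    offset += fixed
    rdata = message[offset:offset + rdlength]
    value = None
    if rtype == 1 and rclass == 1 and rdlength == 4:
        value = socket.inet_ntoa(rdata)   # A record: four bytes to dotted quad
    record = {"type": rtype, "class": rclass, "ttl": ttl,
              "rdlength": rdlength, "rdata": rdata, "value": value}
    return record, offset + rdlength

# NAME is a two-byte pointer here, so the fixed fields start at offset 2.
sample_rr = b"\xc0\x0c" + pack(RR_FIXED, 1, 1, 300, 4) + bytes([192, 0, 2, 1])
record, next_offset = parse_rr_body(sample_rr, 2)
print(record["value"], next_offset)
```

A CNAME record is left undecoded in this sketch because its RDATA is itself a possibly compressed name, which can only be resolved against the whole message.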
At this point we can read the IP address for a domain name out of the returned data.
The code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import os
import socket
import time
import random
from struct import pack, unpack, calcsize
from binascii import hexlify, unhexlify

timeout = 1  # Timeout in seconds
# Assumption: the post never shows where `output` is opened, so stdout stands in here.
output = sys.stdout


class DNSReply(object):
    def __init__(self, raw_data):
        self.raw_data = raw_data
        # Integer view of the message; iterating it yields ints on Python 2 and 3.
        self.octets = bytearray(raw_data)
        fmt = '>HHHHHH'
        (self.tid, self.flags, self.qdcount, self.ancount,
         self.nscount, self.arcount) = unpack(fmt, self.raw_data[:calcsize(fmt)])
        self.rcode = self.flags & 0x000F
        self.offset = calcsize(fmt)
        if self.rcode == 0:
            self.process_question()
            self.process_answer()

    def unformat_name(self, offset):
        # Returns (name, offset after the name); the name carries a leading '.'.
        count = 0
        hostname = ''
        for i, x in enumerate(self.octets[offset:]):
            if count == 0:
                count = x
                if (count & 0xc0) == 0xc0:  # This name is compressed
                    ptr = ((x << 8) + self.octets[offset + i + 1]) & 0x3fff
                    # print('pointer = %x' % ptr)
                    hostname += self.unformat_name(ptr)[0]
                    return hostname, offset + i + 2
                elif count == 0:  # End of name
                    return hostname, offset + i + 1
                else:
                    hostname += '.'
            else:
                hostname += chr(x)
                count -= 1
        # Reached only if the message ends before the name is terminated
        raise ValueError('Truncated domain name')

    def process_question(self):
        for i in range(self.qdcount):
            qname, self.offset = self.unformat_name(self.offset)
            qtype, qclass = unpack('>HH', self.raw_data[self.offset:self.offset + 4])
            self.offset += 4
            self.question = (qname[1:], qtype, qclass)
            # print(self.question)

    def process_rdata(self, ty, cl, offset, rdlen):
        if cl != 1:
            raise ValueError('Resource record class is not supported')
        if ty == 1:  # A RDATA
            assert rdlen == 4
            return '.'.join(map(str, unpack(">BBBB", self.raw_data[offset:offset + rdlen])))
        elif ty == 5:  # CNAME DATA
            return self.unformat_name(offset)[0]

    def process_answer(self):
        self.answer = []
        fmt = '>HHLH'
        for i in range(self.ancount):
            aname, self.offset = self.unformat_name(self.offset)
            atype, aclass, ttl, rdlen = unpack(
                fmt, self.raw_data[self.offset:self.offset + calcsize(fmt)])
            self.offset += calcsize(fmt)
            rr = {'name': aname, 'type': atype, 'class': aclass, 'ttl': ttl, 'rdlen': rdlen,
                  'value': self.process_rdata(atype, aclass, self.offset, rdlen)}
            self.offset += rdlen
            self.answer.append(rr)

    def printip(self):
        if self.rcode == 0 and self.answer:
            for t in self.answer:
                if t['type'] == 1:
                    print('HIT %s %s' % (t['value'], self.question[0]))
                    output.write("%s %s\n" % (t['value'], self.question[0]))
                    break


def format_name(hostname):
    qname = b''
    # Strip a trailing dot so the root label is not written twice
    for x in hostname.rstrip('.').split('.'):
        qname += pack(">B%ds" % len(x), len(x), x.encode('ascii'))
    qname += b'\x00'  # QNAME terminator
    return qname


def form_query(hostname):
    a = random.randint(0, 0xFFFF)
    tid = pack(">H", a)
    flags = pack(">BB", 0x01, 0x00)  # Recursion desired
    header = pack(">HHHH", 1, 0, 0, 0)
    question = format_name(hostname)
    question += pack(">HH", 1, 1)  # A record, IN class
    query = tid + flags + header + question
    return tid, query


def do_query(hostname_list, server_ip='10.10.0.21', server_port=53):
    ss = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ss.settimeout(0.1)
    for ht in hostname_list:
        try:
            print(ht)
            tid, query = form_query(ht)
            # start = time.time()
            ss.sendto(query, (server_ip, server_port))
            raw_reply, server_addr = ss.recvfrom(4096)
            # stop = time.time()
            # print("Reply in %.2f msecs" % int((stop - start) * 1000))
            if raw_reply[:2] == tid:
                reply = DNSReply(raw_reply)
                reply.printip()
        except Exception as e:
            print(e)
    ss.close()