Saturday, May 16, 2015

Implementing an A Record Query in Python

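I have recently been studying computer networking, using Computer Networking: A Top-Down Approach as the textbook. Chapter 2 introduces DNS. I had previously implemented TCP-based protocols with text replies in Python, but had never dealt with a compactly structured binary protocol like DNS.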

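Python integers have no fixed width, whereas DNS strictly specifies the width of every field (which makes the protocol quite straightforward to implement in C), so the first step is to produce data of a fixed length. There are several ways to do this, for example:
bytearray, struct.pack, binascii.unhexlify, or \x escapes written directly in a string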
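
To make the comparison concrete, here is a minimal sketch that builds the same two bytes (0x01 0x00) with each of the four approaches. It assumes Python 3, where raw data is a bytes object rather than a str; the variable names are only illustrative.

```python
import binascii
import struct

# Four ways to build the same two bytes, 0x01 0x00.
via_bytearray = bytes(bytearray([0x01, 0x00]))  # list of byte values
via_struct = struct.pack(">H", 0x0100)          # one big-endian unsigned short
via_unhexlify = binascii.unhexlify("0100")      # hex text -> raw bytes
via_escape = b"\x01\x00"                        # \x escapes in a bytes literal

# All four produce identical data.
assert via_bytearray == via_struct == via_unhexlify == via_escape
```

struct.pack is the only one of the four that takes the field width and byte order from a format string, which is probably why it is the usual recommendation for protocol headers.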


StackOverflow tends to recommend struct.pack. It returns a string (a bytes object in Python 3) and takes the following arguments:

struct.pack(fmt, v1, v2, ...)

fmt is a format string that specifies the order and layout of the packed values:
The optional first format char indicates byte order, size and alignment:
      @: native order, size & alignment (default)
      =: native order, std. size & alignment
      <: little-endian, std. size & alignment
      >: big-endian, std. size & alignment

The remaining chars indicate types of args and must match exactly;
    these can be preceded by a decimal repeat count:
      x: pad byte (no data); c:char; b:signed byte; B:unsigned byte;
      h:short; H:unsigned short; i:int; I:unsigned int;
      l:long; L:unsigned long; f:float; d:double.
    Special cases (preceding decimal count indicates length):
      s:string (array of char); p: pascal string (with count byte).
Whitespace between formats is ignored.
A few examples:
pack(">BB", 1, 1)   ->   0x0101
pack(">HH", 2, 2)   ->   0x00020002
pack(">3s", 'abcdefgh')    ->   'abc'
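
The same examples can be checked directly. The sketch below assumes Python 3, so the results are bytes objects and the 's' format needs a bytes argument; calcsize and unpack are the companions of pack in the same module.

```python
from struct import pack, unpack, calcsize

# The three examples above, written as checks.
assert pack(">BB", 1, 1) == b"\x01\x01"
assert pack(">HH", 2, 2) == b"\x00\x02\x00\x02"
assert pack(">3s", b"abcdefgh") == b"abc"   # '3s' keeps only the first 3 bytes

# calcsize reports how many bytes a format occupies.
assert calcsize(">HH") == 4
assert calcsize(">BH") == 3                 # standard size: no padding inserted

# unpack reverses pack and always returns a tuple.
assert unpack(">HH", b"\x00\x02\x00\x02") == (2, 2)
```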

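The next step is to implement the DNS protocol itself, which is specified in https://tools.ietf.org/html/rfc1035. A DNS message consists of a header followed by the data sections: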

    +---------------------+
    |        Header       |
    +---------------------+
    |       Question      | the question for the name server
    +---------------------+
    |        Answer       | RRs answering the question
    +---------------------+
    |      Authority      | RRs pointing toward an authority
    +---------------------+
    |      Additional     | RRs holding additional information
    +---------------------+

The header has a fixed length (12 bytes) and the following format:
                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      ID                       |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    QDCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ANCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    NSCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                    ARCOUNT                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+


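ID is a randomly generated number; the response carries the same ID as the request, so the two can be matched. The 16 bits that follow are collectively called the flags. QR distinguishes a query (0) from a response (1); Opcode specifies the kind of query (standard, inverse); RD is Recursion Desired, which clients normally set to 1. A typical request therefore has flags 0x0100.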

The flags are followed by the numbers of resource records in the Question, Answer, Authority, and Additional sections. In a request these are normally 1, 0, 0, 0.
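
As a rough illustration of the header layout, the sketch below packs a query header from its fields. It assumes Python 3; build_header and its parameters are hypothetical names chosen for this example, and only the QR, Opcode, and RD bits are set.

```python
import random
import struct

def build_header(tid=None, rd=1):
    """Pack a 12-byte DNS query header (hypothetical helper)."""
    if tid is None:
        tid = random.randint(0, 0xFFFF)    # random 16-bit transaction ID
    qr, opcode = 0, 0                      # QR=0: query, Opcode=0: standard query
    # QR is the top bit, Opcode the next four bits, RD is bit 8 from the right.
    flags = (qr << 15) | (opcode << 11) | (rd << 8)
    # ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    return struct.pack(">HHHHHH", tid, flags, 1, 0, 0, 0)

header = build_header(tid=0x1234)
print(header.hex())   # -> 123401000001000000000000
```

With rd=1 the flags word comes out as 0x0100, matching the value given above.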

The header is followed by the Question section:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                     QNAME                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QTYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     QCLASS                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+


QNAME is the domain name in encoded form. In the words of the RFC, it is a domain name represented as a sequence of labels, each label consisting of a length octet followed by that number of octets. The domain name terminates with the zero-length octet for the null label of the root. No padding is used.
For example, www.baidu.com. is encoded as 0x03 w w w 0x05 b a i d u 0x03 c o m 0x00
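
The encoding rule can be sketched as a small function. This assumes Python 3 and plain ASCII host names (internationalized names are not handled); encode_qname is a hypothetical name used only here.

```python
import struct

def encode_qname(hostname):
    """Encode a dotted host name as length-prefixed labels (hypothetical helper)."""
    qname = b""
    # A trailing dot only marks the root, so drop it before splitting.
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:         # labels are limited to 63 octets
            raise ValueError("invalid label length: %r" % label)
        qname += struct.pack(">B", len(raw)) + raw
    return qname + b"\x00"                 # zero-length label for the root

print(encode_qname("www.baidu.com."))      # -> b'\x03www\x05baidu\x03com\x00'
```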

QTYPE specifies the type of query: A record, AAAA record, MX record, and so on. An A record corresponds to 0x01.

QCLASS is normally 0x01, meaning the Internet (IN) class.

At this point we can generate a complete DNS request.
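
Putting the pieces together, a whole request is just header + QNAME + QTYPE + QCLASS. The following is a sketch under the same assumptions as before (Python 3, ASCII names); build_query and the constants are illustrative names, and the transaction ID is passed in so the result is reproducible.

```python
import struct

QTYPE_A = 1    # A record
QCLASS_IN = 1  # Internet class

def build_query(hostname, tid):
    """Assemble a complete A-record query message (hypothetical helper)."""
    # Header: ID, flags=0x0100 (RD set), one question, no other records.
    header = struct.pack(">HHHHHH", tid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed with its length, then a zero byte.
    qname = b"".join(
        struct.pack(">B", len(label)) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack(">HH", QTYPE_A, QCLASS_IN)
    return header + question

query = build_query("www.baidu.com", tid=0x1234)
assert len(query) == 12 + 15 + 4   # header + QNAME + QTYPE/QCLASS
```

The resulting bytes can be sent as a single UDP datagram to port 53 of a name server.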

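Next comes handling the reply. A reply has the same header as a query, and struct.unpack can be used to take the returned packet apart. Note that RCODE=0 means the query succeeded without error, while RCODE=3 means the domain name does not exist.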
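
Here is a minimal sketch of taking a reply header apart, assuming Python 3. parse_header is a hypothetical name, and the sample bytes are hand-built for the example (flags 0x8183: a response with RD and RA set and RCODE=3), not captured from a real server.

```python
import struct

def parse_header(data):
    """Split the 12-byte DNS header into its fields (hypothetical helper)."""
    tid, flags, qdcount, ancount, nscount, arcount = struct.unpack(">HHHHHH", data[:12])
    return {
        "id": tid,
        "qr": flags >> 15,           # 1 for a response
        "rd": (flags >> 8) & 1,      # recursion desired
        "ra": (flags >> 7) & 1,      # recursion available
        "rcode": flags & 0x000F,     # low four bits of the flags word
        "qdcount": qdcount,
        "ancount": ancount,
        "nscount": nscount,
        "arcount": arcount,
    }

# Hand-built header of a "name does not exist" reply.
sample = struct.pack(">HHHHHH", 0x1234, 0x8183, 1, 0, 0, 0)
fields = parse_header(sample)
assert fields["qr"] == 1 and fields["rcode"] == 3
```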

The Question section has exactly the same format as in the request. The other RRs have this format:

                                    1  1  1  1  1  1
      0  1  2  3  4  5  6  7  8  9  0  1  2  3  4  5
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                                               |
    /                                               /
    /                      NAME                     /
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TYPE                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                     CLASS                     |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                      TTL                      |
    |                                               |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    |                   RDLENGTH                    |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--|
    /                     RDATA                     /
    /                                               /
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+


NAME has the same format as in the Question section, but the standard also defines a way to avoid redundant data by using pointers:

The pointer takes the form of a two octet sequence:

    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
    | 1  1|                OFFSET                   |
    +--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The first two bits are ones.  This allows a pointer to be distinguished from a label, since the label must begin with two zero bits because labels are restricted to 63 octets or less.  (The 10 and 01 combinations are reserved for future use.)  The OFFSET field specifies an offset from the start of the message (i.e., the first octet of the ID field in the domain header).  A zero offset specifies the first byte of the ID field, etc.


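When the two highest bits of a NAME octet are both 1, it must be treated as a pointer; the OFFSET that follows is the number of bytes from the first byte of the header.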
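
One way to follow such pointers is sketched below, assuming Python 3 (indexing bytes yields integers). decode_name is a hypothetical helper; the cap of 16 jumps is an arbitrary guard against pointer loops added for this example, and the sample message is hand-built.

```python
def decode_name(message, offset):
    """Return (name, offset just past the name as stored at `offset`)."""
    labels = []
    next_offset = None   # where parsing resumes after the name
    jumps = 0
    while True:
        length = message[offset]
        if (length & 0xC0) == 0xC0:
            # Pointer: the low 14 bits are an offset from the start of the message.
            pointer = ((length & 0x3F) << 8) | message[offset + 1]
            if next_offset is None:
                next_offset = offset + 2   # a pointer occupies two bytes
            jumps += 1
            if jumps > 16:
                raise ValueError("too many compression pointers")
            offset = pointer
        elif length == 0:
            # Zero-length label: end of the name.
            if next_offset is None:
                next_offset = offset + 1
            return ".".join(labels), next_offset
        else:
            offset += 1
            labels.append(message[offset:offset + length].decode("ascii"))
            offset += length

# 12 dummy header bytes, a full name at offset 12, then "mail" + pointer to "baidu.com".
message = bytes(12) + b"\x03www\x05baidu\x03com\x00" + b"\x04mail\xc0\x10"
print(decode_name(message, 12))   # -> ('www.baidu.com', 27)
print(decode_name(message, 27))   # -> ('mail.baidu.com', 34)
```

Because the returned offset always refers to the place where the name was stored, not to where a pointer led, the caller can continue reading TYPE and CLASS right after it.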

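After NAME, TYPE is the type of the RR and CLASS is normally 1. TTL is how long the data remains valid, in seconds; it can be ignored here. RDLENGTH is the length of RDATA in bytes.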

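If TYPE=1, the RR is an A record: RDLENGTH=4 and RDATA is the IP address for the NAME. If TYPE=5, it is a CNAME record: RDLENGTH varies and RDATA contains the canonical name for the NAME. There are also NS, SOA, MX, and TXT records, which are not handled for now.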
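
The fixed part of an RR (everything after NAME) can be read with a single unpack. The sketch below assumes Python 3; parse_rr_body is a hypothetical helper that expects the offset just past NAME, and the sample record uses the documentation address 192.0.2.1 rather than real data. CNAME RDATA is returned raw because it is itself an encoded name that may contain compression pointers.

```python
import socket
import struct

TYPE_A, TYPE_CNAME = 1, 5

def parse_rr_body(message, offset):
    """Parse TYPE, CLASS, TTL, RDLENGTH and RDATA; `offset` points just past NAME."""
    fmt = ">HHIH"                                   # 2 + 2 + 4 + 2 = 10 bytes
    size = struct.calcsize(fmt)
    rtype, rclass, ttl, rdlength = struct.unpack(fmt, message[offset:offset + size])
    offset += size
    rdata = message[offset:offset + rdlength]
    if rtype == TYPE_A and rdlength == 4:
        value = socket.inet_ntoa(rdata)             # 4 raw bytes -> dotted quad
    else:
        value = rdata                               # e.g. CNAME: still-encoded name
    record = {"type": rtype, "class": rclass, "ttl": ttl, "value": value}
    return record, offset + rdlength

# Hand-built A record body: TYPE=1, CLASS=1, TTL=300, RDLENGTH=4, RDATA=192.0.2.1
body = struct.pack(">HHIH", 1, 1, 300, 4) + bytes([192, 0, 2, 1])
record, end = parse_rr_body(body, 0)
print(record["value"], end)   # -> 192.0.2.1 14
```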

With this, we can read the IP address for a domain name out of the returned data.

The code is as follows:
#!/usr/bin/env python
#-*- coding:utf-8 -*-

import sys
import os
import socket
import time
import random
from struct import pack, unpack, calcsize
from binascii import hexlify, unhexlify

timeout = 1  #Timeout in seconds

class DNSReply(object):
    def __init__(self, raw_data):
        self.raw_data = raw_data
        fmt = '>HHHHHH'
        self.tid, self.flags, self.qdcount, self.ancount, self.nscount, self.arcount = unpack(fmt, self.raw_data[:calcsize(fmt)])
        self.rcode = self.flags & 0x000F
        self.offset = calcsize(fmt)

        if self.rcode == 0:
            self.process_question()
            self.process_answer()

    def unformat_name(self, offset):
        count = 0
        hostname = ''
        for i, x in enumerate(self.raw_data[offset:]):
            if count == 0:
                count = ord(x)
                if (count & 0xc0) == 0xc0:
                    #This name is compressed
                    ptr = ((ord(x) << 8) + ord(self.raw_data[offset+i+1])) & 0x3fff
                    #print 'pointer = %x' % ptr
                    hostname += self.unformat_name(ptr)[0]
                    return hostname, offset + i + 2
                elif count == 0:
                    #End of name
                    return hostname, offset + i + 1
                else:
                    hostname += '.'
            else:
                hostname += x
                count -= 1
        #Control never reaches here
        raise ValueError

    def process_question(self):
        for i in range(self.qdcount):
            qname, self.offset = self.unformat_name(self.offset)
            qtype, qclass = unpack('>HH', self.raw_data[self.offset:self.offset + 4])
            self.offset += 4
            self.question = (qname[1:], qtype, qclass)
            #print self.question

    def process_rdata(self, ty, cl, offset, rdlen):
        if cl != 1:
            raise ValueError('Resource record class is not supported')
        if ty == 1:
            assert rdlen == 4
            return '.'.join(map(str, unpack(">BBBB", self.raw_data[offset:offset+rdlen]))) #A RDATA
        elif ty == 5:
            return self.unformat_name(offset)[0]    #CNAME DATA

    def process_answer(self):
        self.answer = []
        fmt = '>HHLH'
        for i in range(self.ancount):
            aname, self.offset = self.unformat_name(self.offset)
            atype, aclass, ttl, rdlen = unpack(fmt, self.raw_data[self.offset:self.offset + calcsize(fmt)])
            self.offset += calcsize(fmt)
            rr = {'name':aname, 'type':atype, 'class':aclass, 'ttl':ttl, 'rdlen':rdlen, 'value':self.process_rdata(atype, aclass, self.offset, rdlen)}
            self.offset += rdlen
            self.answer.append(rr)

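    def printip(self, output=None):
        #output: optional writable file object that also receives "ip hostname" lines
        if self.rcode == 0 and self.answer:
            for t in self.answer:
                if t['type'] == 1:
                    print 'HIT', t['value'], self.question[0]
                    if output is not None:
                        output.write("%s %s\n" % (t['value'], self.question[0]))
                    break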

def format_name(hostname):
    qname = b''
    for x in hostname.split('.'):
        qname += pack(">B%ds" % len(x), len(x), x)
    qname += '\x00'    #QNAME terminator
    return qname

def form_query(hostname):
    a = random.randint(0, 0xFFFF)   #Random 16-bit transaction ID
    tid = pack(">H", a)
    flags = pack(">BB", 0x01, 0x00)   #Recursive desired
    header = pack(">HHHH", 1, 0, 0, 0)
    question = format_name(hostname)
    question += pack(">HH", 1, 1) #A record, IN Class
    query = tid + flags + header + question
    return tid, query

def do_query(hostname_list, server_ip = '10.10.0.21', server_port = 53):
    ss = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ss.settimeout(timeout)
    for ht in hostname_list:
        try:
            print ht
            tid, query = form_query(ht)
            #start = time.time()
            ss.sendto(query, (server_ip, server_port))
            raw_reply, server_addr = ss.recvfrom(4096)
            #stop = time.time()
            #print "Reply in %.2f msecs" % int((stop - start) * 1000)
            if raw_reply[:2] == tid:
                reply = DNSReply(raw_reply)
                reply.printip()
        except Exception as e:
            print e
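    ss.close()

if __name__ == '__main__':
    #Look up every hostname given on the command line
    do_query(sys.argv[1:])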
