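When I send UDP packets from Docker containers on EC2, I sometimes get this strange error (not every message I send fails). It never happens on our internal cluster running OpenNebula. I have allowed all inbound and outbound traffic on every port for all EC2 instances. Here is the exception: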
```
2017-01-19 10:01:53,170 - ERROR: Exception caught for address: 10.99.0.153
Traceback (most recent call last):
  File "./server.py", line 56, in <module>
    sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
OSError: [Errno 22] Invalid argument
```
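To narrow this down, it may help to log the symbolic errno together with the exact destination tuple handed to `sendto()`, instead of only the traceback from the bare `except:`. The helper below is a minimal sketch; `send_and_report` is a hypothetical name that is not part of the posted code. It catches `OSError` and returns the errno symbol, which is `"EINVAL"` for Errno 22.

```python
import errno
import logging
import socket


def send_and_report(sock: socket.socket, payload: bytes, address: str, port: int):
    """Send one UDP datagram to (address, port).

    Hypothetical diagnostic helper: returns None on success; otherwise logs the
    symbolic errno name (e.g. "EINVAL" for Errno 22) together with the exact
    destination passed to sendto(), and returns that name.
    """
    try:
        sock.sendto(payload, (address, port))
        return None
    except OSError as exc:
        # errno.errorcode maps numeric errno values to names such as "EINVAL";
        # fall back to the raw number (e.g. getaddrinfo errors) if it is unknown
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        logging.error("sendto(%r, %d) failed: %s (%s)", address, port, name, exc.strerror)
        return name
```

In the send loop, the `try`/`except` around `sock.sendto(...)` could then become `send_and_report(sock, bytes(str(i), "utf-8"), address, PORT)`, so the log records which destinations fail and with which errno.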
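I'm running 5 c4.xlarge instances with Ubuntu Server 16.04 and Docker 1.12.6. They are all in the same Docker swarm.
I create the service and the network subnet with the overlay driver. The service has a mount point so that I can collect the logs from every peer. I run 150 peers, each with a 300 MB memory limit.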
My Dockerfile:
```dockerfile
FROM debian:jessie

RUN echo 'deb http://mirror.switch.ch/ftp/mirror/debian/ jessie-backports main' >> /etc/apt/sources.list && \
    apt-get -yqq update && \
    apt-get -yqq dist-upgrade && \
    apt-get -yqq install --no-install-recommends dnsutils wget curl ntp python3 && \
    apt-get -yqq clean

CMD ["/opt/epto/container-start-script.sh"]
```
I use the following shell script as my CMD:
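```bash
#!/usr/bin/env bash
# hostname -i may print several addresses; split them into a bash array
MY_IP_ADDR=$(/bin/hostname -i)
MY_IP_ADDR=($MY_IP_ADDR)
# pass only the first address to the server
./server.py ${MY_IP_ADDR[0]}
```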
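The two `MY_IP_ADDR` lines split the output of `hostname -i` on whitespace and hand only the first element to `server.py`, so if a container has more than one address, whichever one is printed first becomes the bind address. A rough Python equivalent of that selection is sketched below; `first_address` and `all_addresses` are hypothetical helpers, not part of the posted scripts.

```python
import subprocess


def first_address(hostname_output: str) -> str:
    """Return the first whitespace-separated token, like ${MY_IP_ADDR[0]}.

    Hypothetical helper mirroring container-start-script.sh; raises
    ValueError when `hostname -i` printed nothing.
    """
    tokens = hostname_output.split()
    if not tokens:
        raise ValueError("`hostname -i` returned no address")
    return tokens[0]


def all_addresses() -> list:
    """Run /bin/hostname -i (as the start script does) and return every address it prints."""
    result = subprocess.run(["/bin/hostname", "-i"], stdout=subprocess.PIPE, check=True)
    return result.stdout.decode().split()
```

Logging `all_addresses()` next to the chosen address could confirm whether the overlay-network address is the one `server.py` actually binds to.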
This is the actual Python script that runs:
```python
#!/usr/bin/env python3
import socketserver
import sys
import logging
import threading
import urllib.request
import time
import socket
from random import randint

PORT = 15342


class MyUDPHandler(socketserver.BaseRequestHandler):
    """
    This class works similar to the TCP handler class, except that
    self.request consists of a pair of data and client socket, and since
    there is no connection the client address must be given explicitly
    when sending data back via sendto().
    """

    def handle(self):
        data = self.request[0].strip().decode("utf-8")
        logging.info("Message received from {} during loop {}".format(self.client_address[0], data))


class ThreadedUDPServer(socketserver.ThreadingMixIn, socketserver.UDPServer):
    pass


if __name__ == "__main__":
    HOST = sys.argv[1]
    logging.basicConfig(format='%(asctime)s - %(levelname)s: %(message)s', level=logging.INFO,
                        filename='/data/{}.test'.format(HOST))

    server = ThreadedUDPServer((HOST, PORT), MyUDPHandler)
    # NOTE: UDPServer already binds in __init__, so setting this afterwards
    # does not change how the listening socket was bound
    server.allow_reuse_address = True
    logging.info("Create server listening on {}:{}".format(HOST, PORT))
    logging.info("Server allow_reuse_address: {}".format(server.allow_reuse_address))

    # receive in a background thread
    server_thread = threading.Thread(target=server.serve_forever)
    server_thread.daemon = True
    server_thread.start()

    # stagger start-up so peers register with the tracker at different times
    sleep_delay = randint(10, 180)
    logging.info("Sleeping for {}s".format(sleep_delay))
    time.sleep(sleep_delay)
    logging.info("Finished sleeping")

    # a single UDP socket is reused for every destination
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

    # ask the tracker for the current view: a '|'-separated list of peer IPs
    content = urllib.request.urlopen('http://epto-tracker:4321/REST/v1/admin/get_view').read()
    content = content.decode("utf-8")
    addresses = content.split('|')
    logging.info("View size: {}".format(len(addresses)))

    i = 0
    while True:
        logging.info("Loop {}".format(i))
        for address in addresses:
            try:
                logging.info("Sending to {}".format(address))
                sock.sendto(bytes('{}'.format(i), "utf-8"), (address, PORT))
            except:
                logging.exception("Exception caught for address: {}".format(address))
        time.sleep(5)
        i += 1
```
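`server.py` sends to every entry of `content.split('|')` without checking it first. Even an empty response body yields one entry, because `''.split('|')` is `['']`. A minimal sketch of a stricter parser follows; `parse_view` is a hypothetical name, and it keeps only IPv4 literals while reporting what it discarded, so a malformed view would show up in the log before any `sendto()` call.

```python
import ipaddress


def parse_view(content: str):
    """Split the tracker's '|'-separated view into (valid, rejected) lists.

    Hypothetical helper: entries that are not IPv4 literals (including the
    empty string produced by splitting an empty body) end up in `rejected`.
    """
    valid, rejected = [], []
    for entry in content.split("|"):
        candidate = entry.strip()
        try:
            # IPv4Address raises AddressValueError for "", hostnames, garbage
            valid.append(str(ipaddress.IPv4Address(candidate)))
        except ipaddress.AddressValueError:
            rejected.append(entry)
    return valid, rejected
```

Used in place of `addresses = content.split('|')`, this would read `addresses, rejected = parse_view(content)` followed by a `logging.warning` when `rejected` is non-empty.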
I created a second service on the same overlay network. It runs the tracker, which the nodes contact to get their view of the network:
Dockerfile:
```dockerfile
FROM python:3.5.2-alpine
RUN pip install pydevd
COPY tracker.py /code/
WORKDIR /code
EXPOSE 4321
CMD [ "python", "./tracker.py" ]
```
Tracker code:
```python
# import pydevd
import random
import logging
import time
from http.server import HTTPServer, BaseHTTPRequestHandler

available_peers = {}  # peer IP -> time of its last get_view request
K = 25

logging.basicConfig(format='%(levelname)s: %(message)s', level=logging.INFO)


def florida_string(ip):
    available_peers[ip] = int(time.time())
    to_choose = list(available_peers.keys())
    logging.info("View size: {:d}".format(len(to_choose)))
    to_choose.remove(ip)
    if len(to_choose) > K:
        to_send = random.sample(to_choose, K)
    else:
        to_send = to_choose
    # NOTE: to_send is computed, but the full list to_choose is returned
    return '|'.join(to_choose).encode()


class FloridaHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == '/REST/v1/admin/get_view':
            self.send_response(200)
            self.send_header("Content-type", "text/plain")
            self.end_headers()
            self.wfile.write(florida_string(self.client_address[0]))
        elif self.path == '/terminate':
            if self.client_address[0] in available_peers:
                del available_peers[self.client_address[0]]
                logging.info("Removed {:s}".format(self.client_address[0]))
                logging.info("View size: {:d}".format(len(available_peers)))
            else:
                logging.error("IP already removed or was never here")
            self.send_response(200)
            self.send_header("Content-type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Success")
        else:
            self.send_response(404)
            self.send_header("Content-type", "text/plain")
            self.end_headers()
            self.wfile.write(b"Nothing here, content is at /REST/v1/admin/get_view\n")


class FloridaServer:
    def __init__(self):
        self.server = HTTPServer(('', 4321), FloridaHandler)
        self.server.serve_forever()


FloridaServer()
```
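As posted, `florida_string` computes `to_send` but returns `'|'.join(to_choose)`, so every peer receives the whole registered list (up to 149 addresses with 150 peers) and `K = 25` is never applied. If a K-sized random view was intended, a sketch might look like this; `build_view` and its parameters are hypothetical, and it takes the registry as an argument so it can be exercised without the HTTP server.

```python
import random
import time

K = 25


def build_view(available_peers: dict, ip: str, k: int = K) -> bytes:
    """Register `ip`, then return up to `k` *other* peers joined with '|'.

    Hypothetical variant of florida_string that returns the sample it draws.
    """
    available_peers[ip] = int(time.time())  # record when this peer last asked
    others = [peer for peer in available_peers if peer != ip]
    if len(others) > k:
        others = random.sample(others, k)  # k distinct peers, chosen uniformly
    return "|".join(others).encode()
```

With this variant the first peer to register still receives an empty body, so the client side has to tolerate an empty view either way.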
Has anyone run into the same error on EC2?