Linux Network Interface Card (NIC) Drivers

2023-03-08 16:00:00 Source: 博客园 (cnblogs)

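Networking is inherent to the Linux kernel. Some years ago, Linux was chosen mainly for its networking performance, but things have changed: Linux is much more than a server operating system and now runs on billions of embedded devices. Over the years, Linux has earned a reputation as one of the best networking operating systems. Even so, Linux cannot do everything. Given the huge variety of Ethernet controllers that exist, the kernel has no choice but to expose an API to developers who need to write a driver for their network device, or who need to do kernel networking development in general. This API provides a sufficient abstraction layer to keep the resulting code generic and portable to other architectures. This article walks through the part of that API that deals with network interface card (NIC) driver development, and discusses its data structures and methods.

We will cover the following topics: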

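NIC driver data structures and the main socket buffer structure
NIC driver architecture and methods, as well as packet transmission and reception
Developing a dummy NIC driver for testing purposes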

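Driver data structures

When dealing with NIC devices, there are two data structures you need to work with:

The struct sk_buff structure, defined in include/linux/skbuff.h, is the fundamental data structure of the Linux networking code. Header to include:
```
#include <linux/skbuff.h>
```
Every packet sent or received is handled through this data structure.
The struct net_device structure represents any NIC device in the kernel. It is the interface through which data transfer takes place. It is defined in include/linux/netdevice.h. Header to include:
```
#include <linux/netdevice.h>
```
Other headers to include in the code are include/linux/etherdevice.h, for MAC and Ethernet-related functions (such as alloc_etherdev()), and include/linux/ethtool.h, for ethtool support:
```
#include <linux/etherdevice.h>
#include <linux/ethtool.h>
```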

The socket buffer

This structure wraps any packet that goes through the NIC:

    struct sk_buff {
        struct sk_buff *next;
        struct sk_buff *prev;
        ktime_t tstamp;
        struct rb_node rbnode; /* used in netem & tcp stack */
        struct sock *sk;
        struct net_device *dev;
        unsigned int len;
        unsigned int data_len;
        __u16 mac_len;
        __u16 hdr_len;
        __u32 priority;
        dma_cookie_t dma_cookie;
        sk_buff_data_t tail;
        sk_buff_data_t end;
        unsigned char *head;
        unsigned char *data;
        unsigned int truesize;
        atomic_t users;
    };

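The elements of the structure have the following meanings:

next and prev: the next and previous buffers in the list.
sk: the socket associated with this packet.
tstamp: the time at which the packet arrived or left.
rbnode: an alternative to next/prev, used when buffers are kept in a red-black tree.
dev: the device the packet arrived on or is leaving through. This field is related to two others not listed here, input_dev and real_dev, which track the devices associated with the packet; input_dev therefore always refers to the device the packet was received from.
len: the total number of bytes in the packet. A socket buffer (SKB) is made of a linear data buffer and, optionally, one or more additional regions called rooms. When such regions exist, data_len holds the number of bytes stored in them, that is, the part of len that lies outside the linear buffer.
mac_len: the length of the MAC header.
csum: the packet checksum (not shown in the abridged listing above).
priority: the packet priority, used for QoS.
truesize: how many bytes of system memory the packet consumes, including the memory occupied by struct sk_buff itself.
users: the reference count of the SKB object.
head: head, data, and tail point into the socket buffer: head to the start of the allocated buffer, data to the start of the packet data, and tail to the end of the packet data.
end: points to the end of the socket buffer.

Only a few elements of the structure are discussed here. The full description is in include/linux/skbuff.h, which is also the header to include when dealing with socket buffers.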
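The relationship between head, data, tail, and end is easier to see in a small model. The following is a userspace sketch, not kernel code: struct toy_skb and the toy_* helpers are hypothetical names invented for illustration, and plain pointers are used for all four cursors (the kernel may store tail and end as offsets on some configurations).

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the four buffer cursors of struct sk_buff (userspace only). */
struct toy_skb {
    unsigned char *head; /* start of the allocated buffer */
    unsigned char *data; /* start of the packet data */
    unsigned char *tail; /* end of the packet data */
    unsigned char *end;  /* end of the allocated buffer */
};

/* Point all cursors at an empty buffer of `size` bytes. */
static void toy_skb_init(struct toy_skb *skb, unsigned char *buf, size_t size)
{
    assert(buf != NULL);
    skb->head = buf;
    skb->data = buf;
    skb->tail = buf;
    skb->end = buf + size;
}

/* Free bytes in front of the data (room for headers to be prepended). */
static size_t toy_headroom(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Bytes of packet data held in the linear area. */
static size_t toy_len(const struct toy_skb *skb)
{
    return (size_t)(skb->tail - skb->data);
}

/* Free bytes after the data (room for the payload to grow). */
static size_t toy_tailroom(const struct toy_skb *skb)
{
    return (size_t)(skb->end - skb->tail);
}
```

Whatever the cursors do, head room, data length, and tail room always add up to the size of the allocated buffer, which is the invariant the allocation helpers described next preserve.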

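Socket buffer allocation

Allocating a socket buffer is a bit tricky, since it takes at least three different functions:

First, the whole memory allocation is done with the netdev_alloc_skb() function.
Then, the header room is increased and aligned with the skb_reserve() function.
Finally, the used data area of the buffer (which will contain the packet) is extended with the skb_put() function.

We allocate a buffer large enough to hold a packet along with its Ethernet header using netdev_alloc_skb():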
```
struct sk_buff *netdev_alloc_skb(struct net_device *dev, unsigned int length)
```
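This function returns NULL on failure. Although it allocates memory, netdev_alloc_skb() can be called from atomic context, because it allocates with GFP_ATOMIC and therefore never sleeps. Since the Ethernet header is 14 bytes long, the buffer needs some alignment so that the CPU does not suffer a performance penalty when accessing what follows it: reserving 2 bytes in front of the header places the IP header at offset 16, on a 4-byte boundary. The length passed to skb_reserve() for this purpose would be better named header_alignment than header_len, since it is used for alignment. The usual value is 2, which is why the kernel defines a dedicated macro, NET_IP_ALIGN, in include/linux/skbuff.h: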
```
#define NET_IP_ALIGN 2
```
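The second step reserves aligned memory for the header by moving the start of the data area forward, which grows the head room at the expense of the tail room. The function that does this is skb_reserve():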
```
void skb_reserve(struct sk_buff *skb, int len)
```
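The final step extends the used data area of the buffer, with skb_put(), so that it is as large as the packet. This function returns a pointer to the first byte of the area it has just added, which for a freshly reserved buffer is the start of the data area: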
```
unsigned char *skb_put(struct sk_buff *skb, unsigned int len)
```
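The allocated socket buffer should then be forwarded to the kernel networking layer. From the driver's point of view, this is the last step of the socket buffer's life cycle. We use the netif_rx_ni() function for this (recent kernels, 5.18 and later, no longer provide netif_rx_ni(); there, netif_rx() can be called from any context):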
```
int netif_rx_ni(struct sk_buff *skb)
```

The packet reception section shows how an incoming packet is handled using these steps.
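The three allocation steps, and the reason for the 2-byte reservation, can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel implementation: toy_alloc_skb(), toy_skb_reserve(), toy_skb_put(), and toy_ip_header_addr() are hypothetical stand-ins, and malloc() replaces the kernel allocator.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_NET_IP_ALIGN 2   /* same value as the NET_IP_ALIGN macro */
#define TOY_ETH_HLEN     14  /* length of an Ethernet header */

struct toy_skb {
    unsigned char *head, *data, *tail, *end;
};

/* Step 1: allocate the whole buffer; the data area starts out empty at head. */
static int toy_alloc_skb(struct toy_skb *skb, size_t size)
{
    skb->head = malloc(size);
    if (!skb->head)
        return -1;
    skb->data = skb->tail = skb->head;
    skb->end = skb->head + size;
    return 0;
}

/* Step 2: grow the head room by moving data and tail forward together. */
static void toy_skb_reserve(struct toy_skb *skb, size_t len)
{
    assert(skb->data == skb->tail); /* only meaningful on an empty buffer */
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Step 3: extend the data area by len bytes; return where the new bytes start. */
static unsigned char *toy_skb_put(struct toy_skb *skb, size_t len)
{
    unsigned char *old_tail = skb->tail;

    assert(len <= (size_t)(skb->end - skb->tail));
    skb->tail += len;
    return old_tail;
}

/* Address of the IP header, which follows the 14-byte Ethernet header. */
static uintptr_t toy_ip_header_addr(const struct toy_skb *skb)
{
    return (uintptr_t)(skb->data + TOY_ETH_HLEN);
}
```

With a 2-byte reservation, the Ethernet header starts at offset 2 and the IP header at offset 16, so the IP header sits on a 4-byte boundary whenever the buffer itself is at least 4-byte aligned, as malloc() memory is on common platforms.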

Network interface structure

A network interface is represented in the kernel as an instance of the struct net_device structure, defined in include/linux/netdevice.h:

    struct net_device {
        char name[IFNAMSIZ];
        char *ifalias;
        unsigned long mem_end;
        unsigned long mem_start;
        unsigned long base_addr;
        int irq;
        netdev_features_t features;
        netdev_features_t hw_features;
        netdev_features_t wanted_features;
        int ifindex;
        struct net_device_stats stats;
        atomic_long_t rx_dropped;
        atomic_long_t tx_dropped;
        const struct net_device_ops *netdev_ops;
        const struct ethtool_ops *ethtool_ops;
        unsigned int flags;
        unsigned int priv_flags;
        unsigned char link_mode;
        unsigned char if_port;
        unsigned char dma;
        unsigned int mtu;
        unsigned short type;

        /* Interface address info. */
        unsigned char perm_addr[MAX_ADDR_LEN];
        unsigned char addr_assign_type;
        unsigned char addr_len;
        unsigned short neigh_priv_len;
        unsigned short dev_id;
        unsigned short dev_port;
        unsigned long last_rx;

        /* Interface address info used in eth_type_trans() */
        unsigned char *dev_addr;
        struct device dev;
        struct phy_device *phydev;
    };

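The struct net_device structure is one of the kernel data structures that must be allocated dynamically, and it has its own allocation function. A NIC is allocated in the kernel with the alloc_etherdev() function: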

struct net_device *alloc_etherdev(int sizeof_priv);

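The function returns NULL on failure. The sizeof_priv parameter is the size of the memory to allocate for the private data structure attached to this NIC, which can be retrieved with the netdev_priv() function: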

void *netdev_priv(const struct net_device *dev)

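Given a struct priv_struct, which is our private structure, the following shows how to allocate a network device along with its private data:

    struct net_device *net_dev;
    struct priv_struct *my_priv_struct;

    net_dev = alloc_etherdev(sizeof(struct priv_struct));
    if (!net_dev)
        return -ENOMEM; /* alloc_etherdev() returns NULL on failure */
    my_priv_struct = netdev_priv(net_dev);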
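The idea behind sizeof_priv and netdev_priv() is that a single allocation holds both the device structure and the driver's private area. The following userspace sketch illustrates that idea only; struct toy_net_device, the toy_* functions, and the 32-byte alignment are assumptions made for the example, and the kernel's actual layout differs between versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_ALIGN 32 /* assumed alignment of the private area */

struct toy_net_device {
    char name[16];
    unsigned int mtu;
};

/* Round n up to the next multiple of TOY_ALIGN. */
static size_t toy_align_up(size_t n)
{
    return (n + TOY_ALIGN - 1) & ~(size_t)(TOY_ALIGN - 1);
}

/* One zeroed allocation holds the device structure followed by the private data. */
static struct toy_net_device *toy_alloc_netdev(size_t sizeof_priv)
{
    return calloc(1, toy_align_up(sizeof(struct toy_net_device)) + sizeof_priv);
}

/* The private area is found by pointer arithmetic, so no extra pointer is stored. */
static void *toy_netdev_priv(struct toy_net_device *dev)
{
    assert(dev != NULL);
    return (char *)dev + toy_align_up(sizeof(struct toy_net_device));
}

/* A single free releases both the device and its private data. */
static void toy_free_netdev(struct toy_net_device *dev)
{
    free(dev);
}
```

This layout is why the private data must never be freed separately: it is not a separate allocation.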

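Unused network devices should be freed with the free_netdev() function, which also frees the memory allocated for private data. Call it only after the device has been unregistered from the kernel: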

void free_netdev(struct net_device *dev)

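Once the net_device structure has been fully filled in, call register_netdev() on it. Keep in mind that this function registers the network device with the kernel so that it can be used. In other words, make sure the device really can handle network operations before calling it: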

int register_netdev(struct net_device *dev)

Device methods

Network devices belong to the category of devices that do not appear in the /dev directory (unlike block, input, or character devices). Like all devices of that kind, a NIC driver therefore exposes its functionality through a set of operations rather than through a device file. The kernel exposes the operations that can be performed on a network interface through the struct net_device_ops structure, which is a field of the struct net_device structure representing the network device (dev->netdev_ops). The struct net_device_ops fields are as follows:

    struct net_device_ops {
        int (*ndo_init)(struct net_device *dev);
        void (*ndo_uninit)(struct net_device *dev);
        int (*ndo_open)(struct net_device *dev);
        int (*ndo_stop)(struct net_device *dev);
        netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev);
        void (*ndo_change_rx_flags)(struct net_device *dev, int flags);
        void (*ndo_set_rx_mode)(struct net_device *dev);
        int (*ndo_set_mac_address)(struct net_device *dev, void *addr);
        int (*ndo_validate_addr)(struct net_device *dev);
        int (*ndo_do_ioctl)(struct net_device *dev, struct ifreq *ifr, int cmd);
        int (*ndo_set_config)(struct net_device *dev, struct ifmap *map);
        int (*ndo_change_mtu)(struct net_device *dev, int new_mtu);
        void (*ndo_tx_timeout)(struct net_device *dev);
        struct net_device_stats *(*ndo_get_stats)(struct net_device *dev);
    };

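Let's see what each element of the structure means:

int (*ndo_init)(struct net_device *dev) and void (*ndo_uninit)(struct net_device *dev): extra initialization/teardown functions, executed when the driver calls register_netdev()/unregister_netdev() to register/unregister the network device with the kernel. Most drivers do not provide them, since the real work is done by the ndo_open() and ndo_stop() functions.
int (*ndo_open)(struct net_device *dev): prepares and opens the interface. The interface is opened whenever the ip or ifconfig utility activates it. In this method, the driver should request/map/register any system resources it needs (I/O ports, IRQ, DMA, and so on), turn on the hardware, and perform any other setup the device requires.
int (*ndo_stop)(struct net_device *dev): the kernel executes this function when the interface is brought down (for example, with ifconfig <name> down). It should undo what was done in ndo_open().
netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, struct net_device *dev): called whenever the kernel wants to send a packet through this interface.
void (*ndo_set_rx_mode)(struct net_device *dev): called to change the interface address list filter mode (multicast or promiscuous). Providing it is recommended.
void (*ndo_tx_timeout)(struct net_device *dev): called by the kernel when a packet transmission fails to complete within a reasonable period, usually dev->watchdog_timeo ticks. The driver should check what happened, handle the problem, and resume packet transmission.
struct net_device_stats *(*ndo_get_stats)(struct net_device *dev): returns the device statistics. This is what you see when running netstat -i or ifconfig.

The preceding description leaves out many fields. The complete structure description is available in include/linux/netdevice.h. In fact, only ndo_start_xmit is mandatory, but it is good practice to provide as many helper hooks as your device has features.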
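The split between a mandatory transmit hook and optional helper hooks can be illustrated with a reduced operations table. This is a userspace sketch under simplifying assumptions: struct toy_dev_ops, struct toy_dev, and every toy_* or fake_* function are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>

struct toy_dev;

/* Hypothetical, reduced counterpart of struct net_device_ops. */
struct toy_dev_ops {
    int (*open)(struct toy_dev *dev); /* optional */
    int (*stop)(struct toy_dev *dev); /* optional */
    int (*start_xmit)(struct toy_dev *dev, const void *pkt, size_t len); /* mandatory */
};

struct toy_dev {
    const struct toy_dev_ops *ops;
    int up;
    unsigned long tx_packets;
};

/* Core side: bring the interface up, calling the driver hook only if present. */
static int toy_dev_open(struct toy_dev *dev)
{
    int ret = 0;

    if (dev->ops->open)
        ret = dev->ops->open(dev);
    if (ret == 0)
        dev->up = 1;
    return ret;
}

/* Core side: hand a packet to the driver; refuse if the interface is down. */
static int toy_dev_xmit(struct toy_dev *dev, const void *pkt, size_t len)
{
    assert(dev->ops->start_xmit != NULL); /* the one mandatory hook */
    if (!dev->up)
        return -1;
    return dev->ops->start_xmit(dev, pkt, len);
}

/* Driver side: a minimal transmit hook that only counts packets. */
static int fake_start_xmit(struct toy_dev *dev, const void *pkt, size_t len)
{
    (void)pkt;
    (void)len;
    dev->tx_packets++;
    return 0;
}

/* Driver side: only the mandatory hook is filled in; open and stop stay NULL. */
static const struct toy_dev_ops fake_ops = {
    .start_xmit = fake_start_xmit,
};
```

Because the core checks each optional hook before calling it, a driver can start with the transmit hook alone and add the others as the hardware requires.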

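Opening and closing

The ndo_open() function is called by the kernel whenever an authorized user (admin, for instance) configures this network interface with a user-space utility such as ifconfig or ip.

Like other network device operations, ndo_open() receives a struct net_device object as its parameter, from which the driver should retrieve, with netdev_priv(), the device-specific object that was stored in the private area when the net_device object was allocated.

A network controller usually raises an interrupt whenever it receives a packet or completes a packet transmission. The driver needs to register an interrupt handler, which is called whenever the controller raises an interrupt. The driver can register the interrupt handler either in its init()/probe() routine or in the open function. Some devices need interrupts to be enabled by setting a bit in a special hardware register. In that case, you can request the interrupt in the probe function and just set/clear the enable bit in the open/close methods.

To summarize, the open function should do the following:

Update the interface MAC address (in case the user changed it and your device allows this).
Reset the hardware if necessary, and take it out of low-power mode.
Request any resources (I/O memory, DMA channels, IRQ).
Map the IRQ and register the interrupt handler.
Check the interface link status.
Call netif_start_queue() on the device, to let the kernel know that your device is ready to transmit packets.

An example of an open function follows: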

    /*
     * This routine should set everything up anew at each open, even
     * registers that should only need to be set once at boot, so that
     * there is a non-reboot way to recover if something goes wrong.
     */
    static int enc28j60_net_open(struct net_device *dev)
    {
        struct priv_net_struct *priv = netdev_priv(dev);

        if (!is_valid_ether_addr(dev->dev_addr)) {
            /* [...] maybe print a debug message? */
            return -EADDRNOTAVAIL;
        }

        /* Reset the hardware here and take it out of low power mode */
        my_netdev_lowpower(priv, false);
        if (!my_netdev_hw_init(priv)) {
            /* [...] handle hardware reset failure */
            return -EINVAL;
        }

        /*
         * Update the MAC address (in case the user has changed it).
         * The new address is stored in the dev->dev_addr field.
         */
        set_hw_macaddr_registers(dev, MAC_REGADDR_START,
                                 dev->addr_len, dev->dev_addr);

        /* Enable interrupts */
        my_netdev_hw_enable(priv);

        /*
         * We are now ready to accept transmit requests from
         * the queueing layer of the networking stack.
         */
        netif_start_queue(dev);
        return 0;
    }

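netif_start_queue() simply allows the upper layers to call the device's ndo_start_xmit routine. In other words, it informs the kernel that this network device is ready to handle transmit requests.

The closing method, on the other hand, just has to undo what was done when the device was opened: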

    /* The inverse routine to net_open(). */
    static int enc28j60_net_close(struct net_device *dev)
    {
        struct priv_net_struct *priv = netdev_priv(dev);

        my_netdev_hw_disable(priv);
        my_netdev_lowpower(priv, true);

        /**
         * netif_stop_queue - stop transmitted packets
         *
         * Stop upper layers calling the device ndo_start_xmit routine.
         * Used for flow control when transmit resources are unavailable.
         */
        netif_stop_queue(dev);
        return 0;
    }

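netif_stop_queue() is simply the reverse of netif_start_queue(): it tells the kernel to stop calling the device's ndo_start_xmit routine, because the driver can no longer handle transmit requests.

Packet handling

Packet handling consists of transmitting and receiving packets. This is the main task of any network interface driver. Transmission refers only to sending outgoing frames, whereas reception refers to incoming frames.

There are two ways to drive network data exchange: polling or interrupts. Polling is a kind of timer-driven scheme in which the kernel checks the device for changes at regular intervals. In interrupt mode, on the other hand, the kernel does nothing itself: it listens on an IRQ line and waits for the device to signal a change through that IRQ. Interrupt-driven data exchange increases system overhead during periods of high traffic, which is why some drivers mix the two methods. The part of the kernel that allows mixing them is called New API (NAPI); it uses polling during periods of high traffic and interrupt-driven management once traffic returns to normal. New drivers should use NAPI if the hardware supports it. NAPI is not discussed here; the focus is on the interrupt-driven method.

Packet reception

When a packet arrives at the NIC, the driver must build a new socket buffer around it and copy the packet into the sk_buff->data field. How the copy is done does not really matter, and DMA can be used too. The driver usually learns about newly arrived data through an interrupt. When the NIC receives a packet, it raises an interrupt, which is handled by the driver; the driver has to check the interrupt status register of the device to find out the real reason the interrupt was raised (it could be RX ok, RX error, and so on). The bits that correspond to the event that raised the interrupt are set in the status register.

The tricky part is allocating and building the socket buffer. Below is an example RX handler. The driver has to perform as many sk_buff allocations as the number of packets it received: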

    /*
     * RX handler
     * This function is called from the work item responsible for packet
     * reception (bottom half). We use a work item because access to
     * our device (which sits on an SPI bus) may sleep.
     */
    static int my_rx_interrupt(struct net_device *ndev)
    {
        struct priv_net_struct *priv = netdev_priv(ndev);
        int pk_counter, ret;

        /* Let's get the number of packets our device received */
        pk_counter = my_device_reg_read(priv, REG_PKT_CNT);
        if (pk_counter > priv->max_pk_counter) {
            /* update statistics */
            priv->max_pk_counter = pk_counter;
        }
        ret = pk_counter;

        /* set receive buffer start */
        priv->next_pk_ptr = KNOWN_START_REGISTER;

        while (pk_counter-- > 0) {
            /*
             * By calling this internal helper function in a "while"
             * loop, packets get extracted one by one from the device
             * and forwarded to the network layer.
             */
            my_hw_rx(ndev);
        }

        return ret;
    }

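The following helper is responsible for getting one packet from the device, forwarding it to the kernel network layer, and decrementing the packet counter: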

    /*
     * Hardware receive function.
     * Read the buffer memory, update the FIFO pointer to
     * free the buffer.
     * This function decrements the packet counter.
     */
    static void my_hw_rx(struct net_device *ndev)
    {
        struct priv_net_struct *priv = netdev_priv(ndev);
        struct sk_buff *skb = NULL;
        u8 rsv[RSV_SIZE];
        int packet_len;

        packet_len = my_device_read_current_packet_size();

        /* Can't cross boundaries */
        if (priv->next_pk_ptr > RXEND_INIT) {
            /* Packet address corrupted: reset RX logic */
            /* [...] */
            /* Update RX error stats */
            ndev->stats.rx_errors++;
            return;
        }

        /*
         * Read next packet pointer and RX status vector.
         * This is device-specific.
         */
        my_netdev_mem_read(priv, priv->next_pk_ptr, sizeof(rsv), rsv);

        /*
         * Check for errors in the device RX status register,
         * and update error stats accordingly.
         */
        if (an_error_is_detected_in_device_status_registers()) {
            /*
             * Depending on the error:
             * ndev->stats.rx_errors++;
             * ndev->stats.rx_crc_errors++;
             * ndev->stats.rx_frame_errors++;
             * ndev->stats.rx_over_errors++;
             */
        } else {
            skb = netdev_alloc_skb(ndev, packet_len + NET_IP_ALIGN);
            if (!skb) {
                ndev->stats.rx_dropped++;
            } else {
                skb_reserve(skb, NET_IP_ALIGN);
                /*
                 * Copy the packet from the device's receive buffer
                 * to the socket buffer data memory.
                 * Remember that skb_put() returns a pointer to the
                 * beginning of the data region.
                 */
                my_netdev_mem_read(priv,
                                   rx_packet_start(priv->next_pk_ptr),
                                   packet_len, skb_put(skb, packet_len));
                /* Set the packet's protocol ID */
                skb->protocol = eth_type_trans(skb, ndev);
                /* Update RX statistics */
                ndev->stats.rx_packets++;
                ndev->stats.rx_bytes += packet_len;
                /* Submit the socket buffer to the network layer */
                netif_rx_ni(skb);
            }
        }

        /*
         * Move the RX read pointer to the start of the next
         * received packet.
         */
        priv->next_pk_ptr = my_netdev_update_reg_next_pkt();
    }

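Of course, the only reason we call the RX handler from deferred work is that the device sits on an SPI bus. For an MMIO device, all of the preceding operations could be performed within the hwirq handler. Have a look at the NXP FEC driver in drivers/net/ethernet/freescale/fec.c to see how this is achieved.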
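The statistics bookkeeping that the receive path performs can be isolated into a small, testable function. The sketch below runs in userspace and is illustrative only: the TOY_RX_* status bits are invented (real status layouts are device-specific), and struct toy_stats mirrors just a few counters of struct net_device_stats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bits of a device RX status word. */
#define TOY_RX_OK        (1u << 0)
#define TOY_RX_CRC_ERR   (1u << 1)
#define TOY_RX_FRAME_ERR (1u << 2)
#define TOY_RX_OVERRUN   (1u << 3)
#define TOY_RX_ANY_ERR   (TOY_RX_CRC_ERR | TOY_RX_FRAME_ERR | TOY_RX_OVERRUN)

/* Subset of the counters found in struct net_device_stats. */
struct toy_stats {
    unsigned long rx_packets, rx_bytes;
    unsigned long rx_errors, rx_crc_errors, rx_frame_errors, rx_over_errors;
};

/*
 * Account for one received frame. Returns 1 if the frame is good and should
 * be copied into a socket buffer, 0 if it must be dropped.
 */
static int toy_rx_account(struct toy_stats *st, uint32_t status, unsigned int len)
{
    assert(st != NULL);

    if ((status & TOY_RX_ANY_ERR) || !(status & TOY_RX_OK)) {
        st->rx_errors++;
        if (status & TOY_RX_CRC_ERR)
            st->rx_crc_errors++;
        if (status & TOY_RX_FRAME_ERR)
            st->rx_frame_errors++;
        if (status & TOY_RX_OVERRUN)
            st->rx_over_errors++;
        return 0;
    }

    st->rx_packets++;
    st->rx_bytes += len;
    return 1;
}
```

Only frames that pass this check are worth a socket buffer allocation, which is why the receive helper tests the status before calling netdev_alloc_skb().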

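Packet transmission

When the kernel needs to send a packet out of the interface, it calls the driver's ndo_start_xmit method, which should return NETDEV_TX_OK on success or NETDEV_TX_BUSY on failure. In the failure case you must not do anything with the socket buffer, because it is still owned by the network queueing layer when the error is returned: you cannot modify any SKB field, free the SKB, and so on. A spinlock protects ndo_start_xmit from concurrent calls.

Packet transmission is done asynchronously in most cases. The sk_buff of a transmitted packet is filled in by the upper layers, and its data field contains the packet to send. The driver should extract the packet from sk_buff->data and write it into the device hardware FIFO, or put it into a temporary TX buffer (if the device needs a certain amount of data before sending it) before writing it into the device hardware FIFO. Data is really sent only once the FIFO reaches a threshold value (usually defined by the driver or given in the device datasheet), or when the driver deliberately starts the transmission by setting a bit (a kind of trigger) in a special register of the device. That being said, the driver needs to tell the kernel not to start any new transmission until the hardware is ready to accept new data. This notification is done through the netif_stop_queue() function: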

void netif_stop_queue(struct net_device *dev)

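After sending the packet, the NIC raises an interrupt. The interrupt handler should check why the interrupt occurred. In the case of a transmission interrupt, it should update its statistics (net_device->stats.tx_errors and net_device->stats.tx_packets) and notify the kernel that the device is free to send new packets. This notification is done by means of netif_wake_queue():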

void netif_wake_queue(struct net_device *dev)

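To summarize, packet transmission is split into two parts:

The ndo_start_xmit operation, which notifies the kernel that the device is busy ("I am already transmitting, do not hand me more work"), sets everything up, and starts the transfer
The TX interrupt handler, which updates the TX statistics and notifies the kernel that the device is available again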
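The handshake between these two parts can be modeled as a tiny state machine. The following userspace sketch is a simplification under stated assumptions: struct toy_nic, toy_start_xmit(), and toy_tx_irq() are hypothetical, the device is assumed to hold a single packet at a time, and the queue-state check that the kernel performs before calling the driver is folded into the transmit function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_TX_OK = 0, TOY_TX_BUSY = 1 };

struct toy_nic {
    bool queue_stopped;  /* what netif_stop_queue()/netif_wake_queue() toggle */
    size_t fifo_len;     /* bytes currently in the single-packet hardware FIFO */
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_errors;
};

/* Part 1: the transmit hook. Marks the device busy and loads the FIFO. */
static int toy_start_xmit(struct toy_nic *nic, size_t len)
{
    if (nic->queue_stopped)
        return TOY_TX_BUSY; /* the caller keeps ownership of the packet */
    nic->queue_stopped = true;
    nic->fifo_len = len;
    return TOY_TX_OK;
}

/* Part 2: the TX-complete interrupt. Updates statistics and reopens the queue. */
static void toy_tx_irq(struct toy_nic *nic, bool hw_error)
{
    assert(nic->queue_stopped); /* an interrupt implies a transmission in flight */
    if (hw_error) {
        nic->tx_errors++;
    } else {
        nic->tx_packets++;
        nic->tx_bytes += nic->fifo_len;
    }
    nic->fifo_len = 0;
    nic->queue_stopped = false;
}
```

A second packet offered while the first is still in flight is refused, and becomes acceptable again only after the completion interrupt has run.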

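The ndo_start_xmit function must roughly contain the following steps:

1. Call netif_stop_queue() on the network device to inform the kernel that the device will be busy transmitting data.
2. Write the contents of sk_buff->data into the device FIFO.
3. Trigger the transmission (instruct the device to start transmitting).

Operations (2) and (3) may sleep for devices sitting on slow buses (SPI, for example) and may therefore need to be deferred to a work structure. This is the case in our example below.

Once the packet has been transferred, the TX interrupt handler should perform the following steps:

4. Depending on whether the device is memory-mapped or sits on a bus whose access functions may sleep, the following operations should be performed either directly in the hwirq handler or scheduled in a work item (or threaded IRQ):

Check whether the interrupt is a transmission interrupt.
Read the transmission descriptor status register to see what the status of the packet is.
Increment the error statistics if there was any problem in the transmission.
Increment the statistics of successfully transmitted packets.

5. Start the transmission queue again, allowing the kernel to call the driver's ndo_start_xmit method once more, by means of the netif_wake_queue() function.

Let's summarize this in a short example code snippet: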

    /* Somewhere in the code */
    INIT_WORK(&priv->tx_work, my_netdev_hw_tx);

    static netdev_tx_t my_netdev_start_xmit(struct sk_buff *skb,
                                            struct net_device *dev)
    {
        struct priv_net_struct *priv = netdev_priv(dev);

        /* Notify the kernel our device will be busy */
        netif_stop_queue(dev);

        /* Remember the skb for deferred processing */
        priv->tx_skb = skb;

        /*
         * This work will copy data from sk_buff->data to
         * the hardware's FIFO and start transmission
         */
        schedule_work(&priv->tx_work);

        /* Everything is OK */
        return NETDEV_TX_OK;
    }

The my_netdev_hw_tx work handler is described as follows:

    /*
     * Hardware transmit function.
     * Fill the buffer memory and send the contents of the
     * transmit buffer onto the network.
     * As a work handler, it receives the work item and recovers
     * the private structure that embeds it.
     */
    static void my_netdev_hw_tx(struct work_struct *work)
    {
        struct priv_net_struct *priv =
            container_of(work, struct priv_net_struct, tx_work);

        /* Write packet to hardware device TX buffer memory */
        my_netdev_packet_write(priv, priv->tx_skb->len, priv->tx_skb->data);

        /*
         * Does this network device support write-verify?
         * If so, perform it here.
         */
        /* [...] */

        /*
         * Set the TX request flag so that the hardware can
         * perform the transmission. This is device-specific.
         */
        my_netdev_reg_bitset(priv, ECON1, ECON1_TXRTS);
    }
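A work handler in the kernel is handed only a pointer to the work item, and drivers usually recover their private structure from it with container_of(). The mechanism can be reproduced in userspace; in this sketch, toy_container_of, struct toy_work, struct toy_priv, and the toy_* functions are hypothetical stand-ins for the kernel's container_of(), struct work_struct, INIT_WORK(), and the workqueue.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of(), without its type checking. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_work {
    void (*func)(struct toy_work *work);
};

struct toy_priv {
    int tx_len;              /* length of the pending packet */
    int sent;                /* set by the handler */
    struct toy_work tx_work; /* embedded in the structure, not a pointer */
};

/* Work handler: receives only the work item and recovers the enclosing structure. */
static void toy_tx_work_handler(struct toy_work *work)
{
    struct toy_priv *priv = toy_container_of(work, struct toy_priv, tx_work);

    assert(priv->tx_len > 0);
    priv->sent = priv->tx_len;
}

/* Stand-in for INIT_WORK(): bind a handler to the work item. */
static void toy_init_work(struct toy_work *work, void (*func)(struct toy_work *))
{
    work->func = func;
}

/* Stand-in for the workqueue running the item later: it only knows the work pointer. */
static void toy_run_work(struct toy_work *work)
{
    work->func(work);
}
```

The trick works because the work item is embedded in the private structure, so its address differs from the structure's address by a constant offset known at compile time.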

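TX interrupt management is discussed in the interrupt handler section below.

Driver example

We can summarize the concepts discussed so far in the following fake Ethernet driver: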

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/errno.h>
    #include <linux/init.h>
    #include <linux/netdevice.h>
    #include <linux/etherdevice.h>
    #include <linux/ethtool.h>
    #include <linux/skbuff.h>
    #include <linux/slab.h>
    #include <linux/of.h>               /* For DT */
    #include <linux/platform_device.h>  /* For platform devices */

    struct eth_struct {
        int bar;
        int foo;
        struct net_device *dummy_ndev;
    };

    static int fake_eth_open(struct net_device *dev)
    {
        pr_info("fake_eth_open called\n");

        /*
         * We are now ready to accept transmit requests from
         * the queueing layer of the networking stack.
         */
        netif_start_queue(dev);
        return 0;
    }

    static int fake_eth_release(struct net_device *dev)
    {
        pr_info("fake_eth_release called\n");
        netif_stop_queue(dev);
        return 0;
    }

    static netdev_tx_t fake_eth_xmit(struct sk_buff *skb, struct net_device *ndev)
    {
        pr_info("dummy xmit called...\n");
        ndev->stats.tx_bytes += skb->len;
        ndev->stats.tx_packets++;

        skb_tx_timestamp(skb);
        /* There is no hardware: the packet is simply dropped */
        dev_kfree_skb(skb);
        return NETDEV_TX_OK;
    }

    static int fake_eth_init(struct net_device *dev)
    {
        pr_info("fake eth device initialized\n");
        return 0;
    }

    static const struct net_device_ops my_netdev_ops = {
        .ndo_init = fake_eth_init,
        .ndo_open = fake_eth_open,
        .ndo_stop = fake_eth_release,
        .ndo_start_xmit = fake_eth_xmit,
        .ndo_validate_addr = eth_validate_addr,
    };

    static const struct of_device_id fake_eth_dt_ids[] = {
        { .compatible = "packt,fake-eth", },
        { /* sentinel */ }
    };

    static int fake_eth_probe(struct platform_device *pdev)
    {
        int ret;
        struct eth_struct *priv;
        struct net_device *dummy_ndev;

        priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
        if (!priv)
            return -ENOMEM;

        dummy_ndev = alloc_etherdev(sizeof(struct eth_struct));
        if (!dummy_ndev)
            return -ENOMEM;

        dummy_ndev->if_port = IF_PORT_10BASET;
        dummy_ndev->netdev_ops = &my_netdev_ops;
        /* If needed, dev->ethtool_ops = &fake_ethtool_ops; */

        ret = register_netdev(dummy_ndev);
        if (ret) {
            pr_info("dummy net dev: Error %d initializing card ...\n", ret);
            free_netdev(dummy_ndev);
            return ret;
        }

        priv->dummy_ndev = dummy_ndev;
        platform_set_drvdata(pdev, priv);
        return 0;
    }

    static int fake_eth_remove(struct platform_device *pdev)
    {
        struct eth_struct *priv;

        priv = platform_get_drvdata(pdev);
        pr_info("Cleaning Up the Module\n");
        unregister_netdev(priv->dummy_ndev);
        free_netdev(priv->dummy_ndev);
        return 0;
    }

    static struct platform_driver mypdrv = {
        .probe = fake_eth_probe,
        .remove = fake_eth_remove,
        .driver = {
            .name = "fake-eth",
            .of_match_table = of_match_ptr(fake_eth_dt_ids),
            .owner = THIS_MODULE,
        },
    };
    module_platform_driver(mypdrv);

    MODULE_LICENSE("GPL");
    MODULE_DESCRIPTION("Fake Ethernet driver");

Once the module is loaded and a device matched, an Ethernet interface is created on the system. First, let's see what the dmesg command shows us:

    # dmesg
    [...]
    [146698.060074] fake eth device initialized
    [146698.087297] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

If you run the ifconfig -a command, the interface is listed:

    # ifconfig -a
    [...]
    eth0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
    BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:0 errors:0 dropped:0 overruns:0 frame:0
    TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

Finally, you can configure the interface by assigning it an IP address, which is then shown by ifconfig:

    # ifconfig eth0 192.168.1.45
    # ifconfig
    [...]
    eth0 Link encap:Ethernet HWaddr 00:00:00:00:00:00
    inet addr:192.168.1.45 Bcast:192.168.1.255 Mask:255.255.255.0
    BROADCAST MULTICAST MTU:1500 Metric:1
    RX packets:0 errors:0 dropped:0 overruns:0 frame:0
    TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

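Status and control

Device control refers to situations where the kernel needs to change properties of the interface, either on its own initiative or in response to a user action. It can then use either the operations exposed through the struct net_device_ops structure, as discussed above, or another control tool, ethtool, which requires the driver to introduce a new set of hooks, discussed below. Status, conversely, consists of reporting the state of the interface.

The interrupt handler

So far, we have dealt with only two different interrupts: when a new packet has arrived, and when the transmission of an outgoing packet is complete. But hardware interfaces are now becoming smarter and can report their status, either for sanity purposes or for data transfer purposes. Network interfaces can thus also generate interrupts to signal errors, link status changes, and so on. These should all be handled in the interrupt handler.

This is what our hwirq handler looks like: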

    static irqreturn_t my_netdev_irq(int irq, void *dev_id)
    {
        struct priv_net_struct *priv = dev_id;

        /*
         * Can't do anything in interrupt context because we need to
         * block (spi_sync() is blocking), so fire off the interrupt
         * handling workqueue.
         * Remember, we access our netdev registers through the SPI bus
         * via spi_sync() calls.
         */
        schedule_work(&priv->irq_work);

        return IRQ_HANDLED;
    }

Because our device sits on an SPI bus, everything is deferred into a work_struct, which is defined as follows:

    static void my_netdev_irq_work_handler(struct work_struct *work)
    {
        struct priv_net_struct *priv =
            container_of(work, struct priv_net_struct, irq_work);
        struct net_device *ndev = priv->netdev;
        int intflags, loop;

        /* Disable further interrupts */
        my_netdev_reg_bitclear(priv, EIE, EIE_INTIE);

        do {
            loop = 0;
            intflags = my_netdev_regb_read(priv, EIR);

            /* DMA interrupt handler (not currently used) */
            if ((intflags & EIR_DMAIF) != 0) {
                loop++;
                handle_dma_complete();
                clear_dma_interrupt_flag();
            }

            /* LINK changed handler */
            if ((intflags & EIR_LINKIF) != 0) {
                loop++;
                my_netdev_check_link_status(ndev);
                clear_link_interrupt_flag();
            }

            /* TX complete handler */
            if ((intflags & EIR_TXIF) != 0) {
                bool err = false;

                loop++;
                priv->tx_retry_count = 0;
                if (locked_regb_read(priv, ESTAT) & ESTAT_TXABRT)
                    err = true; /* the transmission was aborted */
                /* [...] update TX statistics according to err, wake the queue */
                clear_tx_interrupt_flag();
            }

            /* TX error handler */
            if ((intflags & EIR_TXERIF) != 0) {
                loop++;
                /*
                 * Reset TX logic by setting/clearing the appropriate
                 * bit in the right register
                 */
                /* [...] */

                /* Transmit late collision check for retransmit */
                if (my_netdev_collision_bit_set()) {
                    /* Handle collision */
                    /* [...] */
                }
            }

            /* RX error handler */
            if ((intflags & EIR_RXERIF) != 0) {
                loop++;
                /* Check free FIFO space to flag RX overrun */
                /* [...] */
            }

            /* RX handler */
            if (my_rx_interrupt(ndev))
                loop++;
        } while (loop);

        /* Re-enable interrupts */
        my_netdev_reg_bitset(priv, EIE, EIE_INTIE);
    }
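The reason the handler loops until a pass finds nothing to do is that the device may raise new flags while earlier ones are being serviced. The following userspace sketch illustrates this with a simulated flag register; struct toy_nic, the TOY_IRQ_* bits, and the late_flags field (which simulates events arriving during the first pass) are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interrupt flag bits of a simulated status register. */
#define TOY_IRQ_LINK (1u << 0)
#define TOY_IRQ_TX   (1u << 1)
#define TOY_IRQ_RX   (1u << 2)

struct toy_nic {
    uint32_t eir;            /* simulated interrupt flag register */
    uint32_t late_flags;     /* flags the "hardware" raises during the first pass */
    unsigned int rx_pending; /* packets waiting in the device */
    unsigned int link_events, tx_done, rx_done;
};

/* Service every pending cause; return the number of passes through the loop. */
static unsigned int toy_irq_work(struct toy_nic *nic)
{
    unsigned int passes = 0;
    int handled;

    assert(nic != NULL);
    do {
        uint32_t flags = nic->eir; /* sample the register once per pass */

        handled = 0;
        if (flags & TOY_IRQ_LINK) {
            nic->link_events++;
            nic->eir &= ~TOY_IRQ_LINK; /* acknowledge */
            handled++;
        }
        if (flags & TOY_IRQ_TX) {
            nic->tx_done++;
            nic->eir &= ~TOY_IRQ_TX;
            handled++;
        }
        if (flags & TOY_IRQ_RX) {
            nic->rx_done += nic->rx_pending;
            nic->rx_pending = 0;
            nic->eir &= ~TOY_IRQ_RX;
            handled++;
        }

        /* Simulate the hardware raising new flags while we were busy. */
        nic->eir |= nic->late_flags;
        nic->late_flags = 0;
        passes++;
    } while (handled);

    return passes;
}
```

A single pass would have missed the flag raised during servicing; looping until an idle pass catches it before interrupts are re-enabled.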

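Ethtool support

Ethtool is a small utility for examining and tuning the settings of Ethernet-based network interfaces. With ethtool, it is possible to control various parameters, such as:

Speed
Media type
Duplex operation
Getting/setting EEPROM register contents
Hardware checksumming
Wake-on-LAN

Drivers that need ethtool support should include <linux/ethtool.h>. The feature relies on the struct ethtool_ops structure, which is its core and contains a set of methods for ethtool operation support. Most of these methods are relatively straightforward; see include/linux/ethtool.h for details.

For ethtool support to be fully part of the driver, the driver should fill in an ethtool_ops structure and assign it to the .ethtool_ops field of the struct net_device structure: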

my_netdev->ethtool_ops = &my_ethtool_ops;

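The SET_ETHTOOL_OPS macro can also be used for this purpose on the older kernels that still provide it (it has since been removed from mainline). Note that ethtool methods can be called even when the interface is down.

For example, the following drivers implement ethtool support: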

drivers/net/ethernet/microchip/enc28j60.c
drivers/net/ethernet/freescale/fec.c
drivers/net/usb/rtl8150.c

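Driver methods

The driver methods are the probe() and remove() functions. They are responsible for registering and unregistering the network device with the kernel. The driver has to expose its functionality to the kernel through the device methods of the struct net_device structure. These are the operations that can be performed on the network interface:

    static const struct net_device_ops my_netdev_ops = {
        .ndo_open = my_netdev_open,
        .ndo_stop = my_netdev_close,
        .ndo_start_xmit = my_netdev_start_xmit,
        .ndo_set_rx_mode = my_netdev_set_multicast_list,
        .ndo_set_mac_address = my_netdev_set_mac_address,
        .ndo_tx_timeout = my_netdev_tx_timeout,
        .ndo_change_mtu = eth_change_mtu,
        .ndo_validate_addr = eth_validate_addr,
    };

The preceding operations are those that most drivers implement.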

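The probe function

The probe function is quite basic: it only needs to perform the device's early init and then register the network device with the kernel.

In other words, the probe function has to:

Allocate the network device along with its private data using the alloc_etherdev() function (with the help of netdev_priv()).
Initialize the private data fields (mutexes, spinlocks, work queues, and so on). Use work queues (and mutexes) if the device sits on a bus whose access functions may sleep (SPI, for example). In that case, the hwirq handler only has to acknowledge the interrupt to the kernel and schedule the work that will perform the operations on the device. An alternative solution is to use a threaded IRQ. If the device is MMIO, you can use spinlocks to protect critical sections and get rid of work queues.
Initialize bus-specific parameters and functionality (SPI, USB, PCI, and so on).
Request and map resources (I/O memory, DMA channels, and IRQ).
If necessary, generate a random MAC address and assign it to the device.
Fill in the mandatory (or useful) netdev properties: if_port, irq, netdev_ops, ethtool_ops, and so on.
Put the device into a low-power state (the open() function takes it out of this mode).
Finally, call register_netdev() on the device.

With an SPI network device, the probe function looks like this: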

    static int my_netdev_probe(struct spi_device *spi)
    {
        struct net_device *dev;
        struct priv_net_struct *priv;
        int ret = 0;

        /* Allocate network interface */
        dev = alloc_etherdev(sizeof(struct priv_net_struct));
        if (!dev)
            return -ENOMEM;

        /* Private data */
        priv = netdev_priv(dev);

        /*
         * Set private data and bus-specific parameters; the remove
         * function and the work handlers rely on these two links.
         */
        priv->netdev = dev;
        spi_set_drvdata(spi, priv);
        /* [...] */

        /* Initialize the work items */
        INIT_WORK(&priv->tx_work, data_tx_work_handler);
        /* [...] */

        /* Device early init, only a few things */
        if (!my_netdev_chipset_init(dev)) {
            ret = -EIO;
            goto err_free_netdev;
        }

        /* Generate and assign a random MAC address to the device */
        eth_hw_addr_random(dev);
        my_netdev_set_hw_macaddr(dev);

        /*
         * Board setup must set the relevant edge trigger type;
         * level triggers won't currently work.
         */
        ret = request_irq(spi->irq, my_netdev_irq, 0, DRV_NAME, priv);
        if (ret < 0)
            goto err_free_netdev;

        /* Fill some mandatory or useful netdev properties */
        dev->if_port = IF_PORT_10BASET;
        dev->irq = spi->irq;
        dev->netdev_ops = &my_netdev_ops;
        dev->ethtool_ops = &my_ethtool_ops;

        /* Put the device into sleep mode */
        my_netdev_lowpower(priv, true);

        /* Register our device with the kernel */
        ret = register_netdev(dev);
        if (ret)
            goto err_free_irq;

        dev_info(&dev->dev, DRV_NAME " driver registered\n");
        return 0;

    err_free_irq:
        free_irq(spi->irq, priv);
    err_free_netdev:
        free_netdev(dev);
        return ret;
    }

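This article is heavily inspired by the enc28j60 driver from Microchip. You can look at its code in drivers/net/ethernet/microchip/enc28j60.c.

The register_netdev() function takes a completed struct net_device object and adds it to the kernel's interfaces; it returns 0 on success and a negative error code on failure. The struct net_device object should be stored in the bus device structure so that it can be accessed later. That being said, if your net device is part of a global private structure, it is that structure that you should store there.

Note: a duplicate device name may cause registration to fail.

Module unloading

This is the cleanup function, and it relies on two functions, unregister_netdev() and free_netdev(). Our driver's release function could look like this: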

    static int my_netdev_remove(struct spi_device *spi)
    {
        struct priv_net_struct *priv = spi_get_drvdata(spi);

        unregister_netdev(priv->netdev);
        free_irq(spi->irq, priv);
        free_netdev(priv->netdev);
        return 0;
    }

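The unregister_netdev() function removes the interface from the system, and the kernel can no longer call its methods; free_netdev() frees the memory used by the struct net_device structure itself, along with the memory allocated for private data and any internally allocated memory related to the network device. Note that you should never free netdev->priv yourself.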
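The ordering rule shared by the probe and remove functions (acquire resources in order, release them in reverse) can be checked with a small userspace simulation. Everything here is hypothetical: struct toy_state and the toy_* functions only record which simulated resource is held, and the comments name the kernel call each step stands for.

```c
#include <assert.h>
#include <stdbool.h>

/* Records which simulated resources are currently held. */
struct toy_state {
    bool netdev_allocated;
    bool irq_requested;
    bool registered;
};

/* Simulated step; `fail_at` selects which step fails (0 means none). */
static int toy_step(int step, int fail_at)
{
    return step == fail_at ? -1 : 0;
}

/* Probe: acquire in order, release in reverse order on failure. */
static int toy_probe(struct toy_state *st, int fail_at)
{
    int ret;

    ret = toy_step(1, fail_at); /* like alloc_etherdev() */
    if (ret)
        return ret;
    st->netdev_allocated = true;

    ret = toy_step(2, fail_at); /* like request_irq() */
    if (ret)
        goto err_free_netdev;
    st->irq_requested = true;

    ret = toy_step(3, fail_at); /* like register_netdev() */
    if (ret)
        goto err_free_irq;
    st->registered = true;
    return 0;

err_free_irq:
    st->irq_requested = false;    /* like free_irq() */
err_free_netdev:
    st->netdev_allocated = false; /* like free_netdev() */
    return ret;
}

/* Remove: undo probe; unregister first, free the device last. */
static void toy_remove(struct toy_state *st)
{
    assert(st->registered);
    st->registered = false;       /* like unregister_netdev() */
    st->irq_requested = false;    /* like free_irq() */
    st->netdev_allocated = false; /* like free_netdev() */
}
```

Whichever step fails, the simulation ends with nothing held, which is the property a real probe function has to guarantee.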

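This article has explained the essentials of writing a NIC (network interface card) device driver. Although it relies on a network interface sitting on an SPI bus, the principle is the same for USB or PCI network interfaces. After reading it, NIC drivers should no longer be unfamiliar to you.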
